Routing & fallbacks

Every request that hits RouteShift gets matched against a routing rule chain before it leaves the proxy. Rules can rewrite the model, swap providers, attach metadata, or short-circuit with a static response — all without your client knowing.

Rules

Open Routing → Rules to create a rule. A rule has:

Match — match on virtual key, model name, message content regex, message size, system-prompt size, or metadata.
Action — route to a specific model, block with a status code, or tag for downstream analytics.
Priority — lower numbers win. Rules are evaluated top-to-bottom in priority order.
Conditions — optional time-of-day, environment, or per-key gates.

Rules are versioned; every save records a diff in the audit log so you can roll back a bad change.
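The evaluation order described above can be sketched in Python. This is a minimal illustration under stated assumptions, not RouteShift's implementation. The `Request` and `Rule` types and the dictionary-shaped actions are hypothetical. The policy that the first matching rule in ascending priority order wins is also an assumption, inferred from "lower numbers win".

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Request:
    virtual_key: str
    model: str
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    priority: int                      # lower number = evaluated first
    match: Callable[[Request], bool]   # hypothetical predicate over the request
    action: dict                       # e.g. {"route_to": "gpt-4o-mini"} or {"block": 403}


def evaluate(rules: list[Rule], req: Request) -> Optional[Rule]:
    """Return the first matching rule in ascending priority order, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.match(req):
            return rule
    return None


# Example chain, listed out of order on purpose: evaluate() sorts by priority.
rules = [
    Rule("cheap-for-dev", 50,
         lambda r: r.metadata.get("env") == "dev",
         {"route_to": "gpt-4o-mini"}),
    Rule("block-secrets", 10,
         lambda r: re.search(r"sk-[A-Za-z0-9]{20,}", r.content) is not None,
         {"block": 403}),
]
```

A dev-environment request whose content contains an `sk-...` string matches both rules. `block-secrets` wins because its priority number is lower.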

Model aliases

A model alias is a per-team canonical name that resolves to a real upstream model. Use aliases to:

Standardize on a single name across providers (smart → gpt-4o, cheap → gpt-4o-mini, claude → claude-3-5-sonnet-20241022).
Pin a public name to a specific version while you migrate.
Swap models for a whole team without touching any client code.

Aliases live at Routing → Aliases. When an alias's target changes, the proxy's alias cache is invalidated through /admin/model-aliases/invalidate, so the update takes effect on the next request.
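A minimal sketch of per-team alias resolution with a read-through cache. It shows why an explicit invalidation step is needed. The `AliasResolver` class, its per-team cache keys, and the pass-through behavior for names that are not aliases are all assumptions. Only the /admin/model-aliases/invalidate path comes from this page, and `invalidate()` merely models what such a call might do.

```python
class AliasResolver:
    """Hypothetical sketch: per-team alias table with a read-through cache."""

    def __init__(self, store: dict[str, dict[str, str]]):
        self._store = store  # source of truth: team -> {alias: upstream model}
        self._cache: dict[tuple[str, str], str] = {}

    def resolve(self, team: str, model: str) -> str:
        key = (team, model)
        if key not in self._cache:
            # Assumption: names that are not aliases pass through unchanged.
            self._cache[key] = self._store.get(team, {}).get(model, model)
        return self._cache[key]

    def invalidate(self, team: str) -> None:
        """Drop every cached entry for one team, as an invalidate call might."""
        for key in [k for k in self._cache if k[0] == team]:
            del self._cache[key]
```

After `smart` is repointed from `gpt-4o` to `claude-3-5-sonnet-20241022` in the store, `resolve("acme", "smart")` keeps returning the cached `gpt-4o` until `invalidate("acme")` runs.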

Load balancing across N provider keys

If you have multiple credentials for the same provider — multiple OpenAI orgs, multiple Bedrock regions, multiple Azure deployments — RouteShift can balance traffic across them. Configure the strategy at Settings → Providers → … → Load balancing:

Strategy	When to use
Weighted round-robin	Default. Each key gets traffic proportional to its weight. Useful for cost arbitrage between paid and free tiers.
Latency-based	Routes to the key with the lowest p50 latency over the last 5 minutes. Good for multi-region deployments.
Least-busy	Routes to the key with the lowest in-flight request count. Useful when keys have very different RPM caps.
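The three strategies can be sketched as selection functions over a list of keys. This is illustrative only, with several assumptions. `ProviderKey` is a hypothetical type. Weighted round-robin is shown as the "smooth" variant popularized by nginx, and RouteShift may use a different one. `recent_latencies_ms` is assumed to be already trimmed to the 5-minute window.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class ProviderKey:
    name: str
    weight: int = 1
    in_flight: int = 0
    recent_latencies_ms: list[float] = field(default_factory=list)  # assumed: last 5 minutes only
    current: int = 0  # scratch state for smooth weighted round-robin


def weighted_round_robin(keys: list[ProviderKey]) -> ProviderKey:
    """Smooth weighted round-robin: picks are proportional to weight and interleaved."""
    total = sum(k.weight for k in keys)
    for k in keys:
        k.current += k.weight
    best = max(keys, key=lambda k: k.current)
    best.current -= total
    return best


def latency_based(keys: list[ProviderKey]) -> ProviderKey:
    """Pick the key with the lowest p50 (median) latency; keys with no samples sort last."""
    def p50(k: ProviderKey) -> float:
        return statistics.median(k.recent_latencies_ms) if k.recent_latencies_ms else float("inf")
    return min(keys, key=p50)


def least_busy(keys: list[ProviderKey]) -> ProviderKey:
    """Pick the key with the fewest in-flight requests."""
    return min(keys, key=lambda k: k.in_flight)
```

With weights 3 and 1, eight calls to `weighted_round_robin` return key `a` six times and key `b` twice, in the order a, a, b, a, repeated, rather than in bursts.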

Keys also have automatic cooldowns: a sustained burst of 5xx or 429 responses from upstream takes the key out of rotation for a configurable window. The dashboard shows cooldown state in real time.
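One way to model a "sustained burst" is a sliding window of error timestamps per key. In this hypothetical sketch, `KeyCooldown` and its threshold, window, and cooldown lengths are illustrative placeholders, not RouteShift defaults. The clock is passed in explicitly so the logic is easy to test.

```python
from collections import deque


class KeyCooldown:
    """Hypothetical sketch: trip a cooldown after `threshold` upstream 5xx/429
    responses within `window_s` seconds. Values are illustrative only."""

    def __init__(self, threshold: int = 5, window_s: float = 30.0, cooldown_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self._errors: deque[float] = deque()  # timestamps of recent 5xx/429 responses
        self.cooldown_until = 0.0

    def record(self, status: int, now: float) -> None:
        if status == 429 or 500 <= status <= 599:
            self._errors.append(now)
        # Drop errors that have slid out of the window.
        while self._errors and now - self._errors[0] > self.window_s:
            self._errors.popleft()
        if len(self._errors) >= self.threshold:
            self.cooldown_until = now + self.cooldown_s  # take the key out of rotation
            self._errors.clear()

    def available(self, now: float) -> bool:
        return now >= self.cooldown_until
```

Spreading the same number of errors across a longer period does not trip the cooldown. This is the difference between a sustained burst and occasional failures.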

Retry budget and cost ceiling

Fallbacks aren’t free: a chain of three attempts against three different providers can cost up to 3× a single request. Every rule supports two caps:

Retry budget — max attempts across the chain. Defaults to 3.
Per-request cost ceiling — max USD a single client request can spend across all attempts. When the ceiling is hit, the next attempt is skipped and the last error is returned.

Both caps are enforced server-side, so a misconfigured rule chain can’t produce a runaway bill.
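The interaction between the two caps can be sketched as a loop over the fallback chain. Each element of `chain` is a hypothetical callable that performs one upstream attempt and reports its cost. The `Attempt` type and the `run_chain` name are illustrative. The ceiling has no default because this page does not state one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    ok: bool
    cost_usd: float
    body: Optional[str] = None
    error: Optional[str] = None


def run_chain(chain: list[Callable[[], Attempt]], cost_ceiling_usd: float,
              retry_budget: int = 3) -> Attempt:
    spent = 0.0
    last: Optional[Attempt] = None
    for call in chain[:retry_budget]:     # never more than retry_budget attempts
        if spent >= cost_ceiling_usd:     # ceiling hit: skip the remaining attempts
            break
        last = call()
        spent += last.cost_usd
        if last.ok:
            return last
    # Return the last error seen (or a placeholder if nothing ran).
    return last if last is not None else Attempt(ok=False, cost_usd=0.0, error="no attempts made")
```

An attempt's cost is only known after it completes, so this sketch checks the ceiling between attempts. Total spend can therefore exceed the ceiling by at most one attempt. A stricter design would estimate the next attempt's cost up front and skip it if the estimate would cross the ceiling.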

Provider-side rate-limit awareness

When an upstream provider returns a 429 or a Retry-After header, RouteShift honors it: the offending credential is parked until the cooldown expires, and any in-flight retry is rerouted to a different key (or the next model in the rule chain). The cooldown is per-credential, so noisy neighbors don’t take down a whole provider.
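A sketch of the parking logic, assuming credentials are tracked by name. `retry_after_seconds` handles both forms the Retry-After header can take under HTTP semantics (RFC 9110): a non-negative integer number of seconds, or an HTTP-date. `CredentialPool`, `DEFAULT_429_COOLDOWN_S`, and the 30-second fallback for a 429 without Retry-After are hypothetical.

```python
import email.utils
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_429_COOLDOWN_S = 30.0  # hypothetical fallback when a 429 carries no Retry-After


def retry_after_seconds(value: Optional[str], now: datetime) -> Optional[float]:
    """Parse Retry-After as delta-seconds or an HTTP-date; None if absent or invalid."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


class CredentialPool:
    """Per-credential parking: one key's cooldown never blocks its siblings."""

    def __init__(self, names: list[str]):
        self.parked_until = {n: datetime.min.replace(tzinfo=timezone.utc) for n in names}

    def on_response(self, name: str, status: int, retry_after: Optional[str], now: datetime) -> None:
        delay = retry_after_seconds(retry_after, now)
        if delay is None and status == 429:
            delay = DEFAULT_429_COOLDOWN_S
        if delay is not None:
            self.parked_until[name] = now + timedelta(seconds=delay)

    def pick(self, now: datetime, exclude: frozenset = frozenset()) -> Optional[str]:
        """Return an unparked credential, or None (then fall through to the next model)."""
        for name, until in self.parked_until.items():
            if name not in exclude and now >= until:
                return name
        return None
```

An in-flight retry would call `pick` with the failed credential in `exclude`. A `None` result is where the next model in the rule chain would take over.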
