Every request that hits RouteShift gets matched against a routing rule chain before it leaves the proxy. Rules can rewrite the model, swap providers, attach metadata, or short-circuit with a static response, all without your client knowing.
Rules
Open Routing → Rules to create a rule. A rule has:

- Match: virtual key, model name, message-content regex, message size, system-prompt size, or metadata.
- Action: `route` to a specific model, `block` with a status code, or `tag` for downstream analytics.
- Priority: lower numbers win. Rules are evaluated top-to-bottom in priority order.
- Conditions: optional time-of-day, environment, or per-key gates.
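To make the evaluation order concrete, here is a minimal Python sketch of priority-ordered matching. It assumes the first matching rule decides the action. The `Request` and `Rule` shapes, the predicate-as-lambda style, the regex, and the rule names are hypothetical illustrations, not RouteShift's rule format; conditions such as time-of-day gates could be folded into the `match` predicate.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Request:
    # Hypothetical request shape used only for this sketch.
    virtual_key: str
    model: str
    messages: List[str]
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    priority: int                          # lower numbers win
    match: Callable[[Request], bool]       # any predicate over the request
    action: str                            # "route", "block", or "tag"
    target: Union[str, int, None] = None   # model name, HTTP status, or tag


def first_match(rules: List[Rule], req: Request) -> Optional[Rule]:
    """Return the first rule that matches, scanning in ascending priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.match(req):
            return rule
    return None


SECRET = re.compile(r"sk-[A-Za-z0-9]{20,}")  # hypothetical "leaked API key" pattern

# Deliberately listed out of priority order; first_match sorts them.
rules = [
    Rule("cheap-for-dev-key", priority=20,
         match=lambda req: req.virtual_key == "vk-dev",
         action="route", target="gpt-4o-mini"),
    Rule("block-huge-prompts", priority=10,
         match=lambda req: sum(len(m) for m in req.messages) > 100_000,
         action="block", target=413),
    Rule("block-leaked-keys", priority=5,
         match=lambda req: any(SECRET.search(m) for m in req.messages),
         action="block", target=400),
]

hit = first_match(rules, Request(virtual_key="vk-dev", model="gpt-4o", messages=["hello"]))
print(hit.name if hit else None)  # cheap-for-dev-key
```

Sorting on every call keeps the sketch short; a real proxy would more likely sort once when the rule set changes.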
Model aliases
A model alias is a per-team canonical name that resolves to a real upstream model. Use aliases to:

- Standardize on a single name across providers (`smart` → `gpt-4o`, `cheap` → `gpt-4o-mini`, `claude` → `claude-3-5-sonnet-20241022`).
- Pin a public name to a specific version while you migrate.
- Swap models for a whole team without touching any client code.
/admin/model-aliases/invalidate, so updates take effect on the next request.
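Alias resolution amounts to a per-team lookup table. The sketch below assumes names that are not aliases pass through unchanged; `AliasResolver`, `set_alias`, and the team name `research` are hypothetical and not part of RouteShift's API. It shows the third use case above: repointing an alias changes the model for every client on the team while clients keep sending the same name.

```python
from typing import Dict


class AliasResolver:
    """Hypothetical per-team alias table (not RouteShift's API)."""

    def __init__(self) -> None:
        self._aliases: Dict[str, Dict[str, str]] = {}

    def set_alias(self, team: str, alias: str, model: str) -> None:
        # Creating or overwriting an alias changes the target for the whole team.
        self._aliases.setdefault(team, {})[alias] = model

    def resolve(self, team: str, requested: str) -> str:
        # Names that are not aliases pass through unchanged (an assumption).
        return self._aliases.get(team, {}).get(requested, requested)


resolver = AliasResolver()
resolver.set_alias("research", "smart", "gpt-4o")
resolver.set_alias("research", "cheap", "gpt-4o-mini")
resolver.set_alias("research", "claude", "claude-3-5-sonnet-20241022")

print(resolver.resolve("research", "smart"))   # gpt-4o
# Swap the model behind "smart" for the whole team; clients still send "smart".
resolver.set_alias("research", "smart", "claude-3-5-sonnet-20241022")
print(resolver.resolve("research", "smart"))   # claude-3-5-sonnet-20241022
print(resolver.resolve("research", "gpt-4o"))  # gpt-4o (not an alias, passed through)
```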
Load balancing across N provider keys
If you have multiple credentials for the same provider (multiple OpenAI orgs, multiple Bedrock regions, multiple Azure deployments), RouteShift can balance traffic across them. Configure the strategy at Settings → Providers → … → Load balancing:

| Strategy | When to use |
|---|---|
| Weighted round-robin | Default. Each key gets traffic proportional to its weight. Useful for cost arbitrage between paid and free tiers. |
| Latency-based | Routes to the key with the lowest p50 latency over the last 5 minutes. Good for multi-region deployments. |
| Least-busy | Routes to the key with the lowest in-flight request count. Useful when keys have very different RPM caps. |

A 5xx or 429 response from upstream takes the key out of rotation for a configurable cooldown window. The dashboard shows cooldown state in real time.
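The three strategies can be sketched as selection functions over the keys that are not cooling down. This is an illustration under stated assumptions, not RouteShift's implementation: `ProviderKey`, its fields, and the explicit `now` timestamps are hypothetical, and weighted round-robin is shown using the "smooth" variant from nginx's upstream module, which interleaves picks instead of sending bursts to the heaviest key.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Optional, Tuple

LATENCY_WINDOW_S = 300.0  # "last 5 minutes"


@dataclass
class ProviderKey:
    name: str
    weight: int = 1
    in_flight: int = 0
    cooldown_until: float = 0.0  # out of rotation while now < cooldown_until
    samples: List[Tuple[float, float]] = field(default_factory=list)  # (timestamp_s, latency_ms)

    def available(self, now: float) -> bool:
        return now >= self.cooldown_until

    def p50_ms(self, now: float) -> float:
        recent = [lat for ts, lat in self.samples if now - ts <= LATENCY_WINDOW_S]
        return median(recent) if recent else float("inf")


class SmoothWeightedRoundRobin:
    """Each pick adds every live key's weight to its counter, picks the highest
    counter, then subtracts the total live weight from the winner."""

    def __init__(self, keys: List[ProviderKey]) -> None:
        self.keys = keys
        self.current: Dict[str, int] = {k.name: 0 for k in keys}

    def pick(self, now: float) -> Optional[ProviderKey]:
        live = [k for k in self.keys if k.available(now)]
        if not live:
            return None
        total = sum(k.weight for k in live)
        for k in live:
            self.current[k.name] += k.weight
        best = max(live, key=lambda k: self.current[k.name])
        self.current[best.name] -= total
        return best


def pick_latency_based(keys: List[ProviderKey], now: float) -> Optional[ProviderKey]:
    live = [k for k in keys if k.available(now)]
    return min(live, key=lambda k: k.p50_ms(now)) if live else None


def pick_least_busy(keys: List[ProviderKey], now: float) -> Optional[ProviderKey]:
    live = [k for k in keys if k.available(now)]
    return min(live, key=lambda k: k.in_flight) if live else None


keys = [ProviderKey("openai-org-a", weight=2), ProviderKey("openai-org-b", weight=1)]
wrr = SmoothWeightedRoundRobin(keys)
print([wrr.pick(now=0.0).name for _ in range(3)])
# ['openai-org-a', 'openai-org-b', 'openai-org-a']

keys[0].cooldown_until = 60.0  # e.g. after a 429; 60 s is an assumed window
print(wrr.pick(now=10.0).name)  # openai-org-b

keys[0].samples = [(0.0, 900.0), (100.0, 800.0)]
keys[1].samples = [(100.0, 300.0)]
print(pick_latency_based(keys, now=200.0).name)  # openai-org-b
```

Smooth weighted round-robin is deterministic, so the 2:1 split is exact over every three picks; a weighted random choice would only match those proportions on average.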
Retry budget and cost ceiling
Fallbacks aren’t free: a chain of three attempts against three different providers can cost roughly 3× a single request. Every rule supports two caps:

- Retry budget: the maximum number of attempts across the chain. Defaults to 3.
- Per-request cost ceiling: the maximum USD a single client request can spend across all attempts. When the ceiling is hit, the next attempt is skipped and the last error is returned.
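A minimal sketch of how the two caps might interact while walking a fallback chain. It assumes each attempt's cost is known up front and is charged whether or not the attempt succeeds; `Attempt`, `UpstreamError`, `run_chain`, and the per-attempt prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class UpstreamError(Exception):
    """Stands in for a failed upstream call (5xx, timeout, and so on)."""


@dataclass
class Attempt:
    model: str
    cost_usd: float            # assumed known per attempt for this sketch
    call: Callable[[], str]


def run_chain(attempts: List[Attempt], retry_budget: int = 3,
              cost_ceiling_usd: Optional[float] = None) -> str:
    spent = 0.0
    last_error: Optional[UpstreamError] = None
    for i, attempt in enumerate(attempts):
        if i >= retry_budget:
            break  # retry budget exhausted
        if cost_ceiling_usd is not None and spent >= cost_ceiling_usd:
            break  # ceiling hit: skip the next attempt
        spent += attempt.cost_usd  # charged whether or not the attempt succeeds
        try:
            return attempt.call()
        except UpstreamError as exc:
            last_error = exc
    # Return the last upstream error to the client (or a generic one if nothing ran).
    raise last_error or UpstreamError("no attempt was made")


def failing(message: str) -> Callable[[], str]:
    def call() -> str:
        raise UpstreamError(message)
    return call


chain = [
    Attempt("gpt-4o", 0.02, failing("openai 503")),
    Attempt("claude-3-5-sonnet-20241022", 0.03, failing("anthropic 529")),
    Attempt("gpt-4o-mini", 0.001, lambda: "ok"),
]

print(run_chain(chain))  # ok
try:
    run_chain(chain, cost_ceiling_usd=0.04)
except UpstreamError as exc:
    print(exc)  # anthropic 529
```

With a $0.04 ceiling, the first two attempts spend $0.05, so the third attempt is skipped and the second attempt's error is returned.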
Provider-side rate-limit awareness
When an upstream provider returns a `429` or a `Retry-After` header, RouteShift honors it: the offending credential is parked until the cooldown expires, and any in-flight retry is rerouted to a different key (or the next model in the rule chain). The cooldown is per-credential, so noisy neighbors don’t take down a whole provider.
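The sketch below shows one way to implement per-credential parking. `Retry-After` can carry either delay-seconds or an HTTP-date (RFC 9110), so both forms are parsed, the latter with the standard library's `email.utils.parsedate_to_datetime`. `CredentialCooldowns`, the 30-second default window used when no header is present, and the credential names are hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List, Optional


def retry_after_seconds(value: Optional[str], now: float) -> Optional[float]:
    """Parse Retry-After (delay-seconds or HTTP-date) into seconds after `now` (Unix time)."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, when.timestamp() - now)


class CredentialCooldowns:
    """Hypothetical per-credential parking; not RouteShift's implementation."""

    def __init__(self, default_window_s: float = 30.0) -> None:
        self.default_window_s = default_window_s  # stands in for the configurable window
        self._parked_until: Dict[str, float] = {}

    def record_response(self, credential: str, status: int,
                        retry_after: Optional[str], now: float) -> None:
        delay = retry_after_seconds(retry_after, now)
        if delay is None and (status == 429 or 500 <= status <= 599):
            delay = self.default_window_s
        if delay is not None:
            until = now + delay
            self._parked_until[credential] = max(self._parked_until.get(credential, 0.0), until)

    def is_parked(self, credential: str, now: float) -> bool:
        return now < self._parked_until.get(credential, 0.0)

    def reroute(self, credentials: List[str], failed: str, now: float) -> Optional[str]:
        # Pick a different, un-parked credential for the in-flight retry.
        for cred in credentials:
            if cred != failed and not self.is_parked(cred, now):
                return cred
        return None  # caller falls through to the next model in the rule chain


cooldowns = CredentialCooldowns()
creds = ["bedrock-us-east-1", "bedrock-us-west-2"]
now = 1_000.0
cooldowns.record_response("bedrock-us-east-1", 429, "20", now)
print(cooldowns.is_parked("bedrock-us-east-1", now + 5))       # True
print(cooldowns.reroute(creds, "bedrock-us-east-1", now + 5))  # bedrock-us-west-2
print(cooldowns.is_parked("bedrock-us-east-1", now + 21))      # False
```

Because parking is keyed by credential rather than by provider, a `429` on one Bedrock region leaves the other region in rotation.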