Every request that hits RouteShift gets matched against a routing rule chain before it leaves the proxy. Rules can rewrite the model, swap providers, attach metadata, or short-circuit with a static response — all without your client knowing.

Rules

Open Routing → Rules to create a rule. A rule has:
  • Match — match on virtual key, model name, message content regex, message size, system-prompt size, or metadata.
  • Action — route to a specific model, block with a status code, or tag for downstream analytics.
  • Priority — lower numbers win. Rules are evaluated top-to-bottom in priority order.
  • Conditions — optional time-of-day, environment, or per-key gates.
Rules are versioned; every save records a diff in the audit log so you can roll back a bad change.
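The evaluation order can be illustrated with a short Python sketch. It assumes that "lower numbers win" means the first matching rule, taken in ascending priority order, decides the action, and it models only two of the match types (model name and content regex). The `Rule` class, its fields, the action strings, and `evaluate` are hypothetical names for illustration, not RouteShift's API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    # Hypothetical rule shape; only two of the documented match types are modeled.
    name: str
    priority: int                        # lower numbers are evaluated first
    action: str                          # e.g. "route:gpt-4o-mini", "block:403", "tag:batch"
    model: Optional[str] = None          # match on requested model name, if set
    content_regex: Optional[str] = None  # match on message content, if set

    def matches(self, model: str, content: str) -> bool:
        if self.model is not None and self.model != model:
            return False
        if self.content_regex is not None and re.search(self.content_regex, content) is None:
            return False
        return True  # every configured condition matched (no conditions = catch-all)


def evaluate(rules: list[Rule], model: str, content: str) -> Optional[Rule]:
    """Walk rules in ascending priority and return the first match, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(model, content):
            return rule
    return None  # nothing matched: the request would pass through unchanged


rules = [
    Rule("catch-all", priority=100, action="tag:default"),
    Rule("block-secrets", priority=10, action="block:403", content_regex=r"API_KEY="),
    Rule("cheap-drafts", priority=20, action="route:gpt-4o-mini", model="gpt-4o"),
]

print(evaluate(rules, "gpt-4o", "draft an email").action)  # route:gpt-4o-mini
print(evaluate(rules, "gpt-4o", "my API_KEY=abc").action)  # block:403
print(evaluate(rules, "claude", "hello").action)           # tag:default
```

Sorting by priority rather than by list position is what lets `block-secrets` win even though it is declared second.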

Model aliases

A model alias is a per-team canonical name that resolves to a real upstream model. Use aliases to:
  • Standardize on a single name across providers (smart → gpt-4o, cheap → gpt-4o-mini, claude → claude-3-5-sonnet-20241022).
  • Pin a public name to a specific version while you migrate.
  • Swap models for a whole team without touching any client code.
Aliases live at Routing → Aliases. When an alias's target changes, the proxy invalidates its cache via /admin/model-aliases/invalidate, so the update takes effect on the next request.
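A rough model of alias resolution with cache invalidation, in Python. The `AliasResolver` class is hypothetical; its `invalidate()` method merely stands in for the proxy's call to /admin/model-aliases/invalidate, and passing non-alias names through unchanged is an assumption about behavior this page does not specify.

```python
class AliasResolver:
    """Hypothetical per-team alias table with a read-through cache."""

    def __init__(self, aliases: dict[str, str]) -> None:
        self._aliases = dict(aliases)     # source of truth, e.g. edited under Routing → Aliases
        self._cache: dict[str, str] = {}  # what the request path reads from

    def resolve(self, name: str) -> str:
        # Assumption: names that are not aliases pass through as real model IDs.
        if name not in self._cache:
            self._cache[name] = self._aliases.get(name, name)
        return self._cache[name]

    def update(self, alias: str, target: str) -> None:
        self._aliases[alias] = target
        self.invalidate()  # stands in for hitting /admin/model-aliases/invalidate

    def invalidate(self) -> None:
        self._cache.clear()


resolver = AliasResolver({"smart": "gpt-4o", "cheap": "gpt-4o-mini"})
print(resolver.resolve("smart"))   # gpt-4o
resolver.update("smart", "claude-3-5-sonnet-20241022")
print(resolver.resolve("smart"))   # claude-3-5-sonnet-20241022
print(resolver.resolve("gpt-4o"))  # gpt-4o
```

Without the `invalidate()` call, the second `resolve("smart")` would keep returning the cached `gpt-4o`, which is the kind of staleness an invalidation step avoids.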

Load balancing across N provider keys

If you have multiple credentials for the same provider — multiple OpenAI orgs, multiple Bedrock regions, multiple Azure deployments — RouteShift can balance traffic across them. Configure the strategy at Settings → Providers → … → Load balancing:
| Strategy | When to use |
| --- | --- |
| Weighted round-robin | Default. Each key gets traffic proportional to its weight. Useful for cost arbitrage between paid and free tiers. |
| Latency-based | Routes to the key with the lowest p50 latency over the last 5 minutes. Good for multi-region deployments. |
| Least-busy | Routes to the key with the lowest in-flight request count. Useful when keys have very different RPM caps. |
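Two of these strategies are easy to sketch in Python. For weighted round-robin, this example swaps in the smooth weighted round-robin variant popularized by nginx, which interleaves picks instead of sending bursts to the heaviest key; RouteShift's actual algorithm is not documented here. The `Key` type and both functions are hypothetical. Latency-based routing would have the same shape as `least_busy`, minimizing a rolling p50 instead of an in-flight count.

```python
from dataclasses import dataclass


@dataclass
class Key:
    # Hypothetical view of one provider credential.
    name: str
    weight: int
    current: int = 0    # running score for smooth weighted round-robin
    in_flight: int = 0  # open upstream requests, used by least-busy


def weighted_round_robin(keys: list[Key]) -> Key:
    """Smooth weighted round-robin: over sum(weights) picks, each key is chosen
    exactly `weight` times, and picks are interleaved rather than bunched."""
    total = sum(k.weight for k in keys)
    for k in keys:
        k.current += k.weight
    best = max(keys, key=lambda k: k.current)  # ties go to the earlier key
    best.current -= total
    return best


def least_busy(keys: list[Key]) -> Key:
    """Pick the key with the fewest in-flight requests."""
    return min(keys, key=lambda k: k.in_flight)


keys = [Key("paid", weight=3), Key("free", weight=1)]
picks = [weighted_round_robin(keys).name for _ in range(4)]
print(picks)  # ['paid', 'paid', 'free', 'paid']

keys[0].in_flight = 5
print(least_busy(keys).name)  # free
```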
Keys also have automatic cooldowns: a sustained burst of 5xx or 429 from upstream takes the key out of rotation for a configurable window. The dashboard shows cooldown state in real time.
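One possible shape for the cooldown logic, assuming "sustained burst" means a count of 5xx/429 responses inside a sliding time window. The `CooldownTracker` class and its threshold, window, and cooldown defaults are hypothetical placeholders for RouteShift's configurable settings.

```python
import time
from collections import deque
from typing import Callable


class CooldownTracker:
    """Hypothetical per-key cooldown: `threshold` 5xx/429 responses within
    `window_s` seconds take a key out of rotation for `cooldown_s` seconds."""

    def __init__(self, threshold: int = 5, window_s: float = 30.0,
                 cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock                     # injectable for testing
        self._failures: dict[str, deque] = {}  # failure timestamps per key
        self._until: dict[str, float] = {}     # cooldown expiry per key

    def record(self, key: str, status: int) -> None:
        if status != 429 and not 500 <= status <= 599:
            return  # other statuses do not count toward a cooldown
        now = self.clock()
        failures = self._failures.setdefault(key, deque())
        failures.append(now)
        while failures and now - failures[0] > self.window_s:
            failures.popleft()  # forget failures older than the window
        if len(failures) >= self.threshold:
            self._until[key] = now + self.cooldown_s
            failures.clear()

    def available(self, key: str) -> bool:
        until = self._until.get(key)
        return until is None or self.clock() >= until


t = [0.0]  # fake clock so the example is deterministic
tracker = CooldownTracker(threshold=3, window_s=10.0, cooldown_s=60.0, clock=lambda: t[0])
for _ in range(3):
    tracker.record("org-a", 503)
print(tracker.available("org-a"))  # False
t[0] = 61.0
print(tracker.available("org-a"))  # True
```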

Retry budget and cost ceiling

Fallbacks aren’t free: a chain of three attempts against three different providers can cost up to 3× the original request. Every rule supports two caps:
  • Retry budget — max attempts across the chain. Defaults to 3.
  • Per-request cost ceiling — max USD a single client request can spend across all attempts. When the ceiling is hit, the next attempt is skipped and the last error is returned.
Both caps are enforced server-side, so a misconfigured rule chain can’t produce a runaway bill.
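The two caps interact in a fallback loop roughly as follows. This is a sketch under assumptions: each attempt reports its own cost after it runs, the ceiling is checked before starting the next attempt, and `Attempt`, `run_chain`, and the 0.10 USD default ceiling are hypothetical. Only the default budget of 3 comes from this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    ok: bool
    cost_usd: float  # what this single upstream call cost
    result: str      # response body on success, error message on failure


def run_chain(targets: list[Callable[[], Attempt]],
              retry_budget: int = 3,
              cost_ceiling_usd: float = 0.10) -> tuple[str, float]:
    """Try targets in order until one succeeds, the retry budget is used up,
    or the money already spent reaches the ceiling. Returns (result, spent)."""
    spent = 0.0
    last_error = "no attempt made"
    for attempt_no, call in enumerate(targets, start=1):
        if attempt_no > retry_budget:
            break  # the budget counts attempts across the whole chain
        if spent >= cost_ceiling_usd:
            break  # ceiling hit: skip the next attempt, return the last error
        outcome = call()
        spent += outcome.cost_usd
        if outcome.ok:
            return outcome.result, spent
        last_error = outcome.result
    return last_error, spent


def flaky() -> Attempt:
    return Attempt(ok=False, cost_usd=0.06, result="upstream 503")


def healthy() -> Attempt:
    return Attempt(ok=True, cost_usd=0.01, result="hello")


result, spent = run_chain([flaky, healthy])
print(result, round(spent, 2))  # hello 0.07
result, spent = run_chain([flaky, flaky, healthy])
print(result, round(spent, 2))  # upstream 503 0.12
```

Because an attempt's cost is only known once it finishes, a check like this can overshoot the ceiling by at most one attempt; a stricter design could estimate cost up front from the request size.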

Provider-side rate-limit awareness

When an upstream provider returns a 429 or a Retry-After header, RouteShift honors it: the offending credential is parked until the cooldown expires, and any in-flight retry is rerouted to a different key (or the next model in the rule chain). The cooldown is per-credential, so noisy neighbors don’t take down a whole provider.
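A sketch of honoring Retry-After when parking a credential. Per HTTP semantics (RFC 9110), Retry-After carries either a number of seconds or an HTTP-date. The 30-second fallback for a 429 without the header, the module-level `parked_until` table, and the helper names are assumptions, not RouteShift internals.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

DEFAULT_PARK_S = 30.0  # hypothetical fallback for a 429 without Retry-After
parked_until: dict[str, datetime] = {}  # credential name -> time it may be used again


def retry_after_seconds(header: Optional[str], now: datetime) -> float:
    """Seconds to wait, from a Retry-After value (delay-seconds or HTTP-date)."""
    if header is None:
        return DEFAULT_PARK_S
    header = header.strip()
    if header.isdecimal():
        return float(header)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(header)  # HTTP-date form
    except (TypeError, ValueError):
        return DEFAULT_PARK_S  # unparseable value: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


def park(key: str, header: Optional[str], now: datetime) -> None:
    parked_until[key] = now + timedelta(seconds=retry_after_seconds(header, now))


def next_key(keys: list[str], now: datetime) -> Optional[str]:
    """First credential that is not parked; None means every key is cooling
    down and the request should move to the next model in the rule chain."""
    for key in keys:
        if parked_until.get(key, now) <= now:
            return key
    return None


now = datetime(2015, 10, 21, 7, 27, tzinfo=timezone.utc)
print(retry_after_seconds("120", now))                            # 120.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now))  # 60.0
park("org-a", "120", now)
print(next_key(["org-a", "org-b"], now))                          # org-b
```

Keying `parked_until` by credential rather than by provider mirrors the per-credential cooldown described above.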