
RouteShift logs every request into request_logs: input tokens, output tokens, cost, latency, the resolved upstream model, and an inferred activity category. From there, downstream jobs stitch requests into sessions, record turn signals, and roll up per-session metrics for the dashboard.

What gets logged

Every proxied request writes one row containing:
  • Virtual key ID and provider key ID.
  • Requested model + resolved upstream model (after aliases and routing).
  • Input tokens, output tokens, total cost (microcents).
  • Wall-clock latency, time-to-first-token, and upstream HTTP status.
  • Inferred activity category.
  • Stitched session ID and turn signals.
  • Cache-hit flag (semantic cache hits skip token counting and write empty turn signals: [] for edited_paths, false for had_bash).
Logs are queryable from the dashboard’s Activity view and exposed via the admin API for ETL into your warehouse.
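
The full column list isn't documented on this page; as a mental model, a logged row can be pictured with this hypothetical TypeScript shape (only session_id, edited_paths, and had_bash appear verbatim elsewhere in these docs; the other names are illustrative stand-ins for the fields above):

```ts
// Hypothetical shape of one request_logs row, for orientation only.
interface RequestLogRow {
  virtualKeyId: string;      // virtual key that made the request
  providerKeyId: string;     // provider key the proxy used upstream
  requestedModel: string;    // model named in the request
  resolvedModel: string;     // upstream model after aliases and routing
  inputTokens: number;
  outputTokens: number;
  costMicrocents: number;    // total cost, in microcents
  latencyMs: number;         // wall-clock latency
  ttftMs: number | null;     // time-to-first-token, if streaming
  upstreamStatus: number;    // upstream HTTP status
  activityCategory: 'chat' | 'agentic' | 'code' | 'research' | 'embedding' | 'other';
  sessionId: string;         // stitched session ID (request_logs.session_id)
  editedPaths: string[];     // turn signal: [] on a semantic cache hit
  hadBash: boolean;          // turn signal: false on a semantic cache hit
  cacheHit: boolean;         // semantic cache hit flag
}
```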

Activity categorization

RouteShift classifies each request into one of these categories based on the prompt shape and model behavior:
  • chat — interactive single-turn or short multi-turn dialog.
  • agentic — long tool-use chains, multi-turn with structured output.
  • code — code generation, refactoring, or completion.
  • research — long-context summarization or analysis.
  • embedding — embedding model calls.
  • other — anything that doesn’t match the above.
Categorization happens server-side in the proxy, on the request path. The classifier is deterministic (the same input always produces the same category), so categories are safe to use as analytics dimensions.
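
The category set maps naturally onto a closed union type. The real heuristics aren't published, so the classifier below is only a plausible deterministic sketch over hypothetical request features, not RouteShift's actual logic:

```ts
type ActivityCategory = 'chat' | 'agentic' | 'code' | 'research' | 'embedding' | 'other';

// Illustrative only: a rule-ordered, deterministic classifier in the spirit
// of the description above. All input fields here are invented for the sketch.
function classify(req: {
  endpoint: string;        // e.g. '/v1/embeddings' or '/v1/chat/completions'
  toolCallCount: number;   // hypothetical: tool calls observed in the exchange
  looksLikeCode: boolean;  // hypothetical: prompt-shape heuristic
  contextTokens: number;   // prompt size
}): ActivityCategory {
  if (req.endpoint.includes('/embeddings')) return 'embedding';
  if (req.toolCallCount > 2) return 'agentic';      // long tool-use chains
  if (req.looksLikeCode) return 'code';
  if (req.contextTokens > 32_000) return 'research'; // long-context analysis
  if (req.endpoint.includes('/chat')) return 'chat';
  return 'other';
}
```

Because the function is pure, replaying the same request always yields the same category, which is what makes categories stable as analytics dimensions.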

Session stitching

A session is a sequence of requests from the same virtual key that share contextual signals: continuation of a conversation, the same session_id metadata, contiguous timestamps, or similar prompts. RouteShift derives a session ID via deriveSessionId and writes it to request_logs.session_id. Every five minutes, the aggregator (session-aggregator.ts) rolls those rows into session_metrics; the job takes a pg_try_advisory_xact_lock first, so concurrent runs across service replicas can't double-aggregate (see the sketch after this list). session_metrics powers:
  • Overview KPIs — total sessions, mean tokens per session, mean cost per session.
  • Analytics by-model table — per-model session counts, one-shot rate, retry rate.
  • Activity drill-in — click any row in the activity feed to see all requests in that session.
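
A minimal sketch of the rollup, assuming node-postgres and an invented session_metrics schema (only the advisory-lock call and the two table names come from the description above):

```ts
import { Pool } from 'pg';

const pool = new Pool();

// Sketch of a five-minute session rollup. The transaction-scoped advisory
// lock is the documented pattern; every column name in the SQL is assumed.
export async function aggregateSessions(): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // If another replica already holds the lock, skip this run rather than
    // double-aggregating. 42 is an arbitrary lock key for the sketch.
    const { rows } = await client.query('SELECT pg_try_advisory_xact_lock(42) AS locked');
    if (!rows[0].locked) {
      await client.query('ROLLBACK');
      return;
    }
    // Recompute metrics for any session touched in the last five minutes.
    await client.query(`
      INSERT INTO session_metrics (session_id, request_count, total_tokens, total_cost_microcents)
      SELECT session_id, count(*),
             sum(input_tokens + output_tokens),
             sum(cost_microcents)
      FROM request_logs
      WHERE session_id IN (
        SELECT DISTINCT session_id FROM request_logs
        WHERE logged_at >= now() - interval '5 minutes'
      )
      GROUP BY session_id
      ON CONFLICT (session_id) DO UPDATE SET
        request_count         = EXCLUDED.request_count,
        total_tokens          = EXCLUDED.total_tokens,
        total_cost_microcents = EXCLUDED.total_cost_microcents
    `);
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```

Because pg_try_advisory_xact_lock is non-blocking and released automatically at transaction end, a replica that loses the race simply exits and tries again on the next schedule tick.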

Turn signals

Each request can be tagged with structured turn signals that describe what happened:
  • edited_paths text[] — file paths the model edited (when integrated with a coding agent).
  • had_bash boolean — whether the turn ran a shell command.
These are surfaced in the session view and used by the Optimize engine to detect duplicate or wasteful turns.
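
For illustration, the signals recorded for one coding-agent turn might look like the object below; how an integration attaches them is out of scope here:

```ts
// Hypothetical example of one turn's signals as stored on the log row.
// On a semantic cache hit these default to [] and false.
const turnSignals = {
  edited_paths: ['src/router.ts', 'src/router.test.ts'], // files the model edited
  had_bash: true, // the turn ran a shell command (e.g. the test suite)
};
```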

One-shot rate and retry rate

Two derived metrics capture how efficient a session was:
  • One-shot rate — fraction of sessions where the user got a usable response on the first turn. A high one-shot rate means the model is sized correctly for the task.
  • Retry rate — fraction of turns where the user re-prompted (a manual retry) within a short window. A high retry rate is a leading indicator of model mismatch.
Both metrics are exposed at GET /api/usage/one-shot and broken down by model on the analytics page.
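
To pull these numbers into a script, something like the client below would work, assuming bearer-token auth and a per-model response shape (both are guesses, not a documented contract):

```ts
// Hypothetical client for GET /api/usage/one-shot. The auth header and
// response shape are assumptions; check the admin API reference.
interface OneShotRow {
  model: string;
  oneShotRate: number; // fraction of sessions answered on the first turn
  retryRate: number;   // fraction of turns re-prompted within a short window
}

async function fetchOneShot(baseUrl: string, adminToken: string): Promise<OneShotRow[]> {
  const res = await fetch(`${baseUrl}/api/usage/one-shot`, {
    headers: { Authorization: `Bearer ${adminToken}` },
  });
  if (!res.ok) throw new Error(`one-shot endpoint returned ${res.status}`);
  return (await res.json()) as OneShotRow[];
}
```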

Model comparison

/models/compare plots two models side by side on the same traffic (same activity categories, same prompt sizes) and shows the delta in cost, latency, and one-shot rate. Use it to A/B-test a candidate downgrade before flipping a routing rule.