Get started with Codex

Axiom Codex ships eight normalized datasets as monthly Parquet snapshots. Every record follows the APRS standard, carries pre-computed AI labels, and joins to any other Codex dataset via shared keys. Pick a tier below, grab a dataset, and run your first query.

Choose your tier

Feature	Research	Commercial	Enterprise
Access	Free on Hugging Face	Signed R2 download URL	Signed R2 download URL
Records	100K-record stratified sample per dataset	Full dataset	Full dataset
Formats	Parquet	Parquet, CSV, JSON Lines, GeoParquet	All formats + Markdown-KV
Snapshots	Latest only	Monthly immutable snapshots	Monthly immutable snapshots
Entity graph	Not included	Not included	Full graph export
License	CC-BY-4.0 (attribution required)	Commercial license, per dataset	Commercial license, all datasets
Price	Free	$299/dataset/month	Contact sales

Start with the Research tier to explore schema, field coverage, and join keys before committing to a commercial license.

Download a dataset

Research tier — Hugging Face

Browse the Axiom AI organization on Hugging Face and download any dataset directly. Each dataset includes a README with schema documentation and a 100K-record stratified sample in Parquet format.

Commercial or Enterprise tier

Purchase a license at axiomcodex.io. Your license key and a signed download URL are emailed to the address you used at checkout, normally within a minute. The URL points to a monthly Parquet snapshot on Cloudflare R2.

If the email doesn’t arrive, your license key is also stored on the subscription itself in Stripe. Use the Manage subscription link in the original receipt to open the Stripe customer portal and surface the key, or contact support@axiomancer.io and we’ll resend.
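Once you have the signed URL, the download can be scripted. The following is a minimal sketch using only the Python standard library. The helper names `download_snapshot` and `looks_like_parquet` are hypothetical and not part of any Codex tooling, and the sketch assumes the signed URL accepts a plain GET request with no extra headers.

```python
import shutil
import urllib.request
from pathlib import Path


def download_snapshot(signed_url: str, dest: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream the file behind signed_url to dest and return dest.

    The data is written to a temporary ".part" file first, so an
    interrupted download never leaves a truncated file at dest.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(signed_url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    tmp.replace(dest)
    return dest


def looks_like_parquet(path: Path) -> bool:
    """Cheap sanity check: unencrypted Parquet files start and end with b"PAR1"."""
    if path.stat().st_size < 8:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # 4 bytes before the end of the file
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"


# Usage (paste the signed URL from your license email):
#   path = download_snapshot(signed_url, Path("civic-intelligence-2026-04.parquet"))
#   assert looks_like_parquet(path)
```

Signed URLs typically expire, so if the request fails with an authorization error, requesting a fresh URL is a reasonable first step.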

Run your first query

Load any Codex Parquet file into DuckDB, Pandas, Spark, or your preferred tool. Every dataset uses the same APRS envelope, so once you learn one, you know them all.

DuckDB

SELECT record_id, occurred_at, jurisdiction_slug, event_type
FROM read_parquet('civic-intelligence-2026-04.parquet')
WHERE jurisdiction_slug = 'philadelphia-pa'
  AND occurred_at >= '2026-01-01'
LIMIT 20;

Python (Pandas)

import pandas as pd

df = pd.read_parquet("civic-intelligence-2026-04.parquet")
phila = df[df["jurisdiction_slug"] == "philadelphia-pa"]
print(phila[["record_id", "occurred_at", "event_type"]].head(20))

Join two datasets

Every Codex dataset shares a common set of join keys. Join Civic Intelligence to Urban Signal Grid via h3_index without any wrangling:

SELECT
  cells.h3_index,
  cells.composite_score,
  civic.event_type,
  civic.occurred_at
FROM read_parquet('urban-signal-grid-2026-04.parquet') cells
JOIN read_parquet('civic-intelligence-2026-04.parquet') civic
  USING (h3_index)
WHERE civic.event_type = 'zoning_vote'
  AND cells.composite_score > 0.7;
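To see what this join returns before downloading a snapshot, the sketch below runs the same query shape against toy rows using Python's built-in `sqlite3` module. The table contents, the `public_hearing` event type, and the string-typed `h3_index` values are invented for illustration; only the column names come from the query above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE cells (h3_index TEXT, composite_score REAL);
    CREATE TABLE civic (h3_index TEXT, event_type TEXT, occurred_at TEXT);
    """
)

# Toy rows: two grid cells and four civic events (values are made up).
con.executemany(
    "INSERT INTO cells VALUES (?, ?)",
    [("cell_a", 0.82), ("cell_b", 0.35)],
)
con.executemany(
    "INSERT INTO civic VALUES (?, ?, ?)",
    [
        ("cell_a", "zoning_vote", "2026-02-03"),
        ("cell_a", "zoning_vote", "2026-03-10"),
        ("cell_a", "public_hearing", "2026-03-11"),
        ("cell_b", "zoning_vote", "2026-01-20"),
    ],
)

# Same join and filters as the DuckDB query, with an ORDER BY for stable output.
rows = con.execute(
    """
    SELECT cells.h3_index, cells.composite_score, civic.event_type, civic.occurred_at
    FROM cells
    JOIN civic USING (h3_index)
    WHERE civic.event_type = 'zoning_vote'
      AND cells.composite_score > 0.7
    ORDER BY civic.occurred_at
    """
).fetchall()

for row in rows:
    print(row)
```

Note that the join fans out: one grid cell produces one output row per matching civic record, so `cell_a` appears twice here. Aggregate by `h3_index` if you need one row per cell.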

Use the LLM-ready surface

Every dataset ships an llm_text column in Markdown-KV format, optimized for RAG pipelines and LLM reasoning. Feed it directly into your retrieval system or prompt:

- chunk_id: urn:aprs:chunk:9f3a1b2c...
- record_id: urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b
- jurisdiction: Philadelphia, PA
- occurred_at: 2024-03-15
- event_type: zoning_vote
- summary: Council voted 11-6 to approve rezoning of the 2200 block...
- entities: {name: Kenyatta Johnson, role: councilmember, sentiment: supportive}
- litigation_risk_score: 0.42

Pre-computed chunks are available in a companion _chunks Parquet file for each dataset, ready for vector indexing.
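If you need individual fields back out of an llm_text value, for example to attach record_id as metadata in a vector index, the Markdown-KV lines shown above can be split with a few lines of Python. This is a rough sketch that assumes every field sits on its own line in the form `- key: value`, as in the example; `parse_markdown_kv` is a hypothetical helper, and all values are left as strings.

```python
def parse_markdown_kv(text: str) -> dict[str, str]:
    """Parse "- key: value" lines into a dict, skipping anything else."""
    record: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue
        # Split on the first ": " only, so values such as URNs or
        # inline {name: ...} objects keep their own colons.
        key, sep, value = line[2:].partition(": ")
        if sep:
            record[key.strip()] = value.strip()
    return record


SAMPLE = """\
- chunk_id: urn:aprs:chunk:9f3a1b2c...
- record_id: urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b
- event_type: zoning_vote
- entities: {name: Kenyatta Johnson, role: councilmember, sentiment: supportive}
- litigation_risk_score: 0.42
"""

record = parse_markdown_kv(SAMPLE)
print(record["record_id"])
print(float(record["litigation_risk_score"]))
```

Nested values such as `entities` are returned unparsed; how to decode them depends on the exact serialization, which this sketch does not assume.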

Next steps

Data catalog

Browse all eight datasets with record counts, formats, and tags.

Normalization standard

The APRS contract every record satisfies: field definitions, versioning, and conformance.

Join keys

The registry of shared keys that make cross-dataset joins work.

Bitemporal fields

Per-dataset reference for every temporal field, so you always know which clock a timestamp is on.
