Axiom Codex ships eight normalized datasets as monthly Parquet snapshots. Every record follows the APRS standard, carries pre-computed AI labels, and joins to any other Codex dataset via shared keys. Pick a tier below, grab a dataset, and run your first query.

Choose your tier

| | Research | Commercial | Enterprise |
| --- | --- | --- | --- |
| Access | Free on Hugging Face | Signed R2 download URL | Signed R2 download URL |
| Records | 100K-record stratified sample per dataset | Full dataset | Full dataset |
| Formats | Parquet | Parquet, CSV, JSON Lines, GeoParquet | All formats + Markdown-KV |
| Snapshots | Latest only | Monthly immutable snapshots | Monthly immutable snapshots |
| Entity graph | | | Full graph export (details) |
| License | CC-BY-4.0 (attribution required) | Commercial license, per dataset | Commercial license, all datasets |
| Price | Free | $299/dataset/month | Contact sales |

Start with the Research tier to explore schema, field coverage, and join keys before committing to a commercial license.

Download a dataset

1. Research tier: Hugging Face

Browse the Axiom AI organization on Hugging Face and download any dataset directly. Each dataset includes a README with schema documentation and a 100K-record stratified sample in Parquet format.

2. Commercial or Enterprise tier

Purchase a license at axiomcodex.io. After checkout, your license key and a signed download URL are emailed to the address you used at checkout, normally within a minute. The URL points to a monthly Parquet snapshot on Cloudflare R2.
If the email doesn’t arrive, your license key is also stored on the subscription in Stripe. Use the Manage subscription link in the original receipt to open the Stripe customer portal and view the key, or contact support@axiomancer.io and we’ll resend it.
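
Once you have the signed URL, you can fetch the snapshot from a script. The following is a minimal sketch using only the Python standard library; the function name `download_snapshot` is hypothetical, and it assumes the signed URL can be fetched with a plain GET and no extra headers, which this page does not confirm.

```python
import shutil
import urllib.request
from pathlib import Path


def download_snapshot(signed_url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the file at signed_url to dest and return the final path.

    Assumption: the signed URL needs no extra headers or authentication
    beyond what is already embedded in the URL itself.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    # Write to a temporary ".part" file first so an interrupted transfer
    # never leaves a truncated file at the final path.
    with urllib.request.urlopen(signed_url, timeout=timeout) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

A call might look like `download_snapshot(url_from_email, Path("civic-intelligence-2026-04.parquet"))`, where `url_from_email` is the signed URL you received. Streaming with `shutil.copyfileobj` keeps memory use flat regardless of snapshot size.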

Run your first query

Load any Codex Parquet file into DuckDB, Pandas, Spark, or your preferred tool. Every dataset uses the same APRS envelope, so once you learn one, you know them all.

DuckDB

SELECT record_id, occurred_at, jurisdiction_slug, event_type
FROM read_parquet('civic-intelligence-2026-04.parquet')
WHERE jurisdiction_slug = 'philadelphia-pa'
  AND occurred_at >= '2026-01-01'
LIMIT 20;

Python (Pandas)

import pandas as pd

df = pd.read_parquet("civic-intelligence-2026-04.parquet")
phila = df[df["jurisdiction_slug"] == "philadelphia-pa"]
print(phila[["record_id", "occurred_at", "event_type"]].head(20))

Join two datasets

Every Codex dataset shares a common set of join keys. For example, Civic Intelligence joins to Urban Signal Grid directly on h3_index, with no key remapping:
SELECT
  cells.h3_index,
  cells.composite_score,
  civic.event_type,
  civic.occurred_at
FROM read_parquet('urban-signal-grid-2026-04.parquet') cells
JOIN read_parquet('civic-intelligence-2026-04.parquet') civic
  USING (h3_index)
WHERE civic.event_type = 'zoning_vote'
  AND cells.composite_score > 0.7;
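
The same join can be sketched in Pandas for readers who prefer to stay in Python. This is an illustrative equivalent of the DuckDB query above, not an official client: the function name `join_zoning_votes` and its `min_score` parameter are hypothetical, and the column names are taken from the SQL example.

```python
import pandas as pd


def join_zoning_votes(cells: pd.DataFrame, civic: pd.DataFrame, min_score: float = 0.7) -> pd.DataFrame:
    """Pandas counterpart of the DuckDB join on h3_index shown above."""
    # Filter each side before joining so the merge handles fewer rows.
    votes = civic.loc[
        civic["event_type"] == "zoning_vote",
        ["h3_index", "event_type", "occurred_at"],
    ]
    strong = cells.loc[
        cells["composite_score"] > min_score,
        ["h3_index", "composite_score"],
    ]
    # Inner join on the shared key, matching JOIN ... USING (h3_index).
    return strong.merge(votes, on="h3_index", how="inner")
```

To run it against real snapshots, load the two frames with `pd.read_parquet("urban-signal-grid-2026-04.parquet")` and `pd.read_parquet("civic-intelligence-2026-04.parquet")` and pass them as `cells` and `civic`.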

Use the LLM-ready surface

Every dataset ships an llm_text column in Markdown-KV format, optimized for RAG pipelines and LLM reasoning. Feed it directly into your retrieval system or prompt:
- chunk_id: urn:aprs:chunk:9f3a1b2c...
- record_id: urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b
- jurisdiction: Philadelphia, PA
- occurred_at: 2024-03-15
- event_type: zoning_vote
- summary: Council voted 11-6 to approve rezoning of the 2200 block...
- entities: {name: Kenyatta Johnson, role: councilmember, sentiment: supportive}
- litigation_risk_score: 0.42
Pre-computed chunks are available in a companion _chunks Parquet file for each dataset, ready for vector indexing.
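
If you need individual fields back out of an llm_text value (for example, to attach metadata to a vector index entry), a small parser is enough for the sample shown above. This is a sketch under stated assumptions: it assumes one `- key: value` pair per line, as in the sample, and keeps every value as a string. The function name `parse_markdown_kv` is hypothetical, and the full Markdown-KV format may allow constructs this does not handle.

```python
def parse_markdown_kv(llm_text: str) -> dict[str, str]:
    """Parse '- key: value' lines into a dict of strings.

    Assumption: each field sits on its own line, as in the sample above.
    Lines that do not match the pattern are skipped.
    """
    fields: dict[str, str] = {}
    for line in llm_text.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue
        # Split on the first ": " only, so values that contain colons
        # (URNs, nested entity maps) stay intact.
        key, sep, value = line[2:].partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

Numeric fields such as litigation_risk_score come back as strings, so convert them explicitly, for example `float(fields["litigation_risk_score"])`.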

Next steps

Data catalog

Browse all eight datasets with record counts, formats, and tags.

Normalization standard

The APRS contract every record satisfies — field definitions, versioning, and conformance.

Join keys

The registry of shared keys that make cross-dataset joins work.

Bitemporal fields

Per-dataset reference for every temporal field, so you always know which clock a timestamp is on.