The Axiom Portable Record Standard (APRS) defines the minimum contract every record in every Codex dataset must satisfy. It exists so you can join any two Codex datasets without wrangling code, pin a dataset to a schema version and trust forward compatibility, and feed records directly into an LLM, RAG index, or training pipeline.
Principles
Every Codex dataset conforms to seven principles:
- Schema-locked. Every dataset publishes a versioned schema. Breaking changes bump the major version. You pin to a version and trust forward compatibility within that major version.
- Source-attributed. Every record carries provenance — what system emitted it, when, which pipeline version processed it, and a confidence score.
- AI-optimized labels. Categories, entity types, and sentiment labels are pre-computed at normalization time, not at query time.
- Spatially consistent. All geospatial data normalizes to H3 resolution 8 as the primary spatial key. Lat/lng is retained for display, not for joining.
- Bitemporally honest. Every record separates valid time (when the real-world event happened) from system time (when it was ingested or modified).
- Versioned snapshots. Monthly immutable snapshot releases with full changelogs. Pin to a month for reproducible research.
- Joinable by construction. A fixed set of shared keys lets any dataset join to any other without custom wrangling.
Record envelope
Every record carries four groups of mandatory fields, applied at normalization time.
Identity
| Field | Type | Description |
|---|---|---|
| record_id | URN | Stable record URN: urn:aprs:record:{namespace}:{source_system}:{local_id} |
| chunk_id | URN | Deterministic chunk URN derived from record_id + optional section label (SHA-256-backed) |
| source_uri | URL or URN | Points to the original source record for citation and re-fetch |
| source_system | string | Originating system name (e.g. granicus, aisstream, equasis) |
record_id and chunk_id are deterministic — two calls with the same arguments always produce the same URN. This enables incremental sync, deduplication, and stable vector index keys.
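The determinism described above can be sketched in a few lines. This is a minimal illustration and not the Codex implementation: the helper names `make_record_id` and `make_chunk_id` are hypothetical, and the exact bytes fed to SHA-256 (here, record_id and the section label joined by a newline) are an assumption, since the standard only says the chunk URN is SHA-256-backed.

```python
import hashlib
from typing import Optional


def make_record_id(namespace: str, source_system: str, local_id: str) -> str:
    """Fill in the documented template urn:aprs:record:{namespace}:{source_system}:{local_id}."""
    return f"urn:aprs:record:{namespace}:{source_system}:{local_id}"


def make_chunk_id(record_id: str, section: Optional[str] = None) -> str:
    """Derive a chunk URN from record_id plus an optional section label.

    Assumption: the hash input is record_id, or record_id + "\n" + section.
    The real derivation may differ; only the determinism is the point here.
    """
    payload = record_id if section is None else f"{record_id}\n{section}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"urn:aprs:chunk:{digest}"


# Assumes the namespace of the sample record is "civic:us".
rid = make_record_id("civic:us", "granicus", "phila-2024-03-15-item7b")
whole = make_chunk_id(rid)
section = make_chunk_id(rid, "summary")
```

Because both functions are pure, re-running them during an incremental sync yields the same keys, so existing rows and vector index entries can be matched instead of duplicated.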
Schema and lineage
| Field | Type | Description |
|---|---|---|
| schema_version | semver | APRS profile version, e.g. aprs.civic/1.0.0 |
| normalization_version | semver | Version of the pipeline that produced this row |
| acl_tier | enum | research, commercial, or internal; gates which exports may include this row |
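As a rough sketch of how a consumer might check a row against a pinned schema version, the snippet below splits a value such as aprs.civic/1.0.0 into profile and semver parts. The names `SchemaVersion`, `parse_schema_version`, and `satisfies_pin` are hypothetical, and the rule that a row must be at least as new as the pin within the same major version is an assumption about how pinning is meant to work.

```python
from typing import NamedTuple


class SchemaVersion(NamedTuple):
    profile: str
    major: int
    minor: int
    patch: int


def parse_schema_version(value: str) -> SchemaVersion:
    """Split 'aprs.civic/1.0.0' into its profile and MAJOR.MINOR.PATCH parts."""
    profile, _, version = value.partition("/")
    major, minor, patch = (int(part) for part in version.split("."))
    return SchemaVersion(profile, major, minor, patch)


def satisfies_pin(record_version: str, pinned: str) -> bool:
    """True when the row shares the pin's profile and major version
    and is not older than the pin (assumed pinning rule)."""
    rec = parse_schema_version(record_version)
    pin = parse_schema_version(pinned)
    return (
        rec.profile == pin.profile
        and rec.major == pin.major
        and (rec.minor, rec.patch) >= (pin.minor, pin.patch)
    )
```

For example, `satisfies_pin("aprs.civic/1.2.0", "aprs.civic/1.0.0")` holds under this rule, while a row at aprs.civic/2.0.0 would be rejected.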
Bitemporal fields
| Field | Type | Description |
|---|---|---|
| ingested_at | ISO 8601 | When the system first ingested this row. Never mutates after insert. |
| modified_at | ISO 8601 | When the row was last updated. Refreshed on any write. |
| occurred_at | ISO 8601 | When the real-world event happened (e.g. council vote date, AIS position timestamp) |
| filed_at | ISO 8601 | When the record was filed or submitted to the authority. Required for permits and OSHA records. |
| published_at | ISO 8601 | When the source authority published the record |
| effective_from | ISO 8601 | When a ruling or record became legally effective. Null if not applicable. |
| effective_to | ISO 8601 | When it expired or was superseded. Null means currently active. |
See the bitemporal fields reference for per-dataset availability and domain-specific temporal extensions.
A permit can be voted on (occurred_at), published weeks later (published_at), become legally effective months after that (effective_from), and be ingested by Codex retroactively (ingested_at). Choose the clock that matches your analysis.
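To make the choice of clock concrete, here is a small sketch of an "as of" filter over in-memory rows. The function `select_as_of` and the two sample permits are hypothetical; the field names come from the table above. It keeps rows whose chosen valid-time field falls on or before a cutoff, restricted to rows the system had already ingested by a given moment.

```python
from datetime import datetime


def select_as_of(records, clock, cutoff, known_by):
    """Rows whose `clock` field is <= cutoff and whose ingested_at is <= known_by.

    Rows where the chosen clock is null are skipped.
    """
    cutoff_dt = datetime.fromisoformat(cutoff)
    known_dt = datetime.fromisoformat(known_by)
    selected = []
    for row in records:
        value = row.get(clock)
        if value is None:
            continue
        if (datetime.fromisoformat(value) <= cutoff_dt
                and datetime.fromisoformat(row["ingested_at"]) <= known_dt):
            selected.append(row)
    return selected


# Hypothetical rows: permit-a was ingested retroactively, long after the vote.
permits = [
    {"record_id": "permit-a",
     "occurred_at": "2024-03-15T00:00:00+00:00",
     "published_at": "2024-04-02T00:00:00+00:00",
     "effective_from": "2024-09-01T00:00:00+00:00",
     "ingested_at": "2025-01-10T00:00:00+00:00"},
    {"record_id": "permit-b",
     "occurred_at": "2024-05-01T00:00:00+00:00",
     "published_at": "2024-05-20T00:00:00+00:00",
     "effective_from": None,
     "ingested_at": "2024-05-02T00:00:00+00:00"},
]

# What did the dataset know at the end of 2024 about events before June 2024?
known_2024 = select_as_of(permits, "occurred_at",
                          "2024-06-01T00:00:00+00:00",
                          "2024-12-31T00:00:00+00:00")
```

Here `known_2024` contains only permit-b, because permit-a had not yet been ingested at the end of 2024 even though its vote happened earlier.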
Confidence and provenance
| Field | Type | Description |
|---|---|---|
| confidence_score | float [0,1] | Normalizer’s confidence in the record. Methodology is documented per-dataset. |
| provenance | JSON array | Ordered list of transformations: [{stage, version, ts, notes?}] for full lineage reconstruction |
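A consumer-side sanity check for these two fields might look like the sketch below. The function name `check_confidence_and_provenance` and the error messages are hypothetical. The only rules taken from the table are that confidence_score lies in [0, 1] and that each provenance entry carries stage, version, and ts, with notes optional.

```python
REQUIRED_STAGE_KEYS = {"stage", "version", "ts"}


def check_confidence_and_provenance(record):
    """Return a list of human-readable problems; an empty list means the row passes."""
    errors = []

    score = record.get("confidence_score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not (0.0 <= score <= 1.0):
        errors.append("confidence_score must be a number in [0, 1]")

    provenance = record.get("provenance")
    if not isinstance(provenance, list):
        errors.append("provenance must be a JSON array")
    else:
        for position, entry in enumerate(provenance):
            if not isinstance(entry, dict):
                errors.append(f"provenance[{position}] must be an object")
                continue
            missing = sorted(REQUIRED_STAGE_KEYS - entry.keys())
            if missing:
                errors.append(f"provenance[{position}] is missing {', '.join(missing)}")
    return errors
```

Running such a check at load time surfaces malformed lineage early, before a low-confidence or untraceable row reaches a downstream index.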
Spatial consistency
All geospatial data includes h3_index at resolution 8 (avg edge length ~461 m, cell area ~0.74 km²). This is the universal spatial join key across all Codex datasets.
- Point geometry: h3_index is the H3 cell containing the point.
- Polygon geometry: h3_indexes (array) covers the polygon at resolution 8. geometry_wkt is retained for display.
- Line geometry: H3 cells intersecting the line buffer.
Higher or lower resolutions may be published as supplementary fields (h3_index_9, etc.), but resolution 8 is authoritative.
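The join pattern this enables can be sketched without any geospatial library, since the cell identifiers act as plain equality keys. The helpers `cells_of` and `join_on_h3` are hypothetical, and the cell values below are placeholders, not real H3 indexes.

```python
from collections import defaultdict


def cells_of(record):
    """Resolution-8 cells for a row: h3_indexes for polygons, h3_index for points."""
    if record.get("h3_indexes"):
        return set(record["h3_indexes"])
    if record.get("h3_index"):
        return {record["h3_index"]}
    return set()


def join_on_h3(left, right):
    """Pairs of (left record_id, right record_id) that share at least one cell."""
    by_cell = defaultdict(list)
    for row in right:
        for cell in cells_of(row):
            by_cell[cell].append(row["record_id"])

    pairs = set()
    for row in left:
        for cell in cells_of(row):
            for right_id in by_cell.get(cell, []):
                pairs.add((row["record_id"], right_id))
    return sorted(pairs)


# Placeholder cell ids; real rows would carry H3 resolution-8 indexes.
permits = [{"record_id": "permit-1", "h3_index": "cell-a"}]
parcels = [
    {"record_id": "parcel-9", "h3_indexes": ["cell-a", "cell-b"]},
    {"record_id": "parcel-10", "h3_indexes": ["cell-c"]},
]
matches = join_on_h3(permits, parcels)
```

Collecting pairs in a set means a polygon that overlaps another polygon in several cells still yields a single match.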
LLM-ready surface
Every dataset publishes an llm_text view in Markdown-KV format alongside the structured Parquet/CSV view. This format is optimized for LLM reasoning: it was benchmarked at 60.7% accuracy versus 44.3% for CSV. A sample record:
- chunk_id: urn:aprs:chunk:9f3a1b2c...
- record_id: urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b
- jurisdiction: Philadelphia, PA
- occurred_at: 2024-03-15
- event_type: zoning_vote
- summary: Council voted 11-6 to approve rezoning of the 2200 block...
- entities: {name: Kenyatta Johnson, role: councilmember, sentiment: supportive}
- litigation_risk_score: 0.42
- source_uri: https://phlcouncil.com/meetings/...
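
The sample above follows a simple "- key: value" shape, which can be approximated as follows. This is an illustrative sketch only: `to_markdown_kv` is a hypothetical helper, and the rules for field order, null handling, and nested values such as entities are assumptions, since the standard does not spell them out here.

```python
def to_markdown_kv(record, field_order):
    """Render selected fields as '- key: value' lines, skipping null or absent ones."""
    lines = []
    for key in field_order:
        value = record.get(key)
        if value is None:
            continue
        lines.append(f"- {key}: {value}")
    return "\n".join(lines)


row = {
    "jurisdiction": "Philadelphia, PA",
    "occurred_at": "2024-03-15",
    "event_type": "zoning_vote",
    "effective_to": None,
    "litigation_risk_score": 0.42,
}
text = to_markdown_kv(
    row,
    ["jurisdiction", "occurred_at", "event_type", "effective_to", "litigation_risk_score"],
)
```

Passing an explicit field order keeps the rendering stable across runs, which matters when the text is hashed or embedded.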
Claim vs. fact separation
For datasets where multiple sources may contradict each other (corporate ownership, permits, civic proceedings), records are published at two layers:
- Claim layer: what a source asserted, preserved verbatim with source attribution. Multiple claims per subject are expected.
- Fact layer: Codex’s resolution of contradictory claims into a canonical row, with a resolution_method field explaining the choice (e.g. latest_by_published_at, authority_priority, manual_review).
This separation ensures you can always trace how a fact was derived and which original assertions it is based on. See the claim/fact separation page for full schemas, resolution methods, and query examples.
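As a rough illustration of how a fact row could be derived from claims, the sketch below implements two of the named resolution methods. It is not Codex's resolver: the function `resolve_claims`, the claim shape, the output keys other than resolution_method, and the source names registry_a and registry_b are all hypothetical, and manual_review is deliberately left unsupported because it cannot be automated.

```python
from datetime import date


def resolve_claims(claims, method, authority_order=None):
    """Pick one claim as the canonical fact and record how it was chosen."""
    if not claims:
        raise ValueError("at least one claim is required")

    if method == "latest_by_published_at":
        # The most recently published claim wins.
        winner = max(claims, key=lambda c: date.fromisoformat(c["published_at"]))
    elif method == "authority_priority":
        # The claim from the highest-ranked source wins; unranked sources sort last.
        rank = {name: i for i, name in enumerate(authority_order or [])}
        winner = min(claims, key=lambda c: rank.get(c["source_system"], len(rank)))
    else:
        raise ValueError(f"unsupported resolution_method: {method}")

    return {
        "value": winner["value"],
        "resolution_method": method,
        "winning_source": winner["source_system"],
        "claim_count": len(claims),
    }


# Hypothetical contradictory ownership claims from two sources.
claims = [
    {"source_system": "registry_a", "published_at": "2024-01-10", "value": "Owner X"},
    {"source_system": "registry_b", "published_at": "2024-03-02", "value": "Owner Y"},
]
by_date = resolve_claims(claims, "latest_by_published_at")
by_authority = resolve_claims(claims, "authority_priority", ["registry_a", "registry_b"])
```

The two methods disagree on this input, which is why the fact row needs to state which one was applied.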
Versioning
- Schema semver: MAJOR.MINOR.PATCH. Breaking changes (removing fields, changing types, narrowing enums) bump MAJOR. Additive changes bump MINOR. Doc-only clarifications bump PATCH.
- Normalization semver: independent of schema version. A normalization version bump is always accompanied by a changelog entry.
- Snapshot releases: first of each month, immutable once published. Labeled YYYY-MM.
- Deprecation: fields marked @deprecated in a MINOR release may be removed in the next MAJOR with at least 6 months' notice.
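
The schema bump rules above can be expressed as a small lookup. This is a sketch under assumptions: the change-kind labels (remove_field, add_field, and so on) and the function `next_schema_version` are hypothetical names chosen for illustration, mapped onto the categories the rules describe.

```python
# Hypothetical change-kind labels mapped to the bump each one requires.
BUMP_FOR_CHANGE = {
    "remove_field": "major",
    "change_type": "major",
    "narrow_enum": "major",
    "add_field": "minor",
    "doc_clarification": "patch",
}
BUMP_RANK = {"patch": 0, "minor": 1, "major": 2}


def next_schema_version(current, changes):
    """Return the version after applying `changes`; the largest required bump wins."""
    major, minor, patch = (int(part) for part in current.split("."))
    if not changes:
        return current
    level = max((BUMP_FOR_CHANGE[change] for change in changes),
                key=BUMP_RANK.__getitem__)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For instance, a release that adds a field and clarifies documentation moves 1.4.2 to 1.5.0, while narrowing an enum moves it to 2.0.0.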