For datasets where multiple sources may contradict each other — corporate ownership, permits, civic proceedings — Codex publishes records at two layers: claims (what sources asserted) and facts (Codex’s resolution of those claims). This separation lets you trace every data point back to its origin and audit how contradictions were resolved.

Why two layers

Collapsing contradictory source records into a single “truth row” breaks three use cases:
  1. Legal and compliance. When two agencies disagree about a permit’s effective date, you need to see both assertions and who made them — not whichever one a normalizer picked.
  2. Commercial real estate (CRE) diligence. When Assessor A says a parcel owner is X and Assessor B says Y, the answer is the distribution of claims, not a single pick.
  3. Regulated-AI workflows. Downstream LLMs need to cite the source, not a Codex inference. Inferences belong to Codex; claims belong to the source.

The two-layer model

  • Claims are source-authored assertions, preserved verbatim with source attribution. Multiple claims per subject are expected and intentional. A claim never mutates after insert.
  • Facts are Codex-authored resolutions. A fact always cites its supporting claims via supporting_claim_ids. A fact can be superseded but never deleted — the reasoning trail is permanent.

Claim schema

Each claim captures what was asserted, where it can be verified, and who extracted it:
{
  "claim_id": "clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:entity.0",
  "subject": "urn:aprs:entity:person:k-johnson-phila",
  "predicate": "supports",
  "object": "urn:aprs:record:permit:phila:variance-2024-7b",
  "evidence_anchor": {
    "doc_id": "urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b",
    "page": 3,
    "char_span": [42, 118],
    "excerpt": "Councilmember Johnson: I support the variance as submitted..."
  },
  "asserted_by": "civic_llm_extractor",
  "extractor_version": "civic_activation/0.9.1",
  "confidence": 0.91
}

Evidence anchor rules

  • doc_id must be a valid APRS record URN reachable in a Codex dataset.
  • At least one of page, char_span, or xpath must be populated so you can verify the claim against the source.
  • excerpt (up to 400 characters) is populated when available for display convenience.
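
The anchor rules above can be checked on the consumer side before a claim is trusted. The following is a minimal sketch, not Codex's own validator: the function name `anchor_problems` and the URN pattern are assumptions based on the sample `doc_id`, and the check cannot confirm that the record is actually reachable in a Codex dataset.

```python
import re

# Assumption: APRS record URNs begin with "urn:aprs:record:", as in the sample claim.
RECORD_URN = re.compile(r"^urn:aprs:record:\S+$")
LOCATOR_KEYS = ("page", "char_span", "xpath")
MAX_EXCERPT_CHARS = 400


def anchor_problems(anchor):
    """Return a list of rule violations for one evidence_anchor (empty if none)."""
    problems = []
    doc_id = anchor.get("doc_id")
    if not isinstance(doc_id, str) or not RECORD_URN.match(doc_id):
        problems.append("doc_id is not an APRS record URN")
    # At least one locator must be present so the claim can be verified.
    if not any(anchor.get(key) is not None for key in LOCATOR_KEYS):
        problems.append("one of page, char_span, or xpath is required")
    excerpt = anchor.get("excerpt")
    if excerpt is not None and len(excerpt) > MAX_EXCERPT_CHARS:
        problems.append("excerpt exceeds 400 characters")
    return problems
```

An empty list means the anchor satisfies the structural rules; whether `doc_id` resolves still has to be confirmed against the dataset itself.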

Claim predicates

Claims use a controlled vocabulary of predicates. For Civic Intelligence v1:
  • Entity-to-record predicates: mentions, supports, opposes, abstains_on, introduces, represents_client_in, has_conflict_on
  • Record-to-record predicates: supersedes, amends, cites, contradicts, continues_from
  • Attribute predicates: has_status, has_value, has_effective_date
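
Because the vocabulary is closed, a consumer can classify or reject predicates with a simple lookup. This is a minimal sketch: the predicate names come from the v1 list above, while the group keys (`entity_to_record`, `record_to_record`, `attribute`) and the helper `predicate_group` are illustrative names, not part of the Codex schema.

```python
from typing import Optional

# Civic Intelligence v1 predicates, grouped as in the list above.
# The group keys are labels chosen for this sketch.
CIVIC_V1_PREDICATES = {
    "entity_to_record": frozenset({
        "mentions", "supports", "opposes", "abstains_on",
        "introduces", "represents_client_in", "has_conflict_on",
    }),
    "record_to_record": frozenset({
        "supersedes", "amends", "cites", "contradicts", "continues_from",
    }),
    "attribute": frozenset({
        "has_status", "has_value", "has_effective_date",
    }),
}


def predicate_group(predicate: str) -> Optional[str]:
    """Return the group a predicate belongs to, or None if it is not in the v1 list."""
    for group, names in CIVIC_V1_PREDICATES.items():
        if predicate in names:
            return group
    return None
```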

Fact schema

Each fact records the resolution, its method, and which claims support it:
{
  "fact_id": "fct:urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b:upzoning_probability",
  "predicate": "upzoning_probability",
  "value": 0.82,
  "unit": "probability",
  "resolution_method": "llm_inference",
  "model_name": "distilbert-upzoning-v3",
  "model_version": "0.4.1",
  "supporting_claim_ids": [
    "clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:entity.0",
    "clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:language_signal.2"
  ],
  "status": "current",
  "computed_at": "2024-03-19T05:03:22Z",
  "valid_from": "2024-03-15T00:00:00Z",
  "valid_to": null
}
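
Since every fact cites its claims through supporting_claim_ids, a consumer can verify that the citations resolve. The sketch below assumes the supporting claims sit in the `claims` array of the same parent record as the fact, which matches the export layout but is not stated as a guarantee; `missing_supporting_claims` is a hypothetical helper name.

```python
def missing_supporting_claims(record):
    """Map each fact_id to the supporting claim IDs not found in record["claims"]."""
    known = {claim["claim_id"] for claim in record.get("claims", [])}
    missing = {}
    for fact in record.get("facts", []):
        absent = [
            claim_id
            for claim_id in fact.get("supporting_claim_ids", [])
            if claim_id not in known
        ]
        if absent:
            missing[fact["fact_id"]] = absent
    return missing
```

An empty result means every cited claim is present on the record, so the audit trail from fact to source excerpt is complete.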

Resolution methods

Every fact carries a resolution_method explaining how Codex produced it:
| Method | Description |
| --- | --- |
| verbatim_single_claim | Single source claim, passed through unchanged |
| authority_priority | Multiple contradicting claims; higher-authority source wins |
| latest_by_published_at | Most recent claim wins (typical for ownership, status fields) |
| majority_vote | Three or more claims, simple majority (requires at least 2 agreeing) |
| weighted_vote | Claims weighted by source reliability and confidence |
| llm_inference | Downstream inference (e.g. upzoning_probability, litigation_risk_score) |
| manual_review | Analyst-reviewed override |
| ensemble | Weighted combination of multiple method outputs |
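
To make two of these methods concrete, here is a minimal sketch of how latest_by_published_at and majority_vote could behave. It is an illustration, not Codex's resolver: it assumes each claim carries a `published_at` ISO 8601 timestamp (a field not shown in the sample claim), and it reads "simple majority" as more than half of the claims agreeing on the same `object`.

```python
from collections import Counter
from datetime import datetime


def _parse(ts):
    # Accept a trailing "Z" on Python versions where fromisoformat does not.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_by_published_at(claims):
    """Return the claim with the most recent published_at (hypothetical field)."""
    return max(claims, key=lambda claim: _parse(claim["published_at"]))


def majority_vote(claims):
    """Return the object backed by more than half of the claims, else None.

    Needs at least three claims, and therefore at least two in agreement.
    """
    if len(claims) < 3:
        return None
    counts = Counter(claim["object"] for claim in claims)
    value, votes = counts.most_common(1)[0]
    return value if votes >= 2 and votes * 2 > len(claims) else None
```

Returning None when no majority exists leaves room for a fallback such as weighted_vote or manual_review; how Codex actually chains methods is not specified here.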

Fact status

Facts transition through these states:
| Status | Meaning |
| --- | --- |
| current | The active resolution |
| superseded | Replaced by a newer fact (new evidence arrived) |
| retracted | Marked invalid by an analyst, with a documented reason |
| under_review | Flagged for human review |
A fact is never deleted. Consumers filtering for status='current' get the active view; querying all statuses gives the full history.
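
The append-only behaviour can be sketched in a few lines. This is an illustration under stated assumptions, not Codex's implementation: `current_facts` and `supersede` are hypothetical helpers, and the sketch only changes `status`, leaving fields such as valid_to untouched because the update rules for them are not described here.

```python
def current_facts(facts):
    """The active view: only facts whose status is 'current'."""
    return [fact for fact in facts if fact.get("status") == "current"]


def supersede(facts, new_fact):
    """Return a new list in which new_fact replaces the current fact for its predicate.

    The old fact is kept with status 'superseded'; nothing is removed,
    and the input list and its dicts are not mutated.
    """
    updated = []
    for fact in facts:
        if fact["predicate"] == new_fact["predicate"] and fact.get("status") == "current":
            fact = {**fact, "status": "superseded"}
        updated.append(fact)
    updated.append({**new_fact, "status": "current"})
    return updated
```

After a call to `supersede`, the full list is the history and `current_facts` yields the active view, mirroring the two ways of querying described above.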

Relationship to existing fields

The claim/fact layer is additive — no existing fields are deprecated. Familiar fields like entities_extracted, blockers, and score fields remain in place as convenience views.
| Existing field | In claim/fact model |
| --- | --- |
| entities_extracted[] | Each entity becomes one or more claims with entity predicates |
| blockers[] | Each blocker tag becomes a claim with predicate has_blocker |
| language_signals[] | Each signal becomes a claim with predicate exhibits_signal |
| upzoning_probability | A fact with resolution_method='llm_inference' |
| hostility_index | A fact with resolution_method='llm_inference' |
| litigation_risk_score | A fact with resolution_method='llm_inference' |

Export format

Both layers ship as nested arrays on the parent record in Parquet exports and as sections in the Markdown-KV view.

Parquet

civic_records
├── id, record_id, chunk_id, ... (APRS envelope)
├── entities_extracted[], blockers[], ... (convenience views)
├── claims: list<struct<claim_id, subject, predicate, object,
│     evidence_anchor, asserted_by, confidence, ...>>
└── facts: list<struct<fact_id, predicate, value, unit,
      resolution_method, supporting_claim_ids[], status, ...>>

Markdown-KV

## Claims (3)
- **clm:urn:...:entity.0** — {subject} `supports` {object}  (conf 0.91)
  > "Councilmember Johnson: I support the variance as submitted..."

## Facts (2)
- **fct:urn:...:upzoning_probability** = 0.82
  - method: llm_inference (distilbert-upzoning-v3/0.4.1)
  - supports: clm:urn:...:entity.0, clm:urn:...:language_signal.2
  - status: current
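
For readers who want to reproduce the Facts section locally, the following sketch renders fact dicts in the shape shown above. It is an approximation inferred from that single example: `render_facts_section` is a hypothetical helper, it prints full IDs rather than the abbreviated `...` forms, and it assumes the parenthetical after the method appears only when model_name is set.

```python
def render_facts_section(facts):
    """Render a list of fact dicts as a Markdown-KV style 'Facts' section."""
    lines = [f"## Facts ({len(facts)})"]
    for fact in facts:
        lines.append(f"- **{fact['fact_id']}** = {fact['value']}")
        method = fact["resolution_method"]
        # Assumption: model details are shown only for model-backed methods.
        if fact.get("model_name"):
            method += f" ({fact['model_name']}/{fact['model_version']})"
        lines.append(f"  - method: {method}")
        lines.append("  - supports: " + ", ".join(fact["supporting_claim_ids"]))
        lines.append(f"  - status: {fact['status']}")
    return "\n".join(lines)
```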

Example queries

All claims about a specific entity

SELECT
  r.id AS record_id,
  r.occurred_at,
  r.jurisdiction_slug,
  c->>'predicate' AS predicate,
  c->>'object' AS object,
  c->'evidence_anchor'->>'excerpt' AS excerpt  -- excerpt is nested under evidence_anchor
FROM civic_records r,
     jsonb_array_elements(r.claims) AS c
WHERE c->>'subject' = 'urn:aprs:entity:person:k-johnson-phila'
ORDER BY r.occurred_at DESC;

Current upzoning probability with supporting claims

WITH f AS (
  SELECT r.id, fact->>'value' AS prob, fact->'supporting_claim_ids' AS claim_ids
  FROM civic_records r,
       jsonb_array_elements(r.facts) AS fact
  WHERE r.id = :record_id
    AND fact->>'predicate' = 'upzoning_probability'
    AND fact->>'status' = 'current'
)
SELECT f.prob,
       jsonb_agg(c) AS supporting_claims
FROM f, civic_records r,
     jsonb_array_elements(r.claims) AS c
WHERE r.id = f.id
  AND c->>'claim_id' = ANY (ARRAY(SELECT jsonb_array_elements_text(f.claim_ids)))
GROUP BY f.prob;

Records where facts changed in the last 7 days

SELECT r.id, r.jurisdiction_slug, COUNT(*) AS fact_revisions
FROM civic_records r,
     jsonb_array_elements(r.facts) AS f
WHERE (f->>'computed_at')::timestamptz > now() - interval '7 days'
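  -- Assumes a superseding fact carries supersedes_fact_id; the sample fact above does not show this field.
  AND f->>'supersedes_fact_id' IS NOT NULL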
GROUP BY r.id, r.jurisdiction_slug
ORDER BY fact_revisions DESC;