Claim/fact separation

For datasets where multiple sources may contradict each other — corporate ownership, permits, civic proceedings — Codex publishes records at two layers: claims (what sources asserted) and facts (Codex’s resolution of those claims). This separation lets you trace every data point back to its origin and audit how contradictions were resolved.

Why two layers

Collapsing contradictory source records into a single “truth row” breaks three use cases:

Legal and compliance. When two agencies disagree about a permit’s effective date, you need to see both assertions and who made them — not whichever one a normalizer picked.
CRE diligence. When Assessor A says a parcel's owner is X and Assessor B says it is Y, the useful answer is the full set of competing claims and their sources, not a single pick.
Regulated-AI workflows. Downstream LLMs need to cite the source, not a Codex inference. Inferences belong to Codex; claims belong to the source.

The two-layer model

Claims are source-authored assertions, preserved verbatim with source attribution. Multiple claims per subject are expected and intentional. A claim never mutates after insert.
Facts are Codex-authored resolutions. A fact always cites its supporting claims via supporting_claim_ids. A fact can be superseded but never deleted — the reasoning trail is permanent.
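
The two invariants above (a claim never mutates after insert; a fact always cites supporting claims) can be illustrated with a minimal Python sketch. This is not Codex's implementation: the class names `Claim` and `Fact` are hypothetical, only a subset of fields is shown, and reading "always cites" as "at least one claim ID" is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: assigning to a field after construction raises
class Claim:
    claim_id: str
    subject: str
    predicate: str
    object: str
    asserted_by: str


@dataclass(frozen=True)
class Fact:
    fact_id: str
    predicate: str
    value: float
    supporting_claim_ids: Tuple[str, ...]
    status: str = "current"

    def __post_init__(self) -> None:
        # Assumption: "always cites its supporting claims" means at least one ID.
        if not self.supporting_claim_ids:
            raise ValueError("a fact must cite at least one supporting claim")


claim = Claim(
    claim_id="clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:entity.0",
    subject="urn:aprs:entity:person:k-johnson-phila",
    predicate="supports",
    object="urn:aprs:record:permit:phila:variance-2024-7b",
    asserted_by="civic_llm_extractor",
)
fact = Fact(
    fact_id="fct:urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b:upzoning_probability",
    predicate="upzoning_probability",
    value=0.82,
    supporting_claim_ids=(claim.claim_id,),
)
```

Attempting `claim.predicate = "opposes"` raises `dataclasses.FrozenInstanceError`, which mirrors the rule that a correction arrives as a new claim rather than an edit to an old one.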

Claim schema

Each claim captures what was asserted, where it can be verified, and who extracted it:

{
  "claim_id": "clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:entity.0",
  "subject": "urn:aprs:entity:person:k-johnson-phila",
  "predicate": "supports",
  "object": "urn:aprs:record:permit:phila:variance-2024-7b",
  "evidence_anchor": {
    "doc_id": "urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b",
    "page": 3,
    "char_span": [42, 118],
    "excerpt": "Councilmember Johnson: I support the variance as submitted..."
  },
  "asserted_by": "civic_llm_extractor",
  "extractor_version": "civic_activation/0.9.1",
  "confidence": 0.91
}

Evidence anchor rules

doc_id must be a valid APRS record URN reachable in a Codex dataset.
At least one of page, char_span, or xpath must be populated so you can verify the claim against the source.
excerpt (up to 400 characters) is populated when available for display convenience.
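
As an illustration, the rules above could be checked client-side with something like the following Python sketch. The function name `evidence_anchor_errors` is hypothetical, the URN check is a simple prefix test (the real APRS grammar may be stricter), and treating `char_span` as `[start, end]` integer offsets is an assumption. Whether `doc_id` is actually reachable in a Codex dataset cannot be checked offline and is left out.

```python
MAX_EXCERPT_CHARS = 400
# Assumption: APRS record URNs share this prefix, as in the example claim.
RECORD_URN_PREFIX = "urn:aprs:record:"


def evidence_anchor_errors(anchor: dict) -> list:
    """Return a list of rule violations; an empty list means the anchor passes."""
    errors = []

    doc_id = anchor.get("doc_id")
    if not isinstance(doc_id, str) or not doc_id.startswith(RECORD_URN_PREFIX):
        errors.append("doc_id must be an APRS record URN")

    # At least one locator is required so the claim can be verified.
    if all(anchor.get(key) is None for key in ("page", "char_span", "xpath")):
        errors.append("at least one of page, char_span, or xpath is required")

    span = anchor.get("char_span")
    if span is not None:
        well_formed = (
            isinstance(span, (list, tuple))
            and len(span) == 2
            and all(isinstance(n, int) for n in span)
            and 0 <= span[0] <= span[1]
        )
        if not well_formed:
            errors.append("char_span must be [start, end] with 0 <= start <= end")

    excerpt = anchor.get("excerpt")
    if excerpt is not None and len(excerpt) > MAX_EXCERPT_CHARS:
        errors.append("excerpt must be at most 400 characters")

    return errors


example_anchor = {
    "doc_id": "urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b",
    "page": 3,
    "char_span": [42, 118],
    "excerpt": "Councilmember Johnson: I support the variance as submitted...",
}
print(evidence_anchor_errors(example_anchor))
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem with an anchor at once.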

Claim predicates

Claims use a controlled vocabulary of predicates. For Civic Intelligence v1:

Entity-to-record predicates: mentions, supports, opposes, abstains_on, introduces, represents_client_in, has_conflict_on
Record-to-record predicates: supersedes, amends, cites, contradicts, continues_from
Attribute predicates: has_status, has_value, has_effective_date
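
A consumer that wants to reject unknown predicates early could encode the v1 vocabulary as sets. The following Python sketch is only illustrative: the constant names and the `predicate_group` helper are hypothetical, and the sets cover only the predicates listed above.

```python
ENTITY_TO_RECORD = frozenset({
    "mentions", "supports", "opposes", "abstains_on",
    "introduces", "represents_client_in", "has_conflict_on",
})
RECORD_TO_RECORD = frozenset({
    "supersedes", "amends", "cites", "contradicts", "continues_from",
})
ATTRIBUTE = frozenset({"has_status", "has_value", "has_effective_date"})

_GROUPS = {
    "entity_to_record": ENTITY_TO_RECORD,
    "record_to_record": RECORD_TO_RECORD,
    "attribute": ATTRIBUTE,
}


def predicate_group(predicate: str) -> str:
    """Return the group a predicate belongs to, or raise if it is not listed."""
    for name, members in _GROUPS.items():
        if predicate in members:
            return name
    raise ValueError(f"unknown predicate: {predicate!r}")


print(predicate_group("supports"))
```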

Fact schema

Each fact records the resolution, its method, and which claims support it:

{
  "fact_id": "fct:urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b:upzoning_probability",
  "predicate": "upzoning_probability",
  "value": 0.82,
  "unit": "probability",
  "resolution_method": "llm_inference",
  "model_name": "distilbert-upzoning-v3",
  "model_version": "0.4.1",
  "supporting_claim_ids": [
    "clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:entity.0",
    "clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:language_signal.2"
  ],
  "status": "current",
  "computed_at": "2024-03-19T05:03:22Z",
  "valid_from": "2024-03-15T00:00:00Z",
  "valid_to": null
}

Resolution methods

Every fact carries a resolution_method explaining how Codex produced it:

Method	Description
`verbatim_single_claim`	Single source claim, passed through unchanged
`authority_priority`	Multiple contradicting claims; higher-authority source wins
`latest_by_published_at`	Most recent claim wins (typical for ownership, status fields)
`majority_vote`	Three or more claims, simple majority (requires at least 2 agreeing)
`weighted_vote`	Claims weighted by source reliability and confidence
`llm_inference`	Downstream inference (e.g. `upzoning_probability`, `litigation_risk_score`)
`manual_review`	Analyst-reviewed override
`ensemble`	Weighted combination of multiple method outputs
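
To make two of these methods concrete, here is a rough Python sketch of `majority_vote` and `latest_by_published_at` over claims represented as dicts. It is not Codex's resolver. It assumes the claim's `object` is the value being voted on, that each claim carries a `published_at` ISO 8601 timestamp (a field not shown in the claim schema on this page), and that "simple majority" means strictly more than half of the claims.

```python
from collections import Counter
from datetime import datetime
from typing import Optional


def majority_vote(claims: list) -> Optional[dict]:
    """Resolve by simple majority; returns None when the method does not apply."""
    if len(claims) < 3:
        return None  # the method is defined for three or more claims
    counts = Counter(claim["object"] for claim in claims)
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(claims):
        return None  # no value is held by more than half of the claims
    return {
        "value": winner,
        "resolution_method": "majority_vote",
        "supporting_claim_ids": [
            claim["claim_id"] for claim in claims if claim["object"] == winner
        ],
    }


def _parse_utc(timestamp: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def latest_by_published_at(claims: list) -> Optional[dict]:
    """Resolve to the most recently published claim."""
    if not claims:
        return None
    newest = max(claims, key=lambda claim: _parse_utc(claim["published_at"]))
    return {
        "value": newest["object"],
        "resolution_method": "latest_by_published_at",
        "supporting_claim_ids": [newest["claim_id"]],
    }


# Hypothetical claims about a parcel's owner from three sources.
owner_claims = [
    {"claim_id": "clm:a", "object": "X", "published_at": "2024-03-01T00:00:00Z"},
    {"claim_id": "clm:b", "object": "X", "published_at": "2024-03-05T00:00:00Z"},
    {"claim_id": "clm:c", "object": "Y", "published_at": "2024-03-15T00:00:00Z"},
]
print(majority_vote(owner_claims)["value"])
print(latest_by_published_at(owner_claims)["value"])
```

The same three claims resolve differently under the two methods, which is why each fact records its `resolution_method` alongside its `supporting_claim_ids`.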

Fact status

Facts transition through these states:

Status	Meaning
`current`	The active resolution
`superseded`	Replaced by a newer fact (new evidence arrived)
`retracted`	Marked invalid by an analyst, with a documented reason
`under_review`	Flagged for human review

A fact is never deleted. Consumers filtering for status='current' get the active view; querying all statuses gives the full history.
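
The append-only behaviour can be sketched in Python as follows. The helper names `supersede` and `current_view` are hypothetical, facts are plain dicts, and the `supersedes_fact_id` back-reference is taken from the example query on this page rather than from the fact schema, so treat its exact semantics as an assumption.

```python
def supersede(history: list, new_fact: dict) -> list:
    """Return a new history in which new_fact replaces the current fact
    for the same predicate. Nothing is removed; the old fact is relabelled."""
    updated = []
    replaced_id = None
    for fact in history:
        if fact["predicate"] == new_fact["predicate"] and fact["status"] == "current":
            replaced_id = fact["fact_id"]
            fact = {**fact, "status": "superseded"}  # copy; the input is not mutated
        updated.append(fact)
    updated.append(
        {**new_fact, "status": "current", "supersedes_fact_id": replaced_id}
    )
    return updated


def current_view(history: list) -> list:
    """Equivalent of filtering for status='current'."""
    return [fact for fact in history if fact["status"] == "current"]


# Hypothetical fact IDs, for illustration only.
history = [
    {"fact_id": "fct:example:1", "predicate": "upzoning_probability",
     "value": 0.82, "status": "current"},
]
history_v2 = supersede(
    history,
    {"fact_id": "fct:example:2", "predicate": "upzoning_probability", "value": 0.64},
)
print(len(history_v2), [f["fact_id"] for f in current_view(history_v2)])
```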

Relationship to existing fields

The claim/fact layer is additive — no existing fields are deprecated. Familiar fields like entities_extracted, blockers, and score fields remain in place as convenience views.

Existing field	In claim/fact model
`entities_extracted[]`	Each entity becomes one or more claims with entity predicates
`blockers[]`	Each blocker tag becomes a claim with predicate `has_blocker`
`language_signals[]`	Each signal becomes a claim with predicate `exhibits_signal`
`upzoning_probability`	A fact with `resolution_method='llm_inference'`
`hostility_index`	A fact with `resolution_method='llm_inference'`
`litigation_risk_score`	A fact with `resolution_method='llm_inference'`

Export format

Both layers ship as nested arrays on the parent record in Parquet exports and as sections in the Markdown-KV view.

Parquet

civic_records
├── id, record_id, chunk_id, ... (APRS envelope)
├── entities_extracted[], blockers[], ... (convenience views)
├── claims: list<struct<claim_id, subject, predicate, object,
│     evidence_anchor, asserted_by, confidence, ...>>
└── facts: list<struct<fact_id, predicate, value, unit,
      resolution_method, supporting_claim_ids[], status, ...>>

Markdown-KV

## Claims (3)
- **clm:urn:...:entity.0** — {subject} `supports` {object}  (conf 0.91)
  > "Councilmember Johnson: I support the variance as submitted..."

## Facts (2)
- **fct:urn:...:upzoning_probability** = 0.82
  - method: llm_inference (distilbert-upzoning-v3/0.4.1)
  - supports: clm:urn:...:entity.0, clm:urn:...:language_signal.2
  - status: current
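
For readers who want to reproduce this layout from the Parquet structs, the fact entries could be rendered with a short Python function such as the one below. It is a sketch based only on the sample above: `render_fact` is a hypothetical name, IDs are printed in full rather than abbreviated with `...`, and omitting the model suffix when `model_name` or `model_version` is missing is an assumption.

```python
def render_fact(fact: dict) -> str:
    """Render one fact in the Markdown-KV layout shown above."""
    method = fact["resolution_method"]
    if fact.get("model_name") and fact.get("model_version"):
        method += f" ({fact['model_name']}/{fact['model_version']})"
    lines = [
        f"- **{fact['fact_id']}** = {fact['value']}",
        f"  - method: {method}",
        f"  - supports: {', '.join(fact['supporting_claim_ids'])}",
        f"  - status: {fact['status']}",
    ]
    return "\n".join(lines)


sample_fact = {
    "fact_id": "fct:urn:aprs:record:civic:us:granicus:phila-2024-03-15-item7b:upzoning_probability",
    "value": 0.82,
    "resolution_method": "llm_inference",
    "model_name": "distilbert-upzoning-v3",
    "model_version": "0.4.1",
    "supporting_claim_ids": [
        "clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:entity.0",
        "clm:urn:aprs:chunk:9f3a1b2c3d4e5f6a:language_signal.2",
    ],
    "status": "current",
}
print(render_fact(sample_fact))
```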

Example queries

All claims about a specific entity

SELECT
  r.id AS record_id,
  r.occurred_at,
  r.jurisdiction_slug,
  c->>'predicate' AS predicate,
  c->>'object' AS object,
  c->'evidence_anchor'->>'excerpt' AS excerpt  -- excerpt is nested under evidence_anchor
FROM civic_records r,
     jsonb_array_elements(r.claims) AS c
WHERE c->>'subject' = 'urn:aprs:entity:person:k-johnson-phila'
ORDER BY r.occurred_at DESC;

Current upzoning probability with supporting claims

WITH f AS (
  SELECT r.id, fact->>'value' AS prob, fact->'supporting_claim_ids' AS claim_ids
  FROM civic_records r,
       jsonb_array_elements(r.facts) AS fact
  WHERE r.id = :record_id
    AND fact->>'predicate' = 'upzoning_probability'
    AND fact->>'status' = 'current'
)
SELECT f.prob,
       jsonb_agg(c) AS supporting_claims
FROM f, civic_records r,
     jsonb_array_elements(r.claims) AS c
WHERE r.id = f.id
  AND c->>'claim_id' = ANY (ARRAY(SELECT jsonb_array_elements_text(f.claim_ids)))
GROUP BY f.prob;

Records where facts changed in the last 7 days

SELECT r.id, r.jurisdiction_slug, COUNT(*) AS fact_revisions
FROM civic_records r,
     jsonb_array_elements(r.facts) AS f
WHERE (f->>'computed_at')::timestamptz > now() - interval '7 days'
  AND f->>'supersedes_fact_id' IS NOT NULL
GROUP BY r.id, r.jurisdiction_slug
ORDER BY fact_revisions DESC;
