Why two layers
Collapsing contradictory source records into a single “truth row” breaks three use cases:
- Legal and compliance. When two agencies disagree about a permit’s effective date, you need to see both assertions and who made each one, not whichever one a normalizer picked.
- CRE diligence. When Assessor A says a parcel’s owner is X and Assessor B says Y, the answer is the distribution, not the pick.
- Regulated-AI workflows. Downstream LLMs need to cite the source, not a Codex inference. Inferences belong to Codex; claims belong to the source.
The two-layer model
- Claims are source-authored assertions, preserved verbatim with source attribution. Multiple claims per subject are expected and intentional. A claim never mutates after insert.
- Facts are Codex-authored resolutions. A fact always cites its supporting claims via `supporting_claim_ids`. A fact can be superseded but never deleted, so the reasoning trail is permanent.
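To make the two layers concrete, here is a minimal Python sketch, not the Codex implementation. Field names other than `supporting_claim_ids`, `resolution_method`, and `status` (for example `claim_id`, `subject`, `value`, `source`) and the `FactLog` class are hypothetical. Freezing the `Claim` dataclass models "a claim never mutates after insert", and resolving a new fact marks the previous one `superseded` instead of deleting it.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Claim:
    # Source-authored assertion; frozen=True blocks attribute reassignment after creation.
    claim_id: str
    subject: str
    predicate: str
    value: str
    source: str  # who made the assertion, e.g. an agency or an assessor


@dataclass(frozen=True)
class Fact:
    # Codex-authored resolution that always cites the claims it rests on.
    fact_id: str
    subject: str
    predicate: str
    value: str
    resolution_method: str
    supporting_claim_ids: Tuple[str, ...]
    status: str = "current"


class FactLog:
    """Append-only fact history: older facts are superseded, never deleted."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def resolve(self, fact: Fact) -> None:
        if not fact.supporting_claim_ids:
            raise ValueError("a fact must cite at least one supporting claim")
        # Retire the current fact for the same subject and predicate, but keep its row.
        self.facts = [
            replace(f, status="superseded")
            if (f.subject, f.predicate) == (fact.subject, fact.predicate)
            and f.status == "current"
            else f
            for f in self.facts
        ]
        self.facts.append(fact)
```

In this sketch superseding is keyed on subject and predicate; a real store could key facts differently.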
Claim schema
Each claim captures what was asserted, where it can be verified, and who extracted it.
Evidence anchor rules
- `doc_id` must be a valid APRS record URN reachable in a Codex dataset.
- At least one of `page`, `char_span`, or `xpath` must be populated so you can verify the claim against the source.
- `excerpt` (up to 400 characters) is populated when available, for display convenience.
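The anchor rules lend themselves to a pre-insert check. The sketch below assumes the anchor arrives as a plain dict and approximates "reachable in a Codex dataset" with a caller-supplied set of known record IDs. The `urn:` prefix test is a placeholder, since the APRS URN grammar isn't given here, and `anchor_errors` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Set

MAX_EXCERPT_CHARS = 400
LOCATOR_FIELDS = ("page", "char_span", "xpath")


def anchor_errors(anchor: Dict[str, Any], known_doc_ids: Set[str]) -> List[str]:
    """Return every rule violation; an empty list means the anchor passes."""
    errors: List[str] = []

    doc_id = anchor.get("doc_id")
    # Placeholder URN check: the real APRS URN format is not defined in this section.
    if not isinstance(doc_id, str) or not doc_id.startswith("urn:"):
        errors.append("doc_id must be a record URN")
    elif doc_id not in known_doc_ids:
        errors.append("doc_id is not reachable in a known dataset")

    # At least one locator must be populated so the claim can be verified.
    if not any(anchor.get(field) not in (None, "", []) for field in LOCATOR_FIELDS):
        errors.append("one of page, char_span, or xpath is required")

    # The excerpt is optional, but capped for display.
    excerpt = anchor.get("excerpt")
    if excerpt is not None and len(excerpt) > MAX_EXCERPT_CHARS:
        errors.append("excerpt exceeds 400 characters")

    return errors
```

Returning a list of violations rather than raising on the first one lets an ingestion job report every problem with a claim at once.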
Claim predicates
Claims use a controlled vocabulary of predicates. For Civic Intelligence v1:
- Entity-to-record predicates: `mentions`, `supports`, `opposes`, `abstains_on`, `introduces`, `represents_client_in`, `has_conflict_on`
- Record-to-record predicates: `supersedes`, `amends`, `cites`, `contradicts`, `continues_from`
- Attribute predicates: `has_status`, `has_value`, `has_effective_date`
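A controlled vocabulary can be enforced at insert time by rejecting any predicate outside the list. A sketch, assuming the hypothetical names `PredicateKind` and `predicate_kind`; the predicate sets copy the Civic Intelligence v1 list above.

```python
from enum import Enum
from typing import Dict, FrozenSet


class PredicateKind(Enum):
    ENTITY_TO_RECORD = "entity_to_record"
    RECORD_TO_RECORD = "record_to_record"
    ATTRIBUTE = "attribute"


# Civic Intelligence v1 vocabulary, grouped by predicate kind.
VOCABULARY: Dict[PredicateKind, FrozenSet[str]] = {
    PredicateKind.ENTITY_TO_RECORD: frozenset({
        "mentions", "supports", "opposes", "abstains_on",
        "introduces", "represents_client_in", "has_conflict_on",
    }),
    PredicateKind.RECORD_TO_RECORD: frozenset({
        "supersedes", "amends", "cites", "contradicts", "continues_from",
    }),
    PredicateKind.ATTRIBUTE: frozenset({
        "has_status", "has_value", "has_effective_date",
    }),
}


def predicate_kind(predicate: str) -> PredicateKind:
    """Classify a predicate, rejecting anything outside the controlled vocabulary."""
    for kind, predicates in VOCABULARY.items():
        if predicate in predicates:
            return kind
    raise ValueError(f"unknown predicate: {predicate!r}")
```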
Fact schema
Each fact records the resolution, its method, and which claims support it.
Resolution methods
Every fact carries a `resolution_method` explaining how Codex produced it:
| Method | Description |
|---|---|
| `verbatim_single_claim` | Single source claim, passed through unchanged |
| `authority_priority` | Multiple contradicting claims; the higher-authority source wins |
| `latest_by_published_at` | Most recent claim wins (typical for ownership and status fields) |
| `majority_vote` | Three or more claims, simple majority (requires at least 2 agreeing) |
| `weighted_vote` | Claims weighted by source reliability and confidence |
| `llm_inference` | Downstream inference (e.g., `upzoning_probability`, `litigation_risk_score`) |
| `manual_review` | Analyst-reviewed override |
| `ensemble` | Weighted combination of multiple method outputs |
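Three of the simpler methods can be sketched directly. The `SourceClaim` type, its integer `authority` rank (higher wins), and the recency tie-break in `authority_priority` are assumptions for illustration. `majority_vote` reads "simple majority" as strictly more than half of the claims, which with three or more claims implies at least two agreeing; it returns `None` when no value qualifies, and how Codex handles that case isn't described here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SourceClaim:
    # Hypothetical fields for illustration.
    claim_id: str
    value: str
    published_at: date
    authority: int  # assumed scale: a higher number means a higher-authority source


def authority_priority(claims: List[SourceClaim]) -> SourceClaim:
    # Highest authority wins; equal authority falls back to recency (an assumption).
    return max(claims, key=lambda c: (c.authority, c.published_at))


def latest_by_published_at(claims: List[SourceClaim]) -> SourceClaim:
    # Most recently published claim wins.
    return max(claims, key=lambda c: c.published_at)


def majority_vote(claims: List[SourceClaim]) -> Optional[str]:
    """Return the majority value, or None if the method's preconditions aren't met."""
    if len(claims) < 3:
        return None  # the method requires three or more claims
    value, votes = Counter(c.value for c in claims).most_common(1)[0]
    # Strictly more than half; with n >= 3 this guarantees at least 2 agreeing.
    return value if votes * 2 > len(claims) else None
```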
Fact status
Facts transition through these states:
| Status | Meaning |
|---|---|
| `current` | The active resolution |
| `superseded` | Replaced by a newer fact (new evidence arrived) |
| `retracted` | Marked invalid by an analyst, with a documented reason |
| `under_review` | Flagged for human review |
Filter on `status='current'` to get the active view; query all statuses to see the full history.
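The active-view and history queries can be illustrated with an in-memory SQLite table. The `facts` table and its columns are hypothetical; only the four status values and the `status='current'` filter come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
        fact_id TEXT PRIMARY KEY,
        subject TEXT NOT NULL,
        value   TEXT NOT NULL,
        status  TEXT NOT NULL CHECK (status IN
            ('current', 'superseded', 'retracted', 'under_review'))
    )"""
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("f1", "parcel:123", "owner=X", "superseded"),
        ("f2", "parcel:123", "owner=Y", "current"),
    ],
)

# Active view: only the current resolution for the subject.
active = conn.execute(
    "SELECT fact_id, value FROM facts WHERE subject = ? AND status = 'current'",
    ("parcel:123",),
).fetchall()

# Full history: no status filter, so superseded rows are included.
history = conn.execute(
    "SELECT fact_id, status FROM facts WHERE subject = ? ORDER BY fact_id",
    ("parcel:123",),
).fetchall()

print(active)   # [('f2', 'owner=Y')]
print(history)  # [('f1', 'superseded'), ('f2', 'current')]
```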
Relationship to existing fields
The claim/fact layer is additive: no existing fields are deprecated. Familiar fields like `entities_extracted`, `blockers`, and the score fields remain in place as convenience views.
| Existing field | In claim/fact model |
|---|---|
| `entities_extracted[]` | Each entity becomes one or more claims with entity predicates |
| `blockers[]` | Each blocker tag becomes a claim with predicate `has_blocker` |
| `language_signals[]` | Each signal becomes a claim with predicate `exhibits_signal` |
| `upzoning_probability` | A fact with `resolution_method='llm_inference'` |
| `hostility_index` | A fact with `resolution_method='llm_inference'` |
| `litigation_risk_score` | A fact with `resolution_method='llm_inference'` |
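As a sketch of how the legacy fields project onto the model, the function below turns `blockers[]` and `language_signals[]` into claims and the three score fields into `llm_inference` facts. The `record_id` key, the `explode_record` function, and the `DerivedClaim`/`DerivedFact` types are hypothetical. `entities_extracted[]` is left out because the right entity predicate (`mentions`, `supports`, and so on) depends on each entity's role, which this sketch doesn't model.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class DerivedClaim:
    subject: str
    predicate: str
    value: str


@dataclass(frozen=True)
class DerivedFact:
    subject: str
    name: str
    value: float
    resolution_method: str


# Score fields that the mapping table turns into llm_inference facts.
SCORE_FIELDS = ("upzoning_probability", "hostility_index", "litigation_risk_score")


def explode_record(record: Dict[str, Any]) -> Tuple[List[DerivedClaim], List[DerivedFact]]:
    """Project legacy convenience fields onto claims and facts (illustrative only)."""
    subject = record["record_id"]  # hypothetical key
    # One claim per blocker tag and per language signal.
    claims = [DerivedClaim(subject, "has_blocker", tag) for tag in record.get("blockers", [])]
    claims += [
        DerivedClaim(subject, "exhibits_signal", signal)
        for signal in record.get("language_signals", [])
    ]
    # Each score present on the record becomes an inferred fact.
    facts = [
        DerivedFact(subject, name, record[name], "llm_inference")
        for name in SCORE_FIELDS
        if name in record
    ]
    return claims, facts
```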