The Civic Intelligence dataset contains ~517K records sourced from Granicus transcripts, Legistar council matters, CivicPlus, and Chicago ELMS. Records are collected daily and published as monthly immutable snapshots.
Every record inherits the full APRS envelope (record_id, chunk_id, bitemporal fields, confidence_score, provenance) and carries the join keys documented below.
Dataset-specific fields
| Field | Type | Nullable | Description |
|---|---|---|---|
| jurisdiction_slug | string | no | Civic jurisdiction identifier (city-slug-state). |
| h3_index | string | yes | H3 resolution-8 cell derived from meeting or parcel location. |
| document_type | enum | no | Record classification. See document types. |
| source | enum | no | Originating system (granicus, legistar, civicplus, chicago_elms, manual). |
| source_id | string | yes | Source-native identifier (Granicus clip ID, Legistar matter ID). |
| committee_name | string | yes | Committee or body that owned the proceeding. |
| duration_min | integer | yes | Meeting duration in minutes (transcripts only). |
| word_count | integer | yes | Transcript word count. |
| summary | text | yes | LLM-generated summary. Max 500 chars in llm_text view, full length in Parquet. |
| raw_text | text | yes | Source transcript or matter body. |
| entities_extracted | JSON array | yes | Extracted entities with role, sentiment, and URN. See entities. |
| blockers | JSON array | yes | Approval blockers from a controlled vocabulary. See blockers. |
| contingency_dag | JSON object | yes | DAG of conditional approvals. See contingency DAG. |
| language_signals | JSON array | yes | Detected topical signals from a controlled vocabulary. |
| sentiment_polarity | enum | no | positive, neutral, or negative. |
| topic_velocity | numeric | yes | Rate-of-mentions signal for this topic in this jurisdiction. |
| momentum_score | numeric [0,1] | yes | Confidence-weighted aggregate score. |
| upzoning_probability | numeric [0,1] | yes | Classifier-estimated probability of density-increasing zoning change. See scores. |
| hostility_index | numeric [0,1] | yes | Aggregate of hostile-sentiment mentions and opposition-coded signals. |
| litigation_risk_score | numeric [0,1] | yes | Probability of formal legal challenge within 24 months. |
| source_url | URL | yes | Public-facing source link (council packet PDF, meeting recording). |
Document types
| Value | Description |
|---|---|
| council_meeting | Full council or board meeting (transcript or minutes) |
| zoning_vote | Council or Planning Commission vote on a zoning item |
| rezoning_hearing | Public hearing on a proposed rezoning |
| variance_hearing | Zoning board of adjustment or variance hearing |
| environmental_review | CEQA/NEPA or state-level environmental review |
| capital_improvement | Capital improvement plan item |
| code_enforcement | Code enforcement proceeding |
| tax_assessment | Tax assessment appeal or action |
| building_inspection | Inspection outcome |
| foia_request | Filed FOIA or public-records request |
| planning_matter | Other planning matter (Legistar catch-all) |
| civic_document | Uncategorized civic document (low classification confidence) |
Entities
The entities_extracted field contains an array of entities mentioned in the record. Each entity includes:
```json
{
  "name": "Kenyatta Johnson",
  "role": "councilmember",
  "sentiment": "supportive",
  "entity_urn": "urn:aprs:entity:person:k-johnson-phila",
  "quote_span": [12481, 12723],
  "mention_count": 7
}
```
Role values: councilmember, mayor, planning_commissioner, zoning_board_member, developer, resident, attorney, city_agency_staff, state_agency_staff, nonprofit_representative, business_owner, expert_witness, other.
Sentiment values: supportive, favorable, neutral, concerned, opposed, hostile.
entity_urn is populated by the entity resolution pipeline. Records where resolution has not yet run will have null URNs.
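Because entity_urn can be null, consumers usually need to separate resolved from unresolved entities before joining. The following is a minimal sketch, not an official client: the helper names are hypothetical, the second sample entity ("Jane Doe") is made up for illustration, and it assumes the field may arrive either as a JSON string or as an already-parsed list depending on how the file is read.

```python
import json

def load_entities(value):
    """Accept a JSON string, an already-parsed list, or None (the field is nullable)."""
    if value is None:
        return []
    return json.loads(value) if isinstance(value, str) else list(value)

def split_by_resolution(value):
    """Return (resolved, unresolved) entities, based on whether entity_urn is set."""
    resolved, unresolved = [], []
    for entity in load_entities(value):
        (resolved if entity.get("entity_urn") else unresolved).append(entity)
    return resolved, unresolved

def names_with(value, role=None, sentiment=None):
    """List entity names, optionally filtered by role and/or sentiment."""
    return [
        e["name"]
        for e in load_entities(value)
        if (role is None or e.get("role") == role)
        and (sentiment is None or e.get("sentiment") == sentiment)
    ]

# First entity is the example above; the second is hypothetical illustration data.
sample = json.dumps([
    {"name": "Kenyatta Johnson", "role": "councilmember", "sentiment": "supportive",
     "entity_urn": "urn:aprs:entity:person:k-johnson-phila",
     "quote_span": [12481, 12723], "mention_count": 7},
    {"name": "Jane Doe", "role": "resident", "sentiment": "opposed",
     "entity_urn": None, "mention_count": 2},
])

resolved, unresolved = split_by_resolution(sample)
supportive_members = names_with(sample, role="councilmember", sentiment="supportive")
```

Unresolved entities can still be analyzed by name and role, but they should be excluded from any join that relies on entity_urn.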
Blockers
Ordered array of blocker tags identified by the LLM as standing between a proceeding and final approval:
| Tag | Meaning |
|---|---|
| awaiting_eis / awaiting_ceqa | Environmental review incomplete |
| community_opposition | Organized opposition beyond expected public comment |
| litigation_threat / active_litigation | Legal challenge threatened or in progress |
| design_revision_required | Design changes needed before approval |
| affordability_covenant_negotiation | Affordability terms under negotiation |
| traffic_study_pending | Traffic impact study not complete |
| historic_preservation_review | Historic district review required |
| infrastructure_funding_gap | Insufficient infrastructure funding |
| inter_agency_coordination | Requires another jurisdiction’s sign-off |
| political_holdover | No movement across multiple meetings with no stated reason |
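Since blockers come from a controlled vocabulary, a consumer may want to check incoming tags against the table above. This is a hedged sketch: it assumes the twelve tags listed here are the complete vocabulary (the source does not say so explicitly), and `check_blockers` is a hypothetical helper name.

```python
# Tags copied from the table above; assumed (not confirmed) to be exhaustive.
KNOWN_BLOCKERS = {
    "awaiting_eis", "awaiting_ceqa", "community_opposition",
    "litigation_threat", "active_litigation", "design_revision_required",
    "affordability_covenant_negotiation", "traffic_study_pending",
    "historic_preservation_review", "infrastructure_funding_gap",
    "inter_agency_coordination", "political_holdover",
}

def check_blockers(blockers):
    """Split a record's blockers into (known, unknown), preserving array order."""
    known, unknown = [], []
    for tag in blockers or []:  # the field is nullable
        (known if tag in KNOWN_BLOCKERS else unknown).append(tag)
    return known, unknown

# "mystery_tag" is a made-up value used only to exercise the unknown branch.
known, unknown = check_blockers(["traffic_study_pending", "mystery_tag", "awaiting_ceqa"])
```

Routing unknown tags to a separate list, rather than raising, keeps a pipeline running if the vocabulary grows in a later snapshot.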
Contingency DAG
The contingency_dag field represents conditional approval chains as a directed acyclic graph:
```json
{
  "nodes": [
    { "id": "n1", "label": "Council final vote", "status": "pending" },
    { "id": "n2", "label": "Planning Commission recommendation", "status": "approved", "date": "2024-02-01" },
    { "id": "n3", "label": "Traffic study", "status": "pending" }
  ],
  "edges": [
    { "from": "n2", "to": "n1" },
    { "from": "n3", "to": "n1" }
  ]
}
```
Node status values: pending, approved, denied, withdrawn, deferred.
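To illustrate how the graph can be consumed, the sketch below walks the example DAG to find which upstream steps are still open and derives a valid processing order. It assumes an edge points from a prerequisite to the step that depends on it (inferred from the example, where the Planning Commission recommendation feeds the Council final vote); the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

dag = {
    "nodes": [
        {"id": "n1", "label": "Council final vote", "status": "pending"},
        {"id": "n2", "label": "Planning Commission recommendation",
         "status": "approved", "date": "2024-02-01"},
        {"id": "n3", "label": "Traffic study", "status": "pending"},
    ],
    "edges": [
        {"from": "n2", "to": "n1"},
        {"from": "n3", "to": "n1"},
    ],
}

def predecessors(dag):
    """Map each node id to the set of node ids it directly depends on."""
    preds = {node["id"]: set() for node in dag["nodes"]}
    for edge in dag["edges"]:
        # Assumption: "from" is the prerequisite, "to" is the dependent step.
        preds[edge["to"]].add(edge["from"])
    return preds

def open_prerequisites(dag, target):
    """Return sorted ids of all upstream nodes of `target` that are not approved."""
    preds = predecessors(dag)
    status = {node["id"]: node["status"] for node in dag["nodes"]}
    seen, stack = set(), list(preds[target])
    while stack:  # depth-first walk over direct and transitive prerequisites
        node_id = stack.pop()
        if node_id not in seen:
            seen.add(node_id)
            stack.extend(preds[node_id])
    return sorted(n for n in seen if status[n] != "approved")

blocking = open_prerequisites(dag, "n1")

# static_order() yields prerequisites before dependents and raises
# graphlib.CycleError if the graph is not actually acyclic.
order = list(TopologicalSorter(predecessors(dag)).static_order())
```

For the example graph, the only open prerequisite of the council vote is the traffic study (n3), since the Planning Commission recommendation is already approved.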
Scores
Upzoning probability
Classifier-estimated probability that the proceeding results in zoning changes allowing greater density or use intensity. Built with DistilBERT fine-tuned on council minutes with labeled upzoning outcomes. Features include language_signals, entity sentiment distribution, document_type, and historical base rate by jurisdiction.
- >= 0.600: flagged as “likely”
- >= 0.850: flagged as “highly likely”
The training corpus is weighted toward San Francisco, Philadelphia, and Chicago. Probability calibration for smaller jurisdictions may be less accurate.
Hostility index
Aggregate of hostile-sentiment entity mentions and opposition-coded language signals, normalized to document length. Predicts procedural delay, not outcome.
Litigation risk score
Probability of a formal legal challenge (lawsuit, appeal to state board) within 24 months. Built with a gradient-boosted classifier trained on historic filings matched to upstream proceedings. Features include hostility_index, the presence of an attorney role in entities_extracted, the litigation_threat blocker tag, and the jurisdiction base rate.
- >= 0.700: flagged as “high risk”
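The flag thresholds in this section can be applied client-side with a few comparisons. This is a sketch under two assumptions not stated in the source: when a score clears both upzoning thresholds only the stronger label is reported, and a null score or a score below every threshold yields no flag. The function names are hypothetical.

```python
def upzoning_flag(probability):
    """Map upzoning_probability to its documented flag, or None if unflagged."""
    if probability is None:  # the field is nullable
        return None
    if probability >= 0.850:
        return "highly likely"
    if probability >= 0.600:
        return "likely"
    return None

def litigation_flag(score):
    """Map litigation_risk_score to its documented flag, or None if unflagged."""
    if score is None:  # the field is nullable
        return None
    return "high risk" if score >= 0.700 else None

flags = [upzoning_flag(p) for p in (0.59, 0.60, 0.85, None)]
```

Checking the higher threshold first matters: reversing the two comparisons in `upzoning_flag` would label every score above 0.850 as merely “likely”.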
Join keys
| Key | Presence | Notes |
|---|---|---|
| record_id | always | APRS URN |
| chunk_id | always | Deterministic from record_id |
| h3_index | often | Null when no location is attributable |
| event_id | often | Present when mapped to Events Timeline; null for routine filings |
| jurisdiction_slug | always | Required on every record |
| entity_urn | via entities | Through entities_extracted[].entity_urn |
| parcel_id | sometimes | Present for zoning votes and variances |
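Unlike the other keys, entity_urn sits inside a nested array, so joining on it means unnesting first. The sketch below builds an inverted index from entity_urn to record_id over records that have already been loaded as Python dicts. The record_id values and the second URN are placeholders made up for illustration (real record_id values are APRS URNs), and `index_by_entity` is a hypothetical helper.

```python
from collections import defaultdict

def index_by_entity(records):
    """Build {entity_urn: [record_id, ...]} from entities_extracted[].entity_urn."""
    index = defaultdict(list)
    for record in records:
        for entity in record.get("entities_extracted") or []:  # nullable field
            urn = entity.get("entity_urn")
            if urn:  # skip entities where resolution has not yet run
                index[urn].append(record["record_id"])
    return dict(index)

# Placeholder records; ids and the developer URN are illustrative only.
records = [
    {"record_id": "rec-a", "entities_extracted": [
        {"name": "Kenyatta Johnson",
         "entity_urn": "urn:aprs:entity:person:k-johnson-phila"}]},
    {"record_id": "rec-b", "entities_extracted": [
        {"name": "Kenyatta Johnson",
         "entity_urn": "urn:aprs:entity:person:k-johnson-phila"},
        {"name": "Unresolved Speaker", "entity_urn": None}]},
    {"record_id": "rec-c", "entities_extracted": None},
]

entity_index = index_by_entity(records)
```

The resulting index can then be joined against any other table keyed by entity_urn without re-parsing the nested arrays.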
Example query
Find high litigation-risk zoning votes in Philadelphia:
```sql
SELECT
  record_id,
  occurred_at,
  summary,
  upzoning_probability,
  litigation_risk_score,
  entities_extracted
FROM read_parquet('civic-intelligence-2026-04.parquet')
WHERE jurisdiction_slug = 'philadelphia-pa'
  AND document_type = 'zoning_vote'
  AND litigation_risk_score >= 0.7
ORDER BY occurred_at DESC
LIMIT 10;
```
Known limitations
- Jurisdictional coverage is uneven — dense for San Francisco, Philadelphia, and Boston; sparse for sunbelt growth markets.
- language_signals vocabulary is English-only. Bilingual meetings may underperform on signal extraction.
- Transcripts lag real-time by 24–72 hours. Use occurred_at for event-time analysis, not ingested_at.
- The legacy document_date field is retained for backward compatibility. Use occurred_at instead.