The Civic Intelligence dataset contains ~517K records sourced from Granicus transcripts, Legistar council matters, CivicPlus, and Chicago ELMS. Records are collected daily and published as monthly immutable snapshots. Every record inherits the full APRS envelope (record_id, chunk_id, bitemporal fields, confidence_score, provenance) and carries the join keys documented below.

Dataset-specific fields

Field | Type | Nullable | Description
jurisdiction_slug | string | no | Civic jurisdiction identifier (city-slug-state).
h3_index | string | yes | H3 resolution-8 cell derived from meeting or parcel location.
document_type | enum | no | Record classification. See document types.
source | enum | no | Originating system (granicus, legistar, civicplus, chicago_elms, manual).
source_id | string | yes | Source-native identifier (Granicus clip ID, Legistar matter ID).
committee_name | string | yes | Committee or body that owned the proceeding.
duration_min | integer | yes | Meeting duration in minutes (transcripts only).
word_count | integer | yes | Transcript word count.
summary | text | yes | LLM-generated summary. Max 500 chars in llm_text view, full length in Parquet.
raw_text | text | yes | Source transcript or matter body.
entities_extracted | JSON array | yes | Extracted entities with role, sentiment, and URN. See entities.
blockers | JSON array | yes | Approval blockers from a controlled vocabulary. See blockers.
contingency_dag | JSON object | yes | DAG of conditional approvals. See contingency DAG.
language_signals | JSON array | yes | Detected topical signals from a controlled vocabulary.
sentiment_polarity | enum | no | positive, neutral, or negative.
topic_velocity | numeric | yes | Rate-of-mentions signal for this topic in this jurisdiction.
momentum_score | numeric [0,1] | yes | Confidence-weighted aggregate score.
upzoning_probability | numeric [0,1] | yes | Classifier-estimated probability of density-increasing zoning change. See scores.
hostility_index | numeric [0,1] | yes | Aggregate of hostile-sentiment mentions and opposition-coded signals.
litigation_risk_score | numeric [0,1] | yes | Probability of formal legal challenge within 24 months.
source_url | URL | yes | Public-facing source link (council packet PDF, meeting recording).

Document types

Value | Description
council_meeting | Full council or board meeting (transcript or minutes)
zoning_vote | Council or Planning Commission vote on a zoning item
rezoning_hearing | Public hearing on a proposed rezoning
variance_hearing | Zoning board of adjustment or variance hearing
environmental_review | CEQA/NEPA or state-level environmental review
capital_improvement | Capital improvement plan item
code_enforcement | Code enforcement proceeding
tax_assessment | Tax assessment appeal or action
building_inspection | Inspection outcome
foia_request | Filed FOIA or public-records request
planning_matter | Other planning matter (Legistar catch-all)
civic_document | Uncategorized civic document (low classification confidence)

Entities

The entities_extracted field contains an array of entities mentioned in the record. Each entity includes:
{
  "name": "Kenyatta Johnson",
  "role": "councilmember",
  "sentiment": "supportive",
  "entity_urn": "urn:aprs:entity:person:k-johnson-phila",
  "quote_span": [12481, 12723],
  "mention_count": 7
}
Role values: councilmember, mayor, planning_commissioner, zoning_board_member, developer, resident, attorney, city_agency_staff, state_agency_staff, nonprofit_representative, business_owner, expert_witness, other.
Sentiment values: supportive, favorable, neutral, concerned, opposed, hostile.
entity_urn is populated by the entity resolution pipeline. Records where resolution has not yet run will have null URNs.
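As a minimal sketch of consuming this field, the Python below pulls out entities whose sentiment is opposed or hostile and tolerates null URNs. It assumes entities_extracted arrives as a JSON string (some readers may hand back an already-decoded list), the helper name opposing_entities is hypothetical, and the second sample entity is made up for illustration.

```python
import json

# Sentiment values from the documented vocabulary that indicate opposition.
OPPOSING = {"opposed", "hostile"}

def opposing_entities(entities_json):
    """Return (name, entity_urn) pairs for entities with opposing sentiment.

    entities_json is assumed to be a JSON string, or None because the field
    is nullable. entity_urn may be None when entity resolution has not run.
    """
    if entities_json is None:
        return []
    entities = json.loads(entities_json)
    return [
        (e["name"], e.get("entity_urn"))
        for e in entities
        if e.get("sentiment") in OPPOSING
    ]

# First entity is the documented example; the second is illustrative only.
sample = json.dumps([
    {"name": "Kenyatta Johnson", "role": "councilmember",
     "sentiment": "supportive",
     "entity_urn": "urn:aprs:entity:person:k-johnson-phila",
     "quote_span": [12481, 12723], "mention_count": 7},
    {"name": "Example Resident", "role": "resident",
     "sentiment": "opposed", "entity_urn": None,
     "quote_span": [100, 180], "mention_count": 2},
])

print(opposing_entities(sample))
```

Returning the URN alongside the name keeps unresolved entities visible, so callers can decide whether to drop them or retry after resolution has run.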

Blockers

Ordered array of blocker tags identified by the LLM as standing between a proceeding and final approval:
Tag | Meaning
awaiting_eis / awaiting_ceqa | Environmental review incomplete
community_opposition | Organized opposition beyond expected public comment
litigation_threat / active_litigation | Legal challenge threatened or in progress
design_revision_required | Design changes needed before approval
affordability_covenant_negotiation | Affordability terms under negotiation
traffic_study_pending | Traffic impact study not complete
historic_preservation_review | Historic district review required
infrastructure_funding_gap | Insufficient infrastructure funding
inter_agency_coordination | Requires another jurisdiction’s sign-off
political_holdover | No movement across multiple meetings with no stated reason

Contingency DAG

The contingency_dag field represents conditional approval chains as a directed acyclic graph:
{
  "nodes": [
    { "id": "n1", "label": "Council final vote", "status": "pending" },
    { "id": "n2", "label": "Planning Commission recommendation", "status": "approved", "date": "2024-02-01" },
    { "id": "n3", "label": "Traffic study", "status": "pending" }
  ],
  "edges": [
    { "from": "n2", "to": "n1" },
    { "from": "n3", "to": "n1" }
  ]
}
Node status values: pending, approved, denied, withdrawn, deferred.
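One way to use this structure is to ask which upstream steps still stand in the way of a given node. The sketch below walks the edges backwards from a target and collects every ancestor whose status is not approved. It assumes an edge from A to B means A must resolve before B, which is how the example above reads but is not stated explicitly; the function name pending_prerequisites is hypothetical.

```python
def pending_prerequisites(dag, target_id):
    """Return sorted ids of all upstream nodes of target_id not yet approved."""
    status = {node["id"]: node["status"] for node in dag["nodes"]}

    # Map each node to its direct prerequisites (edge "from" precedes "to").
    parents = {}
    for edge in dag["edges"]:
        parents.setdefault(edge["to"], []).append(edge["from"])

    seen, stack, pending = set(), [target_id], []
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent in seen:
                continue
            seen.add(parent)
            stack.append(parent)  # keep walking to indirect prerequisites
            if status.get(parent) != "approved":
                pending.append(parent)
    return sorted(pending)

# The example DAG from this section.
dag = {
    "nodes": [
        {"id": "n1", "label": "Council final vote", "status": "pending"},
        {"id": "n2", "label": "Planning Commission recommendation",
         "status": "approved", "date": "2024-02-01"},
        {"id": "n3", "label": "Traffic study", "status": "pending"},
    ],
    "edges": [
        {"from": "n2", "to": "n1"},
        {"from": "n3", "to": "n1"},
    ],
}

print(pending_prerequisites(dag, "n1"))  # ['n3']
```

For the example, only the traffic study (n3) is reported, since the Planning Commission recommendation is already approved.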

Scores

Upzoning probability

Classifier-estimated probability that the proceeding results in zoning changes allowing greater density or use intensity. Built with DistilBERT fine-tuned on council minutes with labeled upzoning outcomes. Features include language_signals, entity sentiment distribution, document_type, and historical base rate by jurisdiction.
  • >= 0.600 — flagged as “likely”
  • >= 0.850 — flagged as “highly likely”
The training corpus is weighted toward San Francisco, Philadelphia, and Chicago. Probability calibration for smaller jurisdictions may be less accurate.
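The two thresholds can be applied as in the sketch below. It assumes the higher flag takes precedence when both thresholds are met and that unflagged or null scores yield no label; neither rule is stated in this documentation, and the function name upzoning_flag is hypothetical.

```python
def upzoning_flag(probability):
    """Map upzoning_probability to its documented flag, or None if unflagged."""
    if probability is None:  # the field is nullable
        return None
    if probability >= 0.850:
        return "highly likely"
    if probability >= 0.600:
        return "likely"
    return None

for p in (0.42, 0.600, 0.85, None):
    print(p, upzoning_flag(p))
```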

Hostility index

Aggregate of hostile-sentiment entity mentions and opposition-coded language signals, normalized by document length. The index predicts procedural delay, not outcome.

Litigation risk score

Probability of a formal legal challenge (lawsuit, appeal to state board) within 24 months. Built with a gradient-boosted classifier trained on historic filings matched to upstream proceedings. Features include hostility_index, the presence of an attorney role in entities_extracted, the litigation_threat blocker tag, and the jurisdiction base rate.
  • >= 0.700 — flagged as “high risk”

Join keys

Key | Presence | Notes
record_id | always | APRS URN
chunk_id | always | Deterministic from record_id
h3_index | often | Null when no location is attributable
event_id | often | Present when mapped to Events Timeline; null for routine filings
jurisdiction_slug | always | Required on every record
entity_urn | via entities | Through entities_extracted[].entity_urn
parcel_id | sometimes | Present for zoning votes and variances
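Because entity_urn lives inside the entities_extracted array rather than in a top-level column, joining on it means flattening that array first. The sketch below builds an index from entity_urn to record_id values. It assumes records are already loaded as Python dicts with entities_extracted decoded to a list; the record_id values shown are placeholders rather than real APRS URNs, and the function name index_by_entity_urn is hypothetical.

```python
from collections import defaultdict

def index_by_entity_urn(records):
    """Map each non-null entity_urn to the record_ids that mention it."""
    index = defaultdict(list)
    for record in records:
        # entities_extracted is nullable, so treat None as an empty list.
        for entity in record.get("entities_extracted") or []:
            urn = entity.get("entity_urn")
            if urn is not None:  # null until entity resolution has run
                index[urn].append(record["record_id"])
    return dict(index)

# Placeholder records for illustration only.
records = [
    {"record_id": "rec-a", "entities_extracted": [
        {"name": "Kenyatta Johnson",
         "entity_urn": "urn:aprs:entity:person:k-johnson-phila"},
        {"name": "Unresolved Speaker", "entity_urn": None},
    ]},
    {"record_id": "rec-b", "entities_extracted": None},
    {"record_id": "rec-c", "entities_extracted": [
        {"name": "Kenyatta Johnson",
         "entity_urn": "urn:aprs:entity:person:k-johnson-phila"},
    ]},
]

print(index_by_entity_urn(records))
```

Entities with null URNs are skipped here because they cannot act as a join key until resolution has populated them.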

Example query

Find high litigation-risk zoning votes in Philadelphia:
SELECT
  record_id,
  occurred_at,
  summary,
  upzoning_probability,
  litigation_risk_score,
  entities_extracted
FROM read_parquet('civic-intelligence-2026-04.parquet')
WHERE jurisdiction_slug = 'philadelphia-pa'
  AND document_type = 'zoning_vote'
  AND litigation_risk_score >= 0.7
ORDER BY occurred_at DESC
LIMIT 10;

Known limitations

  • Jurisdictional coverage is uneven: dense for San Francisco, Philadelphia, and Boston; sparse for Sun Belt growth markets.
  • language_signals vocabulary is English-only. Bilingual meetings may underperform on signal extraction.
  • Transcripts lag real-time by 24–72 hours. Use occurred_at for event-time analysis, not ingested_at.
  • The legacy document_date field is retained for backward compatibility. Use occurred_at instead.