2026-04 snapshot, and require no paid tier.
Upzoning classifier
Fine-tune DistilBERT on council meeting language to predict zoning outcomes, then benchmark against Codex’s pre-computed scores.
Civic risk map
Build an H3 choropleth of litigation risk and test whether elevated risk predicts slower permit issuance.
Prerequisites
Both notebooks require Python 3.10+ and the following core dependencies: torch, accelerate, evaluate, and scikit-learn. The civic risk map additionally requires geopandas and shapely.
Upzoning classifier
Goal: Fine-tune a DistilBERT model (~66M parameters, ~10 minutes on a single GPU) on the language_signals and summary fields from Civic Intelligence to predict upzoning outcomes (approved, denied, or continued), then compare against Codex’s pre-computed upzoning_probability.
Datasets used: Civic Intelligence (research tier)
Workflow
Load the research-tier sample
Load the Civic Intelligence dataset from Hugging Face, pinned to the 2026-04 snapshot.
Filter to zoning records
Keep only zoning_vote and rezoning_hearing records with non-null outcomes for training.
Build input features
Concatenate the top 30 language_signals phrases with the summary field to create a text input for the classifier.
Split by jurisdiction
Use a jurisdiction-based train/test split to test generalization — train on Philadelphia, Chicago, and Dallas; evaluate on San Francisco and New York:
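The filter, feature, and split steps above can be sketched together on a toy table. This is a minimal illustration, not the notebook's code. The column names `record_type`, `outcome`, and `jurisdiction`, the jurisdiction slugs other than `chicago-il`, the list layout of `language_signals`, and the `[SEP]` separator are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the research-tier sample; all values are made up.
# Column names other than language_signals and summary are assumptions.
records = pd.DataFrame({
    "record_type": ["zoning_vote", "rezoning_hearing", "budget_hearing",
                    "zoning_vote", "zoning_vote"],
    "outcome": ["approved", "denied", "approved", None, "continued"],
    "jurisdiction": ["chicago-il", "philadelphia-pa", "dallas-tx",
                     "new-york-ny", "san-francisco-ca"],
    "language_signals": [["upzone", "density bonus"], ["traffic concerns"],
                         ["levy"], ["setback"], ["deferred", "more study"]],
    "summary": ["Council approved the upzoning.", "Rezoning request denied.",
                "Budget adopted.", "Vote pending.", "Item continued."],
})

ZONING_TYPES = {"zoning_vote", "rezoning_hearing"}
TRAIN_JURISDICTIONS = {"philadelphia-pa", "chicago-il", "dallas-tx"}
TEST_JURISDICTIONS = {"san-francisco-ca", "new-york-ny"}
MAX_PHRASES = 30  # "top 30" phrases, assuming the list is already ranked


def build_text(row):
    """Join the leading phrases with the summary into one classifier input."""
    phrases = list(row["language_signals"])[:MAX_PHRASES]
    return "; ".join(phrases) + " [SEP] " + row["summary"]


# Keep zoning records that have a known outcome.
is_zoning = records["record_type"].isin(ZONING_TYPES)
has_outcome = records["outcome"].notna()
zoning = records[is_zoning & has_outcome].copy()

# Build the text feature.
zoning["text"] = zoning.apply(build_text, axis=1)

# Split by jurisdiction so no city appears on both sides.
train = zoning[zoning["jurisdiction"].isin(TRAIN_JURISDICTIONS)]
test = zoning[zoning["jurisdiction"].isin(TEST_JURISDICTIONS)]
```

Splitting on the jurisdiction value rather than at random keeps every held-out city unseen during training, which is what makes the evaluation a test of geographic generalization.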
Fine-tune DistilBERT
Train for 3 epochs with batch size 16, learning rate 5e-5, and macro-F1 as the evaluation metric.
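To make the evaluation metric concrete, here is a small pure-Python sketch of macro-F1 over the three outcome labels. It is written from scratch for illustration; a real run would more likely call the `evaluate` or scikit-learn packages listed in the prerequisites. The example labels and predictions are invented.

```python
LABELS = ["approved", "denied", "continued"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented example: the model never predicts "denied".
y_true = ["approved", "approved", "approved", "denied", "continued"]
y_pred = ["approved", "approved", "approved", "approved", "continued"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f} macro_f1={score:.2f}")
```

In this example accuracy is 0.80 while macro-F1 is about 0.62, because the missed "denied" class counts as much as the frequent "approved" class. That sensitivity to minority outcomes is a plausible reason to prefer macro-F1 when outcome classes are imbalanced.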
What you learn
- How to load and filter Codex datasets from Hugging Face
- How language_signals and summary power zoning outcome prediction
- How Codex’s pre-computed scores compare to a custom fine-tuned model
- How jurisdiction-based splits reveal geographic generalization gaps
Civic risk map
Goal: Build an H3-cell choropleth of civic litigation risk and test whether elevated risk predicts depressed permit-issuance velocity in the following 6 months.
Datasets used: Civic Intelligence, Urban Signal Grid, Permit Signals (all research tier)
Workflow
Load three datasets
Load the research-tier samples for Civic Intelligence, Urban Signal Grid, and Permit Signals from Hugging Face:
Filter to a focus metro
Filter all three datasets to a single metro (e.g., chicago-il) for tractable analysis.
Aggregate risk to H3 cells
Compute a confidence-weighted mean of litigation_risk_score per h3_index from the Civic Intelligence dataset.
Render an interactive choropleth
Materialize H3 hexagon polygons and render a Plotly choropleth colored by litigation risk score.
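The aggregation step in this workflow, a confidence-weighted mean of litigation_risk_score per h3_index, can be sketched in pandas as follows. The weight column name `confidence` is an assumption, and the cell ids and scores are placeholders rather than real H3 indexes or dataset values.

```python
import pandas as pd

# Toy records; "cell_a"/"cell_b" are placeholder ids, not real H3 indexes.
# The `confidence` column name is an assumption for this sketch.
civic = pd.DataFrame({
    "h3_index": ["cell_a", "cell_a", "cell_b"],
    "litigation_risk_score": [0.8, 0.2, 0.5],
    "confidence": [0.9, 0.1, 0.4],
})


def weighted_risk_by_cell(df, weight_col="confidence"):
    """Return sum(score * weight) / sum(weight) for each h3_index."""
    parts = pd.DataFrame({
        "h3_index": df["h3_index"],
        "weighted": df["litigation_risk_score"] * df[weight_col],
        "weight": df[weight_col],
    })
    sums = parts.groupby("h3_index")[["weighted", "weight"]].sum()
    return (sums["weighted"] / sums["weight"]).rename("litigation_risk")


cell_risk = weighted_risk_by_cell(civic)
print(cell_risk)
```

With these numbers, the high-confidence record dominates cell_a, so its weighted risk sits near 0.74 instead of the unweighted mean of 0.5.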
What you learn
- How all three datasets join on h3_index alone, with no spatial library required
- How to aggregate record-level scores to the H3 cell grid
- How civic proceedings signal downstream development activity
- How to validate Codex scores against observable permit outcomes
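As a rough illustration of the h3_index join and the validation idea, the sketch below merges per-cell risk with a per-cell permit measure and computes a Spearman rank correlation. The column names `litigation_risk` and `permit_velocity` are hypothetical, the values are invented, and a real analysis would also need to restrict permits to the 6 months after the risk observation.

```python
import pandas as pd

# Invented per-cell values; cell ids are placeholders, not real H3 indexes.
risk = pd.DataFrame({
    "h3_index": ["c1", "c2", "c3", "c4", "c5"],
    "litigation_risk": [0.1, 0.4, 0.7, 0.9, 0.3],
})
permits = pd.DataFrame({
    "h3_index": ["c1", "c2", "c3", "c4"],  # c5 has no permit data
    "permit_velocity": [12, 9, 5, 2],      # hypothetical permits per month
})

# A plain key join on h3_index; no geometry is involved.
merged = risk.merge(permits, on="h3_index", how="inner")

# Spearman correlation computed as Pearson correlation of the ranks.
rho = merged["litigation_risk"].rank().corr(merged["permit_velocity"].rank())
print(f"cells joined: {len(merged)}, spearman rho: {rho:.2f}")
```

A negative rho would be consistent with higher risk going along with slower permit issuance, though a correlation on its own does not show that one predicts the other.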