Skip to main content
The envelope coverage dashboard shows how far along each catalog table is in adopting the APRS data envelope. Open it at Admin → Codex coverage (/admin/codex-coverage) to see, at a glance, which tables are fully compliant, which are in progress, and which haven’t started yet.

Why it matters

Every row in the Codex catalog needs a populated record_id to be traceable, deduplicated, and exportable in audit packages. The coverage dashboard gives you a single view into that progress so you can answer questions like:
  • How close is my dataset to full APRS compliance?
  • Which tables still have unpopulated records?
  • Are nightly backfill jobs actually making progress?

What you’ll see

Summary cards

At the top of the dashboard, four summary cards show:
CardDescription
Tables trackedTotal number of catalog tables with a record_id column.
Fully compliantTables where ≥ 99% of rows have a populated record_id.
Empty / 0%Tables where no rows have been backfilled yet.
In progressEverything in between — backfill has started but isn’t complete.
An overall percentage is displayed alongside the total number of envelope-compliant rows out of all rows across every tracked table.

Coverage table

Below the summary cards, a sortable table lists every catalog table, ordered worst-first by default. Each row shows:
ColumnDescription
TableThe catalog table name and its source key (if available).
CategoryThe data category the table belongs to.
RowsTotal row count in the table.
PopulatedNumber of rows with a populated record_id.
%Percentage of rows that are envelope-compliant, color-coded by bucket.
Last backfillWhen the most recent backfill job ran for this table, plus how many rows it processed.

Color-coded status buckets

The percentage column is color-coded so you can scan for problem areas quickly:
ColorRangeMeaning
Gray0%No rows backfilled yet.
Red< 25%Early stages — backfill has started but has a long way to go.
Amber25%–74%In progress — backfill is underway.
Blue75%–98%Almost there — most rows are compliant.
Green≥ 99%Fully compliant.

How data refreshes

Coverage statistics are refreshed automatically by a nightly job that runs at 05:00 UTC, after the backfill window finishes. The dashboard header shows when the last refresh occurred. You don’t need to trigger refreshes manually. Each nightly run scans every catalog table that has a record_id column, counts total rows versus populated rows, and updates the dashboard.
If a table was recently added to the catalog, it will appear on the dashboard after the next nightly refresh.

Compliance views

Two SQL views give you direct access to coverage and backfill data without leaving your database client. Use them for ad-hoc queries, alerting integrations, or any workflow where you need the raw numbers behind the dashboard.

codex_compliance_summary

Returns one row per tracked table with coverage percentages for every APRS envelope column, plus h3_index and mmsi. Coverage is derived from PostgreSQL catalog statistics (pg_stats.null_frac), so the query is fast and doesn’t scan table data.
select *
from codex_compliance_summary;
ColumnDescription
table_nameThe catalog table (e.g. ais_positions, vessel_visits).
table_sizeHuman-readable table size on disk.
record_id_pctPercentage of rows with a populated record_id.
source_uri_pctPercentage of rows with a populated source_uri.
source_system_pctPercentage of rows with a populated source_system.
chunk_id_pctPercentage of rows with a populated chunk_id.
occurred_at_pctPercentage of rows with a populated occurred_at.
h3_index_pctPercentage of rows with a populated h3_index.
mmsi_pctPercentage of rows with a populated mmsi.
stats_refreshed_atWhen PostgreSQL last ran ANALYZE on the table.
Coverage percentages are based on the most recent ANALYZE run. If you need sub-percentage-point accuracy, run ANALYZE on the target table first:
analyze ais_positions;
select * from codex_compliance_summary where table_name = 'ais_positions';

codex_backfill_progress

Shows the status of h3_index and mmsi backfill jobs over the last seven days. Each row represents one table with aggregated run counts, rows updated, and a stall indicator.
select *
from codex_backfill_progress;
ColumnDescription
table_nameThe table being backfilled.
run_countNumber of backfill runs in the last seven days.
total_rows_updatedTotal rows updated across all runs.
last_run_atTimestamp of the most recent completed run.
since_last_runInterval since the last run (useful for spotting gaps).
last_cursorThe cursor position where the last run stopped.
statusrunning if the last run was within five minutes, stalled otherwise.
Use this view to detect stuck backfills before they fall behind:
select table_name, status, since_last_run
from codex_backfill_progress
where status = 'stalled';

Vessel-level coverage for port events

The port_events table now includes a dedicated mmsi column, completing the full APRS envelope for port-level activity data. This means you can:
  • Join port events to vessel records — link arrivals, departures, and berth assignments directly to a vessel’s MMSI without going through a separate lookup table.
  • Run spoofing-detection queries — the mmsi column is indexed (partial index on non-null values), so queries that cross-reference vessel identity against port activity are fast even at scale.
  • Track mmsi coverage in the dashboard — the codex_compliance_summary view reports mmsi_pct for every tracked table, including port_events.
-- Find port events for a specific vessel
select *
from port_events
where mmsi = '211331640'
order by occurred_at desc
limit 20;

CLI compliance checker

The codex.cli check command validates APRS envelope compliance for any table — or all tables at once — from the command line. It samples recent rows and runs nine validation checks covering identity, provenance, temporal ordering, and access-tier fields.

Running a check

Check a single table:
python -m groundswell_collector.codex.cli check civic_records
Check all tables and write a JSON report:
python -m groundswell_collector.codex.cli check all --limit 500 --report /tmp/aprs.json
FlagDefaultDescription
--limit1000Number of most-recent rows to sample per table.
--reportPath to write a JSON report file.

What gets checked

Each sampled row is validated against nine checks:
CheckSeverityRule
record_idcriticalMust be a valid APRS URN (urn:aprs:record:...).
source_uricriticalMust be non-null.
ingested_atcriticalMust be valid ISO 8601.
schema_versioncriticalMust match aprs.{domain}/{semver}.
normalization_versionwarningMust be semver.
occurred_atwarningShould be set when semantically applicable.
temporal_ordercriticaloccurred_at must not be after ingested_at.
effective_ordercriticaleffective_from must not be after effective_to.
acl_tierwarningMust be research, commercial, or internal.

Per-table column introspection

The compliance checker dynamically discovers which envelope columns each table actually has before running checks. This means:
  • Tables that use a timestamp column other than created_at for ordering (such as timestamp on AIS positions, started_at on events, loaded_at on building permits, or updated_at on vessels) are handled correctly.
  • Optional envelope fields like occurred_at, effective_from, and effective_to are only validated if the table has those columns. A missing optional column is not treated as non-compliant.
  • The checker exits non-zero if any table has critical issues affecting more than 5% of sampled rows, making it suitable for CI pipelines.
The compliance checker runs automatically in CI on every pull request that touches data collectors or migrations. You can also run it locally to validate your changes before pushing.

Reading the output

The checker prints a formatted report with pass, warning, and fail indicators for each table:
  • Pass — all critical checks above 95% compliance.
  • Warning — non-critical checks below threshold, but no critical failures.
  • Fail — one or more critical checks below 95% compliance.
The optional --report JSON file includes per-table compliance rates and individual issue details, suitable for dashboards or alerting integrations.

When to use it

  • During APRS rollout — monitor adoption as backfill jobs work through historical data. Commodity tables (conab_crop_reports, antt_road_freight, cepea_commodity_prices) and maritime tables (antaq_vessel_calls, usace_lock_passages) now populate envelope fields at write time, so new rows appear as compliant immediately.
  • Before an audit — confirm that all evidence tables are fully envelope-compliant before generating an audit export.
  • Troubleshooting stalled backfills — if a table’s percentage hasn’t changed in several days and the “Last backfill” timestamp is stale, the backfill job for that table may need attention. Query codex_backfill_progress to check whether the job is running or stalled.
  • Programmatic alerting — query codex_compliance_summary or codex_backfill_progress from your monitoring tools to trigger alerts when coverage drops or backfills stall.