The OSHA Safety Index dataset contains more than 500,000 enforcement actions, inspections, and citations from the Occupational Safety and Health Administration. Records are classified by NAICS code and, where a match is found, linked to EPA Facility Registry Service (FRS) identifiers and to parent corporate entities through entity resolution.
Every record inherits the full APRS envelope (record_id, chunk_id, bitemporal fields, confidence_score, provenance) and carries the join keys documented below.
Dataset-specific fields
| Field | Type | Nullable | Description |
|---|---|---|---|
| source_ref | string | no | OSHA inspection number. |
| occurred_at | timestamptz | no | Inspection open date. |
| resolved_at | timestamptz | yes | Case close date. Null if still open. |
| entity_id | string | no | Establishment identifier. |
| entity_name | string | yes | Establishment name. |
| location | PostGIS Point | yes | WGS84 geometry of the establishment. |
| h3_index | string | yes | H3 resolution-8 cell. |
| jurisdiction_slug | string | yes | Civic jurisdiction where the establishment is located. |
| naics_code | string | no | 6-digit NAICS code (2022 revision). |
| naics_title | string | no | Human-readable NAICS sector title. |
| frs_id | string | yes | EPA Facility Registry Service identifier. See facility linkage. |
| inspection_type | enum | no | Type of inspection. See inspection types. |
| violation_count | integer | no | Number of violations cited in this inspection. |
| citation_severity | enum | yes | Most severe citation issued. See citation severity. |
| penalty_usd | numeric | yes | Current penalty amount in USD. |
| hazard_categories | JSON array | yes | Human-readable hazard labels derived from CFR standards codes. See hazard categories. |
| parent_entity_urn | string | yes | Entity resolution link to the parent corporate entity. |
Inspection types
| Type | Description |
|---|---|
| Programmed | Planned inspection based on industry targeting criteria |
| Unprogrammed | Not planned; triggered by a complaint, referral, or accident |
| Referral | Referred by another agency or inspector |
| Complaint | Filed by an employee or representative |
| FollowUp | Follow-up to a previous inspection |
| Accident | Triggered by a workplace accident, injury, or fatality |
Citation severity
| Severity | Description |
|---|---|
| willful | Employer knowingly committed a violation |
| serious | Substantial probability of death or serious harm |
| other | Non-serious violation |
| repeat | Substantially similar violation within the last 5 years |
Facility linkage
Each OSHA record is matched against the EPA Facility Registry Service (FRS) using address, NAICS code, and name heuristics. The match confidence is available in metadata.frs_match_confidence.
Low-confidence FRS matches (below 0.7) are excluded from Commercial-tier exports. Research-tier users see all matches with the confidence score included for filtering.
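The 0.7 cutoff can be applied client-side when working with Research-tier data. The following is a minimal sketch, not the export pipeline itself. It assumes each record has been loaded as a Python dict with a nested `metadata` dict; the function name `has_confident_frs_match`, the sample values, and the choice to treat a missing score as low-confidence are all illustrative assumptions.

```python
FRS_CONFIDENCE_THRESHOLD = 0.7  # matches below this are treated as low-confidence


def has_confident_frs_match(record, threshold=FRS_CONFIDENCE_THRESHOLD):
    """Return True if the record has an frs_id whose match confidence meets the threshold."""
    if record.get("frs_id") is None:
        return False
    confidence = (record.get("metadata") or {}).get("frs_match_confidence")
    # Assumption: a missing score is treated the same as a low-confidence match.
    return confidence is not None and confidence >= threshold


# Placeholder records for illustration only; the identifiers are made up.
records = [
    {"source_ref": "example-1", "frs_id": "frs-a", "metadata": {"frs_match_confidence": 0.92}},
    {"source_ref": "example-2", "frs_id": "frs-b", "metadata": {"frs_match_confidence": 0.55}},
    {"source_ref": "example-3", "frs_id": None, "metadata": {}},
    {"source_ref": "example-4", "frs_id": "frs-c", "metadata": {"frs_match_confidence": 0.7}},
]

confident = [r["source_ref"] for r in records if has_confident_frs_match(r)]
```

A score of exactly 0.7 passes the filter, because only matches below 0.7 count as low-confidence.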
Hazard categories
The hazard_categories field translates CFR standards references (e.g. 1926.501) into human-readable labels. A single inspection may cite multiple hazards:
    ["fall_protection", "scaffolding", "electrical_wiring"]
The raw CFR reference codes are preserved in metadata.cfr_codes.
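Because one inspection can carry several labels, a common first step is to tally how often each hazard appears. The sketch below assumes records are Python dicts. It also assumes that `hazard_categories` may arrive either as a JSON-encoded string or as an already decoded list, depending on how the file was read. The function name `count_hazards` and the sample rows are illustrative.

```python
import json
from collections import Counter


def count_hazards(records):
    """Count the number of inspections that cite each hazard label."""
    counts = Counter()
    for record in records:
        raw = record.get("hazard_categories")
        if raw is None:  # the field is nullable
            continue
        labels = json.loads(raw) if isinstance(raw, str) else raw
        counts.update(set(labels))  # count each label at most once per inspection
    return counts


# Placeholder rows for illustration only.
sample = [
    {"hazard_categories": '["fall_protection", "scaffolding", "electrical_wiring"]'},
    {"hazard_categories": ["fall_protection"]},
    {"hazard_categories": None},
]

hazard_counts = count_hazards(sample)
```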
Join keys
| Key | Presence | Notes |
|---|---|---|
| record_id | always | APRS URN |
| chunk_id | always | Deterministic from record_id |
| h3_index | often | Null for establishments without geocodable addresses |
| naics_code | always | Join with POI Intelligence, LEHD, and other NAICS-indexed datasets |
| frs_id | often | EPA Facility Registry Service link |
| jurisdiction_slug | often | Civic jurisdiction |
| parent_entity_urn | sometimes | Corporate entity resolution link |
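Keys marked "often" or "sometimes" can be null, so joins on them usually need left-join semantics to avoid silently dropping OSHA records. The following sketch shows one way to do that in plain Python. The helper `left_join_on_key`, the second dataset, and its `facility_name` column are hypothetical and used only for illustration.

```python
def left_join_on_key(osha_records, other_records, key):
    """Left-join two lists of dicts on `key`, keeping OSHA rows that have no match."""
    index = {}
    for row in other_records:
        value = row.get(key)
        if value is not None:
            index.setdefault(value, []).append(row)

    joined = []
    for record in osha_records:
        value = record.get(key)
        matches = index.get(value) if value is not None else None
        if not matches:
            joined.append((record, None))  # null or unmatched key: keep the OSHA row
        else:
            joined.extend((record, match) for match in matches)
    return joined


# Placeholder rows for illustration only; the identifiers are made up.
osha = [
    {"source_ref": "example-1", "frs_id": "frs-a"},
    {"source_ref": "example-2", "frs_id": None},
    {"source_ref": "example-3", "frs_id": "frs-z"},
]
facilities = [{"frs_id": "frs-a", "facility_name": "Example Plant"}]

pairs = left_join_on_key(osha, facilities, "frs_id")
```

The same helper works for `naics_code` or `parent_entity_urn` by changing the `key` argument.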
Example query
Find the 20 largest penalties among inspections with willful citations in NAICS 2362 (nonresidential building construction):
    SELECT
        source_ref,
        entity_name,
        naics_title,
        occurred_at,
        citation_severity,
        violation_count,
        penalty_usd,
        hazard_categories
    FROM read_parquet('osha-safety-2026-04.parquet')
    WHERE naics_code LIKE '2362%'
      AND citation_severity = 'willful'
      AND penalty_usd > 0
    ORDER BY penalty_usd DESC
    LIMIT 20;
Known limitations
- Only public OSHA establishment search data is included. Confidential injury and illness data per 29 CFR 1904 is not available.
- frs_id linkage uses heuristic matching; verify metadata.frs_match_confidence before relying on facility cross-references.
- parent_entity_urn is populated by the entity resolution pipeline and may be null for recently ingested records.
- Penalty amounts reflect the current (potentially reduced) penalty, not the originally proposed amount. Check metadata.proposed_penalty_usd for the initial figure.
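To see how far a penalty has moved from its initial figure, the current and proposed amounts can be compared per record. This is a minimal sketch that assumes records are Python dicts with a nested `metadata` dict. The function name `penalty_reduction` and the sample amounts are illustrative, and returning `None` when either figure is missing or the proposed amount is zero is an assumption, not documented behavior.

```python
from decimal import Decimal


def penalty_reduction(record):
    """Return (amount_reduced, fraction_reduced), or None if it cannot be computed."""
    current = record.get("penalty_usd")
    proposed = (record.get("metadata") or {}).get("proposed_penalty_usd")
    if current is None or proposed is None:
        return None
    # Convert via str so float inputs do not introduce binary rounding noise.
    current = Decimal(str(current))
    proposed = Decimal(str(proposed))
    if proposed == 0:
        return None  # avoid division by zero
    reduced = proposed - current
    return reduced, reduced / proposed


# Placeholder amounts for illustration only.
example = {"penalty_usd": 6500, "metadata": {"proposed_penalty_usd": 10000}}
result = penalty_reduction(example)
```

For the example record, the penalty fell by 3,500 USD, which is 35% of the proposed amount.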