The OSHA Safety Index dataset contains 500K+ enforcement actions, inspections, and citations from the Occupational Safety and Health Administration. Each record is linked to EPA Facility Registry Service (FRS) identifiers, classified by NAICS sector, and connected to parent corporate entities through entity resolution. Every record inherits the full APRS envelope (record_id, chunk_id, bitemporal fields, confidence_score, provenance) and carries the join keys documented below.

Dataset-specific fields

| Field | Type | Nullable | Description |
|---|---|---|---|
| source_ref | string | no | OSHA inspection number. |
| occurred_at | timestamptz | no | Inspection open date. |
| resolved_at | timestamptz | yes | Case close date. Null if still open. |
| entity_id | string | no | Establishment identifier. |
| entity_name | string | yes | Establishment name. |
| location | PostGIS Point | yes | WGS84 geometry of the establishment. |
| h3_index | string | yes | H3 resolution-8 cell. |
| jurisdiction_slug | string | yes | Civic jurisdiction where the establishment is located. |
| naics_code | string | no | 6-digit NAICS code (2022 revision). |
| naics_title | string | no | Human-readable NAICS sector title. |
| frs_id | string | yes | EPA Facility Registry Service identifier. See facility linkage. |
| inspection_type | enum | no | Type of inspection. See inspection types. |
| violation_count | integer | no | Number of violations cited in this inspection. |
| citation_severity | enum | yes | Most severe citation issued. See citation severity. |
| penalty_usd | numeric | yes | Current penalty amount in USD. |
| hazard_categories | JSON array | yes | Human-readable hazard labels derived from CFR standards codes. See hazard categories. |
| parent_entity_urn | string | yes | Entity resolution link to the parent corporate entity. |

Inspection types

| Type | Description |
|---|---|
| Programmed | Planned inspection based on industry targeting criteria |
| Unprogrammed | Not planned; triggered by complaint, referral, or accident |
| Referral | Referred by another agency or inspector |
| Complaint | Filed by an employee or representative |
| FollowUp | Follow-up to a previous inspection |
| Accident | Triggered by a workplace accident, injury, or fatality |

Citation severity

| Severity | Description |
|---|---|
| willful | Employer knowingly committed a violation |
| serious | Substantial probability of death or serious harm |
| other | Non-serious violation |
| repeat | Substantially similar violation within the last 5 years |

Facility linkage

Each OSHA record is matched against the EPA Facility Registry Service (FRS) using address, NAICS code, and name heuristics. The match confidence is available in metadata.frs_match_confidence.
Low-confidence FRS matches (below 0.7) are excluded from Commercial-tier exports. Research-tier users see all matches with the confidence score included for filtering.
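
Research-tier users who want to apply the same cutoff on their own side can filter on the confidence score. The following is a minimal Python sketch, assuming each record has been loaded as a dict with a nested metadata dict. The helper name filter_frs_matches and the sample records are hypothetical; only the 0.7 threshold and the metadata.frs_match_confidence path come from this page.

```python
from typing import Any

# Threshold taken from the text above: matches below 0.7 count as low-confidence.
FRS_MIN_CONFIDENCE = 0.7


def filter_frs_matches(
    records: list[dict[str, Any]],
    min_confidence: float = FRS_MIN_CONFIDENCE,
) -> list[dict[str, Any]]:
    """Keep only records whose FRS link meets the confidence threshold."""
    kept = []
    for rec in records:
        if rec.get("frs_id") is None:
            continue  # no FRS link at all
        conf = (rec.get("metadata") or {}).get("frs_match_confidence")
        if conf is not None and conf >= min_confidence:
            kept.append(rec)
    return kept


# Hypothetical sample records (identifiers are made up for illustration).
sample = [
    {"source_ref": "A", "frs_id": "FRS-1", "metadata": {"frs_match_confidence": 0.92}},
    {"source_ref": "B", "frs_id": "FRS-2", "metadata": {"frs_match_confidence": 0.55}},
    {"source_ref": "C", "frs_id": None, "metadata": {}},
]
trusted = filter_frs_matches(sample)
print([r["source_ref"] for r in trusted])
```

Passing a different min_confidence lets an analysis trade coverage for precision without changing the export itself.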

Hazard categories

The hazard_categories field translates CFR standards references (e.g. 1926.501) into human-readable labels. A single inspection may cite multiple hazards:
["fall_protection", "scaffolding", "electrical_wiring"]
The raw CFR reference codes are preserved in metadata.cfr_codes.
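
The translation from codes to labels can be pictured as a simple lookup. The sketch below is illustrative only: the CFR_TO_LABEL table and the labels_for helper are hypothetical, the dataset's actual code-to-label mapping is not published on this page, and metadata.cfr_codes is assumed to be a list of strings.

```python
# Illustrative lookup only; this is not the dataset's real mapping table.
CFR_TO_LABEL = {
    "1926.501": "fall_protection",
    "1926.451": "scaffolding",
    "1926.405": "electrical_wiring",
}


def labels_for(cfr_codes):
    """Map CFR codes to labels, keeping first-seen order and skipping
    duplicates and codes that are not in the lookup."""
    labels = []
    for code in cfr_codes:
        label = CFR_TO_LABEL.get(code)
        if label is not None and label not in labels:
            labels.append(label)
    return labels


# Hypothetical record: the same standard cited twice yields one label.
record = {"metadata": {"cfr_codes": ["1926.501", "1926.451", "1926.501", "1926.405"]}}
hazards = labels_for(record["metadata"]["cfr_codes"])
print(hazards)
```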

Join keys

| Key | Presence | Notes |
|---|---|---|
| record_id | always | APRS URN |
| chunk_id | always | Deterministic from record_id |
| h3_index | often | Null for establishments without geocodable addresses |
| naics_code | always | Join with POI Intelligence, LEHD, and other NAICS-indexed datasets |
| frs_id | often | EPA Facility Registry Service link |
| jurisdiction_slug | often | Civic jurisdiction |
| parent_entity_urn | sometimes | Corporate entity resolution link |
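
Because several of these keys are only present "often" or "sometimes", joins should tolerate nulls. The following is a rough Python sketch of a left join on one key, assuming both datasets are loaded as lists of dicts. The left_join_on helper, the facilities rows, and the facility_type column are hypothetical and stand in for whatever dataset is being joined.

```python
from collections import defaultdict


def left_join_on(key, left, right):
    """Left-join two lists of dicts on `key`.

    Left rows with a null or unmatched key are kept as-is; on column
    collisions the left row's value wins.
    """
    index = defaultdict(list)
    for row in right:
        if row.get(key) is not None:
            index[row[key]].append(row)

    joined = []
    for row in left:
        value = row.get(key)
        matches = index.get(value, []) if value is not None else []
        if matches:
            for match in matches:
                joined.append({**match, **row})
        else:
            joined.append(dict(row))
    return joined


# Hypothetical rows for illustration only.
inspections = [
    {"source_ref": "A", "frs_id": "FRS-1"},
    {"source_ref": "B", "frs_id": None},  # no FRS link, so it stays unmatched
]
facilities = [{"frs_id": "FRS-1", "facility_type": "example"}]

rows = left_join_on("frs_id", inspections, facilities)
print(rows)
```

The same helper works for naics_code or jurisdiction_slug; only the key name changes.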

Example query

Find the 20 highest-penalty inspections with willful citations among NAICS codes beginning with 2362 (the nonresidential building construction industry group):
SELECT
  source_ref,
  entity_name,
  naics_title,
  occurred_at,
  citation_severity,
  violation_count,
  penalty_usd,
  hazard_categories
FROM read_parquet('osha-safety-2026-04.parquet')
WHERE naics_code LIKE '2362%'
  AND citation_severity = 'willful'
  AND penalty_usd > 0
ORDER BY penalty_usd DESC
LIMIT 20;

Known limitations

  • Only public OSHA establishment search data is included. Confidential injury and illness data per 29 CFR 1904 is not available.
  • frs_id linkage uses heuristic matching. Verify metadata.frs_match_confidence before relying on facility cross-references.
  • parent_entity_urn is populated by the entity resolution pipeline and may be null for recently ingested records.
  • Penalty amounts reflect the current (potentially reduced) penalty, not the originally proposed amount. Check metadata.proposed_penalty_usd for the initial figure.
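
To see how far a penalty was reduced, the current and proposed figures can be compared directly. This is a minimal Python sketch, assuming records are loaded as dicts and that both amounts are numeric or numeric strings. The penalty_reduction helper and the sample values are hypothetical.

```python
from decimal import Decimal


def penalty_reduction(record):
    """Return (amount_reduced, fraction_reduced), or None when either
    figure is missing or the proposed penalty is zero."""
    current = record.get("penalty_usd")
    proposed = (record.get("metadata") or {}).get("proposed_penalty_usd")
    if current is None or proposed is None:
        return None
    # Go through str() so float inputs do not carry binary rounding noise.
    current = Decimal(str(current))
    proposed = Decimal(str(proposed))
    if proposed == 0:
        return None
    reduced = proposed - current
    return reduced, reduced / proposed


# Hypothetical record: proposed 15000, currently 9000.
example = {"penalty_usd": 9000, "metadata": {"proposed_penalty_usd": 15000}}
result = penalty_reduction(example)
print(result)
```

Returning None for missing figures keeps records with a null penalty_usd from being mistaken for fully waived penalties.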