Open Road Risk

English Indices of Deprivation (IoD 2025)

MHCLG small-area deprivation deciles used as LSOA-level contextual predictors, and the sub-domain choice made to avoid outcome leakage.

The Indices of Deprivation provide a relative measure of small-area deprivation across England. Open Road Risk uses three deprivation deciles as contextual predictors joined at LSOA level. They describe the area a road link passes through, not the road itself, and are not interpreted causally.

Source

  • Publisher: Ministry of Housing, Communities and Local Government (MHCLG), English Indices of Deprivation 2025, released 30 October 2025.
  • Licence: Open Government Licence v3.0.
  • Geography: Lower-layer Super Output Area (LSOA), 33,755 areas in England, each ~1,000–3,000 residents.
  • Edition note: IoD 2025 keeps the 2019 domain structure and weighting but refreshes indicators (e.g. the Indoors sub-domain now uses an EPC-based energy efficiency indicator in place of “houses without central heating”). Decile values are therefore not directly comparable to 2019.

Features used

| Feature | Source measure | Direction |
| --- | --- | --- |
| imd_decile | Overall Index of Multiple Deprivation decile | 1 = most deprived, 10 = least deprived |
| imd_crime_decile | Crime domain decile | as above |
| imd_living_indoor_decile | Living Environment Indoors sub-domain decile | as above |

Deciles run 1 (most deprived) to 10 (least deprived), so a negative model coefficient means less-deprived areas have a lower predicted collision rate, holding other features fixed. Live coefficients and incidence-rate ratios are on the Stage 2 collision model page.
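To make the sign convention concrete, the sketch below converts a coefficient into an incidence-rate ratio. It assumes a log-link count model in which the decile enters as a single numeric term, which the mention of incidence-rate ratios suggests but this page does not confirm. The coefficient value is invented for illustration and is not a fitted result.

```python
import math


def incidence_rate_ratio(coef: float, decile_steps: int = 1) -> float:
    """Multiplicative change in predicted rate for a move of `decile_steps` deciles.

    Assumes a log-link model: rate = exp(intercept + coef * decile + ...).
    """
    return math.exp(coef * decile_steps)


# Hypothetical coefficient, NOT a value from the Stage 2 model.
example_coef = -0.02

irr_one_step = incidence_rate_ratio(example_coef)       # one decile less deprived
irr_full_range = incidence_rate_ratio(example_coef, 9)  # decile 1 to decile 10

print(f"IRR per decile step: {irr_one_step:.4f}")
print(f"IRR across deciles 1 to 10: {irr_full_range:.4f}")
```

An IRR below 1 corresponds to a negative coefficient: each step towards the less-deprived end multiplies the predicted rate by that factor, holding other features fixed.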

Why the Indoors sub-domain, not the full Living Environment domain

Important: Outcome leakage

The Living Environment domain has two sub-domains. The Outdoors sub-domain is built from air quality and road traffic accidents involving injury to pedestrians and cyclists. Using it — or the combined domain — as a predictor of collision risk would feed a function of the outcome back in as an input, inflating apparent performance without reflecting any real predictive signal.

The model therefore uses the Indoors sub-domain only (housing condition and energy efficiency), which contains no road-safety indicator. The overall IMD decile is retained because road traffic accidents are one indicator among 55 across seven domains and contribute negligibly to the headline index. This is a judgement call rather than a guarantee of zero contamination.

Coverage and missingness

  • England only. Welsh LSOAs (and the Welsh-side edges of the study area) have no IoD 2025 value. Scotland and Wales publish separate, non-comparable indices.
  • Links that fail to join to an English LSOA — Welsh edges, unmatched geometry — receive imputed decile values, flagged with an _imputed suffix in the model feature set so the imputation is visible rather than silent.
  • Because deciles are a relative national ranking, they cannot be compared across editions or against the Welsh/Scottish indices.
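The flagged imputation described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the median fill is an assumption (this page does not state the imputation method), and the `<feature>_imputed` flag names are one reading of the `_imputed` suffix. The example rows are invented.

```python
from statistics import median

DECILE_FEATURES = ("imd_decile", "imd_crime_decile", "imd_living_indoor_decile")


def impute_with_flags(rows: list[dict]) -> list[dict]:
    """Fill missing deciles and record each fill in a `<feature>_imputed` flag.

    The median fill is an assumption for this sketch, not a documented method.
    """
    # Compute one fill value per feature from the rows that did join.
    fills = {}
    for feature in DECILE_FEATURES:
        observed = [row[feature] for row in rows if row.get(feature) is not None]
        fills[feature] = median(observed)

    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for feature in DECILE_FEATURES:
            missing = row.get(feature) is None
            new_row[f"{feature}_imputed"] = missing
            if missing:
                new_row[feature] = fills[feature]
        out.append(new_row)
    return out


links = [
    {"link_id": "a", "imd_decile": 2, "imd_crime_decile": 3, "imd_living_indoor_decile": 4},
    {"link_id": "b", "imd_decile": 8, "imd_crime_decile": 7, "imd_living_indoor_decile": 6},
    {"link_id": "c", "imd_decile": 5, "imd_crime_decile": 5, "imd_living_indoor_decile": 5},
    # Hypothetical Welsh-side link with no English LSOA match.
    {"link_id": "d", "imd_decile": None, "imd_crime_decile": None, "imd_living_indoor_decile": None},
]

result = impute_with_flags(links)
```

Keeping the flag as a separate boolean column lets the model, or a later audit, distinguish observed deciles from filled ones.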

Provenance

All three deciles are read directly from MHCLG File 7 (File_7_IoD2025_All_Ranks_Scores_Deciles_Population_Denominators.csv), which carries the Indoors Sub-domain Decile column alongside the overall IMD and domain deciles — so the Indoors decile is taken as published, not re-derived from a sub-domain score. The feature builder (src/road_risk/features/network.py) fails fast if that column is absent, so a silently missing decile cannot pass through unnoticed.
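A fail-fast read of this kind might look like the sketch below. It is not the code in src/road_risk/features/network.py. The header labels in `REQUIRED_COLUMNS` are assumptions that should be checked against the real File 7 header, and `read_deciles` is a hypothetical helper name.

```python
import csv
import io

# Assumed header labels; verify against the actual File 7 header before use.
REQUIRED_COLUMNS = {
    "imd_decile": "Index of Multiple Deprivation (IMD) Decile",
    "imd_crime_decile": "Crime Decile",
    "imd_living_indoor_decile": "Indoors Sub-domain Decile",
}


def read_deciles(handle) -> list[dict]:
    """Read the three decile columns, raising at once if any is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS.values() if col not in header]
    if missing:
        raise KeyError(f"IoD file is missing required columns: {missing}")
    return [
        {feature: int(row[col]) for feature, col in REQUIRED_COLUMNS.items()}
        for row in reader
    ]


# Tiny in-memory stand-in for the CSV; the values are invented.
sample = io.StringIO(
    "Index of Multiple Deprivation (IMD) Decile,Crime Decile,Indoors Sub-domain Decile\n"
    "3,4,5\n"
)
rows = read_deciles(sample)
```

Checking the header before reading any rows means a renamed or dropped column stops the build immediately instead of producing a feature column full of missing values.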

Role in the pipeline

These are area-context predictors, in the same tier as OS Terrain 50 grade and ONS Rural-Urban Classification — they sit alongside exposure and road-geometry features in Stage 2, not in the exposure model. They are contextual signal at LSOA resolution, not evidence that deprivation causes road risk.

