Future Work for Open Road Risk

Overview

This page catalogues research questions and analyses that the road_risk_analysis pipeline can support but which aren’t in the active backlog (see the repository TODO.md). It has two purposes:

- For future work on this repo, whether by me, collaborators, or contributors. Each entry has enough context to pick up without reconstructing the reasoning.
- As a statement of platform scope. Most UK road-safety research is regionally narrow (M25, single county) and methodologically narrow (one hypothesis per paper). This pipeline is designed to support multiple research questions on a common open-data substrate with consistent methodology. The questions below illustrate what that scope makes possible.

Entries are not ranked against each other. They’re organised by theme.

Scope note

The pipeline from raw DfT / OS / OSM data through to modelled outputs at OS Open Roads link-year grain is reusable infrastructure. The research questions below illustrate what that substrate supports beyond the headline risk ranking. Reproducibility documentation and contribution paths are in active development in the repository backlog.

Known platform gaps: hourly/daily temporal grain not supported at Stage 2; SCRIM and pavement-friction data not available; OSM speed-limit coverage is only partial (56.4% overall, 59.4% on Unclassified links) and lane/surface tags are much sparser; DfT-estimated AADF rows are excluded from Stage 1a training (see Stage 1a: Traffic Volume); Network Model GDB coverage is SRN-only.

To adapt the pipeline for a new research question:

- Fork and reproduce the baseline pipeline.
- Identify the grain your question needs.
- For a new target, subset or reweight the collision data and re-run Stage 2.
- For a new feature, add to network_features.py and check coverage before inclusion.
- For a new analysis, operate on pipeline outputs without modifying the core model.
- Use the evaluation harness (5-seed rank stability) to compare your variant against baseline on identical held-out links.
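As a rough illustration of what a seed-stability comparison can look like, the sketch below computes the mean pairwise Jaccard overlap of the top-fraction link sets across several seed runs. It is not the repository's evaluation harness. The function names and toy scores are hypothetical, and three seeds stand in for the five mentioned above to keep the example short.

```python
from itertools import combinations


def top_fraction(scores, fraction=0.01):
    """Return the set of link IDs in the top `fraction` of links by score."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(runs, fraction=0.01):
    """Average top-fraction Jaccard over every pair of seed runs."""
    tops = [top_fraction(run, fraction) for run in runs]
    pairs = list(combinations(tops, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): three seeds scoring the same five held-out links.
seed_runs = [
    {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2, "e": 0.1},
    {"a": 0.8, "b": 0.9, "c": 0.2, "d": 0.3, "e": 0.1},
    {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.3, "e": 0.1},
]

# fraction=0.4 keeps the top 2 of 5 links so the toy example has some overlap.
stability = mean_pairwise_jaccard(seed_runs, fraction=0.4)
print(f"mean pairwise top-set Jaccard: {stability:.3f}")
```

Running the same function on a variant and on the baseline, over identical held-out links, gives two numbers that can be compared directly.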

Theme 1: Alternative ranking targets

Variants on “what should the production target be?” The current model ranks by expected all-injury collision count. Different stakeholders might want different targets. Each entry is a parallel Stage 2 training run rather than a replacement of the main model.

Weather warning analysis — temporal rank stability

What it is: Use Met Office historical weather warnings (date × region) or ERA5-threshold-reconstructed equivalents to test whether the top-1% risk ranking holds up under extreme conditions. Diagnostic analysis, not a Stage 2 feature — link-year aggregation blurs the temporal signal that makes weather warnings predictive.

Why it’s interesting: Literature consistently identifies weather as a strong crash-risk predictor, but at the per-crash or hourly grain. At link-year grain, the honest use is as a robustness check: does the model’s top-1% list reshuffle under warnings? Either answer is useful — a shift identifies conditionally-risky links the annual model misses; stability demonstrates the ranking is robust across conditions.

Why not now: Scope-creep relative to current backlog. Requires the 5-seed stability harness (see TODO.md → Queued tasks) to be in place first, otherwise the analysis conflates warning-effect with seed-effect. Historical Met Office warnings may be hard to obtain cleanly; ERA5-threshold reconstruction is a fallback.

Good starting point for someone else:

- Check data availability for Met Office historical warnings 2015–2024 (GB regional). May require direct correspondence with Met Office or an archive request.
- If unavailable, use ERA5-Land hourly data thresholded to amber-equivalent events (e.g. rainfall >20mm/day, wind gusts >50mph).
- Join to STATS19 collisions by date × region.
- Report top-1% Jaccard stability and rank correlation, warning vs normal days, by warning type (snow/rain/wind/fog).
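The threshold fallback and the date × region join can be sketched as follows. This assumes the reanalysis data has already been reduced to one record per day and region, holding a rainfall total in mm and a maximum gust in m/s. That reduction, the record layout, and all sample values are hypothetical. The two thresholds are the example values quoted above.

```python
from collections import Counter
from datetime import date

MPH_TO_MS = 0.44704                 # 1 mph expressed in metres per second
RAIN_THRESHOLD_MM = 20.0            # daily rainfall total, example value from the text
GUST_THRESHOLD_MS = 50 * MPH_TO_MS  # 50 mph gust threshold converted to m/s


def warning_types(rain_mm, max_gust_ms):
    """Return the set of amber-equivalent event types triggered on one day."""
    types = set()
    if rain_mm > RAIN_THRESHOLD_MM:
        types.add("rain")
    if max_gust_ms > GUST_THRESHOLD_MS:
        types.add("wind")
    return types


# Hypothetical daily aggregates keyed by (date, region): (rain mm, max gust m/s).
daily_weather = {
    (date(2020, 2, 9), "North West"): (34.0, 27.5),
    (date(2020, 2, 10), "North West"): (3.0, 9.0),
    (date(2020, 2, 9), "London"): (8.0, 23.0),
}

# Hypothetical collision records reduced to (date, region).
collisions = [
    (date(2020, 2, 9), "North West"),
    (date(2020, 2, 9), "North West"),
    (date(2020, 2, 10), "North West"),
    (date(2020, 2, 9), "London"),
    (date(2020, 2, 11), "London"),  # no weather record for this day
]

counts = Counter()
for day, region in collisions:
    record = daily_weather.get((day, region))
    if record is None:
        counts["unknown"] += 1  # keep gaps visible rather than calling them normal
    elif warning_types(*record):
        counts["warning"] += 1
    else:
        counts["normal"] += 1

print(dict(counts))
```

Collisions without a matching weather record are counted separately so that missing data does not inflate the normal-day baseline.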

Related: Temporal risk modelling via Stage 1b would be a more thorough treatment; this is the cheaper diagnostic version.

Severity-aware risk ranking (expected KSI)

What it is: Replace all-injury collision count as the production target with severity-weighted count — fatal-only, KSI (Killed or Seriously Injured), or a social-cost weighting. Rank links by expected KSI rather than expected all-injury collisions.

Why it’s interesting: For intervention prioritisation, harm matters more than count. A link with many slight-injury collisions may rank below a link with one fatality under severity weighting. Relevant UK literature includes two-stage mixed multivariate frequency-severity modelling on UK motorway data (see Wang/Quddus/Ison-era work — verify specific citations before use; see note in TODO.md → Citation verification). FHWA/HSM methodology also treats severity-weighted screening as standard for intervention-led deployments.

Why not now: Unclear whether stakeholders want severity-weighted ranking. Some safety-team workflows are anchored on all-injury counts for operational reasons (council reporting, consistency with DfT summary statistics). Also: KSI counts at link grain are sparse — likely requires shrinkage or composite social-cost weighting rather than a fatal-only cut. Decision is partly user-research, not just modelling.

Good starting point for someone else:

- Scope conversation: which severity weighting matches the end-use case (local authority safety teams, strategic road operators, or wider road-safety stakeholders)?
- DfT publishes a social-cost-per-casualty table alongside road safety statistics; use those weights as a defensible starting point.
- Fit a parallel Stage 2 model with DfT social-cost weights as target.
- Compare ranking to current all-injury ranking; identify links that shift substantially between the two.
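A minimal sketch of building a severity-weighted target next to the plain count is shown below. The weights are hypothetical placeholders, not DfT figures, and the sketch applies one weight per collision by its recorded severity, which is a simplification of a per-casualty cost table. The function and variable names are illustrative only.

```python
from collections import defaultdict

# HYPOTHETICAL relative weights for illustration only; these are not DfT values.
# Swap in the published social-cost figures before any real use.
SEVERITY_WEIGHT = {"fatal": 100.0, "serious": 10.0, "slight": 1.0}


def link_year_targets(collisions, weights):
    """Aggregate collisions to (link_id, year) as a raw count and a weighted count."""
    raw = defaultdict(int)
    weighted = defaultdict(float)
    for link_id, year, severity in collisions:
        raw[(link_id, year)] += 1
        weighted[(link_id, year)] += weights[severity]
    return dict(raw), dict(weighted)


def rank_order(target):
    """Keys sorted from highest to lowest target value."""
    return sorted(target, key=target.get, reverse=True)


# Toy data: link L1 has five slight collisions, link L2 has one fatal collision.
collisions = [("L1", 2019, "slight")] * 5 + [("L2", 2019, "fatal")]
raw, weighted = link_year_targets(collisions, SEVERITY_WEIGHT)

print("by count:   ", rank_order(raw))
print("by weighted:", rank_order(weighted))
```

With these placeholder weights the two links swap places, which is the kind of shift the comparison step is meant to surface.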

Related: Facility-family split (see TODO.md → Queued tasks) would help here — severity modelling is noisier on minor roads where KSI events are rare, and a per-family model could use different severity treatments per class.

Fatal-only collision modelling

What it is: Train a separate Stage 2 model with fatal_count as target rather than all-injury collision_count. Either a direct Poisson/NB fit on fatal counts, or a hurdle model separating “any fatal?” from “how many fatal given any?”.

Why it’s interesting: Fatal collisions are substantively different from slight-injury collisions — different mechanisms (speed, vehicle mass, occupant age), different road environments (higher-speed roads disproportionately), different policy implications. A fatal-only model would identify a different set of high-risk links and could be used as a fatality-focused screening layer alongside the main ranking. The methodology page notes this is not yet implemented.

Why not now: Fatal counts are sparse at link grain — 7,402 fatals across 2.17M links over 10 years means most links have zero fatals even over the full window. Direct modelling will be noisy without shrinkage or hierarchical structure. Better approached after EB shrinkage is in place (see TODO.md → Queued tasks) so sparse fatal counts benefit from shrinkage toward facility-family averages, and after facility-family split so motorway-specific fatal patterns don’t blur with unclassified-road patterns.
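The sparsity claim can be checked with quick arithmetic on the figures quoted above. The only step beyond the quoted numbers is the bound itself: the share of links with any fatal is largest when every fatal falls on a different link.

```python
fatals = 7_402
links = 2_170_000
years = 10

# Average fatal count per link-year across the whole network.
fatals_per_link_year = fatals / (links * years)

# Upper bound on the share of links with at least one fatal over the window,
# reached only if every fatal falls on a different link.
max_share_links_with_fatal = fatals / links

print(f"fatals per link-year: {fatals_per_link_year:.6f}")
print(f"max share of links with any fatal: {max_share_links_with_fatal:.4%}")
```

So at least roughly 99.66% of links record no fatal over the full ten years, and the average link-year target is on the order of 0.0003.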

Good starting point for someone else:

- After EB shrinkage and facility-family split are in place, add fatal_count target as a parallel Stage 2 model.
- Consider a hurdle approach: fit “P(any fatal in link-year)” as a logistic model, then “E(fatals | any fatal)” as a truncated Poisson. Avoids zero-inflation problems at high road-class levels.
- Output a parallel fatal_risk_percentile alongside the main ranking. Compare which links appear in both top-1% lists vs which are divergent.
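To show how the two hurdle components combine into one expected count, the sketch below uses the standard mean of a zero-truncated Poisson, λ / (1 − e^(−λ)), multiplied by the probability of any fatal. Fitting the logistic and truncated-Poisson parts is out of scope here. The inputs `p_any` and `lam` are assumed to come from those fitted models, and the function names are hypothetical.

```python
import math


def truncated_poisson_mean(lam):
    """Mean of a zero-truncated Poisson with rate lam > 0: lam / (1 - exp(-lam))."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam / -math.expm1(-lam)  # -expm1(-lam) equals 1 - exp(-lam), stable for small lam


def hurdle_expected_count(p_any, lam):
    """Expected fatals in a link-year: P(any fatal) * E(fatals | any fatal)."""
    if not 0.0 <= p_any <= 1.0:
        raise ValueError("p_any must be a probability")
    return p_any * truncated_poisson_mean(lam)


# Hypothetical fitted values for a single link-year.
expected = hurdle_expected_count(p_any=0.002, lam=0.1)
print(f"expected fatals: {expected:.6f}")
```

Because the truncated mean is always at least 1, the expected count can never fall below `p_any`. Ranking links by this expected count is one way to derive a fatal_risk_percentile.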

Related: Severity-aware ranking (above) is the composite alternative — fatal-only is the purist version of the same idea.

Vulnerable-road-user-weighted target

What it is: Weight the Stage 2 target by casualty type rather than treating all collisions equally — pedestrian, cyclist, and motorcycle casualties weighted higher than car-occupant. Produces a ranking that prioritises links where vulnerable users are at risk, not just where total collision counts are high.

Why it’s interesting: UK road safety strategy (Vision Zero, active travel priorities) explicitly prioritises vulnerable road users. A ranking that gives equal weight to a motorway rear-end collision and a pedestrian fatality doesn’t match that strategic frame. STATS19 has casualty-type detail per collision that this project currently doesn’t use beyond severity.

Why not now: Requires casualty-level data joined to each collision (STATS19 Casualties.csv table), then aggregated to link-year with weights. Moderate ingestion work and a weighting choice which is itself a policy question, not a modelling one. Justifiable only after the base model structure is settled — adding this on top of an unsettled model makes attribution impossible.

Good starting point for someone else:

- Ingest STATS19 Casualties.csv (not just the collision-level table currently used) and join to collisions by accident_index.
- Per collision, compute a vulnerability-weighted severity score: e.g. pedestrian_serious = 3.0, cyclist_serious = 2.5, motorcycle_serious = 2.0, car_occupant_serious = 1.0, scaled similarly for fatal and slight.
- Aggregate to link-year as vru_weighted_casualties.
- Fit as a parallel Stage 2 target; compare top-1% list to the unweighted ranking.
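The weighting and aggregation steps might look like the sketch below. The serious-injury weights are the example values given above. The fatal and slight multipliers are hypothetical, since the text only says "scaled similarly", and the record layouts are assumptions rather than the pipeline's actual schema.

```python
from collections import defaultdict

# Serious-injury weights quoted as examples in the text above.
USER_WEIGHT = {"pedestrian": 3.0, "cyclist": 2.5, "motorcycle": 2.0, "car_occupant": 1.0}

# HYPOTHETICAL severity multipliers; choosing these is a policy decision.
SEVERITY_SCALE = {"fatal": 2.0, "serious": 1.0, "slight": 0.5}


def vru_weighted_casualties(casualties, collision_links):
    """Sum weighted casualty scores per (link_id, year).

    casualties: iterable of (accident_index, user_type, severity)
    collision_links: dict mapping accident_index -> (link_id, year)
    """
    totals = defaultdict(float)
    for accident_index, user_type, severity in casualties:
        link_year = collision_links.get(accident_index)
        if link_year is None:
            continue  # collision was not snapped to a link, so it cannot be aggregated
        totals[link_year] += USER_WEIGHT[user_type] * SEVERITY_SCALE[severity]
    return dict(totals)


# Toy data with hypothetical identifiers.
collision_links = {"A1": ("L1", 2019), "A2": ("L1", 2019), "A3": ("L2", 2019)}
casualties = [
    ("A1", "pedestrian", "serious"),    # 3.0 * 1.0
    ("A1", "car_occupant", "slight"),   # 1.0 * 0.5
    ("A2", "cyclist", "serious"),       # 2.5 * 1.0
    ("A3", "car_occupant", "serious"),  # 1.0 * 1.0
    ("A9", "cyclist", "fatal"),         # skipped: no link match
]

vru_target = vru_weighted_casualties(casualties, collision_links)
print(vru_target)
```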

Related: Pairs naturally with severity-aware ranking and fatal-only modelling — all three are variants on “what should the production target actually be?”

Theme 2: Vehicle-type-specific analyses

The data supports questions the aggregate risk model doesn’t directly answer. These analyses use the existing pipeline as substrate but subset or reframe the data to produce different outputs for different audiences.

Motorcycle-specific risk layer

What it is: Produce a separate risk ranking for motorcycle-involved collisions. Subset STATS19 collisions to those involving a motorcycle, re-run Stage 2 on the subset, and identify where motorcycle risk concentrates.

Why it’s interesting: Motorcycle safety is a distinct UK road-safety policy area with different risk drivers than general crash risk — rural A-roads, seasonal/weather exposure, specific corridor patterns. A parallel ranking targeted at motorcycle safety would be operationally useful for audiences concerned with motorcyclist casualty reduction: local authorities, safer-roads partnerships, motorcycling groups.

Why not now: Uses the main pipeline as substrate, so depends on main model being settled. Also raises the exposure question — using general AADT as exposure for motorcycle risk is defensible but imperfect. DfT AADF distinguishes two_wheeled_motor_vehicles in its vehicle breakdown, so motorcycle AADT is partially recoverable as exposure.

Good starting point for someone else:

- Use two_wheeled_motor_vehicles from AADF as a motorcycle-specific exposure proxy at count-point grain; let Stage 1a predict motorcycle AADT for all links.
- Subset STATS19 to collisions with at least one motorcycle casualty or vehicle.
- Fit a parallel Stage 2 model with motorcycle-AADT offset and motorcycle-collision target.
- Report ranking, top-1% list, and compare against the main risk ranking to identify motorcycle-specific hotspots.
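The subsetting step can be sketched from the vehicle-level table alone. The code set below is an assumption based on my reading of the STATS19 vehicle_type code list for powered two-wheelers and should be checked against the current DfT data guide before use. The sketch covers only the "motorcycle vehicle" half of the criterion, and the record layout is hypothetical.

```python
# ASSUMED STATS19 vehicle_type codes for motorcycles (by engine size, electric,
# and unknown cc). Verify against the current DfT data guide before relying on them.
MOTORCYCLE_CODES = frozenset({2, 3, 4, 5, 23, 97})


def motorcycle_collisions(vehicles, codes=MOTORCYCLE_CODES):
    """Return accident_index values with at least one vehicle whose type is in `codes`.

    vehicles: iterable of (accident_index, vehicle_type) rows from the vehicle table.
    """
    return {accident_index for accident_index, vehicle_type in vehicles
            if vehicle_type in codes}


# Toy vehicle rows with hypothetical accident_index values.
vehicles = [
    ("A1", 9), ("A1", 5),   # car plus motorcycle
    ("A2", 9), ("A2", 19),  # no motorcycle involved
    ("A3", 3),              # single motorcycle
]

subset = motorcycle_collisions(vehicles)
print(sorted(subset))
```

The resulting set of accident_index values can then filter the collision table before aggregating to link-year.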

Cyclist and pedestrian risk without direct exposure

What it is: Produce risk rankings for cyclist and pedestrian casualties despite not having direct active-travel exposure data. Use motor-vehicle AADT as the denominator with the caveat that this measures “cyclist casualty rate per motor-vehicle-km” rather than “cyclist casualty rate per cyclist-km” — still operationally useful for identifying hotspots, but conceptually different from the main ranking.

Why it’s interesting: Vision Zero and active-travel policy priorities target vulnerable road users. The aggregate risk ranking doesn’t surface cyclist or pedestrian hotspots clearly because those casualties are a small fraction of total injuries. A dedicated ranking would make cyclist and pedestrian risk concentration visible to policy audiences.

Why not now: Without cyclist/pedestrian exposure data, the denominator is imperfect. Strava Metro is parked in TODO.md on licensing grounds. DfT publishes some LSOA-level active travel statistics that could improve the denominator but aren’t in the pipeline. Analysis is scoped but caveats matter.

Good starting point for someone else:

- Subset STATS19 collisions to those with pedestrian or cyclist casualties.
- Use motor-vehicle AADT as interim exposure, with explicit methodology note that this measures motor-vehicle-adjusted rate, not per-active-traveller rate.
- Fit parallel Stage 2 models for cyclist-casualty-rate and pedestrian-casualty-rate.
- If DfT LSOA-level active travel statistics can be joined, improve the denominator and report comparison.
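To make the interim denominator concrete, the sketch below converts a casualty count into a rate per million motor-vehicle-km, with exposure taken as AADT × 365 × link length × years. The function names and sample inputs are hypothetical, and the fixed 365-day year is a simplifying assumption.

```python
def vehicle_km(aadt, length_km, years=1, days_per_year=365):
    """Motor-vehicle-km travelled on a link over the period."""
    return aadt * days_per_year * length_km * years


def casualties_per_million_vehicle_km(casualties, aadt, length_km, years=1):
    """Casualty rate with motor-vehicle-km, not cyclist-km or pedestrian-km, as denominator."""
    exposure = vehicle_km(aadt, length_km, years)
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return casualties / exposure * 1_000_000


# Hypothetical link: 3 cyclist casualties in one year, AADT 12,000, 0.5 km long.
rate = casualties_per_million_vehicle_km(3, aadt=12_000, length_km=0.5)
print(f"{rate:.3f} cyclist casualties per million motor-vehicle-km")
```

A busy road with few cyclists and a quiet road with many can produce the same value, which is why the methodology note on the denominator matters.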

Related: Links to the parked Strava Metro entry in TODO.md — if that ever becomes available, this analysis becomes the use case.

Vehicle mix as risk modifier

What it is: Test whether the mix of vehicles on a road (already captured as hgv_proportion) modifies risk for other vehicle types. Do cars have higher collision rates on links with high HGV share? Do motorcycles have different risk profiles on roads with heavier traffic mix?

Why it’s interesting: Vehicle-mix effects are mentioned in crash-frequency literature but the project currently treats hgv_proportion as a single feature for aggregate risk. A subgroup analysis would test whether the feature behaves differently by casualty-vehicle type, with implications for intervention design (segregation, speed differentials).

Why not now: Subgroup analyses on rare-event data are statistically demanding. Likely requires facility-family split to be in place so that within-family HGV-mix effects can be estimated without confounding by road class.

Good starting point for someone else:

- After facility-family split is in place, fit interaction terms between hgv_proportion and casualty-vehicle-type (car, motorcycle, cyclist) in the Stage 2 model.
- Report whether interactions are meaningful, per family.
- If motorcycle × HGV-proportion interaction is significant, that’s a policy-relevant finding (motorcycle safety on HGV-heavy routes).

Theme 3: Structural modelling changes

Major refactors of the Stage 2 modelling approach — not feature additions, but changes to the grain, structure, or estimation method of the core model.

Temporal disaggregation of Stage 2

What it is: Move Stage 2 from link × year grain to link × year × time-bucket grain (e.g. month, day-of-week × hour block). Would use Stage 1b temporal profiles as exposure-modifier infrastructure. Major refactor of the collision model; not a feature addition.

Why it’s interesting: Crash risk varies sharply by time of day and day of week in ways the current annual model cannot represent. Friday evenings and Saturday nights carry disproportionate risk; peak commute hours concentrate motorway collisions; school-run windows concentrate urban collisions near schools. A temporally disaggregated model would produce rankings that identify when a link is risky, not just which links are risky overall.

Why not now: Multi-session refactor requiring (a) Stage 1b to move from diagnostic to production output, (b) temporal joins to be rebuilt in Stage 2, and (c) potentially a different modelling approach entirely (hierarchical Poisson with time random effects, or a stratified per-time-bucket model). Also, recent rank stability work suggests the current model has largely saturated what link-level features can explain. Temporal grain is a plausible candidate for meaningful predictive lift beyond the current ceiling, but sizing that claim requires research.

Good starting point for someone else:

- Deep-research pass on temporal disaggregation approaches in UK road safety literature. Specifically: how other STATS19-based analyses have handled time-of-day effects at link grain; whether hierarchical modelling or stratified models are preferred; what sample-size constraints per time-bucket apply.
- Scope out what Stage 1b would need to produce to feed temporal Stage 2.
- Design doc before implementation. Temporal work is methodology-heavy enough that a prompt can’t carry it.
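One way to picture the change of grain is to split a link-year's exposure across time buckets using a temporal profile. The sketch below assumes Stage 1b could supply, per link, a share of annual traffic for each bucket. That output format, the bucket names, and the numbers are hypothetical.

```python
def bucket_exposure(annual_vehicle_km, profile_shares, tol=1e-9):
    """Split annual exposure into per-bucket exposure using shares that sum to 1."""
    total = sum(profile_shares.values())
    if abs(total - 1.0) > tol:
        raise ValueError("profile shares must sum to 1")
    return {bucket: annual_vehicle_km * share
            for bucket, share in profile_shares.items()}


# Hypothetical temporal profile for one link: share of annual traffic per bucket.
shares = {"weekday_peak": 0.25, "weekday_offpeak": 0.5, "weekend": 0.25}

# One link-year row becomes three link-year-bucket rows, each with its own exposure.
rows = bucket_exposure(2_000_000, shares)
print(rows)
```

Each bucket row would then need its own collision count, which is where the sample-size constraint per time-bucket becomes the binding question.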

Related: Would pair naturally with weather warning analysis (above) — temporal grain is the natural level for weather effects to manifest.

Theme 4: Exposure and network-data extensions

Larger platform extensions that would improve the denominator or add new open-road network signal. These are not active backlog items because they need new data infrastructure, non-trivial validation, or access to datasets outside the current open-data stack.

Synthetic flow estimation via Census OD routing

What it is: Use Census Origin-Destination data routed across the network graph (via pgRouting or equivalent) to estimate per-link demand on minor and unclassified roads where AADF has no measurement coverage.

Why it’s interesting: Stage 1a currently estimates AADT on uncounted roads via proxy features (road class, location, betweenness). OD-routed synthetic flow is closer to a direct estimate of demand and could materially improve Stage 1a accuracy on minor roads — exactly where AADF coverage is weakest. Methodology used by the Propensity to Cycle Tool (Lovelace et al., Leeds ITS) for cycling demand; same approach applies to motor vehicles.

Why not now: 2-4 weeks of focused work to integrate routing infrastructure (pgRouting needs PostGIS; stplanr needs R). Adds non-trivial dependencies. Validation against AADF on counted minor roads required. Lower priority than current methodology experiments.

Good starting point for someone else:

- Identify Census OD year matching study window (2011 or 2021).
- Set up pgRouting on the OS Open Roads network.
- Route OD pairs and accumulate per-link contributions.
- Validate against AADF on counted minor roads.
- Compare resulting Stage 1a accuracy with and without synthetic flow feature.
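The route-and-accumulate step can be illustrated without any routing infrastructure. The sketch below uses a plain Dijkstra shortest path on a tiny undirected graph and adds each OD pair's trips to every link on its path. It stands in for pgRouting rather than reproducing it. All-or-nothing shortest-path assignment is a simplifying assumption, and the graph and OD table are toy data.

```python
import heapq
from collections import defaultdict


def shortest_path(graph, origin, destination):
    """Dijkstra shortest path; returns the node list, or None if unreachable."""
    dist = {origin: 0.0}
    prev = {}
    heap = [(0.0, origin)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for neighbour, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if destination not in dist:
        return None
    path = [destination]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return path[::-1]


def accumulate_flows(graph, od_trips):
    """Assign each OD pair's trips to every link on its shortest path."""
    flows = defaultdict(float)
    for (origin, destination), trips in od_trips.items():
        path = shortest_path(graph, origin, destination)
        if path is None:
            continue  # unroutable pair; a real run should log and count these
        for u, v in zip(path, path[1:]):
            flows[tuple(sorted((u, v)))] += trips  # undirected link key
    return dict(flows)


# Toy undirected network: node -> {neighbour: cost}.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 4.0, "B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
od_trips = {("A", "D"): 10, ("B", "C"): 5, ("A", "Z"): 3}  # "Z" is not in the network

flows = accumulate_flows(graph, od_trips)
print(flows)
```

The per-link totals are what would be validated against AADF on counted minor roads before being offered to Stage 1a as a feature.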

Related: PCT methodology and any recent UK STATS19 + active travel papers from Leeds ITS.

NTS-derived active travel exposure

What it is: Use National Travel Survey per-capita mileage by mode, demographic, and region, combined with LSOA population demographics, to estimate pedestrian-hours and cyclist-km denominators for active travel risk analysis.

Why it’s interesting: The cyclist/pedestrian risk entries in this doc note that motor-vehicle AADT is an imperfect denominator for active travel. Strava Metro is parked on licensing grounds. NTS is open and the methodology of using it as an exposure denominator is established in road safety literature.

Why not now: Resolution caveat — NTS gives regional/demographic averages, not link-specific exposure. So it supports LSOA-level analysis of pedestrian/cyclist risk but not link-grain. Different output grain from the rest of the pipeline; would be a parallel layer not an integrated feature.

Good starting point for someone else:

- Pull NTS data from the gov.uk archive.
- Compute per-capita pedestrian-hours and cyclist-km by demographic.
- Join to LSOA population data to estimate exposure per LSOA.
- Apply as denominator for cyclist/pedestrian collision rate at LSOA grain.
- Compare hotspot identification at LSOA grain to motor-vehicle-AADT denominator approach.
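The exposure estimate is a population-weighted sum, sketched below for cycling. The per-capita distances, age bands, LSOA identifier, and casualty count are all hypothetical placeholders. Real values would come from NTS tables and LSOA population estimates.

```python
# HYPOTHETICAL per-capita annual cycling distance (km per person per year) by age band.
CYCLE_KM_PER_PERSON = {"0-16": 50.0, "17-64": 100.0, "65+": 20.0}


def lsoa_exposure(population_by_group, per_capita):
    """Estimated annual exposure for one LSOA: sum of population * per-capita rate."""
    return sum(count * per_capita[group]
               for group, count in population_by_group.items())


def rate_per_million_km(casualties, exposure_km):
    """Casualties per million km of estimated exposure."""
    if exposure_km <= 0:
        raise ValueError("exposure must be positive")
    return casualties / exposure_km * 1_000_000


# Hypothetical LSOA with a population breakdown by age band.
population = {"LSOA_A": {"0-16": 200, "17-64": 1000, "65+": 300}}
cyclist_casualties = {"LSOA_A": 2}

exposure = lsoa_exposure(population["LSOA_A"], CYCLE_KM_PER_PERSON)
rate = rate_per_million_km(cyclist_casualties["LSOA_A"], exposure)
print(f"exposure: {exposure:.0f} km, rate: {rate:.2f} per million cyclist-km")
```

This attributes exposure to where people live rather than where they cycle, which is one reason the result belongs at LSOA grain and not at link grain.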

Related: Builds on the cyclist/pedestrian risk entries already in this doc.

Junction-density features from MasterMap RoadNode

What it is: Count of road-node intersections within a spatial buffer per link, derived from OS MasterMap Highways Network RoadNode data. Likely a strong predictor for non-motorway collisions where most events are at or near junctions.

Why it’s interesting: Junctions concentrate collision risk on minor and A-roads. Current network features (betweenness, degree_mean) capture this indirectly via graph structure but not directly as junction density. Direct measurement could materially improve Stage 2 predictions on non-motorway links. Standard feature in academic crash modelling work that has access to MasterMap.

Why not now: Requires OS MasterMap Highways access — not in the open licensing tier. Overlaps with the existing RAMI licensing entry but is methodologically distinct (network attributes vs routing extension). Both share the same access constraint.

Good starting point for someone else:

- Resolve MasterMap Highways access via PSGA or equivalent route.
- Compute junction-density-per-km within several buffer radii (250m, 500m, 1km) per link.
- Add as Stage 2 feature; assess feature importance and predictive lift.
- Compare against current betweenness/degree_mean to confirm distinct signal.
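A simplified version of the buffer count is sketched below. It measures distance from the link midpoint instead of buffering the full link geometry, and it normalises by link length, which is only one possible reading of "per km". Coordinates are assumed to be in a metre-based projected system such as British National Grid, and all names and points are hypothetical.

```python
import math


def junction_counts(midpoint, junctions, radii_m=(250, 500, 1000)):
    """Count junction nodes within each radius (metres) of a link midpoint."""
    return {radius: sum(1 for node in junctions if math.dist(midpoint, node) <= radius)
            for radius in radii_m}


def junctions_per_km(midpoint, length_m, junctions, radii_m=(250, 500, 1000)):
    """Junction count within each radius divided by link length in km."""
    if length_m <= 0:
        raise ValueError("link length must be positive")
    length_km = length_m / 1000
    counts = junction_counts(midpoint, junctions, radii_m)
    return {radius: count / length_km for radius, count in counts.items()}


# Toy junction nodes as (easting, northing) in metres, relative to the midpoint.
junctions = [(100, 0), (0, 400), (600, 600), (2000, 0)]
midpoint = (0, 0)

counts = junction_counts(midpoint, junctions)
density = junctions_per_km(midpoint, length_m=500, junctions=junctions)
print(counts)
print(density)
```

At network scale the linear scan would be replaced by a spatial index, but the feature definition stays the same.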

Related: RAMI licensing entry in TODO.md; both blocked by same access question.

Cross-theme note

The ideas above are not independent. Several natural combinations:

- Motorcycle × weather: motorcycle crash risk is known to be highly weather-sensitive. Combining the motorcycle-specific risk layer with weather warning analysis would produce a parallel ranking that’s likely more actionable than either alone.
- HGV enforcement × vehicle-mix: HGV risk distribution combined with vehicle-mix-as-risk-modifier would identify links where HGV enforcement protects not just the HGV crash rate but also other vehicle types affected by HGV presence.
- VRU-weighted × cyclist/pedestrian layers: these are closely related target reframings. The VRU-weighted version produces a single composite ranking; the cyclist/pedestrian layers produce separate rankings per casualty type. Both are defensible depending on stakeholder.

These combinations are documented here rather than as separate entries because each needs the component analyses in place first.