Open Road Risk

Project overview

Repository structure

Path         Purpose
quarto/      Quarto website source.
notebooks/   (Local only; ignored by git) Exploratory analysis and historical development work. Some content is superseded by QMD pages.
src/         Python package and modelling pipeline code.
data/        Data provenance and folder structure (actual data is excluded from git).
docs/        Internal technical notes and data quality documentation.
reports/     Detailed analysis reports and validation summaries.
config/      Project configuration and settings.
tests/       Automated tests for pipeline components.
todo/        Planning notes and active task tracking.

Documentation status

The .qmd pages located in the quarto/ directory serve as the canonical public documentation for this project.

Note that while a notebooks/ directory may exist in the local workspace for exploratory and historical analysis, it is excluded from source control. Internal documentation and historical notes found in docs/ or reports/ provide additional context but should be treated as supporting material. Generated outputs and rendered artifacts are excluded from the main source tree to maintain a clean repository.

What the pipeline does

The project is organised in three main modelling stages:

Stage 1a: Traffic exposure estimation. Traffic counts from AADF (Annual Average Daily Flow) count points are used to train a model that estimates AADT (Annual Average Daily Traffic) for road links without direct counts.
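
The Stage 1a idea is to fit on counted links and then predict for uncounted ones. The sketch below makes this concrete with a deliberately minimal baseline: the geometric-mean AADT of counted links in each road class. Using road class as the only predictor is an assumption, as are the class labels and counts. This is not the project's Stage 1a model, whose form and features are not described on this page.

```python
import math
from collections import defaultdict

# Hypothetical counted links as (road_class, observed AADT) pairs.
# Labels and values are illustrative only, not project data.
COUNTED_LINKS = [
    ("A Road", 12000),
    ("A Road", 18000),
    ("B Road", 4000),
    ("B Road", 6000),
    ("Minor", 800),
    ("Minor", 1200),
]


def fit_class_baseline(samples):
    """Return the geometric-mean AADT for each road class.

    This is the least-squares fit of log(AADT) on road-class indicator
    variables, i.e. a very simple log-linear regression.
    """
    log_values = defaultdict(list)
    for road_class, aadt in samples:
        log_values[road_class].append(math.log(aadt))
    return {
        road_class: math.exp(sum(logs) / len(logs))
        for road_class, logs in log_values.items()
    }


def predict_aadt(model, road_class, fallback=None):
    """Predict AADT for an uncounted link from its road class alone."""
    return model.get(road_class, fallback)


model = fit_class_baseline(COUNTED_LINKS)
print(round(predict_aadt(model, "B Road")))  # geometric mean of 4000 and 6000
```

Fitting on the log scale keeps a few very busy links from dominating the fit, because traffic flows are strongly right-skewed. A fuller model would add further link-level predictors.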

Stage 1b: Time-zone profiles. WebTRIS sensor data provides supporting information on within-day traffic structure and vehicle mix on major roads.
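
One simple way to represent a time-zone profile is as the share of daily flow that falls in each part of the day. Those shares can then be used to split a link's AADT across the day. The sketch below assumes sensor data has already been aggregated to 24 hourly totals for a representative day. The band names, band boundaries and counts are hypothetical. Vehicle-mix shares could be profiled in the same way.

```python
# Hypothetical time-zone bands as (name, start_hour, end_hour), end exclusive.
# These boundaries are an assumption for illustration, not the project's bands.
TIME_ZONES = [
    ("night", 0, 6),
    ("am_peak", 6, 10),
    ("interpeak", 10, 16),
    ("pm_peak", 16, 19),
    ("evening", 19, 24),
]


def time_zone_shares(hourly_counts):
    """Turn 24 hourly counts for one sensor into each band's share of daily flow."""
    if len(hourly_counts) != 24:
        raise ValueError("expected 24 hourly counts")
    total = sum(hourly_counts)
    if total <= 0:
        raise ValueError("daily total must be positive")
    return {
        name: sum(hourly_counts[start:end]) / total
        for name, start, end in TIME_ZONES
    }


def split_aadt(aadt, shares):
    """Distribute a link's daily AADT across time zones using a profile."""
    return {name: aadt * share for name, share in shares.items()}


# Toy hourly profile with morning and evening peaks (hypothetical values).
hourly = [50] * 6 + [300] * 4 + [150] * 6 + [300] * 3 + [100] * 5
shares = time_zone_shares(hourly)
print(split_aadt(4800, shares))
```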

Stage 2: Collision risk modelling. Collision outcomes from STATS19 are modelled against exposure, road class, network structure, and contextual features to estimate relative road risk. The result is a network-wide risk layer that can be used to identify unusually risky links, compare corridors, and support downstream applications.
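
A standard way to model collision counts against exposure is a Poisson or negative binomial regression with log exposure as an offset. With an offset, expected collisions scale with vehicle-kilometres travelled. The sketch below assumes a Poisson model with road class as the only covariate, which has a closed-form fit. The link records, field names and five-year period are hypothetical. The project's Stage 2 model also uses network-structure and contextual features that are not shown here.

```python
from collections import defaultdict

DAYS_PER_YEAR = 365


def vehicle_km(aadt, length_km, years):
    """Exposure in vehicle-kilometres: daily flow x length x days x years."""
    return aadt * length_km * DAYS_PER_YEAR * years


def fit_poisson_class_rates(links):
    """Fit collisions ~ Poisson(rate[road_class] * exposure).

    With road class as the only covariate and log(exposure) as an offset,
    the maximum-likelihood rate for each class is the class's total
    collisions divided by its total exposure.
    """
    collisions = defaultdict(float)
    exposure = defaultdict(float)
    for link in links:
        collisions[link["road_class"]] += link["collisions"]
        exposure[link["road_class"]] += link["exposure"]
    return {c: collisions[c] / exposure[c] for c in collisions}


def expected_collisions(rates, link):
    """Model-expected collisions on one link: class rate x link exposure."""
    return rates[link["road_class"]] * link["exposure"]


# Hypothetical links observed over a 5-year period.
links = [
    {"id": "L1", "road_class": "A Road", "collisions": 6,
     "exposure": vehicle_km(10000, 2.0, 5)},
    {"id": "L2", "road_class": "A Road", "collisions": 2,
     "exposure": vehicle_km(10000, 2.0, 5)},
    {"id": "L3", "road_class": "Minor", "collisions": 1,
     "exposure": vehicle_km(1000, 1.0, 5)},
]
rates = fit_poisson_class_rates(links)
for link in links:
    print(link["id"], link["collisions"], round(expected_collisions(rates, link), 2))
```

Links whose observed count sits well above the expected value (L1 here) are candidates for an "unusually risky" flag. Raw observed-minus-expected differences are noisy on low-count links, however.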

Data sources

This project currently draws on the following core sources:

Dataset               Provider                  Role
STATS19               Department for Transport  Reported road collisions and casualty context
AADF                  Department for Transport  Observed traffic counts at count points
WebTRIS               National Highways         Measured traffic and vehicle-mix context on major roads
OS Open Roads         Ordnance Survey           Road link geometry and classifications
OpenStreetMap         OSM contributors          Supplementary road attributes
MRDB                  DfT / OS                  Major-road reference network
LSOA population data  ONS                       Population-density and contextual features
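
The roles in the table imply a spatial join between datasets. STATS19 collision records carry grid-reference coordinates, while OS Open Roads supplies link geometry. A minimal nearest-link snap is sketched below under these assumptions:

  • coordinates are already in a projected system measured in metres, such as British National Grid
  • link IDs and the optional distance threshold are hypothetical
  • every link is checked by brute force

A full-network join would normally use a spatial index. The project's own matching rules are not described on this page.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB, in the coordinates' units."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project P onto the infinite line through A and B, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_link(point, links, max_dist=None):
    """Return (link_id, distance) of the closest link polyline, or None.

    `links` maps a link ID to a list of at least two (x, y) vertices.
    Returns None if no link lies within `max_dist` (when given).
    """
    px, py = point
    best = None
    for link_id, coords in links.items():
        dist = min(
            point_segment_distance(px, py, *a, *b)
            for a, b in zip(coords, coords[1:])
        )
        if best is None or dist < best[1]:
            best = (link_id, dist)
    if best is None or (max_dist is not None and best[1] > max_dist):
        return None
    return best


# Hypothetical links and a collision location, in metres on a projected grid.
links = {
    "link_a": [(0.0, 0.0), (100.0, 0.0)],
    "link_b": [(0.0, 50.0), (60.0, 50.0), (100.0, 80.0)],
}
print(nearest_link((30.0, 10.0), links))
print(nearest_link((30.0, 10.0), links, max_dist=5.0))
```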

Main outputs

The pipeline is designed to produce:

  • estimated traffic exposure for uncounted roads
  • link-level risk scores across the network
  • residual or excess-risk views showing where observed risk is higher than expected
  • corridor- and area-level summaries for applied use cases
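
Excess-risk views are often built with the Empirical Bayes (EB) method from road safety analysis. Each link's observed count is shrunk towards its model-expected count, and the difference between the EB estimate and the expectation is read as excess risk. The sketch assumes expected counts come from a negative binomial model with overdispersion parameter k. The link data and the value of k are hypothetical. This illustrates the general technique, not the project's exact calculation.

```python
def eb_estimate(observed, expected, k):
    """Empirical Bayes estimate of a link's expected collision count.

    Assumes a negative binomial model with Var = mu + k * mu**2, where mu is
    the model-expected count and k > 0 is the overdispersion parameter.
    The weight w pulls the observed count towards the model expectation;
    w is larger (more shrinkage) when the expected count is small.
    """
    w = 1.0 / (1.0 + k * expected)
    return w * expected + (1.0 - w) * observed


def excess_risk(observed, expected, k):
    """Excess collisions: EB estimate minus the model-expected count."""
    return eb_estimate(observed, expected, k) - expected


# Hypothetical (link_id, observed, expected) triples and overdispersion k.
LINKS = [("L1", 6, 4.0), ("L2", 2, 4.0), ("L3", 1, 1.0)]
K = 0.5

ranked = sorted(LINKS, key=lambda t: excess_risk(t[1], t[2], K), reverse=True)
for link_id, observed, expected in ranked:
    print(link_id, round(excess_risk(observed, expected, K), 2))
```

Summing per-link excess over links that share a corridor or area identifier is one way to roll these values up into corridor- and area-level summaries.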

Current scope

The project began as a Yorkshire pilot and is being used to test whether an open-data workflow can support full-network safety performance modelling. It is intended as:

  • a methodological prototype
  • a decision-support and analysis tool
  • a basis for more focused applications such as corridor screening and local safety prioritisation

It is not:

  • a real-time traffic management system
  • a causal intervention model
  • a definitive national risk product without wider validation

Known limitations

Important limitations include:

  • STATS19 reflects reported collisions, not all collisions
  • direct traffic counts are sparse outside major roads
  • WebTRIS covers only the National Highways network
  • some road attributes (e.g. speed limits, lanes, lighting) are incomplete
  • risk estimates are only as good as the joins and assumptions behind them

These are discussed in more detail throughout the site.

Site guide

The site is organised around the logic of the pipeline:

  • Data Sources — what each dataset contains and what it can and cannot tell us
  • Methodology — how sources are joined and transformed into modelling inputs
  • Analysis — model behaviour, outputs, and exploratory evaluation
  • Future Work — research questions and extensions the pipeline can support
  • API Reference — code structure and module-level documentation

A good place to start is the source pages for STATS19, AADF, and WebTRIS, followed by the methodology pages on joining and feature engineering. For possible extensions, Future Work collects research questions that are not in the active backlog but are natural next uses of the same road-link, exposure, and collision-risk infrastructure.

Built with Quarto