Open Road Risk Project Overview

Repository structure

Path	Purpose
`quarto/`	Quarto website source.
`notebooks/`	(Local only; ignored by git) Exploratory analysis and historical development work. Some content is superseded by QMD pages.
`src/`	Python package and modelling pipeline code.
`data/`	Data provenance and folder structure (actual data is excluded from git).
`docs/`	Internal technical notes and data quality documentation.
`reports/`	Detailed analysis reports and validation summaries.
`config/`	Project configuration and settings.
`tests/`	Automated tests for pipeline components.
`todo/`	Planning notes and active task tracking.

Documentation status

The .qmd pages in the quarto/ directory are the canonical public documentation for this project.

A notebooks/ directory may exist in a local workspace for exploratory and historical analysis, but it is excluded from source control. Internal documentation and historical notes in docs/ and reports/ provide additional context and should be treated as supporting material. Generated outputs and rendered artifacts are kept out of the main source tree to keep the repository clean.

What the pipeline does

The project is organised in three main modelling stages:

Stage 1a: Traffic exposure estimation. AADF (annual average daily flow) traffic counts are used to train a model that estimates AADT (annual average daily traffic) for road links without direct counts.

Stage 1b: Time-zone profiles. WebTRIS sensor data provides supporting information on within-day traffic structure and vehicle mix on major roads.

Stage 2: Collision risk modelling. Collision outcomes from STATS19 are modelled against exposure, road class, network structure, and contextual features to estimate relative road risk. The result is a network-wide risk layer that can be used to identify unusually risky links, compare corridors, and support downstream applications.
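To illustrate why exposure matters in Stage 2, the sketch below normalises collision counts by vehicle-kilometres travelled. It is a simplified illustration under stated assumptions, not the project's model: the `Link` fields, the function names, and the study-period length are all hypothetical, and the real pipeline uses a fitted model with many more features.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical road-link record; field names are assumptions for this sketch."""
    link_id: str
    length_km: float
    aadt: float        # counted or Stage 1a-estimated annual average daily traffic
    collisions: int    # reported collisions on the link over the study period


def exposure_mvkm(link: Link, years: float) -> float:
    """Exposure in million vehicle-kilometres over the study period."""
    return link.aadt * 365 * years * link.length_km / 1e6


def collision_rate(link: Link, years: float) -> float:
    """Reported collisions per million vehicle-kilometres (NaN if no exposure)."""
    exposure = exposure_mvkm(link, years)
    return link.collisions / exposure if exposure > 0 else float("nan")


# A 2 km link carrying 10,000 vehicles a day for 5 years
demo = Link(link_id="demo", length_km=2.0, aadt=10_000.0, collisions=5)
print(exposure_mvkm(demo, years=5))             # 36.5 million vehicle-km
print(round(collision_rate(demo, years=5), 3))  # 0.137
```

A raw collision count would rank busy links as risky simply because they carry more traffic; dividing by exposure is the simplest way to separate risk from volume, which is why the AADT estimates from Stage 1a feed directly into Stage 2.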

Data sources

This project currently draws on the following core sources:

Dataset	Provider	Role
STATS19	Department for Transport	reported road collisions and casualty context
AADF	Department for Transport	observed traffic counts at count points
WebTRIS	National Highways	measured traffic and vehicle-mix context on major roads
OS Open Roads	Ordnance Survey	road link geometry and classifications
OpenStreetMap	OSM contributors	supplementary road attributes
OS Terrain 50	Ordnance Survey	elevation-derived grade features
LSOA population estimates	ONS	population-density features
2021 Rural-Urban Classification	ONS	urban/rural context at LSOA level
English Indices of Deprivation 2025	MHCLG	deprivation context at LSOA level

Main outputs

The pipeline is designed to produce:

estimated traffic exposure for uncounted roads
link-level risk scores across the network
residual or excess-risk views showing where observed risk is higher than expected
corridor- and area-level summaries for applied use cases
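
As a rough illustration of how an excess-risk view and a corridor summary relate, the sketch below compares observed collisions with an expected count and aggregates the difference by corridor. The expected count here comes from a single hypothetical network-wide rate; in practice it would come from the Stage 2 model. The function name, tuple layout, and corridor labels are assumptions made for the example.

```python
from collections import defaultdict


def excess_by_corridor(links, network_rate):
    """Sum observed and expected collisions per corridor and report the excess.

    links: iterable of (corridor, observed_collisions, exposure) tuples,
           where exposure uses the same units as network_rate's denominator.
    network_rate: assumed expected collisions per unit of exposure.
    """
    totals = defaultdict(lambda: {"observed": 0.0, "expected": 0.0})
    for corridor, observed, exposure in links:
        totals[corridor]["observed"] += observed
        totals[corridor]["expected"] += network_rate * exposure
    return {
        corridor: {**t, "excess": t["observed"] - t["expected"]}
        for corridor, t in totals.items()
    }


# Illustrative values only, not project data
links = [
    ("corridor_a", 4, 10.0),
    ("corridor_a", 2, 10.0),
    ("corridor_b", 1, 20.0),
]
summary = excess_by_corridor(links, network_rate=0.2)
for corridor, row in summary.items():
    print(corridor, row)
```

A positive excess marks a corridor where observed collisions exceed what the expected rate implies, and a negative excess marks the opposite. With small counts these differences are noisy, so they are better read as a screening signal than as a ranking.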

Current scope

The project began as a Yorkshire pilot and is being used to test whether an open-data workflow can support full-network safety performance modelling. It is intended as:

- a methodological prototype
- a decision-support and analysis tool
- a basis for more focused applications such as corridor screening and local safety prioritisation

It is not:

- a real-time traffic management system
- a causal intervention model
- a definitive national risk product without wider validation

Known limitations

Important limitations include:

STATS19 reflects reported collisions, not all collisions
direct traffic counts are sparse outside major roads
WebTRIS only covers the National Highways network
some road attributes (e.g. speed limits, lanes, lighting) are incomplete
risk estimates are only as good as the joins and assumptions behind them

These are discussed in more detail throughout the site.

Site guide

The site is organised around the logic of the pipeline:

Data Sources — what each dataset contains and what it can and cannot tell us
Methodology — how sources are joined and transformed into modelling inputs
Analysis — model behaviour, outputs, and exploratory evaluation
Future Work — research questions and extensions the pipeline can support

A good place to start is with the source pages for STATS19 collision data, AADF traffic counts, and WebTRIS sensor data, then move to the methodology pages on joining road safety datasets and feature engineering. For possible extensions, Future Work collects research questions that are not in the active backlog but are natural next uses of the same road-link, exposure, and collision-risk infrastructure.