Stage 1a estimates total daily traffic (AADT) for every road link. This is sufficient for computing collision rates — collisions per million vehicle-kilometres — but it treats all hours of the day as equivalent. In practice, risk varies significantly by time of day: peak-hour traffic is denser, faster, and involves more interactions than overnight flow.
Stage 1b estimates the shape of the daily traffic profile for every road link: what fraction of daily traffic flows in each time band? Combined with the Stage 1a AADT estimate, this gives per-hour flow rates for any road in the network.
2 Time bands
WebTRIS annual reports provide cumulative flow totals for four nested windows of 12, 16, 18 and 24 hours. Differencing these gives four non-overlapping time bands:
| Code label | True period | Hours |
|---|---|---|
| core_daytime_frac | 07:00–18:59 | 12 |
| shoulder_frac | 06–07 + 19–22 (mixed shoulder) | 4 |
| late_evening_frac | 22:00–24:00 | 2 |
| overnight_frac | 00:00–06:00 | 6 |
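The model predicts the fraction of daily traffic in each band. Absolute per-hour flow is then reconstructed by scaling the Stage 1a AADT by the band fraction and dividing by the band length in hours: flow_ph_band = AADT × band_frac / band_hours.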
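As a minimal sketch of that reconstruction, the snippet below spreads an AADT value across the four bands. The function name `per_hour_flows` and the example numbers are illustrative assumptions, not taken from the pipeline; the band lengths come from the table above and the output keys follow the `flow_ph_*` column naming used later in this report.

```python
# Band lengths in hours, taken from the time-band table.
BAND_HOURS = {
    "core_daytime": 12,
    "shoulder": 4,
    "late_evening": 2,
    "overnight": 6,
}


def per_hour_flows(aadt, fractions):
    """Return vehicles/hour per band as AADT * fraction / band length.

    `fractions` maps band name -> predicted fraction of daily traffic.
    Hypothetical helper for illustration only.
    """
    return {
        f"flow_ph_{band}": aadt * fractions[band] / hours
        for band, hours in BAND_HOURS.items()
    }


# Illustrative inputs only (not model output).
example = per_hour_flows(
    aadt=20_000,
    fractions={
        "core_daytime": 0.78,
        "shoulder": 0.12,
        "late_evening": 0.04,
        "overnight": 0.06,
    },
)
for name, value in example.items():
    print(f"{name}: {value:.0f} veh/h")
```

Because each fraction is predicted by a separate model, the four predictions need not sum exactly to 1; whether the pipeline renormalises them before this step is not stated here.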
```python
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from road_risk.config import _ROOT as ROOT

webtris = pd.read_parquet(ROOT / "data" / "processed" / "webtris" / "webtris_clean.parquet")

profiles_path = ROOT / "data" / "models" / "timezone_profiles.parquet"
profiles = pd.read_parquet(profiles_path) if profiles_path.exists() else pd.DataFrame()

road = pd.read_parquet(ROOT / "data" / "features" / "road_link_annual.parquet")
road_class_col = next(
    (c for c in ["road_classification", "road_type"] if c in road.columns), None
)

print(
    f"WebTRIS training sites : {webtris['site_id'].nunique():,} sites × "
    f"{webtris['year'].nunique()} years = {len(webtris):,} rows"
)
print(
    f"Profile estimates : {len(profiles):,} rows"
    if not profiles.empty
    else "Profile estimates : not found — run --stage profile"
)
```
WebTRIS training sites : 5,948 sites × 3 years = 15,011 rows
Profile estimates : 21,675,570 rows
4 Training data: WebTRIS time-zone fractions
The profile model is trained on WebTRIS sensors — National Highways sites on motorways and major A-roads. For each site×year, the four time-band fractions are computed from the measured flow windows and used as prediction targets.
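The differencing step can be sketched as follows. The window definitions (12h = 07:00–19:00, 16h = 06:00–22:00, 18h = 06:00–24:00, 24h = full day) are an assumption inferred from the band table rather than quoted from the WebTRIS documentation, and the function and argument names are hypothetical.

```python
def band_fractions(flow_12h, flow_16h, flow_18h, flow_24h):
    """Difference nested window totals into four band fractions.

    Assumed windows (inferred from the band table):
      12h = 07:00-19:00, 16h = 06:00-22:00,
      18h = 06:00-24:00, 24h = full day.
    """
    if not 0 < flow_12h <= flow_16h <= flow_18h <= flow_24h:
        raise ValueError("window totals must be positive and nested")
    return {
        "core_daytime_frac": flow_12h / flow_24h,
        "shoulder_frac": (flow_16h - flow_12h) / flow_24h,
        "late_evening_frac": (flow_18h - flow_16h) / flow_24h,
        "overnight_frac": (flow_24h - flow_18h) / flow_24h,
    }


# Illustrative totals for one site-year (not real WebTRIS data).
site_fracs = band_fractions(39_000, 45_000, 47_000, 50_000)
for name, value in site_fracs.items():
    print(f"{name}: {value:.2f}")
```

By construction the four fractions sum to 1 for every measured site-year, which is what makes them usable as a profile shape independent of traffic volume.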
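```python
# `wt` is assumed to be the WebTRIS site-year frame with the band fractions
# and core_overnight_ratio already derived (built in a cell not shown here).
print("Peak–offpeak ratio (measured sites):")
print(
    wt["core_overnight_ratio"]
    .describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.9])
    .round(2)
    .to_string()
)

plt.figure(figsize=(7, 4))
# Clip the long right tail so the histogram stays readable.
plt.hist(wt["core_overnight_ratio"].clip(0, 30), bins=50, edgecolor="none")
plt.xlabel("Peak flow/hr ÷ offpeak flow/hr")
plt.ylabel("Count")
plt.title("Peak–offpeak ratio distribution (WebTRIS sites)")
plt.axvline(wt["core_overnight_ratio"].median(), color="red", linestyle="--", linewidth=1)
plt.tight_layout()
plt.show()
```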
Peak–offpeak ratio (measured sites):
count 15003.00
mean 8.00
std 3.74
min 1.07
10% 4.81
25% 5.68
50% 7.07
75% 9.14
90% 12.44
max 91.56
5 Model performance
Five models are trained (one per target) using HistGradientBoostingRegressor with GroupKFold cross-validation grouped by site_id, preventing the same sensor from appearing in both train and validation folds.
Features: location (lat/lon), log(AADT), network betweenness, distance to major road, population density, year, COVID flag.
Note
| Target | CV R² | Interpretation |
|---|---|---|
| core_daytime_frac | ~0.65 | Strong: peak shape driven by road function and location |
| shoulder_frac | ~0.63 | Strong: evening pattern correlates with urban density |
| overnight_frac | ~0.53 | Moderate: night freight patterns are more variable |
| late_evening_frac | ~0.46 | Weakest: short transition band with mixed signal |
| hgv_core_daytime_frac | ~0.63 | Strong: HGV peak timing is structurally driven |
The MAE for all fraction targets is < 0.03, meaning the model predicts daily traffic fractions to within about ±3 percentage points on average.
6 Estimated profiles across the full network
```python
# `fracs` is assumed to be the list of the four band-fraction column names,
# defined in a cell not shown here.
if not profiles.empty:
    latest = profiles[profiles["year"] == profiles["year"].max()].copy()
    print(f"Year {latest['year'].iloc[0]} — {len(latest):,} link estimates\n")
    frac_cols = [c for c in fracs if c in latest.columns]
    print(
        latest[frac_cols + ["core_overnight_ratio"]]
        .describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.9])
        .round(3)
        .to_string()
    )
```
```python
if not profiles.empty:
    fig, axes = plt.subplots(2, 2, figsize=(10, 7))
    titles = ["Peak (12h)", "Pre-peak (4h)", "Pre-offpeak (2h)", "Offpeak (6h)"]
    for ax, col, title in zip(axes.flat, fracs, titles):
        if col in latest.columns:
            ax.hist(latest[col].dropna(), bins=50, edgecolor="none", alpha=0.7,
                    label="All links (estimated)")
        if col in wt.columns:
            ax.hist(wt[col].dropna(), bins=50, edgecolor="none", alpha=0.5,
                    label="WebTRIS measured")
        ax.set_title(title)
        ax.set_xlabel("Fraction of daily traffic")
        ax.legend(fontsize=8)
    plt.suptitle("Estimated fractions — full network vs measured WebTRIS", y=1.01)
    plt.tight_layout()
    plt.show()
```
7 Profile variation by road class
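Roads with different functions show distinct daily profiles. In the estimates below, motorways have the flattest profiles, with the lowest core-daytime fraction and the highest overnight fraction (consistent with overnight freight), while minor and unclassified roads concentrate more of their traffic in the core daytime band. The model captures this through road classification and network betweenness.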
Median time-zone fractions by road class:
core_daytime_frac overnight_frac core_overnight_ratio
road_classification
Classified Unnumbered 0.814 0.044 9.112
Not Classified 0.813 0.045 9.045
Unknown 0.810 0.045 8.947
Unclassified 0.806 0.045 8.932
B Road 0.802 0.048 8.384
A Road 0.793 0.050 7.945
Motorway 0.782 0.056 6.975
8 Reconstructed per-hour flows
Using Stage 1a AADT estimates and Stage 1b fractions, per-hour flow rates can be computed for any link in the network.
```python
if not profiles.empty:
    flow_cols = [
        c
        for c in [
            "flow_ph_core_daytime",
            "flow_ph_shoulder",
            "flow_ph_late_evening",
            "flow_ph_overnight",
        ]
        if c in profiles.columns
    ]
    if flow_cols:
        sample_yr = profiles[profiles["year"] == profiles["year"].max()]
        print("Per-hour flow estimates — full network (vehicles/hour):\n")
        print(
            sample_yr[flow_cols]
            .describe(percentiles=[0.25, 0.5, 0.75, 0.9, 0.99])
            .round(1)
            .to_string()
        )
```
The current production collision model uses total AADT as its exposure denominator. The per-hour flow rates from Stage 1b are produced for diagnostics and future temporal exposure weighting. A future Stage 2 model could use them to distinguish:
Roads where most collisions occur during peak hours (high peak exposure)
Roads with disproportionate night-time collisions (relative to low offpeak flow)
The core_overnight_ratio is also a candidate model feature, capturing whether a road functions primarily as a commuter route or serves mixed/freight traffic.
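For reference, a sketch of how such a ratio can be derived from the fractions. The definition used here (per-hour core-daytime flow divided by per-hour overnight flow) is an assumption based on the histogram axis label "Peak flow/hr ÷ offpeak flow/hr"; the function name is hypothetical.

```python
def core_overnight_ratio(core_daytime_frac, overnight_frac,
                         core_hours=12, overnight_hours=6):
    """Per-hour core-daytime flow divided by per-hour overnight flow.

    AADT cancels out, so only the fractions and band lengths are needed.
    """
    core_per_hour = core_daytime_frac / core_hours
    overnight_per_hour = overnight_frac / overnight_hours
    return core_per_hour / overnight_per_hour


# Motorway medians from the road-class table: 0.782 and 0.056.
ratio = core_overnight_ratio(0.782, 0.056)
print(f"{ratio:.2f}")  # 6.98
```

This is close to the 6.975 reported for motorways in the road-class table; a small gap is expected because that table reports the median of per-link ratios, not the ratio of median fractions.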
WebTRIS training data is concentrated on motorways and major A-roads. Profiles for unclassified roads are extrapolated from location and network position rather than directly observed — the model assumes that road function and urban context are the dominant drivers of profile shape.
The late-evening band (22:00–24:00) has the weakest CV R² (~0.46) due to its short duration and mixed signal. Predictions for this band should be interpreted cautiously.
Year-on-year variation in profiles (e.g. COVID impact on commuting patterns) is captured through the year_norm and is_covid features but may not fully reflect rapid behavioural shifts.