AADF Counted-Only Filter

Adopted; Estimated-weighted alternative rejected as methodologically arbitrary.

Should Stage 1a train on directly-counted AADF rows only, or keep DfT Estimated rows with a downweighting scheme?

Decision register entry: 2026-04-19 — AADF counted-only filter — adopted; weighted-Estimated alternative rejected

Question

Should Stage 1a (AADT estimation) train on directly-counted AADF rows only, or should DfT Estimated rows be retained with a downweighting scheme?

Method

The comparison treated AADF provenance as part of the Stage 1a target definition. The pre-filter target mixed directly counted AADF observations with DfT Estimated rows. The tested filter restricted Stage 1a training and holdout validation to count points with at least one estimation_method == "Counted" observation in 2015-2024. The filtered and unfiltered targets were then compared on:

GroupKFold cross-validation performance;
local holdout performance;
spatial holdout performance;
the count-point coverage removed by the filter;
road-class and regional skew in the removed points.
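A minimal sketch of the point-level filter described above, in plain Python. The estimation_method field and the "Counted" label come from this entry. The count_point_id and year keys, the row-as-dict layout and the function names are assumptions for illustration. The sketch implements the rule as stated: keep every row at a count point with at least one Counted observation in 2015-2024. The drop_estimated_rows flag is hypothetical and shows the stricter row-level reading.

```python
# Sketch only: the "count_point_id" and "year" keys are assumptions;
# "estimation_method" and the "Counted" label come from the decision entry.
TRAINING_YEARS = range(2015, 2025)  # 2015-2024 inclusive


def counted_point_ids(rows, years=TRAINING_YEARS):
    """IDs of count points with at least one Counted observation in `years`."""
    return {
        row["count_point_id"]
        for row in rows
        if row["year"] in years and row["estimation_method"] == "Counted"
    }


def filter_counted_only(rows, years=TRAINING_YEARS, drop_estimated_rows=False):
    """Keep rows at count points that pass the Counted test.

    drop_estimated_rows (hypothetical) additionally removes Estimated rows
    at the kept points, i.e. the stricter row-level reading.
    """
    keep = counted_point_ids(rows, years)
    kept = [row for row in rows if row["count_point_id"] in keep]
    if drop_estimated_rows:
        kept = [row for row in kept if row["estimation_method"] == "Counted"]
    return kept
```

The same filter would be applied to the training rows and to the holdout rows, so that neither side of the validation contains always-Estimated points.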

The rejected alternative was to keep DfT Estimated rows but downweight them. That would preserve more count-point locations, but DfT does not publish uncertainty weights for its interpolation scheme, so the weighting would be a project judgement rather than an observed reliability measure.
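To make the objection concrete, a sketch of where a downweighting scheme would enter the training loss. The estimated_weight parameter and both functions are hypothetical. The point is that the weight's value has to be chosen by the project, because DfT publishes no uncertainty measure from which to derive it.

```python
def sample_weights(methods, estimated_weight):
    """Per-row weights: Counted rows get 1.0, Estimated rows a chosen constant.

    estimated_weight is a hypothetical, project-chosen value; nothing
    published by DfT pins it down.
    """
    return [1.0 if m == "Counted" else estimated_weight for m in methods]


def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error, the kind of objective a weighted fit minimises."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)) / total
```

Setting estimated_weight to 1.0 reproduces unweighted training on the mixed target. Every other value gives a different objective. Many scikit-learn estimators would accept such values through the sample_weight argument of fit, but the library cannot supply a justified value.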

Result

Counted-only training was adopted. The filter drops the 1,288 of 14,193 count points (9.1%) that are Estimated in every year of the 2015-2024 training window, leaving 12,905 training count points with at least one direct count.
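The coverage figures reconcile as follows:

```latex
\begin{aligned}
14{,}193 - 1{,}288 &= 12{,}905 \text{ retained count points} \\
\frac{1{,}288}{14{,}193} &\approx 0.0907 \approx 9.1\%
\end{aligned}
```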

Measure	Before filter	Counted-only
GroupKFold CV R²	~0.72	~0.83
Local holdout R²	0.776	0.832
Spatial holdout R²	0.707	0.788
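The GroupKFold row reports grouped cross-validation. Below is a stdlib-only sketch of the idea, assuming the groups are count points, so that rows from one count point never appear in both a training fold and its validation fold. It is a simplified stand-in that broadly mirrors scikit-learn's GroupKFold (largest groups assigned first to the lightest fold). It is not the project's actual configuration, and n_splits=5 is an assumption.

```python
from collections import Counter


def group_kfold(groups, n_splits=5):
    """Yield (train_idx, val_idx) pairs with no group on both sides.

    Groups are assigned, largest first, to whichever fold currently
    holds the fewest rows, so folds stay roughly balanced in size.
    """
    sizes = Counter(groups)
    if len(sizes) < n_splits:
        raise ValueError("need at least n_splits distinct groups")
    fold_of = {}
    fold_sizes = [0] * n_splits
    for group, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        fold = fold_sizes.index(min(fold_sizes))
        fold_of[group] = fold
        fold_sizes[fold] += size
    for k in range(n_splits):
        val = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        yield train, val
```

Grouping matters here because a count point typically contributes several years of rows. An ungrouped split would let the model see the same location in training and validation, which would tend to inflate CV R².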

The improvement should be read as the effect of a cleaner, more honest validation target, not as proof that the model became intrinsically stronger: DfT’s smoothed carry-forward estimates no longer contaminate the learning signal or the holdout target. Log-MAE is slightly higher on the counted-only holdout. This is expected, because direct measurements retain real traffic variance that DfT-estimated rows partly smooth away.
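One way to see how R² can rise while log-MAE also rises. This assumes both metrics are computed on the same holdout target y_i (log AADT, in the case of log-MAE), with predictions \hat{y}_i and holdout mean \bar{y}:

```latex
\begin{aligned}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
\text{log-MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\end{aligned}
```

Only R² is normalised by the spread of the target. If the counted-only holdout carries more genuine variance, the denominator grows, so R² can increase even when the residuals, and with them log-MAE, grow slightly. This is an illustrative reading of the two metrics, not a decomposition of the reported numbers.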

The removed points are not completely neutral. The filter drops 11.2% of Major-road count points versus 4.6% of Minor-road count points, so the counted-only training set is modestly less representative of Major-road conditions than a raw AADF training set would be. Regional loss is broadly uniform. Wales is the exception at 17%, but on a small sample.
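The 11.2% and 4.6% figures are read here as within-class drop shares, consistent with the Limitations note on Major-road representativeness. The sketch below shows that calculation. The mapping from count-point ID to road class or region, and the function name, are assumptions for illustration.

```python
from collections import Counter


def drop_share_by_group(point_group, dropped_ids):
    """Share of count points dropped within each group (road class or region).

    point_group maps count-point ID -> group label (hypothetical layout);
    dropped_ids is the set of count points removed by the filter.
    """
    totals = Counter(point_group.values())
    dropped = Counter(point_group[pid] for pid in dropped_ids if pid in point_group)
    return {group: dropped[group] / n for group, n in totals.items()}
```

Computing the share within each class, rather than each class's share of the dropped set, is what makes the Major and Minor figures comparable despite different class sizes.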

The filter affects only the learning signal. The fitted Stage 1a model is still applied to every OS Open Roads link for every AADF year, so downstream exposure coverage remains unchanged.

Limitations

Direct counts are noisier than DfT-estimated rows, so counted-only validation is a harder target even though it is methodologically cleaner.
The filter removes a larger share of Major-road count points than Minor-road count points, leaving the counted-only target modestly less representative of raw AADF Major-road conditions.
Wales has a higher Estimated-only share, but on a small sample. This was treated as consistent with an edge-of-study-area effect and was not investigated further.
The decision does not add uncertainty propagation from Stage 1a AADT estimates into Stage 2 collision modelling.
Reopen if DfT publishes methodology that makes the Counted/Estimated distinction less load-bearing, or if a specific deliverable requires coverage on roads where only Estimated AADF exists.
