Literature Evidence Register

Note

This is a structured, maintainer-facing evidence register for tracing extracted literature evidence into documentation, model checks, and TODOs. It is not the final narrative literature review.

Purpose and scope

This is a maintainable evidence register for Open Road Risk.

It tracks extracted papers and source files.
It records methodological relevance to Open Road Risk.
It separates current repo relevance from future research relevance.
It records provenance and extraction quality.
It supports future Quarto literature pages, repo TODOs, and model evaluation.
It should be updated append-only when new paper extractions are added.

The main source of truth is the existing extraction Markdown in literature/papers_summary/. This register does not re-read source PDFs and does not infer beyond the extraction files. Where extractions are duplicated for the same paper, each extraction file is kept as its own row so provenance across AI tools/models is preserved.
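Because duplicate extractions are kept as separate rows, it can help to list which register IDs refer to the same paper. The following is a minimal sketch, not part of the repo: it assumes each inventory row has been loaded as a dict keyed by the column names, and the function name `group_duplicate_extractions` is hypothetical. Grouping on a normalised `paper_title` is an assumption made here because `source_pdf_filename` can differ slightly between extractions of the same paper.

```python
from collections import defaultdict


def group_duplicate_extractions(rows):
    """Map each normalised paper title to the register IDs that extract it.

    Only titles with more than one extraction row are returned.
    """
    groups = defaultdict(list)
    for row in rows:
        # Lower-case and collapse whitespace so trivial differences do not split a group.
        key = " ".join(row["paper_title"].lower().split())
        groups[key].append(row["register_id"])
    return {title: ids for title, ids in groups.items() if len(ids) > 1}


# Three rows reduced to the two fields the sketch needs.
rows = [
    {"register_id": "LIT-006",
     "paper_title": "The balanced accuracy and its posterior distribution"},
    {"register_id": "LIT-007",
     "paper_title": "The balanced accuracy and its posterior distribution"},
    {"register_id": "LIT-015",
     "paper_title": "Estimating Safety by the Empirical Bayes Method: A Tutorial"},
]
duplicates = group_duplicate_extractions(rows)
```

The result only reports candidate groups; whether two rows really share a source paper should still be confirmed against the notes column.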

How to update this register

Add one row to the inventory table for each new extraction file.
Add or update thematic rows only where the new paper contributes evidence.
Add repo TODOs only when supported by the extraction.
Add secondary review flags where needed.
Do not rewrite existing judgements unless the new paper changes the evidence base.
Preserve previous source filenames and extraction filenames.
If importance changes because the repo changes, update current_repo_relevance but preserve future_research_relevance.
Prefer append-only edits. If a judgement changes, add a note explaining why rather than silently replacing earlier context.
Keep current implementation actions separate from future research ideas.
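The append-only rule for the inventory table can be sketched as a small helper. This is an illustration under stated assumptions, not an existing repo tool: it assumes the inventory is held as tab-separated text with the header shown in the extraction inventory, and the function name `append_inventory_row` and the ID `LIT-900` used in any test call are hypothetical.

```python
# Column order copied from the inventory table header.
COLUMNS = (
    "register_id extraction_file source_pdf_filename paper_title authors year "
    "paper_type geography road_setting main_method_or_model outcome_or_target "
    "exposure_handling spatial_unit temporal_unit validation_type "
    "key_transferable_idea current_repo_relevance future_research_relevance "
    "literature_review_relevance code_actionability_now "
    "supports_production_change extraction_ai_tool model_name_if_known "
    "extraction_quality_initial_judgement secondary_review_needed "
    "secondary_review_reason notes"
).split()


def append_inventory_row(tsv_text, new_row):
    """Return tsv_text with one row appended; existing rows are not edited."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    if not lines or lines[0].split("\t") != COLUMNS:
        raise ValueError("header does not match the expected inventory columns")
    existing_ids = {line.split("\t")[0] for line in lines[1:]}

    missing = [name for name in COLUMNS if name not in new_row]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if new_row["register_id"] in existing_ids:
        # Append-only: a repeated ID would silently shadow an earlier judgement.
        raise ValueError(f"{new_row['register_id']} already exists")

    values = [str(new_row[name]) for name in COLUMNS]
    if any("\t" in value or "\n" in value for value in values):
        raise ValueError("field values must not contain tabs or newlines")

    return "\n".join(lines + ["\t".join(values)]) + "\n"
```

The helper refuses to reuse a register ID rather than overwrite a row, which mirrors the guidance to add a note when a judgement changes instead of replacing earlier context.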

Extraction inventory

register_id	extraction_file	source_pdf_filename	paper_title	authors	year	paper_type	geography	road_setting	main_method_or_model	outcome_or_target	exposure_handling	spatial_unit	temporal_unit	validation_type	key_transferable_idea	current_repo_relevance	future_research_relevance	literature_review_relevance	code_actionability_now	supports_production_change	extraction_ai_tool	model_name_if_known	extraction_quality_initial_judgement	secondary_review_needed	secondary_review_reason	notes
LIT-001	paper-extraction-aguero-valverde-2008-crash-frequency-spatial-models.md	Paper08-0088RG.pdf	Analysis of Road Crash Frequency with Spatial Models	Jonathan Aguero-Valverde; Paul P. Jovanis	2008	crash-frequency SPF; spatial model comparison	US; Pennsylvania	mixed road classes	Full Bayes spatial CAR vs non-spatial NB	total crashes per segment	VMT from AADT and length; observed/assumed	PennDOT variable-length segments	5-year aggregate	in-sample model comparison	Spatial correlation can change crash-frequency parameter estimates; VMT-style exposure supports current framing	high	high	high	medium	diagnostic-only	Gemini	Gemini	high	conditional	Manual check only if quoting coefficient/table values	Duplicate source paper with LIT-002; keep both for provenance.
LIT-002	paper-extraction-aguero-valverde-2008-spatial-car-crash-frequency.md	Paper08-0088RG.pdf	Analysis of Road Crash Frequency with Spatial Models	Jonathan Aguero-Valverde; Paul P. Jovanis	2008	crash-frequency SPF; spatial autocorrelation assessment	US; Pennsylvania	rural two-lane roads	Full Bayes Poisson lognormal with CAR random effects	annual crash count per segment	AADT as free log covariate; length as fixed offset	rural road segment; intersections and ramps excluded	annual segment-year	in-sample DIC and spatial diagnostics	Test AADT elasticity and spatial residual autocorrelation rather than assume offset is always correct	high	high	high	high	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Review Table 2 before citing exact elasticity values	Strong action source for Moran’s I, residual maps, and free-AADT elasticity diagnostics.
LIT-003	paper-extraction-al-omari-2021-florida-context-classification-spf.md	Crash_Analysis_And_Development_Of_Safety_Performance_Functions_Fo.pdf	Crash Analysis and Development of Safety Performance Functions for Florida Roads in the Framework of the Context Classification System	Ma’en Mohammad Ali Al-Omari	2021	thesis; SPF; network screening	US; Florida	rural to urban context classes; road segments	Negative binomial SPFs; EB network screening	annual crash frequency by segment; KABCO/KABC/PDO variants	observed FDOT AADT; AADT plus length offset and DVMT alternatives	merged homogeneous road segments	annual average over 5 years	mainly in-sample; no holdout noted	Context-class SPFs and urban sub-linear AADT coefficients motivate stratified diagnostics	medium	high	high	medium	baseline-comparison-first	Claude	Claude Sonnet 4.6	medium	conditional	Thesis, no peer-review/holdout; check counterintuitive PSP coefficient if citing	Useful for junction density/access density and road-class stratification, not direct production change.
LIT-004	paper-extraction-baddeley-2021-analysing-point-patterns-networks.md	91405.pdf	Analysing point patterns on networks - a review	Adrian Baddeley; Gopalan Nair; Suman Rakshit; Greg McSwiggan; Tilman M. Davies	2021	methodological review; point processes	mixed/theoretical	linear networks	Network point processes; network KDE; Cox/Poisson processes	spatial point intensity on a network	traffic volume in example; point-process intensity not traffic exposure	exact point coordinates on continuous linear network	mostly static; spatio-temporal noted	methodological review; no predictive validation	Segment aggregation can hide point-level clustering; avoid ordinary planar KDE for network crashes	medium	high	high	low	diagnostic-only	Gemini	Gemini	high	conditional	Review methods before any spatstat implementation	Critiques link-year aggregation but does not invalidate current pipeline.
LIT-005	paper-extraction-boulieri-2016-space-time-bayesian-severity.md	Boulieri_et_al-2016-Journal_of_the_Royal_Statistical_Society__Series_A_Statistics_in_Society.pdf	A space-time multivariate Bayesian model to analyse road traffic accidents by severity	Areti Boulieri; Silvia Liverani; Kees de Hoogh; Marta Blangiardo	2016	Bayesian severity/spatial model	England	mixed road types aggregated to wards	Bayesian hierarchical Poisson lognormal with CAR/MCAR/RW1 effects	ward-year accident counts by slight vs severe/fatal	ward traffic volume from AADF times road length; partly imputed; major-road coverage	electoral ward	annual	in-sample Bayesian model comparison and posterior checks	Severity levels can have distinct spatial structure; offset structure aligns at coarser grain	medium	high	high	medium	pilot-first	Claude	Claude Sonnet 4.6	high	conditional	Ward-level and MCMC scale limit direct transfer	Good documentation support for severity caveat and year-specific AADT need.
LIT-006	paper-extraction-brodersen-2010-balanced-accuracy-posterior.md	brodersen10post-balacc.pdf	The balanced accuracy and its posterior distribution	Kay H. Brodersen; Cheng Soon Ong; Klaas E. Stephan; Joachim M. Buhmann	2010	validation metric; classification methodology	not road-specific	not road-specific	Bayesian posterior distribution of balanced accuracy	binary class-label correctness	not applicable	abstract data point	not stated	cross-validation metric framework	Use balanced accuracy and posterior uncertainty for imbalanced zero/non-zero diagnostics	medium	medium	medium	medium	diagnostic-only	Gemini	Gemini 1.5 Pro	high	conditional	Manual check equations if implementing posterior intervals	Not road-safety evidence; validation reference only.
LIT-007	paper-extraction-brodersen-2010-balanced-accuracy.md	brodersen10post-balacc.pdf	The balanced accuracy and its posterior distribution	Kay H. Brodersen; Cheng Soon Ong; Klaas E. Stephan; Joachim M. Buhmann	2010	validation metric; classification methodology	not stated	not road-specific	posterior balanced accuracy estimators	binary classification correctness	not applicable	not stated	not stated	conceptual cross-validation examples	Warn against plain accuracy for rare collision/no-collision diagnostics	medium	medium	medium	medium	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	conditional	Manual check Equation 7 and MATLAB routines if implementing	Duplicate source paper with LIT-006; use to compare extraction consistency.
LIT-008	paper-extraction-chengye-2013-modelling-motorway-accidents-nb.md	Modelling Motorway Accidents using Negative Binomial Regression.pdf	Modelling Motorway Accidents using Negative Binomial Regression	Pan Chengye; Prakash Ranjitkar	2013	motorway SPF; NB regression	New Zealand; Auckland	motorway; urban/rural	Negative binomial regression; GEE considered	annual accident frequency per segment	AADT per lane and length as free log covariates	homogeneous motorway segments; ramp-defined	yearly	in-sample and temporally held-out prediction metrics	Temporal holdout and ramp context diagnostics are useful; standard NB struggles on short/zero-heavy links	medium	high	high	medium	diagnostic-only	Gemini	Gemini	high	conditional	Check Equation 10 before any exposure-specification comparison	Same source family as LIT-009 and LIT-024.
LIT-009	paper-extraction-chengye-ranjitkar-2013-motorway-nb-regression.md	Modelling_Motorway_Accidents_using_Negative_Binomial_Regression.pdf	Modelling Motorway Accidents using Negative Binomial Regression	Pan Chengye; Prakash Ranjitkar	2013	motorway SPF; feature importance	New Zealand; Auckland	motorway only	Negative binomial accident prediction model	annual accident count per motorway segment	observed AADT per lane and length as free log covariates; no formal offset	homogeneous motorway mainline segment; ramp crashes excluded	annual segment-year	2009-2010 temporal holdout after 2004-2008 training	Add temporal holdout, ramp/slip-road diagnostics, per-family overdispersion checks	high	high	high	high	baseline-comparison-first	Claude	Claude Sonnet 4.6	high	conditional	Review marginal variables due to weak 80% significance threshold	Strong TODO source for validation and motorway facility-family diagnostics.
LIT-010	paper-extraction-cronie-2019-inhomogeneous-linear-network.md	Inhomogeneous higher-order.pdf	Inhomogeneous higher-order summary statistics for linear network point processes	Ottmar Cronie; Mehdi Moradi; Jorge Mateu	2019	spatial point-process diagnostics	US example; Houston	road network example	inhomogeneous network F/G/J functions; simulations	accident point locations on linear network	no traffic exposure; spatial intensity reweighting only	point events on linear network	static one-month example	simulation/method diagnostics; no predictive validation	Distinguish exposure-normalised risk from point-pattern clustering diagnostics	low	medium	medium	low	diagnostic-only	ChatGPT	GPT-5.5 Thinking	medium	yes	Equation formatting imperfect; check before implementation	Useful for a small diagnostic pilot only.
LIT-011	paper-extraction-eckardt-2024-marked-point-process-rejoinder.md	Rejoinder on ’Marked spatial point processes_ current state and extensions.pdf	Rejoinder on ‘Marked spatial point processes: current state and extensions to point processes on linear networks’	Matthias Eckardt; Mehdi Moradi	2024	methodological discussion; marked point processes	not stated	not road-specific	marked point process summaries; K/J functions; mark correlation	point events with marks	traffic exposure not applicable; intensity is not exposure	point events on planar or linear networks	not stated	methodological discussion	Marked point-process diagnostics may explore severity/type clustering but not production ranking	low	medium	medium	low	no	ChatGPT	GPT-5.5 Thinking	high	conditional	Check formulas directly if citing	Keep as exploratory diagnostics reference.
LIT-012	paper-extraction-gilardi-2022-leeds-network-lattice-bayesian.md	jrsssa_185_3_1150.pdf	Multivariate hierarchical analysis of car crashes data considering a spatial network lattice	Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace	2022	UK network-lattice Bayesian SPF	UK; Leeds	urban/metropolitan major roads	Bayesian hierarchical Poisson INLA with ICAR/PCAR and multivariate severity	OS segment counts by severe and slight crash	length times Census-routed commuter flow as Poisson offset; estimated exposure	OS road segment	8-year aggregate cross-section	in-sample posterior predictive checks; no holdout	Direct UK support for OS segment lattice and log-offset form; balanced accuracy for sparse severity	high	high	high	high	diagnostic-only	Claude	Claude Sonnet 4.6	high	yes	Check wide Table 2 signs and Primary Road interpretation before citation	Primary UK anchor for Stage 2 documentation, but not external validation.
LIT-013	paper-extraction-gilardi-2022-multivariate-hierarchical-crashes.md	jrsssa_185_3_1150.pdf	Multivariate hierarchical analysis of car crashes data considering a spatial network lattice	Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace	2022	Bayesian spatial/severity model	UK; Leeds	urban/metropolitan	INLA Bayesian hierarchical Poisson	car crashes per street segment	length times estimated traffic flow	OS Vector OpenMap Local road segment	8-year aggregate	in-sample posterior predictive diagnostics	OS link lattice is a credible unit; balanced accuracy useful for zero/non-zero checks	high	high	high	medium	diagnostic-only	Gemini	Gemini 3.1 Pro	high	conditional	Check dodgr contraction details if replicating MAUP test	Duplicate source paper with LIT-012/LIT-014.
LIT-014	paper-extraction-gilardi-2022-network-lattice-crashes.md	jrsssa_185_3_1150.pdf	Multivariate hierarchical analysis of car crashes data considering a spatial network lattice	Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace	2022	Bayesian spatial/severity SPF	UK; Leeds	major roads including motorways, primary roads, A roads	bivariate Bayesian hierarchical Poisson models with ICAR/PCAR	slight and severe road traffic collision counts	length times estimated commuter flow as offset	OS road segment with adjacency by shared boundary	yearly available; collapsed to 8-year aggregate	in-sample posterior predictive checks	Spatial autocorrelation limitation and MAUP sensitivity tests are directly relevant	high	high	high	high	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Manual review noted in extraction for table/sign details	Richest extraction for repo actions from this paper.
LIT-015	paper-extraction-hauer-2001-eb-spf-tutorial.md	SPF_Basic_Tutorial_2001_by_Ezra_Hauer.pdf	Estimating Safety by the Empirical Bayes Method: A Tutorial	Ezra Hauer; Douglas W. Harwood; Forrest M. Council; Michael S. Griffith	2001	EB/SPF tutorial	not one geography	segments and intersections	Empirical Bayes expected crash frequency using SPF plus observed counts	expected accidents for road entity	ADT/AADT in SPF; length and years multiply expected count	road segment or intersection entity	annual	worked tutorial; no holdout	EB shrinkage should use correct overdispersion and full year-specific procedure	high	high	high	high	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Check equations if implementing exact EB changes	Primary EB methodology reference.
LIT-016	paper-extraction-huda-alkaisy-2024-lvr-network-screening.md	dot_78279_DS1.pdf	Network Screening on Low-Volume Roads Using Risk Factors	Kazi Tahsin Huda; Ahmed Al-Kaisy	2024	low-volume road network screening	US; Oregon	rural low-volume two-lane roads	OLS on log EB expected crashes; CART thresholds	EB expected crashes per 0.05-mile section	AADT covariate in one model; dropped in another; no offset due to fixed length	fixed 0.05-mile sections; intersections excluded	annual aggregate	random split/high R2 on EB output; not raw-count validation	Low-volume links may need geometry-led diagnostics and careful AADT sensitivity framing	medium	high	high	medium	pilot-first	Claude	Claude Sonnet 4.6	high	yes	Check grade variable ambiguity and avoid over-reading high R2	Strong for curvature/grade diagnostics, not production thresholds.
LIT-017	paper-extraction-jayasinghe-2019-centrality-aadt.md	1-s2_0-S2215016119301128-main.pdf	A novel approach to model traffic on road segments of large-scale urban road networks	Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama	2019	AADT estimation; traffic modelling	mixed developing-country cities	urban road networks	OLS/robust/Poisson regressions with dual-graph centrality	AADT/PCU per road segment	AADT is target; observed counts used for calibration	road segment in dual graph	annual AADT	random 80/20 validation; likely spatial leakage	Centrality features and learning curves can inform Stage 1a AADF sparsity diagnostics	high	high	medium	medium	baseline-comparison-first	Claude	Claude Sonnet 4.6	high	yes	Check low-AADT RMSE and exact final regression type	Traffic-volume paper, not collision-risk evidence.
LIT-018	paper-extraction-jayasinghe-2019-traffic-modeling-centrality.md	1-s2.0-S2215016119301128-main.pdf	A novel approach to model traffic on road segments of large-scale urban road networks	Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama	2019	AADT estimation; traffic modelling	urban/mixed	urban road networks	centrality-based OLS/RR/Poisson traffic volume model	AADT in PCU	exposure is modelled output	road segment dual graph	yearly average daily traffic	random validation; spatial leakage risk	Stage 1a should report spatial holdouts and sensitivity to count-point sparsity	high	high	medium	medium	baseline-comparison-first	Gemini	Gemini 3.1 Pro	high	conditional	Verify centrality radius/compute feasibility before implementation	Duplicate source paper with LIT-017.
LIT-019	paper-extraction-lord-2010-crash-frequency-review.md	Lord-Mannering_Review.pdf	The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives	Dominique Lord; Fred Mannering	2010	methodological review; crash-frequency modelling	mixed	road segments/intersections across reviewed studies	review of Poisson, NB, zero-inflated, random effects, GAM, ML and Bayesian models	crash frequency over roadway units	traffic flow/length/VMT discussed across studies	road segment/intersection/other	mixed	review; no empirical validation	Use as risk checklist: overdispersion, zero-heavy counts, exposure functional form, omitted variables, spatial/temporal correlation	high	high	high	high	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	conditional	Review large tables if building full model-family comparison	Best general modelling limitations reference.
LIT-020	paper-extraction-ma-2019-xgboost-fatality.md	analyzing-the-leading-causes-of-traffic-fatalities-using-1jznp146gl.pdf	Analyzing the Leading Causes of Traffic Fatalities Using XGBoost and Grid-Based Analysis: A City Management Perspective	Jun Ma; Yuexiong Ding; Jack C. P. Cheng; Yi Tan; Vincent J. L. Gan; Jingcheng Zhang	2019	conditional severity/fatality classifier	US; Los Angeles County	mixed urban/peri-urban	XGBoost binary classifier plus grid GIS	fatal vs non-fatal crash given a crash	no traffic exposure; fatality rate not exposure-adjusted	crash record and 60x60 grid	crash-time fields included; no panel	train/test on balanced crash data; no exposure validation	Separate conditional severity/fatality from exposure-adjusted frequency; watch leakage from crash-record features	medium	medium	medium	medium	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Check unusual XGBoost learning-rate details if replicating	Do not compare fatality rate to Stage 2 risk percentile.
LIT-022	paper-extraction-michalaki-2015-motorway-accident-severity-chatgpt.md	1-s2.0-S0022437515000833-main.pdf	Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model	Paraskevi Michalaki; Mohammed A. Quddus; David Pitfield; Andrew Huetson	2015	motorway severity modelling	England	motorway	ordered logit; multilevel ordered logit; generalized ordered logit	accident severity conditional on crash	no formal exposure; time category proxy	accident record	accident-level with broad time categories	in-sample; no held-out validation	Frequency and severity are separate; post-event variables must not leak into Stage 2 predictors	medium	high	high	medium	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	conditional	Check STATS19 hard-shoulder/main-carriageway transfer	Duplicate source paper with LIT-023.
LIT-023	paper-extraction-michalaki-2015-motorway-accident-severity.md	1-s2.0-S0022437515000833-main.pdf	Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model	Paraskevi Michalaki; Mohammed A. Quddus; David Pitfield; Andrew Huetson	2015	motorway severity modelling	England	motorway hard shoulder/main carriageway	partially constrained generalized ordered logistic regression	accident severity	no explicit exposure; severity conditional on crash	location type/accident record	time-of-day/day/month categories	in-sample	HGV/off-peak/hard-shoulder diagnostics belong in severity work, not current frequency model	medium	high	high	medium	diagnostic-only	Gemini	Gemini 3.1 Pro	high	conditional	Check STATS20/STATS19 hard-shoulder encoding before diagnostics	Warns against using number of vehicles/casualties as prospective features.
LIT-024	paper-extraction-pan-2013-motorway-negative-binomial.md	Modelling Motorway Accidents using Negative Binomial Regression.pdf	Modelling Motorway Accidents using Negative Binomial Regression	Pan Chengye; Prakash Ranjitkar	2013	motorway SPF; NB regression	New Zealand; Auckland	motorway rural/urban	Poisson/NB/ZINB/GEE tested; NB selected	annual accident frequency per segment-year	observed AADT per lane plus length as free log regressors	homogeneous motorway segment-year; ramp segmentation	yearly	temporal holdout plus in-sample metrics	Facility context, ramp proximity, temporal holdout and geometry sanity checks are useful	medium	high	high	medium	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	conditional	Check table values and equations before formal citation	Duplicate source family with LIT-008/LIT-009.
LIT-025	paper-extraction-pan-2017-deep-belief-network-global-spf.md.md	1-s2_0-S2046043017300199-main.pdf	Development of a global road safety performance function using deep neural networks	Guangyuan Pan; Liping Fu; Lalita Thakali	2017	global SPF; neural model benchmark	Canada/US; multiple regions	mixed highway types	Deep Belief Network; NB benchmarks; Bayesian regularised ANN	annual crash frequency per homogeneous section-year	observed AADT and length as DBN features; NB uses log exposure variants	homogeneous road section	annual segment-year	train/test style performance; metrics mainly MAE/RMSE	DBN with MSE is not suitable for sparse count production; use NB/log-offset comparisons and minimum-length diagnostics	medium	medium	high	medium	baseline-comparison-first	Claude	Claude Sonnet 4.6	medium	conditional	DBN technical details and crash scope need checking	Has useful negative evidence against neural MSE production changes.
LIT-026	paper-extraction-poch-1996-intersection-negative-binomial.md	Negative_Binomial_Analysis_of_Intersection-Acciden.pdf	Negative Binomial Analysis of Intersection-Accident Frequencies	Mark Poch; Fred Mannering	1996	intersection SPF; NB regression	US	urban/suburban intersections	Negative binomial regression	annual accident frequency on intersection approach	turning and intersection traffic volumes as covariates; no formal offset	intersection approach	yearly	no external/held-out validation stated	Junction approach mechanisms are structurally different from link risk; use junction diagnostics/proxies	medium	high	high	medium	pilot-first	ChatGPT	GPT-5.5 Thinking	high	yes	OCR artefacts; check tables before formal literature table	Strong warning about junction under-representation.
LIT-027	paper-extraction-quddus-2010-m25-severity-ordered-response.md	road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf	Road Traffic Congestion and Crash Severity: Econometric Analysis Using Ordered Response Models	Mohammed A. Quddus; Chao Wang; Stephen G. Ison	circa 2010	motorway severity model; ordered response	UK; M25	motorway	OLOGIT/HCM/GOLOGIT/PC-GOLOGIT	ordinal crash severity given crash	no exposure; 15-minute flow as severity predictor with 30-min lag	individual crash matched to motorway segment	crash-level 15-minute traffic lag	in-sample ordered response metrics	Use 30-minute pre-crash lag if future WebTRIS crash-level work; separate frequency vs severity	medium	high	high	medium	diagnostic-only	Claude	Claude Sonnet 4.6	high	yes	Check dense Table 2 and whether junction crashes excluded	Supports severity/frequency separation and cautious congestion claims.
LIT-028	paper-extraction-roll-2026-oregon-pedestrian-spf.md	dot_89189_DS1.pdf	Developing a Pedestrian Safety Performance Function for Oregon	Josh Roll; Jason Anderson; Nathan McNeil	2026	pedestrian/intersection SPF; exposure estimation	US; Oregon	urban intersections	Poisson/NB SPFs; random forest exposure data fusion	pedestrian injury crashes per intersection-year	vehicle AADT and estimated pedestrian AADPT; both partly estimated	urban intersection	annual	10-fold CV for exposure model; SPF tables partly unreadable	Exposure-only vs full-feature baseline comparisons and CURE plots could inform Stage 2 diagnostics	low	medium	medium	medium	diagnostic-only	Claude	Claude Sonnet 4.6	medium	yes	Report tables not fully machine-readable; check SPF forms and AADPT metrics	Scope is pedestrian intersections, not link-level all-injury risk.
LIT-029	paper-extraction-wang-2009-m25-congestion-safety.md	Wang_et_al_AAP_Final_submitted1.pdf	Impact of Traffic Congestion on Road Safety: A Spatial Analysis of the M25 Motorway in England	Chao Wang; Mohammed A. Quddus; Stephen G. Ison	circa 2009	motorway SPF; congestion and spatial model	UK; M25	motorway	Poisson-lognormal, NB, CAR spatial variants	accident count per motorway segment	observed UKHA AADT and segment length as free log covariates; no offset	junction-to-junction motorway segment; junction crashes excluded	annual aggregate	in-sample Bayesian/model comparison	Motorway AADT elasticity and grade/congestion diagnostics; bearing can improve snapping QA	medium	high	high	medium	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Publication year not in document; DIC differences small	Companion to Quddus severity paper for congestion null result.
LIT-030	paper-extraction-wang-2015-investigating-safety-impacts-suburban-arterials.md	1805.06381v3.pdf	Investigating Safety Impacts of Roadway Network Features of Suburban Arterials in Shanghai, China	Xuesong Wang; Jinghui Yuan; Grant G. Schultz; Wenjing Meng	2015	zonal spatial crash model	China; Shanghai	suburban arterials	Bayesian Poisson-lognormal CAR	total crash frequency on arterials within TAZ	trip productions/attractions and arterial length as exposure proxies; no AADT	Traffic Analysis Zone	yearly	in-sample R2; no true held-out validation	Junction/signal/access-density proxies may matter, but zonal unit is low transferability	low	medium	medium	low	diagnostic-only	Gemini	Gemini 3.1 Pro	high	conditional	Betweenness computed within TAZ, not global; in-sample R2 only	Use as junction/network-complexity prompt, not model benchmark.
LIT-031	paper-extraction-ziakopoulos-2020-spatial-approaches-road-safety.md	A review of spatial approaches in road safety.pdf	A review of spatial approaches in road safety	Apostolos Ziakopoulos; George Yannis	not stated in visible metadata	spatial road-safety review	mixed	mixed	review of spatial/spatio-temporal methods	crash counts/rates/severity/hotspots across reviewed studies	mixed exposure definitions	mixed units: links, intersections, grids, zones, corridors	mixed	review; primary studies need checking for exact claims	Supports spatial validation, MAUP sensitivity, proximity/junction diagnostics and caution about production spatial models	high	high	high	high	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	yes	Year/DOI missing; check primary papers for exact numerical claims	Broad review; do not use alone to justify a production model swap.
LIT-032	paper-extraction-pew-2020-zero-inflated-crash.md	Justification_for_considering_zero-inflated_models_in_crash_frequency_analysis.pdf	Justification for considering zero-inflated models in crash frequency analysis	Timo Pew; Richard L. Warr; Grant G. Schultz; Matthew Heaton	2020	zero-inflated model comparison; Bayesian hierarchical count modelling	US; Utah	signalised intersections statewide (urban and rural)	Bayesian hierarchical ZIP; ZINB; NB-Lindley; MCMC via JAGS	annual injury and fatal crash count per intersection	entering vehicles per day as standardised covariate — no formal offset	signalised intersection	annual (2014–2017 fitting; 2018 held out)	temporal holdout (2018); Bayesian chi-squared goodness-of-fit; posterior predictive zero check; WAIC	ZINB improvement over Poisson driven mainly by overdispersion parameter (π ≈ 0); NB GLM with offset is the priority diagnostic step, not full zero-inflation	high	high	high	high	baseline-comparison-first	Claude	Claude Sonnet 4.6	high	conditional	Verify Table A1 π ≈ 0 finding in original PDF before citing; check prior sensitivity on Beta(0.15,1)	Critical nuance: π posterior mean ≈ 0 in both ZIP and ZINB — improvement over Poisson is from ϕ dispersion, not zero-inflation. Intersection unit not link; counts are much higher than Open Road Risk link-years. No exposure offset — does not challenge Open Road Risk offset design.
LIT-033	paper-extraction-mahoney-2023-spatial-cv.md	ASSESSING_THE_PERFORMANCE_OF_SPATIAL_CROSS-VALIDATION.pdf	Assessing the Performance of Spatial Cross-Validation Approaches for Models of Spatially Structured Data	Michael J Mahoney; Lucas K Johnson; Julia Silge; Hannah Frick; Max Kuhn; Colin M Beier	2023	spatial CV methodology; simulation study	simulation (no specific geography)	not road-specific	random forest on simulated spatially structured continuous outcome; five CV method comparisons	simulated continuous outcome (not a crash count)	not applicable	regular 50×50 grid cells	not applicable	cross-landscape prediction as external reference; 100 simulated landscapes	V-fold CV is severely optimistic for spatially autocorrelated data; spatial clustering CV with exclusion buffer ≈ autocorrelation range is the most practical improvement; current grouped-link split does not enforce spatial separation	high	high	medium	high	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Specific buffer sizes (25–41% of grid length) are simulation-specific and do not transfer directly to road network; must estimate autocorrelation range from Stage 2 residuals first	Not a road safety paper. Simulation uses continuous Gaussian outcome; zero-heavy count generalisation assumed but not tested. BLO3 performs poorly despite large buffers — do not assume larger buffer always helps. Regular grid assumption does not match OS Open Roads geometry.
LIT-034	paper-extraction-gao-2024-stzitd-gnn.md	Uncertainty-Aware_Probabilistic_Graph_Neural_Networks_for_Road-Level.pdf	Uncertainty-Aware Probabilistic Graph Neural Networks for Road-Level Traffic Crash Prediction	Xiaowei Gao; Xinke Jiang; Dingyi Zhuang; Huanfa Chen; Shenhao Wang; Stephen Law; James Haworth	2024	probabilistic GNN; zero-inflated Tweedie; road-level crash prediction	UK; London (Lambeth; Tower Hamlets; Westminster)	urban road segments	GRU temporal encoder + GAT spatial encoder + ZITD decoder (STZITD-GNN); baselines include STGCN; STZINB-GNN; STTD-GNN	daily severity-weighted crash risk score per road (y = sum of collision count × severity weight 1/2/3)	no exposure; no offset; no traffic volume data	urban road segment (OS-style link; ~4,700–5,700 nodes per borough)	daily; 2019 only; 8:2:2 within-year temporal split	within-year temporal holdout (no spatial holdout; no cross-year test)	AccHR@k metric is directly applicable to Open Road Risk risk percentile ranking; MPIW/PICP for future probabilistic outputs; Gaussian distributional assumption is clearly worst	medium	high	high	medium	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Verify Table 4 values against original; check whether 8:2:2 split is chronological or random (not stated); GitHub repo may be private	No exposure offset — cannot distinguish high-risk from high-traffic roads; major methodological gap relative to Open Road Risk. Severity-weighted composite response variable not directly comparable to raw injury count. Daily urban scale vs annual national scale: zero-inflation mechanisms differ. GNN architecture not feasible at 2.17M links. Validation: same roads in train and test; single year; no spatial holdout — weaker than current Open Road Risk grouped split.
LIT-035	paper-extraction-balawi-tenekeci-2024-arima-sarimax-london-aroads.md	Time_series_traffic_collision_analysis_of_London_hotspots__Patterns.pdf	Time series traffic collision analysis of London hotspots: Patterns, predictions and prevention strategies	Mohammad Balawi; Goktug Tenekeci	2024	ARIMA; SARIMAX; corridor-level time series	UK; London (A1; A3; A4; A6 corridors)	major A-road corridors (aggregate)	ARIMA(5,4,7); SARIMAX(4,1,2)×(4,1,2,8) on daily corridor-level aggregate time series	daily count of vehicles involved in accidents (not accident count — wrong response variable)	no exposure; no AADT; corridor-level aggregate only	four A-road corridors treated as a single aggregate time series	daily; 2016–2019; December 2019 holdout only	single-month temporal holdout (Christmas period); AIC/BIC in-sample	Post-event STATS19 attributes (severity; light condition; road surface) must not enter Stage 2 as features — this paper inadvertently illustrates why	low	low	low	low	no	Claude	Claude Sonnet 4.6	high — high confidence in the identified problems	no	No secondary review recommended; do not use as evidence for pipeline decisions	CRITICAL: wrong response variable (vehicles involved, not accident count); SARIMAX predicts negative counts (model specification error); R-squared values in Table 3 implausibly high and methodology opaque; ARIMA d=4 order from misconfigured grid search (excluded d=0,1,2); log-likelihood sign inconsistency between ARIMA and SARIMAX tables; 80-20 split described but only 30-day Christmas holdout reported. Published in Heliyon (broad open-access). Do not cite as methodological support for any decision. Retained for completeness of literature search only.
LIT-036	paper-extraction-huda-2024-network-screening-low-volume-roads.md	dot_78279_DS1.pdf	Network Screening on Low-Volume Roads Using Risk Factors	Kazi Tahsin Huda; Ahmed Al-Kaisy	2024	low-volume road network screening	US; Oregon	rural low-volume two-lane paved roads	HSM EB expected crashes; CART thresholds; OLS log-linear screening equations	EB expected crashes per 0.05-mile section; crash density for ranking	AADT in HSM SPF and one proposed model; no-volume alternative model; no exposure offset	fixed 0.05-mile roadway sections; intersections excluded	annual expected crashes from 2004-2013 crash data	random 80/20 split against EB expected-crash target; no spatial/temporal holdout	Confirms Huda/Al-Kaisy as diagnostic support for low-volume, curvature/grade, and volume/no-volume sensitivity checks; flags EB-target R2 caveat	medium	high	high	medium	pilot-first	ChatGPT	not stated	high	conditional	Check curvature CART inconsistency and grade treatment before using thresholds	Duplicate source paper with LIT-016. Stronger caveat that adjusted R2 predicts a smooth EB target, not raw future crashes.
LIT-037	paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md	1-s2.0-S2046043017300199-main.pdf	Development of a global road safety performance function using deep neural networks	Guangyuan Pan; Liping Fu; Lalita Thakali	2017	global SPF; DBN/ML benchmark	Canada and US	mixed highway segments	Deep Belief Network with NB benchmarks and pooled/local model comparisons	annual crash/collision frequency per homogeneous segment-year	AADT and length as DBN inputs or NB exposure-like covariates; no clearly fixed offset	homogeneous highway segments; coarse compared with OS Open Roads links	annual segment-year; temporal holdouts for Ontario/Colorado	temporally held-out MAE/RMSE for Ontario/Colorado; Washington split not fully stated; no spatial holdout	Supports temporal holdout, local-vs-global/facility-family comparisons, and short-segment sensitivity; not production DBN	medium	medium	high	medium	baseline-comparison-first	ChatGPT	GPT-5.5 Thinking	high	conditional	Check Washington years/split and DBN normalization if reproducing	Duplicate source paper with LIT-025; reinforces that DBN should be benchmark-only without ranking/spatial validation.
LIT-038	paper-extraction-poch-mannering-1996-nb-intersection.md	Negative_Binomial_Analysis_of_Intersection-Acciden.pdf	Negative Binomial Analysis of Intersection-Accident Frequencies	Mark Poch; Fred Mannering	1996	intersection approach SPF; NB regression	US; Bellevue, Washington	urban/suburban intersections	Negative binomial regression by intersection approach and crash type	annual accident frequency per intersection approach	approach turning/opposing/intersection traffic volumes as covariates; no offset	intersection approach	annual; 1987-1993	in-sample rho-squared and likelihood tests; no held-out validation	Stronger second extraction for overdispersion and junction-approach mechanisms; confirms in-sample-only limitations	medium	high	high	medium	pilot-first	Claude	Claude Sonnet 4.6	high	conditional	Check Table 1 coefficients and likelihood-ratio test values before citing	Duplicate source paper with LIT-026; improves confidence despite old validation standards.
LIT-039	paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md	road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf	Road Traffic Congestion and Crash Severity: An Econometric Analysis Using Ordered Response Models	Mohammed A. Quddus; Chao Wang; Stephen G. Ison	2010 / manuscript year unclear	motorway severity model; ordered response	UK; M25	motorway	OLOGIT/HCM/GOLOGIT/PC-GOLOGIT	ordered crash severity conditional on crash	15-minute traffic flow and congestion matched with 30-minute lag; no exposure offset because target is severity conditional on crash	crash records assigned to 72 motorway segments	crash-level; 2003-2006; 15-minute traffic state lag	in-sample ordered-response fit and marginal effects; no held-out validation	Confirms severity/frequency separation, lagged traffic-state design, and conditional interpretation caveat	medium	high	high	medium	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	conditional	Check published ASCE citation year and Tables 2-3 against final version	Duplicate source paper with LIT-027; clearer on in-sample metrics and conditional severity target.
LIT-040	paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md	dot_89189_DS1.pdf	Developing a Pedestrian Safety Performance Function for Oregon	Josh Roll; Jason Anderson; Nathan McNeil	2026	pedestrian/intersection SPF; exposure estimation	US; Oregon	urban intersections	Poisson/NB SPFs; pedestrian-volume data fusion; random forest/XGBoost/NN exposure models	pedestrian crash frequency at intersections	vehicle AADT and estimated pedestrian AADPT as explanatory exposure variables; no explicit offset	urban intersection; contracted complex nodes	annual average exposure; crash outcome years not fully stated in SPF sections	exposure model 10-fold CV; SPF validation details/table extraction require care	Supports junction/intersection future work, exposure-only vs proxy comparisons, and vulnerable-user exposure caveats	low	medium	medium	medium	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	yes	Long report; check SPF equations, crash-year window, and AADPT/AADT metrics before citing	Duplicate source paper with LIT-028; broader report extraction confirms report-table review still needed.
LIT-041	paper-extraction-ziakopoulos-yannis-2020-spatial-review.md	A_review_of_spatial_approaches_in_road_safety.pdf	A review of spatial approaches in road safety	Apostolos Ziakopoulos; George Yannis	not explicitly stated; circa 2020	spatial road-safety review	international review	mixed	review of spatial units, spatial models, MAUP, proximity, network KDE, VRU approaches	mixed crash counts/rates/severity/hotspots across reviewed studies	mixed: AADT, VMT/VDT, trips, road length, population; not offset-specific	mixed: links, intersections, grids, zones, regions, network lixels	mixed across reviewed studies	review-level synthesis; no single validation protocol	Second extraction reinforces spatial-unit, MAUP, junction-segment, and network-KDE cautions; exact primary-study values need source checks	high	high	high	high	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Check primary papers before using numerical claims from review tables	Duplicate source paper with LIT-031; improves confidence for high-level caution but not production model choice.
LIT-042	paper-extraction-huda-2024-COMBINED.md	dot_78279_DS1.pdf	Network Screening on Low-Volume Roads Using Risk Factors	Kazi Tahsin Huda; Ahmed Al-Kaisy	2024	combined reconciliation record; low-volume road network screening	US; Oregon	rural low-volume two-lane paved roads	HSM EB expected crashes; CART thresholds; OLS log-linear screening equations	EB-smoothed expected crashes per 0.05-mile section; crash density for ranking	AADT in HSM SPF/EB target and one proposed model; deliberate no-volume comparator; no exposure offset	fixed 0.05-mile roadway sections; intersections excluded	annual expected crashes from 2004-2013 crash data	random 80/20 split against EB expected-crash target; no spatial/temporal holdout	Canonical record clarifies EB target, no-offset structure, volume/no-volume scope, and curvature/grade caveats for low-volume diagnostics	medium	high	high	high	pilot-first	ChatGPT	GPT-5.5 Thinking	high	conditional	Curvature CART sharp-group value is internally inconsistent; grade should not be cited as final-model predictor without caution	Combined record from original PDF plus LIT-016 and LIT-036; use this row for future Huda/Al-Kaisy citations.
LIT-043	paper-extraction-jayasinghe-2019-COMBINED.md	1-s2.0-S2215016119301128-main.pdf	A novel approach to model traffic on road segments of large-scale urban road networks	Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama	2019	combined reconciliation record; AADT estimation / traffic-volume modelling	Sri Lanka; Cambodia; Vietnam; Pakistan; Tanzania	urban road networks	centrality-based traffic-volume model using betweenness, closeness, and path-distance weighting	AADT / PCU per road segment	AADT is the modelled target; observed counts used for calibration/validation; no collision exposure offset	road segment in dual-graph road network	cross-sectional annual AADT base year by city	random 80/20 validation plus calibration-sample learning curve; no spatial holdout	Canonical record supports Stage 1a centrality diagnostics, learning curves, AADT-band errors, and warnings about random spatial leakage	high	high	medium	high	baseline-comparison-first	ChatGPT	GPT-5.5 Thinking	high	conditional	Final selected regression type is implied but not fully documented across OLS/robust/Poisson alternatives	Combined record from original PDF plus LIT-017 and LIT-018; traffic-exposure paper, not Stage 2 collision-risk evidence.
LIT-044	paper-extraction-poch-mannering-1996-COMBINED.md	Negative_Binomial_Analysis_of_Intersection-Acciden.pdf	Negative Binomial Analysis of Intersection-Accident Frequencies	Mark Poch; Fred Mannering	1996	combined reconciliation record; intersection approach SPF	US; Bellevue, Washington	urban/suburban intersections	Negative binomial regression for total and accident-type approach counts	annual accident frequency per intersection approach	approach and turning traffic volumes as covariates; no formal offset	intersection approach	annual observations from 1987-1993, excluding improvement year	in-sample likelihood/rho-squared diagnostics; no held-out validation	Canonical record confirms junction/approach mechanisms and NB-over-Poisson relevance while warning against link-level coefficient transfer	medium	high	high	medium	pilot-first	ChatGPT	GPT-5.5 Thinking	high	conditional	Exact accident-type table values should still be checked before formal publication because OCR is imperfect	Combined record from original PDF plus LIT-026 and LIT-038; use this for junction/intersection evidence.
LIT-045	paper-extraction-roll-2026-oregon-COMBINED.md	dot_89189_DS1.pdf	Developing a Pedestrian Safety Performance Function for Oregon	Josh Roll; Jason Anderson; Nathan McNeil	2026	combined reconciliation record; pedestrian/intersection SPF and exposure data fusion	US; Oregon	urban intersections	Poisson/NB pedestrian SPFs; random-forest AADT/AADPT data fusion; CURE-style diagnostics	pedestrian crash frequency at intersections	vehicle AADT and estimated pedestrian AADPT as explanatory exposure variables; no explicit offset	urban intersection with contraction of complex nodes	annual average exposure; final SPF crash period not fully stated	AADPT model 10-fold CV; final crash SPF diagnostics mainly in-sample; no clear held-out SPF validation	Canonical record supports exposure-only baselines, CURE diagnostics, Stage 1a distribution checks, and separate future junction/pedestrian layer	medium	high	high	medium	diagnostic-only	ChatGPT	GPT-5.5 Thinking	medium-high	conditional	Check appendices only if exact SPF coefficients, exposure-only comparisons, or crash-assignment rules are needed	Combined record from original PDF plus LIT-028 and LIT-040; use this for Roll/Oregon pedestrian SPF citations.
LIT-046	paper-extraction-quddus-wang-ison-COMBINED.md	road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf	Road Traffic Congestion and Crash Severity: An Econometric Analysis Using Ordered Response Models	Mohammed A. Quddus; Chao Wang; Stephen G. Ison	not clearly stated; circa 2010	combined reconciliation record; motorway conditional severity model	UK; M25	motorway	ordered logit; heteroskedastic choice model; generalized ordered logit; partially constrained generalized ordered logit	ordered crash severity conditional on crash occurrence	no exposure offset; 15-minute traffic flow/congestion assigned to crash records using 30-minute pre-crash lag	individual crash record matched to 72 motorway segments	crash-level records from 2003-2006 with 15-minute traffic state lag	in-sample ordered-response model fit and marginal effects; no held-out validation	Canonical record clarifies conditional severity scope, pre-crash traffic-state matching, no-frequency interpretation, and post-event leakage cautions	medium	high	high	medium	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	conditional	Published version year/bibliographic details and exact Tables 2-3 should be checked before formal numeric reporting	Combined record from original PDF plus LIT-027 and LIT-039; use this for Quddus/Wang/Ison severity citations.
LIT-047	paper-extraction-ziakopoulos-yannis-2020-COMBINED.md	A review of spatial approaches in road safety.pdf	A review of spatial approaches in road safety	Apostolos Ziakopoulos; George Yannis	not explicitly stated; circa 2020	combined reconciliation record; spatial road-safety review	international review	mixed	review of spatial units, MAUP, spatial dependence, proximity structures, network KDE, GWR/CAR/SAR and spatio-temporal methods	mixed crash counts, rates, severity outcomes, hotspot classifications, and spatial crash distributions	mixed: AADT, VMT/VDT, road length, population, trips, and vulnerable-road-user exposure variables; no single offset structure	mixed units: segments, intersections, corridors, grids, zones, regions, and network lixels	mixed across reviewed studies	review-level synthesis; primary studies need checking for exact method/validation claims	Canonical record supports spatial-unit documentation, MAUP/hotspot sensitivity notes, spatial residual diagnostics, and caution against production spatial models from review evidence alone	high	high	high	high	diagnostic-only	ChatGPT	GPT-5.5 Thinking	high	conditional	Check original cited papers before using exact study-level model specifications, validation methods, or numerical claims	Combined record from original PDF plus LIT-031 and LIT-041; use this for spatial-methods review citations.
LIT-048	paper-extraction-quddus-2007-inar-time-series-count.md	AAP_2007_INAR_revised_Final.pdf	Time Series Count Data Models: An Empirical Application to Traffic Accidents	Mohammed A. Quddus	2007	time-series count modelling; intervention analysis	Great Britain; London congestion charging zone	national aggregate and urban area aggregate	ARIMA/SARIMA; negative binomial; INAR(1) Poisson	annual fatalities; monthly casualties	VKT or total monthly accidents as control variables; not segment-level offset	aggregate national or London CC zone time series	annual and monthly	temporal holdout	Temporal holdout and serial-correlation diagnostics support adding year holdout and cluster/ACF checks	high	medium	high	high	diagnostic-only	not stated	not stated	high	conditional	Aggregate time series, not link-level SPF; check exact INAR estimates before formal numeric citation	Supports validation-and-metrics and crash-frequency temporal-dependence notes.
LIT-049	paper-extraction-mensah-hauer-1998-two-problems-averaging.md	263HauerMensahTwoproblemsofaveraging___.pdf	Two Problems of Averaging Arising in the Estimation of the Relationship Between Accidents and Traffic Flow	Abraham Mensah; Ezra Hauer	1998	SPF theory; traffic-flow averaging bias	illustrative; New York State rural road data used in example	rural two-lane illustrative example	theoretical SPF functions and averaging derivations	expected accident frequency	traffic flow q; AADT as averaged flow argument	road section	theoretical one-year observation context	not applicable; theoretical paper	Argument-averaging and function-averaging support WebTRIS/time-profile diagnostics and free-elasticity checks	high	high	high	medium	diagnostic-only	not stated	not stated	high	conditional	Theoretical paper; use for diagnostic rationale, not production temporal feature claim	Supports exposure-and-traffic-volume and crash-frequency temporal-exposure caveats.
LIT-050	paper-extraction-qin-et-al-2006-bayesian-hourly-exposure.md	AAP-2006-Hourlyexposure-1tfliyv_Bayesian_estimation_of_hourly_exposure_functions_by_crash_type_and_time_of_day.pdf	Bayesian estimation of hourly exposure functions by crash type and time of day	Xiao Qin; John N. Ivan; Nalini Ravishanker; Junfeng Liu; Donald Tepas	2006	hourly exposure / crash-type SPF	USA; Michigan and Connecticut	rural two-lane highways	hierarchical Bayesian binary logistic regression	hourly crash occurrence by crash type	hourly directional volume and segment length; additive or multiplicative exposure functions	road segment	hourly observations across 1995-1997 and 1995-2000 datasets	no held-out split; posterior estimation	Flow-crash relationships differ by crash type and time-of-day, supporting temporal-profile and SV/MV diagnostic caveats	high	high	high	medium	diagnostic-only	not stated	not stated	high	conditional	Rural US two-lane scope and no holdout; do not transfer coefficients directly	Supports exposure-and-traffic-volume and crash-frequency function-averaging notes.
LIT-051	paper-extraction-dutta-2020-freeway-crash-prediction-disaggregate-flow.md	dot_54482_DS1.pdf	Improving Freeway Crash Prediction Models Using Disaggregate Flow State Information	Nancy Dutta; Michael D. Fontaine	2020	freeway SPF; temporal flow disaggregation	US; Virginia	freeway; rural and urban	negative binomial GLMs; ZINB tested; GLMMs	crash frequency on freeway segments	AADT baseline; average hourly, average 15-minute, and raw hourly volume alternatives; length offset	directional basic freeway segment	2011-2017	random 70/30 train/test split	Smoothed hourly flow can outperform AADT, while raw noisy hourly data can underperform; supports cautious WebTRIS temporal diagnostics	high	high	high	medium	diagnostic-only	not stated	not stated	high	conditional	Freeway sensor coverage and random split limit transfer; use improvements as upper-bound context	Supports exposure temporal-conditioning and CURE validation notes.
LIT-052	paper-extraction-sung-et-al-2024-modified-temporal-spf.md	Development_of_Modified_Temporal_Safety_Performanc.pdf	Development of Modified Temporal Safety Performance Function Considering Various Time Flows	Yeji Sung; Seunghwan Kim; Juneyoung Park; Ling Wang	2024	temporal SPF; machine-learning comparison	South Korea	motorway / national highway	NB regression; RF; XGBoost; LightGBM; Dirichlet-weighted ensemble	crash frequency by segment and aggregation period	VDS traffic volume at annual/hourly/15-minute aggregation; segment length and lanes	highway cone-zone segment	2018-2022	random 8:2 split; no spatial holdout	Temporal flow disaggregation may improve SPF performance, but validation and sampling weaken direct transfer	medium	high	high	medium	diagnostic-only	not stated	not stated	high	conditional	Specific metrics have low transferability due to random split and balanced sampling	Supports temporal-exposure notes and validation caveats about random split optimism.
LIT-053	paper-extraction-savolainen-et-al-2011-severity-modelling-review.md	Savolainen-Mannering-AAP-2011.pdf	The Statistical Analysis of Highway Crash-Injury Severities: A Review and Assessment of Methodological Alternatives	Peter T. Savolainen; Fred L. Mannering; Dominique Lord; Mohammed A. Quddus	2011	severity modelling methodological review	review; primarily US literature	mixed	review of ordered, multinomial, nested, mixed, and joint severity models	crash-injury severity	not applicable; severity models condition on crash occurrence	individual crash / occupant	mixed	review; no primary validation	Severity is a separate estimand; post-event variables and spatial/temporal correlation require careful interpretation	high	high	high	medium	diagnostic-only	not stated	not stated	high	conditional	Review paper; check primary papers for exact empirical claims	Supports severity-modelling and validation serial-correlation cautions.
LIT-054	paper-extraction-roshandel-2015-realtime-traffic-freeway-crash.md	ImpactofReal-timeTrafficCharacteristicsonFreewayCrashOccurrence-SystematicReviewandMeta-analysis.pdf	Impact of Real-time Traffic Characteristics on Freeway Crash Occurrence: Systematic Review and Meta-analysis	Saman Roshandel; Zuduo Zheng; Simon Washington	circa 2015; not confirmed	systematic review and meta-analysis; real-time crash prediction	international review	freeway	review of logistic and machine-learning real-time crash prediction models	binary crash occurrence	traffic variables as predictors; no explicit exposure offset	freeway segment	varies across reviewed studies	varies; many temporal/location splits	Behavioural/unobserved factors limit road-environment prediction, supporting cautious decision-support framing	medium	high	high	low	documentation-only	not stated	not stated	high	yes	Publication year and journal details need confirmation before formal citation	Supports structural explanatory ceiling note in validation page.
LIT-055	paper-extraction-national-highways-2022-comparing-collision-casualty-rates.md	statistical-methods-for-comparing-road-collision-and-casualty-rates-proposed-approach.pdf	Statistical methods for comparing road traffic collision and casualty rates: proposed approach	National Highways; individual authors not stated	2022	official methodology; rate comparison and hypothesis testing	England; National Highways network context	mixed; motorway focus in conclusions	non-homogeneous Poisson process; compound Poisson; parametric bootstrap; Monte-Carlo likelihood-ratio test	collision and casualty rates per vehicle mile	vehicle miles as rate denominator / Poisson scale parameter; assumes observed traffic	road, road type, or period aggregate	aggregate collection period	fictitious worked example only; no empirical validation	UK official support for Poisson exposure-rate form, low-count inference caution, and traffic-denominator sensitivity	high	medium	high	medium	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Proposed approach, not finalised empirical validation; check final status before implementing exact tests	Supports exposure, validation, and severity/casualty-rate notes.
LIT-056	paper-extraction-dft-2024-rsf-initial-analysis.md	Road_safety_factors_initial_analysis-_GOV_UK.pdf	Road Safety Factors: Initial Analysis	Department for Transport	2024	official statistics; CF to RSF transition analysis	Great Britain	mixed	descriptive statistics and CF-to-RSF mapping analysis	road safety factor distributions in fatal collisions	not applicable	collision-level factor categories; aggregate reporting	2022 fatal collision mapping and late-2023 early RSF data	official descriptive analysis; no predictive validation	CF to RSF mapping is a structural reclassification, not a like-for-like time-series continuation	medium	medium	high	low	documentation-only	Claude	Claude Sonnet 4.6	high	conditional	Full ODS mapping table not checked in extraction	Supports transferability/open-data and severity leakage/provenance notes.
LIT-057	paper-extraction-dft-2025-guide-cf-rsf-transition.md	Guide_to_road_safety_and_contributory_factors_for_reported_road_casualties_Great_Britain_-_GOV_UK.pdf	Guide to Road Safety and Contributory Factors for Reported Road Casualties Great Britain	Department for Transport	2025	official guidance; STATS19 CF/RSF data quality	Great Britain	mixed	methodological guidance and coverage tables	contributory factor and road safety factor recording	not applicable	collision-level factor fields; aggregate tables	2015-2024 coverage and transition status	official guidance; no predictive validation	CF/RSF fields are subjective, partially recorded, post-event, and structurally broken across 2024 transition	high	medium	high	low	documentation-only	Claude	Claude Sonnet 4.6	high	conditional	Numeric force-code mapping should be verified against STATS19 lookup	Canonical source for CF/RSF availability, coverage, and transition caveats.
LIT-058	paper-extraction-dft-2025-reported-road-casualties-gb-2024.md	Reported_road_casualties_Great_Britain__annual_report__2024_-_GOV_UK.pdf	Reported Road Casualties Great Britain, Annual Report: 2024	Department for Transport	2025	official statistics; national casualty and exposure rates	Great Britain	mixed	descriptive official statistics	police-reported casualties and collisions by severity, road type, and user type	national vehicle miles denominators for rates; not link-level AADT	national and road-type aggregates	2024 annual statistics	descriptive statistics; no predictive validation	Current UK under-reporting, severity-adjustment, road-type rate, and RSF transition context for documentation	high	medium	high	low	documentation-only	Claude	Claude Sonnet 4.6	high	conditional	Use adjusted/unadjusted severity figures carefully when comparing to raw STATS19 pipeline outputs	Supports severity limitations and national benchmark context.
LIT-059	paper-extraction-wang-2011-two-stage-severity-ranking.md	Predicting_accidents.pdf	Predicting accident frequency at their severity levels and its application in site ranking using a two-stage mixed multivariate model	Chao Wang; Mohammed A. Quddus; Stephen G. Ison	2011	prediction / hotspot detection / two-stage severity-frequency model	England; M25 motorway and surrounding major roads	motorway / major A roads	Bayesian spatial count model plus mixed logit severity model	annual fatal, serious injury, and slight injury accident counts per segment	complete observed HA traffic counts; log(AADT) and log(length) in frequency model; cost-rate normalised by vehicle-km	directional road segment between junctions	2003-2007 panel	in-sample MAD only; no held-out validation	Severity-disaggregated frequency model supports future severity methodology; log(length) near 1 supports exposure-offset documentation with motorway/major-road scope caveat	medium	high	high	medium	documentation-only	Claude	Claude Sonnet 4.6	high	conditional	Check published version if quoting exact table values	Relevant stage: Stage 2, severity, documentation. Transferability: medium. Complete HA traffic counts assumed for all segments, so data assumptions do not transfer to Open Road Risk’s minor road network.
LIT-060	paper-extraction-khodadadi-2021-NB-parameterisations-NFAS-SPF.md	2021__A__Khodadadi_NFAS.pdf	Application of different negative binomial parameterizations to develop safety performance functions for non-federal aid system roads	Ali Khodadadi; Ioannis Tsapakis; Subasish Das; Dominique Lord; Yingfeng Li	2021	SPF development; NB parameterisation comparison; Bayesian count modelling for zero-heavy low-volume crash data	USA; Virginia	rural and urban local low-volume NFAS roads (AADT ≤ 2347 vpd); non-federal aid system (6R; 7R; 7U)	Six NB parameterisations (NB-1; NB-2; NB-P; NB1-L; NB2-L; NBP-L) × five dispersion structures; full Bayesian MCMC (rjags); WAIC and LOO model selection; CURE plots	5-year aggregate crash count per road segment (all injury and property damage; intersection crashes excluded; no severity stratification)	Ln(AADT) and length as free predictors with estimated elasticities (AADT ~0.63–0.74; length ~0.47–0.68); no fixed log-offset; poor-quality AADT records excluded	NFAS road segment (variable length; mean 1.37 mi rural; 0.40 mi urban local)	5-year aggregate cross-section; no panel; no within-year structure	Approximate PSIS-LOO and WAIC from Bayesian posterior (no external holdout; no spatial or temporal split)	NB-L models outperform NB-2 by WAIC ≈ 600 units when zero proportion ≥ 37% and skewness ≥ 1.92; length elasticity statistically < 1.0 in all 30 model variants; CURE plots as standard SPF residual diagnostic; WAIC/LOO preferred over DIC for Bayesian model comparison	high	high	high	high	baseline-comparison-first	Claude	Claude Sonnet 4.6	high	conditional	Verify “AADT over 5 years” label means daily flow rate not 5-year total (confirmed by max 2347 vpd); check underlined coefficient counts match stated 95% HPD criterion in Tables 2–4	NB-L superiority corroborated across WAIC; LOO; MAD; and CURE plots, not by a single measure. Open Road Risk at ~98–99% link-year zeros is in a more extreme sparsity regime than this paper’s 37%; the NB-L case is stronger. Full Bayesian MCMC infeasible at 2.17M links; frequentist or sampled NB-L recommended first. Also updates evidence base for LIT-TODO-002; LIT-TODO-016; and LIT-TODO-022.
LIT-061	paper-extraction-asumadu-2015-poisson-NB-Ghana-road-accidents.md	Comparative_Assessment_Of_Poisson_And_Ne.pdf	Comparative Assessment Of Poisson And Negative Binomial Regressions As Best Models For Road Count Data	Oppong Richard Asumadu; Assuah Charles Kojo; Asiedu-Addo Samuel Kwesi	2015	Poisson vs NB model comparison; descriptive analysis	Ghana	national aggregate; no road class or spatial disaggregation	Poisson GLM and NB GLM with log link; MLE in R (glm; glm.nb)	count of road fatalities per day-of-week per year (national total; fatal only; no exposure offset)	none — no traffic volume; no offset; raw fatality count only	national aggregate (Ghana)	annual totals by day of week; 2001–2010; 70 observations	none; in-sample AIC and deviance comparison only	Confirmatory: NB reduces overdispersion vs Poisson (Poisson dispersion 2.297; NB 1.290; ΔAIC 12.5). Low transferability; better evidenced by Khodadadi 2021 for road SPF context.	low	low	low	low	no	Claude	Claude Sonnet 4.6	high	no	Paper is simple and short; findings clearly stated; no DOI; low-visibility open-access journal	No exposure offset — day-of-week coefficients confound traffic volume with risk. Ghana national aggregate has no spatial disaggregation. Do not cite as primary evidence for NB over Poisson in SPF context; Khodadadi 2021 is the appropriate reference. Retained for literature-search completeness only.
LIT-062	paper-extraction-verhoef-boveng-2007-quasipoisson-vs-NB-overdispersion.md	QUASI-POISSON_VS__NEGATIVE_BINOMIAL_REGRESSION__HOW_SHOULD_WE_MOD.pdf	Quasi-Poisson vs. Negative Binomial Regression: How Should We Model Overdispersed Count Data?	Jay M. Ver Hoef; Peter L. Boveng	2007	statistical methods (ecology application; not road safety)	USA; Alaska (harbor seal aerial surveys)	not applicable	quasi-Poisson GLM; NB GLM; IWLS weight derivation; variance-mean diagnostic plot	harbor seal counts per survey site (ecology); relevant as statistical methods reference only	not applicable; no exposure offset	individual haul-out site (423 sites)	10-day survey period (1998)	none; methods paper	Quasi-Poisson and NB differ in IWLS weights: QP weights scale linearly with mean (high-count observations dominate); NB weights level off at 1/κ (low-count observations get more relative influence); variance-mean diagnostic plot distinguishes them; AIC cannot compare QP vs NB directly	medium	medium	medium	medium	diagnostic-only	Claude	Claude Sonnet 4.6	high	conditional	Verify IWLS weight derivations (Eqs. 4–5) before implementing variance-mean diagnostic; abundance estimates do not need checking	Ecology paper — application does not transfer; statistical methodology does. For Open Road Risk’s ranking-across-all-links goal; NB’s weighting scheme is more appropriate than QP on scientific grounds. Paper explicitly states “no general answer” — the goal determines the choice.
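
Illustrative sketch for LIT-062 (not repo code): the row above records that quasi-Poisson IWLS weights grow linearly with the mean while negative binomial weights level off at 1/κ. The snippet below reproduces that contrast numerically under the standard log-link variance forms, Var(Y) = θμ for quasi-Poisson and Var(Y) = μ + κμ² for NB. The function names and the values θ = 2.0 and κ = 0.5 are hypothetical placeholders chosen for illustration; they are not estimates from the paper or from Open Road Risk. The derivations should still be checked against the paper's Eqs. 4–5, as the secondary review flag says.

```python
def quasi_poisson_weight(mu: float, theta: float) -> float:
    """IWLS weight with log link when Var(Y) = theta * mu, i.e. mu**2 / Var = mu / theta."""
    return mu / theta


def negative_binomial_weight(mu: float, kappa: float) -> float:
    """IWLS weight with log link when Var(Y) = mu + kappa * mu**2, i.e. mu / (1 + kappa * mu)."""
    return mu / (1.0 + kappa * mu)


if __name__ == "__main__":
    theta, kappa = 2.0, 0.5  # illustrative assumptions, not fitted values
    print("mean      QP weight   NB weight   NB ceiling (1/kappa)")
    for mu in (0.01, 0.1, 1.0, 10.0, 100.0):
        qp = quasi_poisson_weight(mu, theta)
        nb = negative_binomial_weight(mu, kappa)
        print(f"{mu:8.2f}  {qp:10.4f}  {nb:10.4f}  {1.0 / kappa:10.4f}")
```

Reading the printed columns side by side shows why the choice matters for a ranking across all links: under quasi-Poisson the few high-mean links carry most of the fitting weight, whereas under NB their influence is capped.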

Thematic evidence matrix

Crash-frequency and count modelling

paper	method	what it supports	what it does not support	relevance to current Stage 2	actionability
Aguero-Valverde & Jovanis 2008; Claude and Gemini extractions	Poisson/NB/Poisson-lognormal spatial crash-frequency models	Count modelling with exposure, overdispersion, and spatial residual diagnostics	Direct national-scale CAR production model	High: Stage 2 is a count/ranking model with exposure	Run diagnostics for AADT elasticity, residual spatial autocorrelation, and spatial uncertainty notes.
Lord & Mannering 2010	broad crash-frequency methodological review	Conservative framing around overdispersion, zero-heavy outcomes, omitted variables, exposure functional form	Any single best model family	High: maps directly to Stage 2 risks	Add modelling limitations and baseline comparison tables.
Chengye & Ranjitkar 2013; three extractions	NB motorway segment models with temporal holdout	Temporal holdout, ramp/facility-family diagnostics, motorway-specific geometry checks	Direct replacement of link-level model or uncritical coefficient transfer	Medium: motorway subset only	Add temporal holdout and ramp/slip-road diagnostic.
Gilardi et al. 2022; three extractions	Bayesian Poisson network lattice with spatial/severity effects	OS-segment count modelling, log-offset structure, balanced accuracy diagnostics	External validation of Open Road Risk or national-scale INLA production	High: closest UK link-network literature	Add documentation and balanced accuracy diagnostic, not production spatial model.
Al-Omari 2021	NB SPFs by context class with EB screening	Context/facility stratification and urban exposure elasticity diagnostics	Direct coefficient transfer from Florida thesis	Medium	Baseline comparison of global vs road-family/context split models.
Hauer et al. 2001	EB tutorial using SPF prior plus observed counts	EB shrinkage, regression-to-mean warning, overdispersion role	A specific predictive model for Open Road Risk	High for EB diagnostic layer	Audit EB formula and document approximation.
Pan et al. 2017	DBN vs NB global SPF	NB benchmark and minimum segment-length sensitivity	DBN/MSE as production model for sparse injury counts	Medium	Use as baseline-comparison and methods-to-avoid evidence.
Pew et al. 2020	Bayesian ZIP; ZINB; NB-Lindley on Utah intersection panel	Methodological justification for ZINB as candidate; posterior predictive zero check; NB GLM as priority diagnostic step	Full Bayesian MCMC at 2.17M links; intersection-unit coefficients; no exposure offset	High: π ≈ 0 finding means NB GLM with offset is the right first step, not full zero-inflation	Fit NB GLM candidate; run posterior predictive zero check on current Poisson GLM.
Gao et al. 2024	STZITD-GNN (GRU + GAT + zero-inflated Tweedie) on London urban road-day data	AccHR@k ranking metric; MPIW/PICP uncertainty metrics (future); Tweedie GLM as intermediate candidate	Full GNN at national scale; no exposure offset; daily urban resolution; severity-weighted composite not raw count	Medium: AccHR@k metric is immediately applicable; architecture does not transfer	Implement AccHR@k as validation metric for Stage 2 risk percentile output.
Khodadadi et al. 2021	Six NB parameterisations × five dispersion structures; Bayesian MCMC; WAIC/LOO; CURE plots	NB-L models strongly outperform NB-2 for zero-heavy low-sample-mean data (WAIC advantage ~600; CURE convergence); length elasticity < 1.0 confirmed across all parameterisations; WAIC/LOO preferred over DIC for Bayesian hierarchical model comparison	Full Bayesian MCMC infeasible at 2.17M links; Virginia NFAS roads (low-volume; no geometry features); coefficient values not transferable to UK	High: most directly relevant paper for Stage 2 model family choice after Pew 2020	Compute skewness and zero proportion of link-year crash distribution; implement CURE plots; test NB-2 then NB-L as Stage 2 candidates.
Ver Hoef & Boveng 2007	Quasi-Poisson vs NB IWLS weight derivation; variance-mean diagnostic	Theoretical framework for choosing between QP and NB: QP favours high-count observations; NB gives relatively more influence to low-count observations; variance-mean diagnostic distinguishes them; AIC invalid for QP vs NB comparison	Ecology application context; no exposure offset; no road safety data	Medium: scientific justification for NB over QP for Open Road Risk’s ranking-across-all-links goal	Add variance-mean diagnostic plot before NB implementation; document AIC limitation for QP vs NB comparison.
Asumadu et al. 2015	Poisson vs NB comparison on Ghana national fatality data	Confirmatory: NB reduces overdispersion vs Poisson (Poisson dispersion 2.297; NB 1.290; ΔAIC 12.5)	No exposure offset; national aggregate; Ghana-specific; fatality only; no SPF structure	Low: better evidenced by Khodadadi 2021 and Lord 2010	No action; retained for literature-search completeness only.
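
The variance-mean diagnostic attributed to Ver Hoef & Boveng 2007 in the table above can be sketched in plain Python. This is a minimal illustration, not the repo's implementation: the function name `variance_mean_table`, the equal-count binning, and the default of five bins are assumptions. It bins observations by fitted mean and returns the average squared residual per bin. Under a quasi-Poisson variance (proportional to the mean) the ratio of squared residual to mean should be roughly flat across bins; under an NB variance (mean plus a quadratic term) that ratio should rise with the mean.

```python
from statistics import fmean


def variance_mean_table(fitted, observed, n_bins=5):
    """Bin rows by fitted mean; return (mean fitted value, mean squared residual) per bin.

    fitted and observed are equal-length sequences, e.g. fitted Poisson means
    and observed link-year counts. Bins hold (nearly) equal numbers of rows.
    """
    if len(fitted) != len(observed):
        raise ValueError("fitted and observed must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    pairs = sorted(zip(fitted, observed))  # sort by fitted mean
    n = len(pairs)
    n_bins = min(n_bins, n)  # never create empty bins
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        mean_mu = fmean(mu for mu, _ in chunk)
        mean_sq_resid = fmean((y - mu) ** 2 for mu, y in chunk)
        rows.append((mean_mu, mean_sq_resid))
    return rows


if __name__ == "__main__":
    # Toy values only; real use would pass Stage 2 fitted means and counts.
    for mean_mu, v in variance_mean_table([1, 1, 2, 2], [0, 2, 1, 3], n_bins=2):
        print(f"mean={mean_mu:.2f} var={v:.2f} var/mean={v / mean_mu:.2f}")
```

With 2.17M links, a road-class-stratified subsample and many more bins would likely be needed before the pattern is readable; that choice is left open here.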

Exposure and traffic-volume handling

paper	exposure treatment	transferable part	non-transferable part	implication for AADF/WebTRIS	actionability
Gilardi et al. 2022	offset = segment length times estimated commuter flow	Same mathematical log-offset family on UK OS segments	Census commuter flow is weaker than AADF/AADT	Supports documenting Open Road Risk’s AADT × length offset as literature-aligned	Documentation note; no production change.
Hauer et al. 2001	ADT/AADT in SPF; length and years scale expected count	Year-specific exposure and EB weighting logic	Tutorial examples not full pipeline	Supports using year-specific AADT in EB diagnostic	Audit/upgrade EB diagnostic.
Aguero-Valverde & Jovanis 2008	AADT free coefficient; length offset	Test whether AADT elasticity differs from 1.0	Rural US scope and intersection exclusion	Run diagnostic freeing AADT coefficient from fixed VMT offset	Diagnostic only.
Wang et al. 2009	AADT and length as free covariates, not offset	Motorway-specific AADT elasticity check	No sparse AADF estimation and long segments	Motorway AADT coefficient may differ by road class	Motorway-only diagnostic.
Jayasinghe et al. 2019	AADT is target, estimated from centrality and sparse counts	Stage 1a centrality features, learning curves, sparse-count sensitivity	Not a collision-risk paper; random validation likely leaks spatially	Stage 1a should report spatial holdout and count-sparsity sensitivity	Baseline comparison/diagnostic.
Roll et al. 2026	data-fusion vehicle/pedestrian exposure	Compare exposure-only vs full-feature baselines; CURE plots	Pedestrian/intersection scope; commercial/US data tiers	Stage 1a analogy is conceptual only	Documentation and diagnostic baseline.
Huda & Al-Kaisy 2024	AADT covariate dropped in one low-volume model	Low-volume geometry/AADT-sensitivity diagnostic	LVR-specific and EB-output response	Test whether low-AADT links are dominated by geometry vs exposure uncertainty	Pilot-first.
Mensah & Hauer 1998	AADT as averaged traffic-flow argument in SPF theory	Argument-averaging and function-averaging bias diagnostics	Theoretical examples, not a fitted Open Road Risk-scale model	Estimate free AADT elasticity and CV(q) diagnostics from Stage 1b profiles before claiming temporal conditioning value	Diagnostic only.
Qin et al. 2006	hourly directional traffic volume by crash type and time of day	Time-of-day and crash-type-specific exposure functions	Rural US two-lane scope; no heldout validation	Supports documenting SV/MV and time-of-day aggregation limits in annual Stage 2	Diagnostic only.
Dutta & Fontaine 2020	AADT vs average-hourly/15-minute/raw-hourly freeway volumes	Smoothed temporal flow profiles can improve SPF validation metrics; raw noisy data can hurt	Direct freeway sensor coverage and random split; not national open-data coverage	Stage 1b WebTRIS profiles are plausible diagnostic features but expected gain is limited	Diagnostic only.
Sung et al. 2024	AADT/AHT/AMT temporal SPF comparison	Temporal aggregation can change SPF performance	Random split, balanced samples, complete Korean VDS coverage	Directional support only for temporal exposure diagnostics	Diagnostic only.
National Highways 2022	vehicle miles as Poisson rate scale	UK official support for exposure-rate mathematical structure and denominator sensitivity checks	Assumes observed traffic; aggregate comparison method	Supports AADT-denominator sensitivity analysis, not direct production ranking	Documentation and diagnostic.
Khodadadi et al. 2021	Ln(AADT) and length as free predictors; AADT elasticity 0.63–0.74; length elasticity 0.47–0.68; both statistically < 1.0	Second independent paper (after Wang et al. M25) confirming sub-linear length elasticity across a very different road type; AADT elasticity varies by road class	NFAS Virginia roads (low-volume; US; no geometry); coefficients not directly transferable to UK	Run diagnostic test of log(AADT) and log(length) as free predictors stratified by road class; supports LIT-TODO-002	Diagnostic; also updates evidence for per-family offset testing in Stage 2.
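
The contrast between a fixed exposure offset and free exposure elasticities, which several rows above turn on, can be written out as follows. The notation is an assumption made for illustration: \mu_{it} is the expected count for link i in year t, L_i is link length, and \mathbf{x}_{it} holds the remaining covariates. The register does not state whether the repo's offset carries any additional scaling (for example days or years), so none is shown. The value 0.7 is only an illustrative point inside the 0.63–0.74 AADT elasticity range recorded for Khodadadi et al. 2021.

```latex
% Fixed exposure offset: elasticity with respect to AADT and length is constrained to 1
\log \mu_{it} = \log\!\left(\mathrm{AADT}_{it} \times L_i\right) + \mathbf{x}_{it}^{\top}\boldsymbol{\beta}

% Free-elasticity diagnostic: the offset is the special case \beta_A = \beta_L = 1
\log \mu_{it} = \beta_0 + \beta_A \log \mathrm{AADT}_{it} + \beta_L \log L_i + \mathbf{x}_{it}^{\top}\boldsymbol{\beta}

% Effect of doubling AADT when \beta_A = 0.7 (illustrative)
\frac{\mu(2\,\mathrm{AADT})}{\mu(\mathrm{AADT})} = 2^{\beta_A} = 2^{0.7} \approx 1.62
```

Under the fixed offset the same doubling would double the expected count, so a fitted \beta_A well below 1 would indicate that the offset overstates risk on high-flow links relative to low-flow links.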

Spatial and network methods

paper	spatial unit / network concept	key spatial issue	relevance to OS Open Roads links	actionability
Gilardi et al. 2022	OS road segment lattice and shared-boundary adjacency	spatial autocorrelation and MAUP/segment contraction	High; closest OS-network analogue	Document support, add MAUP pilot and adjacency residual diagnostics.
Aguero-Valverde & Jovanis 2008	road segments with CAR neighbourhoods	unobserved spatial correlation biases coefficients/precision	High as diagnostic concept; lower as production model	Moran’s I and residual corridor mapping.
Ziakopoulos & Yannis 2020	review across links, intersections, zones, corridors	spatial-unit sensitivity, boundary effects, proximity weights	High as cautionary framework	Spatial validation section and segmentation sensitivity pilot.
Baddeley et al. 2021	continuous network point process	segment aggregation and planar KDE can mislead	Conceptually high, production low	Avoid ordinary 2D KDE; small point-process diagnostic only.
Cronie et al. 2019	linear-network point-process diagnostics	point clustering after intensity adjustment	Medium for snapped-collision diagnostics	Small pilot on one urban area; not Stage 2 replacement.
Wang et al. 2015	TAZ-level CAR arterial model	MAUP and zonal aggregation	Low direct transfer	Junction/signal density ideas only.
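
The Moran's I residual check named above can be sketched without any spatial library. This is a minimal version under stated assumptions: binary (0/1) adjacency weights, a hypothetical `neighbours` mapping from each link id to the ids of links sharing a node with it, and residuals supplied as a plain dict. It is intended for a pilot area, not the full network.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary adjacency weights.

    values: dict mapping link id -> residual (hypothetical input format).
    neighbours: dict mapping link id -> iterable of adjacent link ids.
    Neighbour ids missing from `values` and self-references are ignored.
    """
    n = len(values)
    if n == 0:
        raise ValueError("no values supplied")
    mean = sum(values.values()) / n
    dev = {i: v - mean for i, v in values.items()}
    denom = sum(d * d for d in dev.values())
    if denom == 0:
        raise ValueError("values are constant; Moran's I is undefined")
    cross = 0.0  # sum of w_ij * dev_i * dev_j over directed neighbour pairs
    s0 = 0       # sum of all weights (number of directed neighbour pairs)
    for i in values:
        for j in neighbours.get(i, ()):
            if j in dev and j != i:
                cross += dev[i] * dev[j]
                s0 += 1
    if s0 == 0:
        raise ValueError("no neighbour pairs found")
    return (n / s0) * (cross / denom)


if __name__ == "__main__":
    # Four links in a chain with steadily increasing residuals.
    residuals = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(round(morans_i(residuals, chain), 4))
```

Values near zero suggest little residual spatial autocorrelation under this weighting; a permutation test would be needed before treating any value as significant, and that step is not shown.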

Junctions, intersections, and conflict structure

paper	junction/intersection mechanism	required data	transferability	current repo implication	actionability
Poch & Mannering 1996	intersection approach-level traffic, turning, signal, geometry variables	turning volumes, approach geometry, signal/control data	Medium conceptually; low direct data coverage	Pure link model under-represents junction mechanics	Add junction-adjacent residual diagnostic and proxy feature pilot.
Roll et al. 2026	urban intersection SPF by type/control/crossing	intersection inventory, pedestrian exposure, crossing/control data	Low direct transfer	Highlights missing junction-specific model class	Documentation/future work; CURE diagnostics transferable.
Al-Omari 2021	access-point and signalized-intersection density as segment features	junction/access density from inventory	Medium if derived from OS/OSM topology	Candidate junction density per link/corridor	Diagnostic before feature inclusion.
Wang et al. 2015	signal spacing/access density at TAZ level	signals/accesses and zonal network features	Low to medium	Possible missing urban conflict proxies	Low-priority diagnostic.
Aguero-Valverde & Jovanis 2008	intersections/ramp crashes excluded in one extraction	junction exclusion flag/sensitivity	Medium as scope caveat	Current STATS19-to-link snapping includes junction-proximate crashes	Document and test near-junction sensitivity.
Hauer et al. 2001	intersections treated as separate EB entity type	intersection entity definition and SPF	High conceptually for future junction module	Link and junction EB weights differ	Future junction-level methodology note.
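
One way to set up the junction-adjacent residual diagnostic mentioned above is to flag links that touch a node where three or more link ends meet. This is a sketch under assumptions: the `links` mapping of link id to (start node, end node), the node ids, and the degree threshold of 3 are all hypothetical, and node degree from topology is only a rough proxy for a real junction.

```python
from collections import Counter


def junction_adjacent_links(links, min_degree=3):
    """Return the set of link ids touching a node of degree >= min_degree.

    links: dict mapping link id -> (start_node, end_node).
    Degree counts link ends per node, so a self-loop adds 2 to its node.
    """
    degree = Counter()
    for start, end in links.values():
        degree[start] += 1
        degree[end] += 1
    return {
        link_id
        for link_id, (start, end) in links.items()
        if degree[start] >= min_degree or degree[end] >= min_degree
    }


if __name__ == "__main__":
    toy = {
        "a": ("n1", "n2"),
        "b": ("n2", "n3"),
        "c": ("n2", "n4"),
        "d": ("n4", "n5"),
    }
    # n2 has degree 3, so links a, b and c are flagged; d is not.
    print(sorted(junction_adjacent_links(toy)))
```

Residuals can then be summarised separately for flagged and unflagged links. The flag is meant as a diagnostic grouping only, in line with the "diagnostic before feature inclusion" note above.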

Severity modelling

paper	severity target	model type	useful idea	leakage risk	current/future relevance
Boulieri et al. 2016	slight vs severe/fatal counts	multivariate Bayesian Poisson at ward-year	Severity strata can have distinct spatial patterns	Low if kept as aggregate target; scale mismatch	Current documentation; future severity target.
Gilardi et al. 2022	slight vs severe segment counts	bivariate Bayesian Poisson network lattice	Balanced accuracy for sparse severe counts; severity-specific rates	Low for target; no holdout caveat	High documentation/future relevance.
Michalaki et al. 2015	conditional motorway accident severity	ordered/generalized ordered logit	Frequency and severity mechanisms differ; HGV/hard-shoulder diagnostics	High if using post-event variables as predictors	Documentation and future accident-level severity module.
Quddus et al. circa 2010	conditional crash severity	ordered response models with 30-minute traffic lag	Pre-crash lag design for WebTRIS/crash-level work	Post-event crash variables could leak	Future severity/time-profile design.
Ma et al. 2019	fatal vs non-fatal crash	XGBoost classifier	Severity-feature importance and leakage warning	High for crash-record features	Diagnostic-only severity stratification.
Roll et al. 2026	pedestrian injury crashes	intersection SPF	Vulnerable-user exposure is separate from vehicle exposure	Low for current all-injury link model	Future active-travel literature only.
Savolainen et al. 2011	crash-injury severity review	methodological review	Severity modelling requires separate estimands and careful treatment of correlation/heterogeneity	High if post-event variables are used prospectively	Current documentation and future severity target.
National Highways 2022	casualty rate and casualties per collision	compound Poisson / non-parametric casualty component	Casualty-per-collision distribution should not be forced into simple count family	Low if kept separate; high if folded into frequency target	Future severity/casualty-rate diagnostic only.
DfT reported casualties 2024	national severity/casualty official statistics	descriptive statistics and severity adjustment caveats	STATS19 under-reporting and adjusted severity figures affect interpretation of outcome	Low leakage risk; high documentation relevance	Current documentation context.
DfT CF/RSF guidance 2025	contributory / road safety factors	official guidance on subjective post-event factors and transition break	CF/RSF fields are not prospective road attributes	High if used as Stage 2 features; low if diagnostic context	Documentation/provenance only.

Validation, metrics, and model assessment

paper	reported validation/metric type	what the metric actually tests	limitations	Open Road Risk implication
Brodersen et al. 2010	posterior balanced accuracy	imbalanced binary classifier performance and uncertainty	Only applies after binarising outcomes	Use for zero/non-zero or hotspot classification diagnostics, not count likelihood replacement.
Gilardi et al. 2022	posterior predictive balanced accuracy	in-sample posterior predictive adequacy	Not external/spatial holdout validation	Label clearly and report alongside grouped holdout metrics.
Chengye & Ranjitkar 2013	MAD/MSPE temporal holdout	temporal prediction for motorway segments	motorway-only; longer homogeneous segments	Add temporal holdout diagnostic to Stage 2.
Roll et al. 2026	exposure-only vs feature-rich SPF; CURE plots	model misspecification against covariates	intersection/pedestrian scope	Use CURE plots and exposure-only baseline for GLM diagnostics.
Huda & Al-Kaisy 2024	high R² predicting EB expected crashes	fit to smoothed EB target, not raw crashes	random split and circularity inflate fit	Avoid comparing R² to raw-count pseudo-R².
Lord & Mannering 2010	review of fit/diagnostic issues	model risk checklist	no single empirical validation	Use as validation documentation scaffold.
Ma et al. 2019	classifier metrics on balanced fatality data	conditional fatal/non-fatal classification	not exposure-adjusted and not frequency prediction	Do not compare to Stage 2 risk percentile.
Mahoney et al. 2023	simulation comparison of V-fold vs spatial CV methods	which CV method best estimates true out-of-sample error for spatially autocorrelated data	simulation uses continuous outcome; regular grid not road network; zero-heavy counts not tested	Current grouped-link split is temporal, not spatial CV; document this limitation; pilot police-force holdout; estimate autocorrelation range from Stage 2 residuals via variogram.
Pew et al. 2020	Bayesian chi-squared goodness-of-fit; posterior predictive zero check; temporal holdout RPMSE/MAD	zero-calibration and distributional adequacy for zero-heavy count models	intersection unit; no spatial holdout; single-year holdout only	Run posterior predictive zero check on current Poisson GLM; π ≈ 0 finding supports NB GLM as priority next step before ZINB.
Gao et al. 2024	AccHR@k (hit rate at top-k% predicted risk roads); MPIW/PICP uncertainty	ranking precision at top-k; interval calibration	within-year temporal holdout only; same roads in train/test; no spatial holdout; weaker than current Open Road Risk CV	Implement AccHR@k for Stage 2 risk percentile validation; MPIW/PICP deferred until probabilistic outputs added.
Quddus 2007	temporal holdout and INAR/NB time-series comparisons	temporal generalisation and serial correlation	aggregate time series, not link-year validation	Add temporal holdout and link-level serial-correlation diagnostics cautiously.
Savolainen et al. 2011	severity-methodology review	spatial/temporal correlation and heterogeneity cautions	review evidence, not a single validation design	Supports cluster-robust SE and separate severity framing.
Roshandel et al. circa 2015	systematic review/meta-analysis of real-time freeway crash prediction	explanatory ceiling and operational false-positive caution	real-time freeway crash occurrence, not annual link risk	Supports cautious decision-support framing and non-operational claims.
Dutta & Fontaine 2020	CURE plots and 70/30 validation for temporal flow SPFs	functional-form diagnostics over volume range	freeway sensor data and random split	Add CURE-by-AADT/length diagnostics; avoid expecting large temporal-feature gains.
Sung et al. 2024	random split temporal SPF comparison	high R² values under weak validation	random split and balanced sampling make metrics optimistic	Use as validation-caveat example, not performance benchmark.
National Highways 2022	Monte-Carlo likelihood-ratio test and bootstrap intervals	low-count rate-comparison sensitivity	aggregate pairwise comparisons; fictitious example only	Use for aggregate diagnostics and AADT denominator sensitivity, not production link ranking.
Khodadadi et al. 2021	WAIC and PSIS-LOO from full Bayesian posterior; CURE plots; MAD (in-sample)	WAIC/LOO approximate leave-one-out predictive accuracy; CURE plots diagnose systematic misfit over AADT/length range	No external holdout; Virginia-only; LOO is approximated not true holdout; WAIC not computable for quasi-likelihood models	WAIC/LOO preferred over DIC for Bayesian hierarchical model comparison; CURE plots by AADT and length quantile directly implementable as Stage 2 diagnostics; supports LIT-TODO-016.
Ver Hoef & Boveng 2007	Variance-mean diagnostic plot; cross-validation as alternative when AIC is unavailable	Determines empirically whether QP or NB better fits variance structure; AIC cannot compare QP vs NB	Methods paper; no external validation; harbor seal context	Run variance-mean diagnostic on Stage 2 Poisson residuals before choosing NB or QP; document AIC limitation for QP vs NB comparison; supports LIT-TODO-032 and LIT-TODO-033.
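
Two of the diagnostics listed above, the top-k hit rate and the CURE (cumulative residual) curve, are simple enough to sketch directly. Both functions are illustrations under assumptions. `acc_hr_at_k` follows this register's description of AccHR@k (the share of the top-k% predicted-risk links that recorded at least one collision); the exact definition in Gao et al. 2024 should be checked against the paper before results are reported under that name. `cure_curve` returns only the cumulative residuals ordered by a covariate and omits the confidence bounds usually drawn on a CURE plot.

```python
import math


def acc_hr_at_k(predicted, observed, k_percent):
    """Share of the top k% of links by predicted risk with observed count > 0."""
    n = len(predicted)
    if n == 0 or n != len(observed):
        raise ValueError("predicted and observed must be non-empty and equal length")
    top_n = max(1, math.ceil(n * k_percent / 100))
    # Highest predicted risk first; ties keep their input order.
    order = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    hits = sum(1 for i in order[:top_n] if observed[i] > 0)
    return hits / top_n


def cure_curve(covariate, observed, fitted):
    """Cumulative (observed - fitted) residuals, ordered by covariate value."""
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    total = 0.0
    curve = []
    for i in order:
        total += observed[i] - fitted[i]
        curve.append((covariate[i], total))
    return curve


if __name__ == "__main__":
    print(acc_hr_at_k([0.9, 0.1, 0.5, 0.3], [1, 0, 0, 2], k_percent=50))
    print(cure_curve([3, 1, 2], [1, 0, 2], [0.5, 0.5, 1.0]))
```

A CURE curve that drifts steadily away from zero over part of the AADT or length range points to a functional-form problem in that range. With a link-year zero rate near 98–99%, the hit rate at small k is bounded by how many links have any collision at all, so it is best read alongside a baseline such as ranking by exposure alone.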

Point-process / hotspot / spatial diagnostics

paper	method	diagnostic use	production risk	recommended status
Baddeley et al. 2021	network point processes and network KDE	compare raw/snapped collision clustering with link rankings	Does not scale easily and changes target from link-year risk to event intensity	small pilot / documentation note
Cronie et al. 2019	inhomogeneous network J/F/G functions	test point clustering after intensity correction	Not exposure-normalised traffic risk	small pilot only
Eckardt & Moradi 2024	marked point process summaries	explore severity/type mark dependence	exploratory summaries can be mistaken for predictive validation	small pilot only
Aguero-Valverde & Jovanis 2008	CAR residual/spatial effects	residual spatial autocorrelation and corridor clustering	national CAR production infeasible	diagnostic-only
Ziakopoulos & Yannis 2020	spatial-methods review	MAUP, proximity, hotspot sensitivity	review evidence cannot justify direct production swap	documentation and diagnostic queue

Methods to avoid as production changes for now

method/paper	why not production-ready	safer use	required evidence before production
Full national CAR/MCAR Bayesian model; Aguero-Valverde, Gilardi, Boulieri	computationally unrealistic at 2M+ links; often in-sample only	pilot area residual/spatial diagnostic	scalable implementation, grouped/spatial holdout benefit, compute budget
DBN with MSE crash-count regression; Pan et al. 2017	no count likelihood/offset; poor match to zero-heavy injury collisions	baseline comparison note; negative evidence	Poisson/NB loss with offset and strong held-out performance
Planar KDE for road crashes; Baddeley et al. 2021	ignores network geometry and can mislead	network-aware KDE/point process pilot	network-distance implementation and clear diagnostic framing
Post-event crash variables as Stage 2 predictors; Michalaki, Quddus, Ma	crash type/casualties/contributory factors happen after or during crash	retrospective severity diagnostics only	prospective feature availability and leakage audit
STATS19 CF/RSF fields as stable prospective Stage 2 predictors; DfT 2024/2025	officer-recorded post-event judgements; partial coverage; 2024 CF-to-RSF structural break; record-level RSF not in standard open download	provenance/EDA notes and diagnostic context only	audited open-data availability, stable definitions, and explicit non-leakage design
Zonal TAZ CAR model for link ranking; Wang et al. 2015	loses link-level geometry; MAUP risk	contextual/junction-density inspiration	link-level validation of derived proxies
STZITD-GNN full architecture; Gao et al. 2024	GRU+GAT+ZITD at 2.17M links is computationally infeasible; no exposure offset; daily resolution; severity-weighted composite not raw count	AccHR@k metric and Tweedie GLM as extractable contributions	scaled pilot (small area), exposure offset retained, annual aggregation, robust cross-year holdout
ARIMA/SARIMAX on corridor-level collision data without exposure; Balawi & Tenekeci 2024	wrong response variable (vehicles involved not collision count); no exposure denominator; negative predicted counts; implausible R-squared values; methodology not replicable	negative example: illustrates post-event feature leakage from STATS19 attributes	not recommended under any circumstances for this pipeline
Random V-fold CV as primary Stage 2 validation; implied by Mahoney et al. 2023	severely underestimates true prediction error for spatially autocorrelated data (only 2% of V-fold error estimates within target range vs 37% for spatial CV)	current grouped-link temporal split is an improvement but does not enforce spatial separation; document limitation	spatial clustering CV with buffer sized to autocorrelation range of Stage 2 residuals
Pedestrian intersection SPF as all-injury link model; Roll et al. 2026	different mode, exposure, and unit	active-travel/future junction literature	UK-equivalent pedestrian exposure and junction inventory
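
The alternative to random V-fold splitting discussed above, holding out whole geographic groups, can be sketched as a leave-one-group-out splitter. This is a minimal illustration: the `groups` list (one label per row, for example a police-force area code) is a hypothetical input, and no buffer is applied, so links either side of a group boundary still sit next to each other across the train/test divide.

```python
def leave_one_group_out(groups):
    """Yield (held_out_label, train_indices, test_indices) for each group label.

    groups: sequence with one group label per row (hypothetical input).
    Labels are visited in sorted order so the splits are reproducible.
    """
    for held_out in sorted(set(groups)):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    row_groups = ["A", "B", "A", "C"]
    for label, train_idx, test_idx in leave_one_group_out(row_groups):
        print(label, train_idx, test_idx)
```

Comparing held-out metrics from these splits with the current grouped-link metrics gives a rough measure of spatial optimism. A buffered variant would additionally drop training rows within the estimated autocorrelation range of the held-out group.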

Code and documentation implications

todo_id	suggested_action	action_type	relevant_stage	supporting_papers	why_supported	current_repo_relevance	future_research_relevance	effort	risk_if_done_badly	already_present_or_new	priority
LIT-TODO-001	Add Stage 2 documentation note on exposure-offset support and limitations	documentation note	Stage 2 / documentation	Gilardi 2022; Hauer 2001; Lord 2010; Aguero-Valverde 2008; Pan 2017	Multiple extractions support exposure-adjusted count framing but note elasticity/functional-form caveats	high	high	low	Overclaiming exact offset optimality	partly present in methodology pages	now
LIT-TODO-002	Run diagnostic Stage 2 GLM with log(AADT) and log(length) as free covariates or road-family interactions	diagnostic / baseline comparison	Stage 2	Aguero-Valverde 2008; Wang 2009; Al-Omari 2021; Lord 2010	Several papers estimate sub/super-linear AADT effects rather than fixed offset	high	high	medium	Confusing diagnostic with production replacement	new/partly implied	later
LIT-TODO-003	Add temporal holdout report for Stage 2	diagnostic	validation / Stage 2	Chengye & Ranjitkar 2013; Pan 2013; Lord 2010	Motorway NB papers use later-year prediction; current grouped split should be complemented	high	medium	medium	COVID-year split can distort results	likely partly present; verify	now
LIT-TODO-004	Add spatial residual/autocorrelation diagnostic on pilot area	diagnostic	validation / Stage 2	Aguero-Valverde 2008; Gilardi 2022; Ziakopoulos 2020	Spatial autocorrelation can bias inference and hotspot confidence	high	high	medium	Treating in-sample spatial smoothers as external validation	new	later
LIT-TODO-005	Add MAUP/segmentation sensitivity pilot for OS Open Roads links	small pilot	validation / feature engineering	Gilardi 2022; Baddeley 2021; Ziakopoulos 2020; Pan 2017	Link granularity and very short segments are repeated cautions	medium	high	high	Large refactor or inconsistent target grain	new	backlog
LIT-TODO-006	Add junction-adjacent residual/risk diagnostic	diagnostic	Stage 2 / feature engineering	Poch 1996; Al-Omari 2021; Ziakopoulos 2020; Baddeley 2021	Junction mechanisms differ from mid-link risk	high	high	medium	Using noisy OSM junction proxies as production features too early	future-work mentions junction density	later
LIT-TODO-007	Pilot junction-density or conflict-proxy features only after diagnostic	small pilot / candidate feature	feature engineering / Stage 2	Poch 1996; Al-Omari 2021; Wang 2015	Intersection/access density repeatedly appears as relevant but data differs	medium	high	medium	Proxy may measure urbanity/AADT rather than conflict	candidate in future-work	backlog
LIT-TODO-008	Audit EB shrinkage formula and overdispersion parameter usage	diagnostic	Stage 2 / validation	Hauer 2001; Al-Omari 2021; Huda 2024	EB weighting depends on correct dispersion and entity type	high	high	medium	Miscalibrated shrinkage overstates confidence in rankings	EB exists as diagnostic	now
LIT-TODO-009	Document regression-to-mean warning for before/after use of high-risk links	documentation note	documentation / validation	Hauer 2001	Users may evaluate interventions on links selected by high observed counts	high	medium	low	Users mistake ranking for treatment-effect evidence	likely new	now
LIT-TODO-010	Add balanced-accuracy diagnostic for zero/non-zero or severe/KSI checks	diagnostic	validation / Stage 2	Brodersen 2010; Gilardi 2022	Imbalanced sparse counts make ordinary accuracy misleading	medium	high	medium	Binarisation can obscure count calibration	possibly absent	later
LIT-TODO-011	Keep severity modelling separate from frequency model in docs	documentation note	documentation / Stage 2	Boulieri 2016; Gilardi 2022; Michalaki 2015; Quddus 2010; Ma 2019	Severity and frequency targets differ and may have different predictors	high	high	low	Implying severity-weighted validation exists	future-work covers severity	now
LIT-TODO-012	Add severity-stratified diagnostic comparing top-risk links with KSI/fatal proportions	diagnostic	Stage 2 / validation	Ma 2019; Quddus 2010; Michalaki 2015; Boulieri 2016	Tests whether current frequency ranking misses severity burden	medium	high	medium	Leakage if post-event proportions become production predictors	new	later
LIT-TODO-013	Add feature-interpretation leakage note for crash-record variables	documentation note	feature engineering / documentation	Ma 2019; Michalaki 2015; Quddus 2010	Post-event crash features are not prospective link predictors	high	high	low	Accidental use of target-derived variables	likely partly present	now
LIT-TODO-014	Add centrality-feature and count-sparsity diagnostics for Stage 1a	diagnostic / baseline comparison	Stage 1a	Jayasinghe 2019	Centrality-based AADT estimation depends on split design and sparse counts	high	medium	medium	Random splits overstate spatial generalisation	centrality likely present	later
LIT-TODO-015	Add learning curve for Stage 1a count-point sparsity	diagnostic	Stage 1a / validation	Jayasinghe 2019	Extraction explicitly suggests training-point sensitivity	medium	medium	medium	Misreading random split R² as spatial transfer	new	backlog
LIT-TODO-016	Add CURE plots and exposure-only baseline comparison for Poisson GLM	diagnostic / baseline comparison	Stage 2 / validation	Roll 2026; Lord 2010	CURE plots and exposure-only baselines diagnose misspecification	medium	medium	medium	Applying pedestrian-intersection claims to link model	new	later
LIT-TODO-017	Add documentation note that congestion proxies are low priority for current Stage 2	documentation note	Stage 2 / documentation	Wang 2009; Quddus 2010	Two M25 companion extractions report congestion null findings; scope is motorway-specific	medium	medium	low	Generalising M25 null result to all roads	new	later
LIT-TODO-018	Run motorway slip-road/ramp residual diagnostic	diagnostic	Stage 2 / feature engineering	Chengye & Ranjitkar 2013; Pan 2013; Michalaki 2015	Motorway context differs around ramps/hard shoulder	medium	medium	medium	Sparse/noisy ramp coding	possibly available via form-of-way	backlog
LIT-TODO-019	Add curvature/grade interpretation note by road family	documentation note / diagnostic	feature engineering / Stage 2	Pan 2017; Chengye 2013; Wang 2009; Huda 2024; Quddus 2010	Geometry effects vary by road type and by frequency vs severity target	high	high	low-medium	Treating coefficient direction as causal	curvature active; grade candidate	now/later
LIT-TODO-020	Treat point-process methods as exploratory comparison layers only	documentation note / small pilot	validation / future work	Baddeley 2021; Cronie 2019; Eckardt 2024	Network point-process literature critiques aggregation but does not replace Stage 2	medium	high	low for note; high for pilot	Presenting in-sample clustering as predictive validation	new	backlog
LIT-TODO-021	Run posterior predictive zero check on current Stage 2 Poisson GLM	diagnostic	Stage 2 / validation	Pew 2020	Table 3 in Pew shows Poisson-equivalent model (ZIP with π=0) underestimates zeros; same structure expected for Open Road Risk Poisson GLM given ~98–99% link-year zero rate	high	medium	low	Drawing samples at link-year level must incorporate correct exposure offset per link	new	now
LIT-TODO-022	Fit negative binomial GLM with existing exposure offset as Stage 2 candidate and compare to Poisson GLM using grouped-link CV	baseline comparison	Stage 2	Pew 2020; Lord 2010; Chengye & Ranjitkar 2013	π ≈ 0 in Pew’s ZINB indicates overdispersion (ϕ = 17) drives improvement, not zero-inflation; NB GLM is the priority step before any ZINB complexity	high	high	low-medium	NB GLM dispersion can be sensitive to motorway overfitting already noted; check ϕ stability across facility families	new	now
LIT-TODO-023	Estimate empirical variogram of Stage 2 Poisson GLM residuals to determine spatial autocorrelation range	diagnostic	Stage 2 / validation	Mahoney 2023; Aguero-Valverde 2008; Gilardi 2022	Mahoney 2023 shows optimal spatial CV buffer ≈ autocorrelation range; without measuring the range for Open Road Risk, spatial CV design is uninformed	high	high	low-medium	Variogram on 2.17M links requires subsampling; use road-class-stratified subsample of ~10–50k links	new	later
LIT-TODO-024	Pilot police-force-level regional holdout as a spatial CV diagnostic	diagnostic / small pilot	Stage 2 / validation	Mahoney 2023; Gilardi 2022	~13–16 force areas provide pre-defined geographic groups of comparable size; holding each out in turn enforces real spatial separation and tests geographic generalisation	high	high	medium	Force areas vary substantially in size and collision density; compare force-holdout R²/pseudo-R² against current grouped-link metrics to quantify spatial optimism	new	later
LIT-TODO-025	Document current grouped-link CV as temporal grouped CV and record that it does not enforce spatial separation between neighbouring links	documentation note	Stage 2 / validation / documentation	Mahoney 2023	Paper shows V-fold without spatial separation is strongly optimistic; grouped-link split prevents same-link leakage but does not address neighbouring-link spatial autocorrelation	high	medium	low	None (documentation only)	new	now
LIT-TODO-026	Implement AccHR@k (accuracy hit rate at top-k% predicted risk links) as a Stage 2 validation metric	diagnostic / validation metric	Stage 2 / validation	Gao 2024	AccHR@k directly evaluates whether high-percentile risk predictions correspond to roads with actual collisions; more operationally meaningful than RMSE or pseudo-R² for a ranking output	high	medium	low	Choice of k matters at 2.17M links; consider AccHR@1, AccHR@5, and AccHR@20 rather than a single threshold; avoid treating a broad k as strong evidence of discrimination	new	now
LIT-TODO-027	Add AADT denominator sensitivity diagnostic for Stage 2 risk percentiles	diagnostic	Stage 1a / Stage 2 validation	National Highways 2022; Hauer 2001; Jayasinghe 2019	National Highways recommends traffic-denominator sensitivity, and Open Road Risk uses estimated AADT for most links	high	medium	low-medium	Sensitivity bands must be labelled illustrative, not formal uncertainty intervals	new	now
LIT-TODO-028	Document STATS19 CF/RSF transition and keep CF/RSF fields out of Stage 2 predictors	documentation note	feature engineering / documentation	DfT 2024; DfT 2025 CF/RSF guide; DfT 2025 casualty report	CF/RSF fields are subjective, partially recorded, post-event, and structurally broken around 2024	high	medium	low	Treating missing CF/RSF as “factor absent” or using converted RSFs as time-series evidence	partly present in literature pages	now
LIT-TODO-029	Add current UK STATS19 under-reporting and severity-adjustment caveat to outcome documentation	documentation note	Stage 2 / documentation	DfT 2025 casualty report; Savolainen 2011	Police-reported injury collisions under-report non-fatal casualties and adjusted severity series differ from raw records	high	medium	low	Mixing adjusted national figures with raw pipeline outcome without labelling	partly present in severity page	now
LIT-TODO-030	Add low-count aggregate rate-comparison note for any “significance” flagging	documentation note / diagnostic	validation / Stage 2	National Highways 2022	Asymptotic tests can mislead at low exposure; Monte-Carlo LRT is useful for pairwise aggregates but not at-scale ranking	medium	medium	low	Users may overinterpret link-level p-values as practical safety certainty	new	later
LIT-TODO-031	Compute skewness and zero proportion of Open Road Risk link-year crash distribution; compare against Shirazi et al. (2017) NB-L preference threshold (skewness > 1.92; zero proportion as context); then test NB-L as Stage 2 candidate on a stratified sample of ~100k link-years	diagnostic + candidate model extension	Stage 2 / pre-modelling analysis	Khodadadi et al. 2021; Pew 2020; Lord 2010	NB-L outperforms NB-2 by WAIC ~600 when skewness > 1.92 and zero proportion ≥ 37%; Open Road Risk at ~98–99% zeros is in a more extreme regime; frequentist NB-L implementation recommended before full Bayesian MCMC	high	high	low (skewness check) / medium (NB-L fit)	NB-L dispersion estimation can be unstable at very low mean values; test on filtered subsample (e.g. links with ≥1 crash over observation period) before applying at full 2.17M scale	new	now (skewness check) / later (NB-L fit)
LIT-TODO-032	Before implementing NB-2 or NB-L as Stage 2 alternatives, run a variance-mean diagnostic plot: bin Stage 2 Poisson fitted means into ~10 categories; compute average (Y − μ̂)² per bin; overlay linear (quasi-Poisson) and quadratic (NB) variance curves	diagnostic	Stage 2 / pre-model-family-choice	Ver Hoef & Boveng 2007; Khodadadi 2021	Directly determines whether Var ∝ μ or Var ∝ μ² better describes link-year crash variance; low effort using existing GLM output	medium	medium	low	With ~99% zero link-years most bins cluster near μ ≈ 0; compute on non-zero link-years or filter to links with ≥1 crash; document filtering choice	new	later
LIT-TODO-033	Add documentation note that AIC cannot compare quasi-Poisson against NB-2 or NB-L; if quasi-Poisson is tested as a Stage 2 alternative use variance-mean diagnostic; CURE plots; and cross-validation for the comparison	documentation note	Stage 2 / validation / documentation	Ver Hoef & Boveng 2007	Quasi-Poisson lacks a full distributional likelihood; AIC and BIC require full log-likelihood; QAIC is only valid within the quasi class	medium	low	low	None — documentation only	new	later
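
The variance-mean diagnostic described in LIT-TODO-032 can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, not the repository's implementation: the function names, the equal-count binning, and the unweighted least-squares fits through the origin are all assumptions made here. It uses the common parameterisations Var = φμ for quasi-Poisson and Var = μ + αμ² for NB-2, and compares the two curves numerically rather than plotting them.

```python
from statistics import mean


def variance_mean_bins(y, mu_hat, n_bins=10):
    """Return one (mean fitted value, mean squared residual) pair per bin.

    Observations are sorted by fitted mean and split into equal-count bins.
    """
    pairs = sorted(zip(mu_hat, y))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:  # more bins than observations
            continue
        mu_bar = mean(m for m, _ in chunk)
        var_hat = mean((obs - m) ** 2 for m, obs in chunk)
        rows.append((mu_bar, var_hat))
    return rows


def slope_through_origin(x, v):
    """Least-squares slope for v ~ slope * x (no intercept)."""
    return sum(a * b for a, b in zip(x, v)) / sum(a * a for a in x)


def compare_variance_curves(rows):
    """Fit Var = phi*mu (quasi-Poisson) and Var = mu + alpha*mu^2 (NB-2).

    Returns the fitted parameters and each curve's sum of squared errors
    against the binned variance estimates.
    """
    mu = [m for m, _ in rows]
    var = [v for _, v in rows]
    phi = slope_through_origin(mu, var)
    alpha = slope_through_origin([m * m for m in mu],
                                 [v - m for m, v in rows])
    sse_linear = sum((v - phi * m) ** 2 for m, v in rows)
    sse_quadratic = sum((v - (m + alpha * m * m)) ** 2 for m, v in rows)
    return {"phi": phi, "alpha": alpha,
            "sse_linear": sse_linear, "sse_quadratic": sse_quadratic}
```

As the TODO notes, most bins will sit near μ ≈ 0 when about 99% of link-years are zero, so any filtering applied before calling `variance_mean_bins` should be documented alongside the result. The fits fail with a division by zero if every fitted mean is exactly zero.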

Current-code alignment assessment

Current strengths

The exposure-adjusted crash-frequency framing is supported by multiple extractions: Hauer 2001, Gilardi 2022, Aguero-Valverde 2008, Lord 2010, and Pan 2017.
Link-year modelling is consistent with the crash-frequency/SPF literature, while the Gilardi et al. 2022 extractions provide a direct UK OS-segment analogue.
Grouped or held-out validation is directionally aligned with the caution in Lord 2010 and with temporal holdout practice in the Chengye/Ranjitkar motorway papers.
The repository’s attention to spatial units is aligned with Gilardi 2022, Baddeley 2021, Ziakopoulos 2020, and Aguero-Valverde 2008.
Use of open data is a defensible distinction versus studies relying on complete motorway counters, commercial probe data, or inspection/video logs.
Keeping EB shrinkage, spatial models, and point-process methods as diagnostics or future work is consistent with computational and validation cautions in the extractions.
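
The exposure-adjusted framing above can be made concrete with a log-link mean function. This is a minimal sketch, not the pipeline's code: `expected_count`, `exposure`, and `linear_predictor` are hypothetical names, and `exposure` stands for whatever vehicle-distance quantity the pipeline uses. It shows that entering log exposure as a fixed offset is the same as fixing the exposure elasticity at 1.0, whereas a freely estimated coefficient allows other values.

```python
import math


def expected_count(exposure, linear_predictor, elasticity=1.0):
    """Log-link mean: mu = exposure**elasticity * exp(linear_predictor).

    elasticity=1.0 corresponds to entering log(exposure) as a fixed offset;
    other values correspond to a freely estimated coefficient on log(exposure).
    """
    return math.exp(elasticity * math.log(exposure) + linear_predictor)


# With a fixed offset, doubling exposure doubles the expected count.
ratio_offset = expected_count(200.0, -5.0) / expected_count(100.0, -5.0)

# With an elasticity of 0.5, doubling exposure scales it by sqrt(2).
ratio_free = (expected_count(200.0, -5.0, elasticity=0.5)
              / expected_count(100.0, -5.0, elasticity=0.5))
```

Comparing the two ratios is one way to explain, in documentation, what a diagnostic test of free AADT/length coefficients would be checking.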

Current weaknesses / limitations to document

Exposure uncertainty is not fully propagated from Stage 1a into Stage 2; several papers treat AADT as observed, whereas Open Road Risk uses estimated AADT for most links.
The fixed VMT-style offset implies an exposure elasticity of 1.0; several extractions support testing free AADT/length coefficients diagnostically.
Results may be sensitive to the choice of OS Open Roads links as the spatial unit, in particular to very short links, junction proximity, and MAUP-like segmentation effects.
Junction/intersection mechanisms are under-represented in a pure link-level model.
Severity is not separately modelled; the severity papers show this is a different target, not just a weighted version of frequency.
Spatial autocorrelation is not fully handled in production; this may affect coefficient interpretation and ranking confidence.
The grouped-by-road-link CV split prevents same-link leakage across years but does not enforce spatial separation between neighbouring links on the same corridor. Mahoney 2023 shows that CV without spatial separation produces performance estimates close to those of ordinary V-fold CV, which are optimistically biased, rather than true out-of-sample performance. The degree of bias depends on the spatial autocorrelation range of collision risk, which has not yet been measured for Open Road Risk.
Hotspot/risk percentile sensitivity to spatial unit and residual clustering needs explicit documentation.
The current Stage 2 Poisson GLM underestimates zeros at link-year level. Pew 2020 shows that a Poisson-equivalent model (ZIP with π ≈ 0) calibrates poorly on zero-heavy count data; the improvement from NB/ZINB comes from the dispersion parameter, not zero-inflation per se. This has been confirmed on Open Road Risk data: the zero-calibration diagnostic (see Zero-Calibration Diagnostic) finds that Poisson severely underestimates zeros (p < 0.001), while NB closes the gap (p = 0.722, α = 2.057). An NB GLM is the warranted next step; ZINB is not the priority.
Post-event variables from collision records must not leak into prospective Stage 2 features.
STATS19 contributory factors and road safety factors are subjective post-event fields with partial coverage and a 2024 structural break; they should remain provenance/EDA context unless a separate non-prospective analysis is explicitly labelled.
Current UK official statistics confirm that non-fatal casualties remain under-reported and that severity-adjusted national series are not automatically comparable to raw pipeline collision records.
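
The zero-calibration point in the list above can be illustrated with a plug-in comparison of observed and expected zero shares. This is a minimal sketch under stated assumptions, not the repository's Zero-Calibration Diagnostic and not a full posterior predictive check: the function names are hypothetical, `mu_hat` is assumed to be the fitted link-year mean with the exposure offset already applied, and the NB-2 parameterisation Var = μ + αμ² is assumed.

```python
import math


def zero_fraction_observed(y):
    """Share of link-years with no recorded collision."""
    return sum(1 for obs in y if obs == 0) / len(y)


def zero_fraction_poisson(mu_hat):
    """Expected share of zeros if each count is Poisson(mu_i)."""
    return sum(math.exp(-m) for m in mu_hat) / len(mu_hat)


def zero_fraction_nb2(mu_hat, alpha):
    """Expected share of zeros under NB-2 with Var = mu + alpha * mu^2."""
    return sum((1.0 + alpha * m) ** (-1.0 / alpha) for m in mu_hat) / len(mu_hat)
```

For the same fitted means, the NB-2 zero share is never below the Poisson zero share, which is why a dispersion parameter alone can close a zero-calibration gap without a separate zero-inflation component.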

Current areas where the repo is deliberately conservative

The current pipeline should not claim causality from road-feature coefficients.
It should not use post-event collision variables as predictors in the production frequency model.
Spatial, point-process, and CAR/INLA methods should remain diagnostics or pilots before any production use.
Severity-weighted, fatal-only, motorcycle, cyclist, or pedestrian risk targets should remain parallel/future models unless exposure and validation are made explicit.
Machine-learning rankings should be presented as decision-support indicators, not as calibrated external safety scores.

Claims Open Road Risk can safely make

Safer claims

The project estimates exposure-adjusted injury-collision risk.
The outputs are exploratory decision-support indicators.
The model can help identify links with unusually high observed collisions relative to estimated exposure and context.
Spatial-unit choice and the sensitivity of hotspot outputs to it are known limitations.
Severity and frequency are distinct modelling targets.
EB shrinkage and spatial diagnostics can help assess ranking confidence, but they do not prove causal treatment effects.
Open Road Risk uses open transport/collision/network data, which brings reproducibility advantages and exposure-coverage limitations.

Claims not yet supported

The model proves causal effects of road features.
The production risk percentile is externally validated.
High-ranked links are definitely unsafe independent of exposure uncertainty.
Severity-weighted risk is validated.
The current model fully handles junction conflict mechanisms.
The model is directly comparable to proprietary inspection scores without further validation.
Spatial autocorrelation is fully captured in the production model.
XGBoost feature importance is a causal interpretation of crash mechanisms.
The grouped-by-road-link cross-validation provides a spatially robust estimate of model performance. This is not yet supported because the split controls for same-link temporal leakage but does not enforce spatial separation between adjacent links; reported pseudo-R² values may be optimistically biased by an unknown amount relative to true geographic holdout performance.
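
The difference between grouping by link and holding out whole regions can be shown on toy data. This is a minimal sketch, not the repository's CV code: `link_id` and `force_area` are hypothetical column names, the rows are invented for illustration, and leave-one-group-out is used for both groupings only to keep the example short.

```python
from collections import defaultdict


def leave_one_group_out(rows, key):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[key]].append(i)
    for held_out, test_idx in groups.items():
        held = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held]
        yield held_out, train_idx, test_idx


def shared_values(rows, train_idx, test_idx, key):
    """Values of `key` that appear on both sides of a split."""
    return {rows[i][key] for i in train_idx} & {rows[i][key] for i in test_idx}


# Toy link-year rows (invented data, hypothetical column names).
rows = [
    {"link_id": "A", "year": 2021, "force_area": "north"},
    {"link_id": "A", "year": 2022, "force_area": "north"},
    {"link_id": "B", "year": 2021, "force_area": "north"},
    {"link_id": "C", "year": 2021, "force_area": "south"},
    {"link_id": "C", "year": 2022, "force_area": "south"},
]

link_folds = list(leave_one_group_out(rows, "link_id"))
force_folds = list(leave_one_group_out(rows, "force_area"))
```

In the toy data, holding out link A keeps both of its years out of training, yet link B from the same force area remains in the training set. Holding out the whole `north` force area removes that overlap, which is the kind of separation a regional holdout is meant to enforce.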

Secondary review queue

Use literature/prompts/literature_extraction_additional_prompts.md for these checks:

Use the Cross-Audit Prompt when there is one extraction and the PDF/tables need checking.
Use the Lightweight Sanity Check Prompt for low-priority single extractions.
Use the Reconciliation Prompt when two or more independent extractions need to be combined into a final record.
Use the Human Review Checklist before treating a reconciled extraction as final.

Missing or weak review queue

These papers need an additional source check because extraction coverage is thin, the extraction flags OCR/table problems, or the paper could support repo actions.

priority	paper	extraction_file	review_gap	prompt_to_use	what_to_check	likely_impact_if_wrong	recommended_next_action
conditional	Pew et al. 2020	paper-extraction-pew-2020-zero-inflated-crash.md	One extraction; key π ≈ 0 finding drives NB-vs-ZINB TODO ordering	Targeted Cross-Audit Prompt	Confirm π posterior mean ≈ 0.00 (SD 0.01) for both ZIP and ZINB in original PDF Table A1; confirm ϕ = 17.04 for ZINB; check prior specification Beta(0.15,1) on π	If π is not near zero, the argument for NB GLM priority over ZINB weakens and TODO ordering changes	Check before citing π ≈ 0 or using it as justification for NB-first approach.
conditional	Gao et al. 2024	paper-extraction-gao-2024-stzitd-gnn.md	One extraction; Table 4 performance values may contain transcription errors; train/val/test split chronology not stated	Targeted Cross-Audit Prompt	Verify Table 4 MAE/RMSE/AccHR@20 values; confirm whether 8:2:2 split is chronological or random; check GitHub repo accessibility	Could misstate AccHR@20 values or overstate validation strength	Check Table 4 and split description before writing AccHR@k diagnostic or citing improvement percentages.

Completed reconciliation records

These papers now have final combined records. Use the combined record for future citation and TODO work, while preserving the earlier extraction files for provenance.

paper	combined_record	source_extraction_files	remaining_caution	recommended_status
Huda & Al-Kaisy 2024	paper-extraction-huda-2024-COMBINED.md	paper-extraction-huda-alkaisy-2024-lvr-network-screening.md; paper-extraction-huda-2024-network-screening-low-volume-roads.md	Curvature CART sharp-group value is internally inconsistent; grade should not be cited as a final-model predictor without caution	Use combined record; no further extraction needed unless quoting disputed threshold values.
Jayasinghe et al. 2019	paper-extraction-jayasinghe-2019-COMBINED.md	paper-extraction-jayasinghe-2019-centrality-aadt.md; paper-extraction-jayasinghe-2019-traffic-modeling-centrality.md	Final selected regression type is implied but not fully documented across OLS/robust/Poisson alternatives	Use combined record; cite as Stage 1a exposure-modelling evidence, not collision-risk evidence.
Poch & Mannering 1996	paper-extraction-poch-mannering-1996-COMBINED.md	paper-extraction-poch-1996-intersection-negative-binomial.md; paper-extraction-poch-mannering-1996-nb-intersection.md	Accident-type table values should still be checked before formal publication because OCR is imperfect	Use combined record for junction/approach mechanism claims; table-value citation remains conditional.
Roll et al. 2026	paper-extraction-roll-2026-oregon-COMBINED.md	paper-extraction-roll-2026-oregon-pedestrian-spf.md; paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md	Appendices should be checked if exact SPF coefficients, exposure-only comparisons, or crash-assignment rules are needed	Use combined record for exposure-model and future junction/pedestrian-layer evidence.
Quddus, Wang & Ison	paper-extraction-quddus-wang-ison-COMBINED.md	paper-extraction-quddus-2010-m25-severity-ordered-response.md; paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md	Published version year/bibliographic details and exact Tables 2-3 should be checked before formal numeric reporting	Use combined record for severity target, traffic-lag design, and leakage-guardrail claims.
Wang, Quddus & Ison 2011	paper-extraction-wang-2011-two-stage-severity-ranking.md	paper-extraction-wang-2011-two-stage-severity-ranking.md	In-sample MAD only; no held-out validation. Complete HA traffic counts assumed for all segments — not replicable for Open Road Risk’s minor road network. Log(length) ≈ 1 finding supports exposure offset but is from motorway/major roads only.	Use for severity-disaggregation methodology reference and exposure-offset documentation; do not use MAD values as external benchmarks.
Ziakopoulos & Yannis 2020	paper-extraction-ziakopoulos-yannis-2020-COMBINED.md	paper-extraction-ziakopoulos-2020-spatial-approaches-road-safety.md; paper-extraction-ziakopoulos-yannis-2020-spatial-review.md	Primary cited papers still need checking before using exact study-level model specifications, validation methods, or numerical claims	Use combined record for high-level spatial methods, MAUP, hotspot sensitivity, and spatial-diagnostic claims.

Active reconciliation / combination check queue

These papers still have multiple extraction passes but no final combined record. Do not re-extract them from scratch; use the Reconciliation Prompt in literature_extraction_additional_prompts.md after any needed cross-audit notes exist.

priority	paper	extraction_files	why_reconcile	prompt_to_use	reconciliation_focus	expected_output
conditional	Gilardi et al. 2022	paper-extraction-gilardi-2022-leeds-network-lattice-bayesian.md; paper-extraction-gilardi-2022-multivariate-hierarchical-crashes.md; paper-extraction-gilardi-2022-network-lattice-crashes.md	Three extraction records already exist; only targeted citation checks remain for table/sign ambiguity and MAUP/contraction details	Reconciliation Prompt only if creating a final canonical record; otherwise Human Review Checklist plus targeted PDF check	Table 2 coefficient signs; Primary Road interpretation; balanced accuracy wording; dodgr/network-contraction details	Do not re-extract; manually inspect disputed PDF tables/text before citing coefficient directions or balanced-accuracy values.
conditional	Pan et al. 2017	paper-extraction-pan-2017-deep-belief-network-global-spf.md.md; paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md	Two extraction records now exist; use reconciliation only if writing the neural/global-SPF comparison page	Reconciliation Prompt	DBN training details; crash scope; NB benchmark coefficients; Washington split; normalization and minimum-length handling	Reconciled benchmark note; do not treat DBN as a production recommendation without stronger validation.
low	Brodersen et al. 2010	paper-extraction-brodersen-2010-balanced-accuracy.md; paper-extraction-brodersen-2010-balanced-accuracy-posterior.md	Two extraction records already exist; equations only needed for implementation	Reconciliation Prompt only if implementing posterior intervals	Posterior balanced accuracy equations, Equation 7 wording, and examples	Reconciled implementation note if adding code for posterior intervals.

Candidate Quarto literature pages

proposed_qmd_file	purpose	papers_to_use	key_claims	figures/tables_needed	readiness
quarto/literature/crash-frequency-models.qmd	Explain Poisson/NB/SPF count-model basis and limitations	Lord 2010; Hauer 2001; Aguero-Valverde 2008; Chengye/Ranjitkar 2013; Pan 2017; Poch 1996; Al-Omari 2021; Pew 2020	Count models need exposure, dispersion, validation, and cautious interpretation; overdispersion is the immediate model-family issue before zero-inflation; intersection evidence should not be transferred directly to link risk	Model-family comparison table; Open Road Risk alignment table; zero-calibration diagnostic summary; NB-over-Poisson evidence note	exists — mostly current; update references to use the combined Poch record and verify Pew π≈0 before quoting exact values
quarto/literature/exposure-and-traffic-volume.qmd	Document AADT/AADF/WebTRIS exposure handling	Gilardi 2022; Hauer 2001; Jayasinghe 2019 combined; Roll 2026 combined; Aguero-Valverde 2008; Wang 2009; Pew 2020; Gao 2024; Mensah & Hauer 1998; Qin 2006; Dutta 2020; Sung 2024; National Highways 2022	Exposure is central but elasticity, estimated-AADT uncertainty, no-exposure contrast cases, and temporal-flow aggregation need clear separation	Exposure-treatment matrix; Stage 1a validation summary; no-offset contrast table; AADT/AADPT data-fusion note; temporal exposure caveats	exists — current after 2026-05-12 update; use LIT-049 to LIT-052 and LIT-055 for temporal/exposure additions
quarto/literature/spatial-methods-and-network-risk.qmd	Review OS-link lattice, CAR, MAUP, point-process diagnostics	Gilardi 2022; Aguero-Valverde 2008; Ziakopoulos 2020 combined; Baddeley 2021; Cronie 2019; Eckardt 2024; Mahoney 2023	Spatial methods support diagnostics, not immediate production replacement; spatial CV evidence strengthens validation caveats	Spatial-unit comparison; diagnostic queue; CV method comparison table from Mahoney	exists — linked in site nav; use combined Ziakopoulos record and keep Gilardi table/sign details conditional
quarto/literature/junctions-and-conflict-structure.qmd	Separate junction/approach mechanisms from link risk	Poch 1996 combined; Roll 2026 combined; Al-Omari 2021; Wang 2015; Ziakopoulos 2020 combined	Junction risk needs different units, data, and exposure structures from the current link-year model	Junction mechanism table; available/open-data proxy table; link-vs-intersection transferability table	exists — linked in site nav; use combined Poch/Roll/Ziakopoulos records, with exact Poch table values and Roll appendices conditional for formal citation
quarto/literature/severity-modelling.qmd	Separate severity from frequency and define future severity path	Boulieri 2016; Gilardi 2022; Michalaki 2015; Quddus/Wang/Ison combined; Wang/Quddus/Ison 2011; Ma 2019; Gao 2024; Savolainen 2011; DfT 2025 casualty report; DfT 2025 CF/RSF guide; National Highways 2022	Severity is conditional/different target and can conflict with frequency; severity-weighted composites should not be treated as validated Stage 2 risk; CF/RSF fields are post-event and structurally unstable across 2024	Severity target taxonomy; leakage warning table; composite-vs-separate response variable note; STATS19 under-reporting and CF/RSF provenance notes	exists — current after 2026-05-12 update; use LIT-059 for Wang/Quddus/Ison 2011 severity-disaggregated frequency citation
quarto/literature/validation-and-metrics.qmd	Document limitations of held-out validation, balanced accuracy, CURE plots, and pseudo-R²	Brodersen 2010; Gilardi 2022; Chengye 2013; Roll 2026 combined; Lord 2010; Huda 2024 combined; Mahoney 2023; Pew 2020; Pan 2017; Gao 2024; Quddus 2007; Savolainen 2011; Roshandel circa 2015; Dutta 2020; Sung 2024; National Highways 2022	Metrics test different things; avoid in-sample/holdout confusion; spatial CV, temporal holdout, zero-calibration, AccHR@k, CURE plots, and AADT denominator sensitivity are candidate validation additions	Metric taxonomy; current repo validation map; CV method performance table from Mahoney; zero-check diagnostic table from Pew; AccHR@k definition from Gao; low-count and denominator sensitivity notes	exists — current after 2026-05-12 update; keep Pew/Gao exact-value checks conditional
quarto/literature/transferability-and-open-data-limits.qmd	Explain what transfers to open UK data and what does not	All papers, with combined records for Huda, Jayasinghe, Poch, Roll, Quddus/Wang/Ison, and Ziakopoulos/Yannis; Gao 2024 and Balawi & Tenekeci 2024 as negative-transfer examples; DfT 2024/2025 CF/RSF and casualty-statistics records for open-data limits	Some evidence is blocked by missing lane/turning/exposure data or different unit/target; apparent UK relevance still needs data-stack checks; CF/RSF fields have a 2024 structural break and RSFs are not open at record level	Transferability table; data-availability matrix; negative-transfer rows; combined-record provenance note; CF/RSF provenance note	exists — current after 2026-05-12 update; keep DfT records as documentation context, not model evidence
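
AccHR@k, listed above as a candidate validation addition, could be sketched as follows. This is one plausible reading only: it assumes AccHR@k is the share of links in the top k% by predicted risk that recorded at least one collision, and the exact definition in Gao 2024 should be checked before this is used or cited. The function and argument names are hypothetical.

```python
import math


def acc_hr_at_k(predicted_risk, observed_collisions, k_percent):
    """Share of the top k% of links by predicted risk with >= 1 observed collision."""
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    n = len(predicted_risk)
    n_top = max(1, math.ceil(n * k_percent / 100))
    # Rank link indices from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: predicted_risk[i], reverse=True)
    hits = sum(1 for i in order[:n_top] if observed_collisions[i] > 0)
    return hits / n_top


# Invented toy values for five links.
predicted = [0.9, 0.1, 0.8, 0.2, 0.05]
observed = [1, 0, 0, 2, 0]
top_20 = acc_hr_at_k(predicted, observed, 20)
top_40 = acc_hr_at_k(predicted, observed, 40)
all_links = acc_hr_at_k(predicted, observed, 100)
```

Under this reading, the value at k = 100 is simply the share of all links with a collision, so results for a broad k are best reported next to that base rate.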

Appendices

Register taxonomy

current_repo_relevance: how directly the extraction informs the current Open Road Risk pipeline, code, model, validation, or documentation.
- high: directly relevant to current Stage 1a, Stage 1b, Stage 2, validation, or docs.
- medium: relevant to a subset, diagnostic, or caution.
- low: indirect or future-only.
future_research_relevance: usefulness for extensions beyond the current implementation.
- high: directly informs plausible future Open Road Risk research.
- medium: useful if a specific future branch exists.
- low: peripheral.
literature_review_relevance: usefulness for future narrative Quarto literature pages.
- high: should be cited or tabulated.
- medium: include in specialised page or caveat table.
- low: likely appendix/background only.
code_actionability_now: whether the extraction supports a near-term code/doc action.
- high: a clear documentation, diagnostic, or baseline action is supported.
- medium: action is plausible but should be scoped.
- low: no near-term code action.
supports_production_change: what kind of production change, if any, the extraction supports.
- no: no production change supported.
- diagnostic-only: supports checks, reporting, documentation, or sensitivity analysis.
- pilot-first: supports a limited pilot before any production consideration.
- baseline-comparison-first: supports comparing against current implementation before adopting.
- possible-later: may support a future production change after more evidence.
secondary_review_needed: whether manual PDF/table review is needed before the extraction is used.
- no: extraction is sufficient for high-level register use.
- yes: manual PDF/table review is needed before use in TODOs or literature prose.
- conditional: adequate for cautious register use, but check before quoting numbers, equations, or coefficient signs.
extraction_quality_initial_judgement: first-pass judgement of how complete and reliable the extraction is for register-level use.
- high: extraction reports high confidence or appears complete for register-level use.
- medium: useful but has missing tables, indirect relevance, or stated uncertainty.
- low: do not use without review.
- unknown: extraction does not state enough to judge.
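
The taxonomy above can also be expressed as a controlled vocabulary for checking inventory rows. This is a minimal sketch and not an existing repository tool: `ALLOWED_VALUES` and `taxonomy_problems` are hypothetical names, and a row is assumed to be a plain dictionary keyed by column name. The allowed values are copied from the taxonomy as written.

```python
ALLOWED_VALUES = {
    "current_repo_relevance": {"high", "medium", "low"},
    "future_research_relevance": {"high", "medium", "low"},
    "literature_review_relevance": {"high", "medium", "low"},
    "code_actionability_now": {"high", "medium", "low"},
    "supports_production_change": {
        "no", "diagnostic-only", "pilot-first",
        "baseline-comparison-first", "possible-later",
    },
    "secondary_review_needed": {"no", "yes", "conditional"},
    "extraction_quality_initial_judgement": {"high", "medium", "low", "unknown"},
}


def taxonomy_problems(row):
    """Return (field, value) pairs in `row` that fall outside the taxonomy.

    A missing field is reported with the value None.
    """
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        value = row.get(field)
        if value not in allowed:
            problems.append((field, value))
    return problems
```

A check like this is better used to flag rows for human review than to reject them, since compound values may be deliberate.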

Known extraction files not yet processed

file	reason not included
literature/prompts/road_safety_literature_extraction_prompt.md	Prompt template, not a paper extraction.
literature/prompts/OLD_road_safety_literature_extraction_prompt.md	Old prompt template, not a paper extraction.
literature/prompts/literature_extraction_additional_prompts.md	Companion prompt file, not a paper extraction.
literature/prompts/README_literature_extraction.md	Workflow guide, not a paper extraction.
literature/prompts/grep_extraction.sh	Utility script, not a paper extraction.
literature/prompts/grep_extraction_output.txt	Generated grep output/provenance helper, not an extraction source.

No literature/papers_raw/ extraction Markdown was found during this pass. No Quarto or docs files were treated as paper extractions; quarto/future-work.qmd, todo/TODO.md, docs/internal/sites_todo.md, and quarto/background/metrics-and-methodology.qmd were used only as roadmap/methodology context.

Update (2026-05-10): Four extraction files previously not in this register have now been added as LIT-032 through LIT-035: paper-extraction-pew-2020-zero-inflated-crash.md, paper-extraction-mahoney-2023-spatial-cv.md, paper-extraction-gao-2024-stzitd-gnn.md, and paper-extraction-balawi-tenekeci-2024-arima-sarimax-london-aroads.md. The file paper-extraction-chengye-ranjitkar-2013-motorway-nb-regression.md was confirmed as the source file for the existing LIT-009 row and required no new entry.

Update (2026-05-10): Six additional review-pass extraction files have been added as LIT-036 through LIT-041: paper-extraction-huda-2024-network-screening-low-volume-roads.md, paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md, paper-extraction-poch-mannering-1996-nb-intersection.md, paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md, paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md, and paper-extraction-ziakopoulos-yannis-2020-spatial-review.md. paper-extraction-mcfadden-not-stated-conditional-logit.md was removed from this register because it is not a road-safety paper and has no material current relevance to Open Road Risk.

Update (2026-05-10): Four final combined/reconciled records have been added as LIT-042 through LIT-045: paper-extraction-huda-2024-COMBINED.md, paper-extraction-jayasinghe-2019-COMBINED.md, paper-extraction-poch-mannering-1996-COMBINED.md, and paper-extraction-roll-2026-oregon-COMBINED.md. These are now the preferred records for future citation/TODO work for those papers; the earlier extraction files remain in the inventory for provenance.

Update (2026-05-10): Two further combined/reconciled records have been added as LIT-046 and LIT-047: paper-extraction-quddus-wang-ison-COMBINED.md and paper-extraction-ziakopoulos-yannis-2020-COMBINED.md. These move Quddus/Wang/Ison and Ziakopoulos/Yannis out of the active reconciliation queue. All seven candidate Quarto literature pages now exist under quarto/literature/ and have been added to the website Literature menu.

Update (2026-05-12): Eleven additional extraction files have been added as LIT-048 through LIT-058: paper-extraction-quddus-2007-inar-time-series-count.md, paper-extraction-mensah-hauer-1998-two-problems-averaging.md, paper-extraction-qin-et-al-2006-bayesian-hourly-exposure.md, paper-extraction-dutta-2020-freeway-crash-prediction-disaggregate-flow.md, paper-extraction-sung-et-al-2024-modified-temporal-spf.md, paper-extraction-savolainen-et-al-2011-severity-modelling-review.md, paper-extraction-roshandel-2015-realtime-traffic-freeway-crash.md, paper-extraction-national-highways-2022-comparing-collision-casualty-rates.md, paper-extraction-dft-2024-rsf-initial-analysis.md, paper-extraction-dft-2025-guide-cf-rsf-transition.md, and paper-extraction-dft-2025-reported-road-casualties-gb-2024.md. The associated Quarto literature references have been updated from LIT-PENDING to these register IDs where an extraction exists.

Update (2026-05-12): paper-extraction-wang-2011-two-stage-severity-ranking.md has been added as LIT-059 and linked from quarto/literature/severity-modelling.qmd.

Update (2026-05-24): Three new extraction files have been added as LIT-060 through LIT-062: paper-extraction-khodadadi-2021-NB-parameterisations-NFAS-SPF.md (LIT-060), paper-extraction-asumadu-2015-poisson-NB-Ghana-road-accidents.md (LIT-061), and paper-extraction-verhoef-boveng-2007-quasipoisson-vs-NB-overdispersion.md (LIT-062). LIT-060 (Khodadadi 2021) is the highest-priority new addition: it provides direct evidence for NB-L model superiority on zero-heavy low-sample-mean crash data and confirms sub-linear length elasticity across 30 model variants, strengthening the evidence base for LIT-TODO-002, LIT-TODO-016, and LIT-TODO-022. Three new TODOs have been added: LIT-TODO-031 (NB-L candidate model on sampled data), LIT-TODO-032 (variance-mean diagnostic plot), LIT-TODO-033 (document AIC limitation for QP vs NB comparison).