Literature Evidence Register
This is a structured, maintainer-facing evidence register for tracing extracted literature evidence into documentation, model checks, and TODOs. It is not the final narrative literature review.
Purpose and scope
The register is maintained for Open Road Risk and is scoped as follows:
- It tracks extracted papers and source files.
- It records methodological relevance to Open Road Risk.
- It separates current repo relevance from future research relevance.
- It records provenance and extraction quality.
- It supports future Quarto literature pages, repo TODOs, and model evaluation.
- It should be updated append-only when new paper extractions are added.
The main source of truth is the existing extraction Markdown in literature/papers_summary/. This register does not re-read source PDFs and does not infer beyond the extraction files. Where extractions are duplicated for the same paper, each extraction file is kept as its own row so provenance across AI tools/models is preserved.
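The one-row-per-extraction-file rule can be checked mechanically. The sketch below is a minimal illustration, not part of the repo: the helper names `list_extraction_files` and `unregistered_extractions` are hypothetical, and the `paper-extraction-*.md` glob pattern is an assumption based on the filenames listed in the inventory.

```python
from pathlib import Path


def list_extraction_files(folder="literature/papers_summary"):
    """List extraction Markdown filenames found in the source-of-truth folder.

    Assumption: extraction files follow the paper-extraction-*.md naming
    pattern seen in the inventory table.
    """
    return sorted(p.name for p in Path(folder).glob("paper-extraction-*.md"))


def unregistered_extractions(extraction_names, registered_names):
    """Return extraction filenames that do not yet have a register row."""
    registered = set(registered_names)
    return sorted(name for name in extraction_names if name not in registered)


if __name__ == "__main__":
    # In-memory example; in practice the first list would come from
    # list_extraction_files() and the second from the extraction_file column.
    on_disk = [
        "paper-extraction-hauer-2001-eb-spf-tutorial.md",
        "paper-extraction-example-new-paper.md",  # hypothetical new file
    ]
    in_register = ["paper-extraction-hauer-2001-eb-spf-tutorial.md"]
    print(unregistered_extractions(on_disk, in_register))
```

Comparing by filename only keeps the check consistent with the rule that the register does not re-read PDFs or extraction content.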
How to update this register
- Add one row to the inventory table for each new extraction file.
- Add or update thematic rows only where the new paper contributes evidence.
- Add repo TODOs only when supported by the extraction.
- Add secondary review flags where needed.
- Do not rewrite existing judgements unless the new paper changes the evidence base.
- Preserve previous source filenames and extraction filenames.
- If importance changes because the repo changes, update `current_repo_relevance` but preserve `future_research_relevance`.
- Prefer append-only edits. If a judgement changes, add a note explaining why rather than silently replacing earlier context.
- Keep current implementation actions separate from future research ideas.
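The append-only rules above can be sketched as a small helper that formats a new inventory row. This is an illustrative sketch under stated assumptions: `COLUMNS` is copied from the inventory table header, while `next_register_id` and `format_row` are hypothetical names, and numbering new rows as highest existing ID plus one (rather than reusing any unused number) is an assumed reading of "append-only".

```python
# Column names copied from the inventory table header (27 columns).
COLUMNS = [
    "register_id", "extraction_file", "source_pdf_filename", "paper_title",
    "authors", "year", "paper_type", "geography", "road_setting",
    "main_method_or_model", "outcome_or_target", "exposure_handling",
    "spatial_unit", "temporal_unit", "validation_type",
    "key_transferable_idea", "current_repo_relevance",
    "future_research_relevance", "literature_review_relevance",
    "code_actionability_now", "supports_production_change",
    "extraction_ai_tool", "model_name_if_known",
    "extraction_quality_initial_judgement", "secondary_review_needed",
    "secondary_review_reason", "notes",
]


def next_register_id(existing_ids):
    """Return the next LIT-NNN id after the highest existing one."""
    highest = max((int(i.split("-")[1]) for i in existing_ids), default=0)
    return f"LIT-{highest + 1:03d}"


def format_row(record, existing_ids):
    """Format one new inventory row; refuse to overwrite an existing id."""
    missing = [c for c in COLUMNS if c not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if record["register_id"] in existing_ids:
        raise ValueError("append-only: register_id already exists")
    cells = [str(record[c]).strip() for c in COLUMNS]
    if any("|" in cell for cell in cells):
        raise ValueError("cell text must not contain a pipe character")
    return "| " + " | ".join(cells) + " |"
```

A new row would then be produced with `format_row(record, existing_ids)` and appended below the last table line, leaving earlier rows untouched.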
Extraction inventory
| register_id | extraction_file | source_pdf_filename | paper_title | authors | year | paper_type | geography | road_setting | main_method_or_model | outcome_or_target | exposure_handling | spatial_unit | temporal_unit | validation_type | key_transferable_idea | current_repo_relevance | future_research_relevance | literature_review_relevance | code_actionability_now | supports_production_change | extraction_ai_tool | model_name_if_known | extraction_quality_initial_judgement | secondary_review_needed | secondary_review_reason | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LIT-001 | paper-extraction-aguero-valverde-2008-crash-frequency-spatial-models.md | Paper08-0088RG.pdf | Analysis of Road Crash Frequency with Spatial Models | Jonathan Aguero-Valverde; Paul P. Jovanis | 2008 | crash-frequency SPF; spatial model comparison | US; Pennsylvania | mixed road classes | Full Bayes spatial CAR vs non-spatial NB | total crashes per segment | VMT from AADT and length; observed/assumed | PennDOT variable-length segments | 5-year aggregate | in-sample model comparison | Spatial correlation can change crash-frequency parameter estimates; VMT-style exposure supports current framing | high | high | high | medium | diagnostic-only | Gemini | Gemini | high | conditional | Manual check only if quoting coefficient/table values | Duplicate source paper with LIT-002; keep both for provenance. |
| LIT-002 | paper-extraction-aguero-valverde-2008-spatial-car-crash-frequency.md | Paper08-0088RG.pdf | Analysis of Road Crash Frequency with Spatial Models | Jonathan Aguero-Valverde; Paul P. Jovanis | 2008 | crash-frequency SPF; spatial autocorrelation assessment | US; Pennsylvania | rural two-lane roads | Full Bayes Poisson lognormal with CAR random effects | annual crash count per segment | AADT as free log covariate; length as fixed offset | rural road segment; intersections and ramps excluded | annual segment-year | in-sample DIC and spatial diagnostics | Test AADT elasticity and spatial residual autocorrelation rather than assume offset is always correct | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Review Table 2 before citing exact elasticity values | Strong action source for Moran’s I, residual maps, and free-AADT elasticity diagnostics. |
| LIT-003 | paper-extraction-al-omari-2021-florida-context-classification-spf.md | Crash_Analysis_And_Development_Of_Safety_Performance_Functions_Fo.pdf | Crash Analysis and Development of Safety Performance Functions for Florida Roads in the Framework of the Context Classification System | Ma’en Mohammad Ali Al-Omari | 2021 | thesis; SPF; network screening | US; Florida | rural to urban context classes; road segments | Negative binomial SPFs; EB network screening | annual crash frequency by segment; KABCO/KABC/PDO variants | observed FDOT AADT; AADT plus length offset and DVMT alternatives | merged homogeneous road segments | annual average over 5 years | mainly in-sample; no holdout noted | Context-class SPFs and urban sub-linear AADT coefficients motivate stratified diagnostics | medium | high | high | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | medium | conditional | Thesis, no peer-review/holdout; check counterintuitive PSP coefficient if citing | Useful for junction density/access density and road-class stratification, not direct production change. |
| LIT-004 | paper-extraction-baddeley-2021-analysing-point-patterns-networks.md | 91405.pdf | Analysing point patterns on networks - a review | Adrian Baddeley; Gopalan Nair; Suman Rakshit; Greg McSwiggan; Tilman M. Davies | 2021 | methodological review; point processes | mixed/theoretical | linear networks | Network point processes; network KDE; Cox/Poisson processes | spatial point intensity on a network | traffic volume in example; point-process intensity not traffic exposure | exact point coordinates on continuous linear network | mostly static; spatio-temporal noted | methodological review; no predictive validation | Segment aggregation can hide point-level clustering; avoid ordinary planar KDE for network crashes | medium | high | high | low | diagnostic-only | Gemini | Gemini | high | conditional | Review methods before any spatstat implementation | Critiques link-year aggregation but does not invalidate current pipeline. |
| LIT-005 | paper-extraction-boulieri-2016-space-time-bayesian-severity.md | Boulieri_et_al-2016-Journal_of_the_Royal_Statistical_Society__Series_A_Statistics_in_Society.pdf | A space-time multivariate Bayesian model to analyse road traffic accidents by severity | Areti Boulieri; Silvia Liverani; Kees de Hoogh; Marta Blangiardo | 2016 | Bayesian severity/spatial model | England | mixed road types aggregated to wards | Bayesian hierarchical Poisson lognormal with CAR/MCAR/RW1 effects | ward-year accident counts by slight vs severe/fatal | ward traffic volume from AADF times road length; partly imputed; major-road coverage | electoral ward | annual | in-sample Bayesian model comparison and posterior checks | Severity levels can have distinct spatial structure; offset structure aligns at coarser grain | medium | high | high | medium | pilot-first | Claude | Claude Sonnet 4.6 | high | conditional | Ward-level and MCMC scale limit direct transfer | Good documentation support for severity caveat and year-specific AADT need. |
| LIT-006 | paper-extraction-brodersen-2010-balanced-accuracy-posterior.md | brodersen10post-balacc.pdf | The balanced accuracy and its posterior distribution | Kay H. Brodersen; Cheng Soon Ong; Klaas E. Stephan; Joachim M. Buhmann | 2010 | validation metric; classification methodology | not road-specific | not road-specific | Bayesian posterior distribution of balanced accuracy | binary class-label correctness | not applicable | abstract data point | not stated | cross-validation metric framework | Use balanced accuracy and posterior uncertainty for imbalanced zero/non-zero diagnostics | medium | medium | medium | medium | diagnostic-only | Gemini | Gemini 1.5 Pro | high | conditional | Manual check equations if implementing posterior intervals | Not road-safety evidence; validation reference only. |
| LIT-007 | paper-extraction-brodersen-2010-balanced-accuracy.md | brodersen10post-balacc.pdf | The balanced accuracy and its posterior distribution | Kay H. Brodersen; Cheng Soon Ong; Klaas E. Stephan; Joachim M. Buhmann | 2010 | validation metric; classification methodology | not stated | not road-specific | posterior balanced accuracy estimators | binary classification correctness | not applicable | not stated | not stated | conceptual cross-validation examples | Warn against plain accuracy for rare collision/no-collision diagnostics | medium | medium | medium | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Manual check Equation 7 and MATLAB routines if implementing | Duplicate source paper with LIT-006; use to compare extraction consistency. |
| LIT-008 | paper-extraction-chengye-2013-modelling-motorway-accidents-nb.md | Modelling Motorway Accidents using Negative Binomial Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; NB regression | New Zealand; Auckland | motorway; urban/rural | Negative binomial regression; GEE considered | annual accident frequency per segment | AADT per lane and length as free log covariates | homogeneous motorway segments; ramp-defined | yearly | in-sample and temporally held-out prediction metrics | Temporal holdout and ramp context diagnostics are useful; standard NB struggles on short/zero-heavy links | medium | high | high | medium | diagnostic-only | Gemini | Gemini | high | conditional | Check Equation 10 before any exposure-specification comparison | Same source family as LIT-009 and LIT-024. |
| LIT-009 | paper-extraction-chengye-ranjitkar-2013-motorway-nb-regression.md | Modelling_Motorway_Accidents_using_Negative_Binomial_Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; feature importance | New Zealand; Auckland | motorway only | Negative binomial accident prediction model | annual accident count per motorway segment | observed AADT per lane and length as free log covariates; no formal offset | homogeneous motorway mainline segment; ramp crashes excluded | annual segment-year | 2009-2010 temporal holdout after 2004-2008 training | Add temporal holdout, ramp/slip-road diagnostics, per-family overdispersion checks | high | high | high | high | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | conditional | Review marginal variables due to weak 80% significance threshold | Strong TODO source for validation and motorway facility-family diagnostics. |
| LIT-010 | paper-extraction-cronie-2019-inhomogeneous-linear-network.md | Inhomogeneous higher-order.pdf | Inhomogeneous higher-order summary statistics for linear network point processes | Ottmar Cronie; Mehdi Moradi; Jorge Mateu | 2019 | spatial point-process diagnostics | US example; Houston | road network example | inhomogeneous network F/G/J functions; simulations | accident point locations on linear network | no traffic exposure; spatial intensity reweighting only | point events on linear network | static one-month example | simulation/method diagnostics; no predictive validation | Distinguish exposure-normalised risk from point-pattern clustering diagnostics | low | medium | medium | low | diagnostic-only | ChatGPT | GPT-5.5 Thinking | medium | yes | Equation formatting imperfect; check before implementation | Useful for a small diagnostic pilot only. |
| LIT-011 | paper-extraction-eckardt-2024-marked-point-process-rejoinder.md | Rejoinder on ’Marked spatial point processes_ current state and extensions.pdf | Rejoinder on ‘Marked spatial point processes: current state and extensions to point processes on linear networks’ | Matthias Eckardt; Mehdi Moradi | 2024 | methodological discussion; marked point processes | not stated | not road-specific | marked point process summaries; K/J functions; mark correlation | point events with marks | traffic exposure not applicable; intensity is not exposure | point events on planar or linear networks | not stated | methodological discussion | Marked point-process diagnostics may explore severity/type clustering but not production ranking | low | medium | medium | low | no | ChatGPT | GPT-5.5 Thinking | high | conditional | Check formulas directly if citing | Keep as exploratory diagnostics reference. |
| LIT-012 | paper-extraction-gilardi-2022-leeds-network-lattice-bayesian.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | UK network-lattice Bayesian SPF | UK; Leeds | urban/metropolitan major roads | Bayesian hierarchical Poisson INLA with ICAR/PCAR and multivariate severity | OS segment counts by severe and slight crash | length times Census-routed commuter flow as Poisson offset; estimated exposure | OS road segment | 8-year aggregate cross-section | in-sample posterior predictive checks; no holdout | Direct UK support for OS segment lattice and log-offset form; balanced accuracy for sparse severity | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | yes | Check wide Table 2 signs and Primary Road interpretation before citation | Primary UK anchor for Stage 2 documentation, but not external validation. |
| LIT-013 | paper-extraction-gilardi-2022-multivariate-hierarchical-crashes.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | Bayesian spatial/severity model | UK; Leeds | urban/metropolitan | INLA Bayesian hierarchical Poisson | car crashes per street segment | length times estimated traffic flow | OS Vector OpenMap Local road segment | 8-year aggregate | in-sample posterior predictive diagnostics | OS link lattice is a credible unit; balanced accuracy useful for zero/non-zero checks | high | high | high | medium | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Check dodgr contraction details if replicating MAUP test | Duplicate source paper with LIT-012/LIT-014. |
| LIT-014 | paper-extraction-gilardi-2022-network-lattice-crashes.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | Bayesian spatial/severity SPF | UK; Leeds | major roads including motorways, primary roads, A roads | bivariate Bayesian hierarchical Poisson models with ICAR/PCAR | slight and severe road traffic collision counts | length times estimated commuter flow as offset | OS road segment with adjacency by shared boundary | yearly available; collapsed to 8-year aggregate | in-sample posterior predictive checks | Spatial autocorrelation limitation and MAUP sensitivity tests are directly relevant | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Manual review noted in extraction for table/sign details | Richest extraction for repo actions from this paper. |
| LIT-015 | paper-extraction-hauer-2001-eb-spf-tutorial.md | SPF_Basic_Tutorial_2001_by_Ezra_Hauer.pdf | Estimating Safety by the Empirical Bayes Method: A Tutorial | Ezra Hauer; Douglas W. Harwood; Forrest M. Council; Michael S. Griffith | 2001 | EB/SPF tutorial | not one geography | segments and intersections | Empirical Bayes expected crash frequency using SPF plus observed counts | expected accidents for road entity | ADT/AADT in SPF; length and years multiply expected count | road segment or intersection entity | annual | worked tutorial; no holdout | EB shrinkage should use correct overdispersion and full year-specific procedure | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Check equations if implementing exact EB changes | Primary EB methodology reference. |
| LIT-016 | paper-extraction-huda-alkaisy-2024-lvr-network-screening.md | dot_78279_DS1.pdf | Network Screening on Low-Volume Roads Using Risk Factors | Kazi Tahsin Huda; Ahmed Al-Kaisy | 2024 | low-volume road network screening | US; Oregon | rural low-volume two-lane roads | OLS on log EB expected crashes; CART thresholds | EB expected crashes per 0.05-mile section | AADT covariate in one model; dropped in another; no offset due fixed length | fixed 0.05-mile sections; intersections excluded | annual aggregate | random split/high R2 on EB output; not raw-count validation | Low-volume links may need geometry-led diagnostics and careful AADT sensitivity framing | medium | high | high | medium | pilot-first | Claude | Claude Sonnet 4.6 | high | yes | Check grade variable ambiguity and avoid over-reading high R2 | Strong for curvature/grade diagnostics, not production thresholds. |
| LIT-017 | paper-extraction-jayasinghe-2019-centrality-aadt.md | 1-s2_0-S2215016119301128-main.pdf | A novel approach to model traffic on road segments of large-scale urban road networks | Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama | 2019 | AADT estimation; traffic modelling | mixed developing-country cities | urban road networks | OLS/robust/Poisson regressions with dual-graph centrality | AADT/PCU per road segment | AADT is target; observed counts used for calibration | road segment in dual graph | annual AADT | random 80/20 validation; likely spatial leakage | Centrality features and learning curves can inform Stage 1a AADF sparsity diagnostics | high | high | medium | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | yes | Check low-AADT RMSE and exact final regression type | Traffic-volume paper, not collision-risk evidence. |
| LIT-018 | paper-extraction-jayasinghe-2019-traffic-modeling-centrality.md | 1-s2.0-S2215016119301128-main.pdf | A novel approach to model traffic on road segments of large-scale urban road networks | Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama | 2019 | AADT estimation; traffic modelling | urban/mixed | urban road networks | centrality-based OLS/RR/Poisson traffic volume model | AADT in PCU | exposure is modelled output | road segment dual graph | yearly average daily traffic | random validation; spatial leakage risk | Stage 1a should report spatial holdouts and sensitivity to count-point sparsity | high | high | medium | medium | baseline-comparison-first | Gemini | Gemini 3.1 Pro | high | conditional | Verify centrality radius/compute feasibility before implementation | Duplicate source paper with LIT-017. |
| LIT-019 | paper-extraction-lord-2010-crash-frequency-review.md | Lord-Mannering_Review.pdf | The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives | Dominique Lord; Fred Mannering | 2010 | methodological review; crash-frequency modelling | mixed | road segments/intersections across reviewed studies | review of Poisson, NB, zero-inflated, random effects, GAM, ML and Bayesian models | crash frequency over roadway units | traffic flow/length/VMT discussed across studies | road segment/intersection/other | mixed | review; no empirical validation | Use as risk checklist: overdispersion, zero-heavy counts, exposure functional form, omitted variables, spatial/temporal correlation | high | high | high | high | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Review large tables if building full model-family comparison | Best general modelling limitations reference. |
| LIT-020 | paper-extraction-ma-2019-xgboost-fatality.md | analyzing-the-leading-causes-of-traffic-fatalities-using-1jznp146gl.pdf | Analyzing the Leading Causes of Traffic Fatalities Using XGBoost and Grid-Based Analysis: A City Management Perspective | Jun Ma; Yuexiong Ding; Jack C. P. Cheng; Yi Tan; Vincent J. L. Gan; Jingcheng Zhang | 2019 | conditional severity/fatality classifier | US; Los Angeles County | mixed urban/peri-urban | XGBoost binary classifier plus grid GIS | fatal vs non-fatal crash given a crash | no traffic exposure; fatality rate not exposure-adjusted | crash record and 60x60 grid | crash-time fields included; no panel | train/test on balanced crash data; no exposure validation | Separate conditional severity/fatality from exposure-adjusted frequency; watch leakage from crash-record features | medium | medium | medium | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Check unusual XGBoost learning-rate details if replicating | Do not compare fatality rate to Stage 2 risk percentile. |
| LIT-022 | paper-extraction-michalaki-2015-motorway-accident-severity-chatgpt.md | 1-s2.0-S0022437515000833-main.pdf | Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model | Paraskevi Michalaki; Mohammed A. Quddus; David Pitfield; Andrew Huetson | 2015 | motorway severity modelling | England | motorway | ordered logit; multilevel ordered logit; generalized ordered logit | accident severity conditional on crash | no formal exposure; time category proxy | accident record | accident-level with broad time categories | in-sample; no held-out validation | Frequency and severity are separate; post-event variables must not leak into Stage 2 predictors | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check STATS19 hard-shoulder/main-carriageway transfer | Duplicate source paper with LIT-023. |
| LIT-023 | paper-extraction-michalaki-2015-motorway-accident-severity.md | 1-s2.0-S0022437515000833-main.pdf | Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model | Paraskevi Michalaki; Mohammed A. Quddus; David Pitfield; Andrew Huetson | 2015 | motorway severity modelling | England | motorway hard shoulder/main carriageway | partially constrained generalized ordered logistic regression | accident severity | no explicit exposure; severity conditional on crash | location type/accident record | time-of-day/day/month categories | in-sample | HGV/off-peak/hard-shoulder diagnostics belong in severity work, not current frequency model | medium | high | high | medium | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Check STATS20/STATS19 hard-shoulder encoding before diagnostics | Warns against using number of vehicles/casualties as prospective features. |
| LIT-024 | paper-extraction-pan-2013-motorway-negative-binomial.md | Modelling Motorway Accidents using Negative Binomial Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; NB regression | New Zealand; Auckland | motorway rural/urban | Poisson/NB/ZINB/GEE tested; NB selected | annual accident frequency per segment-year | observed AADT per lane plus length as free log regressors | homogeneous motorway segment-year; ramp segmentation | yearly | temporal holdout plus in-sample metrics | Facility context, ramp proximity, temporal holdout and geometry sanity checks are useful | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check table values and equations before formal citation | Duplicate source family with LIT-008/LIT-009. |
| LIT-025 | paper-extraction-pan-2017-deep-belief-network-global-spf.md.md | 1-s2_0-S2046043017300199-main.pdf | Development of a global road safety performance function using deep neural networks | Guangyuan Pan; Liping Fu; Lalita Thakali | 2017 | global SPF; neural model benchmark | Canada/US; multiple regions | mixed highway types | Deep Belief Network; NB benchmarks; Bayesian regularised ANN | annual crash frequency per homogeneous section-year | observed AADT and length as DBN features; NB uses log exposure variants | homogeneous road section | annual segment-year | train/test style performance; metrics mainly MAE/RMSE | DBN with MSE is not suitable for sparse count production; use NB/log-offset comparisons and minimum-length diagnostics | medium | medium | high | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | medium | conditional | DBN technical details and crash scope need checking | Has useful negative evidence against neural MSE production changes. |
| LIT-026 | paper-extraction-poch-1996-intersection-negative-binomial.md | Negative_Binomial_Analysis_of_Intersection-Acciden.pdf | Negative Binomial Analysis of Intersection-Accident Frequencies | Mark Poch; Fred Mannering | 1996 | intersection SPF; NB regression | US | urban/suburban intersections | Negative binomial regression | annual accident frequency on intersection approach | turning and intersection traffic volumes as covariates; no formal offset | intersection approach | yearly | no external/held-out validation stated | Junction approach mechanisms are structurally different from link risk; use junction diagnostics/proxies | medium | high | high | medium | pilot-first | ChatGPT | GPT-5.5 Thinking | high | yes | OCR artefacts; check tables before formal literature table | Strong warning about junction under-representation. |
| LIT-027 | paper-extraction-quddus-2010-m25-severity-ordered-response.md | road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf | Road Traffic Congestion and Crash Severity: Econometric Analysis Using Ordered Response Models | Mohammed A. Quddus; Chao Wang; Stephen G. Ison | circa 2010 | motorway severity model; ordered response | UK; M25 | motorway | OLOGIT/HCM/GOLOGIT/PC-GOLOGIT | ordinal crash severity given crash | no exposure; 15-minute flow as severity predictor with 30-min lag | individual crash matched to motorway segment | crash-level 15-minute traffic lag | in-sample ordered response metrics | Use 30-minute pre-crash lag if future WebTRIS crash-level work; separate frequency vs severity | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | yes | Check dense Table 2 and whether junction crashes excluded | Supports severity/frequency separation and cautious congestion claims. |
| LIT-028 | paper-extraction-roll-2026-oregon-pedestrian-spf.md | dot_89189_DS1.pdf | Developing a Pedestrian Safety Performance Function for Oregon | Josh Roll; Jason Anderson; Nathan McNeil | 2026 | pedestrian/intersection SPF; exposure estimation | US; Oregon | urban intersections | Poisson/NB SPFs; random forest exposure data fusion | pedestrian injury crashes per intersection-year | vehicle AADT and estimated pedestrian AADPT; both partly estimated | urban intersection | annual | 10-fold CV for exposure model; SPF tables partly unreadable | Exposure-only vs full-feature baseline comparisons and CURE plots could inform Stage 2 diagnostics | low | medium | medium | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | medium | yes | Report tables not fully machine-readable; check SPF forms and AADPT metrics | Scope is pedestrian intersections, not link-level all-injury risk. |
| LIT-029 | paper-extraction-wang-2009-m25-congestion-safety.md | Wang_et_al_AAP_Final_submitted1.pdf | Impact of Traffic Congestion on Road Safety: A Spatial Analysis of the M25 Motorway in England | Chao Wang; Mohammed A. Quddus; Stephen G. Ison | circa 2009 | motorway SPF; congestion and spatial model | UK; M25 | motorway | Poisson-lognormal, NB, CAR spatial variants | accident count per motorway segment | observed UKHA AADT and segment length as free log covariates; no offset | junction-to-junction motorway segment; junction crashes excluded | annual aggregate | in-sample Bayesian/model comparison | Motorway AADT elasticity and grade/congestion diagnostics; bearing can improve snapping QA | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Publication year not in document; DIC differences small | Companion to Quddus severity paper for congestion null result. |
| LIT-030 | paper-extraction-wang-2015-investigating-safety-impacts-suburban-arterials.md | 1805.06381v3.pdf | Investigating Safety Impacts of Roadway Network Features of Suburban Arterials in Shanghai, China | Xuesong Wang; Jinghui Yuan; Grant G. Schultz; Wenjing Meng | 2015 | zonal spatial crash model | China; Shanghai | suburban arterials | Bayesian Poisson-lognormal CAR | total crash frequency on arterials within TAZ | trip productions/attractions and arterial length as exposure proxies; no AADT | Traffic Analysis Zone | yearly | in-sample R2; no true held-out validation | Junction/signal/access-density proxies may matter, but zonal unit is low transferability | low | medium | medium | low | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Betweenness computed within TAZ, not global; in-sample R2 only | Use as junction/network-complexity prompt, not model benchmark. |
| LIT-031 | paper-extraction-ziakopoulos-2020-spatial-approaches-road-safety.md | A review of spatial approaches in road safety.pdf | A review of spatial approaches in road safety | Apostolos Ziakopoulos; George Yannis | not stated in visible metadata | spatial road-safety review | mixed | mixed | review of spatial/spatio-temporal methods | crash counts/rates/severity/hotspots across reviewed studies | mixed exposure definitions | mixed units: links, intersections, grids, zones, corridors | mixed | review; primary studies need checking for exact claims | Supports spatial validation, MAUP sensitivity, proximity/junction diagnostics and caution about production spatial models | high | high | high | high | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | yes | Year/DOI missing; check primary papers for exact numerical claims | Broad review; do not use alone to justify a production model swap. |
| LIT-032 | paper-extraction-pew-2020-zero-inflated-crash.md | Justification_for_considering_zero-inflated_models_in_crash_frequency_analysis.pdf | Justification for considering zero-inflated models in crash frequency analysis | Timo Pew; Richard L. Warr; Grant G. Schultz; Matthew Heaton | 2020 | zero-inflated model comparison; Bayesian hierarchical count modelling | US; Utah | signalised intersections statewide (urban and rural) | Bayesian hierarchical ZIP; ZINB; NB-Lindley; MCMC via JAGS | annual injury and fatal crash count per intersection | entering vehicles per day as standardised covariate — no formal offset | signalised intersection | annual (2014–2017 fitting; 2018 held out) | temporal holdout (2018); Bayesian chi-squared goodness-of-fit; posterior predictive zero check; WAIC | ZINB improvement over Poisson driven mainly by overdispersion parameter (π ≈ 0); NB GLM with offset is the priority diagnostic step, not full zero-inflation | high | high | high | high | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | conditional | Verify Table A1 π ≈ 0 finding in original PDF before citing; check prior sensitivity on Beta(0.15,1) | Critical nuance: π posterior mean ≈ 0 in both ZIP and ZINB — improvement over Poisson is from ϕ dispersion, not zero-inflation. Intersection unit not link; counts are much higher than Open Road Risk link-years. No exposure offset — does not challenge Open Road Risk offset design. |
| LIT-033 | paper-extraction-mahoney-2023-spatial-cv.md | ASSESSING_THE_PERFORMANCE_OF_SPATIAL_CROSS-VALIDATION.pdf | Assessing the Performance of Spatial Cross-Validation Approaches for Models of Spatially Structured Data | Michael J Mahoney; Lucas K Johnson; Julia Silge; Hannah Frick; Max Kuhn; Colin M Beier | 2023 | spatial CV methodology; simulation study | simulation (no specific geography) | not road-specific | random forest on simulated spatially structured continuous outcome; five CV method comparisons | simulated continuous outcome (not a crash count) | not applicable | regular 50×50 grid cells | not applicable | cross-landscape prediction as external reference; 100 simulated landscapes | V-fold CV is severely optimistic for spatially autocorrelated data; spatial clustering CV with exclusion buffer ≈ autocorrelation range is the most practical improvement; current grouped-link split does not enforce spatial separation | high | high | medium | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Specific buffer sizes (25–41% of grid length) are simulation-specific and do not transfer directly to road network; must estimate autocorrelation range from Stage 2 residuals first | Not a road safety paper. Simulation uses continuous Gaussian outcome; zero-heavy count generalisation assumed but not tested. BLO3 performs poorly despite large buffers — do not assume larger buffer always helps. Regular grid assumption does not match OS Open Roads geometry. |
| LIT-034 | paper-extraction-gao-2024-stzitd-gnn.md | Uncertainty-Aware_Probabilistic_Graph_Neural_Networks_for_Road-Level.pdf | Uncertainty-Aware Probabilistic Graph Neural Networks for Road-Level Traffic Crash Prediction | Xiaowei Gao; Xinke Jiang; Dingyi Zhuang; Huanfa Chen; Shenhao Wang; Stephen Law; James Haworth | 2024 | probabilistic GNN; zero-inflated Tweedie; road-level crash prediction | UK; London (Lambeth; Tower Hamlets; Westminster) | urban road segments | GRU temporal encoder + GAT spatial encoder + ZITD decoder (STZITD-GNN); baselines include STGCN; STZINB-GNN; STTD-GNN | daily severity-weighted crash risk score per road (y = sum of collision count × severity weight 1/2/3) | no exposure; no offset; no traffic volume data | urban road segment (OS-style link; ~4,700–5,700 nodes per borough) | daily; 2019 only; 8:2:2 within-year temporal split | within-year temporal holdout (no spatial holdout; no cross-year test) | AccHR@k metric is directly applicable to Open Road Risk risk percentile ranking; MPIW/PICP for future probabilistic outputs; Gaussian distributional assumption is clearly worst | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Verify Table 4 values against original; check whether 8:2:2 split is chronological or random (not stated); GitHub repo may be private | No exposure offset — cannot distinguish high-risk from high-traffic roads; major methodological gap relative to Open Road Risk. Severity-weighted composite response variable not directly comparable to raw injury count. Daily urban scale vs annual national scale: zero-inflation mechanisms differ. GNN architecture not feasible at 2.17M links. Validation: same roads in train and test; single year; no spatial holdout — weaker than current Open Road Risk grouped split. |
| LIT-035 | paper-extraction-balawi-tenekeci-2024-arima-sarimax-london-aroads.md | Time_series_traffic_collision_analysis_of_London_hotspots__Patterns.pdf | Time series traffic collision analysis of London hotspots: Patterns, predictions and prevention strategies | Mohammad Balawi; Goktug Tenekeci | 2024 | ARIMA; SARIMAX; corridor-level time series | UK; London (A1; A3; A4; A6 corridors) | major A-road corridors (aggregate) | ARIMA(5,4,7); SARIMAX(4,1,2)×(4,1,2,8) on daily corridor-level aggregate time series | daily count of vehicles involved in accidents (not accident count — wrong response variable) | no exposure; no AADT; corridor-level aggregate only | four A-road corridors treated as a single aggregate time series | daily; 2016–2019; December 2019 holdout only | single-month temporal holdout (Christmas period); AIC/BIC in-sample | Post-event STATS19 attributes (severity; light condition; road surface) must not enter Stage 2 as features — this paper inadvertently illustrates why | low | low | low | low | no | Claude | Claude Sonnet 4.6 | high — high confidence in the identified problems | no | No secondary review recommended; do not use as evidence for pipeline decisions | CRITICAL: wrong response variable (vehicles involved, not accident count); SARIMAX predicts negative counts (model specification error); R-squared values in Table 3 implausibly high and methodology opaque; ARIMA d=4 order from misconfigured grid search (excluded d=0,1,2); log-likelihood sign inconsistency between ARIMA and SARIMAX tables; 80-20 split described but only 30-day Christmas holdout reported. Published in Heliyon (broad open-access). Do not cite as methodological support for any decision. Retained for completeness of literature search only. |
| LIT-036 | paper-extraction-huda-2024-network-screening-low-volume-roads.md | dot_78279_DS1.pdf | Network Screening on Low-Volume Roads Using Risk Factors | Kazi Tahsin Huda; Ahmed Al-Kaisy | 2024 | low-volume road network screening | US; Oregon | rural low-volume two-lane paved roads | HSM EB expected crashes; CART thresholds; OLS log-linear screening equations | EB expected crashes per 0.05-mile section; crash density for ranking | AADT in HSM SPF and one proposed model; no-volume alternative model; no exposure offset | fixed 0.05-mile roadway sections; intersections excluded | annual expected crashes from 2004-2013 crash data | random 80/20 split against EB expected-crash target; no spatial/temporal holdout | Confirms Huda/Al-Kaisy as diagnostic support for low-volume, curvature/grade, and volume/no-volume sensitivity checks; flags EB-target R2 caveat | medium | high | high | medium | pilot-first | ChatGPT | not stated | high | conditional | Check curvature CART inconsistency and grade treatment before using thresholds | Duplicate source paper with LIT-016. Stronger caveat that adjusted R2 predicts a smooth EB target, not raw future crashes. |
| LIT-037 | paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md | 1-s2.0-S2046043017300199-main.pdf | Development of a global road safety performance function using deep neural networks | Guangyuan Pan; Liping Fu; Lalita Thakali | 2017 | global SPF; DBN/ML benchmark | Canada and US | mixed highway segments | Deep Belief Network with NB benchmarks and pooled/local model comparisons | annual crash/collision frequency per homogeneous segment-year | AADT and length as DBN inputs or NB exposure-like covariates; no clearly fixed offset | homogeneous highway segments; coarse compared with OS Open Roads links | annual segment-year; temporal holdouts for Ontario/Colorado | temporally held-out MAE/RMSE for Ontario/Colorado; Washington split not fully stated; no spatial holdout | Supports temporal holdout, local-vs-global/facility-family comparisons, and short-segment sensitivity; not production DBN | medium | medium | high | medium | baseline-comparison-first | ChatGPT | GPT-5.5 Thinking | high | conditional | Check Washington years/split and DBN normalization if reproducing | Duplicate source paper with LIT-025; reinforces that DBN should be benchmark-only without ranking/spatial validation. |
| LIT-038 | paper-extraction-poch-mannering-1996-nb-intersection.md | Negative_Binomial_Analysis_of_Intersection-Acciden.pdf | Negative Binomial Analysis of Intersection-Accident Frequencies | Mark Poch; Fred Mannering | 1996 | intersection approach SPF; NB regression | US; Bellevue, Washington | urban/suburban intersections | Negative binomial regression by intersection approach and crash type | annual accident frequency per intersection approach | approach turning/opposing/intersection traffic volumes as covariates; no offset | intersection approach | annual; 1987-1993 | in-sample rho-squared and likelihood tests; no held-out validation | Stronger second extraction for overdispersion and junction-approach mechanisms; confirms in-sample-only limitations | medium | high | high | medium | pilot-first | Claude | Claude Sonnet 4.6 | high | conditional | Check Table 1 coefficients and likelihood-ratio test values before citing | Duplicate source paper with LIT-026; improves confidence despite old validation standards. |
| LIT-039 | paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md | road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf | Road Traffic Congestion and Crash Severity: An Econometric Analysis Using Ordered Response Models | Mohammed A. Quddus; Chao Wang; Stephen G. Ison | 2010 / manuscript year unclear | motorway severity model; ordered response | UK; M25 | motorway | OLOGIT/HCM/GOLOGIT/PC-GOLOGIT | ordered crash severity conditional on crash | 15-minute traffic flow and congestion matched with 30-minute lag; no exposure offset because target is severity conditional on crash | crash records assigned to 72 motorway segments | crash-level; 2003-2006; 15-minute traffic state lag | in-sample ordered-response fit and marginal effects; no held-out validation | Confirms severity/frequency separation, lagged traffic-state design, and conditional interpretation caveat | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check published ASCE citation year and Tables 2-3 against final version | Duplicate source paper with LIT-027; clearer on in-sample metrics and conditional severity target. |
| LIT-040 | paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md | dot_89189_DS1.pdf | Developing a Pedestrian Safety Performance Function for Oregon | Josh Roll; Jason Anderson; Nathan McNeil | 2026 | pedestrian/intersection SPF; exposure estimation | US; Oregon | urban intersections | Poisson/NB SPFs; pedestrian-volume data fusion; random forest/XGBoost/NN exposure models | pedestrian crash frequency at intersections | vehicle AADT and estimated pedestrian AADPT as explanatory exposure variables; no explicit offset | urban intersection; contracted complex nodes | annual average exposure; crash outcome years not fully stated in SPF sections | exposure model 10-fold CV; SPF validation details/table extraction require care | Supports junction/intersection future work, exposure-only vs proxy comparisons, and vulnerable-user exposure caveats | low | medium | medium | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | yes | Long report; check SPF equations, crash-year window, and AADPT/AADT metrics before citing | Duplicate source paper with LIT-028; broader report extraction confirms report-table review still needed. |
| LIT-041 | paper-extraction-ziakopoulos-yannis-2020-spatial-review.md | A_review_of_spatial_approaches_in_road_safety.pdf | A review of spatial approaches in road safety | Apostolos Ziakopoulos; George Yannis | not explicitly stated; circa 2020 | spatial road-safety review | international review | mixed | review of spatial units, spatial models, MAUP, proximity, network KDE, VRU approaches | mixed crash counts/rates/severity/hotspots across reviewed studies | mixed: AADT, VMT/VDT, trips, road length, population; not offset-specific | mixed: links, intersections, grids, zones, regions, network lixels | mixed across reviewed studies | review-level synthesis; no single validation protocol | Second extraction reinforces spatial-unit, MAUP, junction-segment, and network-KDE cautions; exact primary-study values need source checks | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Check primary papers before using numerical claims from review tables | Duplicate source paper with LIT-031; improves confidence for high-level caution but not production model choice. |
| LIT-042 | paper-extraction-huda-2024-COMBINED.md | dot_78279_DS1.pdf | Network Screening on Low-Volume Roads Using Risk Factors | Kazi Tahsin Huda; Ahmed Al-Kaisy | 2024 | combined reconciliation record; low-volume road network screening | US; Oregon | rural low-volume two-lane paved roads | HSM EB expected crashes; CART thresholds; OLS log-linear screening equations | EB-smoothed expected crashes per 0.05-mile section; crash density for ranking | AADT in HSM SPF/EB target and one proposed model; deliberate no-volume comparator; no exposure offset | fixed 0.05-mile roadway sections; intersections excluded | annual expected crashes from 2004-2013 crash data | random 80/20 split against EB expected-crash target; no spatial/temporal holdout | Canonical record clarifies EB target, no-offset structure, volume/no-volume scope, and curvature/grade caveats for low-volume diagnostics | medium | high | high | high | pilot-first | ChatGPT | GPT-5.5 Thinking | high | conditional | Curvature CART sharp-group value is internally inconsistent; grade should not be cited as final-model predictor without caution | Combined record from original PDF plus LIT-016 and LIT-036; use this row for future Huda/Al-Kaisy citations. |
| LIT-043 | paper-extraction-jayasinghe-2019-COMBINED.md | 1-s2.0-S2215016119301128-main.pdf | A novel approach to model traffic on road segments of large-scale urban road networks | Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama | 2019 | combined reconciliation record; AADT estimation / traffic-volume modelling | Sri Lanka; Cambodia; Vietnam; Pakistan; Tanzania | urban road networks | centrality-based traffic-volume model using betweenness, closeness, and path-distance weighting | AADT / PCU per road segment | AADT is the modelled target; observed counts used for calibration/validation; no collision exposure offset | road segment in dual-graph road network | cross-sectional annual AADT base year by city | random 80/20 validation plus calibration-sample learning curve; no spatial holdout | Canonical record supports Stage 1a centrality diagnostics, learning curves, AADT-band errors, and warnings about random spatial leakage | high | high | medium | high | baseline-comparison-first | ChatGPT | GPT-5.5 Thinking | high | conditional | Final selected regression type is implied but not fully documented across OLS/robust/Poisson alternatives | Combined record from original PDF plus LIT-017 and LIT-018; traffic-exposure paper, not Stage 2 collision-risk evidence. |
| LIT-044 | paper-extraction-poch-mannering-1996-COMBINED.md | Negative_Binomial_Analysis_of_Intersection-Acciden.pdf | Negative Binomial Analysis of Intersection-Accident Frequencies | Mark Poch; Fred Mannering | 1996 | combined reconciliation record; intersection approach SPF | US; Bellevue, Washington | urban/suburban intersections | Negative binomial regression for total and accident-type approach counts | annual accident frequency per intersection approach | approach and turning traffic volumes as covariates; no formal offset | intersection approach | annual observations from 1987-1993, excluding improvement year | in-sample likelihood/rho-squared diagnostics; no held-out validation | Canonical record confirms junction/approach mechanisms and NB-over-Poisson relevance while warning against link-level coefficient transfer | medium | high | high | medium | pilot-first | ChatGPT | GPT-5.5 Thinking | high | conditional | Exact accident-type table values should still be checked before formal publication because OCR is imperfect | Combined record from original PDF plus LIT-026 and LIT-038; use this for junction/intersection evidence. |
| LIT-045 | paper-extraction-roll-2026-oregon-COMBINED.md | dot_89189_DS1.pdf | Developing a Pedestrian Safety Performance Function for Oregon | Josh Roll; Jason Anderson; Nathan McNeil | 2026 | combined reconciliation record; pedestrian/intersection SPF and exposure data fusion | US; Oregon | urban intersections | Poisson/NB pedestrian SPFs; random-forest AADT/AADPT data fusion; CURE-style diagnostics | pedestrian crash frequency at intersections | vehicle AADT and estimated pedestrian AADPT as explanatory exposure variables; no explicit offset | urban intersection with contraction of complex nodes | annual average exposure; final SPF crash period not fully stated | AADPT model 10-fold CV; final crash SPF diagnostics mainly in-sample; no clear held-out SPF validation | Canonical record supports exposure-only baselines, CURE diagnostics, Stage 1a distribution checks, and separate future junction/pedestrian layer | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | medium-high | conditional | Check appendices only if exact SPF coefficients, exposure-only comparisons, or crash-assignment rules are needed | Combined record from original PDF plus LIT-028 and LIT-040; use this for Roll/Oregon pedestrian SPF citations. |
| LIT-046 | paper-extraction-quddus-wang-ison-COMBINED.md | road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf | Road Traffic Congestion and Crash Severity: An Econometric Analysis Using Ordered Response Models | Mohammed A. Quddus; Chao Wang; Stephen G. Ison | not clearly stated; circa 2010 | combined reconciliation record; motorway conditional severity model | UK; M25 | motorway | ordered logit; heteroskedastic choice model; generalized ordered logit; partially constrained generalized ordered logit | ordered crash severity conditional on crash occurrence | no exposure offset; 15-minute traffic flow/congestion assigned to crash records using 30-minute pre-crash lag | individual crash record matched to 72 motorway segments | crash-level records from 2003-2006 with 15-minute traffic state lag | in-sample ordered-response model fit and marginal effects; no held-out validation | Canonical record clarifies conditional severity scope, pre-crash traffic-state matching, no-frequency interpretation, and post-event leakage cautions | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Published version year/bibliographic details and exact Tables 2-3 should be checked before formal numeric reporting | Combined record from original PDF plus LIT-027 and LIT-039; use this for Quddus/Wang/Ison severity citations. |
| LIT-047 | paper-extraction-ziakopoulos-yannis-2020-COMBINED.md | A review of spatial approaches in road safety.pdf | A review of spatial approaches in road safety | Apostolos Ziakopoulos; George Yannis | not explicitly stated; circa 2020 | combined reconciliation record; spatial road-safety review | international review | mixed | review of spatial units, MAUP, spatial dependence, proximity structures, network KDE, GWR/CAR/SAR and spatio-temporal methods | mixed crash counts, rates, severity outcomes, hotspot classifications, and spatial crash distributions | mixed: AADT, VMT/VDT, road length, population, trips, and vulnerable-road-user exposure variables; no single offset structure | mixed units: segments, intersections, corridors, grids, zones, regions, and network lixels | mixed across reviewed studies | review-level synthesis; primary studies need checking for exact method/validation claims | Canonical record supports spatial-unit documentation, MAUP/hotspot sensitivity notes, spatial residual diagnostics, and caution against production spatial models from review evidence alone | high | high | high | high | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check original cited papers before using exact study-level model specifications, validation methods, or numerical claims | Combined record from original PDF plus LIT-031 and LIT-041; use this for spatial-methods review citations. |
Thematic evidence matrix
Crash-frequency and count modelling
| paper | method | what it supports | what it does not support | relevance to current Stage 2 | actionability |
|---|---|---|---|---|---|
| Aguero-Valverde & Jovanis 2008; Claude and Gemini extractions | Poisson/NB/Poisson-lognormal spatial crash-frequency models | Count modelling with exposure, overdispersion, and spatial residual diagnostics | Direct national-scale CAR production model | High: Stage 2 is a count/ranking model with exposure | Run diagnostics for AADT elasticity, residual spatial autocorrelation, and spatial uncertainty notes. |
| Lord & Mannering 2010 | broad crash-frequency methodological review | Conservative framing around overdispersion, zero-heavy outcomes, omitted variables, exposure functional form | Any single best model family | High: maps directly to Stage 2 risks | Add modelling limitations and baseline comparison tables. |
| Chengye & Ranjitkar 2013; three extractions | NB motorway segment models with temporal holdout | Temporal holdout, ramp/facility-family diagnostics, motorway-specific geometry checks | Direct replacement of link-level model or uncritical coefficient transfer | Medium: motorway subset only | Add temporal holdout and ramp/slip-road diagnostic. |
| Gilardi et al. 2022; three extractions | Bayesian Poisson network lattice with spatial/severity effects | OS-segment count modelling, log-offset structure, balanced accuracy diagnostics | External validation of Open Road Risk or national-scale INLA production | High: closest UK link-network literature | Add documentation and balanced accuracy diagnostic, not production spatial model. |
| Al-Omari 2021 | NB SPFs by context class with EB screening | Context/facility stratification and urban exposure elasticity diagnostics | Direct coefficient transfer from Florida thesis | Medium | Baseline comparison of global vs road-family/context split models. |
| Hauer et al. 2001 | EB tutorial using SPF prior plus observed counts | EB shrinkage, regression-to-mean warning, overdispersion role | A specific predictive model for Open Road Risk | High for EB diagnostic layer | Audit EB formula and document approximation. |
| Pan et al. 2017 | DBN vs NB global SPF | NB benchmark and minimum segment-length sensitivity | DBN/MSE as production model for sparse injury counts | Medium | Use as baseline-comparison and methods-to-avoid evidence. |
| Pew et al. 2020 | Bayesian ZIP; ZINB; NB-Lindley on Utah intersection panel | Methodological justification for ZINB as candidate; posterior predictive zero check; NB GLM as priority diagnostic step | Full Bayesian MCMC at 2.17M links; transfer of intersection-unit coefficients; any change to the offset design (the paper uses no exposure offset) | High: π ≈ 0 finding means NB GLM with offset is the right first step, not full zero-inflation | Fit NB GLM candidate; run posterior predictive zero check on current Poisson GLM. |
| Gao et al. 2024 | STZITD-GNN (GRU + GAT + zero-inflated Tweedie) on London urban road-day data | AccHR@k ranking metric; MPIW/PICP uncertainty metrics (future); Tweedie GLM as intermediate candidate | Full GNN at national scale; exposure-adjusted risk claims (the paper uses no exposure offset); direct transfer from daily urban resolution or from the severity-weighted composite target (not a raw count) | Medium: AccHR@k metric is immediately applicable; architecture does not transfer | Implement AccHR@k as validation metric for Stage 2 risk percentile output. |
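
The zero check listed against Pew et al. 2020 can be approximated without MCMC. The sketch below is a plug-in analogue, not the paper's posterior predictive procedure: it compares the observed share of zero counts with the share implied by fitted means under a Poisson model and under an NB2 model with variance μ + μ²/φ. The function names, the example means, and the value of φ are illustrative assumptions, not Open Road Risk code or data.

```python
import math


def observed_zero_share(counts):
    """Share of observations with a zero count."""
    return sum(1 for c in counts if c == 0) / len(counts)


def expected_zero_share(mu):
    """Mean Poisson zero probability, P(Y_i = 0) = exp(-mu_i)."""
    return sum(math.exp(-m) for m in mu) / len(mu)


def nb_expected_zero_share(mu, phi):
    """Mean NB2 zero probability, (phi / (phi + mu_i)) ** phi.

    Uses the parameterisation Var(Y) = mu + mu**2 / phi, so a large phi
    approaches the Poisson case.
    """
    return sum((phi / (phi + m)) ** phi for m in mu) / len(mu)


# Hypothetical fitted means and observed counts for five link-years.
mu = [0.05, 0.10, 0.20, 0.50, 1.00]
counts = [0, 0, 0, 0, 3]

print("observed zeros:", observed_zero_share(counts))
print("Poisson-implied zeros:", round(expected_zero_share(mu), 3))
print("NB-implied zeros (phi=1):", round(nb_expected_zero_share(mu, 1.0), 3))
```

If the observed zero share sits well above the Poisson-implied share but close to the NB-implied share, that points to overdispersion rather than a separate zero-inflation process, which is the reading the register takes from the π ≈ 0 finding.
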
Exposure and traffic-volume handling
| paper | exposure treatment | transferable part | non-transferable part | implication for AADF/WebTRIS | actionability |
|---|---|---|---|---|---|
| Gilardi et al. 2022 | offset = segment length × estimated commuter flow | Same mathematical log-offset family on UK OS segments | Census commuter flow is weaker than AADF/AADT | Supports documenting Open Road Risk’s AADT × length offset as literature-aligned | Documentation note; no production change. |
| Hauer et al. 2001 | ADT/AADT in SPF; length and years scale expected count | Year-specific exposure and EB weighting logic | Tutorial examples not full pipeline | Supports using year-specific AADT in EB diagnostic | Audit/upgrade EB diagnostic. |
| Aguero-Valverde & Jovanis 2008 | AADT free coefficient; length offset | Test whether AADT elasticity differs from 1.0 | Rural US scope and intersection exclusion | Run diagnostic freeing AADT coefficient from fixed VMT offset | Diagnostic only. |
| Wang et al. 2009 | AADT and length as free covariates, not offset | Motorway-specific AADT elasticity check | No sparse-AADF estimation; long segments | Motorway AADT coefficient may differ by road class | Motorway-only diagnostic. |
| Jayasinghe et al. 2019 | AADT is target, estimated from centrality and sparse counts | Stage 1a centrality features, learning curves, sparse-count sensitivity | Not a collision-risk paper; random validation likely leaks spatially | Stage 1a should report spatial holdout and count-sparsity sensitivity | Baseline comparison/diagnostic. |
| Roll et al. 2026 | data-fusion vehicle/pedestrian exposure | Compare exposure-only vs full-feature baselines; CURE plots | Pedestrian/intersection scope; commercial/US data tiers | Stage 1a analogy is conceptual only | Documentation and diagnostic baseline. |
| Huda & Al-Kaisy 2024 | AADT covariate dropped in one low-volume model | Low-volume geometry/AADT-sensitivity diagnostic | LVR-specific and EB-output response | Test whether low-AADT links are dominated by geometry vs exposure uncertainty | Pilot-first. |
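
The difference between a fixed exposure offset and a free AADT coefficient, which several rows above turn on, can be written out as follows. This is a sketch under assumptions: a log link, link length $L_i$, a feature vector $\mathbf{x}_i$, and an offset of AADT × length as described in the Gilardi row. The symbols are illustrative and are not taken from the repo.

```latex
\begin{aligned}
\text{fixed offset:}\quad
  & \log \mu_i = \log\!\left(\mathrm{AADT}_i \cdot L_i\right)
    + \mathbf{x}_i^{\top}\boldsymbol{\beta} \\
\text{free AADT coefficient:}\quad
  & \log \mu_i = \alpha \log \mathrm{AADT}_i + \log L_i
    + \mathbf{x}_i^{\top}\boldsymbol{\beta}
\end{aligned}
```

Setting $\alpha = 1$ recovers the fixed offset. The elasticity diagnostics listed against Aguero-Valverde & Jovanis 2008 and Wang et al. 2009 amount to estimating $\alpha$ and checking whether it differs materially from 1, overall or by road class.
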
Spatial and network methods
| paper | spatial unit / network concept | key spatial issue | relevance to OS Open Roads links | actionability |
|---|---|---|---|---|
| Gilardi et al. 2022 | OS road segment lattice and shared-boundary adjacency | spatial autocorrelation and MAUP/segment contraction | High; closest OS-network analogue | Document support, add MAUP pilot and adjacency residual diagnostics. |
| Aguero-Valverde & Jovanis 2008 | road segments with CAR neighbourhoods | unobserved spatial correlation biases coefficients/precision | High as diagnostic concept; lower as production model | Moran’s I and residual corridor mapping. |
| Ziakopoulos & Yannis 2020 | review across links, intersections, zones, corridors | spatial-unit sensitivity, boundary effects, proximity weights | High as cautionary framework | Spatial validation section and segmentation sensitivity pilot. |
| Baddeley et al. 2021 | continuous network point process | segment aggregation and planar KDE can mislead | Conceptually high, production low | Avoid ordinary 2D KDE; small point-process diagnostic only. |
| Cronie et al. 2019 | linear-network point-process diagnostics | point clustering after intensity adjustment | Medium for snapped-collision diagnostics | Small pilot on one urban area; not Stage 2 replacement. |
| Wang et al. 2015 | TAZ-level CAR arterial model | MAUP and zonal aggregation | Low direct transfer | Junction/signal density ideas only. |
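
The Moran's I residual diagnostic named in the Aguero-Valverde & Jovanis row can be sketched in a few lines. This version assumes binary adjacency weights (1 for links that share a node, 0 otherwise) and a small hand-built neighbour map; a real run would derive neighbours from the road network topology and would likely use a spatial statistics library rather than this pure-Python illustration.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    values: list of residuals, one per spatial unit.
    neighbours: dict mapping unit index i to a list of neighbour
        indices j (w_ij = 1). List each pair in both directions for
        symmetric weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbours.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, js in neighbours.items() for j in js)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical chain of four links: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain)    # similar residuals adjacent
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # opposite residuals adjacent
print(clustered, alternating)
```

Positive values indicate that similar residuals sit next to each other, which is the pattern that would motivate the spatial validation and corridor-mapping actions in the table.
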
Junctions, intersections, and conflict structure
| paper | junction/intersection mechanism | required data | transferability | current repo implication | actionability |
|---|---|---|---|---|---|
| Poch & Mannering 1996 | intersection approach-level traffic, turning, signal, geometry variables | turning volumes, approach geometry, signal/control data | Medium conceptually; low direct data coverage | Pure link model under-represents junction mechanics | Add junction-adjacent residual diagnostic and proxy feature pilot. |
| Roll et al. 2026 | urban intersection SPF by type/control/crossing | intersection inventory, pedestrian exposure, crossing/control data | Low direct transfer | Highlights missing junction-specific model class | Documentation/future work; CURE diagnostics transferable. |
| Al-Omari 2021 | access-point and signalized-intersection density as segment features | junction/access density from inventory | Medium if derived from OS/OSM topology | Candidate junction density per link/corridor | Diagnostic before feature inclusion. |
| Wang et al. 2015 | signal spacing/access density at TAZ level | signals/accesses and zonal network features | Low to medium | Possible missing urban conflict proxies | Low-priority diagnostic. |
| Aguero-Valverde & Jovanis 2008 | intersection/ramp crashes excluded in one extraction | junction exclusion flag/sensitivity | Medium as scope caveat | Current STATS19-to-link snapping includes junction-proximate crashes | Document and test near-junction sensitivity. |
| Hauer et al. 2001 | intersections treated as separate EB entity type | intersection entity definition and SPF | High conceptually for future junction module | Link and junction EB weights differ | Future junction-level methodology note. |
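
A junction-adjacent flag of the kind needed for the residual diagnostic in the Poch & Mannering row could be derived from link topology alone. The sketch below assumes each link is a `(link_id, start_node, end_node)` tuple and treats any node shared by three or more link ends as a junction. The tuple layout, the threshold, and the function name are assumptions for illustration, not the OS Open Roads schema or existing repo code.

```python
from collections import Counter


def junction_adjacent_links(links, min_degree=3):
    """Flag links that touch a node where at least min_degree link ends meet.

    links: list of (link_id, start_node, end_node) tuples.
    Returns a dict mapping link_id to True or False.
    """
    degree = Counter()
    for _, start, end in links:
        degree[start] += 1
        degree[end] += 1
    return {
        link_id: degree[start] >= min_degree or degree[end] >= min_degree
        for link_id, start, end in links
    }


# Hypothetical network: node B joins three links, so it counts as a junction.
links = [
    ("L1", "A", "B"),
    ("L2", "B", "C"),
    ("L3", "B", "D"),
    ("L4", "D", "E"),
]
flags = junction_adjacent_links(links)
print(flags)
```

Mean Stage 2 residuals could then be compared between flagged and unflagged links as a first check on whether the link model under-represents junction mechanics.
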
Severity modelling
| paper | severity target | model type | useful idea | leakage risk | current/future relevance |
|---|---|---|---|---|---|
| Boulieri et al. 2016 | slight vs severe/fatal counts | multivariate Bayesian Poisson at ward-year | Severity strata can have distinct spatial patterns | Low if kept as aggregate target; scale mismatch | Current documentation; future severity target. |
| Gilardi et al. 2022 | slight vs severe segment counts | bivariate Bayesian Poisson network lattice | Balanced accuracy for sparse severe counts; severity-specific rates | Low for target; no holdout caveat | High documentation/future relevance. |
| Michalaki et al. 2015 | conditional motorway accident severity | ordered/generalized ordered logit | Frequency and severity mechanisms differ; HGV/hard-shoulder diagnostics | High if using post-event variables as predictors | Documentation and future accident-level severity module. |
| Quddus et al. circa 2010 | conditional crash severity | ordered response models with 30-minute traffic lag | Pre-crash lag design for WebTRIS/crash-level work | Post-event crash variables could leak | Future severity/time-profile design. |
| Ma et al. 2019 | fatal vs non-fatal crash | XGBoost classifier | Severity-feature importance and leakage warning | High for crash-record features | Diagnostic-only severity stratification. |
| Roll et al. 2026 | pedestrian injury crashes | intersection SPF | Vulnerable-user exposure is separate from vehicle exposure | Low for current all-injury link model | Future active-travel literature only. |
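
The leakage risks in the table above suggest a simple guard: compare a candidate feature list against a set of fields that are only recorded once a crash has happened. The field names below are hypothetical placeholders based on the attribute types this register mentions (severity, light condition, road surface, crash type, casualties, contributory factors); the real STATS19 column names would need to be substituted.

```python
# Hypothetical post-event field names; replace with the actual STATS19 columns.
POST_EVENT_FIELDS = frozenset({
    "severity",
    "casualty_count",
    "light_condition",
    "road_surface",
    "crash_type",
    "contributory_factors",
})


def leaked_features(feature_names, post_event=POST_EVENT_FIELDS):
    """Return the sorted feature names that are post-event fields."""
    return sorted(set(feature_names) & set(post_event))


# Hypothetical Stage 2 feature list containing one post-event field.
candidate = ["aadt", "link_length", "road_surface", "speed_limit"]
print(leaked_features(candidate))
```

An empty result does not prove the absence of leakage, since derived features can still encode post-event information, but a non-empty result is a clear reason to stop.
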
Validation, metrics, and model assessment
| paper | reported validation/metric type | what the metric actually tests | limitations | Open Road Risk implication |
|---|---|---|---|---|
| Brodersen et al. 2010 | posterior balanced accuracy | imbalanced binary classifier performance and uncertainty | Only applies after binarising outcomes | Use for zero/non-zero or hotspot classification diagnostics, not count likelihood replacement. |
| Gilardi et al. 2022 | posterior predictive balanced accuracy | in-sample posterior predictive adequacy | Not external/spatial holdout validation | Label clearly and report alongside grouped holdout metrics. |
| Chengye & Ranjitkar 2013 | MAD/MSPE temporal holdout | temporal prediction for motorway segments | motorway-only; longer homogeneous segments | Add temporal holdout diagnostic to Stage 2. |
| Roll et al. 2026 | exposure-only vs feature-rich SPF; CURE plots | model misspecification against covariates | intersection/pedestrian scope | Use CURE plots and exposure-only baseline for GLM diagnostics. |
| Huda & Al-Kaisy 2024 | high R2 predicting EB expected crashes | fit to smoothed EB target, not raw crashes | random split and circularity inflate fit | Avoid comparing R2 to raw-count pseudo-R2. |
| Lord & Mannering 2010 | review of fit/diagnostic issues | model risk checklist | no single empirical validation | Use as validation documentation scaffold. |
| Ma et al. 2019 | classifier metrics on balanced fatality data | conditional fatal/nonfatal classification | not exposure-adjusted and not frequency prediction | Do not compare to Stage 2 risk percentile. |
| Mahoney et al. 2023 | simulation comparison of V-fold vs spatial CV methods | which CV method best estimates true out-of-sample error for spatially autocorrelated data | simulation uses continuous outcome; regular grid not road network; zero-heavy counts not tested | Current grouped-link split does not enforce spatial separation and is not spatial CV; document this limitation; pilot police-force holdout; estimate autocorrelation range from Stage 2 residuals via variogram. |
| Pew et al. 2020 | Bayesian chi-squared goodness-of-fit; posterior predictive zero check; temporal holdout RPMSE/MAD | zero-calibration and distributional adequacy for zero-heavy count models | intersection unit; no spatial holdout; single-year holdout only | Run posterior predictive zero check on current Poisson GLM; π ≈ 0 finding supports NB GLM as priority next step before ZINB. |
| Gao et al. 2024 | AccHR@k (hit rate at top-k% predicted risk roads); MPIW/PICP uncertainty | ranking precision at top-k; interval calibration | within-year temporal holdout only; same roads in train/test; no spatial holdout; weaker than current Open Road Risk CV | Implement AccHR@k for Stage 2 risk percentile validation; MPIW/PICP deferred until probabilistic outputs added. |
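
A top-k ranking diagnostic in the spirit of AccHR@k can be sketched as below. The register does not record the paper's exact formula, so this is a swapped-in, related measure rather than the published metric: the share of observed collisions that fall on the top k% of links by predicted risk. The function name and the example values are illustrative assumptions, and the definition should be checked against Gao et al. 2024 before results are labelled AccHR@k.

```python
import math


def top_k_capture(predicted, observed, k_percent):
    """Share of observed collisions on the top k% of links by predicted risk.

    predicted: predicted risk per link.
    observed: observed collision count per link, in the same order.
    k_percent: percentage of links to treat as the top group, e.g. 1 or 5.
    """
    n = len(predicted)
    k = max(1, math.ceil(n * k_percent / 100))
    ranked = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    total = sum(observed)
    if total == 0:
        return 0.0  # no collisions to capture
    return sum(observed[i] for i in ranked[:k]) / total


# Hypothetical predictions and held-out counts for five links.
predicted = [0.9, 0.1, 0.5, 0.3, 0.2]
observed = [3, 0, 1, 0, 1]
print(top_k_capture(predicted, observed, 40))
```

Using captured collisions rather than overlap with the observed top-k avoids arbitrary tie-breaking among the many links with zero observed collisions, which matters for zero-heavy link-year data.
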
Point-process / hotspot / spatial diagnostics
| paper | method | diagnostic use | production risk | recommended status |
|---|---|---|---|---|
| Baddeley et al. 2021 | network point processes and network KDE | compare raw/snapped collision clustering with link rankings | Does not scale easily and changes target from link-year risk to event intensity | small pilot / documentation note |
| Cronie et al. 2019 | inhomogeneous network J/F/G functions | test point clustering after intensity correction | Not exposure-normalised traffic risk | small pilot only |
| Eckardt & Moradi 2024 | marked point process summaries | explore severity/type mark dependence | exploratory summaries can be mistaken for predictive validation | small pilot only |
| Aguero-Valverde & Jovanis 2008 | CAR residual/spatial effects | residual spatial autocorrelation and corridor clustering | national CAR production infeasible | diagnostic-only |
| Ziakopoulos & Yannis 2020 | spatial-methods review | MAUP, proximity, hotspot sensitivity | review evidence cannot justify direct production swap | documentation and diagnostic queue |
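
For the residual spatial autocorrelation diagnostic in the table above, a global Moran's I on model residuals is one simple, widely used option. The sketch below is an illustration, not something taken from the extractions. It assumes binary, symmetric link adjacency supplied as a hypothetical `neighbours` mapping, and all names are placeholders.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary (0/1) adjacency weights.

    values: list of residuals, one per link.
    neighbours: dict mapping a link index to an iterable of adjacent link
        indices. For a symmetric adjacency, list each pair in both directions.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values are constant; Moran's I is undefined")

    cross = 0.0       # sum of w_ij * dev_i * dev_j over neighbour pairs
    total_weight = 0  # sum of all weights (S0)
    for i, adjacent in neighbours.items():
        for j in adjacent:
            cross += dev[i] * dev[j]
            total_weight += 1
    if total_weight == 0:
        raise ValueError("no neighbour pairs supplied")

    return (n / total_weight) * (cross / denom)


# Four links in a chain: 0 - 1 - 2 - 3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1.0, 1.0, -1.0, -1.0], chain))   # clustered residuals, positive
print(morans_i([1.0, -1.0, 1.0, -1.0], chain))   # alternating residuals, negative
```

Values near zero suggest little adjacency-level clustering in the residuals. The statistic is an in-sample diagnostic and does not measure predictive performance.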
Methods to avoid as production changes for now
| method/paper | why not production-ready | safer use | required evidence before production |
|---|---|---|---|
| Full national CAR/MCAR Bayesian model; Aguero-Valverde, Gilardi, Boulieri | computationally unrealistic at 2M+ links; often in-sample only | pilot area residual/spatial diagnostic | scalable implementation, grouped/spatial holdout benefit, compute budget |
| DBN with MSE crash-count regression; Pan et al. 2017 | no count likelihood/offset; poor match to zero-heavy injury collisions | baseline comparison note; negative evidence | Poisson/NB loss with offset and strong held-out performance |
| Planar KDE for road crashes; Baddeley et al. 2021 | ignores network geometry and can mislead | network-aware KDE/point process pilot | network-distance implementation and clear diagnostic framing |
| Post-event crash variables as Stage 2 predictors; Michalaki, Quddus, Ma | crash type/casualties/contributory factors happen after or during crash | retrospective severity diagnostics only | prospective feature availability and leakage audit |
| Zonal TAZ CAR model for link ranking; Wang et al. 2015 | loses link-level geometry; MAUP risk | contextual/junction-density inspiration | link-level validation of derived proxies |
| STZITD-GNN full architecture; Gao et al. 2024 | GRU+GAT+ZITD at 2.17M links is computationally infeasible; no exposure offset; daily resolution; severity-weighted composite not raw count | AccHR@k metric and Tweedie GLM as extractable contributions | scaled pilot (small area), exposure offset retained, annual aggregation, robust cross-year holdout |
| ARIMA/SARIMAX on corridor-level collision data without exposure; Balawi & Tenekeci 2024 | wrong response variable (vehicles involved not collision count); no exposure denominator; negative predicted counts; implausible R-squared values; methodology not replicable | negative example: illustrates post-event feature leakage from STATS19 attributes | not recommended under any circumstances for this pipeline |
| Random V-fold CV as primary Stage 2 validation; implied by Mahoney et al. 2023 | severely underestimates true prediction error for spatially autocorrelated data (2% within target range vs 37% for spatial CV) | current grouped-link temporal split is an improvement but does not enforce spatial separation; document limitation | spatial clustering CV with buffer sized to autocorrelation range of Stage 2 residuals |
| Pedestrian intersection SPF as all-injury link model; Roll et al. 2026 | different mode, exposure, and unit | active-travel/future junction literature | UK-equivalent pedestrian exposure and junction inventory |
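
The contrast between random V-fold and spatially separated validation in the table above can be illustrated with a leave-one-region-out splitter. This is a minimal sketch under stated assumptions: each link carries a hypothetical region label (for example a police-force area code), and no buffer between regions is applied, so it is weaker than the buffered spatial clustering CV described in the Mahoney row.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group.

    groups: list with one region label per row (e.g. per link-year).
    Every row of the held-out region goes to the test set, so no region
    appears on both sides of a split.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    for held_out in sorted(by_group):
        test = by_group[held_out]
        train = sorted(
            i for label, idxs in by_group.items() if label != held_out for i in idxs
        )
        yield held_out, train, test


# Hypothetical region labels for five rows.
regions = ["A", "A", "B", "C", "B"]
for held_out, train, test in leave_one_group_out(regions):
    print(held_out, train, test)
```

Links close to a region boundary still have near neighbours in the training set, so residual optimism is possible until a buffer sized to the measured autocorrelation range is added.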
Code and documentation implications
| todo_id | suggested_action | action_type | relevant_stage | supporting_papers | why_supported | current_repo_relevance | future_research_relevance | effort | risk_if_done_badly | already_present_or_new | priority |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LIT-TODO-001 | Add Stage 2 documentation note on exposure-offset support and limitations | documentation note | Stage 2 / documentation | Gilardi 2022; Hauer 2001; Lord 2010; Aguero-Valverde 2008; Pan 2017 | Multiple extractions support exposure-adjusted count framing but note elasticity/functional-form caveats | high | high | low | Overclaiming exact offset optimality | partly present in methodology pages | now |
| LIT-TODO-002 | Run diagnostic Stage 2 GLM with log(AADT) and log(length) as free covariates or road-family interactions | diagnostic / baseline comparison | Stage 2 | Aguero-Valverde 2008; Wang 2009; Al-Omari 2021; Lord 2010 | Several papers estimate sub/super-linear AADT effects rather than fixed offset | high | high | medium | Confusing diagnostic with production replacement | new/partly implied | later |
| LIT-TODO-003 | Add temporal holdout report for Stage 2 | diagnostic | validation / Stage 2 | Chengye & Ranjitkar 2013; Pan 2013; Lord 2010 | Motorway NB papers use later-year prediction; current grouped split should be complemented | high | medium | medium | COVID-year split can distort results | likely partly present; verify | now |
| LIT-TODO-004 | Add spatial residual/autocorrelation diagnostic on pilot area | diagnostic | validation / Stage 2 | Aguero-Valverde 2008; Gilardi 2022; Ziakopoulos 2020 | Spatial autocorrelation can bias inference and hotspot confidence | high | high | medium | Treating in-sample spatial smoothers as external validation | new | later |
| LIT-TODO-005 | Add MAUP/segmentation sensitivity pilot for OS Open Roads links | small pilot | validation / feature engineering | Gilardi 2022; Baddeley 2021; Ziakopoulos 2020; Pan 2017 | Link granularity and very short segments are repeated cautions | medium | high | high | Large refactor or inconsistent target grain | new | backlog |
| LIT-TODO-006 | Add junction-adjacent residual/risk diagnostic | diagnostic | Stage 2 / feature engineering | Poch 1996; Al-Omari 2021; Ziakopoulos 2020; Baddeley 2021 | Junction mechanisms differ from mid-link risk | high | high | medium | Using noisy OSM junction proxies as production features too early | future-work mentions junction density | later |
| LIT-TODO-007 | Pilot junction-density or conflict-proxy features only after diagnostic | small pilot / candidate feature | feature engineering / Stage 2 | Poch 1996; Al-Omari 2021; Wang 2015 | Intersection/access density repeatedly appears as relevant but data differs | medium | high | medium | Proxy may measure urbanity/AADT rather than conflict | candidate in future-work | backlog |
| LIT-TODO-008 | Audit EB shrinkage formula and overdispersion parameter usage | diagnostic | Stage 2 / validation | Hauer 2001; Al-Omari 2021; Huda 2024 | EB weighting depends on correct dispersion and entity type | high | high | medium | Miscalibrated shrinkage overstates confidence in rankings | EB exists as diagnostic | now |
| LIT-TODO-009 | Document regression-to-mean warning for before/after use of high-risk links | documentation note | documentation / validation | Hauer 2001 | Users may evaluate interventions on links selected by high observed counts | high | medium | low | Users mistake ranking for treatment-effect evidence | likely new | now |
| LIT-TODO-010 | Add balanced-accuracy diagnostic for zero/non-zero or severe/KSI checks | diagnostic | validation / Stage 2 | Brodersen 2010; Gilardi 2022 | Imbalanced sparse counts make ordinary accuracy misleading | medium | high | medium | Binarisation can obscure count calibration | possibly absent | later |
| LIT-TODO-011 | Keep severity modelling separate from frequency model in docs | documentation note | documentation / Stage 2 | Boulieri 2016; Gilardi 2022; Michalaki 2015; Quddus 2010; Ma 2019 | Severity and frequency targets differ and may have different predictors | high | high | low | Implying severity-weighted validation exists | future-work covers severity | now |
| LIT-TODO-012 | Add severity-stratified diagnostic comparing top-risk links with KSI/fatal proportions | diagnostic | Stage 2 / validation | Ma 2019; Quddus 2010; Michalaki 2015; Boulieri 2016 | Tests whether current frequency ranking misses severity burden | medium | high | medium | Leakage if post-event proportions become production predictors | new | later |
| LIT-TODO-013 | Add feature-interpretation leakage note for crash-record variables | documentation note | feature engineering / documentation | Ma 2019; Michalaki 2015; Quddus 2010 | Post-event crash features are not prospective link predictors | high | high | low | Accidental use of target-derived variables | likely partly present | now |
| LIT-TODO-014 | Add centrality-feature and count-sparsity diagnostics for Stage 1a | diagnostic / baseline comparison | Stage 1a | Jayasinghe 2019 | Centrality-based AADT estimation depends on split design and sparse counts | high | medium | medium | Random splits overstate spatial generalisation | centrality likely present | later |
| LIT-TODO-015 | Add learning curve for Stage 1a count-point sparsity | diagnostic | Stage 1a / validation | Jayasinghe 2019 | Extraction explicitly suggests training-point sensitivity | medium | medium | medium | Misreading random-split R² as spatial transfer | new | backlog |
| LIT-TODO-016 | Add CURE plots and exposure-only baseline comparison for Poisson GLM | diagnostic / baseline comparison | Stage 2 / validation | Roll 2026; Lord 2010 | CURE plots and exposure-only baselines diagnose misspecification | medium | medium | medium | Applying pedestrian-intersection claims to link model | new | later |
| LIT-TODO-017 | Add documentation note that congestion proxies are low priority for current Stage 2 | documentation note | Stage 2 / documentation | Wang 2009; Quddus 2010 | Two M25 companion extractions report congestion null findings; scope is motorway-specific | medium | medium | low | Generalising M25 null result to all roads | new | later |
| LIT-TODO-018 | Run motorway slip-road/ramp residual diagnostic | diagnostic | Stage 2 / feature engineering | Chengye & Ranjitkar 2013; Pan 2013; Michalaki 2015 | Motorway context differs around ramps/hard shoulder | medium | medium | medium | Sparse/noisy ramp coding | possibly available via form-of-way | backlog |
| LIT-TODO-019 | Add curvature/grade interpretation note by road family | documentation note / diagnostic | feature engineering / Stage 2 | Pan 2017; Chengye 2013; Wang 2009; Huda 2024; Quddus 2010 | Geometry effects vary by road type and by frequency vs severity target | high | high | low-medium | Treating coefficient direction as causal | curvature active; grade candidate | now/later |
| LIT-TODO-020 | Treat point-process methods as exploratory comparison layers only | documentation note / small pilot | validation / future work | Baddeley 2021; Cronie 2019; Eckardt 2024 | Network point-process literature critiques aggregation but does not replace Stage 2 | medium | high | low for note; high for pilot | Presenting in-sample clustering as predictive validation | new | backlog |
| LIT-TODO-021 | Run posterior predictive zero check on current Stage 2 Poisson GLM | diagnostic | Stage 2 / validation | Pew 2020 | Table 3 in Pew shows Poisson-equivalent model (ZIP with π=0) underestimates zeros; same structure expected for Open Road Risk Poisson GLM given ~98–99% link-year zero rate | high | medium | low | Drawing samples at link-year level must incorporate correct exposure offset per link | new | now |
| LIT-TODO-022 | Fit negative binomial GLM with existing exposure offset as Stage 2 candidate and compare to Poisson GLM using grouped-link CV | baseline comparison | Stage 2 | Pew 2020; Lord 2010; Chengye & Ranjitkar 2013 | π ≈ 0 in Pew’s ZINB indicates overdispersion (ϕ = 17) drives improvement, not zero-inflation; NB GLM is the priority step before any ZINB complexity | high | high | low-medium | NB GLM dispersion can be sensitive to motorway overfitting already noted; check ϕ stability across facility families | new | now |
| LIT-TODO-023 | Estimate empirical variogram of Stage 2 Poisson GLM residuals to determine spatial autocorrelation range | diagnostic | Stage 2 / validation | Mahoney 2023; Aguero-Valverde 2008; Gilardi 2022 | Mahoney 2023 shows optimal spatial CV buffer ≈ autocorrelation range; without measuring the range for Open Road Risk, spatial CV design is uninformed | high | high | low-medium | Variogram on 2.17M links requires subsampling; use road-class-stratified subsample of ~10–50k links | new | later |
| LIT-TODO-024 | Pilot police-force-level regional holdout as a spatial CV diagnostic | diagnostic / small pilot | Stage 2 / validation | Mahoney 2023; Gilardi 2022 | ~13–16 force areas provide pre-defined geographic groups; holding each out in turn enforces real spatial separation and tests geographic generalisation | high | high | medium | Force areas vary substantially in size and collision density; compare force-holdout R²/pseudo-R² against current grouped-link metrics to quantify spatial optimism | new | later |
| LIT-TODO-025 | Document current grouped-link CV as temporal grouped CV and record that it does not enforce spatial separation between neighbouring links | documentation note | Stage 2 / validation / documentation | Mahoney 2023 | Paper shows V-fold without spatial separation is strongly optimistic; grouped-link split prevents same-link leakage but does not address neighbouring-link spatial autocorrelation | high | medium | low | None (documentation only) | new | now |
| LIT-TODO-026 | Implement AccHR@k (accuracy hit rate at top-k% predicted risk links) as a Stage 2 validation metric | diagnostic / validation metric | Stage 2 / validation | Gao 2024 | AccHR@k directly evaluates whether high-percentile risk predictions correspond to roads with actual collisions; more operationally meaningful than RMSE or pseudo-R² for a ranking output | high | medium | low | Choice of k matters at 2.17M links; consider AccHR@1, AccHR@5, and AccHR@20 rather than a single threshold; avoid treating a broad k as strong evidence of discrimination | new | now |
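
As a starting point for LIT-TODO-021 and LIT-TODO-022, the sketch below compares observed zeros with the zeros expected under Poisson and negative binomial assumptions. It is a plug-in approximation using fitted means, not the Bayesian posterior predictive check reported in Pew 2020. It assumes the fitted means already include the exposure offset, and that `size` is the NB2 size parameter with Var(y) = mu + mu²/size, which may not match the ϕ parameterisation in Pew. All names are hypothetical.

```python
import math


def expected_zeros_poisson(mu):
    """Expected number of zero counts if each y_i ~ Poisson(mu_i)."""
    return sum(math.exp(-m) for m in mu)


def expected_zeros_negbin(mu, size):
    """Expected number of zero counts under NB2 with Var(y) = mu + mu**2 / size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return sum((size / (size + m)) ** size for m in mu)


def zero_check(counts, mu, size):
    """Compare observed zeros with plug-in expectations from both families.

    counts: observed collision counts per link-year.
    mu: fitted means per link-year (exposure offset already applied).
    size: assumed NB2 size parameter (hypothetical value until estimated).
    """
    if len(counts) != len(mu):
        raise ValueError("counts and mu must have the same length")
    return {
        "observed_zeros": sum(1 for y in counts if y == 0),
        "poisson_expected_zeros": expected_zeros_poisson(mu),
        "negbin_expected_zeros": expected_zeros_negbin(mu, size),
    }


# Toy data only; real inputs would be Stage 2 link-year counts and fitted means.
print(zero_check(counts=[0, 0, 3, 0], mu=[0.5, 0.5, 0.5, 0.5], size=1.0))
```

For the same means, the negative binomial always expects at least as many zeros as the Poisson, which is why overdispersion alone can close a zero-calibration gap without a zero-inflation component.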
Current-code alignment assessment
Current strengths
- The exposure-adjusted crash-frequency framing is supported by multiple extractions: Hauer 2001, Gilardi 2022, Aguero-Valverde 2008, Lord 2010, and Pan 2017.
- Link-year modelling is consistent with the crash-frequency/SPF literature, while the Gilardi et al. 2022 extractions provide a direct UK OS-segment analogue.
- Grouped or held-out validation is directionally aligned with the caution in Lord 2010 and with temporal holdout practice in the Chengye/Ranjitkar motorway papers.
- The repository’s attention to spatial units is aligned with Gilardi 2022, Baddeley 2021, Ziakopoulos 2020, and Aguero-Valverde 2008.
- Use of open data is a defensible distinction versus studies relying on complete motorway counters, commercial probe data, or inspection/video logs.
- Keeping EB shrinkage, spatial models, and point-process methods as diagnostics or future work is consistent with computational and validation cautions in the extractions.
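
For reference when reading the EB shrinkage point, the sketch below shows the textbook empirical Bayes weighting commonly associated with Hauer-style SPF work. It is not the repository's implementation. It assumes an NB2 model with Var(y) = mu + mu²/size, and that observed and predicted counts cover the same period. The names are hypothetical, and the dispersion parameterisation is exactly the kind of detail an audit of the EB formula should confirm.

```python
def eb_estimate(observed, predicted, size):
    """Empirical Bayes estimate of the expected count for one link.

    observed: observed collision count over the study period.
    predicted: model-predicted mean count over the same period.
    size: NB2 size parameter (assumed), with Var(y) = mu + mu**2 / size.
    """
    if predicted <= 0 or size <= 0:
        raise ValueError("predicted and size must be positive")

    # Weight on the model prediction: close to 1 when overdispersion is small
    # (large size) or the predicted mean is small.
    weight = size / (size + predicted)
    return weight * predicted + (1 - weight) * observed


# A link with 10 observed collisions where the model predicts 2.
print(eb_estimate(observed=10, predicted=2.0, size=2.0))
```

With `size=2.0` and `predicted=2.0` the weight is 0.5, so the estimate sits halfway between the prediction and the observed count. A mis-specified `size` shifts every link's weight, which is how miscalibrated shrinkage can overstate ranking confidence.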
Current weaknesses / limitations to document
- Exposure uncertainty is not fully propagated from Stage 1a into Stage 2; several papers treat AADT as observed, whereas Open Road Risk relies on Stage 1a AADT estimates.
- The fixed VMT-style offset implies exposure elasticity of 1.0; several extractions support testing free AADT/length coefficients diagnostically.
- OS Open Roads link choice may be sensitive to very short links, junction proximity, and MAUP-like segmentation effects.
- Junction/intersection mechanisms are under-represented in a pure link-level model.
- Severity is not separately modelled; the severity papers show this is a different target, not just a weighted version of frequency.
- Spatial autocorrelation is not fully handled in production; this may affect coefficient interpretation and ranking confidence.
- The grouped-by-road-link CV split prevents same-link leakage across years but does not enforce spatial separation between neighbouring links on the same corridor. Mahoney 2023 shows that CV without spatial separation is optimistically biased for spatially autocorrelated data, so the grouped-link split is likely to give estimates closer to V-fold than to true out-of-sample performance. The size of the bias depends on the spatial autocorrelation range of collision risk, which has not been measured for Open Road Risk.
- Hotspot/risk percentile sensitivity to spatial unit and residual clustering needs explicit documentation.
- The current Stage 2 Poisson GLM likely underestimates zeros at link-year level. Pew 2020 shows that a Poisson-equivalent model (ZIP with π ≈ 0) calibrates poorly on zero-heavy count data; the improvement from NB/ZINB comes from the dispersion parameter, not zero-inflation per se. This has not been tested on Open Road Risk data.
- Post-event variables from collision records must not leak into prospective Stage 2 features.
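
The fixed-offset limitation in the list above can be written out explicitly. As a sketch, assume the offset is the log of a VMT-style exposure $E_{it} = 365 \cdot \mathrm{AADT}_{it} \cdot L_i$ for link $i$ in year $t$, with link length $L_i$ and other predictors $\mathbf{x}_{it}$. These symbols and the exact exposure definition are assumptions for illustration, not taken from the repository. The fixed-offset model is then

$$
\log \mu_{it} = \log E_{it} + \beta_0 + \mathbf{x}_{it}^{\top}\boldsymbol{\beta}
= \log \mathrm{AADT}_{it} + \log L_i + \log 365 + \beta_0 + \mathbf{x}_{it}^{\top}\boldsymbol{\beta},
$$

while the diagnostic model with free exposure coefficients is

$$
\log \mu_{it} = \beta_0' + \beta_A \log \mathrm{AADT}_{it} + \beta_L \log L_i + \mathbf{x}_{it}^{\top}\boldsymbol{\beta}.
$$

The fixed offset is the special case $\beta_A = \beta_L = 1$ (with $\beta_0' = \beta_0 + \log 365$). Estimates of $\beta_A$ or $\beta_L$ clearly below or above 1 would indicate sub-linear or super-linear exposure effects, which is what the free-coefficient diagnostic is meant to reveal.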
Current areas where the repo is deliberately conservative
- The current pipeline should not claim causality from road-feature coefficients.
- It should not use post-event collision variables as predictors in the production frequency model.
- Spatial, point-process, and CAR/INLA methods should remain diagnostics or pilots before any production use.
- Severity-weighted, fatal-only, motorcycle, cyclist, or pedestrian risk targets should remain parallel/future models unless exposure and validation are made explicit.
- Machine-learning rankings should be presented as decision-support indicators, not as calibrated external safety scores.
Claims Open Road Risk can safely make
Safer claims
- The project estimates exposure-adjusted injury-collision risk.
- The outputs are exploratory decision-support indicators.
- The model can help identify links with unusually high observed collisions relative to estimated exposure and context.
- The sensitivity of hotspot outputs to spatial-unit choice is a known limitation.
- Severity and frequency are distinct modelling targets.
- EB shrinkage and spatial diagnostics can help assess ranking confidence, but they do not prove causal treatment effects.
- Open Road Risk uses open transport/collision/network data, which brings reproducibility advantages and exposure-coverage limitations.
Claims not yet supported
- The model proves causal effects of road features.
- The production risk percentile is externally validated.
- High-ranked links are definitely unsafe independent of exposure uncertainty.
- Severity-weighted risk is validated.
- The current model fully handles junction conflict mechanisms.
- The model is directly comparable to proprietary inspection scores without further validation.
- Spatial autocorrelation is fully captured in the production model.
- XGBoost feature importance is a causal interpretation of crash mechanisms.
- The grouped-by-road-link cross-validation provides a spatially robust estimate of model performance. It controls for same-link temporal leakage but does not enforce spatial separation between adjacent links; reported pseudo-R² values may be optimistically biased by an unknown amount relative to true geographic holdout performance.
Secondary review queue
Use literature/prompts/literature_extraction_additional_prompts.md for these checks:
- Use the Cross-Audit Prompt when there is one extraction and the PDF/tables need checking.
- Use the Lightweight Sanity Check Prompt for low-priority single extractions.
- Use the Reconciliation Prompt when two or more independent extractions need to be combined into a final record.
- Use the Human Review Checklist before treating a reconciled extraction as final.
Missing or weak review queue
These papers need an additional source check because extraction coverage is thin, the extraction flags OCR/table problems, or the paper could support repo actions.
| priority | paper | extraction_file | review_gap | prompt_to_use | what_to_check | likely_impact_if_wrong | recommended_next_action |
|---|---|---|---|---|---|---|---|
| conditional | Pew et al. 2020 | paper-extraction-pew-2020-zero-inflated-crash.md | One extraction; key π ≈ 0 finding drives NB-vs-ZINB TODO ordering | Targeted Cross-Audit Prompt | Confirm π posterior mean ≈ 0.00 (SD 0.01) for both ZIP and ZINB in original PDF Table A1; confirm ϕ = 17.04 for ZINB; check prior specification Beta(0.15,1) on π | If π is not near zero, the argument for NB GLM priority over ZINB weakens and TODO ordering changes | Check before citing π ≈ 0 or using it as justification for NB-first approach. |
| conditional | Gao et al. 2024 | paper-extraction-gao-2024-stzitd-gnn.md | One extraction; Table 4 performance values may contain transcription errors; train/val/test split chronology not stated | Targeted Cross-Audit Prompt | Verify Table 4 MAE/RMSE/AccHR@20 values; confirm whether 8:2:2 split is chronological or random; check GitHub repo accessibility | Could misstate AccHR@20 values or overstate validation strength | Check Table 4 and split description before writing AccHR@k diagnostic or citing improvement percentages. |
Completed reconciliation records
These papers now have final combined records. Use the combined record for future citation and TODO work, while preserving the earlier extraction files for provenance.
| paper | combined_record | source_extraction_files | remaining_caution | recommended_status |
|---|---|---|---|---|
| Huda & Al-Kaisy 2024 | paper-extraction-huda-2024-COMBINED.md | paper-extraction-huda-alkaisy-2024-lvr-network-screening.md; paper-extraction-huda-2024-network-screening-low-volume-roads.md | Curvature CART sharp-group value is internally inconsistent; grade should not be cited as a final-model predictor without caution | Use combined record; no further extraction needed unless quoting disputed threshold values. |
| Jayasinghe et al. 2019 | paper-extraction-jayasinghe-2019-COMBINED.md | paper-extraction-jayasinghe-2019-centrality-aadt.md; paper-extraction-jayasinghe-2019-traffic-modeling-centrality.md | Final selected regression type is implied but not fully documented across OLS/robust/Poisson alternatives | Use combined record; cite as Stage 1a exposure-modelling evidence, not collision-risk evidence. |
| Poch & Mannering 1996 | paper-extraction-poch-mannering-1996-COMBINED.md | paper-extraction-poch-1996-intersection-negative-binomial.md; paper-extraction-poch-mannering-1996-nb-intersection.md | Accident-type table values should still be checked before formal publication because OCR is imperfect | Use combined record for junction/approach mechanism claims; table-value citation remains conditional. |
| Roll et al. 2026 | paper-extraction-roll-2026-oregon-COMBINED.md | paper-extraction-roll-2026-oregon-pedestrian-spf.md; paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md | Appendices should be checked if exact SPF coefficients, exposure-only comparisons, or crash-assignment rules are needed | Use combined record for exposure-model and future junction/pedestrian-layer evidence. |
| Quddus, Wang & Ison | paper-extraction-quddus-wang-ison-COMBINED.md | paper-extraction-quddus-2010-m25-severity-ordered-response.md; paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md | Published version year/bibliographic details and exact Tables 2-3 should be checked before formal numeric reporting | Use combined record for severity target, traffic-lag design, and leakage-guardrail claims. |
| Ziakopoulos & Yannis 2020 | paper-extraction-ziakopoulos-yannis-2020-COMBINED.md | paper-extraction-ziakopoulos-2020-spatial-approaches-road-safety.md; paper-extraction-ziakopoulos-yannis-2020-spatial-review.md | Primary cited papers still need checking before using exact study-level model specifications, validation methods, or numerical claims | Use combined record for high-level spatial methods, MAUP, hotspot sensitivity, and spatial-diagnostic claims. |
Active reconciliation / combination check queue
These papers still have multiple extraction passes but no final combined record. Do not re-extract them from scratch; use the Reconciliation Prompt in literature_extraction_additional_prompts.md after any needed cross-audit notes exist.
| priority | paper | extraction_files | why_reconcile | prompt_to_use | reconciliation_focus | expected_output |
|---|---|---|---|---|---|---|
| conditional | Gilardi et al. 2022 | paper-extraction-gilardi-2022-leeds-network-lattice-bayesian.md; paper-extraction-gilardi-2022-multivariate-hierarchical-crashes.md; paper-extraction-gilardi-2022-network-lattice-crashes.md | Three extraction records already exist; only targeted citation checks remain for table/sign ambiguity and MAUP/contraction details | Reconciliation Prompt only if creating a final canonical record; otherwise Human Review Checklist plus targeted PDF check | Table 2 coefficient signs; Primary Road interpretation; balanced accuracy wording; dodgr/network-contraction details | Do not re-extract; manually inspect disputed PDF tables/text before citing coefficient directions or balanced-accuracy values. |
| conditional | Pan et al. 2017 | paper-extraction-pan-2017-deep-belief-network-global-spf.md.md; paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md | Two extraction records now exist; use reconciliation only if writing the neural/global-SPF comparison page | Reconciliation Prompt | DBN training details; crash scope; NB benchmark coefficients; Washington split; normalization and minimum-length handling | Reconciled benchmark note; do not treat DBN as a production recommendation without stronger validation. |
| low | Brodersen et al. 2010 | paper-extraction-brodersen-2010-balanced-accuracy.md; paper-extraction-brodersen-2010-balanced-accuracy-posterior.md | Two extraction records already exist; equations only needed for implementation | Reconciliation Prompt only if implementing posterior intervals | Posterior balanced accuracy equations, Equation 7 wording, and examples | Reconciled implementation note if adding code for posterior intervals. |
Candidate Quarto literature pages
| proposed_qmd_file | purpose | papers_to_use | key_claims | figures/tables_needed | readiness |
|---|---|---|---|---|---|
| quarto/literature/crash-frequency-models.qmd | Explain Poisson/NB/SPF count-model basis and limitations | Lord 2010; Hauer 2001; Aguero-Valverde 2008; Chengye/Ranjitkar 2013; Pan 2017; Poch 1996; Al-Omari 2021; Pew 2020 | Count models need exposure, dispersion, validation, and cautious interpretation; overdispersion is the immediate model-family issue before zero-inflation; intersection evidence should not be transferred directly to link risk | Model-family comparison table; Open Road Risk alignment table; zero-calibration diagnostic summary; NB-over-Poisson evidence note | exists — mostly current; update references to use the combined Poch record and verify Pew π≈0 before quoting exact values |
| quarto/literature/exposure-and-traffic-volume.qmd | Document AADT/AADF/WebTRIS exposure handling | Gilardi 2022; Hauer 2001; Jayasinghe 2019 combined; Roll 2026 combined; Aguero-Valverde 2008; Wang 2009; Pew 2020; Gao 2024 | Exposure is central but elasticity, estimated-AADT uncertainty, and no-exposure contrast cases need clear separation | Exposure-treatment matrix; Stage 1a validation summary; no-offset contrast table; AADT/AADPT data-fusion note | exists — linked in site nav; use combined Jayasinghe/Roll records and keep Pew/Gao as cautious no-offset contrasts |
| quarto/literature/spatial-methods-and-network-risk.qmd | Review OS-link lattice, CAR, MAUP, point-process diagnostics | Gilardi 2022; Aguero-Valverde 2008; Ziakopoulos 2020 combined; Baddeley 2021; Cronie 2019; Eckardt 2024; Mahoney 2023 | Spatial methods support diagnostics, not immediate production replacement; spatial CV evidence strengthens validation caveats | Spatial-unit comparison; diagnostic queue; CV method comparison table from Mahoney | exists — linked in site nav; use combined Ziakopoulos record and keep Gilardi table/sign details conditional |
| quarto/literature/junctions-and-conflict-structure.qmd | Separate junction/approach mechanisms from link risk | Poch 1996 combined; Roll 2026 combined; Al-Omari 2021; Wang 2015; Ziakopoulos 2020 combined | Junction risk needs different units, data, and exposure structures from the current link-year model | Junction mechanism table; available/open-data proxy table; link-vs-intersection transferability table | exists — linked in site nav; use combined Poch/Roll/Ziakopoulos records, with exact Poch table values and Roll appendices conditional for formal citation |
| quarto/literature/severity-modelling.qmd | Separate severity from frequency and define future severity path | Boulieri 2016; Gilardi 2022; Michalaki 2015; Quddus/Wang/Ison combined; Ma 2019; Gao 2024 | Severity is conditional/different target and can conflict with frequency; severity-weighted composites should not be treated as validated Stage 2 risk | Severity target taxonomy; leakage warning table; composite-vs-separate response variable note | exists — linked in site nav; use combined Quddus record and keep exact table/bibliographic details conditional |
| quarto/literature/validation-and-metrics.qmd | Document heldout, balanced accuracy, CURE, pseudo-R2 limitations | Brodersen 2010; Gilardi 2022; Chengye 2013; Roll 2026 combined; Lord 2010; Huda 2024 combined; Mahoney 2023; Pew 2020; Gao 2024 | Metrics test different things; avoid in-sample/holdout confusion; spatial CV, zero-calibration, AccHR@k, and CURE diagnostics are candidate validation additions | Metric taxonomy; current repo validation map; CV method performance table from Mahoney; zero-check diagnostic table from Pew; AccHR@k definition from Gao | exists — needs light update to cite LIT-042/LIT-045 combined records and keep Pew/Gao exact-value checks conditional |
| quarto/literature/transferability-and-open-data-limits.qmd | Explain what transfers to open UK data and what does not | All papers, with combined records for Huda, Jayasinghe, Poch, Roll, Quddus/Wang/Ison, and Ziakopoulos/Yannis; Gao 2024 and Balawi & Tenekeci 2024 as negative-transfer examples | Some evidence is blocked by missing lane/turning/exposure data or different unit/target; apparent UK relevance still needs data-stack checks | Transferability table; data-availability matrix; negative-transfer rows; combined-record provenance note | exists — linked in site nav; needs light citation refresh to prefer combined records |
Appendices
Register taxonomy
- current_repo_relevance: how directly the extraction informs the current Open Road Risk pipeline, code, model, validation, or documentation.
  - high: directly relevant to current Stage 1a, Stage 1b, Stage 2, validation, or docs.
  - medium: relevant to a subset, diagnostic, or caution.
  - low: indirect or future-only.
- future_research_relevance: usefulness for extensions beyond the current implementation.
  - high: directly informs plausible future Open Road Risk research.
  - medium: useful if a specific future branch exists.
  - low: peripheral.
- literature_review_relevance: usefulness for future narrative Quarto literature pages.
  - high: should be cited or tabulated.
  - medium: include in specialised page or caveat table.
  - low: likely appendix/background only.
- code_actionability_now: whether the extraction supports a near-term code/doc action.
  - high: a clear documentation, diagnostic, or baseline action is supported.
  - medium: action is plausible but should be scoped.
  - low: no near-term code action.
- supports_production_change:
  - no: no production change supported.
  - diagnostic-only: supports checks, reporting, documentation, or sensitivity analysis.
  - pilot-first: supports a limited pilot before any production consideration.
  - baseline-comparison-first: supports comparing against current implementation before adopting.
  - possible-later: may support a future production change after more evidence.
- secondary_review_needed:
  - no: extraction is sufficient for high-level register use.
  - yes: manual PDF/table review is needed before use in TODOs or literature prose.
  - conditional: adequate for cautious register use, but check before quoting numbers, equations, or coefficient signs.
- extraction_quality_initial_judgement:
  - high: extraction reports high confidence or appears complete for register-level use.
  - medium: useful but has missing tables, indirect relevance, or stated uncertainty.
  - low: do not use without review.
  - unknown: extraction does not state enough to judge.
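
When adding inventory rows, the taxonomy above can be checked mechanically. The sketch below is a hypothetical helper, not an existing repo script. It assumes each inventory row has been parsed into a dict keyed by the column names, and it only covers the fields whose allowed values are listed in this taxonomy.

```python
LEVELS = {"high", "medium", "low"}

# Allowed values copied from the register taxonomy.
ALLOWED_VALUES = {
    "current_repo_relevance": LEVELS,
    "future_research_relevance": LEVELS,
    "literature_review_relevance": LEVELS,
    "code_actionability_now": LEVELS,
    "supports_production_change": {
        "no",
        "diagnostic-only",
        "pilot-first",
        "baseline-comparison-first",
        "possible-later",
    },
    "secondary_review_needed": {"no", "yes", "conditional"},
    "extraction_quality_initial_judgement": {"high", "medium", "low", "unknown"},
}


def invalid_fields(row):
    """Return {field: value} for taxonomy fields that are missing or not allowed."""
    problems = {}
    for field, allowed in ALLOWED_VALUES.items():
        value = row.get(field)
        if value not in allowed:
            problems[field] = value
    return problems


# Hypothetical row with one value outside the taxonomy.
row = {
    "current_repo_relevance": "high",
    "future_research_relevance": "medium",
    "literature_review_relevance": "high",
    "code_actionability_now": "low",
    "supports_production_change": "diagnostic-only",
    "secondary_review_needed": "maybe",
    "extraction_quality_initial_judgement": "unknown",
}
print(invalid_fields(row))
```

A missing field is reported with the value `None`, so the same check catches both omitted and misspelled entries.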
Known extraction files not yet processed
| file | reason not included |
|---|---|
| literature/prompts/road_safety_literature_extraction_prompt.md | Prompt template, not a paper extraction. |
| literature/prompts/OLD_road_safety_literature_extraction_prompt.md | Old prompt template, not a paper extraction. |
| literature/prompts/literature_extraction_additional_prompts.md | Companion prompt file, not a paper extraction. |
| literature/prompts/README_literature_extraction.md | Workflow guide, not a paper extraction. |
| literature/prompts/grep_extraction.sh | Utility script, not a paper extraction. |
| literature/prompts/grep_extraction_output.txt | Generated grep output/provenance helper, not an extraction source. |
No literature/papers_raw/ extraction Markdown was found during this pass. No Quarto or docs files were treated as paper extractions; quarto/future-work.qmd, todo/TODO.md, docs/internal/sites_todo.md, and quarto/background/metrics-and-methodology.qmd were used only as roadmap/methodology context.
Update (2026-05-10): Four extraction files previously not in this register have now been added as LIT-032 through LIT-035: paper-extraction-pew-2020-zero-inflated-crash.md, paper-extraction-mahoney-2023-spatial-cv.md, paper-extraction-gao-2024-stzitd-gnn.md, and paper-extraction-balawi-tenekeci-2024-arima-sarimax-london-aroads.md. The file paper-extraction-chengye-ranjitkar-2013-motorway-nb-regression.md was confirmed as the source file for the existing LIT-009 row and required no new entry.
Update (2026-05-10): Six additional review-pass extraction files have been added as LIT-036 through LIT-041: paper-extraction-huda-2024-network-screening-low-volume-roads.md, paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md, paper-extraction-poch-mannering-1996-nb-intersection.md, paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md, paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md, and paper-extraction-ziakopoulos-yannis-2020-spatial-review.md. The file paper-extraction-mcfadden-not-stated-conditional-logit.md was removed from this register because it is not a road-safety paper and has no material current relevance to Open Road Risk.
Update (2026-05-10): Four final combined/reconciled records have been added as LIT-042 through LIT-045: paper-extraction-huda-2024-COMBINED.md, paper-extraction-jayasinghe-2019-COMBINED.md, paper-extraction-poch-mannering-1996-COMBINED.md, and paper-extraction-roll-2026-oregon-COMBINED.md. These are now the preferred records for future citation/TODO work for those papers; the earlier extraction files remain in the inventory for provenance.
Update (2026-05-10): Two further combined/reconciled records have been added as LIT-046 and LIT-047: paper-extraction-quddus-wang-ison-COMBINED.md and paper-extraction-ziakopoulos-yannis-2020-COMBINED.md. These move Quddus/Wang/Ison and Ziakopoulos/Yannis out of the active reconciliation queue. All seven candidate Quarto literature pages now exist under quarto/literature/ and have been added to the website Literature menu.