This is a structured, maintainer-facing evidence register for tracing extracted literature evidence into documentation, model checks, and TODOs. It is not the final narrative literature review.
Purpose and scope
This is a maintainable evidence register for Open Road Risk, not the final narrative literature review. The register does the following:
- It tracks extracted papers and source files.
- It records methodological relevance to Open Road Risk.
- It separates current repo relevance from future research relevance.
- It records provenance and extraction quality.
- It supports future Quarto literature pages, repo TODOs, and model evaluation.
- It should be updated append-only when new paper extractions are added.
The main source of truth is the existing extraction Markdown in literature/papers_summary/. This register does not re-read source PDFs and does not infer beyond the extraction files. Where extractions are duplicated for the same paper, each extraction file is kept as its own row so provenance across AI tools/models is preserved.
How to update this register
- Add one row to the inventory table for each new extraction file.
- Add or update thematic rows only where the new paper contributes evidence.
- Add repo TODOs only when supported by the extraction.
- Add secondary review flags where needed.
- Do not rewrite existing judgements unless the new paper changes the evidence base.
- Preserve previous source filenames and extraction filenames.
- If importance changes because the repo changes, update `current_repo_relevance` but preserve `future_research_relevance`.
- Prefer append-only edits. If a judgement changes, add a note explaining why rather than silently replacing earlier context.
- Keep current implementation actions separate from future research ideas.
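
The update rules above could be checked mechanically before a change is merged. The following is a minimal sketch, not an existing repo script: the function names `parse_register` and `append_only_violations` and the `PRESERVED_COLUMNS` tuple are hypothetical, and the parser assumes that no table cell contains a literal `|` and that the header is unchanged between the two versions.

```python
from typing import Dict, List, Tuple

# Columns the update rules say must stay stable once a row exists (assumed list).
PRESERVED_COLUMNS = ("extraction_file", "source_pdf_filename", "future_research_relevance")


def parse_register(markdown_table: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (header, {register_id: cells}) from a Markdown pipe table.

    Assumes no cell contains a literal '|' character.
    """
    header: List[str] = []
    rows: Dict[str, List[str]] = {}
    for line in markdown_table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        # Drop the outer pipes, then split the remaining text into cells.
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if not header:
            header = cells
        elif all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # separator row such as |---|---|
        else:
            rows[cells[0]] = cells
    return header, rows


def append_only_violations(old_table: str, new_table: str) -> List[str]:
    """List rows that were removed or whose preserved columns changed."""
    header, old_rows = parse_register(old_table)
    _, new_rows = parse_register(new_table)
    problems = []
    for register_id, old_cells in old_rows.items():
        new_cells = new_rows.get(register_id)
        if new_cells is None:
            problems.append(f"{register_id}: row removed")
            continue
        for column in PRESERVED_COLUMNS:
            index = header.index(column)
            if old_cells[index] != new_cells[index]:
                problems.append(f"{register_id}: {column} changed")
    return problems
```

Under these assumptions, adding a row or editing `current_repo_relevance` produces no violations, while renaming a source file or changing `future_research_relevance` on an existing row is reported. A changed judgement would still need its explanatory note added by hand.
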
Extraction inventory
| register_id | extraction_file | source_pdf_filename | paper_title | authors | year | paper_type | geography | road_setting | main_method_or_model | outcome_or_target | exposure_handling | spatial_unit | temporal_unit | validation_type | key_transferable_idea | current_repo_relevance | future_research_relevance | literature_review_relevance | code_actionability_now | supports_production_change | extraction_ai_tool | model_name_if_known | extraction_quality_initial_judgement | secondary_review_needed | secondary_review_reason | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LIT-001 | paper-extraction-aguero-valverde-2008-crash-frequency-spatial-models.md | Paper08-0088RG.pdf | Analysis of Road Crash Frequency with Spatial Models | Jonathan Aguero-Valverde; Paul P. Jovanis | 2008 | crash-frequency SPF; spatial model comparison | US; Pennsylvania | mixed road classes | Full Bayes spatial CAR vs non-spatial NB | total crashes per segment | VMT from AADT and length; observed/assumed | PennDOT variable-length segments | 5-year aggregate | in-sample model comparison | Spatial correlation can change crash-frequency parameter estimates; VMT-style exposure supports current framing | high | high | high | medium | diagnostic-only | Gemini | Gemini | high | conditional | Manual check only if quoting coefficient/table values | Duplicate source paper with LIT-002; keep both for provenance. |
| LIT-002 | paper-extraction-aguero-valverde-2008-spatial-car-crash-frequency.md | Paper08-0088RG.pdf | Analysis of Road Crash Frequency with Spatial Models | Jonathan Aguero-Valverde; Paul P. Jovanis | 2008 | crash-frequency SPF; spatial autocorrelation assessment | US; Pennsylvania | rural two-lane roads | Full Bayes Poisson lognormal with CAR random effects | annual crash count per segment | AADT as free log covariate; length as fixed offset | rural road segment; intersections and ramps excluded | annual segment-year | in-sample DIC and spatial diagnostics | Test AADT elasticity and spatial residual autocorrelation rather than assume offset is always correct | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Review Table 2 before citing exact elasticity values | Strong action source for Moran’s I, residual maps, and free-AADT elasticity diagnostics. |
| LIT-003 | paper-extraction-al-omari-2021-florida-context-classification-spf.md | Crash_Analysis_And_Development_Of_Safety_Performance_Functions_Fo.pdf | Crash Analysis and Development of Safety Performance Functions for Florida Roads in the Framework of the Context Classification System | Ma’en Mohammad Ali Al-Omari | 2021 | thesis; SPF; network screening | US; Florida | rural to urban context classes; road segments | Negative binomial SPFs; EB network screening | annual crash frequency by segment; KABCO/KABC/PDO variants | observed FDOT AADT; AADT plus length offset and DVMT alternatives | merged homogeneous road segments | annual average over 5 years | mainly in-sample; no holdout noted | Context-class SPFs and urban sub-linear AADT coefficients motivate stratified diagnostics | medium | high | high | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | medium | conditional | Thesis, no peer-review/holdout; check counterintuitive PSP coefficient if citing | Useful for junction density/access density and road-class stratification, not direct production change. |
| LIT-004 | paper-extraction-baddeley-2021-analysing-point-patterns-networks.md | 91405.pdf | Analysing point patterns on networks - a review | Adrian Baddeley; Gopalan Nair; Suman Rakshit; Greg McSwiggan; Tilman M. Davies | 2021 | methodological review; point processes | mixed/theoretical | linear networks | Network point processes; network KDE; Cox/Poisson processes | spatial point intensity on a network | traffic volume in example; point-process intensity not traffic exposure | exact point coordinates on continuous linear network | mostly static; spatio-temporal noted | methodological review; no predictive validation | Segment aggregation can hide point-level clustering; avoid ordinary planar KDE for network crashes | medium | high | high | low | diagnostic-only | Gemini | Gemini | high | conditional | Review methods before any spatstat implementation | Critiques link-year aggregation but does not invalidate current pipeline. |
| LIT-005 | paper-extraction-boulieri-2016-space-time-bayesian-severity.md | Boulieri_et_al-2016-Journal_of_the_Royal_Statistical_Society__Series_A_Statistics_in_Society.pdf | A space-time multivariate Bayesian model to analyse road traffic accidents by severity | Areti Boulieri; Silvia Liverani; Kees de Hoogh; Marta Blangiardo | 2016 | Bayesian severity/spatial model | England | mixed road types aggregated to wards | Bayesian hierarchical Poisson lognormal with CAR/MCAR/RW1 effects | ward-year accident counts by slight vs severe/fatal | ward traffic volume from AADF times road length; partly imputed; major-road coverage | electoral ward | annual | in-sample Bayesian model comparison and posterior checks | Severity levels can have distinct spatial structure; offset structure aligns at coarser grain | medium | high | high | medium | pilot-first | Claude | Claude Sonnet 4.6 | high | conditional | Ward-level and MCMC scale limit direct transfer | Good documentation support for severity caveat and year-specific AADT need. |
| LIT-006 | paper-extraction-brodersen-2010-balanced-accuracy-posterior.md | brodersen10post-balacc.pdf | The balanced accuracy and its posterior distribution | Kay H. Brodersen; Cheng Soon Ong; Klaas E. Stephan; Joachim M. Buhmann | 2010 | validation metric; classification methodology | not road-specific | not road-specific | Bayesian posterior distribution of balanced accuracy | binary class-label correctness | not applicable | abstract data point | not stated | cross-validation metric framework | Use balanced accuracy and posterior uncertainty for imbalanced zero/non-zero diagnostics | medium | medium | medium | medium | diagnostic-only | Gemini | Gemini 1.5 Pro | high | conditional | Manual check equations if implementing posterior intervals | Not road-safety evidence; validation reference only. |
| LIT-007 | paper-extraction-brodersen-2010-balanced-accuracy.md | brodersen10post-balacc.pdf | The balanced accuracy and its posterior distribution | Kay H. Brodersen; Cheng Soon Ong; Klaas E. Stephan; Joachim M. Buhmann | 2010 | validation metric; classification methodology | not stated | not road-specific | posterior balanced accuracy estimators | binary classification correctness | not applicable | not stated | not stated | conceptual cross-validation examples | Warn against plain accuracy for rare collision/no-collision diagnostics | medium | medium | medium | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Manual check Equation 7 and MATLAB routines if implementing | Duplicate source paper with LIT-006; use to compare extraction consistency. |
| LIT-008 | paper-extraction-chengye-2013-modelling-motorway-accidents-nb.md | Modelling Motorway Accidents using Negative Binomial Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; NB regression | New Zealand; Auckland | motorway; urban/rural | Negative binomial regression; GEE considered | annual accident frequency per segment | AADT per lane and length as free log covariates | homogeneous motorway segments; ramp-defined | yearly | in-sample and temporally held-out prediction metrics | Temporal holdout and ramp context diagnostics are useful; standard NB struggles on short/zero-heavy links | medium | high | high | medium | diagnostic-only | Gemini | Gemini | high | conditional | Check Equation 10 before any exposure-specification comparison | Same source family as LIT-009 and LIT-024. |
| LIT-009 | paper-extraction-chengye-ranjitkar-2013-motorway-nb-regression.md | Modelling_Motorway_Accidents_using_Negative_Binomial_Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; feature importance | New Zealand; Auckland | motorway only | Negative binomial accident prediction model | annual accident count per motorway segment | observed AADT per lane and length as free log covariates; no formal offset | homogeneous motorway mainline segment; ramp crashes excluded | annual segment-year | 2009-2010 temporal holdout after 2004-2008 training | Add temporal holdout, ramp/slip-road diagnostics, per-family overdispersion checks | high | high | high | high | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | conditional | Review marginal variables due to weak 80% significance threshold | Strong TODO source for validation and motorway facility-family diagnostics. |
| LIT-010 | paper-extraction-cronie-2019-inhomogeneous-linear-network.md | Inhomogeneous higher-order.pdf | Inhomogeneous higher-order summary statistics for linear network point processes | Ottmar Cronie; Mehdi Moradi; Jorge Mateu | 2019 | spatial point-process diagnostics | US example; Houston | road network example | inhomogeneous network F/G/J functions; simulations | accident point locations on linear network | no traffic exposure; spatial intensity reweighting only | point events on linear network | static one-month example | simulation/method diagnostics; no predictive validation | Distinguish exposure-normalised risk from point-pattern clustering diagnostics | low | medium | medium | low | diagnostic-only | ChatGPT | GPT-5.5 Thinking | medium | yes | Equation formatting imperfect; check before implementation | Useful for a small diagnostic pilot only. |
| LIT-011 | paper-extraction-eckardt-2024-marked-point-process-rejoinder.md | Rejoinder on ’Marked spatial point processes_ current state and extensions.pdf | Rejoinder on ‘Marked spatial point processes: current state and extensions to point processes on linear networks’ | Matthias Eckardt; Mehdi Moradi | 2024 | methodological discussion; marked point processes | not stated | not road-specific | marked point process summaries; K/J functions; mark correlation | point events with marks | traffic exposure not applicable; intensity is not exposure | point events on planar or linear networks | not stated | methodological discussion | Marked point-process diagnostics may explore severity/type clustering but not production ranking | low | medium | medium | low | no | ChatGPT | GPT-5.5 Thinking | high | conditional | Check formulas directly if citing | Keep as exploratory diagnostics reference. |
| LIT-012 | paper-extraction-gilardi-2022-leeds-network-lattice-bayesian.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | UK network-lattice Bayesian SPF | UK; Leeds | urban/metropolitan major roads | Bayesian hierarchical Poisson INLA with ICAR/PCAR and multivariate severity | OS segment counts by severe and slight crash | length times Census-routed commuter flow as Poisson offset; estimated exposure | OS road segment | 8-year aggregate cross-section | in-sample posterior predictive checks; no holdout | Direct UK support for OS segment lattice and log-offset form; balanced accuracy for sparse severity | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | yes | Check wide Table 2 signs and Primary Road interpretation before citation | Primary UK anchor for Stage 2 documentation, but not external validation. |
| LIT-013 | paper-extraction-gilardi-2022-multivariate-hierarchical-crashes.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | Bayesian spatial/severity model | UK; Leeds | urban/metropolitan | INLA Bayesian hierarchical Poisson | car crashes per street segment | length times estimated traffic flow | OS Vector OpenMap Local road segment | 8-year aggregate | in-sample posterior predictive diagnostics | OS link lattice is a credible unit; balanced accuracy useful for zero/non-zero checks | high | high | high | medium | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Check dodgr contraction details if replicating MAUP test | Duplicate source paper with LIT-012/LIT-014. |
| LIT-014 | paper-extraction-gilardi-2022-network-lattice-crashes.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | Bayesian spatial/severity SPF | UK; Leeds | major roads including motorways, primary roads, A roads | bivariate Bayesian hierarchical Poisson models with ICAR/PCAR | slight and severe road traffic collision counts | length times estimated commuter flow as offset | OS road segment with adjacency by shared boundary | yearly available; collapsed to 8-year aggregate | in-sample posterior predictive checks | Spatial autocorrelation limitation and MAUP sensitivity tests are directly relevant | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Manual review noted in extraction for table/sign details | Richest extraction for repo actions from this paper. |
| LIT-015 | paper-extraction-hauer-2001-eb-spf-tutorial.md | SPF_Basic_Tutorial_2001_by_Ezra_Hauer.pdf | Estimating Safety by the Empirical Bayes Method: A Tutorial | Ezra Hauer; Douglas W. Harwood; Forrest M. Council; Michael S. Griffith | 2001 | EB/SPF tutorial | not one geography | segments and intersections | Empirical Bayes expected crash frequency using SPF plus observed counts | expected accidents for road entity | ADT/AADT in SPF; length and years multiply expected count | road segment or intersection entity | annual | worked tutorial; no holdout | EB shrinkage should use correct overdispersion and full year-specific procedure | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Check equations if implementing exact EB changes | Primary EB methodology reference. |
| LIT-016 | paper-extraction-huda-alkaisy-2024-lvr-network-screening.md | dot_78279_DS1.pdf | Network Screening on Low-Volume Roads Using Risk Factors | Kazi Tahsin Huda; Ahmed Al-Kaisy | 2024 | low-volume road network screening | US; Oregon | rural low-volume two-lane roads | OLS on log EB expected crashes; CART thresholds | EB expected crashes per 0.05-mile section | AADT covariate in one model; dropped in another; no offset due to fixed length | fixed 0.05-mile sections; intersections excluded | annual aggregate | random split/high R2 on EB output; not raw-count validation | Low-volume links may need geometry-led diagnostics and careful AADT sensitivity framing | medium | high | high | medium | pilot-first | Claude | Claude Sonnet 4.6 | high | yes | Check grade variable ambiguity and avoid over-reading high R2 | Strong for curvature/grade diagnostics, not production thresholds. |
| LIT-017 | paper-extraction-jayasinghe-2019-centrality-aadt.md | 1-s2_0-S2215016119301128-main.pdf | A novel approach to model traffic on road segments of large-scale urban road networks | Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama | 2019 | AADT estimation; traffic modelling | mixed developing-country cities | urban road networks | OLS/robust/Poisson regressions with dual-graph centrality | AADT/PCU per road segment | AADT is target; observed counts used for calibration | road segment in dual graph | annual AADT | random 80/20 validation; likely spatial leakage | Centrality features and learning curves can inform Stage 1a AADF sparsity diagnostics | high | high | medium | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | yes | Check low-AADT RMSE and exact final regression type | Traffic-volume paper, not collision-risk evidence. |
| LIT-018 | paper-extraction-jayasinghe-2019-traffic-modeling-centrality.md | 1-s2.0-S2215016119301128-main.pdf | A novel approach to model traffic on road segments of large-scale urban road networks | Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama | 2019 | AADT estimation; traffic modelling | urban/mixed | urban road networks | centrality-based OLS/RR/Poisson traffic volume model | AADT in PCU | exposure is modelled output | road segment dual graph | yearly average daily traffic | random validation; spatial leakage risk | Stage 1a should report spatial holdouts and sensitivity to count-point sparsity | high | high | medium | medium | baseline-comparison-first | Gemini | Gemini 3.1 Pro | high | conditional | Verify centrality radius/compute feasibility before implementation | Duplicate source paper with LIT-017. |
| LIT-019 | paper-extraction-lord-2010-crash-frequency-review.md | Lord-Mannering_Review.pdf | The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives | Dominique Lord; Fred Mannering | 2010 | methodological review; crash-frequency modelling | mixed | road segments/intersections across reviewed studies | review of Poisson, NB, zero-inflated, random effects, GAM, ML and Bayesian models | crash frequency over roadway units | traffic flow/length/VMT discussed across studies | road segment/intersection/other | mixed | review; no empirical validation | Use as risk checklist: overdispersion, zero-heavy counts, exposure functional form, omitted variables, spatial/temporal correlation | high | high | high | high | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Review large tables if building full model-family comparison | Best general modelling limitations reference. |
| LIT-020 | paper-extraction-ma-2019-xgboost-fatality.md | analyzing-the-leading-causes-of-traffic-fatalities-using-1jznp146gl.pdf | Analyzing the Leading Causes of Traffic Fatalities Using XGBoost and Grid-Based Analysis: A City Management Perspective | Jun Ma; Yuexiong Ding; Jack C. P. Cheng; Yi Tan; Vincent J. L. Gan; Jingcheng Zhang | 2019 | conditional severity/fatality classifier | US; Los Angeles County | mixed urban/peri-urban | XGBoost binary classifier plus grid GIS | fatal vs non-fatal crash given a crash | no traffic exposure; fatality rate not exposure-adjusted | crash record and 60x60 grid | crash-time fields included; no panel | train/test on balanced crash data; no exposure validation | Separate conditional severity/fatality from exposure-adjusted frequency; watch leakage from crash-record features | medium | medium | medium | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Check unusual XGBoost learning-rate details if replicating | Do not compare fatality rate to Stage 2 risk percentile. |
| LIT-022 | paper-extraction-michalaki-2015-motorway-accident-severity-chatgpt.md | 1-s2.0-S0022437515000833-main.pdf | Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model | Paraskevi Michalaki; Mohammed A. Quddus; David Pitfield; Andrew Huetson | 2015 | motorway severity modelling | England | motorway | ordered logit; multilevel ordered logit; generalized ordered logit | accident severity conditional on crash | no formal exposure; time category proxy | accident record | accident-level with broad time categories | in-sample; no held-out validation | Frequency and severity are separate; post-event variables must not leak into Stage 2 predictors | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check STATS19 hard-shoulder/main-carriageway transfer | Duplicate source paper with LIT-023. |
| LIT-023 | paper-extraction-michalaki-2015-motorway-accident-severity.md | 1-s2.0-S0022437515000833-main.pdf | Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model | Paraskevi Michalaki; Mohammed A. Quddus; David Pitfield; Andrew Huetson | 2015 | motorway severity modelling | England | motorway hard shoulder/main carriageway | partially constrained generalized ordered logistic regression | accident severity | no explicit exposure; severity conditional on crash | location type/accident record | time-of-day/day/month categories | in-sample | HGV/off-peak/hard-shoulder diagnostics belong in severity work, not current frequency model | medium | high | high | medium | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Check STATS20/STATS19 hard-shoulder encoding before diagnostics | Warns against using number of vehicles/casualties as prospective features. |
| LIT-024 | paper-extraction-pan-2013-motorway-negative-binomial.md | Modelling Motorway Accidents using Negative Binomial Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; NB regression | New Zealand; Auckland | motorway rural/urban | Poisson/NB/ZINB/GEE tested; NB selected | annual accident frequency per segment-year | observed AADT per lane plus length as free log regressors | homogeneous motorway segment-year; ramp segmentation | yearly | temporal holdout plus in-sample metrics | Facility context, ramp proximity, temporal holdout and geometry sanity checks are useful | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check table values and equations before formal citation | Duplicate source family with LIT-008/LIT-009. |
| LIT-025 | paper-extraction-pan-2017-deep-belief-network-global-spf.md.md | 1-s2_0-S2046043017300199-main.pdf | Development of a global road safety performance function using deep neural networks | Guangyuan Pan; Liping Fu; Lalita Thakali | 2017 | global SPF; neural model benchmark | Canada/US; multiple regions | mixed highway types | Deep Belief Network; NB benchmarks; Bayesian regularised ANN | annual crash frequency per homogeneous section-year | observed AADT and length as DBN features; NB uses log exposure variants | homogeneous road section | annual segment-year | train/test style performance; metrics mainly MAE/RMSE | DBN with MSE is not suitable for sparse count production; use NB/log-offset comparisons and minimum-length diagnostics | medium | medium | high | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | medium | conditional | DBN technical details and crash scope need checking | Has useful negative evidence against neural MSE production changes. |
| LIT-026 | paper-extraction-poch-1996-intersection-negative-binomial.md | Negative_Binomial_Analysis_of_Intersection-Acciden.pdf | Negative Binomial Analysis of Intersection-Accident Frequencies | Mark Poch; Fred Mannering | 1996 | intersection SPF; NB regression | US | urban/suburban intersections | Negative binomial regression | annual accident frequency on intersection approach | turning and intersection traffic volumes as covariates; no formal offset | intersection approach | yearly | no external/held-out validation stated | Junction approach mechanisms are structurally different from link risk; use junction diagnostics/proxies | medium | high | high | medium | pilot-first | ChatGPT | GPT-5.5 Thinking | high | yes | OCR artefacts; check tables before formal literature table | Strong warning about junction under-representation. |
| LIT-027 | paper-extraction-quddus-2010-m25-severity-ordered-response.md | road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf | Road Traffic Congestion and Crash Severity: Econometric Analysis Using Ordered Response Models | Mohammed A. Quddus; Chao Wang; Stephen G. Ison | circa 2010 | motorway severity model; ordered response | UK; M25 | motorway | OLOGIT/HCM/GOLOGIT/PC-GOLOGIT | ordinal crash severity given crash | no exposure; 15-minute flow as severity predictor with 30-min lag | individual crash matched to motorway segment | crash-level 15-minute traffic lag | in-sample ordered response metrics | Use 30-minute pre-crash lag if future WebTRIS crash-level work; separate frequency vs severity | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | yes | Check dense Table 2 and whether junction crashes excluded | Supports severity/frequency separation and cautious congestion claims. |
| LIT-028 | paper-extraction-roll-2026-oregon-pedestrian-spf.md | dot_89189_DS1.pdf | Developing a Pedestrian Safety Performance Function for Oregon | Josh Roll; Jason Anderson; Nathan McNeil | 2026 | pedestrian/intersection SPF; exposure estimation | US; Oregon | urban intersections | Poisson/NB SPFs; random forest exposure data fusion | pedestrian injury crashes per intersection-year | vehicle AADT and estimated pedestrian AADPT; both partly estimated | urban intersection | annual | 10-fold CV for exposure model; SPF tables partly unreadable | Exposure-only vs full-feature baseline comparisons and CURE plots could inform Stage 2 diagnostics | low | medium | medium | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | medium | yes | Report tables not fully machine-readable; check SPF forms and AADPT metrics | Scope is pedestrian intersections, not link-level all-injury risk. |
| LIT-029 | paper-extraction-wang-2009-m25-congestion-safety.md | Wang_et_al_AAP_Final_submitted1.pdf | Impact of Traffic Congestion on Road Safety: A Spatial Analysis of the M25 Motorway in England | Chao Wang; Mohammed A. Quddus; Stephen G. Ison | circa 2009 | motorway SPF; congestion and spatial model | UK; M25 | motorway | Poisson-lognormal, NB, CAR spatial variants | accident count per motorway segment | observed UKHA AADT and segment length as free log covariates; no offset | junction-to-junction motorway segment; junction crashes excluded | annual aggregate | in-sample Bayesian/model comparison | Motorway AADT elasticity and grade/congestion diagnostics; bearing can improve snapping QA | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Publication year not in document; DIC differences small | Companion to Quddus severity paper for congestion null result. |
| LIT-030 | paper-extraction-wang-2015-investigating-safety-impacts-suburban-arterials.md | 1805.06381v3.pdf | Investigating Safety Impacts of Roadway Network Features of Suburban Arterials in Shanghai, China | Xuesong Wang; Jinghui Yuan; Grant G. Schultz; Wenjing Meng | 2015 | zonal spatial crash model | China; Shanghai | suburban arterials | Bayesian Poisson-lognormal CAR | total crash frequency on arterials within TAZ | trip productions/attractions and arterial length as exposure proxies; no AADT | Traffic Analysis Zone | yearly | in-sample R2; no true held-out validation | Junction/signal/access-density proxies may matter, but zonal unit is low transferability | low | medium | medium | low | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Betweenness computed within TAZ, not global; in-sample R2 only | Use as junction/network-complexity prompt, not model benchmark. |
| LIT-031 | paper-extraction-ziakopoulos-2020-spatial-approaches-road-safety.md | A review of spatial approaches in road safety.pdf | A review of spatial approaches in road safety | Apostolos Ziakopoulos; George Yannis | not stated in visible metadata | spatial road-safety review | mixed | mixed | review of spatial/spatio-temporal methods | crash counts/rates/severity/hotspots across reviewed studies | mixed exposure definitions | mixed units: links, intersections, grids, zones, corridors | mixed | review; primary studies need checking for exact claims | Supports spatial validation, MAUP sensitivity, proximity/junction diagnostics and caution about production spatial models | high | high | high | high | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | yes | Year/DOI missing; check primary papers for exact numerical claims | Broad review; do not use alone to justify a production model swap. |
| LIT-032 | paper-extraction-pew-2020-zero-inflated-crash.md | Justification_for_considering_zero-inflated_models_in_crash_frequency_analysis.pdf | Justification for considering zero-inflated models in crash frequency analysis | Timo Pew; Richard L. Warr; Grant G. Schultz; Matthew Heaton | 2020 | zero-inflated model comparison; Bayesian hierarchical count modelling | US; Utah | signalised intersections statewide (urban and rural) | Bayesian hierarchical ZIP; ZINB; NB-Lindley; MCMC via JAGS | annual injury and fatal crash count per intersection | entering vehicles per day as standardised covariate — no formal offset | signalised intersection | annual (2014–2017 fitting; 2018 held out) | temporal holdout (2018); Bayesian chi-squared goodness-of-fit; posterior predictive zero check; WAIC | ZINB improvement over Poisson driven mainly by overdispersion parameter (π ≈ 0); NB GLM with offset is the priority diagnostic step, not full zero-inflation | high | high | high | high | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | conditional | Verify Table A1 π ≈ 0 finding in original PDF before citing; check prior sensitivity on Beta(0.15,1) | Critical nuance: π posterior mean ≈ 0 in both ZIP and ZINB — improvement over Poisson is from ϕ dispersion, not zero-inflation. Intersection unit not link; counts are much higher than Open Road Risk link-years. No exposure offset — does not challenge Open Road Risk offset design. |
| LIT-033 | paper-extraction-mahoney-2023-spatial-cv.md | ASSESSING_THE_PERFORMANCE_OF_SPATIAL_CROSS-VALIDATION.pdf | Assessing the Performance of Spatial Cross-Validation Approaches for Models of Spatially Structured Data | Michael J Mahoney; Lucas K Johnson; Julia Silge; Hannah Frick; Max Kuhn; Colin M Beier | 2023 | spatial CV methodology; simulation study | simulation (no specific geography) | not road-specific | random forest on simulated spatially structured continuous outcome; five CV method comparisons | simulated continuous outcome (not a crash count) | not applicable | regular 50×50 grid cells | not applicable | cross-landscape prediction as external reference; 100 simulated landscapes | V-fold CV is severely optimistic for spatially autocorrelated data; spatial clustering CV with exclusion buffer ≈ autocorrelation range is the most practical improvement; current grouped-link split does not enforce spatial separation | high | high | medium | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Specific buffer sizes (25–41% of grid length) are simulation-specific and do not transfer directly to road network; must estimate autocorrelation range from Stage 2 residuals first | Not a road safety paper. Simulation uses continuous Gaussian outcome; zero-heavy count generalisation assumed but not tested. BLO3 performs poorly despite large buffers — do not assume larger buffer always helps. Regular grid assumption does not match OS Open Roads geometry. |
| LIT-034 | paper-extraction-gao-2024-stzitd-gnn.md | Uncertainty-Aware_Probabilistic_Graph_Neural_Networks_for_Road-Level.pdf | Uncertainty-Aware Probabilistic Graph Neural Networks for Road-Level Traffic Crash Prediction | Xiaowei Gao; Xinke Jiang; Dingyi Zhuang; Huanfa Chen; Shenhao Wang; Stephen Law; James Haworth | 2024 | probabilistic GNN; zero-inflated Tweedie; road-level crash prediction | UK; London (Lambeth; Tower Hamlets; Westminster) | urban road segments | GRU temporal encoder + GAT spatial encoder + ZITD decoder (STZITD-GNN); baselines include STGCN; STZINB-GNN; STTD-GNN | daily severity-weighted crash risk score per road (y = sum of collision count × severity weight 1/2/3) | no exposure; no offset; no traffic volume data | urban road segment (OS-style link; ~4,700–5,700 nodes per borough) | daily; 2019 only; 8:2:2 within-year temporal split | within-year temporal holdout (no spatial holdout; no cross-year test) | AccHR@k metric is directly applicable to Open Road Risk risk percentile ranking; MPIW/PICP for future probabilistic outputs; Gaussian distributional assumption is clearly worst | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Verify Table 4 values against original; check whether 8:2:2 split is chronological or random (not stated); GitHub repo may be private | No exposure offset — cannot distinguish high-risk from high-traffic roads; major methodological gap relative to Open Road Risk. Severity-weighted composite response variable not directly comparable to raw injury count. Daily urban scale vs annual national scale: zero-inflation mechanisms differ. GNN architecture not feasible at 2.17M links. Validation: same roads in train and test; single year; no spatial holdout — weaker than current Open Road Risk grouped split. |
| LIT-035 | paper-extraction-balawi-tenekeci-2024-arima-sarimax-london-aroads.md | Time_series_traffic_collision_analysis_of_London_hotspots__Patterns.pdf | Time series traffic collision analysis of London hotspots: Patterns, predictions and prevention strategies | Mohammad Balawi; Goktug Tenekeci | 2024 | ARIMA; SARIMAX; corridor-level time series | UK; London (A1; A3; A4; A6 corridors) | major A-road corridors (aggregate) | ARIMA(5,4,7); SARIMAX(4,1,2)×(4,1,2,8) on daily corridor-level aggregate time series | daily count of vehicles involved in accidents (not accident count — wrong response variable) | no exposure; no AADT; corridor-level aggregate only | four A-road corridors treated as a single aggregate time series | daily; 2016–2019; December 2019 holdout only | single-month temporal holdout (Christmas period); AIC/BIC in-sample | Post-event STATS19 attributes (severity; light condition; road surface) must not enter Stage 2 as features — this paper inadvertently illustrates why | low | low | low | low | no | Claude | Claude Sonnet 4.6 | high — high confidence in the identified problems | no | No secondary review recommended; do not use as evidence for pipeline decisions | CRITICAL: wrong response variable (vehicles involved, not accident count); SARIMAX predicts negative counts (model specification error); R-squared values in Table 3 implausibly high and methodology opaque; ARIMA d=4 order from misconfigured grid search (excluded d=0,1,2); log-likelihood sign inconsistency between ARIMA and SARIMAX tables; 80-20 split described but only 30-day Christmas holdout reported. Published in Heliyon (broad open-access). Do not cite as methodological support for any decision. Retained for completeness of literature search only. |
| LIT-036 | paper-extraction-huda-2024-network-screening-low-volume-roads.md | dot_78279_DS1.pdf | Network Screening on Low-Volume Roads Using Risk Factors | Kazi Tahsin Huda; Ahmed Al-Kaisy | 2024 | low-volume road network screening | US; Oregon | rural low-volume two-lane paved roads | HSM EB expected crashes; CART thresholds; OLS log-linear screening equations | EB expected crashes per 0.05-mile section; crash density for ranking | AADT in HSM SPF and one proposed model; no-volume alternative model; no exposure offset | fixed 0.05-mile roadway sections; intersections excluded | annual expected crashes from 2004-2013 crash data | random 80/20 split against EB expected-crash target; no spatial/temporal holdout | Confirms Huda/Al-Kaisy as diagnostic support for low-volume, curvature/grade, and volume/no-volume sensitivity checks; flags EB-target R2 caveat | medium | high | high | medium | pilot-first | ChatGPT | not stated | high | conditional | Check curvature CART inconsistency and grade treatment before using thresholds | Duplicate source paper with LIT-016. Stronger caveat that adjusted R2 predicts a smooth EB target, not raw future crashes. |
| LIT-037 | paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md | 1-s2.0-S2046043017300199-main.pdf | Development of a global road safety performance function using deep neural networks | Guangyuan Pan; Liping Fu; Lalita Thakali | 2017 | global SPF; DBN/ML benchmark | Canada and US | mixed highway segments | Deep Belief Network with NB benchmarks and pooled/local model comparisons | annual crash/collision frequency per homogeneous segment-year | AADT and length as DBN inputs or NB exposure-like covariates; no clearly fixed offset | homogeneous highway segments; coarse compared with OS Open Roads links | annual segment-year; temporal holdouts for Ontario/Colorado | temporally held-out MAE/RMSE for Ontario/Colorado; Washington split not fully stated; no spatial holdout | Supports temporal holdout, local-vs-global/facility-family comparisons, and short-segment sensitivity; not production DBN | medium | medium | high | medium | baseline-comparison-first | ChatGPT | GPT-5.5 Thinking | high | conditional | Check Washington years/split and DBN normalization if reproducing | Duplicate source paper with LIT-025; reinforces that DBN should be benchmark-only without ranking/spatial validation. |
| LIT-038 | paper-extraction-poch-mannering-1996-nb-intersection.md | Negative_Binomial_Analysis_of_Intersection-Acciden.pdf | Negative Binomial Analysis of Intersection-Accident Frequencies | Mark Poch; Fred Mannering | 1996 | intersection approach SPF; NB regression | US; Bellevue, Washington | urban/suburban intersections | Negative binomial regression by intersection approach and crash type | annual accident frequency per intersection approach | approach turning/opposing/intersection traffic volumes as covariates; no offset | intersection approach | annual; 1987-1993 | in-sample rho-squared and likelihood tests; no held-out validation | Stronger second extraction for overdispersion and junction-approach mechanisms; confirms in-sample-only limitations | medium | high | high | medium | pilot-first | Claude | Claude Sonnet 4.6 | high | conditional | Check Table 1 coefficients and likelihood-ratio test values before citing | Duplicate source paper with LIT-026; improves confidence despite old validation standards. |
| LIT-039 | paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md | road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf | Road Traffic Congestion and Crash Severity: An Econometric Analysis Using Ordered Response Models | Mohammed A. Quddus; Chao Wang; Stephen G. Ison | 2010 / manuscript year unclear | motorway severity model; ordered response | UK; M25 | motorway | OLOGIT/HCM/GOLOGIT/PC-GOLOGIT | ordered crash severity conditional on crash | 15-minute traffic flow and congestion matched with 30-minute lag; no exposure offset because target is severity conditional on crash | crash records assigned to 72 motorway segments | crash-level; 2003-2006; 15-minute traffic state lag | in-sample ordered-response fit and marginal effects; no held-out validation | Confirms severity/frequency separation, lagged traffic-state design, and conditional interpretation caveat | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check published ASCE citation year and Tables 2-3 against final version | Duplicate source paper with LIT-027; clearer on in-sample metrics and conditional severity target. |
| LIT-040 | paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md | dot_89189_DS1.pdf | Developing a Pedestrian Safety Performance Function for Oregon | Josh Roll; Jason Anderson; Nathan McNeil | 2026 | pedestrian/intersection SPF; exposure estimation | US; Oregon | urban intersections | Poisson/NB SPFs; pedestrian-volume data fusion; random forest/XGBoost/NN exposure models | pedestrian crash frequency at intersections | vehicle AADT and estimated pedestrian AADPT as explanatory exposure variables; no explicit offset | urban intersection; contracted complex nodes | annual average exposure; crash outcome years not fully stated in SPF sections | exposure model 10-fold CV; SPF validation details/table extraction require care | Supports junction/intersection future work, exposure-only vs proxy comparisons, and vulnerable-user exposure caveats | low | medium | medium | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | yes | Long report; check SPF equations, crash-year window, and AADPT/AADT metrics before citing | Duplicate source paper with LIT-028; broader report extraction confirms report-table review still needed. |
| LIT-041 | paper-extraction-ziakopoulos-yannis-2020-spatial-review.md | A_review_of_spatial_approaches_in_road_safety.pdf | A review of spatial approaches in road safety | Apostolos Ziakopoulos; George Yannis | not explicitly stated; circa 2020 | spatial road-safety review | international review | mixed | review of spatial units, spatial models, MAUP, proximity, network KDE, VRU approaches | mixed crash counts/rates/severity/hotspots across reviewed studies | mixed: AADT, VMT/VDT, trips, road length, population; not offset-specific | mixed: links, intersections, grids, zones, regions, network lixels | mixed across reviewed studies | review-level synthesis; no single validation protocol | Second extraction reinforces spatial-unit, MAUP, junction-segment, and network-KDE cautions; exact primary-study values need source checks | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Check primary papers before using numerical claims from review tables | Duplicate source paper with LIT-031; improves confidence for high-level caution but not production model choice. |
| LIT-042 | paper-extraction-huda-2024-COMBINED.md | dot_78279_DS1.pdf | Network Screening on Low-Volume Roads Using Risk Factors | Kazi Tahsin Huda; Ahmed Al-Kaisy | 2024 | combined reconciliation record; low-volume road network screening | US; Oregon | rural low-volume two-lane paved roads | HSM EB expected crashes; CART thresholds; OLS log-linear screening equations | EB-smoothed expected crashes per 0.05-mile section; crash density for ranking | AADT in HSM SPF/EB target and one proposed model; deliberate no-volume comparator; no exposure offset | fixed 0.05-mile roadway sections; intersections excluded | annual expected crashes from 2004-2013 crash data | random 80/20 split against EB expected-crash target; no spatial/temporal holdout | Canonical record clarifies EB target, no-offset structure, volume/no-volume scope, and curvature/grade caveats for low-volume diagnostics | medium | high | high | high | pilot-first | ChatGPT | GPT-5.5 Thinking | high | conditional | Curvature CART sharp-group value is internally inconsistent; grade should not be cited as final-model predictor without caution | Combined record from original PDF plus LIT-016 and LIT-036; use this row for future Huda/Al-Kaisy citations. |
| LIT-043 | paper-extraction-jayasinghe-2019-COMBINED.md | 1-s2.0-S2215016119301128-main.pdf | A novel approach to model traffic on road segments of large-scale urban road networks | Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama | 2019 | combined reconciliation record; AADT estimation / traffic-volume modelling | Sri Lanka; Cambodia; Vietnam; Pakistan; Tanzania | urban road networks | centrality-based traffic-volume model using betweenness, closeness, and path-distance weighting | AADT / PCU per road segment | AADT is the modelled target; observed counts used for calibration/validation; no collision exposure offset | road segment in dual-graph road network | cross-sectional annual AADT base year by city | random 80/20 validation plus calibration-sample learning curve; no spatial holdout | Canonical record supports Stage 1a centrality diagnostics, learning curves, AADT-band errors, and warnings about random spatial leakage | high | high | medium | high | baseline-comparison-first | ChatGPT | GPT-5.5 Thinking | high | conditional | Final selected regression type is implied but not fully documented across OLS/robust/Poisson alternatives | Combined record from original PDF plus LIT-017 and LIT-018; traffic-exposure paper, not Stage 2 collision-risk evidence. |
| LIT-044 | paper-extraction-poch-mannering-1996-COMBINED.md | Negative_Binomial_Analysis_of_Intersection-Acciden.pdf | Negative Binomial Analysis of Intersection-Accident Frequencies | Mark Poch; Fred Mannering | 1996 | combined reconciliation record; intersection approach SPF | US; Bellevue, Washington | urban/suburban intersections | Negative binomial regression for total and accident-type approach counts | annual accident frequency per intersection approach | approach and turning traffic volumes as covariates; no formal offset | intersection approach | annual observations from 1987-1993, excluding improvement year | in-sample likelihood/rho-squared diagnostics; no held-out validation | Canonical record confirms junction/approach mechanisms and NB-over-Poisson relevance while warning against link-level coefficient transfer | medium | high | high | medium | pilot-first | ChatGPT | GPT-5.5 Thinking | high | conditional | Exact accident-type table values should still be checked before formal publication because OCR is imperfect | Combined record from original PDF plus LIT-026 and LIT-038; use this for junction/intersection evidence. |
| LIT-045 | paper-extraction-roll-2026-oregon-COMBINED.md | dot_89189_DS1.pdf | Developing a Pedestrian Safety Performance Function for Oregon | Josh Roll; Jason Anderson; Nathan McNeil | 2026 | combined reconciliation record; pedestrian/intersection SPF and exposure data fusion | US; Oregon | urban intersections | Poisson/NB pedestrian SPFs; random-forest AADT/AADPT data fusion; CURE-style diagnostics | pedestrian crash frequency at intersections | vehicle AADT and estimated pedestrian AADPT as explanatory exposure variables; no explicit offset | urban intersection with contraction of complex nodes | annual average exposure; final SPF crash period not fully stated | AADPT model 10-fold CV; final crash SPF diagnostics mainly in-sample; no clear held-out SPF validation | Canonical record supports exposure-only baselines, CURE diagnostics, Stage 1a distribution checks, and separate future junction/pedestrian layer | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | medium-high | conditional | Check appendices only if exact SPF coefficients, exposure-only comparisons, or crash-assignment rules are needed | Combined record from original PDF plus LIT-028 and LIT-040; use this for Roll/Oregon pedestrian SPF citations. |
| LIT-046 | paper-extraction-quddus-wang-ison-COMBINED.md | road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf | Road Traffic Congestion and Crash Severity: An Econometric Analysis Using Ordered Response Models | Mohammed A. Quddus; Chao Wang; Stephen G. Ison | not clearly stated; circa 2010 | combined reconciliation record; motorway conditional severity model | UK; M25 | motorway | ordered logit; heteroskedastic choice model; generalized ordered logit; partially constrained generalized ordered logit | ordered crash severity conditional on crash occurrence | no exposure offset; 15-minute traffic flow/congestion assigned to crash records using 30-minute pre-crash lag | individual crash record matched to 72 motorway segments | crash-level records from 2003-2006 with 15-minute traffic state lag | in-sample ordered-response model fit and marginal effects; no held-out validation | Canonical record clarifies conditional severity scope, pre-crash traffic-state matching, no-frequency interpretation, and post-event leakage cautions | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Published version year/bibliographic details and exact Tables 2-3 should be checked before formal numeric reporting | Combined record from original PDF plus LIT-027 and LIT-039; use this for Quddus/Wang/Ison severity citations. |
| LIT-047 | paper-extraction-ziakopoulos-yannis-2020-COMBINED.md | A review of spatial approaches in road safety.pdf | A review of spatial approaches in road safety | Apostolos Ziakopoulos; George Yannis | not explicitly stated; circa 2020 | combined reconciliation record; spatial road-safety review | international review | mixed | review of spatial units, MAUP, spatial dependence, proximity structures, network KDE, GWR/CAR/SAR and spatio-temporal methods | mixed crash counts, rates, severity outcomes, hotspot classifications, and spatial crash distributions | mixed: AADT, VMT/VDT, road length, population, trips, and vulnerable-road-user exposure variables; no single offset structure | mixed units: segments, intersections, corridors, grids, zones, regions, and network lixels | mixed across reviewed studies | review-level synthesis; primary studies need checking for exact method/validation claims | Canonical record supports spatial-unit documentation, MAUP/hotspot sensitivity notes, spatial residual diagnostics, and caution against production spatial models from review evidence alone | high | high | high | high | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check original cited papers before using exact study-level model specifications, validation methods, or numerical claims | Combined record from original PDF plus LIT-031 and LIT-041; use this for spatial-methods review citations. |
| LIT-048 | paper-extraction-quddus-2007-inar-time-series-count.md | AAP_2007_INAR_revised_Final.pdf | Time Series Count Data Models: An Empirical Application to Traffic Accidents | Mohammed A. Quddus | 2007 | time-series count modelling; intervention analysis | Great Britain; London congestion charging zone | national aggregate and urban area aggregate | ARIMA/SARIMA; negative binomial; INAR(1) Poisson | annual fatalities; monthly casualties | VKT or total monthly accidents as control variables; not segment-level offset | aggregate national or London CC zone time series | annual and monthly | temporal holdout | Temporal holdout and serial-correlation diagnostics support adding year holdout and cluster/ACF checks | high | medium | high | high | diagnostic-only | not stated | not stated | high | conditional | Aggregate time series, not link-level SPF; check exact INAR estimates before formal numeric citation | Supports validation-and-metrics and crash-frequency temporal-dependence notes. |
| LIT-049 | paper-extraction-mensah-hauer-1998-two-problems-averaging.md | 263HauerMensahTwoproblemsofaveraging___.pdf | Two Problems of Averaging Arising in the Estimation of the Relationship Between Accidents and Traffic Flow | Abraham Mensah; Ezra Hauer | 1998 | SPF theory; traffic-flow averaging bias | illustrative; New York State rural road data used in example | rural two-lane illustrative example | theoretical SPF functions and averaging derivations | expected accident frequency | traffic flow q; AADT as averaged flow argument | road section | theoretical one-year observation context | not applicable; theoretical paper | Argument-averaging and function-averaging support WebTRIS/time-profile diagnostics and free-elasticity checks | high | high | high | medium | diagnostic-only | not stated | not stated | high | conditional | Theoretical paper; use for diagnostic rationale, not production temporal feature claim | Supports exposure-and-traffic-volume and crash-frequency temporal-exposure caveats. |
| LIT-050 | paper-extraction-qin-et-al-2006-bayesian-hourly-exposure.md | AAP-2006-Hourlyexposure-1tfliyv_Bayesian_estimation_of_hourly_exposure_functions_by_crash_type_and_time_of_day.pdf | Bayesian estimation of hourly exposure functions by crash type and time of day | Xiao Qin; John N. Ivan; Nalini Ravishanker; Junfeng Liu; Donald Tepas | 2006 | hourly exposure / crash-type SPF | USA; Michigan and Connecticut | rural two-lane highways | hierarchical Bayesian binary logistic regression | hourly crash occurrence by crash type | hourly directional volume and segment length; additive or multiplicative exposure functions | road segment | hourly observations across 1995-1997 and 1995-2000 datasets | no held-out split; posterior estimation | Flow-crash relationships differ by crash type and time-of-day, supporting temporal-profile and SV/MV diagnostic caveats | high | high | high | medium | diagnostic-only | not stated | not stated | high | conditional | Rural US two-lane scope and no holdout; do not transfer coefficients directly | Supports exposure-and-traffic-volume and crash-frequency function-averaging notes. |
| LIT-051 | paper-extraction-dutta-2020-freeway-crash-prediction-disaggregate-flow.md | dot_54482_DS1.pdf | Improving Freeway Crash Prediction Models Using Disaggregate Flow State Information | Nancy Dutta; Michael D. Fontaine | 2020 | freeway SPF; temporal flow disaggregation | US; Virginia | freeway; rural and urban | negative binomial GLMs; ZINB tested; GLMMs | crash frequency on freeway segments | AADT baseline; average hourly, average 15-minute, and raw hourly volume alternatives; length offset | directional basic freeway segment | 2011-2017 | random 70/30 train/test split | Smoothed hourly flow can outperform AADT, while raw noisy hourly data can underperform; supports cautious WebTRIS temporal diagnostics | high | high | high | medium | diagnostic-only | not stated | not stated | high | conditional | Freeway sensor coverage and random split limit transfer; use improvements as upper-bound context | Supports exposure temporal-conditioning and CURE validation notes. |
| LIT-052 | paper-extraction-sung-et-al-2024-modified-temporal-spf.md | Development_of_Modified_Temporal_Safety_Performanc.pdf | Development of Modified Temporal Safety Performance Function Considering Various Time Flows | Yeji Sung; Seunghwan Kim; Juneyoung Park; Ling Wang | 2024 | temporal SPF; machine-learning comparison | South Korea | motorway / national highway | NB regression; RF; XGBoost; LightGBM; Dirichlet-weighted ensemble | crash frequency by segment and aggregation period | VDS traffic volume at annual/hourly/15-minute aggregation; segment length and lanes | highway cone-zone segment | 2018-2022 | random 8:2 split; no spatial holdout | Temporal flow disaggregation may improve SPF performance, but validation and sampling weaken direct transfer | medium | high | high | medium | diagnostic-only | not stated | not stated | high | conditional | Specific metrics have low transferability due to random split and balanced sampling | Supports temporal-exposure notes and validation caveats about random split optimism. |
| LIT-053 | paper-extraction-savolainen-et-al-2011-severity-modelling-review.md | Savolainen-Mannering-AAP-2011.pdf | The Statistical Analysis of Highway Crash-Injury Severities: A Review and Assessment of Methodological Alternatives | Peter T. Savolainen; Fred L. Mannering; Dominique Lord; Mohammed A. Quddus | 2011 | severity modelling methodological review | review; primarily US literature | mixed | review of ordered, multinomial, nested, mixed, and joint severity models | crash-injury severity | not applicable; severity models condition on crash occurrence | individual crash / occupant | mixed | review; no primary validation | Severity is a separate estimand; post-event variables and spatial/temporal correlation require careful interpretation | high | high | high | medium | diagnostic-only | not stated | not stated | high | conditional | Review paper; check primary papers for exact empirical claims | Supports severity-modelling and validation serial-correlation cautions. |
| LIT-054 | paper-extraction-roshandel-2015-realtime-traffic-freeway-crash.md | ImpactofReal-timeTrafficCharacteristicsonFreewayCrashOccurrence-SystematicReviewandMeta-analysis.pdf | Impact of Real-time Traffic Characteristics on Freeway Crash Occurrence: Systematic Review and Meta-analysis | Saman Roshandel; Zuduo Zheng; Simon Washington | circa 2015; not confirmed | systematic review and meta-analysis; real-time crash prediction | international review | freeway | review of logistic and machine-learning real-time crash prediction models | binary crash occurrence | traffic variables as predictors; no explicit exposure offset | freeway segment | varies across reviewed studies | varies; many temporal/location splits | Behavioural/unobserved factors limit road-environment prediction, supporting cautious decision-support framing | medium | high | high | low | documentation-only | not stated | not stated | high | yes | Publication year and journal details need confirmation before formal citation | Supports structural explanatory ceiling note in validation page. |
| LIT-055 | paper-extraction-national-highways-2022-comparing-collision-casualty-rates.md | statistical-methods-for-comparing-road-collision-and-casualty-rates-proposed-approach.pdf | Statistical methods for comparing road traffic collision and casualty rates: proposed approach | National Highways; individual authors not stated | 2022 | official methodology; rate comparison and hypothesis testing | England; National Highways network context | mixed; motorway focus in conclusions | non-homogeneous Poisson process; compound Poisson; parametric bootstrap; Monte-Carlo likelihood-ratio test | collision and casualty rates per vehicle mile | vehicle miles as rate denominator / Poisson scale parameter; assumes observed traffic | road, road type, or period aggregate | aggregate collection period | fictitious worked example only; no empirical validation | UK official support for Poisson exposure-rate form, low-count inference caution, and traffic-denominator sensitivity | high | medium | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Proposed approach, not finalised empirical validation; check final status before implementing exact tests | Supports exposure, validation, and severity/casualty-rate notes. |
| LIT-056 | paper-extraction-dft-2024-rsf-initial-analysis.md | Road_safety_factors_initial_analysis-_GOV_UK.pdf | Road Safety Factors: Initial Analysis | Department for Transport | 2024 | official statistics; CF to RSF transition analysis | Great Britain | mixed | descriptive statistics and CF-to-RSF mapping analysis | road safety factor distributions in fatal collisions | not applicable | collision-level factor categories; aggregate reporting | 2022 fatal collision mapping and late-2023 early RSF data | official descriptive analysis; no predictive validation | CF to RSF mapping is a structural reclassification, not a like-for-like time-series continuation | medium | medium | high | low | documentation-only | Claude | Claude Sonnet 4.6 | high | conditional | Full ODS mapping table not checked in extraction | Supports transferability/open-data and severity leakage/provenance notes. |
| LIT-057 | paper-extraction-dft-2025-guide-cf-rsf-transition.md | Guide_to_road_safety_and_contributory_factors_for_reported_road_casualties_Great_Britain_-_GOV_UK.pdf | Guide to Road Safety and Contributory Factors for Reported Road Casualties Great Britain | Department for Transport | 2025 | official guidance; STATS19 CF/RSF data quality | Great Britain | mixed | methodological guidance and coverage tables | contributory factor and road safety factor recording | not applicable | collision-level factor fields; aggregate tables | 2015-2024 coverage and transition status | official guidance; no predictive validation | CF/RSF fields are subjective, partially recorded, post-event, and structurally broken across 2024 transition | high | medium | high | low | documentation-only | Claude | Claude Sonnet 4.6 | high | conditional | Numeric force-code mapping should be verified against STATS19 lookup | Canonical source for CF/RSF availability, coverage, and transition caveats. |
| LIT-058 | paper-extraction-dft-2025-reported-road-casualties-gb-2024.md | Reported_road_casualties_Great_Britain__annual_report__2024_-_GOV_UK.pdf | Reported Road Casualties Great Britain, Annual Report: 2024 | Department for Transport | 2025 | official statistics; national casualty and exposure rates | Great Britain | mixed | descriptive official statistics | police-reported casualties and collisions by severity, road type, and user type | national vehicle miles denominators for rates; not link-level AADT | national and road-type aggregates | 2024 annual statistics | descriptive statistics; no predictive validation | Current UK under-reporting, severity-adjustment, road-type rate, and RSF transition context for documentation | high | medium | high | low | documentation-only | Claude | Claude Sonnet 4.6 | high | conditional | Use adjusted/unadjusted severity figures carefully when comparing to raw STATS19 pipeline outputs | Supports severity limitations and national benchmark context. |
| LIT-059 | paper-extraction-wang-2011-two-stage-severity-ranking.md | Predicting_accidents.pdf | Predicting accident frequency at their severity levels and its application in site ranking using a two-stage mixed multivariate model | Chao Wang; Mohammed A. Quddus; Stephen G. Ison | 2011 | prediction / hotspot detection / two-stage severity-frequency model | England; M25 motorway and surrounding major roads | motorway / major A roads | Bayesian spatial count model plus mixed logit severity model | annual fatal, serious injury, and slight injury accident counts per segment | complete observed HA traffic counts; log(AADT) and log(length) in frequency model; cost-rate normalised by vehicle-km | directional road segment between junctions | 2003-2007 panel | in-sample MAD only; no held-out validation | Severity-disaggregated frequency model supports future severity methodology; log(length) near 1 supports exposure-offset documentation with motorway/major-road scope caveat | medium | high | high | medium | documentation-only | Claude | Claude Sonnet 4.6 | high | conditional | Check published version if quoting exact table values | Relevant stage: Stage 2, severity, documentation. Transferability: medium. Complete HA traffic counts assumed for all segments, so data assumptions do not transfer to Open Road Risk’s minor road network. |
| LIT-060 | paper-extraction-khodadadi-2021-NB-parameterisations-NFAS-SPF.md | 2021__A__Khodadadi_NFAS.pdf | Application of different negative binomial parameterizations to develop safety performance functions for non-federal aid system roads | Ali Khodadadi; Ioannis Tsapakis; Subasish Das; Dominique Lord; Yingfeng Li | 2021 | SPF development; NB parameterisation comparison; Bayesian count modelling for zero-heavy low-volume crash data | USA; Virginia | rural and urban local low-volume NFAS roads (AADT ≤ 2347 vpd); non-federal aid system (6R; 7R; 7U) | Six NB parameterisations (NB-1; NB-2; NB-P; NB1-L; NB2-L; NBP-L) × five dispersion structures; full Bayesian MCMC (rjags); WAIC and LOO model selection; CURE plots | 5-year aggregate crash count per road segment (all injury and property damage; intersection crashes excluded; no severity stratification) | Ln(AADT) and length as free predictors with estimated elasticities (AADT ~0.63–0.74; length ~0.47–0.68); no fixed log-offset; poor-quality AADT records excluded | NFAS road segment (variable length; mean 1.37 mi rural; 0.40 mi urban local) | 5-year aggregate cross-section; no panel; no within-year structure | Approximate PSIS-LOO and WAIC from Bayesian posterior (no external holdout; no spatial or temporal split) | NB-L models outperform NB-2 by WAIC ≈ 600 units when zero proportion ≥ 37% and skewness ≥ 1.92; length elasticity statistically < 1.0 in all 30 model variants; CURE plots as standard SPF residual diagnostic; WAIC/LOO preferred over DIC for Bayesian model comparison | high | high | high | high | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | conditional | Verify “AADT over 5 years” label means daily flow rate not 5-year total (confirmed by max 2347 vpd); check underlined coefficient counts match stated 95% HPD criterion in Tables 2–4 | NB-L superiority corroborated across WAIC; LOO; MAD; and CURE plots, not a single measure. Open Road Risk at ~98–99% link-year zeros is in a more extreme sparsity regime than this paper’s 37%; the NB-L case is stronger. Full Bayesian MCMC infeasible at 2.17M links; frequentist or sampled NB-L recommended first. Also updates evidence base for LIT-TODO-002; LIT-TODO-016; and LIT-TODO-022. |
| LIT-061 | paper-extraction-asumadu-2015-poisson-NB-Ghana-road-accidents.md | Comparative_Assessment_Of_Poisson_And_Ne.pdf | Comparative Assessment Of Poisson And Negative Binomial Regressions As Best Models For Road Count Data | Oppong Richard Asumadu; Assuah Charles Kojo; Asiedu-Addo Samuel Kwesi | 2015 | Poisson vs NB model comparison; descriptive analysis | Ghana | national aggregate; no road class or spatial disaggregation | Poisson GLM and NB GLM with log link; MLE in R (glm; glm.nb) | count of road fatalities per day-of-week per year (national total; fatal only; no exposure offset) | none — no traffic volume; no offset; raw fatality count only | national aggregate (Ghana) | annual totals by day of week; 2001–2010; 70 observations | none; in-sample AIC and deviance comparison only | Confirmatory: NB reduces overdispersion vs Poisson (Poisson dispersion 2.297; NB 1.290; ΔAIC 12.5). Low transferability; better evidenced by Khodadadi 2021 for road SPF context. | low | low | low | low | no | Claude | Claude Sonnet 4.6 | high | no | Paper is simple and short; findings clearly stated; no DOI; low-visibility open-access journal | No exposure offset — day-of-week coefficients confound traffic volume with risk. Ghana national aggregate has no spatial disaggregation. Do not cite as primary evidence for NB over Poisson in SPF context; Khodadadi 2021 is the appropriate reference. Retained for literature-search completeness only. |
| LIT-062 | paper-extraction-verhoef-boveng-2007-quasipoisson-vs-NB-overdispersion.md | QUASI-POISSON_VS__NEGATIVE_BINOMIAL_REGRESSION__HOW_SHOULD_WE_MOD.pdf | Quasi-Poisson vs. Negative Binomial Regression: How Should We Model Overdispersed Count Data? | Jay M. Ver Hoef; Peter L. Boveng | 2007 | statistical methods (ecology application; not road safety) | USA; Alaska (harbor seal aerial surveys) | not applicable | quasi-Poisson GLM; NB GLM; IWLS weight derivation; variance-mean diagnostic plot | harbor seal counts per survey site (ecology); relevant as statistical methods reference only | not applicable; no exposure offset | individual haul-out site (423 sites) | 10-day survey period (1998) | none; methods paper | Quasi-Poisson and NB differ in IWLS weights: QP weights scale linearly with mean (high-count observations dominate); NB weights level off at 1/κ (low-count observations get more relative influence); variance-mean diagnostic plot distinguishes them; AIC cannot compare QP vs NB directly | medium | medium | medium | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Verify IWLS weight derivations (Eqs. 4–5) before implementing variance-mean diagnostic; abundance estimates do not need checking | Ecology paper — application does not transfer; statistical methodology does. For Open Road Risk’s ranking-across-all-links goal; NB’s weighting scheme is more appropriate than QP on scientific grounds. Paper explicitly states “no general answer” — the goal determines the choice. |
Thematic evidence matrix
Crash-frequency and count modelling
| paper | method | what it supports | what it does not support | relevance to current Stage 2 | actionability |
|---|---|---|---|---|---|
| Aguero-Valverde & Jovanis 2008; Claude and Gemini extractions | Poisson/NB/Poisson-lognormal spatial crash-frequency models | Count modelling with exposure, overdispersion, and spatial residual diagnostics | Direct national-scale CAR production model | High: Stage 2 is a count/ranking model with exposure | Run diagnostics for AADT elasticity, residual spatial autocorrelation, and spatial uncertainty notes. |
| Lord & Mannering 2010 | broad crash-frequency methodological review | Conservative framing around overdispersion, zero-heavy outcomes, omitted variables, exposure functional form | Any single best model family | High: maps directly to Stage 2 risks | Add modelling limitations and baseline comparison tables. |
| Chengye & Ranjitkar 2013; three extractions | NB motorway segment models with temporal holdout | Temporal holdout, ramp/facility-family diagnostics, motorway-specific geometry checks | Direct replacement of link-level model or uncritical coefficient transfer | Medium: motorway subset only | Add temporal holdout and ramp/slip-road diagnostic. |
| Gilardi et al. 2022; three extractions | Bayesian Poisson network lattice with spatial/severity effects | OS-segment count modelling, log-offset structure, balanced accuracy diagnostics | External validation of Open Road Risk or national-scale INLA production | High: closest UK link-network literature | Add documentation and balanced accuracy diagnostic, not production spatial model. |
| Al-Omari 2021 | NB SPFs by context class with EB screening | Context/facility stratification and urban exposure elasticity diagnostics | Direct coefficient transfer from Florida thesis | Medium | Baseline comparison of global vs road-family/context split models. |
| Hauer et al. 2001 | EB tutorial using SPF prior plus observed counts | EB shrinkage, regression-to-mean warning, overdispersion role | A specific predictive model for Open Road Risk | High for EB diagnostic layer | Audit EB formula and document approximation. |
| Pan et al. 2017 | DBN vs NB global SPF | NB benchmark and minimum segment-length sensitivity | DBN/MSE as production model for sparse injury counts | Medium | Use as baseline-comparison and methods-to-avoid evidence. |
| Pew et al. 2020 | Bayesian ZIP; ZINB; NB-Lindley on Utah intersection panel | Methodological justification for ZINB as candidate; posterior predictive zero check; NB GLM as priority diagnostic step | Full Bayesian MCMC at 2.17M links; intersection-unit coefficients; no exposure offset | High: π ≈ 0 finding means NB GLM with offset is the right first step, not full zero-inflation | Fit NB GLM candidate; run posterior predictive zero check on current Poisson GLM. |
| Gao et al. 2024 | STZITD-GNN (GRU + GAT + zero-inflated Tweedie) on London urban road-day data | AccHR@k ranking metric; MPIW/PICP uncertainty metrics (future); Tweedie GLM as intermediate candidate | Full GNN at national scale; no exposure offset; daily urban resolution; severity-weighted composite not raw count | Medium: AccHR@k metric is immediately applicable; architecture does not transfer | Implement AccHR@k as validation metric for Stage 2 risk percentile output. |
| Khodadadi et al. 2021 | Six NB parameterisations × five dispersion structures; Bayesian MCMC; WAIC/LOO; CURE plots | NB-L models strongly outperform NB-2 for zero-heavy low-sample-mean data (WAIC advantage ~600; CURE convergence); length elasticity < 1.0 confirmed across all parameterisations; WAIC/LOO preferred over DIC for Bayesian hierarchical model comparison | Full Bayesian MCMC infeasible at 2.17M links; Virginia NFAS roads (low-volume; no geometry features); coefficient values not transferable to UK | High: most directly relevant paper for Stage 2 model family choice after Pew 2020 | Compute skewness and zero proportion of link-year crash distribution; implement CURE plots; test NB-2 then NB-L as Stage 2 candidates. |
| Ver Hoef & Boveng 2007 | Quasi-Poisson vs NB IWLS weight derivation; variance-mean diagnostic | Theoretical framework for choosing between QP and NB: QP favours high-count observations; NB gives relatively more influence to low-count observations; variance-mean diagnostic distinguishes them; AIC invalid for QP vs NB comparison | Ecology application context; no exposure offset; no road safety data | Medium: scientific justification for NB over QP for Open Road Risk’s ranking-across-all-links goal | Add variance-mean diagnostic plot before NB implementation; document AIC limitation for QP vs NB comparison. |
| Asumadu et al. 2015 | Poisson vs NB comparison on Ghana national fatality data | Confirmatory: NB reduces overdispersion vs Poisson (Poisson dispersion 2.297; NB 1.290; ΔAIC 12.5) | No exposure offset; national aggregate; Ghana-specific; fatality only; no SPF structure | Low: better evidenced by Khodadadi 2021 and Lord 2010 | No action; retained for literature-search completeness only. |
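
The variance-mean diagnostic named in the Ver Hoef & Boveng row can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's implementation: the function names, the quantile-style binning, and the default bin count are hypothetical. It bins observations by fitted mean and compares the average squared residual in each bin against the two candidate variance curves, linear in the mean for quasi-Poisson and quadratic for NB.

```python
def variance_mean_by_bin(fitted, observed, n_bins=10):
    """Return (mean fitted value, mean squared residual) per bin.

    Observations are sorted by fitted mean and cut into n_bins
    roughly equal-sized groups. Bins with fewer than 2 rows are skipped.
    """
    pairs = sorted(zip(fitted, observed))
    n = len(pairs)
    out = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if len(chunk) < 2:
            continue
        mu = sum(f for f, _ in chunk) / len(chunk)
        # Empirical variance around the fitted means in this bin.
        var = sum((y - f) ** 2 for f, y in chunk) / len(chunk)
        out.append((mu, var))
    return out


def quasi_poisson_variance(mu, phi):
    """Quasi-Poisson variance function: linear in the mean."""
    return phi * mu


def negative_binomial_variance(mu, kappa):
    """NB variance function: mean plus kappa times mean squared."""
    return mu + kappa * mu ** 2
```

Plotting the binned points against both curves shows which variance structure the data follow. The choice of `phi` and `kappa` would come from the fitted models and is left open here.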
Exposure and traffic-volume handling
| paper | exposure treatment | transferable part | non-transferable part | implication for AADF/WebTRIS | actionability |
|---|---|---|---|---|---|
| Gilardi et al. 2022 | offset = segment length times estimated commuter flow | Same mathematical log-offset family on UK OS segments | Census commuter flow is weaker than AADF/AADT | Supports documenting Open Road Risk’s AADT × length offset as literature-aligned | Documentation note; no production change. |
| Hauer et al. 2001 | ADT/AADT in SPF; length and years scale expected count | Year-specific exposure and EB weighting logic | Tutorial examples not full pipeline | Supports using year-specific AADT in EB diagnostic | Audit/upgrade EB diagnostic. |
| Aguero-Valverde & Jovanis 2008 | AADT free coefficient; length offset | Test whether AADT elasticity differs from 1.0 | Rural US scope and intersection exclusion | Run diagnostic freeing AADT coefficient from fixed VMT offset | Diagnostic only. |
| Wang et al. 2009 | AADT and length as free covariates, not offset | Motorway-specific AADT elasticity check | No sparse AADF estimation and long segments | Motorway AADT coefficient may differ by road class | Motorway-only diagnostic. |
| Jayasinghe et al. 2019 | AADT is target, estimated from centrality and sparse counts | Stage 1a centrality features, learning curves, sparse-count sensitivity | Not a collision-risk paper; random validation likely leaks spatially | Stage 1a should report spatial holdout and count-sparsity sensitivity | Baseline comparison/diagnostic. |
| Roll et al. 2026 | data-fusion vehicle/pedestrian exposure | Compare exposure-only vs full-feature baselines; CURE plots | Pedestrian/intersection scope; commercial/US data tiers | Stage 1a analogy is conceptual only | Documentation and diagnostic baseline. |
| Huda & Al-Kaisy 2024 | AADT covariate dropped in one low-volume model | Low-volume geometry/AADT-sensitivity diagnostic | LVR-specific and EB-output response | Test whether low-AADT links are dominated by geometry vs exposure uncertainty | Pilot-first. |
| Mensah & Hauer 1998 | AADT as averaged traffic-flow argument in SPF theory | Argument-averaging and function-averaging bias diagnostics | Theoretical examples, not a fitted Open Road Risk-scale model | Estimate free AADT elasticity and CV(q) diagnostics from Stage 1b profiles before claiming temporal conditioning value | Diagnostic only. |
| Qin et al. 2006 | hourly directional traffic volume by crash type and time of day | Time-of-day and crash-type-specific exposure functions | Rural US two-lane scope; no heldout validation | Supports documenting SV/MV and time-of-day aggregation limits in annual Stage 2 | Diagnostic only. |
| Dutta & Fontaine 2020 | AADT vs average-hourly/15-minute/raw-hourly freeway volumes | Smoothed temporal flow profiles can improve SPF validation metrics; raw noisy data can hurt | Direct freeway sensor coverage and random split; not national open-data coverage | Stage 1b WebTRIS profiles are plausible diagnostic features but expected gain is limited | Diagnostic only. |
| Sung et al. 2024 | AADT/AHT/AMT temporal SPF comparison | Temporal aggregation can change SPF performance | Random split, balanced samples, complete Korean VDS coverage | Directional support only for temporal exposure diagnostics | Diagnostic only. |
| National Highways 2022 | vehicle miles as Poisson rate scale | UK official support for exposure-rate mathematical structure and denominator sensitivity checks | Assumes observed traffic; aggregate comparison method | Supports AADT-denominator sensitivity analysis, not direct production ranking | Documentation and diagnostic. |
| Khodadadi et al. 2021 | Ln(AADT) and length as free predictors; AADT elasticity 0.63–0.74; length elasticity 0.47–0.68; both statistically < 1.0 | Second independent paper (after Wang et al. M25) confirming sub-linear length elasticity across a very different road type; AADT elasticity varies by road class | NFAS Virginia roads (low-volume; US; no geometry); coefficients not directly transferable to UK | Run diagnostic test of log(AADT) and log(length) as free predictors stratified by road class; supports LIT-TODO-002 | Diagnostic; also updates evidence for per-family offset testing in Stage 2. |
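
The distinction between a fixed exposure offset and free exposure coefficients, which recurs across the rows above, can be written out explicitly. The notation is illustrative only and is not taken from any extraction: $\mu_i$ is the expected count on link $i$, $L_i$ its length, $x_i$ the remaining covariates, and $\beta_A$, $\beta_L$ the AADT and length elasticities. Constant scaling factors such as days per year are assumed to be absorbed into the intercept.

$$
\text{fixed offset:}\quad \log \mu_i = \log(\mathrm{AADT}_i \times L_i) + x_i^\top \beta
$$

$$
\text{free elasticities:}\quad \log \mu_i = \beta_A \log \mathrm{AADT}_i + \beta_L \log L_i + x_i^\top \beta
$$

The fixed offset is the special case $\beta_A = \beta_L = 1$. The diagnostics listed in this table amount to estimating $\beta_A$ and $\beta_L$ and checking how far they fall from 1; the Khodadadi et al. 2021 ranges quoted above (0.63–0.74 and 0.47–0.68) are an example of both falling below 1.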
Spatial and network methods
| paper | spatial unit / network concept | key spatial issue | relevance to OS Open Roads links | actionability |
|---|---|---|---|---|
| Gilardi et al. 2022 | OS road segment lattice and shared-boundary adjacency | spatial autocorrelation and MAUP/segment contraction | High; closest OS-network analogue | Document support, add MAUP pilot and adjacency residual diagnostics. |
| Aguero-Valverde & Jovanis 2008 | road segments with CAR neighbourhoods | unobserved spatial correlation biases coefficients/precision | High as diagnostic concept; lower as production model | Moran’s I and residual corridor mapping. |
| Ziakopoulos & Yannis 2020 | review across links, intersections, zones, corridors | spatial-unit sensitivity, boundary effects, proximity weights | High as cautionary framework | Spatial validation section and segmentation sensitivity pilot. |
| Baddeley et al. 2021 | continuous network point process | segment aggregation and planar KDE can mislead | Conceptually high, production low | Avoid ordinary 2D KDE; small point-process diagnostic only. |
| Cronie et al. 2019 | linear-network point-process diagnostics | point clustering after intensity adjustment | Medium for snapped-collision diagnostics | Small pilot on one urban area; not Stage 2 replacement. |
| Wang et al. 2015 | TAZ-level CAR arterial model | MAUP and zonal aggregation | Low direct transfer | Junction/signal density ideas only. |
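
The residual autocorrelation check suggested in the Aguero-Valverde & Jovanis row can be sketched with a plain Moran's I on a link adjacency structure. This is a minimal sketch assuming binary, symmetric adjacency weights; the function name and the dict-based adjacency format are hypothetical and not taken from the repo.

```python
def morans_i(values, neighbours):
    """Moran's I with binary adjacency weights.

    values: list of residuals, one per link, indexed 0..n-1.
    neighbours: dict mapping a link index to an iterable of adjacent
        link indices. Each undirected edge should appear in both
        directions so the weights are symmetric.
    """
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    denom = sum(d * d for d in dev)

    weight_sum = 0
    cross = 0.0
    for i, nbrs in neighbours.items():
        for j in nbrs:
            weight_sum += 1
            cross += dev[i] * dev[j]

    if denom == 0 or weight_sum == 0:
        raise ValueError("Moran's I is undefined for constant values or no edges")
    return (n / weight_sum) * (cross / denom)


# Usage: a four-link chain 0-1-2-3 with steadily increasing residuals.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
trend_i = morans_i([1.0, 2.0, 3.0, 4.0], chain)
```

Under no spatial autocorrelation the expected value is $-1/(n-1)$, so values well above that on Stage 2 residuals would indicate clustering. A significance test (for example by permutation) is not included in this sketch.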
Junctions, intersections, and conflict structure
| paper | junction/intersection mechanism | required data | transferability | current repo implication | actionability |
|---|---|---|---|---|---|
| Poch & Mannering 1996 | intersection approach-level traffic, turning, signal, geometry variables | turning volumes, approach geometry, signal/control data | Medium conceptually; low direct data coverage | Pure link model under-represents junction mechanics | Add junction-adjacent residual diagnostic and proxy feature pilot. |
| Roll et al. 2026 | urban intersection SPF by type/control/crossing | intersection inventory, pedestrian exposure, crossing/control data | Low direct transfer | Highlights missing junction-specific model class | Documentation/future work; CURE diagnostics transferable. |
| Al-Omari 2021 | access-point and signalized-intersection density as segment features | junction/access density from inventory | Medium if derived from OS/OSM topology | Candidate junction density per link/corridor | Diagnostic before feature inclusion. |
| Wang et al. 2015 | signal spacing/access density at TAZ level | signals/accesses and zonal network features | Low to medium | Possible missing urban conflict proxies | Low-priority diagnostic. |
| Aguero-Valverde & Jovanis 2008 | intersections/ramp crashes excluded in one extraction | junction exclusion flag/sensitivity | Medium as scope caveat | Current STATS19-to-link snapping includes junction-proximate crashes | Document and test near-junction sensitivity. |
| Hauer et al. 2001 | intersections treated as separate EB entity type | intersection entity definition and SPF | High conceptually for future junction module | Link and junction EB weights differ | Future junction-level methodology note. |
Severity modelling
| paper | severity target | model type | useful idea | leakage risk | current/future relevance |
|---|---|---|---|---|---|
| Boulieri et al. 2016 | slight vs severe/fatal counts | multivariate Bayesian Poisson at ward-year | Severity strata can have distinct spatial patterns | Low if kept as aggregate target; scale mismatch | Current documentation; future severity target. |
| Gilardi et al. 2022 | slight vs severe segment counts | bivariate Bayesian Poisson network lattice | Balanced accuracy for sparse severe counts; severity-specific rates | Low for target; no holdout caveat | High documentation/future relevance. |
| Michalaki et al. 2015 | conditional motorway accident severity | ordered/generalized ordered logit | Frequency and severity mechanisms differ; HGV/hard-shoulder diagnostics | High if using post-event variables as predictors | Documentation and future accident-level severity module. |
| Quddus et al. circa 2010 | conditional crash severity | ordered response models with 30-minute traffic lag | Pre-crash lag design for WebTRIS/crash-level work | Post-event crash variables could leak | Future severity/time-profile design. |
| Ma et al. 2019 | fatal vs non-fatal crash | XGBoost classifier | Severity-feature importance and leakage warning | High for crash-record features | Diagnostic-only severity stratification. |
| Roll et al. 2026 | pedestrian injury crashes | intersection SPF | Vulnerable-user exposure is separate from vehicle exposure | Low for current all-injury link model | Future active-travel literature only. |
| Savolainen et al. 2011 | crash-injury severity review | methodological review | Severity modelling requires separate estimands and careful treatment of correlation/heterogeneity | High if post-event variables are used prospectively | Current documentation and future severity target. |
| National Highways 2022 | casualty rate and casualties per collision | compound Poisson / non-parametric casualty component | Casualty-per-collision distribution should not be forced into simple count family | Low if kept separate; high if folded into frequency target | Future severity/casualty-rate diagnostic only. |
| DfT reported casualties 2024 | national severity/casualty official statistics | descriptive statistics and severity adjustment caveats | STATS19 under-reporting and adjusted severity figures affect interpretation of outcome | Low leakage risk; high documentation relevance | Current documentation context. |
| DfT CF/RSF guidance 2025 | contributory / road safety factors | official guidance on subjective post-event factors and transition break | CF/RSF fields are not prospective road attributes | High if used as Stage 2 features; low if diagnostic context | Documentation/provenance only. |
Validation, metrics, and model assessment
| paper | reported validation/metric type | what the metric actually tests | limitations | Open Road Risk implication |
|---|---|---|---|---|
| Brodersen et al. 2010 | posterior balanced accuracy | imbalanced binary classifier performance and uncertainty | Only applies after binarising outcomes | Use for zero/non-zero or hotspot classification diagnostics, not count likelihood replacement. |
| Gilardi et al. 2022 | posterior predictive balanced accuracy | in-sample posterior predictive adequacy | Not external/spatial holdout validation | Label clearly and report alongside grouped holdout metrics. |
| Chengye & Ranjitkar 2013 | MAD/MSPE temporal holdout | temporal prediction for motorway segments | motorway-only; longer homogeneous segments | Add temporal holdout diagnostic to Stage 2. |
| Roll et al. 2026 | exposure-only vs feature-rich SPF; CURE plots | model misspecification against covariates | intersection/pedestrian scope | Use CURE plots and exposure-only baseline for GLM diagnostics. |
| Huda & Al-Kaisy 2024 | high R² predicting EB expected crashes | fit to smoothed EB target, not raw crashes | random split and circularity inflate fit | Avoid comparing R² to raw-count pseudo-R². |
| Lord & Mannering 2010 | review of fit/diagnostic issues | model risk checklist | no single empirical validation | Use as validation documentation scaffold. |
| Ma et al. 2019 | classifier metrics on balanced fatality data | conditional fatal/nonfatal classification | not exposure-adjusted and not frequency prediction | Do not compare to Stage 2 risk percentile. |
| Mahoney et al. 2023 | simulation comparison of V-fold vs spatial CV methods | which CV method best estimates true out-of-sample error for spatially autocorrelated data | simulation uses continuous outcome; regular grid not road network; zero-heavy counts not tested | Current grouped-link split is temporal, not spatial CV; document this limitation; pilot police-force holdout; estimate autocorrelation range from Stage 2 residuals via variogram. |
| Pew et al. 2020 | Bayesian chi-squared goodness-of-fit; posterior predictive zero check; temporal holdout RPMSE/MAD | zero-calibration and distributional adequacy for zero-heavy count models | intersection unit; no spatial holdout; single-year holdout only | Run posterior predictive zero check on current Poisson GLM; π ≈ 0 finding supports NB GLM as priority next step before ZINB. |
| Gao et al. 2024 | AccHR@k (hit rate at top-k% predicted risk roads); MPIW/PICP uncertainty | ranking precision at top-k; interval calibration | within-year temporal holdout only; same roads in train/test; no spatial holdout; weaker than current Open Road Risk CV | Implement AccHR@k for Stage 2 risk percentile validation; MPIW/PICP deferred until probabilistic outputs added. |
| Quddus 2007 | temporal holdout and INAR/NB time-series comparisons | temporal generalisation and serial correlation | aggregate time series, not link-year validation | Add temporal holdout and link-level serial-correlation diagnostics cautiously. |
| Savolainen et al. 2011 | severity-methodology review | spatial/temporal correlation and heterogeneity cautions | review evidence, not a single validation design | Supports cluster-robust SE and separate severity framing. |
| Roshandel et al. circa 2015 | systematic review/meta-analysis of real-time freeway crash prediction | explanatory ceiling and operational false-positive caution | real-time freeway crash occurrence, not annual link risk | Supports cautious decision-support framing and non-operational claims. |
| Dutta & Fontaine 2020 | CURE plots and 70/30 validation for temporal flow SPFs | functional-form diagnostics over volume range | freeway sensor data and random split | Add CURE-by-AADT/length diagnostics; avoid expecting large temporal-feature gains. |
| Sung et al. 2024 | random split temporal SPF comparison | high R² values under weak validation | random split and balanced sampling make metrics optimistic | Use as validation-caveat example, not performance benchmark. |
| National Highways 2022 | Monte-Carlo likelihood-ratio test and bootstrap intervals | low-count rate-comparison sensitivity | aggregate pairwise comparisons; fictitious example only | Use for aggregate diagnostics and AADT denominator sensitivity, not production link ranking. |
| Khodadadi et al. 2021 | WAIC and PSIS-LOO from full Bayesian posterior; CURE plots; MAD (in-sample) | WAIC/LOO approximate leave-one-out predictive accuracy; CURE plots diagnose systematic misfit over AADT/length range | No external holdout; Virginia-only; LOO is approximated not true holdout; WAIC not computable for quasi-likelihood models | WAIC/LOO preferred over DIC for Bayesian hierarchical model comparison; CURE plots by AADT and length quantile directly implementable as Stage 2 diagnostics; supports LIT-TODO-016. |
| Ver Hoef & Boveng 2007 | Variance-mean diagnostic plot; cross-validation as alternative when AIC is unavailable | Determines empirically whether QP or NB better fits variance structure; AIC cannot compare QP vs NB | Methods paper; no external validation; harbor seal context | Run variance-mean diagnostic on Stage 2 Poisson residuals before choosing NB or QP; document AIC limitation for QP vs NB comparison; supports LIT-TODO-032 and LIT-TODO-033. |
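
Several rows above recommend CURE (cumulative residual) plots by AADT or length. The calculation behind such a plot can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, residuals are raw (observed minus fitted), and the band uses a commonly used ±2σ* form in which the cumulative squared residuals are scaled so the band closes to zero at the last observation.

```python
from math import sqrt


def cure_points(covariate, observed, fitted):
    """Return (covariate value, cumulative residual, band half-width) rows.

    Rows are ordered by the covariate (for example AADT or link length).
    The band half-width is 2 * sigma*, where
    sigma*^2 = S_i * (1 - S_i / S_n) and S_i is the cumulative sum of
    squared residuals up to row i.
    """
    rows = sorted(zip(covariate, observed, fitted))
    residuals = [y - m for _, y, m in rows]
    total_sq = sum(r * r for r in residuals)

    cum, cum_sq = 0.0, 0.0
    out = []
    for (x, _, _), r in zip(rows, residuals):
        cum += r
        cum_sq += r * r
        frac = cum_sq / total_sq if total_sq > 0 else 0.0
        band = 2.0 * sqrt(max(cum_sq * (1.0 - frac), 0.0))
        out.append((x, cum, band))
    return out


# Usage with three links ordered by a hypothetical covariate.
points = cure_points([3, 1, 2], [1, 0, 2], [1.5, 0.5, 1.0])
```

A cumulative residual curve that drifts outside the band over some covariate range suggests systematic misfit in that range. Plotting is left to whichever library the repo already uses.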
Point-process / hotspot / spatial diagnostics
| paper | method | diagnostic use | production risk | recommended status |
|---|---|---|---|---|
| Baddeley et al. 2021 | network point processes and network KDE | compare raw/snapped collision clustering with link rankings | Does not scale easily and changes target from link-year risk to event intensity | small pilot / documentation note |
| Cronie et al. 2019 | inhomogeneous network J/F/G functions | test point clustering after intensity correction | Not exposure-normalised traffic risk | small pilot only |
| Eckardt & Moradi 2024 | marked point process summaries | explore severity/type mark dependence | exploratory summaries can be mistaken for predictive validation | small pilot only |
| Aguero-Valverde & Jovanis 2008 | CAR residual/spatial effects | residual spatial autocorrelation and corridor clustering | national CAR production infeasible | diagnostic-only |
| Ziakopoulos & Yannis 2020 | spatial-methods review | MAUP, proximity, hotspot sensitivity | review evidence cannot justify direct production swap | documentation and diagnostic queue |
Methods to avoid as production changes for now
| method/paper | why not production-ready | safer use | required evidence before production |
|---|---|---|---|
| Full national CAR/MCAR Bayesian model; Aguero-Valverde, Gilardi, Boulieri | computationally unrealistic at 2M+ links; often in-sample only | pilot area residual/spatial diagnostic | scalable implementation, grouped/spatial holdout benefit, compute budget |
| DBN with MSE crash-count regression; Pan et al. 2017 | no count likelihood/offset; poor match to zero-heavy injury collisions | baseline comparison note; negative evidence | Poisson/NB loss with offset and strong held-out performance |
| Planar KDE for road crashes; Baddeley et al. 2021 | ignores network geometry and can mislead | network-aware KDE/point process pilot | network-distance implementation and clear diagnostic framing |
| Post-event crash variables as Stage 2 predictors; Michalaki, Quddus, Ma | crash type/casualties/contributory factors happen after or during crash | retrospective severity diagnostics only | prospective feature availability and leakage audit |
| STATS19 CF/RSF fields as stable prospective Stage 2 predictors; DfT 2024/2025 | officer-recorded post-event judgements; partial coverage; 2024 CF-to-RSF structural break; record-level RSF not in standard open download | provenance/EDA notes and diagnostic context only | audited open-data availability, stable definitions, and explicit non-leakage design |
| Zonal TAZ CAR model for link ranking; Wang et al. 2015 | loses link-level geometry; MAUP risk | contextual/junction-density inspiration | link-level validation of derived proxies |
| STZITD-GNN full architecture; Gao et al. 2024 | GRU+GAT+ZITD at 2.17M links is computationally infeasible; no exposure offset; daily resolution; severity-weighted composite not raw count | AccHR@k metric and Tweedie GLM as extractable contributions | scaled pilot (small area), exposure offset retained, annual aggregation, robust cross-year holdout |
| ARIMA/SARIMAX on corridor-level collision data without exposure; Balawi & Tenekeci 2024 | wrong response variable (vehicles involved not collision count); no exposure denominator; negative predicted counts; implausible R-squared values; methodology not replicable | negative example: illustrates post-event feature leakage from STATS19 attributes | not recommended under any circumstances for this pipeline |
| Random V-fold CV as primary Stage 2 validation; implied by Mahoney et al. 2023 | severely underestimates true prediction error for spatially autocorrelated data (2% within target range vs 37% for spatial CV) | current grouped-link temporal split is an improvement but does not enforce spatial separation; document limitation | spatial clustering CV with buffer sized to autocorrelation range of Stage 2 residuals |
| Pedestrian intersection SPF as all-injury link model; Roll et al. 2026 | different mode, exposure, and unit | active-travel/future junction literature | UK-equivalent pedestrian exposure and junction inventory |
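
The police-force holdout pilot mentioned against Mahoney et al. 2023 can be sketched as a grouped fold assignment, where every link in a group lands in the same fold. This is a simplified stand-in and an assumption on my part: it keeps whole groups together but does not implement the buffered spatial clustering CV listed as required evidence above. The function names and the link-to-group mapping are hypothetical.

```python
def grouped_folds(group_of_link, n_folds=5):
    """Assign each link to a fold so that no group is split across folds.

    group_of_link: dict mapping a link id to its group label
        (for example a police-force code).
    Groups are sorted and dealt round-robin into folds, so the
    assignment is deterministic.
    """
    groups = sorted(set(group_of_link.values()))
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    return {link: fold_of_group[g] for link, g in group_of_link.items()}


def split_for_fold(fold_of_link, held_out_fold):
    """Return (train link ids, test link ids) for one held-out fold."""
    train = [l for l, f in fold_of_link.items() if f != held_out_fold]
    test = [l for l, f in fold_of_link.items() if f == held_out_fold]
    return train, test


# Usage: links "a" and "b" share group "X", so they stay together.
folds = grouped_folds({"a": "X", "b": "X", "c": "Y"}, n_folds=2)
train_ids, test_ids = split_for_fold(folds, held_out_fold=1)
```

Round-robin assignment ignores group size, so folds can be unbalanced when groups differ greatly in link count; balancing would need an extra step.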
Code and documentation implications
| todo_id | suggested_action | action_type | relevant_stage | supporting_papers | why_supported | current_repo_relevance | future_research_relevance | effort | risk_if_done_badly | already_present_or_new | priority |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LIT-TODO-001 | Add Stage 2 documentation note on exposure-offset support and limitations | documentation note | Stage 2 / documentation | Gilardi 2022; Hauer 2001; Lord 2010; Aguero-Valverde 2008; Pan 2017 | Multiple extractions support exposure-adjusted count framing but note elasticity/functional-form caveats | high | high | low | Overclaiming exact offset optimality | partly present in methodology pages | now |
| LIT-TODO-002 | Run diagnostic Stage 2 GLM with log(AADT) and log(length) as free covariates or road-family interactions | diagnostic / baseline comparison | Stage 2 | Aguero-Valverde 2008; Wang 2009; Al-Omari 2021; Lord 2010 | Several papers estimate sub/super-linear AADT effects rather than fixed offset | high | high | medium | Confusing diagnostic with production replacement | new/partly implied | later |
| LIT-TODO-003 | Add temporal holdout report for Stage 2 | diagnostic | validation / Stage 2 | Chengye & Ranjitkar 2013; Pan 2017; Lord 2010 | Motorway NB papers use later-year prediction; current grouped split should be complemented | high | medium | medium | COVID-year split can distort results | likely partly present; verify | now |
| LIT-TODO-004 | Add spatial residual/autocorrelation diagnostic on pilot area | diagnostic | validation / Stage 2 | Aguero-Valverde 2008; Gilardi 2022; Ziakopoulos 2020 | Spatial autocorrelation can bias inference and hotspot confidence | high | high | medium | Treating in-sample spatial smoothers as external validation | new | later |
| LIT-TODO-005 | Add MAUP/segmentation sensitivity pilot for OS Open Roads links | small pilot | validation / feature engineering | Gilardi 2022; Baddeley 2021; Ziakopoulos 2020; Pan 2017 | Link granularity and very short segments are repeated cautions | medium | high | high | Large refactor or inconsistent target grain | new | backlog |
| LIT-TODO-006 | Add junction-adjacent residual/risk diagnostic | diagnostic | Stage 2 / feature engineering | Poch 1996; Al-Omari 2021; Ziakopoulos 2020; Baddeley 2021 | Junction mechanisms differ from mid-link risk | high | high | medium | Using noisy OSM junction proxies as production features too early | future-work mentions junction density | later |
| LIT-TODO-007 | Pilot junction-density or conflict-proxy features only after diagnostic | small pilot / candidate feature | feature engineering / Stage 2 | Poch 1996; Al-Omari 2021; Wang 2015 | Intersection/access density repeatedly appears as relevant but data differs | medium | high | medium | Proxy may measure urbanity/AADT rather than conflict | candidate in future-work | backlog |
| LIT-TODO-008 | Audit EB shrinkage formula and overdispersion parameter usage | diagnostic | Stage 2 / validation | Hauer 2001; Al-Omari 2021; Huda 2024 | EB weighting depends on correct dispersion and entity type | high | high | medium | Miscalibrated shrinkage overstates confidence in rankings | EB exists as diagnostic | now |
| LIT-TODO-009 | Document regression-to-mean warning for before/after use of high-risk links | documentation note | documentation / validation | Hauer 2001 | Users may evaluate interventions on links selected by high observed counts | high | medium | low | Users mistake ranking for treatment-effect evidence | likely new | now |
| LIT-TODO-010 | Add balanced-accuracy diagnostic for zero/non-zero or severe/KSI checks | diagnostic | validation / Stage 2 | Brodersen 2010; Gilardi 2022 | Imbalanced sparse counts make ordinary accuracy misleading | medium | high | medium | Binarisation can obscure count calibration | possibly absent | later |
| LIT-TODO-011 | Keep severity modelling separate from frequency model in docs | documentation note | documentation / Stage 2 | Boulieri 2016; Gilardi 2022; Michalaki 2015; Quddus 2010; Ma 2019 | Severity and frequency targets differ and may have different predictors | high | high | low | Implying severity-weighted validation exists | future-work covers severity | now |
| LIT-TODO-012 | Add severity-stratified diagnostic comparing top-risk links with KSI/fatal proportions | diagnostic | Stage 2 / validation | Ma 2019; Quddus 2010; Michalaki 2015; Boulieri 2016 | Tests whether current frequency ranking misses severity burden | medium | high | medium | Leakage if post-event proportions become production predictors | new | later |
| LIT-TODO-013 | Add feature-interpretation leakage note for crash-record variables | documentation note | feature engineering / documentation | Ma 2019; Michalaki 2015; Quddus 2010 | Post-event crash features are not prospective link predictors | high | high | low | Accidental use of target-derived variables | likely partly present | now |
| LIT-TODO-014 | Add centrality-feature and count-sparsity diagnostics for Stage 1a | diagnostic / baseline comparison | Stage 1a | Jayasinghe 2019 | Centrality-based AADT estimation depends on split design and sparse counts | high | medium | medium | Random splits overstate spatial generalisation | centrality likely present | later |
| LIT-TODO-015 | Add learning curve for Stage 1a count-point sparsity | diagnostic | Stage 1a / validation | Jayasinghe 2019 | Extraction explicitly suggests training-point sensitivity | medium | medium | medium | Misreading random split R2 as spatial transfer | new | backlog |
| LIT-TODO-016 | Add CURE plots and exposure-only baseline comparison for Poisson GLM | diagnostic / baseline comparison | Stage 2 / validation | Roll 2026; Lord 2010 | CURE plots and exposure-only baselines diagnose misspecification | medium | medium | medium | Applying pedestrian-intersection claims to link model | new | later |
| LIT-TODO-017 | Add documentation note that congestion proxies are low priority for current Stage 2 | documentation note | Stage 2 / documentation | Wang 2009; Quddus 2010 | Two M25 companion extractions report congestion null findings; scope is motorway-specific | medium | medium | low | Generalising M25 null result to all roads | new | later |
| LIT-TODO-018 | Run motorway slip-road/ramp residual diagnostic | diagnostic | Stage 2 / feature engineering | Chengye & Ranjitkar 2013; Pan 2013; Michalaki 2015 | Motorway context differs around ramps/hard shoulder | medium | medium | medium | Sparse/noisy ramp coding | possibly available via form-of-way | backlog |
| LIT-TODO-019 | Add curvature/grade interpretation note by road family | documentation note / diagnostic | feature engineering / Stage 2 | Pan 2017; Chengye 2013; Wang 2009; Huda 2024; Quddus 2010 | Geometry effects vary by road type and by frequency vs severity target | high | high | low-medium | Treating coefficient direction as causal | curvature active; grade candidate | now/later |
| LIT-TODO-020 | Treat point-process methods as exploratory comparison layers only | documentation note / small pilot | validation / future work | Baddeley 2021; Cronie 2019; Eckardt 2024 | Network point-process literature critiques aggregation but does not replace Stage 2 | medium | high | low for note; high for pilot | Presenting in-sample clustering as predictive validation | new | backlog |
| LIT-TODO-021 | Run posterior predictive zero check on current Stage 2 Poisson GLM | diagnostic | Stage 2 / validation | Pew 2020 | Table 3 in Pew shows Poisson-equivalent model (ZIP with π=0) underestimates zeros; same structure expected for Open Road Risk Poisson GLM given ~98–99% link-year zero rate | high | medium | low | Drawing samples at link-year level must incorporate correct exposure offset per link | new | now |
| LIT-TODO-022 | Fit negative binomial GLM with existing exposure offset as Stage 2 candidate and compare to Poisson GLM using grouped-link CV | baseline comparison | Stage 2 | Pew 2020; Lord 2010; Chengye & Ranjitkar 2013 | π ≈ 0 in Pew’s ZINB indicates overdispersion (ϕ = 17) drives improvement, not zero-inflation; NB GLM is the priority step before any ZINB complexity | high | high | low-medium | NB GLM dispersion can be sensitive to motorway overfitting already noted; check ϕ stability across facility families | new | now |
| LIT-TODO-023 | Estimate empirical variogram of Stage 2 Poisson GLM residuals to determine spatial autocorrelation range | diagnostic | Stage 2 / validation | Mahoney 2023; Aguero-Valverde 2008; Gilardi 2022 | Mahoney 2023 shows optimal spatial CV buffer ≈ autocorrelation range; without measuring the range for Open Road Risk, spatial CV design is uninformed | high | high | low-medium | Variogram on 2.17M links requires subsampling; use road-class-stratified subsample of ~10–50k links | new | later |
| LIT-TODO-024 | Pilot police-force-level regional holdout as a spatial CV diagnostic | diagnostic / small pilot | Stage 2 / validation | Mahoney 2023; Gilardi 2022 | ~13–16 force areas provide pre-defined geographic groups of comparable size; holding each out in turn enforces real spatial separation and tests geographic generalisation | high | high | medium | Force areas vary substantially in size and collision density; compare force-holdout R²/pseudo-R² against current grouped-link metrics to quantify spatial optimism | new | later |
| LIT-TODO-025 | Document current grouped-link CV as temporal grouped CV and record that it does not enforce spatial separation between neighbouring links | documentation note | Stage 2 / validation / documentation | Mahoney 2023 | Paper shows V-fold without spatial separation is strongly optimistic; grouped-link split prevents same-link leakage but does not address neighbouring-link spatial autocorrelation | high | medium | low | None (documentation only) | new | now |
| LIT-TODO-026 | Implement AccHR@k (accuracy hit rate at top-k% predicted risk links) as a Stage 2 validation metric | diagnostic / validation metric | Stage 2 / validation | Gao 2024 | AccHR@k directly evaluates whether high-percentile risk predictions correspond to roads with actual collisions; more operationally meaningful than RMSE or pseudo-R² for a ranking output | high | medium | low | Choice of k matters at 2.17M links; consider AccHR@1, AccHR@5, and AccHR@20 rather than a single threshold; avoid treating a broad k as strong evidence of discrimination | new | now |
| LIT-TODO-027 | Add AADT denominator sensitivity diagnostic for Stage 2 risk percentiles | diagnostic | Stage 1a / Stage 2 validation | National Highways 2022; Hauer 2001; Jayasinghe 2019 | National Highways recommends traffic-denominator sensitivity, and Open Road Risk uses estimated AADT for most links | high | medium | low-medium | Sensitivity bands must be labelled illustrative, not formal uncertainty intervals | new | now |
| LIT-TODO-028 | Document STATS19 CF/RSF transition and keep CF/RSF fields out of Stage 2 predictors | documentation note | feature engineering / documentation | DfT 2024; DfT 2025 CF/RSF guide; DfT 2025 casualty report | CF/RSF fields are subjective, partially recorded, post-event, and structurally broken around 2024 | high | medium | low | Treating missing CF/RSF as “factor absent” or using converted RSFs as time-series evidence | partly present in literature pages | now |
| LIT-TODO-029 | Add current UK STATS19 under-reporting and severity-adjustment caveat to outcome documentation | documentation note | Stage 2 / documentation | DfT 2025 casualty report; Savolainen 2011 | Police-reported injury collisions under-report non-fatal casualties and adjusted severity series differ from raw records | high | medium | low | Mixing adjusted national figures with raw pipeline outcome without labelling | partly present in severity page | now |
| LIT-TODO-030 | Add low-count aggregate rate-comparison note for any “significance” flagging | documentation note / diagnostic | validation / Stage 2 | National Highways 2022 | Asymptotic tests can mislead at low exposure; Monte-Carlo LRT is useful for pairwise aggregates but not at-scale ranking | medium | medium | low | Users may overinterpret link-level p-values as practical safety certainty | new | later |
| LIT-TODO-031 | Compute skewness and zero proportion of Open Road Risk link-year crash distribution; compare against Shirazi et al. (2017) NB-L preference threshold (skewness > 1.92; zero proportion as context); then test NB-L as Stage 2 candidate on a stratified sample of ~100k link-years | diagnostic + candidate model extension | Stage 2 / pre-modelling analysis | Khodadadi et al. 2021; Pew 2020; Lord 2010 | NB-L outperforms NB-2 by WAIC ~600 when skewness > 1.92 and zero proportion ≥ 37%; Open Road Risk at ~98–99% zeros is in a more extreme regime; frequentist NB-L implementation recommended before full Bayesian MCMC | high | high | low (skewness check) / medium (NB-L fit) | NB-L dispersion estimation can be unstable at very low mean values; test on filtered subsample (e.g. links with ≥1 crash over observation period) before applying at full 2.17M scale | new | now (skewness check) / later (NB-L fit) |
| LIT-TODO-032 | Before implementing NB-2 or NB-L as Stage 2 alternatives, run a variance-mean diagnostic plot: bin Stage 2 Poisson fitted means into ~10 categories; compute average (Y − μ̂)² per bin; overlay linear (quasi-Poisson) and quadratic (NB) variance curves | diagnostic | Stage 2 / pre-model-family-choice | Ver Hoef & Boveng 2007; Khodadadi 2021 | Directly determines whether Var ∝ μ or Var ∝ μ² better describes link-year crash variance; low effort using existing GLM output | medium | medium | low | With ~99% zero link-years most bins cluster near μ ≈ 0; compute on non-zero link-years or filter to links with ≥1 crash; document filtering choice | new | later |
| LIT-TODO-033 | Add documentation note that AIC cannot compare quasi-Poisson against NB-2 or NB-L; if quasi-Poisson is tested as a Stage 2 alternative use variance-mean diagnostic; CURE plots; and cross-validation for the comparison | documentation note | Stage 2 / validation / documentation | Ver Hoef & Boveng 2007 | Quasi-Poisson lacks a full distributional likelihood; AIC and BIC require full log-likelihood; QAIC is only valid within the quasi class | medium | low | low | None — documentation only | new | later |
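
The variance-mean diagnostic described in LIT-TODO-032 is small enough to sketch. The block below is an illustration under stated assumptions, not repo code: it assumes observed link-year counts and Stage 2 Poisson fitted means are available as plain sequences, and the names `variance_mean_bins` and `reference_variances`, the equal-count binning, and the parameter names `phi` and `alpha` are all hypothetical choices. It returns the per-bin mean fitted value and mean squared residual, which can then be compared against a linear (quasi-Poisson) and a quadratic (NB-2) variance curve.

```python
from statistics import fmean


def variance_mean_bins(y, mu_hat, n_bins=10):
    """Bin observations by fitted mean (equal-count bins) and return a list of
    (mean fitted value, mean squared residual, bin size) tuples."""
    if len(y) != len(mu_hat):
        raise ValueError("y and mu_hat must have the same length")
    pairs = sorted(zip(mu_hat, y))  # order rows by fitted mean
    n = len(pairs)
    n_bins = max(1, min(n_bins, n))  # never more bins than rows
    out = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_mu = fmean(m for m, _ in chunk)
        mean_sq_resid = fmean((obs - m) ** 2 for m, obs in chunk)
        out.append((mean_mu, mean_sq_resid, len(chunk)))
    return out


def reference_variances(mean_mu, phi, alpha):
    """Variance implied at a fitted mean by quasi-Poisson (phi * mu)
    and by NB-2 (mu + alpha * mu**2)."""
    return phi * mean_mu, mean_mu + alpha * mean_mu ** 2


# Toy data only; real inputs would be the filtered link-year subsample.
bins = variance_mean_bins([0, 0, 1, 3], [0.1, 0.2, 1.0, 2.0], n_bins=2)
```

Equal-count bins are used here so that the near-zero fitted means, which dominate at a ~99% zero rate, do not leave most bins empty; whichever filtering is applied beforehand should be documented, as the TODO row notes.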
Current-code alignment assessment
Current strengths
- The exposure-adjusted crash-frequency framing is supported by multiple extractions: Hauer 2001, Gilardi 2022, Aguero-Valverde 2008, Lord 2010, and Pan 2017.
- Link-year modelling is consistent with the crash-frequency/SPF literature, while the Gilardi et al. 2022 extractions provide a direct UK OS-segment analogue.
- Grouped or held-out validation is directionally aligned with the caution in Lord 2010 and with temporal holdout practice in the Chengye/Ranjitkar motorway papers.
- The repository’s attention to spatial units is aligned with Gilardi 2022, Baddeley 2021, Ziakopoulos 2020, and Aguero-Valverde 2008.
- Use of open data is a defensible distinction versus studies relying on complete motorway counters, commercial probe data, or inspection/video logs.
- Keeping EB shrinkage, spatial models, and point-process methods as diagnostics or future work is consistent with computational and validation cautions in the extractions.
Current weaknesses / limitations to document
- Exposure uncertainty is not fully propagated from Stage 1a into Stage 2; several papers treat AADT as observed, whereas Open Road Risk uses estimated AADT for most links.
- The fixed VMT-style offset constrains the exposure elasticity to 1.0, so expected collisions scale proportionally with exposure; several extractions support testing free AADT/length coefficients diagnostically.
- OS Open Roads link choice may be sensitive to very short links, junction proximity, and MAUP-like segmentation effects.
- Junction/intersection mechanisms are under-represented in a pure link-level model.
- Severity is not separately modelled; the severity papers show this is a different target, not just a weighted version of frequency.
- Spatial autocorrelation is not fully handled in production; this may affect coefficient interpretation and ranking confidence.
- The grouped-by-road-link CV split prevents same-link leakage across years but does not enforce spatial separation between neighbouring links on the same corridor. Mahoney 2023 shows that this kind of temporal grouped split produces estimates close to V-fold (optimistically biased) rather than true out-of-sample performance. The degree of bias depends on the spatial autocorrelation range of collision risk, which has not been measured.
- Hotspot/risk percentile sensitivity to spatial unit and residual clustering needs explicit documentation.
- The current Stage 2 Poisson GLM likely underestimates zeros at link-year level. Pew 2020 shows that a Poisson-equivalent model (ZIP with π ≈ 0) calibrates poorly on zero-heavy count data; the improvement from NB/ZINB comes from the dispersion parameter, not zero-inflation per se. This is confirmed on Open Road Risk data: the zero-calibration diagnostic (see Zero-Calibration Diagnostic) finds that Poisson severely underestimates zeros (p < 0.001), while NB closes the gap (p = 0.722, α = 2.057). An NB GLM is the warranted next step; ZINB is not a priority.
- Post-event variables from collision records must not leak into prospective Stage 2 features.
- STATS19 contributory factors and road safety factors are subjective post-event fields with partial coverage and a 2024 structural break; they should remain provenance/EDA context unless a separate non-prospective analysis is explicitly labelled.
- Current UK official statistics confirm that non-fatal casualties remain under-reported and that severity-adjusted national series are not automatically comparable to raw pipeline collision records.
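
The zero-underestimation point in the list above can be illustrated by comparing the observed number of zero link-years with the number each model family implies. This is a minimal sketch, not the repo's zero-calibration diagnostic: it assumes fitted means that already include the exposure offset, it assumes the NB-2 parameterisation Var(Y) = μ + αμ² (the register does not state how α is defined), and the function names are hypothetical. It produces expected counts only and does not reproduce the p-values quoted above.

```python
import math


def expected_zeros_poisson(mu_hat):
    """Expected number of zero counts if Y_i ~ Poisson(mu_i)."""
    return sum(math.exp(-m) for m in mu_hat)


def expected_zeros_nb2(mu_hat, alpha):
    """Expected number of zero counts if Y_i ~ NB-2 with
    Var(Y_i) = mu_i + alpha * mu_i**2, so P(Y_i = 0) = (1 + alpha * mu_i) ** (-1 / alpha)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive; the Poisson case is the limit alpha -> 0")
    return sum((1.0 + alpha * m) ** (-1.0 / alpha) for m in mu_hat)


def zero_summary(y, mu_hat, alpha):
    """Observed zero count next to the Poisson and NB-2 expectations."""
    return {
        "observed_zeros": sum(1 for v in y if v == 0),
        "poisson_expected": expected_zeros_poisson(mu_hat),
        "nb2_expected": expected_zeros_nb2(mu_hat, alpha),
    }


# Toy inputs only; alpha here is illustrative, not the fitted repo value.
summary = zero_summary([0, 0, 1], [1.0, 1.0, 1.0], alpha=1.0)
```

For the same fitted means, the NB-2 expectation is never below the Poisson one, which is why a dispersion parameter alone can close a zero gap without any zero-inflation component.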
Current areas where the repo is deliberately conservative
- The current pipeline should not claim causality from road-feature coefficients.
- It should not use post-event collision variables as predictors in the production frequency model.
- Spatial, point-process, and CAR/INLA methods should remain diagnostics or pilots before any production use.
- Severity-weighted, fatal-only, motorcycle, cyclist, or pedestrian risk targets should remain parallel/future models unless exposure and validation are made explicit.
- Machine-learning rankings should be presented as decision-support indicators, not as calibrated external safety scores.
Claims Open Road Risk can safely make
Safer claims
- The project estimates exposure-adjusted injury-collision risk.
- The outputs are exploratory decision-support indicators.
- The model can help identify links with unusually high observed collisions relative to estimated exposure and context.
- Spatial-unit choice, and the sensitivity of hotspot outputs to it, are known limitations.
- Severity and frequency are distinct modelling targets.
- EB shrinkage and spatial diagnostics can help assess ranking confidence, but they do not prove causal treatment effects.
- Open Road Risk uses open transport/collision/network data, which brings reproducibility advantages and exposure-coverage limitations.
Claims not yet supported
- The model proves causal effects of road features.
- The production risk percentile is externally validated.
- High-ranked links are definitely unsafe independent of exposure uncertainty.
- Severity-weighted risk is validated.
- The current model fully handles junction conflict mechanisms.
- The model is directly comparable to proprietary inspection scores without further validation.
- Spatial autocorrelation is fully captured in the production model.
- XGBoost feature importance is a causal interpretation of crash mechanisms.
- The grouped-by-road-link cross-validation provides a spatially robust estimate of model performance. It controls for same-link temporal leakage but does not enforce spatial separation between adjacent links; reported pseudo-R² values may be optimistically biased by an unknown amount relative to true geographic holdout performance.
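
The difference between grouping by link and holding out a geographic area can be shown on toy rows. This is a hedged sketch, not the repo's CV code: the `region` label stands in for something like the police-force area proposed in LIT-TODO-024, the helper names are hypothetical, and leave-one-group-out is used for brevity even though the repo's grouped-link CV may use k folds.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), holding out every row of
    one group in turn."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


# Toy rows: (link_id, year, region). Links A and B are neighbours in "north".
rows = [
    ("A", 2019, "north"), ("A", 2020, "north"),
    ("B", 2019, "north"), ("B", 2020, "north"),
    ("C", 2019, "south"), ("C", 2020, "south"),
]


def regions_shared(train_idx, test_idx):
    """Regions that appear on both sides of a split."""
    return {rows[i][2] for i in train_idx} & {rows[i][2] for i in test_idx}


link_folds = list(leave_one_group_out([r[0] for r in rows]))
region_folds = list(leave_one_group_out([r[2] for r in rows]))

# Grouping by link keeps both years of link A together in the test set,
# but its neighbour B (same region) still sits in the training set.
held_out, train_idx, test_idx = link_folds[0]
shared_when_grouped_by_link = regions_shared(train_idx, test_idx)

# Grouping by region leaves no region on both sides of any split.
shared_when_grouped_by_region = [regions_shared(tr, te) for _, tr, te in region_folds]
```

In the by-link split the held-out link shares its region with training rows, which is the neighbouring-link leakage the final bullet describes; the by-region split removes it at the cost of fewer, more uneven folds.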
Secondary review queue
Use literature/prompts/literature_extraction_additional_prompts.md for these checks:
- Use the Cross-Audit Prompt when there is one extraction and the PDF/tables need checking.
- Use the Lightweight Sanity Check Prompt for low-priority single extractions.
- Use the Reconciliation Prompt when two or more independent extractions need to be combined into a final record.
- Use the Human Review Checklist before treating a reconciled extraction as final.
Missing or weak review queue
These papers need an additional source check because extraction coverage is thin, the extraction flags OCR/table problems, or the paper could support repo actions.
| priority | paper | extraction_file | review_gap | prompt_to_use | what_to_check | likely_impact_if_wrong | recommended_next_action |
|---|---|---|---|---|---|---|---|
| conditional | Pew et al. 2020 | paper-extraction-pew-2020-zero-inflated-crash.md | One extraction; key π ≈ 0 finding drives NB-vs-ZINB TODO ordering | Targeted Cross-Audit Prompt | Confirm π posterior mean ≈ 0.00 (SD 0.01) for both ZIP and ZINB in original PDF Table A1; confirm ϕ = 17.04 for ZINB; check prior specification Beta(0.15,1) on π | If π is not near zero, the argument for NB GLM priority over ZINB weakens and TODO ordering changes | Check before citing π ≈ 0 or using it as justification for NB-first approach. |
| conditional | Gao et al. 2024 | paper-extraction-gao-2024-stzitd-gnn.md | One extraction; Table 4 performance values may contain transcription errors; train/val/test split chronology not stated | Targeted Cross-Audit Prompt | Verify Table 4 MAE/RMSE/AccHR@20 values; confirm whether 8:2:2 split is chronological or random; check GitHub repo accessibility | Could misstate AccHR@20 values or overstate validation strength | Check Table 4 and split description before writing AccHR@k diagnostic or citing improvement percentages. |
Completed reconciliation records
These papers now have final combined records. Use the combined record for future citation and TODO work, while preserving the earlier extraction files for provenance.
| paper | combined_record | source_extraction_files | remaining_caution | recommended_status |
|---|---|---|---|---|
| Huda & Al-Kaisy 2024 | paper-extraction-huda-2024-COMBINED.md | paper-extraction-huda-alkaisy-2024-lvr-network-screening.md; paper-extraction-huda-2024-network-screening-low-volume-roads.md | Curvature CART sharp-group value is internally inconsistent; grade should not be cited as a final-model predictor without caution | Use combined record; no further extraction needed unless quoting disputed threshold values. |
| Jayasinghe et al. 2019 | paper-extraction-jayasinghe-2019-COMBINED.md | paper-extraction-jayasinghe-2019-centrality-aadt.md; paper-extraction-jayasinghe-2019-traffic-modeling-centrality.md | Final selected regression type is implied but not fully documented across OLS/robust/Poisson alternatives | Use combined record; cite as Stage 1a exposure-modelling evidence, not collision-risk evidence. |
| Poch & Mannering 1996 | paper-extraction-poch-mannering-1996-COMBINED.md | paper-extraction-poch-1996-intersection-negative-binomial.md; paper-extraction-poch-mannering-1996-nb-intersection.md | Accident-type table values should still be checked before formal publication because OCR is imperfect | Use combined record for junction/approach mechanism claims; table-value citation remains conditional. |
| Roll et al. 2026 | paper-extraction-roll-2026-oregon-COMBINED.md | paper-extraction-roll-2026-oregon-pedestrian-spf.md; paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md | Appendices should be checked if exact SPF coefficients, exposure-only comparisons, or crash-assignment rules are needed | Use combined record for exposure-model and future junction/pedestrian-layer evidence. |
| Quddus, Wang & Ison | paper-extraction-quddus-wang-ison-COMBINED.md | paper-extraction-quddus-2010-m25-severity-ordered-response.md; paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md | Published version year/bibliographic details and exact Tables 2-3 should be checked before formal numeric reporting | Use combined record for severity target, traffic-lag design, and leakage-guardrail claims. |
| Wang, Quddus & Ison 2011 | paper-extraction-wang-2011-two-stage-severity-ranking.md | paper-extraction-wang-2011-two-stage-severity-ranking.md | In-sample MAD only; no held-out validation. Complete HA traffic counts assumed for all segments — not replicable for Open Road Risk’s minor road network. Log(length) ≈ 1 finding supports exposure offset but is from motorway/major roads only. | Use for severity-disaggregation methodology reference and exposure-offset documentation; do not use MAD values as external benchmarks. |
| Ziakopoulos & Yannis 2020 | paper-extraction-ziakopoulos-yannis-2020-COMBINED.md | paper-extraction-ziakopoulos-2020-spatial-approaches-road-safety.md; paper-extraction-ziakopoulos-yannis-2020-spatial-review.md | Primary cited papers still need checking before using exact study-level model specifications, validation methods, or numerical claims | Use combined record for high-level spatial methods, MAUP, hotspot sensitivity, and spatial-diagnostic claims. |
Active reconciliation / combination check queue
These papers still have multiple extraction passes but no final combined record. Do not re-extract them from scratch; use the Reconciliation Prompt in literature_extraction_additional_prompts.md after any needed cross-audit notes exist.
| priority | paper | extraction_files | why_reconcile | prompt_to_use | reconciliation_focus | expected_output |
|---|---|---|---|---|---|---|
| conditional | Gilardi et al. 2022 | paper-extraction-gilardi-2022-leeds-network-lattice-bayesian.md; paper-extraction-gilardi-2022-multivariate-hierarchical-crashes.md; paper-extraction-gilardi-2022-network-lattice-crashes.md | Three extraction records already exist; only targeted citation checks remain for table/sign ambiguity and MAUP/contraction details | Reconciliation Prompt only if creating a final canonical record; otherwise Human Review Checklist plus targeted PDF check | Table 2 coefficient signs; Primary Road interpretation; balanced accuracy wording; dodgr/network-contraction details | Do not re-extract; manually inspect disputed PDF tables/text before citing coefficient directions or balanced-accuracy values. |
| conditional | Pan et al. 2017 | paper-extraction-pan-2017-deep-belief-network-global-spf.md.md; paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md | Two extraction records now exist; use reconciliation only if writing the neural/global-SPF comparison page | Reconciliation Prompt | DBN training details; crash scope; NB benchmark coefficients; Washington split; normalization and minimum-length handling | Reconciled benchmark note; do not treat DBN as a production recommendation without stronger validation. |
| low | Brodersen et al. 2010 | paper-extraction-brodersen-2010-balanced-accuracy.md; paper-extraction-brodersen-2010-balanced-accuracy-posterior.md | Two extraction records already exist; equations only needed for implementation | Reconciliation Prompt only if implementing posterior intervals | Posterior balanced accuracy equations, Equation 7 wording, and examples | Reconciled implementation note if adding code for posterior intervals. |
Candidate Quarto literature pages
| proposed_qmd_file | purpose | papers_to_use | key_claims | figures/tables_needed | readiness |
|---|---|---|---|---|---|
| quarto/literature/crash-frequency-models.qmd | Explain Poisson/NB/SPF count-model basis and limitations | Lord 2010; Hauer 2001; Aguero-Valverde 2008; Chengye/Ranjitkar 2013; Pan 2017; Poch 1996; Al-Omari 2021; Pew 2020 | Count models need exposure, dispersion, validation, and cautious interpretation; overdispersion is the immediate model-family issue before zero-inflation; intersection evidence should not be transferred directly to link risk | Model-family comparison table; Open Road Risk alignment table; zero-calibration diagnostic summary; NB-over-Poisson evidence note | exists — mostly current; update references to use the combined Poch record and verify Pew π≈0 before quoting exact values |
| quarto/literature/exposure-and-traffic-volume.qmd | Document AADT/AADF/WebTRIS exposure handling | Gilardi 2022; Hauer 2001; Jayasinghe 2019 combined; Roll 2026 combined; Aguero-Valverde 2008; Wang 2009; Pew 2020; Gao 2024; Mensah & Hauer 1998; Qin 2006; Dutta 2020; Sung 2024; National Highways 2022 | Exposure is central but elasticity, estimated-AADT uncertainty, no-exposure contrast cases, and temporal-flow aggregation need clear separation | Exposure-treatment matrix; Stage 1a validation summary; no-offset contrast table; AADT/AADPT data-fusion note; temporal exposure caveats | exists — current after 2026-05-12 update; use LIT-049 to LIT-052 and LIT-055 for temporal/exposure additions |
| quarto/literature/spatial-methods-and-network-risk.qmd | Review OS-link lattice, CAR, MAUP, point-process diagnostics | Gilardi 2022; Aguero-Valverde 2008; Ziakopoulos 2020 combined; Baddeley 2021; Cronie 2019; Eckardt 2024; Mahoney 2023 | Spatial methods support diagnostics, not immediate production replacement; spatial CV evidence strengthens validation caveats | Spatial-unit comparison; diagnostic queue; CV method comparison table from Mahoney | exists — linked in site nav; use combined Ziakopoulos record and keep Gilardi table/sign details conditional |
| quarto/literature/junctions-and-conflict-structure.qmd | Separate junction/approach mechanisms from link risk | Poch 1996 combined; Roll 2026 combined; Al-Omari 2021; Wang 2015; Ziakopoulos 2020 combined | Junction risk needs different units, data, and exposure structures from the current link-year model | Junction mechanism table; available/open-data proxy table; link-vs-intersection transferability table | exists — linked in site nav; use combined Poch/Roll/Ziakopoulos records, with exact Poch table values and Roll appendices conditional for formal citation |
| quarto/literature/severity-modelling.qmd | Separate severity from frequency and define future severity path | Boulieri 2016; Gilardi 2022; Michalaki 2015; Quddus/Wang/Ison combined; Wang/Quddus/Ison 2011; Ma 2019; Gao 2024; Savolainen 2011; DfT 2025 casualty report; DfT 2025 CF/RSF guide; National Highways 2022 | Severity is conditional/different target and can conflict with frequency; severity-weighted composites should not be treated as validated Stage 2 risk; CF/RSF fields are post-event and structurally unstable across 2024 | Severity target taxonomy; leakage warning table; composite-vs-separate response variable note; STATS19 under-reporting and CF/RSF provenance notes | exists — current after 2026-05-12 update; use LIT-059 for Wang/Quddus/Ison 2011 severity-disaggregated frequency citation |
| quarto/literature/validation-and-metrics.qmd | Document heldout, balanced accuracy, CURE, pseudo-R2 limitations | Brodersen 2010; Gilardi 2022; Chengye 2013; Roll 2026 combined; Lord 2010; Huda 2024 combined; Mahoney 2023; Pew 2020; Pan 2017; Gao 2024; Quddus 2007; Savolainen 2011; Roshandel circa 2015; Dutta 2020; Sung 2024; National Highways 2022 | Metrics test different things; avoid in-sample/holdout confusion; spatial CV, temporal holdout, zero-calibration, AccHR@k, CURE plots, and AADT denominator sensitivity are candidate validation additions | Metric taxonomy; current repo validation map; CV method performance table from Mahoney; zero-check diagnostic table from Pew; AccHR@k definition from Gao; low-count and denominator sensitivity notes | exists — current after 2026-05-12 update; keep Pew/Gao exact-value checks conditional |
| quarto/literature/transferability-and-open-data-limits.qmd | Explain what transfers to open UK data and what does not | All papers, with combined records for Huda, Jayasinghe, Poch, Roll, Quddus/Wang/Ison, and Ziakopoulos/Yannis; Gao 2024 and Balawi & Tenekeci 2024 as negative-transfer examples; DfT 2024/2025 CF/RSF and casualty-statistics records for open-data limits | Some evidence is blocked by missing lane/turning/exposure data or different unit/target; apparent UK relevance still needs data-stack checks; CF/RSF fields have a 2024 structural break and RSFs are not open at record level | Transferability table; data-availability matrix; negative-transfer rows; combined-record provenance note; CF/RSF provenance note | exists — current after 2026-05-12 update; keep DfT records as documentation context, not model evidence |
Appendices
Register taxonomy
- current_repo_relevance: how directly the extraction informs the current Open Road Risk pipeline, code, model, validation, or documentation.
  - high: directly relevant to current Stage 1a, Stage 1b, Stage 2, validation, or docs.
  - medium: relevant to a subset, diagnostic, or caution.
  - low: indirect or future-only.
- future_research_relevance: usefulness for extensions beyond the current implementation.
  - high: directly informs plausible future Open Road Risk research.
  - medium: useful if a specific future branch exists.
  - low: peripheral.
- literature_review_relevance: usefulness for future narrative Quarto literature pages.
  - high: should be cited or tabulated.
  - medium: include in specialised page or caveat table.
  - low: likely appendix/background only.
- code_actionability_now: whether the extraction supports a near-term code/doc action.
  - high: a clear documentation, diagnostic, or baseline action is supported.
  - medium: action is plausible but should be scoped.
  - low: no near-term code action.
- supports_production_change:
  - no: no production change supported.
  - diagnostic-only: supports checks, reporting, documentation, or sensitivity analysis.
  - pilot-first: supports a limited pilot before any production consideration.
  - baseline-comparison-first: supports comparing against current implementation before adopting.
  - possible-later: may support a future production change after more evidence.
- secondary_review_needed:
  - no: extraction is sufficient for high-level register use.
  - yes: manual PDF/table review is needed before use in TODOs or literature prose.
  - conditional: adequate for cautious register use, but check before quoting numbers, equations, or coefficient signs.
- extraction_quality_initial_judgement:
  - high: extraction reports high confidence or appears complete for register-level use.
  - medium: useful but has missing tables, indirect relevance, or stated uncertainty.
  - low: do not use without review.
  - unknown: extraction does not state enough to judge.
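
The taxonomy above can be checked mechanically when new inventory rows are added. The following is a minimal sketch under assumptions: it treats one inventory row as a dict keyed by column name, and the names `ALLOWED` and `taxonomy_errors` are hypothetical rather than existing repo helpers. The allowed values are copied from the taxonomy; other tables in this register (for example the TODO table) use their own value sets and are out of scope here.

```python
# Allowed values per inventory column, copied from the register taxonomy.
ALLOWED = {
    "current_repo_relevance": {"high", "medium", "low"},
    "future_research_relevance": {"high", "medium", "low"},
    "literature_review_relevance": {"high", "medium", "low"},
    "code_actionability_now": {"high", "medium", "low"},
    "supports_production_change": {
        "no",
        "diagnostic-only",
        "pilot-first",
        "baseline-comparison-first",
        "possible-later",
    },
    "secondary_review_needed": {"no", "yes", "conditional"},
    "extraction_quality_initial_judgement": {"high", "medium", "low", "unknown"},
}


def taxonomy_errors(row):
    """Return (field, value) pairs that fall outside the taxonomy.
    A field missing from the row is reported with value None."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = row.get(field)
        if value not in allowed:
            errors.append((field, value))
    return errors


# Hypothetical example row, not a real register entry.
example_row = {
    "current_repo_relevance": "high",
    "future_research_relevance": "medium",
    "literature_review_relevance": "high",
    "code_actionability_now": "low",
    "supports_production_change": "diagnostic-only",
    "secondary_review_needed": "conditional",
    "extraction_quality_initial_judgement": "medium",
}
problems = taxonomy_errors(example_row)
```

A check like this only confirms that values come from the taxonomy; it says nothing about whether a judgement is right, which remains a matter for the secondary review queue.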
Known extraction files not yet processed
| file | reason not included |
|---|---|
| literature/prompts/road_safety_literature_extraction_prompt.md | Prompt template, not a paper extraction. |
| literature/prompts/OLD_road_safety_literature_extraction_prompt.md | Old prompt template, not a paper extraction. |
| literature/prompts/literature_extraction_additional_prompts.md | Companion prompt file, not a paper extraction. |
| literature/prompts/README_literature_extraction.md | Workflow guide, not a paper extraction. |
| literature/prompts/grep_extraction.sh | Utility script, not a paper extraction. |
| literature/prompts/grep_extraction_output.txt | Generated grep output/provenance helper, not an extraction source. |
No literature/papers_raw/ extraction Markdown was found during this pass. No Quarto or docs files were treated as paper extractions; quarto/future-work.qmd, todo/TODO.md, docs/internal/sites_todo.md, and quarto/background/metrics-and-methodology.qmd were used only as roadmap/methodology context.
Update (2026-05-10): Four extraction files previously not in this register have now been added as LIT-032 through LIT-035: paper-extraction-pew-2020-zero-inflated-crash.md, paper-extraction-mahoney-2023-spatial-cv.md, paper-extraction-gao-2024-stzitd-gnn.md, and paper-extraction-balawi-tenekeci-2024-arima-sarimax-london-aroads.md. The file paper-extraction-chengye-ranjitkar-2013-motorway-nb-regression.md was confirmed as the source file for the existing LIT-009 row and required no new entry.
Update (2026-05-10): Six additional review-pass extraction files have been added as LIT-036 through LIT-041: paper-extraction-huda-2024-network-screening-low-volume-roads.md, paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md, paper-extraction-poch-mannering-1996-nb-intersection.md, paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md, paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md, and paper-extraction-ziakopoulos-yannis-2020-spatial-review.md. paper-extraction-mcfadden-not-stated-conditional-logit.md was removed from this register because it is not a road-safety paper and has no material current relevance to Open Road Risk.
Update (2026-05-10): Four final combined/reconciled records have been added as LIT-042 through LIT-045: paper-extraction-huda-2024-COMBINED.md, paper-extraction-jayasinghe-2019-COMBINED.md, paper-extraction-poch-mannering-1996-COMBINED.md, and paper-extraction-roll-2026-oregon-COMBINED.md. These are now the preferred records for future citation/TODO work for those papers; the earlier extraction files remain in the inventory for provenance.
Update (2026-05-10): Two further combined/reconciled records have been added as LIT-046 and LIT-047: paper-extraction-quddus-wang-ison-COMBINED.md and paper-extraction-ziakopoulos-yannis-2020-COMBINED.md. These move Quddus/Wang/Ison and Ziakopoulos/Yannis out of the active reconciliation queue. All seven candidate Quarto literature pages now exist under quarto/literature/ and have been added to the website Literature menu.
Update (2026-05-12): Eleven additional extraction files have been added as LIT-048 through LIT-058: paper-extraction-quddus-2007-inar-time-series-count.md, paper-extraction-mensah-hauer-1998-two-problems-averaging.md, paper-extraction-qin-et-al-2006-bayesian-hourly-exposure.md, paper-extraction-dutta-2020-freeway-crash-prediction-disaggregate-flow.md, paper-extraction-sung-et-al-2024-modified-temporal-spf.md, paper-extraction-savolainen-et-al-2011-severity-modelling-review.md, paper-extraction-roshandel-2015-realtime-traffic-freeway-crash.md, paper-extraction-national-highways-2022-comparing-collision-casualty-rates.md, paper-extraction-dft-2024-rsf-initial-analysis.md, paper-extraction-dft-2025-guide-cf-rsf-transition.md, and paper-extraction-dft-2025-reported-road-casualties-gb-2024.md. The associated Quarto literature references have been updated from LIT-PENDING to these register IDs where an extraction exists.
Update (2026-05-12): paper-extraction-wang-2011-two-stage-severity-ranking.md has been added as LIT-059 and linked from quarto/literature/severity-modelling.qmd.
Update (2026-05-24): Three new extraction files have been added as LIT-060 through LIT-062: paper-extraction-khodadadi-2021-NB-parameterisations-NFAS-SPF.md (LIT-060), paper-extraction-asumadu-2015-poisson-NB-Ghana-road-accidents.md (LIT-061), and paper-extraction-verhoef-boveng-2007-quasipoisson-vs-NB-overdispersion.md (LIT-062). LIT-060 (Khodadadi 2021) is the highest-priority new addition: it provides direct evidence for NB-L model superiority on zero-heavy low-sample-mean crash data and confirms sub-linear length elasticity across 30 model variants, strengthening the evidence base for LIT-TODO-002, LIT-TODO-016, and LIT-TODO-022. Three new TODOs have been added: LIT-TODO-031 (NB-L candidate model on sampled data), LIT-TODO-032 (variance-mean diagnostic plot), LIT-TODO-033 (document AIC limitation for QP vs NB comparison).