Open Road Risk

Literature Evidence Register

Note

This is a structured, maintainer-facing evidence register for tracing extracted literature evidence into documentation, model checks, and TODOs. It is not the final narrative literature review.

Purpose and scope

The register is maintained for Open Road Risk. Its scope is as follows:

  • It tracks extracted papers and source files.
  • It records methodological relevance to Open Road Risk.
  • It separates current repo relevance from future research relevance.
  • It records provenance and extraction quality.
  • It supports future Quarto literature pages, repo TODOs, and model evaluation.
  • It should be updated append-only when new paper extractions are added.

The source of truth is the existing extraction Markdown in literature/papers_summary/. Entries are compiled from those files alone: source PDFs are not re-read, and nothing is inferred beyond what the extractions state. Where the same paper has more than one extraction, each extraction file keeps its own row so that provenance across AI tools and models is preserved.
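Because duplicate extractions stay as separate rows, it can help to list which register IDs share a source paper. The following is a minimal Python sketch, assuming the inventory rows are available as dictionaries keyed by the inventory column names. The function name `duplicate_groups` and the choice to match on normalised title plus year (rather than on PDF filename, which can differ between extractions of the same paper) are illustrative assumptions, not part of the repo.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def duplicate_groups(rows: List[Row]) -> Dict[Tuple[str, str], List[str]]:
    """Map (normalised title, year) to register IDs, keeping only papers with 2+ extractions."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for row in rows:
        # Collapse whitespace and case so small title differences do not split a group.
        title = " ".join(row["paper_title"].lower().split())
        groups[(title, row["year"])].append(row["register_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Three rows abbreviated from the extraction inventory (only the fields used here).
rows = [
    {"register_id": "LIT-001", "paper_title": "Analysis of Road Crash Frequency with Spatial Models", "year": "2008"},
    {"register_id": "LIT-002", "paper_title": "Analysis of Road Crash Frequency with Spatial Models", "year": "2008"},
    {"register_id": "LIT-004", "paper_title": "Analysing point patterns on networks - a review", "year": "2021"},
]

print(duplicate_groups(rows))
```

Matching on title and year is only a heuristic; rows whose extractions record the title or year differently would need a manual check.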

How to update this register

  • Add one row to the inventory table for each new extraction file.
  • Add or update thematic rows only where the new paper contributes evidence.
  • Add repo TODOs only when supported by the extraction.
  • Add secondary review flags where needed.
  • Do not rewrite existing judgements unless the new paper changes the evidence base.
  • Preserve previous source filenames and extraction filenames.
  • If importance changes because the repo changes, update current_repo_relevance but preserve future_research_relevance.
  • Prefer append-only edits. If a judgement changes, add a note explaining why rather than silently replacing earlier context.
  • Keep current implementation actions separate from future research ideas.
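The append-only rules above could be checked mechanically before a register update is committed. The sketch below is one possible approach, not an existing repo tool: it assumes the old and new inventories are loaded as lists of dictionaries keyed by the inventory column names, and the names `check_update` and `PRESERVED_FIELDS` are hypothetical. It flags removed rows, repeated IDs, and changes to fields the rules ask maintainers to preserve, while allowing `current_repo_relevance` to change.

```python
from typing import Dict, List

Row = Dict[str, str]

# Fields the update rules ask maintainers to preserve on existing rows.
PRESERVED_FIELDS = ("extraction_file", "source_pdf_filename", "future_research_relevance")


def check_update(old_rows: List[Row], new_rows: List[Row]) -> List[str]:
    """Return problems found; an empty list means the update looks append-only."""
    problems: List[str] = []
    new_by_id: Dict[str, Row] = {}
    for row in new_rows:
        rid = row["register_id"]
        if rid in new_by_id:
            problems.append(f"{rid}: register_id appears more than once")
        new_by_id[rid] = row
    for old in old_rows:
        rid = old["register_id"]
        new = new_by_id.get(rid)
        if new is None:
            problems.append(f"{rid}: existing row was removed")
            continue
        for field in PRESERVED_FIELDS:
            if new.get(field) != old.get(field):
                problems.append(f"{rid}: {field} changed")
    return problems


old = [{
    "register_id": "LIT-001",
    "extraction_file": "paper-extraction-aguero-valverde-2008-crash-frequency-spatial-models.md",
    "source_pdf_filename": "Paper08-0088RG.pdf",
    "current_repo_relevance": "high",
    "future_research_relevance": "high",
}]
# Allowed: current_repo_relevance is revised and a new (hypothetical) row LIT-900 is appended.
ok = [dict(old[0], current_repo_relevance="medium"),
      dict(old[0], register_id="LIT-900", extraction_file="example-new-extraction.md")]
# Flagged: the source filename of an existing row was rewritten.
bad = [dict(old[0], source_pdf_filename="renamed.pdf")]

print(check_update(old, ok))
print(check_update(old, bad))
```

A flagged change is a prompt for review rather than an error: the rules allow a judgement to change when a note explains why.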

Extraction inventory

| register_id | extraction_file | source_pdf_filename | paper_title | authors | year | paper_type | geography | road_setting | main_method_or_model | outcome_or_target | exposure_handling | spatial_unit | temporal_unit | validation_type | key_transferable_idea | current_repo_relevance | future_research_relevance | literature_review_relevance | code_actionability_now | supports_production_change | extraction_ai_tool | model_name_if_known | extraction_quality_initial_judgement | secondary_review_needed | secondary_review_reason | notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LIT-001 | paper-extraction-aguero-valverde-2008-crash-frequency-spatial-models.md | Paper08-0088RG.pdf | Analysis of Road Crash Frequency with Spatial Models | Jonathan Aguero-Valverde; Paul P. Jovanis | 2008 | crash-frequency SPF; spatial model comparison | US; Pennsylvania | mixed road classes | Full Bayes spatial CAR vs non-spatial NB | total crashes per segment | VMT from AADT and length; observed/assumed | PennDOT variable-length segments | 5-year aggregate | in-sample model comparison | Spatial correlation can change crash-frequency parameter estimates; VMT-style exposure supports current framing | high | high | high | medium | diagnostic-only | Gemini | Gemini | high | conditional | Manual check only if quoting coefficient/table values | Duplicate source paper with LIT-002; keep both for provenance. |
| LIT-002 | paper-extraction-aguero-valverde-2008-spatial-car-crash-frequency.md | Paper08-0088RG.pdf | Analysis of Road Crash Frequency with Spatial Models | Jonathan Aguero-Valverde; Paul P. Jovanis | 2008 | crash-frequency SPF; spatial autocorrelation assessment | US; Pennsylvania | rural two-lane roads | Full Bayes Poisson lognormal with CAR random effects | annual crash count per segment | AADT as free log covariate; length as fixed offset | rural road segment; intersections and ramps excluded | annual segment-year | in-sample DIC and spatial diagnostics | Test AADT elasticity and spatial residual autocorrelation rather than assume offset is always correct | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Review Table 2 before citing exact elasticity values | Strong action source for Moran’s I, residual maps, and free-AADT elasticity diagnostics. |
| LIT-003 | paper-extraction-al-omari-2021-florida-context-classification-spf.md | Crash_Analysis_And_Development_Of_Safety_Performance_Functions_Fo.pdf | Crash Analysis and Development of Safety Performance Functions for Florida Roads in the Framework of the Context Classification System | Ma’en Mohammad Ali Al-Omari | 2021 | thesis; SPF; network screening | US; Florida | rural to urban context classes; road segments | Negative binomial SPFs; EB network screening | annual crash frequency by segment; KABCO/KABC/PDO variants | observed FDOT AADT; AADT plus length offset and DVMT alternatives | merged homogeneous road segments | annual average over 5 years | mainly in-sample; no holdout noted | Context-class SPFs and urban sub-linear AADT coefficients motivate stratified diagnostics | medium | high | high | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | medium | conditional | Thesis, no peer-review/holdout; check counterintuitive PSP coefficient if citing | Useful for junction density/access density and road-class stratification, not direct production change. |
| LIT-004 | paper-extraction-baddeley-2021-analysing-point-patterns-networks.md | 91405.pdf | Analysing point patterns on networks - a review | Adrian Baddeley; Gopalan Nair; Suman Rakshit; Greg McSwiggan; Tilman M. Davies | 2021 | methodological review; point processes | mixed/theoretical | linear networks | Network point processes; network KDE; Cox/Poisson processes | spatial point intensity on a network | traffic volume in example; point-process intensity not traffic exposure | exact point coordinates on continuous linear network | mostly static; spatio-temporal noted | methodological review; no predictive validation | Segment aggregation can hide point-level clustering; avoid ordinary planar KDE for network crashes | medium | high | high | low | diagnostic-only | Gemini | Gemini | high | conditional | Review methods before any spatstat implementation | Critiques link-year aggregation but does not invalidate current pipeline. |
| LIT-005 | paper-extraction-boulieri-2016-space-time-bayesian-severity.md | Boulieri_et_al-2016-Journal_of_the_Royal_Statistical_Society__Series_A_Statistics_in_Society.pdf | A space-time multivariate Bayesian model to analyse road traffic accidents by severity | Areti Boulieri; Silvia Liverani; Kees de Hoogh; Marta Blangiardo | 2016 | Bayesian severity/spatial model | England | mixed road types aggregated to wards | Bayesian hierarchical Poisson lognormal with CAR/MCAR/RW1 effects | ward-year accident counts by slight vs severe/fatal | ward traffic volume from AADF times road length; partly imputed; major-road coverage | electoral ward | annual | in-sample Bayesian model comparison and posterior checks | Severity levels can have distinct spatial structure; offset structure aligns at coarser grain | medium | high | high | medium | pilot-first | Claude | Claude Sonnet 4.6 | high | conditional | Ward-level and MCMC scale limit direct transfer | Good documentation support for severity caveat and year-specific AADT need. |
| LIT-006 | paper-extraction-brodersen-2010-balanced-accuracy-posterior.md | brodersen10post-balacc.pdf | The balanced accuracy and its posterior distribution | Kay H. Brodersen; Cheng Soon Ong; Klaas E. Stephan; Joachim M. Buhmann | 2010 | validation metric; classification methodology | not road-specific | not road-specific | Bayesian posterior distribution of balanced accuracy | binary class-label correctness | not applicable | abstract data point | not stated | cross-validation metric framework | Use balanced accuracy and posterior uncertainty for imbalanced zero/non-zero diagnostics | medium | medium | medium | medium | diagnostic-only | Gemini | Gemini 1.5 Pro | high | conditional | Manual check equations if implementing posterior intervals | Not road-safety evidence; validation reference only. |
| LIT-007 | paper-extraction-brodersen-2010-balanced-accuracy.md | brodersen10post-balacc.pdf | The balanced accuracy and its posterior distribution | Kay H. Brodersen; Cheng Soon Ong; Klaas E. Stephan; Joachim M. Buhmann | 2010 | validation metric; classification methodology | not stated | not road-specific | posterior balanced accuracy estimators | binary classification correctness | not applicable | not stated | not stated | conceptual cross-validation examples | Warn against plain accuracy for rare collision/no-collision diagnostics | medium | medium | medium | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Manual check Equation 7 and MATLAB routines if implementing | Duplicate source paper with LIT-006; use to compare extraction consistency. |
| LIT-008 | paper-extraction-chengye-2013-modelling-motorway-accidents-nb.md | Modelling Motorway Accidents using Negative Binomial Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; NB regression | New Zealand; Auckland | motorway; urban/rural | Negative binomial regression; GEE considered | annual accident frequency per segment | AADT per lane and length as free log covariates | homogeneous motorway segments; ramp-defined | yearly | in-sample and temporally held-out prediction metrics | Temporal holdout and ramp context diagnostics are useful; standard NB struggles on short/zero-heavy links | medium | high | high | medium | diagnostic-only | Gemini | Gemini | high | conditional | Check Equation 10 before any exposure-specification comparison | Same source family as LIT-009 and LIT-024. |
| LIT-009 | paper-extraction-chengye-ranjitkar-2013-motorway-nb-regression.md | Modelling_Motorway_Accidents_using_Negative_Binomial_Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; feature importance | New Zealand; Auckland | motorway only | Negative binomial accident prediction model | annual accident count per motorway segment | observed AADT per lane and length as free log covariates; no formal offset | homogeneous motorway mainline segment; ramp crashes excluded | annual segment-year | 2009-2010 temporal holdout after 2004-2008 training | Add temporal holdout, ramp/slip-road diagnostics, per-family overdispersion checks | high | high | high | high | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | conditional | Review marginal variables due to weak 80% significance threshold | Strong TODO source for validation and motorway facility-family diagnostics. |
| LIT-010 | paper-extraction-cronie-2019-inhomogeneous-linear-network.md | Inhomogeneous higher-order.pdf | Inhomogeneous higher-order summary statistics for linear network point processes | Ottmar Cronie; Mehdi Moradi; Jorge Mateu | 2019 | spatial point-process diagnostics | US example; Houston | road network example | inhomogeneous network F/G/J functions; simulations | accident point locations on linear network | no traffic exposure; spatial intensity reweighting only | point events on linear network | static one-month example | simulation/method diagnostics; no predictive validation | Distinguish exposure-normalised risk from point-pattern clustering diagnostics | low | medium | medium | low | diagnostic-only | ChatGPT | GPT-5.5 Thinking | medium | yes | Equation formatting imperfect; check before implementation | Useful for a small diagnostic pilot only. |
| LIT-011 | paper-extraction-eckardt-2024-marked-point-process-rejoinder.md | Rejoinder on ’Marked spatial point processes_ current state and extensions.pdf | Rejoinder on ‘Marked spatial point processes: current state and extensions to point processes on linear networks’ | Matthias Eckardt; Mehdi Moradi | 2024 | methodological discussion; marked point processes | not stated | not road-specific | marked point process summaries; K/J functions; mark correlation | point events with marks | traffic exposure not applicable; intensity is not exposure | point events on planar or linear networks | not stated | methodological discussion | Marked point-process diagnostics may explore severity/type clustering but not production ranking | low | medium | medium | low | no | ChatGPT | GPT-5.5 Thinking | high | conditional | Check formulas directly if citing | Keep as exploratory diagnostics reference. |
| LIT-012 | paper-extraction-gilardi-2022-leeds-network-lattice-bayesian.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | UK network-lattice Bayesian SPF | UK; Leeds | urban/metropolitan major roads | Bayesian hierarchical Poisson INLA with ICAR/PCAR and multivariate severity | OS segment counts by severe and slight crash | length times Census-routed commuter flow as Poisson offset; estimated exposure | OS road segment | 8-year aggregate cross-section | in-sample posterior predictive checks; no holdout | Direct UK support for OS segment lattice and log-offset form; balanced accuracy for sparse severity | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | yes | Check wide Table 2 signs and Primary Road interpretation before citation | Primary UK anchor for Stage 2 documentation, but not external validation. |
| LIT-013 | paper-extraction-gilardi-2022-multivariate-hierarchical-crashes.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | Bayesian spatial/severity model | UK; Leeds | urban/metropolitan | INLA Bayesian hierarchical Poisson | car crashes per street segment | length times estimated traffic flow | OS Vector OpenMap Local road segment | 8-year aggregate | in-sample posterior predictive diagnostics | OS link lattice is a credible unit; balanced accuracy useful for zero/non-zero checks | high | high | high | medium | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Check dodgr contraction details if replicating MAUP test | Duplicate source paper with LIT-012/LIT-014. |
| LIT-014 | paper-extraction-gilardi-2022-network-lattice-crashes.md | jrsssa_185_3_1150.pdf | Multivariate hierarchical analysis of car crashes data considering a spatial network lattice | Andrea Gilardi; Jorge Mateu; Riccardo Borgoni; Robin Lovelace | 2022 | Bayesian spatial/severity SPF | UK; Leeds | major roads including motorways, primary roads, A roads | bivariate Bayesian hierarchical Poisson models with ICAR/PCAR | slight and severe road traffic collision counts | length times estimated commuter flow as offset | OS road segment with adjacency by shared boundary | yearly available; collapsed to 8-year aggregate | in-sample posterior predictive checks | Spatial autocorrelation limitation and MAUP sensitivity tests are directly relevant | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Manual review noted in extraction for table/sign details | Richest extraction for repo actions from this paper. |
| LIT-015 | paper-extraction-hauer-2001-eb-spf-tutorial.md | SPF_Basic_Tutorial_2001_by_Ezra_Hauer.pdf | Estimating Safety by the Empirical Bayes Method: A Tutorial | Ezra Hauer; Douglas W. Harwood; Forrest M. Council; Michael S. Griffith | 2001 | EB/SPF tutorial | not one geography | segments and intersections | Empirical Bayes expected crash frequency using SPF plus observed counts | expected accidents for road entity | ADT/AADT in SPF; length and years multiply expected count | road segment or intersection entity | annual | worked tutorial; no holdout | EB shrinkage should use correct overdispersion and full year-specific procedure | high | high | high | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Check equations if implementing exact EB changes | Primary EB methodology reference. |
| LIT-016 | paper-extraction-huda-alkaisy-2024-lvr-network-screening.md | dot_78279_DS1.pdf | Network Screening on Low-Volume Roads Using Risk Factors | Kazi Tahsin Huda; Ahmed Al-Kaisy | 2024 | low-volume road network screening | US; Oregon | rural low-volume two-lane roads | OLS on log EB expected crashes; CART thresholds | EB expected crashes per 0.05-mile section | AADT covariate in one model; dropped in another; no offset due to fixed length | fixed 0.05-mile sections; intersections excluded | annual aggregate | random split/high R2 on EB output; not raw-count validation | Low-volume links may need geometry-led diagnostics and careful AADT sensitivity framing | medium | high | high | medium | pilot-first | Claude | Claude Sonnet 4.6 | high | yes | Check grade variable ambiguity and avoid over-reading high R2 | Strong for curvature/grade diagnostics, not production thresholds. |
| LIT-017 | paper-extraction-jayasinghe-2019-centrality-aadt.md | 1-s2_0-S2215016119301128-main.pdf | A novel approach to model traffic on road segments of large-scale urban road networks | Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama | 2019 | AADT estimation; traffic modelling | mixed developing-country cities | urban road networks | OLS/robust/Poisson regressions with dual-graph centrality | AADT/PCU per road segment | AADT is target; observed counts used for calibration | road segment in dual graph | annual AADT | random 80/20 validation; likely spatial leakage | Centrality features and learning curves can inform Stage 1a AADF sparsity diagnostics | high | high | medium | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | yes | Check low-AADT RMSE and exact final regression type | Traffic-volume paper, not collision-risk evidence. |
| LIT-018 | paper-extraction-jayasinghe-2019-traffic-modeling-centrality.md | 1-s2.0-S2215016119301128-main.pdf | A novel approach to model traffic on road segments of large-scale urban road networks | Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama | 2019 | AADT estimation; traffic modelling | urban/mixed | urban road networks | centrality-based OLS/RR/Poisson traffic volume model | AADT in PCU | exposure is modelled output | road segment dual graph | yearly average daily traffic | random validation; spatial leakage risk | Stage 1a should report spatial holdouts and sensitivity to count-point sparsity | high | high | medium | medium | baseline-comparison-first | Gemini | Gemini 3.1 Pro | high | conditional | Verify centrality radius/compute feasibility before implementation | Duplicate source paper with LIT-017. |
| LIT-019 | paper-extraction-lord-2010-crash-frequency-review.md | Lord-Mannering_Review.pdf | The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives | Dominique Lord; Fred Mannering | 2010 | methodological review; crash-frequency modelling | mixed | road segments/intersections across reviewed studies | review of Poisson, NB, zero-inflated, random effects, GAM, ML and Bayesian models | crash frequency over roadway units | traffic flow/length/VMT discussed across studies | road segment/intersection/other | mixed | review; no empirical validation | Use as risk checklist: overdispersion, zero-heavy counts, exposure functional form, omitted variables, spatial/temporal correlation | high | high | high | high | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Review large tables if building full model-family comparison | Best general modelling limitations reference. |
| LIT-020 | paper-extraction-ma-2019-xgboost-fatality.md | analyzing-the-leading-causes-of-traffic-fatalities-using-1jznp146gl.pdf | Analyzing the Leading Causes of Traffic Fatalities Using XGBoost and Grid-Based Analysis: A City Management Perspective | Jun Ma; Yuexiong Ding; Jack C. P. Cheng; Yi Tan; Vincent J. L. Gan; Jingcheng Zhang | 2019 | conditional severity/fatality classifier | US; Los Angeles County | mixed urban/peri-urban | XGBoost binary classifier plus grid GIS | fatal vs non-fatal crash given a crash | no traffic exposure; fatality rate not exposure-adjusted | crash record and 60x60 grid | crash-time fields included; no panel | train/test on balanced crash data; no exposure validation | Separate conditional severity/fatality from exposure-adjusted frequency; watch leakage from crash-record features | medium | medium | medium | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Check unusual XGBoost learning-rate details if replicating | Do not compare fatality rate to Stage 2 risk percentile. |
| LIT-022 | paper-extraction-michalaki-2015-motorway-accident-severity-chatgpt.md | 1-s2.0-S0022437515000833-main.pdf | Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model | Paraskevi Michalaki; Mohammed A. Quddus; David Pitfield; Andrew Huetson | 2015 | motorway severity modelling | England | motorway | ordered logit; multilevel ordered logit; generalized ordered logit | accident severity conditional on crash | no formal exposure; time category proxy | accident record | accident-level with broad time categories | in-sample; no held-out validation | Frequency and severity are separate; post-event variables must not leak into Stage 2 predictors | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check STATS19 hard-shoulder/main-carriageway transfer | Duplicate source paper with LIT-023. |
| LIT-023 | paper-extraction-michalaki-2015-motorway-accident-severity.md | 1-s2.0-S0022437515000833-main.pdf | Exploring the factors affecting motorway accident severity in England using the generalised ordered logistic regression model | Paraskevi Michalaki; Mohammed A. Quddus; David Pitfield; Andrew Huetson | 2015 | motorway severity modelling | England | motorway hard shoulder/main carriageway | partially constrained generalized ordered logistic regression | accident severity | no explicit exposure; severity conditional on crash | location type/accident record | time-of-day/day/month categories | in-sample | HGV/off-peak/hard-shoulder diagnostics belong in severity work, not current frequency model | medium | high | high | medium | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Check STATS20/STATS19 hard-shoulder encoding before diagnostics | Warns against using number of vehicles/casualties as prospective features. |
| LIT-024 | paper-extraction-pan-2013-motorway-negative-binomial.md | Modelling Motorway Accidents using Negative Binomial Regression.pdf | Modelling Motorway Accidents using Negative Binomial Regression | Pan Chengye; Prakash Ranjitkar | 2013 | motorway SPF; NB regression | New Zealand; Auckland | motorway rural/urban | Poisson/NB/ZINB/GEE tested; NB selected | annual accident frequency per segment-year | observed AADT per lane plus length as free log regressors | homogeneous motorway segment-year; ramp segmentation | yearly | temporal holdout plus in-sample metrics | Facility context, ramp proximity, temporal holdout and geometry sanity checks are useful | medium | high | high | medium | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | conditional | Check table values and equations before formal citation | Duplicate source family with LIT-008/LIT-009. |
| LIT-025 | paper-extraction-pan-2017-deep-belief-network-global-spf.md.md | 1-s2_0-S2046043017300199-main.pdf | Development of a global road safety performance function using deep neural networks | Guangyuan Pan; Liping Fu; Lalita Thakali | 2017 | global SPF; neural model benchmark | Canada/US; multiple regions | mixed highway types | Deep Belief Network; NB benchmarks; Bayesian regularised ANN | annual crash frequency per homogeneous section-year | observed AADT and length as DBN features; NB uses log exposure variants | homogeneous road section | annual segment-year | train/test style performance; metrics mainly MAE/RMSE | DBN with MSE is not suitable for sparse count production; use NB/log-offset comparisons and minimum-length diagnostics | medium | medium | high | medium | baseline-comparison-first | Claude | Claude Sonnet 4.6 | medium | conditional | DBN technical details and crash scope need checking | Has useful negative evidence against neural MSE production changes. |
| LIT-026 | paper-extraction-poch-1996-intersection-negative-binomial.md | Negative_Binomial_Analysis_of_Intersection-Acciden.pdf | Negative Binomial Analysis of Intersection-Accident Frequencies | Mark Poch; Fred Mannering | 1996 | intersection SPF; NB regression | US | urban/suburban intersections | Negative binomial regression | annual accident frequency on intersection approach | turning and intersection traffic volumes as covariates; no formal offset | intersection approach | yearly | no external/held-out validation stated | Junction approach mechanisms are structurally different from link risk; use junction diagnostics/proxies | medium | high | high | medium | pilot-first | ChatGPT | GPT-5.5 Thinking | high | yes | OCR artefacts; check tables before formal literature table | Strong warning about junction under-representation. |
| LIT-027 | paper-extraction-quddus-2010-m25-severity-ordered-response.md | road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf | Road Traffic Congestion and Crash Severity: Econometric Analysis Using Ordered Response Models | Mohammed A. Quddus; Chao Wang; Stephen G. Ison | circa 2010 | motorway severity model; ordered response | UK; M25 | motorway | OLOGIT/HCM/GOLOGIT/PC-GOLOGIT | ordinal crash severity given crash | no exposure; 15-minute flow as severity predictor with 30-min lag | individual crash matched to motorway segment | crash-level 15-minute traffic lag | in-sample ordered response metrics | Use 30-minute pre-crash lag if future WebTRIS crash-level work; separate frequency vs severity | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | yes | Check dense Table 2 and whether junction crashes excluded | Supports severity/frequency separation and cautious congestion claims. |
| LIT-028 | paper-extraction-roll-2026-oregon-pedestrian-spf.md | dot_89189_DS1.pdf | Developing a Pedestrian Safety Performance Function for Oregon | Josh Roll; Jason Anderson; Nathan McNeil | 2026 | pedestrian/intersection SPF; exposure estimation | US; Oregon | urban intersections | Poisson/NB SPFs; random forest exposure data fusion | pedestrian injury crashes per intersection-year | vehicle AADT and estimated pedestrian AADPT; both partly estimated | urban intersection | annual | 10-fold CV for exposure model; SPF tables partly unreadable | Exposure-only vs full-feature baseline comparisons and CURE plots could inform Stage 2 diagnostics | low | medium | medium | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | medium | yes | Report tables not fully machine-readable; check SPF forms and AADPT metrics | Scope is pedestrian intersections, not link-level all-injury risk. |
| LIT-029 | paper-extraction-wang-2009-m25-congestion-safety.md | Wang_et_al_AAP_Final_submitted1.pdf | Impact of Traffic Congestion on Road Safety: A Spatial Analysis of the M25 Motorway in England | Chao Wang; Mohammed A. Quddus; Stephen G. Ison | circa 2009 | motorway SPF; congestion and spatial model | UK; M25 | motorway | Poisson-lognormal, NB, CAR spatial variants | accident count per motorway segment | observed UKHA AADT and segment length as free log covariates; no offset | junction-to-junction motorway segment; junction crashes excluded | annual aggregate | in-sample Bayesian/model comparison | Motorway AADT elasticity and grade/congestion diagnostics; bearing can improve snapping QA | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Publication year not in document; DIC differences small | Companion to Quddus severity paper for congestion null result. |
| LIT-030 | paper-extraction-wang-2015-investigating-safety-impacts-suburban-arterials.md | 1805.06381v3.pdf | Investigating Safety Impacts of Roadway Network Features of Suburban Arterials in Shanghai, China | Xuesong Wang; Jinghui Yuan; Grant G. Schultz; Wenjing Meng | 2015 | zonal spatial crash model | China; Shanghai | suburban arterials | Bayesian Poisson-lognormal CAR | total crash frequency on arterials within TAZ | trip productions/attractions and arterial length as exposure proxies; no AADT | Traffic Analysis Zone | yearly | in-sample R2; no true held-out validation | Junction/signal/access-density proxies may matter, but zonal unit is low transferability | low | medium | medium | low | diagnostic-only | Gemini | Gemini 3.1 Pro | high | conditional | Betweenness computed within TAZ, not global; in-sample R2 only | Use as junction/network-complexity prompt, not model benchmark. |
| LIT-031 | paper-extraction-ziakopoulos-2020-spatial-approaches-road-safety.md | A review of spatial approaches in road safety.pdf | A review of spatial approaches in road safety | Apostolos Ziakopoulos; George Yannis | not stated in visible metadata | spatial road-safety review | mixed | mixed | review of spatial/spatio-temporal methods | crash counts/rates/severity/hotspots across reviewed studies | mixed exposure definitions | mixed units: links, intersections, grids, zones, corridors | mixed | review; primary studies need checking for exact claims | Supports spatial validation, MAUP sensitivity, proximity/junction diagnostics and caution about production spatial models | high | high | high | high | diagnostic-only | ChatGPT | GPT-5.5 Thinking | high | yes | Year/DOI missing; check primary papers for exact numerical claims | Broad review; do not use alone to justify a production model swap. |
| LIT-032 | paper-extraction-pew-2020-zero-inflated-crash.md | Justification_for_considering_zero-inflated_models_in_crash_frequency_analysis.pdf | Justification for considering zero-inflated models in crash frequency analysis | Timo Pew; Richard L. Warr; Grant G. Schultz; Matthew Heaton | 2020 | zero-inflated model comparison; Bayesian hierarchical count modelling | US; Utah | signalised intersections statewide (urban and rural) | Bayesian hierarchical ZIP; ZINB; NB-Lindley; MCMC via JAGS | annual injury and fatal crash count per intersection | entering vehicles per day as standardised covariate; no formal offset | signalised intersection | annual (2014–2017 fitting; 2018 held out) | temporal holdout (2018); Bayesian chi-squared goodness-of-fit; posterior predictive zero check; WAIC | ZINB improvement over Poisson driven mainly by overdispersion parameter (π ≈ 0); NB GLM with offset is the priority diagnostic step, not full zero-inflation | high | high | high | high | baseline-comparison-first | Claude | Claude Sonnet 4.6 | high | conditional | Verify Table A1 π ≈ 0 finding in original PDF before citing; check prior sensitivity on Beta(0.15,1) | Critical nuance: π posterior mean ≈ 0 in both ZIP and ZINB; improvement over Poisson is from ϕ dispersion, not zero-inflation. Intersection unit not link; counts are much higher than Open Road Risk link-years. No exposure offset, so it does not challenge Open Road Risk offset design. |
| LIT-033 | paper-extraction-mahoney-2023-spatial-cv.md | ASSESSING_THE_PERFORMANCE_OF_SPATIAL_CROSS-VALIDATION.pdf | Assessing the Performance of Spatial Cross-Validation Approaches for Models of Spatially Structured Data | Michael J Mahoney; Lucas K Johnson; Julia Silge; Hannah Frick; Max Kuhn; Colin M Beier | 2023 | spatial CV methodology; simulation study | simulation (no specific geography) | not road-specific | random forest on simulated spatially structured continuous outcome; five CV method comparisons | simulated continuous outcome (not a crash count) | not applicable | regular 50×50 grid cells | not applicable | cross-landscape prediction as external reference; 100 simulated landscapes | V-fold CV is severely optimistic for spatially autocorrelated data; spatial clustering CV with exclusion buffer ≈ autocorrelation range is the most practical improvement; current grouped-link split does not enforce spatial separation | high | high | medium | high | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Specific buffer sizes (25–41% of grid length) are simulation-specific and do not transfer directly to road network; must estimate autocorrelation range from Stage 2 residuals first | Not a road safety paper. Simulation uses continuous Gaussian outcome; zero-heavy count generalisation assumed but not tested. BLO3 performs poorly despite large buffers, so do not assume larger buffer always helps. Regular grid assumption does not match OS Open Roads geometry. |
| LIT-034 | paper-extraction-gao-2024-stzitd-gnn.md | Uncertainty-Aware_Probabilistic_Graph_Neural_Networks_for_Road-Level.pdf | Uncertainty-Aware Probabilistic Graph Neural Networks for Road-Level Traffic Crash Prediction | Xiaowei Gao; Xinke Jiang; Dingyi Zhuang; Huanfa Chen; Shenhao Wang; Stephen Law; James Haworth | 2024 | probabilistic GNN; zero-inflated Tweedie; road-level crash prediction | UK; London (Lambeth; Tower Hamlets; Westminster) | urban road segments | GRU temporal encoder + GAT spatial encoder + ZITD decoder (STZITD-GNN); baselines include STGCN; STZINB-GNN; STTD-GNN | daily severity-weighted crash risk score per road (y = sum of collision count × severity weight 1/2/3) | no exposure; no offset; no traffic volume data | urban road segment (OS-style link; ~4,700–5,700 nodes per borough) | daily; 2019 only; 8:2:2 within-year temporal split | within-year temporal holdout (no spatial holdout; no cross-year test) | AccHR@k metric is directly applicable to Open Road Risk risk percentile ranking; MPIW/PICP for future probabilistic outputs; Gaussian distributional assumption is clearly worst | medium | high | high | medium | diagnostic-only | Claude | Claude Sonnet 4.6 | high | conditional | Verify Table 4 values against original; check whether 8:2:2 split is chronological or random (not stated); GitHub repo may be private | No exposure offset, so it cannot distinguish high-risk from high-traffic roads; major methodological gap relative to Open Road Risk. Severity-weighted composite response variable not directly comparable to raw injury count. Daily urban scale vs annual national scale: zero-inflation mechanisms differ. GNN architecture not feasible at 2.17M links. Validation: same roads in train and test; single year; no spatial holdout; weaker than current Open Road Risk grouped split. |
| LIT-035 | paper-extraction-balawi-tenekeci-2024-arima-sarimax-london-aroads.md | Time_series_traffic_collision_analysis_of_London_hotspots__Patterns.pdf | Time series traffic collision analysis of London hotspots: Patterns, predictions and prevention strategies | Mohammad Balawi; Goktug Tenekeci | 2024 | ARIMA; SARIMAX; corridor-level time series | UK; London (A1; A3; A4; A6 corridors) | major A-road corridors (aggregate) | ARIMA(5,4,7); SARIMAX(4,1,2)×(4,1,2,8) on daily corridor-level aggregate time series | daily count of vehicles involved in accidents (not accident count; wrong response variable) | no exposure; no AADT; corridor-level aggregate only | four A-road corridors treated as a single aggregate time series | daily; 2016–2019; December 2019 holdout only | single-month temporal holdout (Christmas period); AIC/BIC in-sample | Post-event STATS19 attributes (severity; light condition; road surface) must not enter Stage 2 as features; this paper inadvertently illustrates why | low | low | low | low | no | Claude | Claude Sonnet 4.6 | high (high confidence in the identified problems) | no | No secondary review recommended; do not use as evidence for pipeline decisions | CRITICAL: wrong response variable (vehicles involved, not accident count); SARIMAX predicts negative counts (model specification error); R-squared values in Table 3 implausibly high and methodology opaque; ARIMA d=4 order from misconfigured grid search (excluded d=0,1,2); log-likelihood sign inconsistency between ARIMA and SARIMAX tables; 80-20 split described but only 30-day Christmas holdout reported. Published in Heliyon (broad open-access). Do not cite as methodological support for any decision. Retained for completeness of literature search only. |
LIT-036 paper-extraction-huda-2024-network-screening-low-volume-roads.md dot_78279_DS1.pdf Network Screening on Low-Volume Roads Using Risk Factors Kazi Tahsin Huda; Ahmed Al-Kaisy 2024 low-volume road network screening US; Oregon rural low-volume two-lane paved roads HSM EB expected crashes; CART thresholds; OLS log-linear screening equations EB expected crashes per 0.05-mile section; crash density for ranking AADT in HSM SPF and one proposed model; no-volume alternative model; no exposure offset fixed 0.05-mile roadway sections; intersections excluded annual expected crashes from 2004-2013 crash data random 80/20 split against EB expected-crash target; no spatial/temporal holdout Confirms Huda/Al-Kaisy as diagnostic support for low-volume, curvature/grade, and volume/no-volume sensitivity checks; flags EB-target R2 caveat medium high high medium pilot-first ChatGPT not stated high conditional Check curvature CART inconsistency and grade treatment before using thresholds Duplicate source paper with LIT-016. Stronger caveat that adjusted R2 predicts a smooth EB target, not raw future crashes.
LIT-037 paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md 1-s2.0-S2046043017300199-main.pdf Development of a global road safety performance function using deep neural networks Guangyuan Pan; Liping Fu; Lalita Thakali 2017 global SPF; DBN/ML benchmark Canada and US mixed highway segments Deep Belief Network with NB benchmarks and pooled/local model comparisons annual crash/collision frequency per homogeneous segment-year AADT and length as DBN inputs or NB exposure-like covariates; no clearly fixed offset homogeneous highway segments; coarse compared with OS Open Roads links annual segment-year; temporal holdouts for Ontario/Colorado temporally held-out MAE/RMSE for Ontario/Colorado; Washington split not fully stated; no spatial holdout Supports temporal holdout, local-vs-global/facility-family comparisons, and short-segment sensitivity; not production DBN medium medium high medium baseline-comparison-first ChatGPT GPT-5.5 Thinking high conditional Check Washington years/split and DBN normalization if reproducing Duplicate source paper with LIT-025; reinforces that DBN should be benchmark-only without ranking/spatial validation.
LIT-038 paper-extraction-poch-mannering-1996-nb-intersection.md Negative_Binomial_Analysis_of_Intersection-Acciden.pdf Negative Binomial Analysis of Intersection-Accident Frequencies Mark Poch; Fred Mannering 1996 intersection approach SPF; NB regression US; Bellevue, Washington urban/suburban intersections Negative binomial regression by intersection approach and crash type annual accident frequency per intersection approach approach turning/opposing/intersection traffic volumes as covariates; no offset intersection approach annual; 1987-1993 in-sample rho-squared and likelihood tests; no held-out validation Stronger second extraction for overdispersion and junction-approach mechanisms; confirms in-sample-only limitations medium high high medium pilot-first Claude Claude Sonnet 4.6 high conditional Check Table 1 coefficients and likelihood-ratio test values before citing Duplicate source paper with LIT-026; improves confidence despite old validation standards.
LIT-039 paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf Road Traffic Congestion and Crash Severity: An Econometric Analysis Using Ordered Response Models Mohammed A. Quddus; Chao Wang; Stephen G. Ison 2010 / manuscript year unclear motorway severity model; ordered response UK; M25 motorway OLOGIT/HCM/GOLOGIT/PC-GOLOGIT ordered crash severity conditional on crash 15-minute traffic flow and congestion matched with 30-minute lag; no exposure offset because target is severity conditional on crash crash records assigned to 72 motorway segments crash-level; 2003-2006; 15-minute traffic state lag in-sample ordered-response fit and marginal effects; no held-out validation Confirms severity/frequency separation, lagged traffic-state design, and conditional interpretation caveat medium high high medium diagnostic-only ChatGPT GPT-5.5 Thinking high conditional Check published ASCE citation year and Tables 2-3 against final version Duplicate source paper with LIT-027; clearer on in-sample metrics and conditional severity target.
LIT-040 paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md dot_89189_DS1.pdf Developing a Pedestrian Safety Performance Function for Oregon Josh Roll; Jason Anderson; Nathan McNeil 2026 pedestrian/intersection SPF; exposure estimation US; Oregon urban intersections Poisson/NB SPFs; pedestrian-volume data fusion; random forest/XGBoost/NN exposure models pedestrian crash frequency at intersections vehicle AADT and estimated pedestrian AADPT as explanatory exposure variables; no explicit offset urban intersection; contracted complex nodes annual average exposure; crash outcome years not fully stated in SPF sections exposure model 10-fold CV; SPF validation details/table extraction require care Supports junction/intersection future work, exposure-only vs proxy comparisons, and vulnerable-user exposure caveats low medium medium medium diagnostic-only ChatGPT GPT-5.5 Thinking high yes Long report; check SPF equations, crash-year window, and AADPT/AADT metrics before citing Duplicate source paper with LIT-028; broader report extraction confirms report-table review still needed.
LIT-041 paper-extraction-ziakopoulos-yannis-2020-spatial-review.md A_review_of_spatial_approaches_in_road_safety.pdf A review of spatial approaches in road safety Apostolos Ziakopoulos; George Yannis not explicitly stated; circa 2020 spatial road-safety review international review mixed review of spatial units, spatial models, MAUP, proximity, network KDE, VRU approaches mixed crash counts/rates/severity/hotspots across reviewed studies mixed: AADT, VMT/VDT, trips, road length, population; not offset-specific mixed: links, intersections, grids, zones, regions, network lixels mixed across reviewed studies review-level synthesis; no single validation protocol Second extraction reinforces spatial-unit, MAUP, junction-segment, and network-KDE cautions; exact primary-study values need source checks high high high high diagnostic-only Claude Claude Sonnet 4.6 high conditional Check primary papers before using numerical claims from review tables Duplicate source paper with LIT-031; improves confidence for high-level caution but not production model choice.
LIT-042 paper-extraction-huda-2024-COMBINED.md dot_78279_DS1.pdf Network Screening on Low-Volume Roads Using Risk Factors Kazi Tahsin Huda; Ahmed Al-Kaisy 2024 combined reconciliation record; low-volume road network screening US; Oregon rural low-volume two-lane paved roads HSM EB expected crashes; CART thresholds; OLS log-linear screening equations EB-smoothed expected crashes per 0.05-mile section; crash density for ranking AADT in HSM SPF/EB target and one proposed model; deliberate no-volume comparator; no exposure offset fixed 0.05-mile roadway sections; intersections excluded annual expected crashes from 2004-2013 crash data random 80/20 split against EB expected-crash target; no spatial/temporal holdout Canonical record clarifies EB target, no-offset structure, volume/no-volume scope, and curvature/grade caveats for low-volume diagnostics medium high high high pilot-first ChatGPT GPT-5.5 Thinking high conditional Curvature CART sharp-group value is internally inconsistent; grade should not be cited as final-model predictor without caution Combined record from original PDF plus LIT-016 and LIT-036; use this row for future Huda/Al-Kaisy citations.
LIT-043 paper-extraction-jayasinghe-2019-COMBINED.md 1-s2.0-S2215016119301128-main.pdf A novel approach to model traffic on road segments of large-scale urban road networks Amila Jayasinghe; Kazushi Sano; C. Chethika Abenayake; P.K.S. Mahanama 2019 combined reconciliation record; AADT estimation / traffic-volume modelling Sri Lanka; Cambodia; Vietnam; Pakistan; Tanzania urban road networks centrality-based traffic-volume model using betweenness, closeness, and path-distance weighting AADT / PCU per road segment AADT is the modelled target; observed counts used for calibration/validation; no collision exposure offset road segment in dual-graph road network cross-sectional annual AADT base year by city random 80/20 validation plus calibration-sample learning curve; no spatial holdout Canonical record supports Stage 1a centrality diagnostics, learning curves, AADT-band errors, and warnings about random spatial leakage high high medium high baseline-comparison-first ChatGPT GPT-5.5 Thinking high conditional Final selected regression type is implied but not fully documented across OLS/robust/Poisson alternatives Combined record from original PDF plus LIT-017 and LIT-018; traffic-exposure paper, not Stage 2 collision-risk evidence.
LIT-044 paper-extraction-poch-mannering-1996-COMBINED.md Negative_Binomial_Analysis_of_Intersection-Acciden.pdf Negative Binomial Analysis of Intersection-Accident Frequencies Mark Poch; Fred Mannering 1996 combined reconciliation record; intersection approach SPF US; Bellevue, Washington urban/suburban intersections Negative binomial regression for total and accident-type approach counts annual accident frequency per intersection approach approach and turning traffic volumes as covariates; no formal offset intersection approach annual observations from 1987-1993, excluding improvement year in-sample likelihood/rho-squared diagnostics; no held-out validation Canonical record confirms junction/approach mechanisms and NB-over-Poisson relevance while warning against link-level coefficient transfer medium high high medium pilot-first ChatGPT GPT-5.5 Thinking high conditional Exact accident-type table values should still be checked before formal publication because OCR is imperfect Combined record from original PDF plus LIT-026 and LIT-038; use this for junction/intersection evidence.
LIT-045 paper-extraction-roll-2026-oregon-COMBINED.md dot_89189_DS1.pdf Developing a Pedestrian Safety Performance Function for Oregon Josh Roll; Jason Anderson; Nathan McNeil 2026 combined reconciliation record; pedestrian/intersection SPF and exposure data fusion US; Oregon urban intersections Poisson/NB pedestrian SPFs; random-forest AADT/AADPT data fusion; CURE-style diagnostics pedestrian crash frequency at intersections vehicle AADT and estimated pedestrian AADPT as explanatory exposure variables; no explicit offset urban intersection with contraction of complex nodes annual average exposure; final SPF crash period not fully stated AADPT model 10-fold CV; final crash SPF diagnostics mainly in-sample; no clear held-out SPF validation Canonical record supports exposure-only baselines, CURE diagnostics, Stage 1a distribution checks, and separate future junction/pedestrian layer medium high high medium diagnostic-only ChatGPT GPT-5.5 Thinking medium-high conditional Check appendices only if exact SPF coefficients, exposure-only comparisons, or crash-assignment rules are needed Combined record from original PDF plus LIT-028 and LIT-040; use this for Roll/Oregon pedestrian SPF citations.
LIT-046 paper-extraction-quddus-wang-ison-COMBINED.md road-traffic-congestion-and-crash-severity-econometric-2rrbyxf6f0.pdf Road Traffic Congestion and Crash Severity: An Econometric Analysis Using Ordered Response Models Mohammed A. Quddus; Chao Wang; Stephen G. Ison not clearly stated; circa 2010 combined reconciliation record; motorway conditional severity model UK; M25 motorway ordered logit; heteroskedastic choice model; generalized ordered logit; partially constrained generalized ordered logit ordered crash severity conditional on crash occurrence no exposure offset; 15-minute traffic flow/congestion assigned to crash records using 30-minute pre-crash lag individual crash record matched to 72 motorway segments crash-level records from 2003-2006 with 15-minute traffic state lag in-sample ordered-response model fit and marginal effects; no held-out validation Canonical record clarifies conditional severity scope, pre-crash traffic-state matching, no-frequency interpretation, and post-event leakage cautions medium high high medium diagnostic-only ChatGPT GPT-5.5 Thinking high conditional Published version year/bibliographic details and exact Tables 2-3 should be checked before formal numeric reporting Combined record from original PDF plus LIT-027 and LIT-039; use this for Quddus/Wang/Ison severity citations.
LIT-047 paper-extraction-ziakopoulos-yannis-2020-COMBINED.md A review of spatial approaches in road safety.pdf A review of spatial approaches in road safety Apostolos Ziakopoulos; George Yannis not explicitly stated; circa 2020 combined reconciliation record; spatial road-safety review international review mixed review of spatial units, MAUP, spatial dependence, proximity structures, network KDE, GWR/CAR/SAR and spatio-temporal methods mixed crash counts, rates, severity outcomes, hotspot classifications, and spatial crash distributions mixed: AADT, VMT/VDT, road length, population, trips, and vulnerable-road-user exposure variables; no single offset structure mixed units: segments, intersections, corridors, grids, zones, regions, and network lixels mixed across reviewed studies review-level synthesis; primary studies need checking for exact method/validation claims Canonical record supports spatial-unit documentation, MAUP/hotspot sensitivity notes, spatial residual diagnostics, and caution against production spatial models from review evidence alone high high high high diagnostic-only ChatGPT GPT-5.5 Thinking high conditional Check original cited papers before using exact study-level model specifications, validation methods, or numerical claims Combined record from original PDF plus LIT-031 and LIT-041; use this for spatial-methods review citations.

Thematic evidence matrix

Crash-frequency and count modelling

| paper | method | what it supports | what it does not support | relevance to current Stage 2 | actionability |
| --- | --- | --- | --- | --- | --- |
| Aguero-Valverde & Jovanis 2008; Claude and Gemini extractions | Poisson/NB/Poisson-lognormal spatial crash-frequency models | Count modelling with exposure, overdispersion, and spatial residual diagnostics | Direct national-scale CAR production model | High: Stage 2 is a count/ranking model with exposure | Run diagnostics for AADT elasticity, residual spatial autocorrelation, and spatial uncertainty notes. |
| Lord & Mannering 2010 | broad crash-frequency methodological review | Conservative framing around overdispersion, zero-heavy outcomes, omitted variables, exposure functional form | Any single best model family | High: maps directly to Stage 2 risks | Add modelling limitations and baseline comparison tables. |
| Chengye & Ranjitkar 2013; three extractions | NB motorway segment models with temporal holdout | Temporal holdout, ramp/facility-family diagnostics, motorway-specific geometry checks | Direct replacement of link-level model or uncritical coefficient transfer | Medium: motorway subset only | Add temporal holdout and ramp/slip-road diagnostic. |
| Gilardi et al. 2022; three extractions | Bayesian Poisson network lattice with spatial/severity effects | OS-segment count modelling, log-offset structure, balanced accuracy diagnostics | External validation of Open Road Risk or national-scale INLA production | High: closest UK link-network literature | Add documentation and balanced accuracy diagnostic, not production spatial model. |
| Al-Omari 2021 | NB SPFs by context class with EB screening | Context/facility stratification and urban exposure elasticity diagnostics | Direct coefficient transfer from Florida thesis | Medium | Baseline comparison of global vs road-family/context split models. |
| Hauer et al. 2001 | EB tutorial using SPF prior plus observed counts | EB shrinkage, regression-to-mean warning, overdispersion role | A specific predictive model for Open Road Risk | High for EB diagnostic layer | Audit EB formula and document approximation. |
| Pan et al. 2017 | DBN vs NB global SPF | NB benchmark and minimum segment-length sensitivity | DBN/MSE as production model for sparse injury counts | Medium | Use as baseline-comparison and methods-to-avoid evidence. |
| Pew et al. 2020 | Bayesian ZIP; ZINB; NB-Lindley on Utah intersection panel | Methodological justification for ZINB as candidate; posterior predictive zero check; NB GLM as priority diagnostic step | Full Bayesian MCMC at 2.17M links; intersection-unit coefficients; no exposure offset | High: π ≈ 0 finding means NB GLM with offset is the right first step, not full zero-inflation | Fit NB GLM candidate; run posterior predictive zero check on current Poisson GLM. |
| Gao et al. 2024 | STZITD-GNN (GRU + GAT + zero-inflated Tweedie) on London urban road-day data | AccHR@k ranking metric; MPIW/PICP uncertainty metrics (future); Tweedie GLM as intermediate candidate | Full GNN at national scale; no exposure offset; daily urban resolution; severity-weighted composite not raw count | Medium: AccHR@k metric is immediately applicable; architecture does not transfer | Implement AccHR@k as validation metric for Stage 2 risk percentile output. |
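
The zero check recommended in the Pew et al. row can be approximated without a Bayesian fit. The sketch below is a plug-in analogue, not the posterior predictive check used in the paper: it compares the observed share of zero counts with the share implied by fitted means under a Poisson model and under an NB2 model (variance `mu + mu**2 / phi`). The fitted means, the counts, and the dispersion value `phi=0.5` are hypothetical illustration values, not Open Road Risk outputs.

```python
import math


def expected_zero_share_poisson(mu):
    """Average P(Y = 0) under a Poisson model with fitted means mu."""
    return sum(math.exp(-m) for m in mu) / len(mu)


def expected_zero_share_nb(mu, phi):
    """Average P(Y = 0) under NB2 with Var(Y) = mu + mu**2 / phi."""
    return sum((phi / (phi + m)) ** phi for m in mu) / len(mu)


def observed_zero_share(counts):
    """Share of observations with a zero count."""
    return sum(1 for y in counts if y == 0) / len(counts)


# Hypothetical fitted link-year means and observed counts (illustration only).
fitted_mu = [0.05, 0.10, 0.20, 0.40, 0.80, 1.60]
observed = [0, 0, 0, 0, 2, 3]

poisson_zero = expected_zero_share_poisson(fitted_mu)
nb_zero = expected_zero_share_nb(fitted_mu, phi=0.5)  # phi is an assumed value
obs_zero = observed_zero_share(observed)

print(f"observed zeros: {obs_zero:.3f}")
print(f"Poisson-implied zeros: {poisson_zero:.3f}")
print(f"NB-implied zeros: {nb_zero:.3f}")
```

If the Poisson-implied share falls well below the observed share while the NB-implied share is close to it, that points to overdispersion rather than a separate zero-inflation process, which is the distinction the π ≈ 0 finding turns on.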

Exposure and traffic-volume handling

| paper | exposure treatment | transferable part | non-transferable part | implication for AADF/WebTRIS | actionability |
| --- | --- | --- | --- | --- | --- |
| Gilardi et al. 2022 | offset = segment length × estimated commuter flow | Same mathematical log-offset family on UK OS segments | Census commuter flow is weaker than AADF/AADT | Supports documenting Open Road Risk’s AADT × length offset as literature-aligned | Documentation note; no production change. |
| Hauer et al. 2001 | ADT/AADT in SPF; length and years scale expected count | Year-specific exposure and EB weighting logic | Tutorial examples not full pipeline | Supports using year-specific AADT in EB diagnostic | Audit/upgrade EB diagnostic. |
| Aguero-Valverde & Jovanis 2008 | AADT free coefficient; length offset | Test whether AADT elasticity differs from 1.0 | Rural US scope and intersection exclusion | Run diagnostic freeing AADT coefficient from fixed VMT offset | Diagnostic only. |
| Wang et al. 2009 | AADT and length as free covariates, not offset | Motorway-specific AADT elasticity check | No sparse AADF estimation and long segments | Motorway AADT coefficient may differ by road class | Motorway-only diagnostic. |
| Jayasinghe et al. 2019 | AADT is target, estimated from centrality and sparse counts | Stage 1a centrality features, learning curves, sparse-count sensitivity | Not a collision-risk paper; random validation likely leaks spatially | Stage 1a should report spatial holdout and count-sparsity sensitivity | Baseline comparison/diagnostic. |
| Roll et al. 2026 | data-fusion vehicle/pedestrian exposure | Compare exposure-only vs full-feature baselines; CURE plots | Pedestrian/intersection scope; commercial/US data tiers | Stage 1a analogy is conceptual only | Documentation and diagnostic baseline. |
| Huda & Al-Kaisy 2024 | AADT covariate dropped in one low-volume model | Low-volume geometry/AADT-sensitivity diagnostic | LVR-specific and EB-output response | Test whether low-AADT links are dominated by geometry vs exposure uncertainty | Pilot-first. |
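
The difference between a fixed exposure offset and the free-coefficient diagnostic in the table above can be written out explicitly. The notation is an assumption introduced here for illustration: μ_i is the expected collision count for link i, L_i its length, x_i the remaining covariates, and any constant scaling of exposure (days, years, unit conversions) is absorbed into β_0.

```latex
\begin{align}
\text{fixed offset:}\quad
  \log \mu_i &= \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}
               + \log(\mathrm{AADT}_i \times L_i) \\
\text{free coefficients:}\quad
  \log \mu_i &= \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}
               + \alpha \log \mathrm{AADT}_i + \gamma \log L_i
\end{align}
```

The fixed offset is the special case α = γ = 1. The Aguero-Valverde & Jovanis treatment (free AADT coefficient, length offset) corresponds to γ = 1 with α estimated, so testing α against 1.0 is the elasticity diagnostic listed in that row.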

Spatial and network methods

| paper | spatial unit / network concept | key spatial issue | relevance to OS Open Roads links | actionability |
| --- | --- | --- | --- | --- |
| Gilardi et al. 2022 | OS road segment lattice and shared-boundary adjacency | spatial autocorrelation and MAUP/segment contraction | High; closest OS-network analogue | Document support, add MAUP pilot and adjacency residual diagnostics. |
| Aguero-Valverde & Jovanis 2008 | road segments with CAR neighbourhoods | unobserved spatial correlation biases coefficients/precision | High as diagnostic concept; lower as production model | Moran’s I and residual corridor mapping. |
| Ziakopoulos & Yannis 2020 | review across links, intersections, zones, corridors | spatial-unit sensitivity, boundary effects, proximity weights | High as cautionary framework | Spatial validation section and segmentation sensitivity pilot. |
| Baddeley et al. 2021 | continuous network point process | segment aggregation and planar KDE can mislead | Conceptually high, production low | Avoid ordinary 2D KDE; small point-process diagnostic only. |
| Cronie et al. 2019 | linear-network point-process diagnostics | point clustering after intensity adjustment | Medium for snapped-collision diagnostics | Small pilot on one urban area; not Stage 2 replacement. |
| Wang et al. 2015 | TAZ-level CAR arterial model | MAUP and zonal aggregation | Low direct transfer | Junction/signal density ideas only. |
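
The Moran's I residual diagnostic named in the table above can be sketched with binary adjacency weights. This is a minimal illustration, not the project's implementation: the `neighbours` mapping (link index to adjacent link indices, listed in both directions) and the toy residuals are hypothetical, and a real run would build adjacency from shared nodes in the road network.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    values: one residual per link.
    neighbours: dict mapping link index -> iterable of adjacent link indices.
                Each adjacency should appear in both directions.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0       # sum of w_ij * dev_i * dev_j over neighbouring pairs
    weight_sum = 0    # sum of all weights w_ij
    for i, nbrs in neighbours.items():
        for j in nbrs:
            cross += dev[i] * dev[j]
            weight_sum += 1

    denom = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / denom)


# Toy example: four links in a chain 0-1-2-3 (hypothetical residuals).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)
print(clustered, alternating)
```

Values above zero indicate that neighbouring links share similar residuals; values below zero indicate alternation. A significance test (for example by permutation) would be needed before drawing conclusions and is omitted here.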

Junctions, intersections, and conflict structure

| paper | junction/intersection mechanism | required data | transferability | current repo implication | actionability |
| --- | --- | --- | --- | --- | --- |
| Poch & Mannering 1996 | intersection approach-level traffic, turning, signal, geometry variables | turning volumes, approach geometry, signal/control data | Medium conceptually; low direct data coverage | Pure link model under-represents junction mechanics | Add junction-adjacent residual diagnostic and proxy feature pilot. |
| Roll et al. 2026 | urban intersection SPF by type/control/crossing | intersection inventory, pedestrian exposure, crossing/control data | Low direct transfer | Highlights missing junction-specific model class | Documentation/future work; CURE diagnostics transferable. |
| Al-Omari 2021 | access-point and signalized-intersection density as segment features | junction/access density from inventory | Medium if derived from OS/OSM topology | Candidate junction density per link/corridor | Diagnostic before feature inclusion. |
| Wang et al. 2015 | signal spacing/access density at TAZ level | signals/accesses and zonal network features | Low to medium | Possible missing urban conflict proxies | Low-priority diagnostic. |
| Aguero-Valverde & Jovanis 2008 | intersections/ramp crashes excluded in one extraction | junction exclusion flag/sensitivity | Medium as scope caveat | Current STATS19-to-link snapping includes junction-proximate crashes | Document and test near-junction sensitivity. |
| Hauer et al. 2001 | intersections treated as separate EB entity type | intersection entity definition and SPF | High conceptually for future junction module | Link and junction EB weights differ | Future junction-level methodology note. |
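
The note that link and junction EB weights differ follows from how the empirical Bayes weight depends on the overdispersion parameter. The sketch below uses the standard simplified form, weight = 1 / (1 + expected / phi), where `expected` is the SPF-predicted count over the same period as the observed count and `phi` is the NB dispersion parameter. It is not the repo's EB formula, it omits the length scaling of `phi` that applies to road segments, and all numbers are hypothetical.

```python
def eb_estimate(observed, expected, phi):
    """Simplified empirical Bayes estimate of the expected collision count.

    observed: collisions recorded on the entity over the study period.
    expected: SPF-predicted collisions over the same period.
    phi:      NB dispersion parameter (Var = mu + mu**2 / phi); assumed known.
    Returns (eb_estimate, weight_on_spf).
    """
    weight = 1.0 / (1.0 + expected / phi)
    estimate = weight * expected + (1.0 - weight) * observed
    return estimate, weight


# Same observed and predicted counts, different (hypothetical) dispersion values.
link_eb, link_w = eb_estimate(observed=6, expected=2.0, phi=2.0)
junction_eb, junction_w = eb_estimate(observed=6, expected=2.0, phi=0.5)

print(f"link:     weight={link_w:.2f}, EB={link_eb:.2f}")
print(f"junction: weight={junction_w:.2f}, EB={junction_eb:.2f}")
```

A smaller `phi` (more overdispersion) puts less weight on the SPF prediction and more on the observed count, so two entity types with different dispersion shrink by different amounts even when their counts match.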

Severity modelling

| paper | severity target | model type | useful idea | leakage risk | current/future relevance |
| --- | --- | --- | --- | --- | --- |
| Boulieri et al. 2016 | slight vs severe/fatal counts | multivariate Bayesian Poisson at ward-year | Severity strata can have distinct spatial patterns | Low if kept as aggregate target; scale mismatch | Current documentation; future severity target. |
| Gilardi et al. 2022 | slight vs severe segment counts | bivariate Bayesian Poisson network lattice | Balanced accuracy for sparse severe counts; severity-specific rates | Low for target; no holdout caveat | High documentation/future relevance. |
| Michalaki et al. 2015 | conditional motorway accident severity | ordered/generalized ordered logit | Frequency and severity mechanisms differ; HGV/hard-shoulder diagnostics | High if using post-event variables as predictors | Documentation and future accident-level severity module. |
| Quddus et al. circa 2010 | conditional crash severity | ordered response models with 30-minute traffic lag | Pre-crash lag design for WebTRIS/crash-level work | Post-event crash variables could leak | Future severity/time-profile design. |
| Ma et al. 2019 | fatal vs non-fatal crash | XGBoost classifier | Severity-feature importance and leakage warning | High for crash-record features | Diagnostic-only severity stratification. |
| Roll et al. 2026 | pedestrian injury crashes | intersection SPF | Vulnerable-user exposure is separate from vehicle exposure | Low for current all-injury link model | Future active-travel literature only. |

Validation, metrics, and model assessment

| paper | reported validation/metric type | what the metric actually tests | limitations | Open Road Risk implication |
| --- | --- | --- | --- | --- |
| Brodersen et al. 2010 | posterior balanced accuracy | imbalanced binary classifier performance and uncertainty | Only applies after binarising outcomes | Use for zero/non-zero or hotspot classification diagnostics, not count likelihood replacement. |
| Gilardi et al. 2022 | posterior predictive balanced accuracy | in-sample posterior predictive adequacy | Not external/spatial holdout validation | Label clearly and report alongside grouped holdout metrics. |
| Chengye & Ranjitkar 2013 | MAD/MSPE temporal holdout | temporal prediction for motorway segments | motorway-only; longer homogeneous segments | Add temporal holdout diagnostic to Stage 2. |
| Roll et al. 2026 | exposure-only vs feature-rich SPF; CURE plots | model misspecification against covariates | intersection/pedestrian scope | Use CURE plots and exposure-only baseline for GLM diagnostics. |
| Huda & Al-Kaisy 2024 | high R² predicting EB expected crashes | fit to smoothed EB target, not raw crashes | random split and circularity inflate fit | Avoid comparing R² to raw-count pseudo-R². |
| Lord & Mannering 2010 | review of fit/diagnostic issues | model risk checklist | no single empirical validation | Use as validation documentation scaffold. |
| Ma et al. 2019 | classifier metrics on balanced fatality data | conditional fatal/nonfatal classification | not exposure-adjusted and not frequency prediction | Do not compare to Stage 2 risk percentile. |
| Mahoney et al. 2023 | simulation comparison of V-fold vs spatial CV methods | which CV method best estimates true out-of-sample error for spatially autocorrelated data | simulation uses continuous outcome; regular grid not road network; zero-heavy counts not tested | Current grouped-link split is temporal, not spatial CV; document this limitation; pilot police-force holdout; estimate autocorrelation range from Stage 2 residuals via variogram. |
| Pew et al. 2020 | Bayesian chi-squared goodness-of-fit; posterior predictive zero check; temporal holdout RPMSE/MAD | zero-calibration and distributional adequacy for zero-heavy count models | intersection unit; no spatial holdout; single-year holdout only | Run posterior predictive zero check on current Poisson GLM; π ≈ 0 finding supports NB GLM as priority next step before ZINB. |
| Gao et al. 2024 | AccHR@k (hit rate at top-k% predicted risk roads); MPIW/PICP uncertainty | ranking precision at top-k; interval calibration | within-year temporal holdout only; same roads in train/test; no spatial holdout; weaker than current Open Road Risk CV | Implement AccHR@k for Stage 2 risk percentile validation; MPIW/PICP deferred until probabilistic outputs added. |
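
A top-k hit-rate metric of the kind described in the Gao et al. row can be sketched as follows. The definition used here is an assumption: the share of the top-k% links by predicted risk that recorded at least one collision in the held-out period. The exact AccHR@k definition in Gao et al. has not been checked against the paper, and the function name, arguments, and data are hypothetical.

```python
def hit_rate_at_top_percent(predicted, observed, top_percent):
    """Share of the top `top_percent` % of links by predicted risk
    that had at least one observed collision.

    predicted:   predicted risk score per link.
    observed:    held-out collision count per link (same order).
    top_percent: integer percentage, e.g. 1 for the top 1% of links.
    """
    n_top = max(1, len(predicted) * top_percent // 100)
    # Rank link indices from highest to lowest predicted risk.
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    top_links = ranked[:n_top]
    hits = sum(1 for i in top_links if observed[i] > 0)
    return hits / n_top


# Hypothetical scores and held-out counts for ten links.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.05, 0.6, 0.4, 0.8]
counts = [2, 0, 0, 0, 1, 0, 0, 1, 0, 0]

hit_rate = hit_rate_at_top_percent(scores, counts, top_percent=30)
print(f"hit rate in top 30%: {hit_rate:.3f}")
```

Because collision counts are zero-heavy, the metric is best read against a baseline such as the overall share of links with any collision, or the hit rate of an exposure-only ranking.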

Point-process / hotspot / spatial diagnostics

| paper | method | diagnostic use | production risk | recommended status |
| --- | --- | --- | --- | --- |
| Baddeley et al. 2021 | network point processes and network KDE | compare raw/snapped collision clustering with link rankings | Does not scale easily and changes target from link-year risk to event intensity | small pilot / documentation note |
| Cronie et al. 2019 | inhomogeneous network J/F/G functions | test point clustering after intensity correction | Not exposure-normalised traffic risk | small pilot only |
| Eckardt & Moradi 2024 | marked point process summaries | explore severity/type mark dependence | exploratory summaries can be mistaken for predictive validation | small pilot only |
| Aguero-Valverde & Jovanis 2008 | CAR residual/spatial effects | residual spatial autocorrelation and corridor clustering | national CAR production infeasible | diagnostic-only |
| Ziakopoulos & Yannis 2020 | spatial-methods review | MAUP, proximity, hotspot sensitivity | review evidence cannot justify direct production swap | documentation and diagnostic queue |

Methods to avoid as production changes for now

| method/paper | why not production-ready | safer use | required evidence before production |
| --- | --- | --- | --- |
| Full national CAR/MCAR Bayesian model; Aguero-Valverde, Gilardi, Boulieri | computationally unrealistic at 2M+ links; often in-sample only | pilot area residual/spatial diagnostic | scalable implementation, grouped/spatial holdout benefit, compute budget |
| DBN with MSE crash-count regression; Pan et al. 2017 | no count likelihood/offset; poor match to zero-heavy injury collisions | baseline comparison note; negative evidence | Poisson/NB loss with offset and strong held-out performance |
| Planar KDE for road crashes; Baddeley et al. 2021 | ignores network geometry and can mislead | network-aware KDE/point process pilot | network-distance implementation and clear diagnostic framing |
| Post-event crash variables as Stage 2 predictors; Michalaki, Quddus, Ma | crash type, casualties, and contributory factors are only known during or after the crash | retrospective severity diagnostics only | prospective feature availability and leakage audit |
| Zonal TAZ CAR model for link ranking; Wang et al. 2015 | loses link-level geometry; MAUP risk | contextual/junction-density inspiration | link-level validation of derived proxies |
| STZITD-GNN full architecture; Gao et al. 2024 | GRU+GAT+ZITD at 2.17M links is computationally infeasible; no exposure offset; daily resolution; severity-weighted composite not raw count | AccHR@k metric and Tweedie GLM as extractable contributions | scaled pilot (small area), exposure offset retained, annual aggregation, robust cross-year holdout |
| ARIMA/SARIMAX on corridor-level collision data without exposure; Balawi & Tenekeci 2024 | wrong response variable (vehicles involved not collision count); no exposure denominator; negative predicted counts; implausible R-squared values; methodology not replicable | negative example: illustrates post-event feature leakage from STATS19 attributes | not recommended under any circumstances for this pipeline |
| Random V-fold CV as primary Stage 2 validation; implied by Mahoney et al. 2023 | severely underestimates true prediction error for spatially autocorrelated data (2% within target range vs 37% for spatial CV) | current grouped-link temporal split is an improvement but does not enforce spatial separation; document limitation | spatial clustering CV with buffer sized to autocorrelation range of Stage 2 residuals |
| Pedestrian intersection SPF as all-injury link model; Roll et al. 2026 | different mode, exposure, and unit | active-travel/future junction literature | UK-equivalent pedestrian exposure and junction inventory |

Code and documentation implications

todo_id suggested_action action_type relevant_stage supporting_papers why_supported current_repo_relevance future_research_relevance effort risk_if_done_badly already_present_or_new priority
LIT-TODO-001 Add Stage 2 documentation note on exposure-offset support and limitations documentation note Stage 2 / documentation Gilardi 2022; Hauer 2001; Lord 2010; Aguero-Valverde 2008; Pan 2017 Multiple extractions support exposure-adjusted count framing but note elasticity/functional-form caveats high high low Overclaiming exact offset optimality partly present in methodology pages now
LIT-TODO-002 Run diagnostic Stage 2 GLM with log(AADT) and log(length) as free covariates or road-family interactions diagnostic / baseline comparison Stage 2 Aguero-Valverde 2008; Wang 2009; Al-Omari 2021; Lord 2010 Several papers estimate sub/super-linear AADT effects rather than fixed offset high high medium Confusing diagnostic with production replacement new/partly implied later
LIT-TODO-003 Add temporal holdout report for Stage 2 diagnostic validation / Stage 2 Chengye & Ranjitkar 2013; Pan 2013; Lord 2010 Motorway NB papers use later-year prediction; current grouped split should be complemented high medium medium COVID-year split can distort results likely partly present; verify now
LIT-TODO-004 Add spatial residual/autocorrelation diagnostic on pilot area diagnostic validation / Stage 2 Aguero-Valverde 2008; Gilardi 2022; Ziakopoulos 2020 Spatial autocorrelation can bias inference and hotspot confidence high high medium Treating in-sample spatial smoothers as external validation new later
LIT-TODO-005 Add MAUP/segmentation sensitivity pilot for OS Open Roads links small pilot validation / feature engineering Gilardi 2022; Baddeley 2021; Ziakopoulos 2020; Pan 2017 Link granularity and very short segments are repeated cautions medium high high Large refactor or inconsistent target grain new backlog
LIT-TODO-006 Add junction-adjacent residual/risk diagnostic diagnostic Stage 2 / feature engineering Poch 1996; Al-Omari 2021; Ziakopoulos 2020; Baddeley 2021 Junction mechanisms differ from mid-link risk high high medium Using noisy OSM junction proxies as production features too early future-work mentions junction density later
LIT-TODO-007 Pilot junction-density or conflict-proxy features only after diagnostic small pilot / candidate feature feature engineering / Stage 2 Poch 1996; Al-Omari 2021; Wang 2015 Intersection/access density repeatedly appears as relevant but data differs medium high medium Proxy may measure urbanity/AADT rather than conflict candidate in future-work backlog
LIT-TODO-008 Audit EB shrinkage formula and overdispersion parameter usage diagnostic Stage 2 / validation Hauer 2001; Al-Omari 2021; Huda 2024 EB weighting depends on correct dispersion and entity type high high medium Miscalibrated shrinkage overstates confidence in rankings EB exists as diagnostic now
LIT-TODO-009 Document regression-to-mean warning for before/after use of high-risk links documentation note documentation / validation Hauer 2001 Users may evaluate interventions on links selected by high observed counts high medium low Users mistake ranking for treatment-effect evidence likely new now
LIT-TODO-010 Add balanced-accuracy diagnostic for zero/non-zero or severe/KSI checks diagnostic validation / Stage 2 Brodersen 2010; Gilardi 2022 Imbalanced sparse counts make ordinary accuracy misleading medium high medium Binarisation can obscure count calibration possibly absent later
LIT-TODO-011 Keep severity modelling separate from frequency model in docs documentation note documentation / Stage 2 Boulieri 2016; Gilardi 2022; Michalaki 2015; Quddus 2010; Ma 2019 Severity and frequency targets differ and may have different predictors high high low Implying severity-weighted validation exists future-work covers severity now
LIT-TODO-012 Add severity-stratified diagnostic comparing top-risk links with KSI/fatal proportions diagnostic Stage 2 / validation Ma 2019; Quddus 2010; Michalaki 2015; Boulieri 2016 Tests whether current frequency ranking misses severity burden medium high medium Leakage if post-event proportions become production predictors new later
LIT-TODO-013 Add feature-interpretation leakage note for crash-record variables documentation note feature engineering / documentation Ma 2019; Michalaki 2015; Quddus 2010 Post-event crash features are not prospective link predictors high high low Accidental use of target-derived variables likely partly present now
LIT-TODO-014 Add centrality-feature and count-sparsity diagnostics for Stage 1a diagnostic / baseline comparison Stage 1a Jayasinghe 2019 Centrality-based AADT estimation depends on split design and sparse counts high medium medium Random splits overstate spatial generalisation centrality likely present later
LIT-TODO-015 Add learning curve for Stage 1a count-point sparsity diagnostic Stage 1a / validation Jayasinghe 2019 Extraction explicitly suggests training-point sensitivity medium medium medium Misreading random split R2 as spatial transfer new backlog
LIT-TODO-016 Add CURE plots and exposure-only baseline comparison for Poisson GLM diagnostic / baseline comparison Stage 2 / validation Roll 2026; Lord 2010 CURE plots and exposure-only baselines diagnose misspecification medium medium medium Applying pedestrian-intersection claims to link model new later
LIT-TODO-017 Add documentation note that congestion proxies are low priority for current Stage 2 documentation note Stage 2 / documentation Wang 2009; Quddus 2010 Two M25 companion extractions report congestion null findings; scope is motorway-specific medium medium low Generalising M25 null result to all roads new later
LIT-TODO-018 Run motorway slip-road/ramp residual diagnostic diagnostic Stage 2 / feature engineering Chengye & Ranjitkar 2013; Pan 2013; Michalaki 2015 Motorway context differs around ramps/hard shoulder medium medium medium Sparse/noisy ramp coding possibly available via form-of-way backlog
LIT-TODO-019 Add curvature/grade interpretation note by road family documentation note / diagnostic feature engineering / Stage 2 Pan 2017; Chengye 2013; Wang 2009; Huda 2024; Quddus 2010 Geometry effects vary by road type and by frequency vs severity target high high low-medium Treating coefficient direction as causal curvature active; grade candidate now/later
LIT-TODO-020 Treat point-process methods as exploratory comparison layers only documentation note / small pilot validation / future work Baddeley 2021; Cronie 2019; Eckardt 2024 Network point-process literature critiques aggregation but does not replace Stage 2 medium high low for note; high for pilot Presenting in-sample clustering as predictive validation new backlog
LIT-TODO-021 Run posterior predictive zero check on current Stage 2 Poisson GLM diagnostic Stage 2 / validation Pew 2020 Table 3 in Pew shows Poisson-equivalent model (ZIP with π=0) underestimates zeros; same structure expected for Open Road Risk Poisson GLM given ~98–99% link-year zero rate high medium low Drawing samples at link-year level must incorporate correct exposure offset per link new now
LIT-TODO-022 Fit negative binomial GLM with existing exposure offset as Stage 2 candidate and compare to Poisson GLM using grouped-link CV baseline comparison Stage 2 Pew 2020; Lord 2010; Chengye & Ranjitkar 2013 π ≈ 0 in Pew’s ZINB indicates overdispersion (ϕ = 17) drives improvement, not zero-inflation; NB GLM is the priority step before any ZINB complexity high high low-medium NB GLM dispersion can be sensitive to motorway overfitting already noted; check ϕ stability across facility families new now
LIT-TODO-023 Estimate empirical variogram of Stage 2 Poisson GLM residuals to determine spatial autocorrelation range diagnostic Stage 2 / validation Mahoney 2023; Aguero-Valverde 2008; Gilardi 2022 Mahoney 2023 shows optimal spatial CV buffer ≈ autocorrelation range; without measuring the range for Open Road Risk, spatial CV design is uninformed high high low-medium Variogram on 2.17M links requires subsampling; use road-class-stratified subsample of ~10–50k links new later
LIT-TODO-024 Pilot police-force-level regional holdout as a spatial CV diagnostic diagnostic / small pilot Stage 2 / validation Mahoney 2023; Gilardi 2022 ~13–16 force areas provide pre-defined geographic groups of comparable size; holding each out in turn enforces real spatial separation and tests geographic generalisation high high medium Force areas vary substantially in size and collision density; compare force-holdout R²/pseudo-R² against current grouped-link metrics to quantify spatial optimism new later
LIT-TODO-025 Document current grouped-link CV as temporal grouped CV and record that it does not enforce spatial separation between neighbouring links documentation note Stage 2 / validation / documentation Mahoney 2023 Paper shows V-fold without spatial separation is strongly optimistic; grouped-link split prevents same-link leakage but does not address neighbouring-link spatial autocorrelation high medium low None (documentation only) new now
LIT-TODO-026 Implement AccHR@k (accuracy hit rate at top-k% predicted risk links) as a Stage 2 validation metric diagnostic / validation metric Stage 2 / validation Gao 2024 AccHR@k directly evaluates whether high-percentile risk predictions correspond to roads with actual collisions; more operationally meaningful than RMSE or pseudo-R² for a ranking output high medium low Choice of k matters at 2.17M links; consider AccHR@1, AccHR@5, and AccHR@20 rather than a single threshold; avoid treating a broad k as strong evidence of discrimination new now
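
LIT-TODO-026 can be sketched in a few lines. The block below is a minimal illustration that assumes AccHR@k means "the share of the top-k% predicted-risk links that recorded at least one collision in the evaluation period". That definition is an assumption pending the Gao 2024 cross-audit, the function name `acc_hit_rate_at_k` is hypothetical, and the arrays are toy values rather than project results.

```python
import numpy as np


def acc_hit_rate_at_k(predicted_risk, observed_counts, k_percent):
    """Share of the top-k% predicted-risk links with at least one observed collision.

    Assumed definition for illustration only; verify against Gao et al. 2024.
    """
    predicted_risk = np.asarray(predicted_risk, dtype=float)
    observed_counts = np.asarray(observed_counts)
    if predicted_risk.shape != observed_counts.shape:
        raise ValueError("predicted_risk and observed_counts must have the same shape")
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    # Number of links in the top-k% band (at least one link).
    n_top = max(1, int(np.ceil(predicted_risk.size * k_percent / 100)))
    # Stable sort so ties are broken by original order, highest risk first.
    top_idx = np.argsort(-predicted_risk, kind="stable")[:n_top]
    return float(np.mean(observed_counts[top_idx] > 0))


# Toy link-level values (hypothetical).
predicted = np.array([0.9, 0.1, 0.8, 0.3, 0.2, 0.7, 0.05, 0.4, 0.6, 0.5])
observed = np.array([2, 0, 0, 0, 0, 1, 0, 0, 1, 0])

hit_20 = acc_hit_rate_at_k(predicted, observed, 20)
hit_50 = acc_hit_rate_at_k(predicted, observed, 50)
base_rate = float(np.mean(observed > 0))  # hit rate of a random ranking
print(f"AccHR@20={hit_20:.2f}  AccHR@50={hit_50:.2f}  base rate={base_rate:.2f}")
```

Reporting the base rate next to each AccHR@k value makes it easier to see when a broad k adds little beyond a random ranking, which is the caution noted in the LIT-TODO-026 row.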

Current-code alignment assessment

Current strengths

  • The exposure-adjusted crash-frequency framing is supported by multiple extractions: Hauer 2001, Gilardi 2022, Aguero-Valverde 2008, Lord 2010, and Pan 2017.
  • Link-year modelling is consistent with the crash-frequency/SPF literature, while the Gilardi et al. 2022 extractions provide a direct UK OS-segment analogue.
  • Grouped or held-out validation is directionally aligned with the caution in Lord 2010 and with temporal holdout practice in the Chengye/Ranjitkar motorway papers.
  • The repository’s attention to spatial units is aligned with Gilardi 2022, Baddeley 2021, Ziakopoulos 2020, and Aguero-Valverde 2008.
  • Use of open data is a defensible distinction versus studies relying on complete motorway counters, commercial probe data, or inspection/video logs.
  • Keeping EB shrinkage, spatial models, and point-process methods as diagnostics or future work is consistent with computational and validation cautions in the extractions.

Current weaknesses / limitations to document

  • Exposure uncertainty is not fully propagated from Stage 1a into Stage 2; several papers treat AADT as observed, but that is not true for Open Road Risk.
  • The fixed VMT-style offset implies an exposure elasticity of 1.0; several extractions support testing free AADT/length coefficients diagnostically.
  • OS Open Roads link choice may be sensitive to very short links, junction proximity, and MAUP-like segmentation effects.
  • Junction/intersection mechanisms are under-represented in a pure link-level model.
  • Severity is not separately modelled; the severity papers show this is a different target, not just a weighted version of frequency.
  • Spatial autocorrelation is not fully handled in production; this may affect coefficient interpretation and ranking confidence.
  • The grouped-by-road-link CV split prevents same-link leakage across years but does not enforce spatial separation between neighbouring links on the same corridor. Mahoney 2023 shows that this kind of temporal grouped split produces estimates close to V-fold (optimistically biased) rather than true out-of-sample performance. The degree of bias depends on the spatial autocorrelation range of collision risk, which has not been measured.
  • Hotspot/risk percentile sensitivity to spatial unit and residual clustering needs explicit documentation.
  • The current Stage 2 Poisson GLM likely underestimates zeros at link-year level. Pew 2020 shows that a Poisson-equivalent model (ZIP with π ≈ 0) calibrates poorly on zero-heavy count data; the improvement from NB/ZINB comes from the dispersion parameter, not zero-inflation per se. This has not been tested on Open Road Risk data.
  • Post-event variables from collision records must not leak into prospective Stage 2 features.
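
The zero-calibration concern above can be checked with a simple plug-in comparison before any full posterior predictive check is built. The sketch below is a simplification of LIT-TODO-021, not the posterior predictive procedure itself: it compares the observed zero fraction with the mean of exp(-mu) implied by Poisson fitted means. All data are simulated, and the exposure distribution, base rate, and dispersion `phi` are hypothetical values chosen only to make the effect visible.

```python
import numpy as np


def expected_poisson_zero_fraction(mu):
    """Mean of P(Y = 0) = exp(-mu) over link-years under a Poisson model."""
    return float(np.mean(np.exp(-np.asarray(mu, dtype=float))))


rng = np.random.default_rng(0)
n_link_years = 50_000

# Hypothetical exposure and base rate. mu stands in for the fitted mean,
# i.e. exp(linear predictor + log exposure offset) for each link-year.
exposure = rng.lognormal(mean=0.0, sigma=1.0, size=n_link_years)
mu = 0.05 * exposure

# Overdispersed toy counts: gamma-Poisson mixture with mean mu and
# variance mu + mu**2 / phi (phi is illustrative, not an estimate).
phi = 0.1
counts_nb = rng.poisson(rng.gamma(shape=phi, scale=mu / phi))

# Reference case where the Poisson assumption actually holds.
counts_pois = rng.poisson(mu)

expected_zero = expected_poisson_zero_fraction(mu)
observed_zero_nb = float(np.mean(counts_nb == 0))
observed_zero_pois = float(np.mean(counts_pois == 0))

print(f"Poisson-implied zero fraction: {expected_zero:.4f}")
print(f"observed zeros, overdispersed counts: {observed_zero_nb:.4f}")
print(f"observed zeros, Poisson counts: {observed_zero_pois:.4f}")
```

When the counts are overdispersed, the observed zero fraction exceeds the Poisson-implied value even though the means match, which is the pattern the Pew 2020 extraction attributes to dispersion rather than zero-inflation. On real data, `mu` would have to come from the fitted Stage 2 model with the correct per-link offset.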

Current areas where the repo is deliberately conservative

  • The current pipeline should not claim causality from road-feature coefficients.
  • It should not use post-event collision variables as predictors in the production frequency model.
  • Spatial, point-process, and CAR/INLA methods should remain diagnostics or pilots before any production use.
  • Severity-weighted, fatal-only, motorcycle, cyclist, or pedestrian risk targets should remain parallel/future models unless exposure and validation are made explicit.
  • Machine-learning rankings should be presented as decision-support indicators, not as calibrated external safety scores.

Claims Open Road Risk can safely make

Safer claims

  • The project estimates exposure-adjusted injury-collision risk.
  • The outputs are exploratory decision-support indicators.
  • The model can help identify links with unusually high observed collisions relative to estimated exposure and context.
  • Spatial-unit choice, and the sensitivity of hotspot outputs to it, are known limitations.
  • Severity and frequency are distinct modelling targets.
  • EB shrinkage and spatial diagnostics can help assess ranking confidence, but they do not prove causal treatment effects.
  • Open Road Risk uses open transport/collision/network data, which brings reproducibility advantages and exposure-coverage limitations.
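
For readers unfamiliar with EB shrinkage, the textbook gamma-Poisson form is sketched below. It assumes a negative binomial model with variance mu + mu²/ϕ, which gives the weight w = ϕ/(ϕ + mu) on the model prediction. The repository's own formula is not reproduced here (LIT-TODO-008 covers auditing it), and the function name and numbers are hypothetical.

```python
import numpy as np


def eb_shrinkage(observed, predicted_mean, phi):
    """Textbook empirical Bayes estimate under a gamma-Poisson (NB) model.

    Assumes Var(Y) = mu + mu**2 / phi, so the weight on the model
    prediction is w = phi / (phi + mu). Returns (eb_estimate, weight).
    """
    observed = np.asarray(observed, dtype=float)
    mu = np.asarray(predicted_mean, dtype=float)
    if phi <= 0:
        raise ValueError("phi must be positive")
    weight = phi / (phi + mu)
    eb_estimate = weight * mu + (1.0 - weight) * observed
    return eb_estimate, weight


# Hypothetical link: model predicts 2 collisions, 6 were observed, phi = 4.
eb, w = eb_shrinkage(observed=6, predicted_mean=2.0, phi=4.0)
print(f"weight on model = {float(w):.3f}, EB estimate = {float(eb):.3f}")
```

The estimate sits between the model prediction and the observed count. A larger ϕ (less overdispersion) pulls it towards the prediction, which is why a mis-specified dispersion parameter can overstate ranking confidence.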

Claims not yet supported

  • The model proves causal effects of road features.
  • The production risk percentile is externally validated.
  • High-ranked links are definitely unsafe independent of exposure uncertainty.
  • Severity-weighted risk is validated.
  • The current model fully handles junction conflict mechanisms.
  • The model is directly comparable to proprietary inspection scores without further validation.
  • Spatial autocorrelation is fully captured in the production model.
  • XGBoost feature importance is a causal interpretation of crash mechanisms.
  • The grouped-by-road-link cross-validation provides a spatially robust estimate of model performance. It controls for same-link temporal leakage but does not enforce spatial separation between adjacent links; reported pseudo-R² values may be optimistically biased by an unknown amount relative to true geographic holdout performance.
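
The difference between a grouped-link split and a geographic holdout is easiest to see in the fold construction. The sketch below illustrates the region-holdout idea in LIT-TODO-024: every link in one region is held out in turn, so no training link comes from the held-out region. The `force_area` labels are hypothetical toy values, and the sketch does not add the buffer that the spatial-CV literature recommends around each held-out area.

```python
import numpy as np


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group label."""
    groups = np.asarray(groups)
    for group in np.unique(groups):
        test_mask = groups == group
        yield group, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical police-force-area label for eight links (not project data).
force_area = np.array(["A", "A", "B", "B", "B", "C", "C", "A"])

folds = list(leave_one_group_out(force_area))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx.tolist(), "test:", test_idx.tolist())
```

Comparing the metrics from these folds with the current grouped-link metrics would give a first estimate of how much spatial optimism the reported pseudo-R² carries.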

Secondary review queue

Use literature/prompts/literature_extraction_additional_prompts.md for these checks:

  • Use the Cross-Audit Prompt when there is one extraction and the PDF/tables need checking.
  • Use the Lightweight Sanity Check Prompt for low-priority single extractions.
  • Use the Reconciliation Prompt when two or more independent extractions need to be combined into a final record.
  • Use the Human Review Checklist before treating a reconciled extraction as final.

Missing or weak review queue

These papers need an additional source check because extraction coverage is thin, the extraction flags OCR/table problems, or the paper could support repo actions.

| priority | paper | extraction_file | review_gap | prompt_to_use | what_to_check | likely_impact_if_wrong | recommended_next_action |
|---|---|---|---|---|---|---|---|
| conditional | Pew et al. 2020 | paper-extraction-pew-2020-zero-inflated-crash.md | One extraction; key π ≈ 0 finding drives NB-vs-ZINB TODO ordering | Targeted Cross-Audit Prompt | Confirm π posterior mean ≈ 0.00 (SD 0.01) for both ZIP and ZINB in original PDF Table A1; confirm ϕ = 17.04 for ZINB; check prior specification Beta(0.15,1) on π | If π is not near zero, the argument for NB GLM priority over ZINB weakens and TODO ordering changes | Check before citing π ≈ 0 or using it as justification for NB-first approach. |
| conditional | Gao et al. 2024 | paper-extraction-gao-2024-stzitd-gnn.md | One extraction; Table 4 performance values may contain transcription errors; train/val/test split chronology not stated | Targeted Cross-Audit Prompt | Verify Table 4 MAE/RMSE/AccHR@20 values; confirm whether 8:2:2 split is chronological or random; check GitHub repo accessibility | Could misstate AccHR@20 values or overstate validation strength | Check Table 4 and split description before writing AccHR@k diagnostic or citing improvement percentages. |

Completed reconciliation records

These papers now have final combined records. Use the combined record for future citation and TODO work, while preserving the earlier extraction files for provenance.

| paper | combined_record | source_extraction_files | remaining_caution | recommended_status |
|---|---|---|---|---|
| Huda & Al-Kaisy 2024 | paper-extraction-huda-2024-COMBINED.md | paper-extraction-huda-alkaisy-2024-lvr-network-screening.md; paper-extraction-huda-2024-network-screening-low-volume-roads.md | Curvature CART sharp-group value is internally inconsistent; grade should not be cited as a final-model predictor without caution | Use combined record; no further extraction needed unless quoting disputed threshold values. |
| Jayasinghe et al. 2019 | paper-extraction-jayasinghe-2019-COMBINED.md | paper-extraction-jayasinghe-2019-centrality-aadt.md; paper-extraction-jayasinghe-2019-traffic-modeling-centrality.md | Final selected regression type is implied but not fully documented across OLS/robust/Poisson alternatives | Use combined record; cite as Stage 1a exposure-modelling evidence, not collision-risk evidence. |
| Poch & Mannering 1996 | paper-extraction-poch-mannering-1996-COMBINED.md | paper-extraction-poch-1996-intersection-negative-binomial.md; paper-extraction-poch-mannering-1996-nb-intersection.md | Accident-type table values should still be checked before formal publication because OCR is imperfect | Use combined record for junction/approach mechanism claims; table-value citation remains conditional. |
| Roll et al. 2026 | paper-extraction-roll-2026-oregon-COMBINED.md | paper-extraction-roll-2026-oregon-pedestrian-spf.md; paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md | Appendices should be checked if exact SPF coefficients, exposure-only comparisons, or crash-assignment rules are needed | Use combined record for exposure-model and future junction/pedestrian-layer evidence. |
| Quddus, Wang & Ison | paper-extraction-quddus-wang-ison-COMBINED.md | paper-extraction-quddus-2010-m25-severity-ordered-response.md; paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md | Published version year/bibliographic details and exact Tables 2-3 should be checked before formal numeric reporting | Use combined record for severity target, traffic-lag design, and leakage-guardrail claims. |
| Ziakopoulos & Yannis 2020 | paper-extraction-ziakopoulos-yannis-2020-COMBINED.md | paper-extraction-ziakopoulos-2020-spatial-approaches-road-safety.md; paper-extraction-ziakopoulos-yannis-2020-spatial-review.md | Primary cited papers still need checking before using exact study-level model specifications, validation methods, or numerical claims | Use combined record for high-level spatial methods, MAUP, hotspot sensitivity, and spatial-diagnostic claims. |

Active reconciliation / combination check queue

These papers still have multiple extraction passes but no final combined record. Do not re-extract them from scratch; use the Reconciliation Prompt in literature_extraction_additional_prompts.md after any needed cross-audit notes exist.

| priority | paper | extraction_files | why_reconcile | prompt_to_use | reconciliation_focus | expected_output |
|---|---|---|---|---|---|---|
| conditional | Gilardi et al. 2022 | paper-extraction-gilardi-2022-leeds-network-lattice-bayesian.md; paper-extraction-gilardi-2022-multivariate-hierarchical-crashes.md; paper-extraction-gilardi-2022-network-lattice-crashes.md | Three extraction records already exist; only targeted citation checks remain for table/sign ambiguity and MAUP/contraction details | Reconciliation Prompt only if creating a final canonical record; otherwise Human Review Checklist plus targeted PDF check | Table 2 coefficient signs; Primary Road interpretation; balanced accuracy wording; dodgr/network-contraction details | Do not re-extract; manually inspect disputed PDF tables/text before citing coefficient directions or balanced-accuracy values. |
| conditional | Pan et al. 2017 | paper-extraction-pan-2017-deep-belief-network-global-spf.md.md; paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md | Two extraction records now exist; use reconciliation only if writing the neural/global-SPF comparison page | Reconciliation Prompt | DBN training details; crash scope; NB benchmark coefficients; Washington split; normalization and minimum-length handling | Reconciled benchmark note; do not treat DBN as a production recommendation without stronger validation. |
| low | Brodersen et al. 2010 | paper-extraction-brodersen-2010-balanced-accuracy.md; paper-extraction-brodersen-2010-balanced-accuracy-posterior.md | Two extraction records already exist; equations only needed for implementation | Reconciliation Prompt only if implementing posterior intervals | Posterior balanced accuracy equations, Equation 7 wording, and examples | Reconciled implementation note if adding code for posterior intervals. |
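
If the Brodersen row is ever implemented, the core idea can be sketched as below. The point estimate is the mean of sensitivity and specificity. The interval is a Monte Carlo approximation that assumes independent flat Beta(1, 1) priors on the two per-class accuracies, which is a stand-in for the paper's analytical treatment and should be checked against the extraction's equations before use. Function names and the confusion-matrix counts are hypothetical.

```python
import numpy as np


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity from confusion-matrix counts."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def posterior_balanced_accuracy_samples(tp, fn, tn, fp, n_samples=20_000, seed=0):
    """Monte Carlo draws of balanced accuracy.

    Assumes independent flat Beta(1, 1) priors on sensitivity and
    specificity, giving Beta(tp + 1, fn + 1) and Beta(tn + 1, fp + 1) posteriors.
    """
    rng = np.random.default_rng(seed)
    sensitivity = rng.beta(tp + 1, fn + 1, size=n_samples)
    specificity = rng.beta(tn + 1, fp + 1, size=n_samples)
    return 0.5 * (sensitivity + specificity)


# Hypothetical zero/non-zero check: 50 links with collisions, 950 without.
tp, fn, tn, fp = 30, 20, 900, 50
plain_accuracy = (tp + tn) / (tp + fn + tn + fp)
point_ba = balanced_accuracy(tp, fn, tn, fp)
samples = posterior_balanced_accuracy_samples(tp, fn, tn, fp)
lower, upper = np.percentile(samples, [2.5, 97.5])

print(f"accuracy = {plain_accuracy:.3f}, balanced accuracy = {point_ba:.3f}")
print(f"approximate 95% interval: {lower:.3f} to {upper:.3f}")
```

In this toy case plain accuracy is 0.93 while balanced accuracy is about 0.77, which shows why ordinary accuracy is misleading for sparse collision outcomes.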

Candidate Quarto literature pages

| proposed_qmd_file | purpose | papers_to_use | key_claims | figures/tables_needed | readiness |
|---|---|---|---|---|---|
| quarto/literature/crash-frequency-models.qmd | Explain Poisson/NB/SPF count-model basis and limitations | Lord 2010; Hauer 2001; Aguero-Valverde 2008; Chengye/Ranjitkar 2013; Pan 2017; Poch 1996; Al-Omari 2021; Pew 2020 | Count models need exposure, dispersion, validation, and cautious interpretation; overdispersion is the immediate model-family issue before zero-inflation; intersection evidence should not be transferred directly to link risk | Model-family comparison table; Open Road Risk alignment table; zero-calibration diagnostic summary; NB-over-Poisson evidence note | exists, mostly current; update references to use the combined Poch record and verify Pew π≈0 before quoting exact values |
| quarto/literature/exposure-and-traffic-volume.qmd | Document AADT/AADF/WebTRIS exposure handling | Gilardi 2022; Hauer 2001; Jayasinghe 2019 combined; Roll 2026 combined; Aguero-Valverde 2008; Wang 2009; Pew 2020; Gao 2024 | Exposure is central but elasticity, estimated-AADT uncertainty, and no-exposure contrast cases need clear separation | Exposure-treatment matrix; Stage 1a validation summary; no-offset contrast table; AADT/AADPT data-fusion note | exists, linked in site nav; use combined Jayasinghe/Roll records and keep Pew/Gao as cautious no-offset contrasts |
| quarto/literature/spatial-methods-and-network-risk.qmd | Review OS-link lattice, CAR, MAUP, point-process diagnostics | Gilardi 2022; Aguero-Valverde 2008; Ziakopoulos 2020 combined; Baddeley 2021; Cronie 2019; Eckardt 2024; Mahoney 2023 | Spatial methods support diagnostics, not immediate production replacement; spatial CV evidence strengthens validation caveats | Spatial-unit comparison; diagnostic queue; CV method comparison table from Mahoney | exists, linked in site nav; use combined Ziakopoulos record and keep Gilardi table/sign details conditional |
| quarto/literature/junctions-and-conflict-structure.qmd | Separate junction/approach mechanisms from link risk | Poch 1996 combined; Roll 2026 combined; Al-Omari 2021; Wang 2015; Ziakopoulos 2020 combined | Junction risk needs different units, data, and exposure structures from the current link-year model | Junction mechanism table; available/open-data proxy table; link-vs-intersection transferability table | exists, linked in site nav; use combined Poch/Roll/Ziakopoulos records, with exact Poch table values and Roll appendices conditional for formal citation |
| quarto/literature/severity-modelling.qmd | Separate severity from frequency and define future severity path | Boulieri 2016; Gilardi 2022; Michalaki 2015; Quddus/Wang/Ison combined; Ma 2019; Gao 2024 | Severity is a conditional/different target and can conflict with frequency; severity-weighted composites should not be treated as validated Stage 2 risk | Severity target taxonomy; leakage warning table; composite-vs-separate response variable note | exists, linked in site nav; use combined Quddus record and keep exact table/bibliographic details conditional |
| quarto/literature/validation-and-metrics.qmd | Document held-out, balanced accuracy, CURE, pseudo-R2 limitations | Brodersen 2010; Gilardi 2022; Chengye 2013; Roll 2026 combined; Lord 2010; Huda 2024 combined; Mahoney 2023; Pew 2020; Gao 2024 | Metrics test different things; avoid in-sample/holdout confusion; spatial CV, zero-calibration, AccHR@k, and CURE diagnostics are candidate validation additions | Metric taxonomy; current repo validation map; CV method performance table from Mahoney; zero-check diagnostic table from Pew; AccHR@k definition from Gao | exists, needs light update to cite LIT-042/LIT-045 combined records and keep Pew/Gao exact-value checks conditional |
| quarto/literature/transferability-and-open-data-limits.qmd | Explain what transfers to open UK data and what does not | All papers, with combined records for Huda, Jayasinghe, Poch, Roll, Quddus/Wang/Ison, and Ziakopoulos/Yannis; Gao 2024 and Balawi & Tenekeci 2024 as negative-transfer examples | Some evidence is blocked by missing lane/turning/exposure data or different unit/target; apparent UK relevance still needs data-stack checks | Transferability table; data-availability matrix; negative-transfer rows; combined-record provenance note | exists, linked in site nav; needs light citation refresh to prefer combined records |

Appendices

Register taxonomy

  • current_repo_relevance: how directly the extraction informs the current Open Road Risk pipeline, code, model, validation, or documentation.
    • high: directly relevant to current Stage 1a, Stage 1b, Stage 2, validation, or docs.
    • medium: relevant to a subset, diagnostic, or caution.
    • low: indirect or future-only.
  • future_research_relevance: usefulness for extensions beyond the current implementation.
    • high: directly informs plausible future Open Road Risk research.
    • medium: useful if a specific future branch exists.
    • low: peripheral.
  • literature_review_relevance: usefulness for future narrative Quarto literature pages.
    • high: should be cited or tabulated.
    • medium: include in specialised page or caveat table.
    • low: likely appendix/background only.
  • code_actionability_now: whether the extraction supports a near-term code/doc action.
    • high: a clear documentation, diagnostic, or baseline action is supported.
    • medium: action is plausible but should be scoped.
    • low: no near-term code action.
  • supports_production_change:
    • no: no production change supported.
    • diagnostic-only: supports checks, reporting, documentation, or sensitivity analysis.
    • pilot-first: supports a limited pilot before any production consideration.
    • baseline-comparison-first: supports comparing against current implementation before adopting.
    • possible-later: may support a future production change after more evidence.
  • secondary_review_needed:
    • no: extraction is sufficient for high-level register use.
    • yes: manual PDF/table review is needed before use in TODOs or literature prose.
    • conditional: adequate for cautious register use, but check before quoting numbers, equations, or coefficient signs.
  • extraction_quality_initial_judgement:
    • high: extraction reports high confidence or appears complete for register-level use.
    • medium: useful but has missing tables, indirect relevance, or stated uncertainty.
    • low: do not use without review.
    • unknown: extraction does not state enough to judge.

Known extraction files not yet processed

| file | reason not included |
|---|---|
| literature/prompts/road_safety_literature_extraction_prompt.md | Prompt template, not a paper extraction. |
| literature/prompts/OLD_road_safety_literature_extraction_prompt.md | Old prompt template, not a paper extraction. |
| literature/prompts/literature_extraction_additional_prompts.md | Companion prompt file, not a paper extraction. |
| literature/prompts/README_literature_extraction.md | Workflow guide, not a paper extraction. |
| literature/prompts/grep_extraction.sh | Utility script, not a paper extraction. |
| literature/prompts/grep_extraction_output.txt | Generated grep output/provenance helper, not an extraction source. |

No literature/papers_raw/ extraction Markdown was found during this pass. No Quarto or docs files were treated as paper extractions; quarto/future-work.qmd, todo/TODO.md, docs/internal/sites_todo.md, and quarto/background/metrics-and-methodology.qmd were used only as roadmap/methodology context.

Update (2026-05-10): Four extraction files previously not in this register have now been added as LIT-032 through LIT-035: paper-extraction-pew-2020-zero-inflated-crash.md, paper-extraction-mahoney-2023-spatial-cv.md, paper-extraction-gao-2024-stzitd-gnn.md, and paper-extraction-balawi-tenekeci-2024-arima-sarimax-london-aroads.md. The file paper-extraction-chengye-ranjitkar-2013-motorway-nb-regression.md was confirmed as the source file for the existing LIT-009 row and required no new entry.

Update (2026-05-10): Six additional review-pass extraction files have been added as LIT-036 through LIT-041: paper-extraction-huda-2024-network-screening-low-volume-roads.md, paper-extraction-pan-2017-global-road-safety-performance-function-dbn.md, paper-extraction-poch-mannering-1996-nb-intersection.md, paper-extraction-quddus-2009-road-traffic-congestion-crash-severity.md, paper-extraction-roll-2026-pedestrian-safety-performance-function-oregon.md, and paper-extraction-ziakopoulos-yannis-2020-spatial-review.md. paper-extraction-mcfadden-not-stated-conditional-logit.md was removed from this register because it is not a road-safety paper and has no material current relevance to Open Road Risk.

Update (2026-05-10): Four final combined/reconciled records have been added as LIT-042 through LIT-045: paper-extraction-huda-2024-COMBINED.md, paper-extraction-jayasinghe-2019-COMBINED.md, paper-extraction-poch-mannering-1996-COMBINED.md, and paper-extraction-roll-2026-oregon-COMBINED.md. These are now the preferred records for future citation/TODO work for those papers; the earlier extraction files remain in the inventory for provenance.

Update (2026-05-10): Two further combined/reconciled records have been added as LIT-046 and LIT-047: paper-extraction-quddus-wang-ison-COMBINED.md and paper-extraction-ziakopoulos-yannis-2020-COMBINED.md. These move Quddus/Wang/Ison and Ziakopoulos/Yannis out of the active reconciliation queue. All seven candidate Quarto literature pages now exist under quarto/literature/ and have been added to the website Literature menu.
