Open Road Risk

Validation and Metrics

Methodological basis for Open Road Risk validation design: grouped CV, temporal holdout, spatial blocking, balanced accuracy, pseudo-R², CURE plots, and the structural explanatory ceiling of road-environment models.

This page documents the methodological basis for the validation design and metrics used in Open Road Risk. Each metric tests a different property of the model, and confusing in-sample fit statistics with predictive validation is a persistent risk in crash-frequency literature. The page collects evidence from nine paper extractions and maps findings to the current pipeline’s validation choices.


Metric taxonomy

Not all reported model-quality statistics are equivalent. The table below classifies the metrics used or referenced in this project.

| Metric | What it tests | In/out of sample | Main limitation |
| --- | --- | --- | --- |
| Pseudo-R² (ρ²) | In-sample likelihood improvement over intercept-only | In-sample | Sensitive to mean count; low values are expected and not diagnostic of failure |
| AIC / BIC / DIC / WAIC | Model comparison, penalised likelihood | In-sample | Cannot substitute for held-out test; Gilardi 2022 explicitly uses DIC/WAIC as model-comparison tools, not predictive validation |
| MAD / MSPE on temporal holdout | Predictive accuracy on held-out years (same links) | Temporal holdout | Tests temporal generalisation only; same road segments in train and test |
| V-fold cross-validation RMSE | Resampled estimate of predictive error | Spatially leaky | Mahoney 2023: V-fold CV is severely optimistic; only 2% within target RMSE range at best parameter settings |
| Spatially blocked CV RMSE | Predictive error with spatial autocorrelation controlled | Spatial holdout | Requires choice of exclusion buffer; Mahoney 2023: clustering CV achieves 37–60% within target range |
| Balanced accuracy | Classification quality under severe class imbalance | Holdout or posterior | Must pool confusion matrices across folds, not average fold metrics; Brodersen 2010 |
| AccHR@k | Ranking usefulness: top-k% predicted links vs actual crash locations | Out-of-sample | Depends on k choice; no exposure normalisation in Gao 2024’s implementation |
| CURE plot | Model misspecification at specific covariate ranges | In-sample diagnostic | Does not test generalisation; flags systematic bias by AADT or length band |
| Posterior predictive zero check | Zero-inflation calibration | In-sample diagnostic | Pew 2020 procedure; p ≈ 0.50 indicates calibration; p ≫ 0.50 indicates excess predicted zeros |
| MPIW / PICP | Prediction interval width and coverage | Out-of-sample | Gao 2024; requires probabilistic model |

Important

In-sample is not validation. Pseudo-R², AIC, DIC, and WAIC measure how well a model fits the data it was trained on. Only MAD/MSPE on temporal holdouts, spatially blocked cross-validation, and external test sets measure predictive generalisation. Lord & Mannering (2010) explicitly warn that superior in-sample fit does not imply practical predictive capability.


Classification and binary ranking metrics

Balanced accuracy

Standard accuracy is uninformative when ~98–99% of link-years have zero observed crashes. A model predicting zero for every link-year achieves 98%+ accuracy without identifying a single crash link-year.

Brodersen, Ong, Stephan & Buhmann (2010) define balanced accuracy as:

\[\text{BA} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) = \frac{\text{TPR} + \text{TNR}}{2}\]

Key implementation requirements from Brodersen 2010:

  • Pool confusion matrices across folds, then compute a single balanced accuracy from the pooled matrix. Averaging fold-level balanced accuracies instead introduces bias proportional to fold-size imbalance.
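  • The Bayesian posterior distribution of balanced accuracy given the data provides an uncertainty interval rather than a point estimate. It is built from two independent Beta posteriors, one on TPR (from the pooled TP and FN counts) and one on TNR (from the pooled TN and FP counts); the posterior of their average is then evaluated numerically.
  • For the Open Road Risk binary classifier (top-k% predicted links as “high risk”), balanced accuracy can be computed at any threshold. It is high only when both TPR and TNR are reasonable: the all-zero classifier that scores 98%+ on standard accuracy has TPR = 0 and TNR = 1, so its balanced accuracy is 0.5.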
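The pooling requirement and the posterior interval can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the pipeline's implementation: the `Confusion` class, the function names, the flat Beta(1, 1) priors, and the fold counts are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def pool(folds):
    """Sum fold-level confusion matrices into one pooled matrix."""
    return Confusion(
        tp=sum(f.tp for f in folds),
        fp=sum(f.fp for f in folds),
        tn=sum(f.tn for f in folds),
        fn=sum(f.fn for f in folds),
    )


def balanced_accuracy(c):
    tpr = c.tp / (c.tp + c.fn)
    tnr = c.tn / (c.tn + c.fp)
    return 0.5 * (tpr + tnr)


def ba_interval(c, level=0.95, draws=20000, seed=0):
    """Monte Carlo interval for balanced accuracy.

    Assumes independent Beta posteriors on TPR and TNR with flat priors,
    and approximates the posterior of their average by sampling.
    """
    rng = random.Random(seed)
    samples = sorted(
        0.5 * (rng.betavariate(c.tp + 1, c.fn + 1)
               + rng.betavariate(c.tn + 1, c.fp + 1))
        for _ in range(draws)
    )
    lo = samples[int(draws * (1 - level) / 2)]
    hi = samples[int(draws * (1 + level) / 2) - 1]
    return lo, hi


# Hypothetical fold results, for illustration only.
folds = [
    Confusion(tp=30, fp=950, tn=4000, fn=20),
    Confusion(tp=5, fp=90, tn=900, fn=5),
]
pooled = pool(folds)
point = balanced_accuracy(pooled)
lo, hi = ba_interval(pooled)

# The all-zero classifier: 99% standard accuracy, balanced accuracy 0.5.
all_zero = Confusion(tp=0, fp=0, tn=5940, fn=60)
```

Pooling first means each fold contributes in proportion to its size, which is the property that averaging fold-level scores loses.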

Gilardi, Caimo & Ghosh (2022) apply balanced accuracy in a spatial network context on OS Open Roads segments in Leeds. Their implementation uses 5,000 posterior predictive Monte Carlo simulations to derive a balanced accuracy distribution rather than a single point estimate. Key notes for Open Road Risk:

  • DIC and WAIC are used as in-sample model-comparison tools, not as predictive validation — the paper does not report external holdout performance.
  • MAUP sensitivity analysis (contracting OS segments to longer links) shows that model conclusions are robust to network aggregation, which provides some confidence that OS Open Roads link-level results are not artefacts of segment definition.
  • The paper uses UK OS road segments, making it one of the closest structural analogues to Open Road Risk in the literature.
Caution

Gilardi 2022 Table 2 sign direction for Primary Roads has not been manually verified against the source PDF at this level of extraction confidence. Do not cite specific coefficient signs from that table without checking the original.

AccHR@k — accuracy hit rate at top-k%

Gao et al. (2024) introduce AccHR@k as a ranking quality metric for road risk prediction:

\[\text{AccHR@}k = \frac{|\text{predicted top-}k\% \cap \text{actual crash roads}|}{|\text{predicted top-}k\%|}\]

In words: among the top-\(k\)% of roads ranked by predicted risk, what fraction actually experienced crashes in the evaluation period?

The metric is complementary to balanced accuracy. Balanced accuracy evaluates overall TPR/TNR at a chosen threshold; AccHR@k directly measures whether the model’s high-risk predictions are useful for network screening.
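The definition translates directly into code. The sketch below follows the formula as stated on this page; the function name, the rounding rule for the top-k% cut-off (round up, at least one road), and the toy data are assumptions, and Gao et al.'s own implementation may differ in these details.

```python
import math


def acc_hr_at_k(predicted_risk, crash_counts, k_percent):
    """Share of the top-k% roads by predicted risk that had at least one crash."""
    n = len(predicted_risk)
    top_n = max(1, math.ceil(n * k_percent / 100))
    ranked = sorted(range(n), key=lambda i: predicted_risk[i], reverse=True)
    hits = sum(1 for i in ranked[:top_n] if crash_counts[i] > 0)
    return hits / top_n


# Hypothetical predictions and evaluation-period crash counts for ten roads.
predicted = [0.9, 0.1, 0.8, 0.3, 0.05, 0.7, 0.2, 0.6, 0.4, 0.02]
crashes = [2, 0, 0, 1, 0, 1, 0, 0, 0, 0]

top30 = acc_hr_at_k(predicted, crashes, 30)  # 2 of the top 3 roads had crashes
top10 = acc_hr_at_k(predicted, crashes, 10)  # the single top road had crashes
```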

Gao et al.’s reported AccHR@k values (Table 4, single-year London data) should be treated as indicative rather than directly comparable to Open Road Risk, for three reasons:

  1. No exposure offset: the Gao 2024 model uses a severity-weighted composite response without normalising by AADT or link length. Open Road Risk models exposure-adjusted crash frequency.
  2. Within-year temporal split only: train/validation/test split is 8:2:2 within a single year (2019). No spatial holdout. AccHR@k may be optimistic due to spatial autocorrelation between nearby training and test links.
  3. Single-year London data: may not generalise across Open Road Risk’s multi-year, multi-region scope.
Note

Exact Table 4 values from Gao 2024 require manual verification against the source PDF before being cited numerically. Use the framework (proportion of top-k% predicted roads with actual crashes) rather than the specific numbers.

MPIW and PICP (Gao 2024) are probabilistic uncertainty metrics:

  • MPIW (mean prediction interval width): average width of the 90% or 95% prediction interval across test roads. Lower is better, conditional on adequate coverage.
  • PICP (prediction interval coverage probability): proportion of test observations falling within the stated interval. Should match nominal coverage (e.g., 0.90 for a 90% PI).
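Both quantities are simple averages once interval bounds exist. The following sketch assumes per-road lower and upper bounds are already available from some probabilistic model; the function names and the toy values are illustrative only.

```python
def mpiw(lower, upper):
    """Mean prediction interval width."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def picp(observed, lower, upper):
    """Fraction of observations falling inside their prediction interval."""
    inside = sum(1 for y, l, u in zip(observed, lower, upper) if l <= y <= u)
    return inside / len(observed)


# Hypothetical test roads: observed counts and interval bounds.
observed = [0, 1, 0, 3, 0]
lower = [0, 0, 0, 0, 0]
upper = [1, 2, 1, 2, 1]

width = mpiw(lower, upper)               # 1.4
coverage = picp(observed, lower, upper)  # 0.8: the count of 3 falls outside [0, 2]
```

A narrow MPIW is only meaningful once PICP is close to the nominal level, so the two are best reported together.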

Open Road Risk does not currently produce prediction intervals; these metrics are relevant if a probabilistic output layer is added.


Count model fit metrics

Training loss as a metric: MSE on count data

Pan et al. (2017) train a Deep Belief Network on pooled crash data from Ontario Highway 401, Colorado, and Washington state, using mean squared error as the loss function. The paper reports modest improvements over locally calibrated NB on temporally held-out data — 0–32% MAE reduction depending on dataset, with 0% improvement on rural multilane Washington data.

MSE loss weights every squared residual equally regardless of the expected count, so on zero-heavy data it is dominated by the rare high-count rows; it does not penalise distributional mismatch in the zero regime. Pan et al. (2017) acknowledge “several unsolved questions” in their conclusions. The improvement over NB is marginal for most highway types and disappears entirely for the rural multilane case.

This is the training-loss half of the DBN/MSE caution. The related exposure-structure critique — AADT as a feature rather than a constrained offset, with continuous predictions that lack a count-likelihood interpretation — is documented in Crash Frequency Models §4.

Pseudo-R² (McFadden’s ρ²)

Pseudo-R² for count regression models is defined as:

\[\rho^2 = 1 - \frac{\ell(\hat\beta)}{\ell(\hat\beta_0)}\]

where \(\ell(\hat\beta)\) is the log-likelihood of the fitted model and \(\ell(\hat\beta_0)\) is the log-likelihood of the intercept-only model.
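As a worked illustration of the formula, the sketch below computes ρ² from fitted means under a Poisson likelihood. The Poisson choice, the function names, and the toy counts and fitted values are assumptions for the example; with an exposure offset, the null means would instead be exposure times the overall rate.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu."""
    return sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1)
        for yi, mi in zip(y, mu)
    )


def mcfadden_rho2(y, mu_model, mu_null):
    """McFadden's pseudo-R²: 1 - loglik(model) / loglik(intercept-only)."""
    return 1.0 - poisson_loglik(y, mu_model) / poisson_loglik(y, mu_null)


# Hypothetical counts and fitted means, for illustration only.
y = [0, 0, 1, 0, 2, 0, 0, 1]
mu_model = [0.1, 0.1, 0.9, 0.2, 1.6, 0.2, 0.2, 0.7]
# Intercept-only fit without an offset: every observation gets the sample mean.
mu_null = [sum(y) / len(y)] * len(y)

rho2 = mcfadden_rho2(y, mu_model, mu_null)
```

Because both log-likelihoods are negative, ρ² is 0 when the fitted model does no better than the intercept-only model and approaches 1 only as the model likelihood approaches its maximum.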

Chengye & Ranjitkar (2013) report ρ² values of 0.088–0.194 across negative binomial sub-models for an Auckland motorway (overall model 0.119). These are in-sample values on a dataset with a mean of 8.77 crashes per segment per year — a far higher mean count than Open Road Risk’s link-year data (~0.01–0.02 crashes per link-year). Because pseudo-R² depends on the mean count, these values are not directly comparable to Open Road Risk’s ρ².

Key caveats from Chengye 2013:

  • Chengye & Ranjitkar use an 80% confidence level for variable selection (not the standard 95%). This threshold retains more variables and inflates reported pseudo-R² relative to a stricter selection rule. Open Road Risk should use 95% or cross-validated importance for feature selection.
  • Pseudo-R² is an in-sample diagnostic only. The paper also reports MAD and MSPE on a 2-year temporal holdout (2009–2010), which is the primary validation. Ramp-type sub-models achieve MSPE 27.87 vs 36.60 for the overall model — a ~24% reduction from facility-family splitting.

Lord & Mannering’s (2010) review explicitly warns that “superior in-sample model fit does not necessarily imply practical predictive capability or transferability.” Low pseudo-R² (e.g., 0.05–0.15) is typical for crash-frequency count models and does not indicate model failure; the relevant question is whether predictive performance on held-out data is acceptable.

Inflated R² from regressing on EB outputs

Huda & Al-Kaisy (2024) fit OLS regression to log-transformed Empirical Bayes expected crash counts, achieving adjusted R² of 0.91–0.92. These values are not comparable to pseudo-R² from Open Road Risk’s Poisson GLM or XGBoost R² on raw crash counts, for two reasons:

  1. The response variable (EB expected crashes) is already a smoothed model output, not a zero-heavy integer count. Regressing on a model output reduces variance and inflates R² artificially.
  2. A random 80/20 train/test split (not spatial) allows spatially adjacent 0.05-mile sections from the same road corridor to appear in both sets, creating spatial leakage.
Important

Do not benchmark Open Road Risk’s R² or pseudo-R² against Huda & Al-Kaisy (2024) R² values. They measure fundamentally different quantities.

CURE plots

Roll, Anderson & McNeil (2026) use cumulative residual (CURE) plots as a standard in-sample fit diagnostic for safety performance functions. A CURE plot shows the cumulative sum of residuals (observed minus predicted) against an ordered covariate (typically AADT or link length), with ±2 standard deviation bands:

  • If the cumulative residual stays within the confidence band, the model is adequately calibrated across the covariate range.
  • Systematic exceedances indicate model misspecification at specific volume or length ranges (e.g., the model systematically under-predicts for very high-AADT links).

CURE plots are an in-sample diagnostic, not a measure of predictive generalisation. Roll et al. use CURE plots throughout Section 4 of the Oregon pedestrian SPF report as the primary model-fit assessment tool; no external holdout is reported for the SPF models (only the AADPT exposure model is cross-validated).

For Open Road Risk at 2.1M observations, individual-link CURE plots would be unreadable; aggregating to AADT-quantile bins (e.g., 50 quantile bins) is required to produce an interpretable plot.
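A binned CURE curve can be computed without any plotting library. The sketch below is illustrative: the function name and toy data are assumptions, and the ±2 SD band uses the Hauer and Bamfo variance form, which is a common choice but is not confirmed as the formula used in the reports cited here.

```python
import math


def binned_cure(covariate, observed, predicted, n_bins=50):
    """Cumulative residuals ordered by a covariate, sampled at quantile edges.

    Returns (covariate value, cumulative residual, 2-SD band half-width)
    for each quantile edge.
    """
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    cum, cum_sq = [], []
    running, running_sq = 0.0, 0.0
    for i in order:
        r = observed[i] - predicted[i]
        running += r
        running_sq += r * r
        cum.append(running)
        cum_sq.append(running_sq)

    n, total_sq = len(order), cum_sq[-1]
    edges = sorted({math.ceil(b * n / n_bins) for b in range(1, n_bins + 1)})
    points = []
    for rank in edges:
        s2 = cum_sq[rank - 1]
        # Assumed band: var = s2 * (1 - s2 / total), which closes to zero at the end.
        var = s2 * (1.0 - s2 / total_sq) if total_sq > 0 else 0.0
        band = 2.0 * math.sqrt(max(var, 0.0))
        points.append((covariate[order[rank - 1]], cum[rank - 1], band))
    return points


# Hypothetical links: AADT, observed crashes, predicted crashes.
aadt = [500, 1200, 2500, 4000, 8000, 12000, 20000, 35000]
observed = [0, 1, 0, 0, 2, 0, 1, 3]
predicted = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.2]

points = binned_cure(aadt, observed, predicted, n_bins=4)
```

Sampling the cumulative curve at quantile edges keeps the number of plotted points fixed at `n_bins` regardless of how many link-years are in the data.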

Exposure-only baseline (Roll 2026): The report found no substantial improvement in expected crash frequency prediction from adding built-environment features over a simple exposure-only model (vehicle AADT + pedestrian AADPT). This provides a precedent for running an exposure-only NB/Poisson baseline in Open Road Risk’s Stage 2 and documenting whether the full feature model materially outperforms it.


Cross-validation design

Why V-fold CV is severely optimistic for spatial crash data

Mahoney et al. (2023) provide the most quantitative evidence in this literature set on how CV method choice affects reported performance. Their key finding:

| CV method | % parameter combinations within target RMSE range | Notes |
| --- | --- | --- |
| V-fold (random) | ~2% | Highly optimistic; spatial autocorrelation inflates apparent performance |
| Spatial clustering (best params) | ~60% | Optimal exclusion buffer matches residual autocorrelation range |
| Spatial clustering (mean params) | ~37% | Reasonable middle estimate |
| Block-LOO 3 (BLO3, large buffers) | < V-fold in some settings | Over-exclusion causes pessimistic underfit |

The core mechanism: when nearby road segments appear in both training and test folds (as in V-fold CV), spatial autocorrelation in crash counts means the training data effectively previews the test distribution. Reported RMSE is lower than true out-of-sample error.

Exclusion buffer selection: The optimal buffer matches the autocorrelation range of the outcome residuals (~24–41% of the spatial domain extent in Mahoney’s experiments). Too small → leakage. Too large (BLO3) → too little training data remaining, causing pessimistic underfit.

Police force holdout as a practical approximation: Mahoney et al. suggest using administrative spatial units (e.g., police force areas or local authority boundaries) as a practical grouped spatial holdout when the residual autocorrelation range is not known in advance.

Caution

Mahoney 2023 uses a regular spatial grid, not a road network, and a single crash type in a limited geographic area. The exact CV performance percentages (2%, 37%, 60%) are not directly transferable to Open Road Risk’s OS Open Roads link structure. The directional finding — that V-fold is severely optimistic and spatial clustering is substantially better — is robust and transferable.

Current Open Road Risk CV design: The pipeline uses a grouped link split (held-out links, not held-out years), which controls for within-link temporal autocorrelation but not for spatial autocorrelation across neighbouring links. A spatial clustering split with an exclusion buffer based on residual autocorrelation range would more closely match Mahoney’s best-performing approach.
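The idea of a spatial holdout with an exclusion buffer can be sketched as follows. This is a simplified stand-in, not Mahoney et al.'s procedure: it uses square grid blocks rather than clustering, assigns blocks to folds round-robin, and measures straight-line distance between link midpoints. The function name, the coordinates (assumed to be in metres), and the parameter values are all hypothetical. The pairwise distance check is quadratic and would need a spatial index at Open Road Risk's scale.

```python
import math


def spatial_block_folds(coords, block_size, n_folds, buffer=0.0):
    """Return (train_idx, test_idx) pairs from grid blocks with a buffer.

    Training points within `buffer` of any test point are dropped,
    so they are used in neither set for that fold.
    """
    blocks = [(int(x // block_size), int(y // block_size)) for x, y in coords]
    fold_of_block = {b: i % n_folds for i, b in enumerate(sorted(set(blocks)))}
    folds = []
    for f in range(n_folds):
        test = [i for i, b in enumerate(blocks) if fold_of_block[b] == f]
        if not test:
            continue
        train = []
        for i, b in enumerate(blocks):
            if fold_of_block[b] == f:
                continue
            nearest = min(
                math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
                for j in test
            )
            if nearest > buffer:
                train.append(i)
        folds.append((train, test))
    return folds


# Hypothetical link midpoints along one corridor (metres).
coords = [(0, 0), (50, 0), (150, 0), (950, 0), (1050, 0), (1800, 0), (2500, 0)]
folds = spatial_block_folds(coords, block_size=1000, n_folds=2, buffer=200)
```

In this toy example the links at 950 m and 1050 m sit on opposite sides of a block boundary, so whichever one is in the training side is excluded by the 200 m buffer. This is the leakage path that a plain grouped split leaves open.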


Posterior predictive zero check

Pew, Warr, Schultz & Heaton (2020) describe a procedure for diagnosing whether a fitted count model is well-calibrated with respect to the proportion of zero-crash observations. The check is:

  1. Fit the model and obtain predicted mean counts \(\hat\lambda_i\) for each observation \(i\).
  2. Draw S = 1,000 (or more) replicated datasets. In each draw \(s\), simulate \(\tilde{y}_{is} \sim \text{Poisson}(\hat\lambda_i)\) for all \(i\).
  3. For each draw, count the number of zeros: \(Z_s = \sum_i \mathbf{1}[\tilde{y}_{is} = 0]\).
  4. Record the observed zero count: \(Z_\text{obs} = \sum_i \mathbf{1}[y_i = 0]\).
  5. Compute the posterior predictive p-value \(p = P(Z_s > Z_\text{obs})\), estimated as the fraction of the \(S\) draws in which \(Z_s > Z_\text{obs}\).

Interpretation:

| p-value range | Interpretation |
| --- | --- |
| ≈ 0.50 | Well-calibrated; model generates zeros at the observed rate |
| ≫ 0.50 (e.g., > 0.90) | Model over-generates zeros; predicted λ̂ values too small; likely underdispersion or too many near-zero predictions |
| ≪ 0.50 (e.g., < 0.10) | Model under-generates zeros; predicted λ̂ values too large; possible unmodelled zero-inflation |

The check is in-sample — it uses the fitted λ̂ values, not a holdout. Its value is diagnostic: if \(p \approx 0.50\), zero-inflation is not a modelling concern; if \(p \ll 0.50\), a ZIP or ZINB model should be evaluated.
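The five steps can be sketched with the standard library alone. One shortcut is used and should be read as an assumption of this sketch rather than part of the published procedure: since only the zero count is needed, each replicated observation is drawn as a zero with probability \(e^{-\hat\lambda_i}\), which is exactly the Poisson zero probability, instead of simulating full Poisson counts. The function name and the toy inputs are hypothetical.

```python
import math
import random


def zero_check_p_value(lam_hat, y_obs, n_draws=1000, seed=0):
    """Estimate p = P(Z_s > Z_obs) under Poisson(lam_hat) replicates."""
    rng = random.Random(seed)
    p_zero = [math.exp(-lam) for lam in lam_hat]  # P(y = 0) for each observation
    z_obs = sum(1 for y in y_obs if y == 0)
    exceed = 0
    for _ in range(n_draws):
        z_s = sum(1 for p in p_zero if rng.random() < p)
        if z_s > z_obs:
            exceed += 1
    return exceed / n_draws


# Hypothetical fitted means: 200 observations, each with lambda-hat = 0.5,
# so a Poisson model expects roughly 121 zeros.
lam_hat = [0.5] * 200

# Observed data with far more zeros than that (crashes concentrated on few rows).
y_many_zeros = [0] * 190 + [10] * 10
p_low = zero_check_p_value(lam_hat, y_many_zeros)

# Observed data with far fewer zeros than that.
y_few_zeros = [0] * 60 + [1] * 140
p_high = zero_check_p_value(lam_hat, y_few_zeros)
```

In the first case the model under-generates zeros and the p-value is near 0; in the second it over-generates zeros and the p-value is near 1, matching the interpretation table above.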

Pew 2020 finding on zero-inflation (π ≈ 0): When a ZINB model was fitted to Utah intersection crash data, the zero-inflation parameter π converged to approximately zero. The overdispersion parameter (NB dispersion φ = 17.04) drove the improvement over Poisson, not structural zero-inflation. The authors interpret this as evidence that the zeros in their dataset are adequately explained by the Poisson/NB mean structure rather than requiring a separate zero-generating process.

For Open Road Risk (≈98% link-year zeros), the analogous check has not yet been run. If \(p \ll 0.50\) for the Stage 2 Poisson GLM, a NB model with overdispersion or a two-stage hurdle structure should be considered.

Note

The Pew 2020 π ≈ 0 result is reported in the paper’s appendix. Verify the exact appendix section and table number before citing this value in methods documentation.


Temporal holdout validation

Open Road Risk’s current grouped-by-link split holds out entire links (all years for a given link assigned to either train or test). This controls for within-link temporal leakage — the model cannot memorise a specific link’s crash history across years. It does not enforce any temporal holdout: the model trains and tests on the same calendar years, just different links.

Quddus (2007) demonstrates the value of temporal holdout using INAR(1) and competing time-series count models on UK STATS19 data. The most concrete finding for Open Road Risk is the London congestion charging zone result: the INAR(1) model detects the ~27% casualty reduction from the February 2003 congestion charge using only post-2005 temporal holdout data. This result is consistent across all four model families tested (ARIMA, SARIMA, NB, INAR(1)), confirming that STATS19 data at monthly resolution can detect policy interventions of ~25–30% magnitude when a genuine temporal holdout is used.

The practical implication for Open Road Risk: hold out the most recent 1–2 years of crash data (e.g. 2023–2024) and evaluate Stage 2 model predictions on these unseen years. This tests whether the model’s risk rankings generalise temporally — a different question from the current grouped-link CV, which tests whether risk patterns transfer across links within the same time window.

Chengye and Ranjitkar (2013) provide a template: they hold out 2009–2010 from a 2004–2008 fitting period, report MAD and MSPE on the holdout, and find that ramp-type sub-models reduce holdout MSPE by ~24% relative to the pooled model. This temporal holdout result is more informative than the in-sample pseudo-R² for assessing generalisation.

Note

A temporal holdout (hold out 2023–2024, train on 2015–2022) combined with the existing grouped-link split is the most informative validation design for Open Road Risk. The two splits test different things: grouped-link tests spatial/link generalisation; temporal tests year-to-year stability. Run both before any production model update.
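One way to organise the two splits together is a four-way partition of link-years. The sketch below is an assumption about how this could be structured, not the current pipeline code; the field names `link_id` and `year`, the function name, and the toy panel are illustrative.

```python
def split_link_years(rows, holdout_years, holdout_links):
    """Partition link-year rows by whether the year and the link are held out."""
    parts = {
        "train": [],
        "link_test": [],               # unseen links, training years
        "temporal_test": [],           # training links, unseen years
        "link_and_temporal_test": [],  # unseen links in unseen years
    }
    for row in rows:
        late = row["year"] in holdout_years
        unseen = row["link_id"] in holdout_links
        if late and unseen:
            key = "link_and_temporal_test"
        elif late:
            key = "temporal_test"
        elif unseen:
            key = "link_test"
        else:
            key = "train"
        parts[key].append(row)
    return parts


# Hypothetical panel: four links observed 2015-2024.
rows = [
    {"link_id": link, "year": year}
    for link in ("A", "B", "C", "D")
    for year in range(2015, 2025)
]
parts = split_link_years(rows, holdout_years={2023, 2024}, holdout_links={"D"})
```

Keeping the held-out links out of the training years as well means the temporal test and the link test never share rows with the training set, so each score answers only its own question.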


Serial correlation and standard error validity

Standard Poisson GLM and NB regression assume that observations are independent given the feature set. For Open Road Risk’s panel structure (same road link observed over 10 years), this assumption is violated: crash counts on the same link in consecutive years are correlated, both through unmeasured persistent link characteristics and through genuine temporal autocorrelation in risk.

Quddus (2007) quantifies this for monthly UK crash counts: the INAR(1) thinning parameter \(\hat{\alpha} = 0.355\) for the London CC zone monthly data, indicating ~35% stochastic carry-over from one month to the next. Standard NB models that ignore this produce MAPE 25.27% versus INAR(1) MAPE 18.23% on the same holdout data.

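The direct consequence for Open Road Risk is that GLM coefficient standard errors are likely underestimated, inflating the apparent statistical significance of features. The fix is low-effort: cluster-robust standard errors grouped by link_id in the Stage 2 Poisson GLM. Most Python and R GLM implementations support this through a cluster sandwich estimator (for example, cov_type="cluster" in statsmodels or vcovCL in the R sandwich package). Heteroskedasticity-only estimators such as HC3 do not account for within-link correlation and are not a substitute.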
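For reference, one common form of the cluster sandwich estimator for a Poisson GLM with log link is given below. The notation is introduced here for illustration: \(x_i\) is the feature vector of link-year \(i\), \(\hat\mu_i\) its fitted mean, and \(g\) indexes links. Small-sample corrections, which software packages typically apply, are omitted.

\[\hat{V}_\text{cluster} = A^{-1} B A^{-1}, \qquad A = \sum_i \hat\mu_i\, x_i x_i^\top, \qquad B = \sum_g s_g s_g^\top, \qquad s_g = \sum_{i \in g} (y_i - \hat\mu_i)\, x_i\]

Summing the score contributions within a link before taking the outer product is what allows correlated link-years to widen the standard errors. With one observation per cluster, the expression reduces to the basic heteroskedasticity-robust form.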

Savolainen et al. (2011) confirm this is a general, unresolved challenge in the severity and frequency literature: “crashes that occur in close proximity in space or in time are likely to share important unobserved effects. If such correlations are ignored, there will be a loss of efficiency and parameters will be estimated with less precision.”


Structural explanatory ceiling

Roshandel, Zheng and Washington (2015) synthesise 13 studies on real-time crash prediction from loop detector data and find that road environment and traffic variables explain only a small fraction of crash causation. Citing Lum and Reagan (1995), they note that behavioural factors account for approximately 93% of crash causation, with road environment factors accounting for ~34% (with substantial overlap). Even the best real-time binary prediction models reviewed achieve only 66–80% correct prediction with false positive rates of 5–20%, and the paper concludes these models are “currently unsuitable for implementation at the real world operational level.”

This is directly relevant to Open Road Risk’s expected performance. The pipeline models road-environment risk — road class, AADT, geometry, land use — and explicitly excludes behavioural factors (individual driver behaviour, vehicle condition, alcohol involvement). A structural ceiling on discriminative power is therefore expected, regardless of model choice or feature engineering. Open Road Risk’s appropriate objective is aggregate risk ranking for network screening, not individual-event crash prediction. These are different tasks with different achievable performance levels.

Note

Open Road Risk’s XGBoost held-out pseudo-R² of ~0.32 is not a failure — it is consistent with what road-environment models achieve when behavioural factors dominate crash causation. The relevant benchmark is whether the model’s risk rankings identify links with disproportionate observed crash rates, not whether it predicts individual crash events.


Random split weakness: a documented example

Sung et al. (2024) develop a modified temporal SPF on 1,095 Korean highway segments (2018–2022) using XGBoost, LGBM, and random forest, reporting R² up to 0.83 for the AADT XGBoost model. These values should not be compared directly to Open Road Risk’s grouped-CV metrics for two reasons:

  1. Random 8:2 split on panel data: the same road segment can appear in both training and test sets in different time periods, creating leakage through persistent segment characteristics.
  2. Balanced 1:1 undersampling for 15-minute models: the 15-minute dataset was undersampled to 1:1 crash/non-crash ratio (from ~18% crash rate). This inflates apparent model performance by ~50× relative to the true base rate at deployment.

Open Road Risk’s grouped-by-link split is more conservative than this design. The comparison is documented here to clarify why high R² values from other papers should not be taken as evidence that better-performing models are achievable in Open Road Risk without equivalent validation weaknesses.


CURE plots as a functional-form diagnostic

Dutta and Fontaine (2020) use cumulative residual (CURE) plots throughout their Virginia freeway SPF analysis as the primary in-sample model-fit diagnostic. Their average-hourly volume models pass the CURE plot test (cumulative residuals within ±2 SD bands), providing evidence that the functional form is adequate across the volume range. Importantly, raw hourly models fail this check in some volume ranges, consistent with the overall finding that raw hourly data (with 23% quality failures) underperforms smoothed average-hourly data.

For Open Road Risk, CURE plots by AADT quantile and link-length quantile are a low-effort diagnostic that would reveal whether the Stage 2 model systematically under- or over-predicts at specific exposure ranges. At 2.17M rows, individual-link CURE plots are unreadable; 50-quantile bins of AADT and length are appropriate.

Roll et al. (2026) use the same diagnostic — CURE plots are standard SPF practice in the US Highway Safety Manual context.


Low-count rate comparison

National Highways (2022) proposes formal tests and confidence intervals for comparing collision and casualty rates between roads, road types, or time periods. The core collision model is a Poisson rate model with vehicle miles as exposure, so it is consistent with the Open Road Risk offset structure. Its main validation relevance is not a new predictive metric, but a caution about inference in sparse settings: asymptotic Z-test p-values can be misleading when traffic volumes or collision counts are low.

For Open Road Risk, this supports treating any link-level “significance” flag cautiously, especially on low-AADT links with zero or one observed collision. The National Highways Monte-Carlo likelihood-ratio test is more appropriate for pairwise aggregate comparisons than for production-scale ranking across 2.1 million links. It is therefore best treated as methodological support for diagnostic rate comparisons, not as a replacement for grouped holdout validation.

The same document explicitly warns that traffic estimates may be inaccurate and recommends sensitivity analysis around the traffic denominator. That maps directly to Open Road Risk’s Stage 1a uncertainty problem: varying AADT estimates by plausible percentages and checking whether risk percentiles or rate comparisons change direction would be a useful diagnostic, but not formal uncertainty propagation.
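A denominator sensitivity check of this kind can be sketched as below. This is not the National Highways procedure: the function names, the ±pct grid of AADT scalings, the rate unit (collisions per billion vehicle miles), and the two example links are assumptions made for illustration.

```python
from itertools import product


def rate_per_billion_vehicle_miles(collisions, aadt, length_miles, years):
    """Collision rate with vehicle miles travelled as the exposure."""
    vehicle_miles = aadt * 365 * length_miles * years
    return collisions / vehicle_miles * 1e9


def ranking_is_stable(link_a, link_b, pct):
    """True if 'A riskier than B' keeps its direction when each AADT is
    scaled by (1 - pct), 1, or (1 + pct), in every combination."""
    def rate(link, factor):
        return rate_per_billion_vehicle_miles(
            link["collisions"], link["aadt"] * factor,
            link["length_miles"], link["years"],
        )

    baseline = rate(link_a, 1.0) > rate(link_b, 1.0)
    factors = (1.0 - pct, 1.0, 1.0 + pct)
    return all(
        (rate(link_a, fa) > rate(link_b, fb)) == baseline
        for fa, fb in product(factors, repeat=2)
    )


# Hypothetical links with similar rates but different estimated AADT.
link_a = {"collisions": 3, "aadt": 5000, "length_miles": 1.0, "years": 5}
link_b = {"collisions": 4, "aadt": 8000, "length_miles": 1.0, "years": 5}

stable_at_5pct = ranking_is_stable(link_a, link_b, 0.05)   # ordering holds
stable_at_20pct = ranking_is_stable(link_a, link_b, 0.20)  # ordering can flip
```

A comparison that flips within a plausible AADT error range is a signal to report the two links as indistinguishable rather than ranked.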



Note

For a consolidated view of how the findings on this page map to the current pipeline state, open gaps, and recommended diagnostic actions, see the Literature–Pipeline Alignment page.


References

| ID | Citation |
| --- | --- |
| LIT-011 | Brodersen, K.H., Ong, C.S., Stephan, K.E. & Buhmann, J.M. (2010). The balanced accuracy and its posterior distribution. ICPR 2010. |
| LIT-012/013/014 | Gilardi, A., Caimo, A. & Ghosh, S. (2022). Network lattice models for road collision analyses. SSRN preprint. |
| LIT-009 | Chengye, P. & Ranjitkar, P. (2013). Modelling motorway accidents using negative binomial regression. EASTS Proceedings. |
| LIT-028/045 | Roll, J., Anderson, J. & McNeil, N. (2026). Developing a pedestrian safety performance function for Oregon. FHWA-OR-RD-26-06. (Use combined record LIT-045.) |
| LIT-019 | Lord, D. & Mannering, F. (2010). The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transportation Research Part A. DOI: 10.1016/j.tra.2010.02.001 |
| LIT-016/042 | Huda, K.T. & Al-Kaisy, A. (2024). Network screening on low-volume roads using risk factors. Future Transportation. DOI: 10.3390/futuretransp4010013 (Use combined record LIT-042.) |
| LIT-033 | Mahoney, M.J., Johnson, L.K., Silge, J., Frick, H., Kuhn, M. & Beier, C.M. (2023). Assessing the performance of spatial cross-validation approaches. |
| LIT-032 | Pew, T., Warr, R.L., Schultz, G.G. & Heaton, M. (2020). Justification for considering zero-inflated models in crash frequency analysis. TRIP. |
| LIT-025/037 | Pan, G., Fu, L. & Thakali, L. (2017). Development of a global road safety performance function using deep neural networks. International Journal of Transportation Science and Technology, 6(3), 159–173. DOI: 10.1016/j.ijtst.2017.07.004 |
| LIT-034 | Gao, X. et al. (2024). Uncertainty-aware probabilistic graph neural networks for road-level traffic crash prediction. |
| LIT-048 | Quddus, M.A. (2007). Time series count data models: an empirical application to traffic accidents. Accident Analysis and Prevention. hdl.handle.net/2134/5308 |
| LIT-053 | Savolainen, P.T., Mannering, F.L., Lord, D. & Quddus, M.A. (2011). The statistical analysis of highway crash-injury severities. Accident Analysis and Prevention, 43(5), 1666–1676. DOI: 10.1016/j.aap.2011.03.025 |
| LIT-054 | Roshandel, S., Zheng, Z. & Washington, S. (2015). Impact of real-time traffic characteristics on freeway crash occurrence: systematic review and meta-analysis. Accident Analysis and Prevention. |
| LIT-051 | Dutta, N. & Fontaine, M.D. (2020). Improving freeway crash prediction models using disaggregate flow state information. VTRC 20-R15, Virginia DOT. |
| LIT-055 | National Highways (2022). Statistical methods for comparing road traffic collision and casualty rates: proposed approach. National Highways PR81/22. |
| LIT-052 | Sung, Y., Kim, S., Park, J. & Wang, L. (2024). Development of modified temporal safety performance function considering various time flows. Journal of Advanced Transportation. DOI: 10.1155/2024/7970454 |
