Network Model GDB

1 Overview

The Network Model File Geodatabase provides a link-node representation of the road network, with related tables for lanes, roads, streets, junctions, access restrictions, vehicle restrictions, and turn restrictions.

In this project it is most useful as a source of network structure and road-infrastructure exposure. It can support structural risk proxies such as junction complexity, carriageway type, lane count, and restriction density. Before treating any observed-traffic, collision, speed, or temporal-flow concept as a model input, check explicitly that the corresponding field or table is populated in this extract.

Important

Use this dataset as a road supply and network structure source. For calibrated road risk, join it to observed exposure datasets such as AADF or WebTRIS and to collision data such as STATS19.

2 Role In The Pipeline

  • Provides link-level road geometry and attributes for road exposure features.
  • Provides lane, junction, and restriction tables for structural risk features.
  • Supports completeness checks against OS Open Roads, MRDB, and modelled road links.
  • Can be aggregated by year, geography, road class, ownership, or operational status.
  • Helps distinguish structural exposure from measured or estimated traffic exposure.

Recommended modelling grain:

one row per Link.linkid

The Link layer should be treated as the spine of the feature table. Related tables can be joined or aggregated to linkid.
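
As an illustration of this grain, the sketch below aggregates a related table to linkid and left-joins the result onto the link spine. The two small frames are hypothetical stand-ins for the Link and Lane layers, and the output column names are assumptions rather than the project's actual feature names.

```python
import pandas as pd

# Hypothetical stand-ins for the Link and Lane layers.
link_spine = pd.DataFrame(
    {"linkid": ["a", "b", "c"], "linkcategory": ["M", "A", "A"]}
)
lane_rows = pd.DataFrame(
    {"linkid": ["a", "a", "a", "b"], "averagewidth": [3.6, 3.7, 3.5, 3.2]}
)

# Collapse the many-rows-per-link table to one row per linkid.
lane_summary = lane_rows.groupby("linkid", as_index=False).agg(
    lane_rows=("averagewidth", "size"),
    avg_lane_width_m=("averagewidth", "mean"),
)

# A left join keeps every link; validate guards against accidental fan-out.
features = link_spine.merge(
    lane_summary, on="linkid", how="left", validate="one_to_one"
)

# The spine must keep its row count after every join.
assert len(features) == len(link_spine)
print(features)
```

Links with no related rows keep missing values here rather than zeros, so a later step can decide whether "no rows" means "none exist" or "not recorded".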

3 Load And Inspect

Set gdb_path to the local File Geodatabase directory. The project root is resolved from road_risk.config so the notebook works whether it is run from the repository root or from quarto/data-sources.

Code
from pathlib import Path

import fiona
import geopandas as gpd
import pandas as pd

from road_risk.config import _ROOT

pd.set_option("display.max_columns", 80)

gdb_path = _ROOT / (
    "data/raw/Network_Model_(Public_Download)/"
    "d77ab8dc-afaa-4475-af6f-7dfdaea5b135.gdb"
)

4 Source Documentation

Official source page: Network Model Public, National Highways on data.gov.uk. The source page describes the dataset as representing England’s Strategic Road Network and notes that speed limit and smart motorway information were removed from the initial release pending validation.

5 Coverage Summary

The coverage summary below is computed from the Link layer. It is intended to answer the first modelling question: whether this dataset is a full road-network source or a specialist source for a subset of roads.

Code
links_for_coverage = gpd.read_file(gdb_path, layer="Link")

category_counts = (
    links_for_coverage["linkcategory"]
    .fillna("<missing>")
    .astype(str)
    .value_counts(dropna=False)
)
srn_counts = (
    links_for_coverage["srn"]
    .fillna("<missing>")
    .astype(str)
    .value_counts(dropna=False)
)
ownership_counts = (
    links_for_coverage["ownership"]
    .fillna("<missing>")
    .astype(str)
    .value_counts(dropna=False)
)

link_count = len(links_for_coverage)
link_length_km = links_for_coverage["SHAPE_Length"].sum() / 1000
motorway_or_a = links_for_coverage["linkcategory"].isin(["M", "A"]).sum()
motorway_or_a_pct = 100 * motorway_or_a / link_count
srn_y = srn_counts.get("Y", 0)
srn_y_pct = 100 * srn_y / link_count
nh_owned = ownership_counts.get("NH", 0)
nh_owned_pct = 100 * nh_owned / link_count

coverage_summary = pd.DataFrame(
    [
        {"metric": "links", "value": f"{link_count:,.0f}"},
        {"metric": "link geometry length km", "value": f"{link_length_km:,.1f}"},
        {"metric": "motorway or A-road links", "value": f"{motorway_or_a:,.0f} ({motorway_or_a_pct:.1f}%)"},
        {"metric": "SRN flag = Y", "value": f"{srn_y:,.0f} ({srn_y_pct:.1f}%)"},
        {"metric": "ownership = NH", "value": f"{nh_owned:,.0f} ({nh_owned_pct:.1f}%)"},
        {"metric": "bounds EPSG:4326", "value": ", ".join(f"{x:.3f}" for x in links_for_coverage.to_crs(4326).total_bounds)},
    ]
)

display(coverage_summary)
display(category_counts.rename_axis("linkcategory").reset_index(name="links"))
   metric                     value
0  links                      42,960
1  link geometry length km    23,932.3
2  motorway or A-road links   42,925 (99.9%)
3  SRN flag = Y               42,936 (99.9%)
4  ownership = NH             42,960 (100.0%)
5  bounds EPSG:4326           -5.513, 50.129, 1.756, 55.807

   linkcategory  links
0  A             28844
1  M             14081
2  U             22
3  B             13
Code
print(
    "Interpretation: this GDB is best treated as an authoritative National "
    "Highways / Strategic Road Network source. It is not a full all-road "
    "network. In this extract, coverage is overwhelmingly motorway and "
    "trunk A-road links managed by National Highways. It should therefore "
    "be integrated into the project as a facility-family-conditional source: "
    "rich SRN features for SRN links, not imputed pseudo-coverage for the "
    "wider local-road network."
)
Interpretation: this GDB is best treated as an authoritative National Highways / Strategic Road Network source. It is not a full all-road network. In this extract, coverage is overwhelmingly motorway and trunk A-road links managed by National Highways. It should therefore be integrated into the project as a facility-family-conditional source: rich SRN features for SRN links, not imputed pseudo-coverage for the wider local-road network.
Code
non_srn_links = links_for_coverage[
    links_for_coverage["srn"].fillna("<missing>").astype(str).ne("Y")
].copy()

if non_srn_links.empty:
    print("No non-SRN links found in the Link layer.")
else:
    cols = [
        "linkid",
        "roadname",
        "linkcategory",
        "linkform",
        "carriageway",
        "ownership",
        "srn",
        "operationalstate",
        "SHAPE_Length",
    ]
    display(
        non_srn_links[cols]
        .sort_values(["roadname", "linkcategory", "linkform"])
        .reset_index(drop=True)
    )
linkid roadname linkcategory linkform carriageway ownership srn operationalstate SHAPE_Length
0 {AA2D189D-A767-5DAB-DA16-03BE79C5320A} A12 A SL M NH N O 447.799507
1 {AA2B6DE3-FF0A-1F5B-5D55-B86E908615CA} A2270 A SC A NH N O 102.007383
2 {F8F74D61-2FCB-48F7-8E72-DC9A481C686A} A23 A DC A NH N O 126.257730
3 {92946B85-E28B-0346-27D5-AC4BD2A58234} A293 A SC A NH N O 19.971946
4 {412F7E8F-1A83-1F7B-875F-628E4378D1EE} A293 A SC A NH N O 16.548472
5 {8C55DC9B-C39B-4F17-EB98-0ADD99E294B9} A293 A SC A NH N O 24.599465
6 {2BBF8039-DACD-4C37-E2D6-949AF00FF0EF} A303 A SL K NH N O 166.813493
7 {F7024204-FB1E-22C9-4C13-CAB742C05014} A47 A SL J NH N O 236.004304
8 {7A9381E0-9ABC-F309-49F9-35F74DACC01F} A66 A SL L NH N O 129.248274
9 {0E7AE448-0182-154B-C220-A1402509AACB} A66 A SL J NH N O 112.563493
10 {A1F5A0EC-2778-BE11-94BC-0FA3B7103116} M1 U SR X NH N O 1372.397979
11 {E61D80B1-6A39-0EE3-C245-C68DABC0E78E} M1 U SR X NH N O 48.496226
12 {2292C08B-BF4D-0D29-2C73-3ACA94492933} M1 U SR X NH N O 1418.420468
13 {D3360D8B-F3B5-2F8C-618C-B0CC4E1F38D6} M25 M SR X NH N O 68.832008
14 {978452EA-3164-C8C1-01EA-911C8DFD01A6} M25 M SR X NH N O 63.410833
15 {C6BC13A2-8CFA-10D4-5D87-505A87698952} M25 M SR X NH N O 44.360923
16 {EC0C811A-C2EC-C46C-4DCF-9CEE829AB7AA} M4 M SL L NH N O 298.064144
17 {6381DEAE-C210-741F-C2C1-36690AF6BCD0} M4 M SL L NH N O 439.638495
18 {78F6A432-5A91-464A-26B5-44AE28098215} M4 M SL K NH N O 362.996218
19 {049BD5B0-4E5B-B8A0-D9B9-3F9EFDC34E69} M4 M SL J NH N O 337.686443
20 {0B63E859-6260-8091-C73B-03CA5A048653} Unclassified Unnamed Road U SL J NH N O 39.363534
21 {785C183B-D42D-BBCE-EE74-F4A06104B633} Unclassified Unnamed Road U SL K NH N O 63.440015
22 {DCF08C31-FD10-AACD-7010-24EAC0745756} Unclassified unnamed road U DC A NH N O 60.217838
23 {9E61B238-6CFF-A1C2-0D5E-91E8C0CCC83C} Unclassified unnamed road U DC B NH N O 76.430320
Note

For this project, the key integration consequence is that the Network Model GDB does not replace the all-road backbone from OS Open Roads or OSM. It enriches the SRN subset with more authoritative geometry, lane, carriageway, grade-separation, and restriction features.

Tip

For exposure features such as lane_km, remember that the link geometry already contains separated carriageways, slip roads, junction arms, and directional splits. Avoid applying a second manual two-direction multiplier unless the feature definition explicitly requires it.
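
One further caveat on length-based exposure: the source snapshot reports the Link layer in EPSG:3857 (Web Mercator). If SHAPE_Length is measured in that CRS, as is usual for File Geodatabase length fields, it overstates ground distance by approximately 1 / cos(latitude). The sketch below is an illustration rather than part of the project pipeline. It estimates that factor at the southern and northern bounds reported in the coverage summary. The function names are hypothetical.

```python
import math


def web_mercator_length_scale(latitude_deg: float) -> float:
    """Approximate factor by which EPSG:3857 inflates ground distance."""
    return 1.0 / math.cos(math.radians(latitude_deg))


def approx_ground_length_m(shape_length: float, latitude_deg: float) -> float:
    """Deflate a Web Mercator length to an approximate ground length."""
    return shape_length / web_mercator_length_scale(latitude_deg)


# Southern and northern latitude bounds from the coverage summary (EPSG:4326).
for lat in (50.129, 55.807):
    scale = web_mercator_length_scale(lat)
    print(f"lat {lat:.3f}: scale ~ {scale:.2f}")
```

At these latitudes the factor is roughly 1.56 to 1.78, so length_km and lane_km taken directly from SHAPE_Length would be inflated by a similar margin. A more robust route is probably to reproject before measuring, for example links.to_crs(27700).length for British National Grid metres, and to confirm the result against a known corridor length.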

6 Constraints On Use

These checks are deliberately near the top of the page because they affect whether and how this source should be integrated into the modelling pipeline.

Code
def active_in_year(df: pd.DataFrame, year: int) -> pd.Series:
    # Half-open window [1 Jan year, 1 Jan year+1), so a link that starts at
    # any time on 31 December still counts as active in that year.
    year_start = pd.Timestamp(year=year, month=1, day=1, tz="UTC")
    next_year_start = pd.Timestamp(year=year + 1, month=1, day=1, tz="UTC")

    start = pd.to_datetime(df["startdate"], errors="coerce", utc=True)
    end = pd.to_datetime(df["enddate"], errors="coerce", utc=True)

    # Missing dates are treated as open-ended validity.
    starts_before_year_end = start.isna() | (start < next_year_start)
    ends_after_year_start = end.isna() | (end >= year_start)
    return starts_before_year_end & ends_after_year_start

years = range(2015, 2025)
yearly_validity = []
for year in years:
    active = links_for_coverage[active_in_year(links_for_coverage, year)]
    yearly_validity.append(
        {
            "year": year,
            "active_links_by_validity_dates": len(active),
            "active_link_length_km": active["SHAPE_Length"].sum() / 1000,
        }
    )

yearly_validity = pd.DataFrame(yearly_validity)

with fiona.open(gdb_path, layer="Speed_Limit") as src:
    speed_limit_rows = len(src)

constraints = pd.DataFrame(
    [
        {
            "constraint": "Coverage family",
            "finding": f"{motorway_or_a_pct:.1f}% of links are motorway or A-road; {srn_y_pct:.1f}% have srn = Y.",
            "modelling_implication": "Treat as SRN / trunk-road enrichment, not all-road coverage.",
        },
        {
            "constraint": "Validity-date coverage",
            "finding": (
                f"{(yearly_validity['active_links_by_validity_dates'] == 0).sum()} "
                "model years have zero active links under startdate/enddate."
            ),
            "modelling_implication": "Do not use validity dates as proof of historical availability without source confirmation.",
        },
        {
            "constraint": "Speed limit",
            "finding": f"Speed_Limit rows: {speed_limit_rows:,}.",
            "modelling_implication": "Use another source for speed-limit features if this table is empty.",
        },
        {
            "constraint": "Smart motorway",
            "finding": (
                f"Non-missing smartmotorway values: "
                f"{links_for_coverage['smartmotorway'].notna().sum():,}."
            ),
            "modelling_implication": "Do not create a smart_motorway_flag unless values are populated.",
        },
        {
            "constraint": "Constant fields",
            "finding": (
                f"ownership unique values: {links_for_coverage['ownership'].nunique(dropna=True)}; "
                f"operationalstate unique values: {links_for_coverage['operationalstate'].nunique(dropna=True)}."
            ),
            "modelling_implication": "Constant fields are useful QA signals but not predictive features.",
        },
    ]
)

display(constraints)
display(yearly_validity)
   constraint              finding                                             modelling_implication
0  Coverage family         99.9% of links are motorway or A-road; 99.9% h...  Treat as SRN / trunk-road enrichment, not all-...
1  Validity-date coverage  5 model years have zero active links under sta...  Do not use validity dates as proof of historic...
2  Speed limit             Speed_Limit rows: 0.                                Use another source for speed-limit features if...
3  Smart motorway          Non-missing smartmotorway values: 0.                Do not create a smart_motorway_flag unless val...
4  Constant fields         ownership unique values: 1; operationalstate u...  Constant fields are useful QA signals but not ...

   year  active_links_by_validity_dates  active_link_length_km
0  2015  0                               0.000000
1  2016  0                               0.000000
2  2017  0                               0.000000
3  2018  0                               0.000000
4  2019  0                               0.000000
5  2020  75                              19.008889
6  2021  84                              21.349183
7  2022  41725                           23531.010355
8  2023  41880                           23576.021013
9  2024  42796                           23855.699972
Warning

Under startdate and enddate, no links are active in 2015-2019 and fewer than 100 are active before 2022, so these fields do not provide reliable historical coverage for this project’s full 2015-2024 modelling window without additional source confirmation. Treat the GDB as a current SRN structural snapshot unless a separate historical validity method is established.

Important

The clean integration path is facility-family conditional: use Network Model features inside an SRN-specific model or SRN feature branch, and keep non-SRN links on the OS Open Roads / OSM feature set. Imputing these authoritative SRN fields across the full network would create the same kind of coverage bias as sparse OSM-derived features.
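
A minimal sketch of that conditional join follows, assuming a crosswalk from backbone links to Network Model linkid values already exists. The backbone frame, the nm_linkid column, and the has_network_model flag are all hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical all-road backbone; nm_linkid is an assumed crosswalk column
# (for example one built via toid), not a field in any source dataset.
backbone = pd.DataFrame(
    {"backbone_id": [1, 2, 3, 4], "nm_linkid": ["a", "b", None, None]}
)
nm_features = pd.DataFrame({"linkid": ["a", "b"], "lane_count": [3, 2]})

# Left join with an indicator so coverage is recorded explicitly.
joined = backbone.merge(
    nm_features,
    left_on="nm_linkid",
    right_on="linkid",
    how="left",
    indicator=True,
)
joined["has_network_model"] = joined["_merge"] == "both"
joined = joined.drop(columns=["_merge", "linkid"])

# SRN branch keeps the enriched fields; the other branch drops them
# instead of imputing values for links the GDB never covered.
srn_branch = joined[joined["has_network_model"]]
other_branch = joined[~joined["has_network_model"]].drop(columns=["lane_count"])
print(srn_branch)
print(other_branch)
```

Keeping the coverage flag as its own column means either branch can be audited later without re-running the join.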

7 Source Snapshot

The inventory below is generated from the File Geodatabase at render time.

Code
layer_roles = {
    "Node": "Network topology and junction proximity",
    "Link": "Main road segment geometry and attributes",
    "Access_Restriction": "Access restriction descriptions",
    "Vehicle_Restriction": "Vehicle restriction descriptions",
    "Turn_Restriction": "Turn movement restrictions",
    "Street": "Street-level metadata and surface",
    "Lane": "Lane-level widths and offsets",
    "Speed_Limit": "Speed limit records if populated",
    "Access_Restriction_Reference": "Links restrictions to links",
    "Access_Restriction_Inclusion": "Access inclusion details",
    "Access_Restriction_Exemption": "Access exemption details",
    "Junction": "Junction type and naming",
    "Junction_Reference": "Links junctions to nodes",
    "Road": "Road name and classification",
    "Road_Reference": "Links roads to links",
    "Street_Reference": "Links streets to links",
    "Turn_Restriction_Reference": "Links turn restrictions to ordered links",
    "Turn_Restriction_Inclusion": "Turn inclusion details if populated",
    "Turn_Restriction_Exemption": "Turn exemption details",
    "Vehicle_Restriction_Reference": "Links vehicle restrictions to links/nodes",
    "Vehicle_Node_Restriction_Reference": "Vehicle-node restriction references",
    "Vehicle_Restriction_Inclusion": "Vehicle restriction inclusion details",
    "Vehicle_Restriction_Exemption": "Vehicle restriction exemption details",
    "Street_Interest": "Street works interest metadata",
    "Street_Construction": "Street construction metadata",
    "Street_Special_Designation": "Street special designation metadata",
    "Street_Special_Designation_Points": "Point special designations",
    "Street_Special_Designation_Lines": "Line special designations",
    "Street_Special_Designation_Polygons": "Polygon special designations",
}

rows = []
for layer in fiona.listlayers(gdb_path):
    with fiona.open(gdb_path, layer=layer) as src:
        crs = src.crs.to_string() if src.crs else None
        rows.append(
            {
                "layer": layer,
                "rows": len(src),
                "crs": crs,
                "fields": len(src.schema["properties"]),
                "geometry": src.schema.get("geometry"),
                "main_use": layer_roles.get(layer, "Review before use"),
            }
        )

inventory = pd.DataFrame(rows).sort_values(["rows", "layer"], ascending=[False, True])
display(inventory)
layer rows crs fields geometry main_use
9 Lane 104840 None 15 None Lane-level widths and offsets
18 Street_Reference 86191 None 8 None Links streets to links
17 Road_Reference 49345 None 9 None Links roads to links
1 Link 42960 EPSG:3857 28 3D MultiLineString Main road segment geometry and attributes
19 Turn_Restriction_Reference 39101 None 11 None Links turn restrictions to ordered links
0 Node 37874 EPSG:3857 12 3D Point Network topology and junction proximity
4 Turn_Restriction 37474 EPSG:3857 9 3D MultiLineString Turn movement restrictions
5 Street 9274 EPSG:3857 17 3D MultiLineString Street-level metadata and surface
15 Junction_Reference 6946 None 9 None Links junctions to nodes
14 Junction 762 None 11 None Junction type and naming
16 Road 617 None 9 None Road name and classification
11 Access_Restriction_Reference 246 None 11 None Links restrictions to links
2 Access_Restriction 108 EPSG:3857 9 3D Point Access restriction descriptions
13 Access_Restriction_Exemption 40 None 9 None Access exemption details
12 Access_Restriction_Inclusion 40 None 9 None Access inclusion details
3 Vehicle_Restriction 18 EPSG:3857 11 3D Point Vehicle restriction descriptions
22 Vehicle_Restriction_Reference 18 None 13 None Links vehicle restrictions to links/nodes
21 Turn_Restriction_Exemption 1 None 9 None Turn exemption details
10 Speed_Limit 0 None 17 None Speed limit records if populated
27 Street_Construction 0 None 11 None Street construction metadata
26 Street_Interest 0 None 13 None Street works interest metadata
28 Street_Special_Designation 0 None 13 None Street special designation metadata
7 Street_Special_Designation_Lines 0 EPSG:3857 15 3D MultiLineString Line special designations
6 Street_Special_Designation_Points 0 EPSG:3857 14 3D Point Point special designations
8 Street_Special_Designation_Polygons 0 EPSG:3857 16 3D MultiPolygon Polygon special designations
20 Turn_Restriction_Inclusion 0 None 9 None Turn inclusion details if populated
23 Vehicle_Node_Restriction_Reference 0 None 9 None Vehicle-node restriction references
25 Vehicle_Restriction_Exemption 0 None 9 None Vehicle restriction exemption details
24 Vehicle_Restriction_Inclusion 0 None 9 None Vehicle restriction inclusion details
Note

Layers with zero rows should not be dropped from the documentation entirely. Their presence is useful because it shows that a concept exists in the schema even though this extract carries no records for it.

Code
empty_layers = inventory.loc[inventory["rows"].eq(0), ["layer", "geometry", "main_use"]]

if empty_layers.empty:
    print("No empty layers in this GDB extract.")
else:
    display(empty_layers)
layer geometry main_use
10 Speed_Limit None Speed limit records if populated
27 Street_Construction None Street construction metadata
26 Street_Interest None Street works interest metadata
28 Street_Special_Designation None Street special designation metadata
7 Street_Special_Designation_Lines 3D MultiLineString Line special designations
6 Street_Special_Designation_Points 3D Point Point special designations
8 Street_Special_Designation_Polygons 3D MultiPolygon Polygon special designations
20 Turn_Restriction_Inclusion None Turn inclusion details if populated
23 Vehicle_Node_Restriction_Reference None Vehicle-node restriction references
25 Vehicle_Restriction_Exemption None Vehicle restriction exemption details
24 Vehicle_Restriction_Inclusion None Vehicle restriction inclusion details

8 Expected Values

Expected values should be checked from the GDB rather than hard-coded into the model. The useful fields are mostly categorical domains and linkable keys.

8.1 Core Link Fields

The Link layer fields and their project roles are generated below from the source schema.

Code
link_field_roles = {
    "linkid": ("Unique link key", "Primary feature table key"),
    "linkref": ("Human-readable link reference", "Diagnostics and matching"),
    "linkcategory": ("Road category", "Exposure and risk stratification"),
    "linkdesc": ("Road description", "Diagnostics"),
    "linkform": ("Physical or functional link form", "Structural risk feature"),
    "directionality": ("One-way / two-way status", "Routing and conflict proxy"),
    "direction": ("Direction description", "Routing and diagnostics"),
    "numberoflanes": ("Lane count", "Exposure and capacity proxy"),
    "smartmotorway": ("Smart motorway flag/category", "Availability check before modelling"),
    "carriageway": ("Carriageway type", "Risk and severity proxy"),
    "ownership": ("Owning authority/operator", "Governance and coverage"),
    "startgradeseparation": ("Start grade-separation level", "Junction and conflict proxy"),
    "endgradeseparation": ("End grade-separation level", "Junction and conflict proxy"),
    "parentlinkref": ("Parent link reference", "De-duplication / hierarchy"),
    "srn": ("Strategic Road Network flag/category", "Major-network segmentation"),
    "startnode": ("From-node key", "Topology"),
    "endnode": ("To-node key", "Topology"),
    "operationalstate": ("Operational status", "Filtering and coverage"),
    "roadname": ("Road name", "Reporting and corridor grouping"),
    "startdate": ("Valid-from date", "Yearly coverage"),
    "enddate": ("Valid-to date", "Yearly coverage"),
    "toid": ("Topographic object identifier", "Cross-dataset matching"),
    "SHAPE_Length": ("Link length in CRS units", "Length exposure"),
}

with fiona.open(gdb_path, layer="Link") as src:
    link_schema = pd.DataFrame(
        [
            {"field": field, "source_type": source_type}
            for field, source_type in src.schema["properties"].items()
        ]
    )

link_roles = pd.DataFrame(
    [
        {"field": field, "expected_role": role, "model_use": model_use}
        for field, (role, model_use) in link_field_roles.items()
    ]
)

display(link_schema.merge(link_roles, on="field", how="left"))
field source_type expected_role model_use
0 linkid str Unique link key Primary feature table key
1 linkref str:255 Human-readable link reference Diagnostics and matching
2 linkcategory str:255 Road category Exposure and risk stratification
3 linkdesc str:255 Road description Diagnostics
4 linkform str:255 Physical or functional link form Structural risk feature
5 directionality str:255 One-way / two-way status Routing and conflict proxy
6 direction str:255 Direction description Routing and diagnostics
7 numberoflanes int32 Lane count Exposure and capacity proxy
8 smartmotorway str:255 Smart motorway flag/category Availability check before modelling
9 carriageway str:255 Carriageway type Risk and severity proxy
10 ownership str:255 Owning authority/operator Governance and coverage
11 startgradeseparation int16 Start grade-separation level Junction and conflict proxy
12 endgradeseparation int16 End grade-separation level Junction and conflict proxy
13 parentlinkref str Parent link reference De-duplication / hierarchy
14 srn str:255 Strategic Road Network flag/category Major-network segmentation
15 startnode str From-node key Topology
16 endnode str To-node key Topology
17 operationalstate str:255 Operational status Filtering and coverage
18 roadname str:255 Road name Reporting and corridor grouping
19 startdate datetime Valid-from date Yearly coverage
20 enddate datetime Valid-to date Yearly coverage
21 toid str:255 Topographic object identifier Cross-dataset matching
22 globalid str NaN NaN
23 created_user str:255 NaN NaN
24 created_date datetime NaN NaN
25 last_edited_user str:255 NaN NaN
26 last_edited_date datetime NaN NaN
27 SHAPE_Length float Link length in CRS units Length exposure

Key categorical domains to profile:

linkcategory
linkform
directionality
direction
smartmotorway
carriageway
ownership
srn
operationalstate
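
One way to profile these domains is sketched below. It reuses the fillna / value_counts pattern from the coverage summary on a small hypothetical frame; profile_domains and the toy values are illustrative assumptions, not project code.

```python
import pandas as pd


def profile_domains(df: pd.DataFrame, fields: list[str]) -> pd.DataFrame:
    """Return one row per (field, value) with counts and shares."""
    rows = []
    for field in fields:
        # Label nulls explicitly so empty domains stay visible.
        counts = df[field].fillna("<missing>").astype(str).value_counts()
        for value, n in counts.items():
            rows.append(
                {
                    "field": field,
                    "value": value,
                    "links": int(n),
                    "share_pct": round(100 * n / len(df), 2),
                }
            )
    return pd.DataFrame(rows)


# Hypothetical stand-in for the Link layer.
toy_links = pd.DataFrame(
    {
        "linkcategory": ["M", "A", "A", "U"],
        "srn": ["Y", "Y", "Y", "N"],
        "smartmotorway": [None, None, None, None],
    }
)
domain_profile = profile_domains(toy_links, ["linkcategory", "srn", "smartmotorway"])
print(domain_profile)
```

A field whose only value is <missing> or a single constant is a candidate for exclusion from the feature set, in line with the constraints noted earlier.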

8.2 Related Table Fields

Useful related fields are checked against the source schema below.

Code
related_field_roles = {
    "Lane": {
        "fields": ["linkid", "lanenumber", "averagewidth", "minimumwidth", "directionality", "start", "end_"],
        "expected_use": "Lane count and lane-width summaries",
    },
    "Road": {
        "fields": ["roadid", "roadname", "roadclassification"],
        "expected_use": "Road class and named-road grouping",
    },
    "Road_Reference": {
        "fields": ["roadid", "linkid"],
        "expected_use": "Road-to-link join table",
    },
    "Street": {
        "fields": [
            "usrn",
            "roadclassification",
            "streettype",
            "surface",
            "operationalstate",
            "town",
            "responsibleauthorityname",
            "administrativearea",
        ],
        "expected_use": "Local street classification and authority coverage",
    },
    "Street_Reference": {
        "fields": ["usrn", "linkid"],
        "expected_use": "Street-to-link join table",
    },
    "Junction": {
        "fields": ["junctionid", "junctiontype", "roadnumber", "junctionnumber", "junctionname"],
        "expected_use": "Junction categorisation",
    },
    "Junction_Reference": {
        "fields": ["junctionid", "nodeid"],
        "expected_use": "Junction-to-node relationship",
    },
    "Turn_Restriction_Reference": {
        "fields": ["turnid", "linkid", "applicabledirection", "sequence"],
        "expected_use": "Turn-complexity features",
    },
    "Access_Restriction_Reference": {
        "fields": ["accessid", "linkid", "applicabledirection", "atposition"],
        "expected_use": "Access-complexity features",
    },
    "Vehicle_Restriction_Reference": {
        "fields": ["vehicleid", "linkid", "nodeid", "applicabledirection", "atposition"],
        "expected_use": "Vehicle-restriction features",
    },
}

related_rows = []
for layer, spec in related_field_roles.items():
    with fiona.open(gdb_path, layer=layer) as src:
        source_fields = set(src.schema["properties"])
    for field in spec["fields"]:
        related_rows.append(
            {
                "layer": layer,
                "field": field,
                "present_in_schema": field in source_fields,
                "expected_use": spec["expected_use"],
            }
        )

display(pd.DataFrame(related_rows))
layer field present_in_schema expected_use
0 Lane linkid True Lane count and lane-width summaries
1 Lane lanenumber True Lane count and lane-width summaries
2 Lane averagewidth True Lane count and lane-width summaries
3 Lane minimumwidth True Lane count and lane-width summaries
4 Lane directionality True Lane count and lane-width summaries
5 Lane start True Lane count and lane-width summaries
6 Lane end_ True Lane count and lane-width summaries
7 Road roadid True Road class and named-road grouping
8 Road roadname True Road class and named-road grouping
9 Road roadclassification True Road class and named-road grouping
10 Road_Reference roadid True Road-to-link join table
11 Road_Reference linkid True Road-to-link join table
12 Street usrn True Local street classification and authority cove...
13 Street roadclassification True Local street classification and authority cove...
14 Street streettype True Local street classification and authority cove...
15 Street surface True Local street classification and authority cove...
16 Street operationalstate True Local street classification and authority cove...
17 Street town True Local street classification and authority cove...
18 Street responsibleauthorityname True Local street classification and authority cove...
19 Street administrativearea True Local street classification and authority cove...
20 Street_Reference usrn True Street-to-link join table
21 Street_Reference linkid True Street-to-link join table
22 Junction junctionid True Junction categorisation
23 Junction junctiontype True Junction categorisation
24 Junction roadnumber True Junction categorisation
25 Junction junctionnumber True Junction categorisation
26 Junction junctionname True Junction categorisation
27 Junction_Reference junctionid True Junction-to-node relationship
28 Junction_Reference nodeid True Junction-to-node relationship
29 Turn_Restriction_Reference turnid True Turn-complexity features
30 Turn_Restriction_Reference linkid True Turn-complexity features
31 Turn_Restriction_Reference applicabledirection True Turn-complexity features
32 Turn_Restriction_Reference sequence True Turn-complexity features
33 Access_Restriction_Reference accessid True Access-complexity features
34 Access_Restriction_Reference linkid True Access-complexity features
35 Access_Restriction_Reference applicabledirection True Access-complexity features
36 Access_Restriction_Reference atposition True Access-complexity features
37 Vehicle_Restriction_Reference vehicleid True Vehicle-restriction features
38 Vehicle_Restriction_Reference linkid True Vehicle-restriction features
39 Vehicle_Restriction_Reference nodeid True Vehicle-restriction features
40 Vehicle_Restriction_Reference applicabledirection True Vehicle-restriction features
41 Vehicle_Restriction_Reference atposition True Vehicle-restriction features

8.3 Model-Ready Features

Candidate link-level features:

Feature                      Derivation                                      Exposure / risk role
length_km                    Link.SHAPE_Length / 1000                        Basic road exposure
lane_count                   Link.numberoflanes or count of Lane rows        Capacity exposure
lane_km                      length_km * lane_count                          Core infrastructure exposure
avg_lane_width_m             Mean Lane.averagewidth by linkid                Geometry quality
min_lane_width_m             Min Lane.minimumwidth by linkid                 Narrow-lane risk proxy
road_category                Link.linkcategory                               Exposure and risk stratification
road_classification          Joined from Road or Street                      Hierarchy feature
link_form                    Link.linkform                                   Structural risk proxy
carriageway_type             Link.carriageway                                Severity and capacity proxy
directionality               Link.directionality                             Conflict and routing proxy
srn_flag                     Link.srn                                        Strategic-network feature
grade_separation_delta       endgradeseparation - startgradeseparation       Junction structure proxy
junction_count               Junction references near startnode / endnode    Conflict-point proxy
turn_restrictions_per_km     Count turn references by linkid / length_km     Network complexity
access_restrictions_per_km   Count access references by linkid / length_km   Access complexity
vehicle_restrictions_per_km  Count vehicle references by linkid / length_km  Freight / vehicle constraint proxy
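
A few of these derivations can be sketched on toy data as follows. The frames are hypothetical stand-ins for the Link and Turn_Restriction_Reference layers, and filling unmatched links with zero references is an assumption that only holds if the reference table is complete for the links in scope.

```python
import pandas as pd

# Hypothetical stand-ins for Link and Turn_Restriction_Reference.
toy_links = pd.DataFrame(
    {
        "linkid": ["a", "b"],
        "SHAPE_Length": [2000.0, 500.0],
        "numberoflanes": [3, 2],
        "startgradeseparation": [0, 0],
        "endgradeseparation": [1, 0],
    }
)
toy_turn_ref = pd.DataFrame({"linkid": ["a", "a", "b", "a"]})

derived = toy_links[["linkid"]].copy()
derived["length_km"] = toy_links["SHAPE_Length"] / 1000
derived["lane_km"] = derived["length_km"] * toy_links["numberoflanes"]
derived["grade_separation_delta"] = (
    toy_links["endgradeseparation"] - toy_links["startgradeseparation"]
)

# Count reference rows per link, then normalise by link length.
turn_counts = toy_turn_ref.groupby("linkid").size().reset_index(name="turn_refs")
derived = derived.merge(turn_counts, on="linkid", how="left")
derived["turn_refs"] = derived["turn_refs"].fillna(0)
derived["turn_restrictions_per_km"] = derived["turn_refs"] / derived["length_km"]
print(derived)
```

Per-km rates become unstable on very short links, so a minimum-length floor or a raw count alongside the rate may be worth considering.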
Code
links = gpd.read_file(gdb_path, layer="Link")
nodes = gpd.read_file(gdb_path, layer="Node")
lanes = gpd.read_file(gdb_path, layer="Lane")
roads = gpd.read_file(gdb_path, layer="Road")
road_ref = gpd.read_file(gdb_path, layer="Road_Reference")
streets = gpd.read_file(gdb_path, layer="Street")
street_ref = gpd.read_file(gdb_path, layer="Street_Reference")
junctions = gpd.read_file(gdb_path, layer="Junction")
junction_ref = gpd.read_file(gdb_path, layer="Junction_Reference")
turn_ref = gpd.read_file(gdb_path, layer="Turn_Restriction_Reference")
access_ref = gpd.read_file(gdb_path, layer="Access_Restriction_Reference")
vehicle_ref = gpd.read_file(gdb_path, layer="Vehicle_Restriction_Reference")

9 Completeness Checks

Completeness should be reported at three levels:

  1. table coverage: row counts and empty layers,
  2. field coverage: null and blank percentages,
  3. relationship coverage: whether related-table keys resolve back to the core Link, Node, Road, Street, and Junction tables.

9.1 Table And Field Completeness

Code
def completeness(df: pd.DataFrame, key: str | None = None) -> pd.DataFrame:
    out = []
    n = len(df)
    for col in df.columns:
        if col == "geometry":
            continue
        s = df[col]
        missing = s.isna()
        if pd.api.types.is_string_dtype(s):
            missing = missing | s.astype("string").str.strip().eq("")
        out.append(
            {
                "field": col,
                "rows": n,
                "missing": int(missing.sum()),
                "complete_pct": round(100 * (1 - missing.mean()), 2) if n else 0,
                "unique_values": int(s.nunique(dropna=True)),
                "is_key": col == key,
            }
        )
    return pd.DataFrame(out).sort_values(["complete_pct", "field"])

display(completeness(links, key="linkid"))
field rows missing complete_pct unique_values is_key
20 enddate 42960 42960 0.00 0 False
13 parentlinkref 42960 42960 0.00 0 False
8 smartmotorway 42960 42960 0.00 0 False
21 toid 42960 1525 96.45 41119 False
27 SHAPE_Length 42960 0 100.00 42953 False
9 carriageway 42960 0 100.00 7 False
24 created_date 42960 0 100.00 43 False
23 created_user 42960 0 100.00 1 False
6 direction 42960 0 100.00 6 False
5 directionality 42960 0 100.00 2 False
12 endgradeseparation 42960 0 100.00 3 False
16 endnode 42960 0 100.00 35936 False
22 globalid 42960 0 100.00 42960 False
26 last_edited_date 42960 0 100.00 43 False
25 last_edited_user 42960 0 100.00 1 False
2 linkcategory 42960 0 100.00 4 False
3 linkdesc 42960 0 100.00 11174 False
4 linkform 42960 0 100.00 8 False
0 linkid 42960 0 100.00 42960 True
1 linkref 42960 0 100.00 42958 False
7 numberoflanes 42960 0 100.00 7 False
17 operationalstate 42960 0 100.00 1 False
10 ownership 42960 0 100.00 1 False
18 roadname 42960 0 100.00 190 False
14 srn 42960 0 100.00 2 False
19 startdate 42960 0 100.00 768 False
11 startgradeseparation 42960 0 100.00 3 False
15 startnode 42960 0 100.00 35840 False

Recommended completeness thresholds:

| Check | Expected value |
| --- | --- |
| Link.linkid completeness | 100% |
| Link.linkid uniqueness | 100% unique |
| Link.geometry completeness | 100% non-null line geometry |
| Link.SHAPE_Length completeness | 100% and positive |
| Link.startnode, Link.endnode completeness | Near 100% |
| Lane.linkid relationship to Link.linkid | High, ideally 100% |
| Road_Reference.linkid relationship to Link.linkid | High, many-to-one allowed |
| Street_Reference.linkid relationship to Link.linkid | High, many-to-one allowed |
| Optional layers | Check row counts before using as feature dependencies |
Code
checks = {
    "linkid_missing": links["linkid"].isna().sum(),
    "linkid_duplicate_rows": links["linkid"].duplicated().sum(),
    "geometry_missing": links.geometry.isna().sum(),
    "length_non_positive": (links["SHAPE_Length"] <= 0).sum(),
    "startnode_missing": links["startnode"].isna().sum(),
    "endnode_missing": links["endnode"].isna().sum(),
}

display(pd.Series(checks, name="count").to_frame())
count
linkid_missing 0
linkid_duplicate_rows 0
geometry_missing 0
length_non_positive 0
startnode_missing 0
endnode_missing 0

9.2 Relationship Completeness

Code
link_ids = set(links["linkid"].dropna())
node_ids = set(nodes["nodeid"].dropna())
road_ids = set(roads["roadid"].dropna())
street_ids = set(streets["usrn"].dropna())
junction_ids = set(junctions["junctionid"].dropna())

relationship_checks = pd.DataFrame(
    [
        {
            "relationship": "Lane.linkid -> Link.linkid",
            "rows": len(lanes),
            "unmatched": (~lanes["linkid"].isin(link_ids)).sum(),
        },
        {
            "relationship": "Road_Reference.linkid -> Link.linkid",
            "rows": len(road_ref),
            "unmatched": (~road_ref["linkid"].isin(link_ids)).sum(),
        },
        {
            "relationship": "Road_Reference.roadid -> Road.roadid",
            "rows": len(road_ref),
            "unmatched": (~road_ref["roadid"].isin(road_ids)).sum(),
        },
        {
            "relationship": "Street_Reference.linkid -> Link.linkid",
            "rows": len(street_ref),
            "unmatched": (~street_ref["linkid"].isin(link_ids)).sum(),
        },
        {
            "relationship": "Street_Reference.usrn -> Street.usrn",
            "rows": len(street_ref),
            "unmatched": (~street_ref["usrn"].isin(street_ids)).sum(),
        },
        {
            "relationship": "Junction_Reference.nodeid -> Node.nodeid",
            "rows": len(junction_ref),
            "unmatched": (~junction_ref["nodeid"].isin(node_ids)).sum(),
        },
        {
            "relationship": "Junction_Reference.junctionid -> Junction.junctionid",
            "rows": len(junction_ref),
            "unmatched": (~junction_ref["junctionid"].isin(junction_ids)).sum(),
        },
        {
            "relationship": "Link.startnode -> Node.nodeid",
            "rows": len(links),
            "unmatched": (~links["startnode"].isin(node_ids)).sum(),
        },
        {
            "relationship": "Link.endnode -> Node.nodeid",
            "rows": len(links),
            "unmatched": (~links["endnode"].isin(node_ids)).sum(),
        },
    ]
)

relationship_checks["unmatched_pct"] = (
    100 * relationship_checks["unmatched"] / relationship_checks["rows"].clip(lower=1)
).round(2)

display(relationship_checks.sort_values("unmatched_pct", ascending=False))
relationship rows unmatched unmatched_pct
5 Junction_Reference.nodeid -> Node.nodeid 6946 1 0.01
0 Lane.linkid -> Link.linkid 104840 0 0.00
1 Road_Reference.linkid -> Link.linkid 49345 0 0.00
3 Street_Reference.linkid -> Link.linkid 86191 0 0.00
2 Road_Reference.roadid -> Road.roadid 49345 0 0.00
4 Street_Reference.usrn -> Street.usrn 86191 0 0.00
6 Junction_Reference.junctionid -> Junction.junc... 6946 0 0.00
7 Link.startnode -> Node.nodeid 42960 2 0.00
8 Link.endnode -> Node.nodeid 42960 2 0.00
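
The counts above show how many keys fail to resolve, but not which rows are affected. The following is a minimal sketch, using toy frames in place of the real `links` and `nodes` layers, of listing the unmatched rows with a left merge and `indicator=True`. The helper name `unmatched_rows` and the ids are illustrative assumptions, not part of the pipeline.

```python
import pandas as pd

# Toy stand-ins for the real layers; the ids are illustrative only.
links = pd.DataFrame(
    {
        "linkid": ["L1", "L2", "L3"],
        "startnode": ["n1", "n2", "n9"],
        "endnode": ["n2", "n3", "n1"],
    }
)
nodes = pd.DataFrame({"nodeid": ["n1", "n2", "n3"]})


def unmatched_rows(child, child_key, parent, parent_key):
    """Return the child rows whose key has no match in the parent table."""
    parent_keys = parent[[parent_key]].drop_duplicates()
    merged = child.merge(
        parent_keys,
        left_on=child_key,
        right_on=parent_key,
        how="left",
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
    )
    return merged.loc[merged["_merge"] == "left_only", list(child.columns)]


orphans = unmatched_rows(links, "startnode", nodes, "nodeid")
print(orphans)
```

Applied to the real layers, the same call would surface the two `Link.startnode` and two `Link.endnode` values that do not resolve, so they can be inspected rather than only counted.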

9.3 Categorical Domain Checks

Code
categorical_fields = [
    "linkcategory",
    "linkform",
    "directionality",
    "direction",
    "smartmotorway",
    "carriageway",
    "ownership",
    "srn",
    "operationalstate",
]

for col in categorical_fields:
    print(f"\n{col}")
    display(
        links[col]
        .fillna("<missing>")
        .astype(str)
        .value_counts(dropna=False)
        .rename_axis(col)
        .reset_index(name="rows")
    )

linkcategory
linkcategory rows
0 A 28844
1 M 14081
2 U 22
3 B 13

linkform
linkform rows
0 DC 23306
1 SL 7357
2 SC 6813
3 R 4893
4 L 507
5 SR 52
6 DL 30
7 EA 2

directionality
directionality rows
0 1 37240
1 0 5720

direction
direction rows
0 N 10680
1 S 8909
2 E 8681
3 W 7908
4 CW 5856
5 ACW 926

smartmotorway
smartmotorway rows
0 <missing> 42960

carriageway
carriageway rows
0 A 16756
1 B 11578
2 X 6770
3 L 2040
4 J 2028
5 K 1902
6 M 1886

ownership
ownership rows
0 NH 42960

srn
srn rows
0 Y 42936
1 N 24

operationalstate
operationalstate rows
0 O 42960
Code
categorical_suitability = []
for col in categorical_fields:
    s = links[col]
    non_missing = s.notna().sum()
    unique_non_missing = s.nunique(dropna=True)
    categorical_suitability.append(
        {
            "field": col,
            "non_missing": non_missing,
            "non_missing_pct": round(100 * non_missing / len(links), 2),
            "unique_non_missing": unique_non_missing,
            "feature_guidance": (
                "drop: empty"
                if non_missing == 0
                else "drop: constant"
                if unique_non_missing <= 1
                else "usable after code meaning is resolved"
            ),
        }
    )

display(pd.DataFrame(categorical_suitability))
field non_missing non_missing_pct unique_non_missing feature_guidance
0 linkcategory 42960 100.0 4 usable after code meaning is resolved
1 linkform 42960 100.0 8 usable after code meaning is resolved
2 directionality 42960 100.0 2 usable after code meaning is resolved
3 direction 42960 100.0 6 usable after code meaning is resolved
4 smartmotorway 0 0.0 0 drop: empty
5 carriageway 42960 100.0 7 usable after code meaning is resolved
6 ownership 42960 100.0 1 drop: constant
7 srn 42960 100.0 2 usable after code meaning is resolved
8 operationalstate 42960 100.0 1 drop: constant
Warning

Several useful-looking fields are coded domains. Do not treat linkform or carriageway as ordinal or self-explanatory until the National Highways / OS Highways code meanings have been resolved from source documentation or metadata.
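
One way to enforce this is to map codes through an explicit code list and report anything the list does not cover. The sketch below is hypothetical throughout: `LINKFORM_LABELS` holds placeholder strings, not real code meanings, and `resolve_codes` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical code list. The labels are placeholders, NOT real meanings;
# replace them once the official domain values have been resolved.
LINKFORM_LABELS = {
    "DC": "placeholder label for DC",
    "SL": "placeholder label for SL",
}


def resolve_codes(values, labels):
    """Map codes to labels and count the codes the code list does not cover."""
    resolved = [labels.get(value) for value in values]
    unresolved = Counter(value for value in values if value not in labels)
    return resolved, unresolved


codes = ["DC", "SL", "EA", "DC", "EA"]
resolved, unresolved = resolve_codes(codes, LINKFORM_LABELS)
print(dict(unresolved))
```

A non-empty `unresolved` counter is a signal to stop and extend the code list before the field is used as a model feature.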

10 Yearly Coverage

The GDB has startdate and enddate fields on many layers. These should be interpreted as feature-validity dates, not traffic observation years.

For annual coverage, mark a feature as active in a year if:

startdate <= 31 December of that year
and
enddate is missing or enddate >= 1 January of that year

This gives a structural-network coverage series by year. It does not replace AADF, WebTRIS, or STATS19 year fields.

Code
def active_in_year(df: pd.DataFrame, year: int) -> pd.Series:
    # Half-open window [1 Jan of year, 1 Jan of year + 1), so a feature that
    # starts at any time on 31 December still counts as active in that year.
    year_start = pd.Timestamp(year=year, month=1, day=1, tz="UTC")
    next_year_start = pd.Timestamp(year=year + 1, month=1, day=1, tz="UTC")

    start = pd.to_datetime(df["startdate"], errors="coerce", utc=True)
    end = pd.to_datetime(df["enddate"], errors="coerce", utc=True)

    # A missing startdate or enddate is treated as open-ended validity.
    starts_before_year_end = start.isna() | (start < next_year_start)
    ends_after_year_start = end.isna() | (end >= year_start)
    return starts_before_year_end & ends_after_year_start

years = range(2015, 2025)
yearly = []
for year in years:
    active = links[active_in_year(links, year)].copy()
    yearly.append(
        {
            "year": year,
            "active_links": len(active),
            "active_length_km": active["SHAPE_Length"].sum() / 1000,
            "active_lane_km": (
                active["SHAPE_Length"]
                * active["numberoflanes"].fillna(1).clip(lower=1)
            ).sum()
            / 1000,
        }
    )

yearly_coverage = pd.DataFrame(yearly)
display(yearly_coverage)
year active_links active_length_km active_lane_km
0 2015 0 0.000000 0.000000
1 2016 0 0.000000 0.000000
2 2017 0 0.000000 0.000000
3 2018 0 0.000000 0.000000
4 2019 0 0.000000 0.000000
5 2020 75 19.008889 54.838358
6 2021 84 21.349183 59.241304
7 2022 41725 23531.010355 66490.982582
8 2023 41880 23576.021013 66559.739110
9 2024 42796 23855.699972 66982.821574
Code
coverage_diagnostics = {
    "first_year_with_active_links": (
        yearly_coverage.loc[yearly_coverage["active_links"].gt(0), "year"].min()
        if yearly_coverage["active_links"].gt(0).any()
        else None
    ),
    "max_active_links": yearly_coverage["active_links"].max(),
    "years_with_no_active_links": yearly_coverage.loc[
        yearly_coverage["active_links"].eq(0), "year"
    ].tolist(),
}

display(pd.Series(coverage_diagnostics, name="value").to_frame())

if coverage_diagnostics["years_with_no_active_links"]:
    print(
        "Some modelling years have no active links under the startdate/enddate "
        "validity test. Treat this as source validity-date coverage, not as "
        "proof that the physical road network was absent."
    )
value
first_year_with_active_links 2020
max_active_links 42796
years_with_no_active_links [2015, 2016, 2017, 2018, 2019]
Some modelling years have no active links under the startdate/enddate validity test. Treat this as source validity-date coverage, not as proof that the physical road network was absent.

Additional yearly checks:

  • active links by operationalstate,
  • active link length by linkcategory,
  • links created or retired per year using startdate and enddate,
  • whether yearly coverage changes materially across the modelling window,
  • whether link identifiers are stable enough to join to annual exposure outputs.
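
The "links created per year" check can be sketched as below. The dates are toy values standing in for `links["startdate"]`. The share of rows on the single most common start date is included because a dominant date may indicate a bulk-load date rather than real opening dates; that interpretation is an assumption to confirm against the source documentation.

```python
import pandas as pd

# Toy validity dates; the real input would be links["startdate"].
startdate = pd.Series(
    ["2020-03-01", "2022-04-06", "2022-04-06", "2022-04-06", "2024-01-15", None]
)
start = pd.to_datetime(startdate, errors="coerce", utc=True)
valid_start = start.dropna()

# Number of links whose validity starts in each calendar year.
created_per_year = valid_start.dt.year.value_counts().sort_index()

# Share of dated links that fall on the single most common start date.
top_date_share = valid_start.dt.normalize().value_counts(normalize=True).iloc[0]
missing_startdate = int(start.isna().sum())

print(created_per_year)
print(top_date_share, missing_startdate)
```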
Warning

Do not use created_date or last_edited_date as road-network validity dates. Those fields usually describe database editing history rather than when the road was open to traffic.

11 Geographic Coverage

Report geographic coverage using both the geometry bounds and an overlay against the project study area.

Useful coverage outputs:

  • total link length inside the study area,
  • percentage of links intersecting the study area,
  • link length by local authority, police force, region, or custom grid cell,
  • number of links with missing or invalid geometry,
  • comparison with OS Open Roads or MRDB length by geography,
  • map of links by linkcategory, carriageway, or operationalstate.
Code
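# links.crs is already EPSG:3857, so this reprojection is a no-op and both
# length totals below are Web Mercator lengths, not ground distances.
links_3857 = links.to_crs(3857)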

coverage = {
    "crs": str(links.crs),
    "rows": len(links),
    "geometry_missing": int(links.geometry.isna().sum()),
    "invalid_geometry": int((~links.geometry.is_valid).sum()),
    "total_length_km_shape": links["SHAPE_Length"].sum() / 1000,
    "total_length_km_geometry": links_3857.length.sum() / 1000,
}

display(pd.Series(coverage, name="value").to_frame())
display(pd.Series(links_3857.total_bounds, index=["minx", "miny", "maxx", "maxy"]))
value
crs EPSG:3857
rows 42960
geometry_missing 0
invalid_geometry 0
total_length_km_shape 23932.307324
total_length_km_geometry 23932.307324
minx   -6.137428e+05
miny    6.468640e+06
maxx    1.955150e+05
maxy    7.520043e+06
dtype: float64
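
Because the layer CRS is EPSG:3857, both length totals above are measured in Web Mercator units, which stretch distances by roughly 1 / cos(latitude). The sketch below uses the spherical Mercator formula to estimate how much a projected length should be scaled down at the northings reported in the bounds. It is an approximation for diagnosis only; for modelling, reprojecting to a locally accurate metric CRS is the safer route. The function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by the EPSG:3857 formulas


def mercator_y_to_lat(y_m):
    """Latitude in radians for a Web Mercator northing (spherical formula)."""
    return 2.0 * math.atan(math.exp(y_m / EARTH_RADIUS_M)) - math.pi / 2.0


def ground_length_factor(y_m):
    """Approximate ratio of ground length to EPSG:3857 length at northing y_m."""
    return math.cos(mercator_y_to_lat(y_m))


# miny and maxy taken from the bounds printed above.
factors = {}
for y in (6.468640e6, 7.520043e6):
    lat_deg = math.degrees(mercator_y_to_lat(y))
    factors[y] = ground_length_factor(y)
    print(f"y={y:.0f} lat={lat_deg:.1f} factor={factors[y]:.3f}")
```

A factor well below 1 across the whole extent means `SHAPE_Length`-based kilometres overstate ground distance, so `length_km` and `lane_km` should be recomputed from reprojected geometry before they are used as exposure denominators.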
Code
# Optional example: replace this with a real project boundary, local authority
# layer, police force boundary layer, or generated grid before using it.
boundary_path = _ROOT / "data/external/boundaries/study_area.gpkg"
boundary_label = "data/external/boundaries/study_area.gpkg"

if not boundary_path.exists():
    print(f"Boundary file not found, skipping area overlay: {boundary_label}")
else:
    areas = gpd.read_file(boundary_path).to_crs(links.crs)

    links_for_overlay = links[
        ["linkid", "linkcategory", "operationalstate", "geometry"]
    ].copy()
    overlay = gpd.overlay(links_for_overlay, areas, how="intersection")
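    # Measure in EPSG:27700 (British National Grid, assumed appropriate here):
    # EPSG:3857 lengths are stretched by roughly 1 / cos(latitude).
    overlay["length_km"] = overlay.to_crs(27700).length / 1000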

    area_summary = (
        overlay.groupby("area_name", dropna=False)
        .agg(
            links=("linkid", "nunique"),
            length_km=("length_km", "sum"),
        )
        .reset_index()
        .sort_values("length_km", ascending=False)
    )

    display(area_summary)
Boundary file not found, skipping area overlay: data/external/boundaries/study_area.gpkg

For grid-based coverage:

1. create a regular grid over the study area,
2. intersect links with the grid,
3. sum link length and lane-km per cell,
4. flag cells with zero or very low coverage,
5. compare against OS Open Roads, AADF count points, and STATS19 collisions.
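
Steps 1 to 3 can be approximated without a GIS dependency, as in the sketch below. It is not a true geometric intersection: each segment is chopped into pieces of at most `step` metres and every piece is assigned to the cell containing its midpoint. Coordinates are assumed to be in a metric CRS, and the function name and input layout are illustrative. A real build would more likely use `gpd.overlay` against a generated grid, as in the area overlay above.

```python
import math
from collections import defaultdict


def length_by_cell(lines, cell_size, step=10.0):
    """Sum link length and lane-length per square grid cell.

    lines: iterable of (lane_count, [(x, y), ...]) in a metric CRS.
    Returns {(col, row): [length_m, lane_m]}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for lane_count, coords in lines:
        for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
            seg_len = math.hypot(x1 - x0, y1 - y0)
            n = max(1, math.ceil(seg_len / step))
            piece = seg_len / n
            for i in range(n):
                t = (i + 0.5) / n  # midpoint of piece i along the segment
                mx, my = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
                cell = (math.floor(mx / cell_size), math.floor(my / cell_size))
                totals[cell][0] += piece
                totals[cell][1] += piece * lane_count
    return dict(totals)


# One 2 km, two-lane link crossing two 1 km cells.
cells = length_by_cell([(2, [(0.0, 0.0), (2000.0, 0.0)])], cell_size=1000.0)
print(cells)
```

Cells that are missing from the result, or carry very small totals, are the candidates to flag in step 4.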

12 Link-Level Feature Build

The minimum useful output is a single feature table with one row per linkid.

Code
link_features = links[
    [
        "linkid",
        "linkref",
        "linkcategory",
        "linkform",
        "directionality",
        "direction",
        "numberoflanes",
        "smartmotorway",
        "carriageway",
        "ownership",
        "startgradeseparation",
        "endgradeseparation",
        "srn",
        "startnode",
        "endnode",
        "operationalstate",
        "roadname",
        "startdate",
        "enddate",
        "toid",
        "SHAPE_Length",
        "geometry",
    ]
].copy()

link_features["length_km"] = link_features["SHAPE_Length"] / 1000
link_features["lane_count"] = link_features["numberoflanes"].fillna(1).clip(lower=1)
link_features["lane_km"] = link_features["length_km"] * link_features["lane_count"]
link_features["grade_separation_delta"] = (
    link_features["endgradeseparation"] - link_features["startgradeseparation"]
)

lane_summary = (
    lanes.groupby("linkid", dropna=False)
    .agg(
        lane_rows=("laneid", "count"),
        avg_lane_width_m=("averagewidth", "mean"),
        min_lane_width_m=("minimumwidth", "min"),
    )
    .reset_index()
)

turn_summary = turn_ref.groupby("linkid").size().rename("turn_restriction_count").reset_index()
access_summary = access_ref.groupby("linkid").size().rename("access_restriction_count").reset_index()
vehicle_summary = vehicle_ref.groupby("linkid").size().rename("vehicle_restriction_count").reset_index()

road_class = (
    road_ref[["linkid", "roadid"]]
    .merge(roads[["roadid", "roadclassification"]], on="roadid", how="left")
    .groupby("linkid")["roadclassification"]
    .agg(lambda x: "; ".join(sorted(set(x.dropna().astype(str)))))
    .rename("roadclassification")
    .reset_index()
)

for df in [lane_summary, turn_summary, access_summary, vehicle_summary, road_class]:
    link_features = link_features.merge(df, on="linkid", how="left")

for col in [
    "turn_restriction_count",
    "access_restriction_count",
    "vehicle_restriction_count",
]:
    link_features[col] = link_features[col].fillna(0)
    link_features[f"{col}_per_km"] = link_features[col] / link_features["length_km"].clip(lower=0.001)

display(link_features.head())
linkid linkref linkcategory linkform directionality direction numberoflanes smartmotorway carriageway ownership startgradeseparation endgradeseparation srn startnode endnode operationalstate roadname startdate enddate toid SHAPE_Length geometry length_km lane_count lane_km grade_separation_delta lane_rows avg_lane_width_m min_lane_width_m turn_restriction_count access_restriction_count vehicle_restriction_count roadclassification turn_restriction_count_per_km access_restriction_count_per_km vehicle_restriction_count_per_km
0 {82765052-D7CB-43C3-BD54-D3C9257E6A34} A1/N/DC/A/games.rezoning.matrons A DC 1 N 1 None A NH 0 0 Y {24095067-FCE4-468A-950E-F9A0AA05D58D} {9607B713-6985-40A8-A45B-A5DEDEDEE74E} O A1 2022-04-06 10:53:21+00:00 NaT osgb4000000006289689 366.071038 MULTILINESTRING Z ((-224980.962 7508198.137 0,... 0.366071 1 0.366071 0 1 7.2 3.65 2.0 0.0 0.0 A 5.463420 0.0 0.0
1 {82A91CA9-F2F1-40C0-B685-15A88F7B8158} A1/S/DC/B/newsprint.flinches.soaps A DC 1 S 1 None B NH 0 0 Y {9607B713-6985-40A8-A45B-A5DEDEDEE74E} {39DE40C2-4A9D-4906-BF0B-5A418ACDF0F8} O A1 2022-04-06 10:53:21+00:00 NaT osgb4000000006289690 368.051474 MULTILINESTRING Z ((-225288.339 7508395.414 0,... 0.368051 1 0.368051 0 1 9.5 3.65 2.0 0.0 0.0 A 5.434023 0.0 0.0
2 {02A2A372-CC0F-4DD8-9FD5-74CEB6EFD334} A1/N/DC/A/regaining.wishing.angel A DC 1 N 1 None A NH 0 0 Y {94196E96-59E7-454B-92E9-32D97C73CE5D} {24095067-FCE4-468A-950E-F9A0AA05D58D} O A1 2022-04-06 10:53:21+00:00 NaT osgb4000000006289693 55.858780 MULTILINESTRING Z ((-224930.594 7508173.984 0,... 0.055859 1 0.055859 0 1 17.8 8.30 1.0 0.0 0.0 A 17.902289 0.0 0.0
3 {25BAA659-1627-430A-ABD8-DD78DC7DE3EA} A1/S/DC/B/daydream.roving.snuggled A DC 1 S 1 None B NH 0 0 Y {39DE40C2-4A9D-4906-BF0B-5A418ACDF0F8} {A1B8F722-DBF3-4A4F-AB32-7BCD22D517D2} O A1 2022-04-06 10:53:21+00:00 NaT osgb4000000006289694 50.818935 MULTILINESTRING Z ((-224967.32 7508216.036 0, ... 0.050819 1 0.050819 0 1 16.2 8.60 1.0 0.0 0.0 A 19.677705 0.0 0.0
4 {DE91C580-6DD0-4940-85D5-CA5B5C957D09} A1/N/SC/A/sleep.ballots.face A SC 0 N 2 None A NH 0 0 Y {9607B713-6985-40A8-A45B-A5DEDEDEE74E} {FB3B4345-245E-4499-80C3-8441382AA096} O A1 2022-04-06 10:53:21+00:00 NaT osgb4000000006289728 1448.257669 MULTILINESTRING Z ((-225288.339 7508395.414 0,... 1.448258 2 2.896515 0 2 4.8 3.85 0.0 0.0 0.0 A 0.000000 0.0 0.0
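
The build above does not yet derive `junction_count`, which appears in the candidate feature list. A minimal sketch follows, assuming a junction applies to a link when `Junction_Reference.nodeid` equals the link's `startnode` or `endnode`. That matching rule is an assumption, and the frames are toy data rather than the real layers.

```python
import pandas as pd

# Toy stand-ins for links and Junction_Reference.
links = pd.DataFrame(
    {
        "linkid": ["L1", "L2", "L3"],
        "startnode": ["n1", "n2", "n4"],
        "endnode": ["n2", "n3", "n5"],
    }
)
junction_ref = pd.DataFrame(
    {"junctionid": ["J1", "J1", "J2"], "nodeid": ["n1", "n2", "n3"]}
)

# One row per (link, end node), so both ends can be matched in a single merge.
link_nodes = links.melt(
    id_vars="linkid", value_vars=["startnode", "endnode"], value_name="nodeid"
)[["linkid", "nodeid"]]

# Distinct junctions touching either end; links with no match get 0.
junction_count = (
    link_nodes.merge(junction_ref, on="nodeid", how="inner")
    .groupby("linkid")["junctionid"]
    .nunique()
    .reindex(links["linkid"].tolist(), fill_value=0)
    .rename_axis("linkid")
    .rename("junction_count")
    .reset_index()
)
print(junction_count)
```

Counting distinct `junctionid` values avoids double-counting a junction that is referenced at both ends of the same link.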

13 Suggested Exposure Metrics

Use these as interpretable denominators:

| Metric | Formula | Interpretation |
| --- | --- | --- |
| Road length exposure | sum(length_km) | Physical road supply |
| Lane-km exposure | sum(length_km * lane_count) | Capacity-weighted road supply |
| Active lane-km by year | sum(length_km * lane_count) over links active in the year | Structural exposure over time |
| SRN lane-km | sum(lane_km where srn == "Y") | Strategic-network exposure |
| Junction density | junction_count / length_km | Conflict-point exposure |
| Restriction density | restriction_count / length_km | Network complexity exposure |

Recommended baseline:

infrastructure_exposure = length_km * lane_count

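Then stratify by the fields below. In this snapshot ownership and operational_state are constant (NH and O), so they only separate rows if a later release varies: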

road_category
road_classification
carriageway_type
link_form
operational_state
ownership
geography
year
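
As an illustration of the baseline and one stratifier, the sketch below sums length and lane-km by `road_category` and computes SRN lane-km on a toy table. The values are invented for the arithmetic, and the SRN filter assumes `srn == "Y"` marks strategic-network links, as the domain check suggests.

```python
import pandas as pd

# Toy link table; column names follow the candidate feature list.
link_features = pd.DataFrame(
    {
        "road_category": ["M", "M", "A", "A"],
        "srn": ["Y", "Y", "Y", "N"],
        "length_km": [2.0, 1.0, 0.5, 4.0],
        "lane_count": [3, 4, 2, 1],
    }
)
link_features["lane_km"] = link_features["length_km"] * link_features["lane_count"]

# Exposure denominators per stratum.
exposure = (
    link_features.groupby("road_category")
    .agg(
        links=("lane_km", "size"),
        length_km=("length_km", "sum"),
        lane_km=("lane_km", "sum"),
    )
    .reset_index()
)

# SRN lane-km, assuming "Y" marks strategic-network links.
srn_lane_km = link_features.loc[link_features["srn"].eq("Y"), "lane_km"].sum()
print(exposure)
print(srn_lane_km)
```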

14 Suggested Risk Features

The GDB supports structural risk features rather than observed risk outcomes.

Useful predictors:

  • road_category
  • road_classification
  • link_form
  • carriageway_type
  • directionality
  • lane_count
  • avg_lane_width_m
  • min_lane_width_m
  • junction_density_per_km
  • turn_restrictions_per_km
  • access_restrictions_per_km
  • vehicle_restrictions_per_km
  • grade_separation_delta
  • srn_flag

Possible proxy formulation:

structural_risk_score =
    road_category_weight
  + link_form_weight
  + carriageway_weight
  + junction_density_weight
  + restriction_density_weight
  + narrow_lane_weight
  + grade_separation_weight

For a risk-volume proxy:

expected_risk_proxy = lane_km * structural_risk_score

The score is already a per-lane-km quantity, so a rate-style proxy is the total divided by its exposure denominator:

structural_risk_rate = expected_risk_proxy / lane_km
Tip

Keep the exposure denominator explicit. A high total-risk segment may simply be long, multi-lane, or high-volume. A high risk-rate segment is a different question.
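
A small worked example of that distinction, using hypothetical segments and weights chosen only to make the arithmetic visible:

```python
# Hypothetical segments and weights; none of these values come from the GDB.
segments = {
    "long_motorway": {"lane_km": 12.0, "weights": [0.2, 0.1, 0.1]},
    "short_slip": {"lane_km": 0.3, "weights": [0.5, 0.6, 0.4]},
}

rows = []
for name, seg in segments.items():
    score = sum(seg["weights"])        # structural score per lane-km
    total = seg["lane_km"] * score     # risk-volume proxy
    rate = total / seg["lane_km"]      # rate with the denominator explicit
    rows.append({"segment": name, "total": total, "rate": rate})

top_by_total = max(rows, key=lambda row: row["total"])["segment"]
top_by_rate = max(rows, key=lambda row: row["rate"])["segment"]
print(top_by_total, top_by_rate)
```

The long segment ranks first on total risk purely through its lane-km, while the short segment ranks first on rate, which is why the two rankings should be reported separately.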

15 Good For / Not Good For

Good uses:

  • SRN / National Highways motorway and trunk A-road structural features.
  • Authoritative lane count, lane width, carriageway, link form, and link geometry checks on the network subset where the data is populated.
  • Grade-separation, turn-restriction, access-restriction, and vehicle-restriction features that are not available in OS Open Roads or OSM in the same model-ready form.
  • SRN-specific model development or a facility-family feature branch.

Poor uses:

  • Full-network exposure on its own.
  • Local-authority A-roads, B-roads, C-roads, residential streets, or other minor roads.
  • Speed-limit, lighting, gradient, curvature, traffic-volume, collision, or temporal-flow modelling without external joins.
  • Pre-2022 historical network validity for a 2015-2024 model unless source validity dates are independently resolved.
  • Global model features imputed across non-SRN links.

16 Known Limitations

  • Use the availability checks below before depending on optional layers or fields.
  • Validity dates can support source-validity diagnostics, but not annual traffic exposure.
  • Several categorical fields use opaque codes and need source code-list resolution before modelling.
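  • Non-spatial related tables carry no geometry or CRS (the inventory below reports both as None), so any spatial use of them has to go through a join to a geometry layer such as Link or Node.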
  • Relationship tables are many-to-many in places, so joins need aggregation before merging into a one-row-per-link feature table.
Code
availability_checks = inventory.assign(
    dependency_status=lambda df: df["rows"].gt(0).map(
        {True: "available in this GDB", False: "not populated in this GDB"}
    )
)[["layer", "rows", "geometry", "crs", "dependency_status"]]

display(availability_checks.sort_values(["rows", "layer"], ascending=[True, True]))
layer rows geometry crs dependency_status
10 Speed_Limit 0 None None not populated in this GDB
27 Street_Construction 0 None None not populated in this GDB
26 Street_Interest 0 None None not populated in this GDB
28 Street_Special_Designation 0 None None not populated in this GDB
7 Street_Special_Designation_Lines 0 3D MultiLineString EPSG:3857 not populated in this GDB
6 Street_Special_Designation_Points 0 3D Point EPSG:3857 not populated in this GDB
8 Street_Special_Designation_Polygons 0 3D MultiPolygon EPSG:3857 not populated in this GDB
20 Turn_Restriction_Inclusion 0 None None not populated in this GDB
23 Vehicle_Node_Restriction_Reference 0 None None not populated in this GDB
25 Vehicle_Restriction_Exemption 0 None None not populated in this GDB
24 Vehicle_Restriction_Inclusion 0 None None not populated in this GDB
21 Turn_Restriction_Exemption 1 None None available in this GDB
3 Vehicle_Restriction 18 3D Point EPSG:3857 available in this GDB
22 Vehicle_Restriction_Reference 18 None None available in this GDB
13 Access_Restriction_Exemption 40 None None available in this GDB
12 Access_Restriction_Inclusion 40 None None available in this GDB
2 Access_Restriction 108 3D Point EPSG:3857 available in this GDB
11 Access_Restriction_Reference 246 None None available in this GDB
16 Road 617 None None available in this GDB
14 Junction 762 None None available in this GDB
15 Junction_Reference 6946 None None available in this GDB
5 Street 9274 3D MultiLineString EPSG:3857 available in this GDB
4 Turn_Restriction 37474 3D MultiLineString EPSG:3857 available in this GDB
0 Node 37874 3D Point EPSG:3857 available in this GDB
19 Turn_Restriction_Reference 39101 None None available in this GDB
1 Link 42960 3D MultiLineString EPSG:3857 available in this GDB
17 Road_Reference 49345 None None available in this GDB
18 Street_Reference 86191 None None available in this GDB
9 Lane 104840 None None available in this GDB
Code
schema_fields = []
for layer in fiona.listlayers(gdb_path):
    with fiona.open(gdb_path, layer=layer) as src:
        for field in src.schema["properties"]:
            schema_fields.append({"layer": layer, "field": field})

schema_fields = pd.DataFrame(schema_fields)
field_text = schema_fields["field"].str.lower()
layer_text = schema_fields["layer"].str.lower()

concept_terms = {
    "traffic_or_aadt": ["traffic", "aadt", "flow", "volume"],
    "collision_or_casualty": ["collision", "accident", "casualty", "severity"],
    "speed": ["speed"],
    "temporal_observation": ["hour", "day", "month", "year"],
}

concept_rows = []
for concept, terms in concept_terms.items():
    mask = pd.Series(False, index=schema_fields.index)
    for term in terms:
        mask = mask | field_text.str.contains(term, regex=False) | layer_text.str.contains(term, regex=False)
    matches = schema_fields.loc[mask].copy()
    concept_rows.append(
        {
            "concept": concept,
            "matching_fields": len(matches),
            "matching_layers": ", ".join(sorted(matches["layer"].unique())),
            "example_fields": ", ".join(matches["field"].head(8)),
        }
    )

display(pd.DataFrame(concept_rows))
concept matching_fields matching_layers example_fields
0 traffic_or_aadt 0
1 collision_or_casualty 0
2 speed 17 Speed_Limit speedid, linkid, laneid, start, end_, speedlim...
3 temporal_observation 0
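
The many-to-many limitation listed in this section is easy to trigger by accident. The sketch below, on toy data, shows a direct merge of a reference table duplicating link rows and inflating summed length, then the aggregate-first alternative. `validate="one_to_one"` makes pandas raise if the summary still has repeated keys.

```python
import pandas as pd

links = pd.DataFrame({"linkid": ["L1", "L2"], "length_km": [1.0, 2.0]})
road_ref = pd.DataFrame(
    {"linkid": ["L1", "L1", "L2"], "roadid": ["R1", "R2", "R1"]}
)

# Naive merge: L1 appears twice, so any sum over length_km is inflated.
naive = links.merge(road_ref, on="linkid", how="left")

# Aggregate to one row per linkid first, then merge.
road_summary = (
    road_ref.groupby("linkid")["roadid"].nunique().rename("road_count").reset_index()
)
safe = links.merge(road_summary, on="linkid", how="left", validate="one_to_one")

print(len(naive), naive["length_km"].sum())
print(len(safe), safe["length_km"].sum())
```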

17 QA Checklist

Before using this source in the model:

  • Confirm Link.linkid is unique and complete.
  • Confirm Link.geometry is valid and non-empty.
  • Confirm the CRS of each geometry layer (EPSG:3857 in this snapshot) and reproject to a locally accurate metric CRS before measuring or comparing distances.
  • Profile categorical domains for link category, form, carriageway, ownership, SRN, and operational state.
  • Check all related-table linkid values resolve to Link.linkid.
  • Report empty layers and exclude them from feature dependencies.
  • Build active-link yearly coverage from startdate and enddate.
  • Overlay links against the project geography and report length/lane-km coverage.
  • Compare link length coverage against OS Open Roads and MRDB where possible.
  • Keep speed, traffic volume, collisions, and weather as external joins.
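
The checklist can be turned into a small gate that runs before modelling. The sketch below is hypothetical: the `run_qa` helper and the choice of which checks must be zero are assumptions, and the counts are copied from outputs shown earlier on this page (two unmatched start nodes, eleven empty layers).

```python
def run_qa(counts, must_be_zero):
    """Return (name, count, status) rows; other checks are informational."""
    report = []
    for name, count in counts.items():
        if name in must_be_zero:
            status = "pass" if count == 0 else "fail"
        else:
            status = "info"
        report.append((name, count, status))
    return report


counts = {
    "linkid_missing": 0,
    "linkid_duplicate_rows": 0,
    "startnode_unmatched": 2,
    "empty_layers": 11,
}
report = run_qa(
    counts,
    must_be_zero={"linkid_missing", "linkid_duplicate_rows", "startnode_unmatched"},
)
failed = [name for name, _, status in report if status == "fail"]
print(failed)
```

Whether an unmatched node should block the build or only be logged is a project decision; the strict setting here is just one option.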
