Open Road Risk

ukgeo — UK Location Geocoder

Resolving UK road references, place names, and infrastructure to coordinates

What is ukgeo?

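ukgeo is an open-source Python geocoder built specifically for UK location text. It resolves messy free-text inputs (road references, junction names, place names, colloquial names) to latitude/longitude coordinates. Lookups served from the local reference dataset involve no API calls or rate limits; two pipeline levels (see "How it works") do call external services, postcodes.io and the OS Names API.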

It was developed alongside Open Road Risk to handle the kinds of location strings that appear in road safety data: motorway junction references, A-road descriptions, named interchanges, and administrative place names.

Why it matters for road safety data

Standard geocoders struggle with road safety inputs:

  • "M62 Junction 26" — a junction reference, not a postal address
  • "A647 near Bradford" — a road with place context
  • "Spaghetti Junction Birmingham" — a colloquial name
  • "Lofthouse Interchange" — a named motorway feature

ukgeo handles all of these using a tiered pipeline backed by OS Open Names, OS Open Roads, and OpenStreetMap data.
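To make the distinction between these input types concrete, here is a minimal sketch of how such strings might be told apart with regular expressions. The patterns, the `classify` function, and the labels it returns are hypothetical illustrations, not taken from the ukgeo source, and they cover only the simple cases listed above.

```python
import re

# Hypothetical patterns for illustration only; not ukgeo's actual rules.
POSTCODE_RE = re.compile(
    r"^\s*[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\s*$", re.IGNORECASE
)
JUNCTION_RE = re.compile(
    r"^\s*(M\d{1,3}|A\d{1,4}\(M\))\s+(?:J|JCT|JUNCTION)\s*(\d{1,2}[A-Z]?)\s*$",
    re.IGNORECASE,
)
ROAD_PLACE_RE = re.compile(
    r"^\s*([ABM]\d{1,4})\s+(?:(?:near|at|in)\s+)?(.+?)\s*$", re.IGNORECASE
)


def classify(text: str) -> str:
    """Return a coarse label for a UK location string."""
    if POSTCODE_RE.match(text):
        return "postcode"
    if JUNCTION_RE.match(text):
        return "junction"
    if ROAD_PLACE_RE.match(text):
        return "road_with_place"
    # Colloquial names and named features fall through to free text.
    return "free_text"


for query in ["M62 Junction 26", "A647 near Bradford", "Lofthouse Interchange"]:
    print(query, "->", classify(query))
```

Colloquial names and named interchanges fall through to the free-text label here, which is why a lookup table or gazetteer is needed in addition to pattern matching.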

Installation

pip install ukgeo
ukgeo setup    # downloads ~50MB reference dataset from Kaggle

Basic usage

import polars as pl
from ukgeo import Geocoder

geo = Geocoder()

# Single query
result = geo.geocode("M62 Junction 26")
print(result.lat, result.lon)     # 53.7362, -1.7266
print(result.confidence)          # High
print(result.level_resolved)      # 2

# Batch geocoding: pass a polars Series of location strings
locations = pl.Series(["LS1 1BA", "A647 near Bradford", "Skipton, North Yorkshire"])
results = geo.geocode_batch(locations)

How it works

ukgeo uses a four-level pipeline, escalating complexity only when needed:

| Level | Method | Speed | Examples handled |
|---|---|---|---|
| 0 | Infrastructure alias lookup | <1ms | Dartford Crossing, Spaghetti Junction |
| 1 | Regex + postcodes.io | ~100ms | LS1 1BA, M62 J26 |
| 2 | OS Names token scoring | ~5ms | Skipton North Yorkshire, A647 Bradford |
| 3 | OS Names API fallback | ~200ms | Bus stations, airports, services |
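The escalation idea can be sketched as a list of resolvers tried in order, returning as soon as one succeeds. This is a minimal illustration under stated assumptions, not ukgeo's implementation: the `Match` class, the resolver functions, the tiny `ALIASES` and `GAZETTEER` tables (with approximate coordinates), and the Jaccard threshold are all hypothetical. Levels 1 and 3 are omitted because they involve network lookups.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

Coords = Tuple[float, float]


@dataclass
class Match:
    lat: float
    lon: float
    level: int  # pipeline level that produced the match


# Tiny illustrative tables with approximate coordinates (assumptions).
ALIASES = {"spaghetti junction": (52.51, -1.86)}
GAZETTEER = {
    "skipton north yorkshire": (53.96, -2.02),
    "bradford west yorkshire": (53.79, -1.75),
}


def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def level0_alias(text: str) -> Optional[Coords]:
    """Level 0: look for a known alias inside the normalised query."""
    normalised = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    for alias, coords in ALIASES.items():
        if alias in normalised:
            return coords
    return None


def level2_tokens(text: str, threshold: float = 0.5) -> Optional[Coords]:
    """Level 2: score gazetteer entries by Jaccard token overlap."""
    query = tokenize(text)
    best_score, best_coords = 0.0, None
    for name, coords in GAZETTEER.items():
        entry = tokenize(name)
        score = len(query & entry) / len(query | entry)
        if score > best_score:
            best_score, best_coords = score, coords
    return best_coords if best_score >= threshold else None


# Cheapest resolver first; later levels run only if earlier ones fail.
LEVELS = [(0, level0_alias), (2, level2_tokens)]


def resolve(text: str) -> Optional[Match]:
    for level, resolver in LEVELS:
        coords = resolver(text)
        if coords is not None:
            return Match(coords[0], coords[1], level)
    return None


print(resolve("Spaghetti Junction Birmingham"))
print(resolve("Skipton, North Yorkshire"))
```

Ordering the resolvers from cheapest to most expensive means a query that hits the alias table never pays for token scoring or a network round trip.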

Performance

Benchmarked against 5,000 STATS19 2024 collision records:

| Input type | Median error | Notes |
|---|---|---|
| Postcode | <100m | postcodes.io centroid |
| Motorway junction | <10m | OS Open Roads point geometry |
| B-road | 1,927m | OSM segment matching |
| A-road | 4,490m | Road centroid (structurally limited) |

Road-only inputs (no junction or place context) resolve to the road’s OS centroid. For STATS19 records this rarely matters, since the records already carry coordinates. ukgeo is most useful for derived datasets and reports that have road references but no coordinates.
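A median error like those in the table above can be computed by comparing each geocoded point with a reference coordinate. The sketch below assumes great-circle (haversine) distance and a plain list of coordinate pairs; the benchmark's actual distance measure and data handling are not described here, and the function names are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def median_error_m(pairs):
    """Median distance over (predicted, reference) coordinate pairs."""
    return median(haversine_m(*pred, *ref) for pred, ref in pairs)


# Hypothetical pairs: (geocoded point, reference point).
pairs = [
    ((53.7362, -1.7266), (53.7362, -1.7266)),
    ((53.7950, -1.7590), (53.8000, -1.7500)),
    ((53.9600, -2.0200), (53.9620, -2.0170)),
]
print(f"median error: {median_error_m(pairs):.0f} m")
```

The median is less sensitive than the mean to a few badly wrong matches, which suits geocoding errors whose distribution tends to have a long tail.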

Data sources

| Dataset | Content | Licence |
|---|---|---|
| OS Open Names | Places, roads, postcodes | OGL v3 |
| OS Open Roads | Motorway junction points | OGL v3 |
| OpenStreetMap | Named junctions, roundabouts, B-roads | ODbL |

Pre-built data is available on Kaggle.

Links

  • GitHub repository
  • PyPI package
  • Kaggle dataset
  • Known limitations and ecosystem overview
