Temporal Traffic Exploration: Seasonality and Day-of-Week Patterns

Overview

This notebook explores the temporal dimensions of traffic flow using WebTRIS sensor data. We analyze how traffic volumes and compositions change across different months and between weekdays and weekends.

The goal is to determine which temporal patterns are link-specific (requiring granular modelling) and which are stable enough to be treated as global constants.

We investigate:
1. Monthly Seasonality: High-season vs. low-season flow amplitudes.
2. Weekday/Weekend Ratios: Commuter-heavy vs. leisure-heavy road signatures.
3. Time-of-Day Intensity: The ratio of daytime to nighttime flow.
4. HGV Mixing: How vehicle composition shifts across these same axes.

import os
import logging
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from road_risk.config import _ROOT as ROOT
from road_risk.model.constants import MONTH_ORDER

# Configure paths
WEBTRIS_RAW = ROOT / "data/raw/webtris"
WEBTRIS_CLEAN = ROOT / "data/processed/webtris/webtris_clean.parquet"
TEMPORAL_PROFILES = ROOT / "data/models/temporal_profiles.parquet"

sns.set_theme(style="whitegrid")

1 Variance Analysis (Weekday/Weekend)

We begin with the strongest test: whether different sensors in the same month give similar weekday/weekend ratios (computed below as awt24hour / adt24hour). If the ratio is stable across sites within a month, the pattern is network-wide and the descriptor has little link-specific variation to model.

# Load a larger sample of raw parquets to test variance
files = [f for f in os.listdir(WEBTRIS_RAW) if "2023" in f and f.endswith(".parquet")]
df_list = []
for file in files: 
    df = pd.read_parquet(WEBTRIS_RAW / file, columns=["site_id", "monthname", "adt24hour", "awt24hour"])
    df_list.append(df)

raw_2023 = pd.concat(df_list, ignore_index=True)
raw_2023["adt24hour"] = pd.to_numeric(raw_2023["adt24hour"], errors="coerce")
raw_2023["awt24hour"] = pd.to_numeric(raw_2023["awt24hour"], errors="coerce")
raw_2023["ww_ratio"] = raw_2023["awt24hour"] / raw_2023["adt24hour"].replace(0, np.nan)

# Calculate within-month standard deviation across sites
month_stats = raw_2023.groupby("monthname")["ww_ratio"].agg(["mean", "std", "count"])
print("Per-site Weekday/Weekend Ratio stats by month:")
print(month_stats.round(4))

overall_std = raw_2023.groupby("monthname")["ww_ratio"].std().mean()
print(f"\nAverage within-month standard deviation across sites: {overall_std:.3f}")

Per-site Weekday/Weekend Ratio stats by month:
             mean     std  count
monthname                       
Apr        1.0869  0.0379   3631
Aug        1.0483  0.0289   3579
Dec        1.0760  0.0378   4240
Feb        1.0706  0.0331   4536
Jan        1.0896  0.0325   4550
Jul        1.0597  0.0439   3638
Jun        1.0579  0.0322   3638
Mar        1.0632  0.0279   4529
May        1.0593  0.0277   3631
Nov        1.0667  0.0298   4249
Oct        1.0493  0.0304   4363
Sep        1.0533  0.0335   4316

Average within-month standard deviation across sites: 0.033

1.1 Interpretation

The standard deviation across sites within any given month is consistently about 0.03 (range 0.028 to 0.044), which is roughly 3% of the monthly mean. This low variance suggests that individual sites deviate little from the network-wide weekday/weekend behavior.
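To put "~0.03" in context, the sketch below expresses the between-site spread as a coefficient of variation and compares it with the spread of the monthly means. The numbers are copied from the printed table above; the column name `cv` and the variable `between_month_range` are introduced here for illustration only.

```python
import pandas as pd

# Values copied from the printed per-month table above (2023 sample).
month_stats = pd.DataFrame(
    {
        "mean": [1.0869, 1.0483, 1.0760, 1.0706, 1.0896, 1.0597,
                 1.0579, 1.0632, 1.0593, 1.0667, 1.0493, 1.0533],
        "std": [0.0379, 0.0289, 0.0378, 0.0331, 0.0325, 0.0439,
                0.0322, 0.0279, 0.0277, 0.0298, 0.0304, 0.0335],
    },
    index=["Apr", "Aug", "Dec", "Feb", "Jan", "Jul",
           "Jun", "Mar", "May", "Nov", "Oct", "Sep"],
)

# Coefficient of variation: between-site spread relative to the monthly mean.
month_stats["cv"] = month_stats["std"] / month_stats["mean"]

# Spread of the monthly means themselves, for comparison with the site spread.
between_month_range = month_stats["mean"].max() - month_stats["mean"].min()

print(month_stats["cv"].round(3))
print(f"Largest CV: {month_stats['cv'].max():.3f} ({month_stats['cv'].idxmax()})")
print(f"Range of monthly means: {between_month_range:.3f}")
```

A coefficient of variation is only one way to judge "low"; a variance decomposition on the raw per-site rows would be a stricter test.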

2 Monthly Seasonality Analysis

We now switch to temporal_profiles.parquet, which aggregates data at road-corridor grain and stores a seasonal index for each corridor-month.

profiles = pd.read_parquet(TEMPORAL_PROFILES)
profiles = profiles[profiles["n_site_months"] > 0].copy()

# Broad classification
profiles['road_type'] = profiles['road_prefix'].apply(
    lambda x: 'Motorway' if str(x).startswith('M') else ('A-Road' if str(x).startswith('A') else 'Other')
)

# Ensure chronological sorting
profiles["monthname"] = pd.Categorical(profiles["monthname"], categories=MONTH_ORDER, ordered=True)

profiles.sample(5)

	road_prefix	monthname	month_num	mean_adt24	mean_awt24	mean_large_pct	n_site_months	seasonal_index	weekday_weekend_ratio	road_type
80482	8996	Nov	11	2423.562500	2727.687500	12.581250	16	1.051685	1.125487	Other
83965	9055	Feb	2	15282.000000	17056.000000	11.633333	3	0.940908	1.116084	Other
61461	8269	Oct	10	25701.333333	27193.333333	19.066667	3	1.074822	1.058051	Other
26050	7003	Nov	11	14156.833333	14991.833333	10.100000	6	1.032264	1.058982	Other
122138	A556	Mar	3	15995.611111	17043.222222	10.394444	18	0.935311	1.065494	A-Road

2.1 Seasonal Index Distribution

The seasonal index is the monthly flow divided by the annual average. We want to see if different road types have different seasonal “signatures.”
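As a minimal sketch of that definition, the example below computes a seasonal index for one made-up corridor. The flows and the corridor label `X1` are hypothetical, and using the simple mean of the available months as the annual average is an assumption; the pipeline that produced temporal_profiles.parquet may weight months differently.

```python
import pandas as pd

# Hypothetical monthly mean daily flows for a single corridor (vehicles/day).
flows = pd.DataFrame(
    {
        "road_prefix": ["X1"] * 4,
        "monthname": ["Jan", "Apr", "Jul", "Oct"],
        "mean_adt24": [8000.0, 10000.0, 11000.0, 11000.0],
    }
)

# Annual average per corridor (assumption: simple mean of the available months).
annual_mean = flows.groupby("road_prefix")["mean_adt24"].transform("mean")

# Seasonal index: monthly flow divided by the corridor's annual average.
flows["seasonal_index"] = flows["mean_adt24"] / annual_mean

print(flows)
```

By construction the index averages to 1.0 within each corridor, which is why the plots use 1.0 as the reference line.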

fig, ax = plt.subplots(figsize=(12, 6))

sns.lineplot(
    data=profiles, 
    x="monthname", 
    y="seasonal_index", 
    hue="road_type", 
    marker="o", 
    errorbar=("sd", 1),
    ax=ax
)

ax.axhline(1.0, color="black", linestyle="--", alpha=0.5)
ax.set_title("Seasonal Traffic Index by Road Type (Mean ± 1 SD)")
ax.set_ylabel("Seasonal Index (1.0 = Annual Average)")
plt.show()

print("Mean Seasonal Index by Month and Road Type:")
print(profiles.groupby(["monthname", "road_type"], observed=True)["seasonal_index"].mean().unstack().round(3))

Mean Seasonal Index by Month and Road Type:
road_type  A-Road  Motorway  Other
monthname                         
Jan         0.833     0.835  0.837
Feb         0.910     0.923  0.913
Mar         0.947     0.941  0.948
Apr         0.972     0.965  0.985
May         0.995     0.995  1.013
Jun         1.051     1.050  1.060
Jul         1.069     1.074  1.070
Aug         1.076     1.094  1.084
Sep         1.084     1.079  1.074
Oct         1.069     1.073  1.058
Nov         1.044     1.034  1.031
Dec         0.950     0.935  0.931

2.2 Interpretation

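Amplitude: The seasonal index rises from a January trough of about 0.83 to a late-summer peak of about 1.08-1.09 (August or September, depending on road type), a swing of roughly 0.25 index points. Put differently, peak-month flow is about 30% above trough-month flow.
Convergence: The mean lines for Motorways and A-Roads overlap closely, differing by no more than 0.02 in any month. This suggests that seasonality is largely a network-wide phenomenon rather than a link-specific one, at least at the level of road-type means.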
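The amplitude and convergence figures can be checked directly against the printed table. The sketch below re-enters those means by hand; the variable names are illustrative and nothing here reads the parquet file.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Mean seasonal index by month and road type, copied from the printed table.
seasonal = pd.DataFrame(
    {
        "A-Road": [0.833, 0.910, 0.947, 0.972, 0.995, 1.051,
                   1.069, 1.076, 1.084, 1.069, 1.044, 0.950],
        "Motorway": [0.835, 0.923, 0.941, 0.965, 0.995, 1.050,
                     1.074, 1.094, 1.079, 1.073, 1.034, 0.935],
        "Other": [0.837, 0.913, 0.948, 0.985, 1.013, 1.060,
                  1.070, 1.084, 1.074, 1.058, 1.031, 0.931],
    },
    index=months,
)

# Amplitude in index points, and peak flow relative to trough flow.
amplitude = seasonal.max() - seasonal.min()
peak_over_trough = seasonal.max() / seasonal.min() - 1

# Largest monthly gap between the Motorway and A-Road means.
gap = (seasonal["Motorway"] - seasonal["A-Road"]).abs()

print(amplitude.round(3))
print(peak_over_trough.round(3))
print(f"Largest Motorway vs A-Road gap: {gap.max():.3f} in {gap.idxmax()}")
```

Note that this compares road-type means only; the ±1 SD bands in the plot are the place to look for corridor-level spread.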

3 Weekday vs. Weekend Profiles

We investigate the stability of the weekday/weekend ratio across the network.

The Variance Analysis section used raw 2023 chunks at site grain. This section uses the corridor-level aggregations from temporal_profiles.parquet to check whether the same result holds at corridor grain.

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Distribution across all corridor-months
sns.histplot(profiles["weekday_weekend_ratio"].dropna(), bins=40, kde=True, color="seagreen", ax=ax1)
ax1.axvline(profiles["weekday_weekend_ratio"].median(), color="red", linestyle="--")
ax1.set_title("Distribution of Weekday/Weekend Ratios")
ax1.set_xlabel("Ratio (AWT / ADT)")

# Stability by month
sns.boxplot(data=profiles, x="monthname", y="weekday_weekend_ratio", color="lightblue", ax=ax2)
ax2.axhline(1.0, color="black", linestyle="--", alpha=0.3)
ax2.set_title("Ratio Stability Across Months")
ax2.set_ylabel("Weekday/Weekend Ratio")

plt.tight_layout()
plt.show()

# Mean check by road type
print("Mean Weekday/Weekend Ratio by Road Type:")
print(profiles.groupby("road_type")["weekday_weekend_ratio"].mean().round(4))

Mean Weekday/Weekend Ratio by Road Type:
road_type
A-Road      1.0779
Motorway    1.0712
Other       1.0849
Name: weekday_weekend_ratio, dtype: float64

3.1 Interpretation

The mean ratio is about 1.08, the standard deviation across sites within any month is ~0.03 (from the Variance Analysis section), and the mean is nearly identical across road types (Motorway 1.071, A-Road 1.078, Other 1.085). The descriptor has little link-specific variation to model.

4 Time-of-Day Intensity (Day vs. Night)

Using the webtris_clean.parquet dataset, we examine the core_overnight_ratio. This measures the daytime flow per hour relative to the nighttime flow per hour.

webtris_clean = pd.read_parquet(WEBTRIS_CLEAN)
site_tod = webtris_clean.groupby("site_id")["core_overnight_ratio"].mean().dropna()

fig, ax = plt.subplots(figsize=(10, 5))
sns.histplot(site_tod, bins=50, kde=True, color="purple", ax=ax)
ax.set_title("Daytime-to-Nighttime Intensity Ratio")
ax.set_xlabel("Ratio (Mean Day Flow / Mean Night Flow)")
plt.show()

print(f"5th Percentile: {site_tod.quantile(0.05):.2f}")
print(f"95th Percentile: {site_tod.quantile(0.95):.2f}")

5th Percentile: 4.13
95th Percentile: 14.91

4.1 Interpretation

Unlike the weekday/weekend ratio, this distribution is wide: the middle 90% of sites spans roughly 4.1 to 14.9 (5th to 95th percentile). This indicates that time-of-day behavior is highly link-specific. Some roads carry substantial overnight flows (e.g., freight corridors), while others are almost exclusively used during the day.
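To illustrate how two roads with the same daytime flow can land at opposite ends of this range, the sketch below computes a per-hour day/night ratio from hourly profiles. The hour windows (`CORE_HOURS`, `OVERNIGHT_HOURS`), the function `day_night_ratio`, and both profiles are hypothetical; the actual definition behind core_overnight_ratio is not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical windows; the real core/overnight definitions may differ.
CORE_HOURS = range(7, 19)      # 07:00-18:59
OVERNIGHT_HOURS = range(0, 5)  # 00:00-04:59


def day_night_ratio(hourly_flow: pd.Series) -> float:
    """Mean flow per core hour divided by mean flow per overnight hour.

    `hourly_flow` must be indexed by hour of day (0..23).
    """
    core = hourly_flow.loc[list(CORE_HOURS)].mean()
    night = hourly_flow.loc[list(OVERNIGHT_HOURS)].mean()
    return float(core / night) if night > 0 else float("nan")


hours = np.arange(24)
is_core = (hours >= 7) & (hours < 19)

# Two made-up profiles with identical daytime flow but different night flow.
commuter_road = pd.Series(np.where(is_core, 1200.0, 100.0), index=hours)
freight_road = pd.Series(np.where(is_core, 1200.0, 300.0), index=hours)

print(f"Commuter-style road: {day_night_ratio(commuter_road):.1f}")
print(f"Freight-style road:  {day_night_ratio(freight_road):.1f}")
```

Because the ratio is driven by a small overnight denominator, modest absolute differences in night flow produce large differences in the ratio.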

5 Heavy Goods Vehicle (HGV) Composition

Finally, we look at how the large vehicle percentage fluctuates across the network.

fig, ax = plt.subplots(figsize=(12, 6))

sns.boxplot(data=profiles, x="monthname", y="mean_large_pct", hue="road_type", ax=ax)
ax.set_title("Large Vehicle Percentage by Month and Road Type")
ax.set_ylabel("HGV %")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

print("Mean Large Vehicle Percentage by Month and Road Type:")
print(profiles.groupby(["monthname", "road_type"], observed=True)["mean_large_pct"].mean().unstack().round(2))

Mean Large Vehicle Percentage by Month and Road Type:
road_type  A-Road  Motorway  Other
monthname                         
Jan         17.51     17.10  14.41
Feb         17.26     16.83  14.22
Mar         17.05     16.76  13.98
Apr         15.63     15.34  12.97
May         15.55     15.12  12.65
Jun         15.49     15.08  12.72
Jul         15.16     15.03  12.50
Aug         14.37     14.26  11.88
Sep         14.98     14.86  12.57
Oct         15.45     15.34  12.69
Nov         15.69     15.69  13.14
Dec         13.89     13.86  11.55

5.1 Interpretation

HGV percentage falls by about 3pp between January and August across all road types (for example, 17.51% to 14.37% on A-Roads). This is likely to be largely a denominator artefact: total traffic at the late-summer peak is roughly 30% above the January trough while HGV volumes are assumed to stay roughly flat, so the share falls without any freight-side seasonal pattern. The December minimum (13.89% on A-Roads) does not fit this explanation, because total traffic in December is below the annual average, and is worth checking separately. The within-road-class spread visible in the boxplot is genuine variation in road character, but a cleaner descriptor would be HGV volume (vehicles/day) rather than HGV percentage; if HGV is eventually taken to ablation, volume is the better feature. A per-site within-month standard deviation analysis, matching the Variance Analysis section, is still warranted before drawing a final verdict.
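One rough way to probe the denominator argument is to multiply the mean HGV share by the mean seasonal index, giving a proxy for relative HGV volume. The sketch below does this for the A-Road column using the printed means; the names `hgv_volume_proxy` and `hgv_volume_index` are illustrative. It is an approximation only, since a product of corridor-level means is not the same as the mean of corridor-level products.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# A-Road means copied from the two printed tables above.
a_road = pd.DataFrame(
    {
        "seasonal_index": [0.833, 0.910, 0.947, 0.972, 0.995, 1.051,
                           1.069, 1.076, 1.084, 1.069, 1.044, 0.950],
        "large_pct": [17.51, 17.26, 17.05, 15.63, 15.55, 15.49,
                      15.16, 14.37, 14.98, 15.45, 15.69, 13.89],
    },
    index=months,
)

# Proxy for relative HGV volume: HGV share times the total-flow index.
a_road["hgv_volume_proxy"] = a_road["large_pct"] / 100 * a_road["seasonal_index"]

# Normalise so that 1.0 is the annual mean of the proxy.
a_road["hgv_volume_index"] = (
    a_road["hgv_volume_proxy"] / a_road["hgv_volume_proxy"].mean()
)

print(a_road["hgv_volume_index"].round(3))
print("Lowest month:", a_road["hgv_volume_index"].idxmin())
print("Highest month:", a_road["hgv_volume_index"].idxmax())
```

If the proxy is much flatter across months than the percentage itself, that supports the denominator reading; months where the proxy also drops point to a freight-side effect instead.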

6 Data Depth Check

We verify the amount of data backing these observations to ensure statistical confidence.

# Site counts per corridor
site_counts = profiles.groupby("road_prefix")["n_site_months"].sum() / 12
print(f"Total Unique Corridors: {len(site_counts)}")

fig, ax = plt.subplots(figsize=(10, 4))
sns.histplot(site_counts, bins=30, log_scale=(True, False), ax=ax)
ax.set_title("Distribution of Sites per Corridor (Log Scale)")
ax.set_xlabel("Approximate Sites")
plt.show()

Total Unique Corridors: 879

7 Summary of Potential Modelling Implications

Based on the variance observed in the plots above:

Dimension	Observed Variance	Modelling Strategy
Monthly Seasonality	High amplitude, low inter-link variance.	Handle via global monthly multipliers.
Weekday vs. Weekend	Low variance across network (mean ~1.08, within-month SD ~0.03).	Treat as a global constant.
Time-of-Day	High inter-link variance (5th-95th percentile: 4.1 - 14.9).	Treat as link-specific (Priority for ablation).
HGV Mix	Global seasonal swing (~3pp) and substantial within-class spread.	Status: not formally tested at site grain. May carry signal beyond road class.
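As a sketch of what "global monthly multipliers" and a "global constant" could look like in practice, the example below scales an annual average daily flow by a single network-wide multiplier per month and an optional weekday factor. Everything here is hypothetical: the function `estimate_daily_flow`, the use of the A-Road column as a stand-in for a network-wide multiplier, and the reading of the ~1.08 ratio as weekday flow relative to the all-days average (it is computed as awt24hour / adt24hour above).

```python
# Stand-in multipliers: the A-Road means from the printed seasonal table.
# A real model would fit one network-wide value per month.
MONTHLY_MULTIPLIER = {
    "Jan": 0.833, "Feb": 0.910, "Mar": 0.947, "Apr": 0.972,
    "May": 0.995, "Jun": 1.051, "Jul": 1.069, "Aug": 1.076,
    "Sep": 1.084, "Oct": 1.069, "Nov": 1.044, "Dec": 0.950,
}

# Assumed meaning: weekday flow relative to the all-days average (AWT / ADT).
WEEKDAY_FACTOR = 1.08


def estimate_daily_flow(annual_avg_daily_flow: float, month: str,
                        weekday: bool = False) -> float:
    """Scale an annual average daily flow to a given month.

    With weekday=True the result is further scaled by the global weekday factor.
    """
    flow = annual_avg_daily_flow * MONTHLY_MULTIPLIER[month]
    return flow * WEEKDAY_FACTOR if weekday else flow


january_all_days = estimate_daily_flow(10_000, "Jan")
january_weekday = estimate_daily_flow(10_000, "Jan", weekday=True)
print(f"January, all days: {january_all_days:.0f} vehicles/day")
print(f"January, weekdays: {january_weekday:.0f} vehicles/day")
```

Time-of-day intensity is deliberately left out of this sketch, since the analysis above suggests it needs a link-specific treatment.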