Machine Learning for Traffic Forecasting: A Practical Approach Deployed in the Future Road Lab of the C-32 motorway in Spain

3rd June, 2025

Ferran Torrent-Fontbona
Data Science Lead

Núria Toribio
Data Scientist 

Jordi Casas
Global Head of R&D

Matthew Juckes
Managing Director, North America

The availability of large-scale traffic data, such as historical records, real-time sensor feeds, and GPS traces, has transformed traffic forecasting. Machine learning (ML) techniques, especially Graph Neural Networks (GNN) [1] for spatial dependencies and Recurrent Neural Networks (RNN) [2, 6, 7] or attention mechanisms [3] for temporal dependencies, have gained traction in recent years. Despite these developments, much of the literature overlooks data drift [4, 5], the degradation in model performance caused by shifts in the data distribution. Moreover, existing models often fail to address the combined spatial-temporal dependencies and the exogenous factors, such as events or holidays, that robust traffic forecasting requires.

We propose a novel traffic forecasting approach that integrates multiple ML techniques with an online learning algorithm that continually updates the model as new data is collected. This approach is implemented and deployed in the Monitoring and Forecasting Module (M&FM) of Abertis’ Future Road Lab (FRL). The FRL is a testbed for new tools and technologies for safer and more sustainable traffic management, established on a stretch of about 50 km of the C-32 motorway south of Barcelona, Spain. The M&FM delivers real-time traffic forecasting and incident detection to the Management Module, through which the operator disseminates alerts and deploys mitigation measures.

Figure 1 illustrates the architecture of the proposed approach. The spatial dependencies are captured using graph clustering, allowing the model to focus on local traffic patterns while ensuring scalability. Long-term temporal dependencies, such as day type or holiday effects, are modeled through a combination of clustered patterns and a random forest that predicts the likelihood of each day type based on exogenous factors, such as the day of the week or holidays. These spatial and long-term temporal models capture most of the system’s nonlinearities. The final prediction for the desired traffic time series is made by a short-term temporal model, which is typically a linear regression model. However, when time-dependent exogenous variables, such as demand-affecting events, are present, a more complex model like a multi-layer perceptron or RNN is used due to the non-linear nature of these factors.

The proposed method features the following key components:

  • Traffic data represented as time series localized in a graph where nodes represent network sections and edges represent allowed turns.
  • Spatial dependencies modeled using graph clustering and linear regression.
  • Short-term temporal dependencies modeled using linear regression or RNN.
  • Long-term temporal dependencies captured by identifying day patterns through clustering and predicting their probabilities using random forests.
Figure 1. Representation of the proposed approach
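
As a rough illustration of how the long-term and short-term components could fit together, the sketch below blends clustered day patterns by their predicted probabilities to obtain a baseline profile, and then applies a linear correction based on the latest residual. The article does not specify how the deployed system combines these pieces, so the blending rule, the function names (`expected_daily_profile`, `short_term_forecast`), the `gain` parameter and all numbers are assumptions made for this example. In the described approach the probabilities would come from the random forest; here they are simply given.

```python
from typing import Dict, List


def expected_daily_profile(patterns: Dict[str, List[float]],
                           probabilities: Dict[str, float]) -> List[float]:
    """Blend day patterns, weighting each one by its (normalized) probability."""
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    length = len(next(iter(patterns.values())))
    profile = [0.0] * length
    for name, pattern in patterns.items():
        weight = probabilities.get(name, 0.0) / total
        for t, value in enumerate(pattern):
            profile[t] += weight * value
    return profile


def short_term_forecast(recent_obs: float, recent_baseline: float,
                        next_baseline: float, gain: float = 0.5) -> float:
    """Linear correction: carry a fraction of the latest residual forward."""
    residual = recent_obs - recent_baseline
    return next_baseline + gain * residual


# Hypothetical flow patterns (veh/h) for three consecutive intervals.
patterns = {"weekday": [100.0, 400.0, 300.0], "holiday": [50.0, 150.0, 200.0]}
# Hypothetical day-type probabilities, e.g. as output by a classifier.
probabilities = {"weekday": 0.75, "holiday": 0.25}

baseline = expected_daily_profile(patterns, probabilities)
forecast = short_term_forecast(recent_obs=420.0,
                               recent_baseline=baseline[1],
                               next_baseline=baseline[2])
print(baseline, forecast)
```

Keeping the nonlinear part in the pattern base and the classifier is what allows the final step to stay linear in this sketch: the short-term model only needs to correct the deviation from the expected profile.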

Tackling Data Drift

Our model addresses data drift by continuously monitoring model accuracy and updating specific sub-models as new data becomes available. Drift is detected by tracking the fitness of predictions. The online learning system updates sub-models selectively, without retraining the complete model, which keeps updates efficient and adaptive.

For instance, if new day patterns are detected, they are added to the pattern base and the system automatically updates the random forest so that its predictions reflect these shifts. Similarly, obsolete patterns are removed to avoid overfitting and keep the system efficient. The modular nature of our approach allows specialized updates for each model component, so the system adapts quickly to new traffic conditions with minimal data requirements, usually as little as two weeks of post-drift data.
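
A minimal sketch of drift detection by tracking prediction fitness is shown below. The article does not state which fitness measure, window length or threshold the system uses, so the rolling mean absolute error, the `DriftMonitor` class and its `reference_error`, `window` and `tolerance` parameters are all hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags drift when the recent mean absolute error exceeds a reference level."""

    def __init__(self, reference_error: float, window: int = 4,
                 tolerance: float = 1.5):
        self.reference_error = reference_error  # error level seen before drift
        self.tolerance = tolerance              # allowed multiple of that level
        self.errors = deque(maxlen=window)      # most recent absolute errors

    def update(self, observed: float, predicted: float) -> bool:
        """Record one prediction error and report whether drift is suspected."""
        self.errors.append(abs(observed - predicted))
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and mean(self.errors) > self.tolerance * self.reference_error


# Hypothetical (observed, predicted) flows; the last three mimic a sudden demand drop.
monitor = DriftMonitor(reference_error=10.0, window=3, tolerance=1.5)
stream = [(100, 95), (110, 104), (120, 111), (80, 120), (75, 118), (70, 115)]
flags = [monitor.update(obs, pred) for obs, pred in stream]
print(flags)
```

In a modular setup such as the one described, a raised flag would trigger an update of only the affected sub-model rather than a full retraining.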

Performance Evaluation

We evaluate the approach on the TRBAI Traffic Forecasting Challenge dataset, which covers the Greater Seattle Area from January 1st, 2020, until June 16th, 2020, that is, before and during the first COVID-19 lockdown. The dataset contains the Traffic Performance Score (TPS) [8], a metric for measuring the impact of COVID-19 on urban mobility. The TPS is calculated with the following formula:
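$$\mathrm{TPS}_t = \frac{\sum_{i} v_{i,t}\, q_{i,t}\, l_i}{\sum_{i} v^{f}_{i}\, q_{i,t}\, l_i} \tag{1}$$
where $v_{i,t}$ is the traffic speed in section $i$ at time $t$, $q_{i,t}$ is the traffic flow, $l_i$ is the length of the section and $v^{f}_{i}$ is the free-flow speed of the section.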
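
The short sketch below computes a TPS value for a single time step from the four quantities defined above. It assumes the flow- and length-weighted form described in [8], in which observed speeds are compared with free-flow speeds and sections carrying more traffic over longer distances count more. The function name and the sample values are illustrative, not taken from the challenge data.

```python
from typing import Sequence


def traffic_performance_score(speeds: Sequence[float],
                              flows: Sequence[float],
                              lengths: Sequence[float],
                              free_flow_speeds: Sequence[float]) -> float:
    """TPS at one time step: weighted ratio of observed to free-flow speed."""
    if not (len(speeds) == len(flows) == len(lengths) == len(free_flow_speeds)):
        raise ValueError("all inputs must have one value per section")
    numerator = sum(v * q * l for v, q, l in zip(speeds, flows, lengths))
    denominator = sum(vf * q * l
                      for vf, q, l in zip(free_flow_speeds, flows, lengths))
    if denominator == 0:
        raise ValueError("no traffic observed: TPS is undefined")
    return numerator / denominator


# Two hypothetical sections: one flowing near free-flow, one congested.
tps = traffic_performance_score(speeds=[50.0, 30.0],
                                flows=[1000.0, 2000.0],
                                lengths=[1.0, 0.5],
                                free_flow_speeds=[60.0, 60.0])
print(round(tps, 3))
```

With this form the score equals one when every section runs at its free-flow speed and decreases as congestion slows traffic.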

The models are trained with data from January 1st until May 31st and tested on the first fifteen days of June, forecasting a different hour on each day, from 5 am until 7 pm, as specified by the challenge.

Table 1 compares the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) achieved by the presented model, an LSTM encoder-decoder and a transformer. Figure 2 shows the observed and predicted TPS (averaged over all sensors). Our approach outperforms the LSTM and the transformer in terms of RMSE, while the transformer achieves the best MAPE. Figure 2 explains the discrepancy: the transformer prediction follows the observed time series closely except during the period of congestion, when the TPS drops by about 15%. The transformer therefore completely fails to predict the recurrent congestion. The LSTM and, especially, our approach predict the congestion period accurately. However, the proposed model slightly underestimates the TPS for the rest of the time. The reason is the choice of a single linear layer as the output layer; adding an activation function that limits the output to values between zero and one could improve this.
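
The following sketch illustrates, with made-up numbers, why MAPE and RMSE can rank models differently. One hypothetical prediction matches the observations everywhere except for a missed congestion dip; the other carries a small bias throughout but captures the dip. The series are invented for this example and are not the challenge data; only the standard definitions of MAPE and RMSE are used.

```python
import math
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - p) / o)
                       for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((o - p) ** 2
                         for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical TPS series: nine free-flowing intervals and one congested one.
observed = [0.95] * 9 + [0.80]
misses_dip = [0.95] * 9 + [0.95]   # exact elsewhere, ignores the congestion
small_bias = [0.92] * 9 + [0.80]   # slightly low elsewhere, captures the dip

for name, pred in (("misses_dip", misses_dip), ("small_bias", small_bias)):
    print(name, round(mape(observed, pred), 3), round(rmse(observed, pred), 4))
```

Because RMSE squares the errors, one large miss weighs more than many small ones, whereas MAPE averages the relative errors and favors the prediction that is exact most of the time.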

Figure 3 and Figure 4 show the performance of the proposed model on two datasets from two different areas, Oxfordshire and Perth, under severe data shift conditions. The results demonstrate the robustness of our method in maintaining high prediction accuracy even after significant data shifts, such as those caused by the COVID-19 lockdowns. Both figures show how accuracy, measured as the percentage of GEH values lower than five (%GEH<5), degrades over time if online learning is not activated (blue line), and how it is maintained when online learning is activated (red line). Therefore, the proposed approach not only achieves state-of-the-art accuracy on prepared train-and-validation datasets but also updates itself over time in real case studies.
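
For readers unfamiliar with the %GEH<5 measure, the sketch below computes it from pairs of predicted and observed flows using the standard GEH formula, GEH = sqrt(2(M - C)^2 / (M + C)), where M is the modelled (here, predicted) flow and C the counted flow. The formula is conventionally applied to hourly flows; the sample values and function names are illustrative only.

```python
import math
from typing import Sequence


def geh(modelled: float, counted: float) -> float:
    """GEH statistic for one pair of modelled and counted flows (veh/h)."""
    if modelled + counted == 0:
        return 0.0  # both flows are zero, so there is no discrepancy
    return math.sqrt(2.0 * (modelled - counted) ** 2 / (modelled + counted))


def percent_geh_below(modelled: Sequence[float], counted: Sequence[float],
                      threshold: float = 5.0) -> float:
    """Share of flow pairs, in percent, whose GEH is below the threshold."""
    values = [geh(m, c) for m, c in zip(modelled, counted)]
    return 100.0 * sum(g < threshold for g in values) / len(values)


# Hypothetical predicted and observed hourly flows on four sections.
predicted = [1000.0, 800.0, 500.0, 300.0]
observed = [1050.0, 790.0, 650.0, 310.0]
score = percent_geh_below(predicted, observed)
print(score)
```

Unlike a plain percentage error, GEH tolerates larger relative deviations on low flows, which is why it is commonly used to compare traffic volumes of very different magnitudes.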

        Ours     LSTM     Transformer
MAPE    3.937    4.031    3.131
RMSE    0.050    0.062    0.055

Table 1. Comparison between our approach, an LSTM encoder-decoder and a transformer using the TRBAI Open Data Challenge data.

Figure 2. Observed TPS and TPS predicted by our approach, an LSTM encoder-decoder and a transformer.

Figure 3. Oxfordshire daily average (dots) and standard deviation (vertical dashed lines) of the %GEH<5 with (red) and without (blue) online learning for predictions of the proposed approach with a 15-minute and a 60-minute horizon.

Figure 4. Perth daily average (dots) and standard deviation (vertical dashed lines) of the %GEH<5 with (red) and without (blue) online learning for NMF predictions with a 15-minute and a 60-minute horizon.

Deployment in Real-Time Traffic Prediction Software

Our model has been integrated into the Future Road Lab of the C-32, a software solution that provides real-time traffic forecasting, situational awareness, and anomaly detection for traffic management institutions. Figure 5 shows a screenshot of Aimsun Predict in action, deployed on the C-32 motorway south of Barcelona. The software offers traffic flow predictions, congestion alerts, and speed recommendations, providing actionable insights for traffic operators.

Using the proposed model, transportation agencies can benefit from real-time, accurate traffic forecasts and improved decision-making capabilities, addressing congestion and optimizing traffic flow across entire networks.

Figure 5. Screenshot of the Future Road Lab UI for the C-32 motorway south of Barcelona. It shows sections of the network colored according to the measured or estimated flow, a comparison of the predicted and real traffic flow, warning signals reporting possible accidents, and blue squares with the recommended speed due to congestion in the northbound direction.

Geline Canayon presented this paper at the Transportation Research Board 2025 Mid-year Meeting.

References

  1. Gori, M., Monfardini, G., Scarselli, F. A new model for learning in graph domains. Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005, vol. 2, 729-734.
  2. Amari, S. Learning patterns and pattern sequences by self-organizing nets of threshold elements. IEEE Transactions on Computers, 1972, C-21(11), 1197-1206.
  3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I. Attention is all you need, 2017, https://arxiv.org/abs/1706.03762
  4. Lu, N., Zhang, G., Lu, J. Concept drift detection via competence models. Artificial Intelligence, 2014, 209, 11-28, https://doi.org/10.1016/j.artint.2014.01.001
  5. Lu, N., Lu, J., Zhang, G., Lopez de Mantaras, R. A concept drift-tolerant case-base editing technique. Artificial Intelligence, 2016, 230, 108-133, https://doi.org/10.1016/j.artint.2015.09.009
  6. Cui, Z., Ke, R., Pu, Z., & Wang, Y. Stacked bidirectional and unidirectional LSTM recurrent neural network for forecasting network-wide traffic state with missing values. Transportation Research Part C: Emerging Technologies, 2020. 118, 102674. https://doi.org/10.1016/j.trc.2020.102674
  7. Tian, Y., Zhang, K., Li, J., Lin, X., & Yang, B. LSTM-based traffic flow prediction with missing data. Neurocomputing, 2018. 318, 297–305. https://doi.org/10.1016/j.neucom.2018.08.067
  8. Cui, Z., Zhu, M., Wang, S., Wang, P., Zhou, Y., Cao, Q., Kopca, C., Wang, Y. Traffic Performance Score for Measuring the Impact of COVID-19 on Urban Mobility. https://doi.org/10.48550/arXiv.2007.00648