Senior Data Scientist at Aimsun
Aimsun Live predicts road traffic in real time by simulating how mobility demand interacts with the infrastructure. Unlike purely data-driven methods, a simulation-based approach allows the system not only to predict traffic under recurrent conditions, but also to deal with any changes in the infrastructure, such as incidents or changes to traffic control plans. This approach has three main requirements:
In this article I want to show how good Aimsun Live is at predicting mobility patterns that are then used to simulate road traffic.
As a reminder, in a previous article I focused on the importance of mobility patterns for simulating and predicting mobility, and on the advantages of applying unsupervised learning (clustering) to mobility data to extract date-related patterns. We demonstrated these advantages with real traffic data from the city of Wiesbaden. For example, Figure 1, on the left, shows the distribution of patterns over two years: each column is a day of the week, each row is a week, and each color represents a day-mobility pattern. On the right, it shows the %GEH<5 obtained by comparing each day with its pattern, using either the patterns extracted with clustering or patterns calculated by grouping days according to the day of the week and holidays.
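For readers unfamiliar with the metric, here is a minimal sketch of how a %GEH<5 score could be computed for one day. It uses the standard GEH formula for hourly flows, sqrt(2(M - C)^2 / (M + C)). The function names, the sample flows, and the choice to aggregate over a flat list of measurements are assumptions made for illustration, not a description of how Aimsun Live computes it.

```python
import math

def geh(simulated, observed):
    """GEH statistic for one pair of hourly flows (vehicles per hour)."""
    if simulated + observed == 0:
        return 0.0  # both flows are zero: treat as a perfect match
    return math.sqrt(2 * (simulated - observed) ** 2 / (simulated + observed))

def pct_geh_below_5(simulated_flows, observed_flows):
    """Percentage of measurement pairs whose GEH is below 5."""
    pairs = list(zip(simulated_flows, observed_flows))
    hits = sum(1 for m, c in pairs if geh(m, c) < 5)
    return 100.0 * hits / len(pairs)

# Hypothetical flows: the pattern's flows versus the flows observed on one day.
pattern_flows = [1000, 500, 200, 50]
day_flows = [1100, 520, 400, 55]
score = pct_geh_below_5(pattern_flows, day_flows)
print(score)
```

In this made-up example only the third measurement (200 versus 400 vehicles per hour) has a GEH above 5, so three of the four measurements pass.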
If we want to simulate the demand on a given future day, then the problem is that this procedure requires a prediction of the right pattern for that day. Let’s say, for example, that you want to simulate what will happen over the next hour, like Aimsun Live does. You can predict the mobility pattern associated with a particular mobility demand, but you’ll need to have built a good pattern classifier.
The left-most panel in Figure 2 shows the ground truth (it is in fact the same distribution as in Figure 1). Moving to the right, the next panels show the prediction made using logistic regression, then a random forest, and then the pattern matcher that is part of Aimsun Live.
Figure 3 shows the %GEH<5 obtained for each pattern. Each value in the curve represents one day, and the days are ordered from the worst to the best according to the %GEH<5.
A simple model like logistic regression can’t model how patterns are distributed, especially for atypical weeks or days related to holidays.
A random forest is more successful and does a reasonable job of predicting the right patterns even on atypical days, but it fails completely on around 10 days. Ten failed days out of 730 (two years) is a small fraction, but on those days the chosen pattern is radically different from the right one. In other words, there is room for improvement in terms of modeling how good each pattern is for each day.
The pattern predictor we use in Aimsun Live picks the single best pattern slightly less often, but its accuracy in terms of %GEH<5 does not suffer severe drops. It does a good job of modeling how good each pattern is for each day, and when it fails it tends to choose the second or third best option, which limits the loss of accuracy when the prediction is not optimal. Note that the results were obtained with 5-fold cross-validation, so each day is predicted by a model that did not see it during training and the scores are not inflated by overfitting.
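The difference between counting how often the best pattern is chosen and measuring how much %GEH<5 is lost can be illustrated with a small sketch. All numbers below are invented, and the two "predictors" are just fixed lists of choices. The point is only to show that a predictor with a lower hit rate can still have a much smaller worst-case loss.

```python
# Hypothetical %GEH<5 of every candidate pattern on four days
# (rows are days, columns are patterns 0, 1 and 2).
scores = [
    [90.0, 85.0, 40.0],
    [88.0, 91.0, 35.0],
    [30.0, 45.0, 92.0],
    [87.0, 89.0, 42.0],
]

def evaluate(scores, chosen):
    """Return (hit rate, worst-case loss) for one pattern choice per day."""
    hits = 0
    losses = []
    for day_scores, pattern in zip(scores, chosen):
        best = max(day_scores)
        loss = best - day_scores[pattern]  # %GEH<5 points lost on that day
        hits += loss == 0
        losses.append(loss)
    return hits / len(scores), max(losses)

# Predictor A picks the best pattern on three days but fails badly on one.
rate_a, worst_a = evaluate(scores, [0, 1, 0, 1])
# Predictor B misses the best pattern twice but always picks a close second.
rate_b, worst_b = evaluate(scores, [1, 1, 2, 0])

print(rate_a, worst_a)
print(rate_b, worst_b)
```

Predictor A looks better if only hits are counted, but its single miss costs 62 points of %GEH<5, whereas predictor B never loses more than 5.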
The predictions shown so far use calendar features, such as day of the week, holidays, or month, to predict the pattern. However, for short-term predictions (such as one hour ahead), mobility data is available right up until the moment of the prediction, which can improve accuracy. This is the case in Aimsun Live, and you can see the results in Figure 4: when Aimsun Live’s predictor uses mobility data, it achieves almost optimal accuracy in terms of %GEH<5. Moreover, the predictor is trained with the same number of examples as logistic regression and random forest, with each day representing one example in the supervised learning process, so the approach does not require more resources.
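As an illustration of what calendar features might look like as classifier input, here is a minimal sketch that encodes one day as a single training example. The one-hot encoding, the `HOLIDAYS` set, and the `calendar_features` function are assumptions made for this example. The article does not specify how the features are actually encoded in Aimsun Live.

```python
from datetime import date

# Hypothetical holiday calendar; a real deployment would load the local one.
HOLIDAYS = {date(2019, 1, 1), date(2019, 12, 25)}

def calendar_features(day):
    """One-hot day of week and month, plus a holiday flag (20 values)."""
    weekday = [1 if day.weekday() == i else 0 for i in range(7)]  # Monday is 0
    month = [1 if day.month == m else 0 for m in range(1, 13)]
    holiday = [1 if day in HOLIDAYS else 0]
    return weekday + month + holiday

# One day yields one feature vector, that is, one supervised-learning example.
features = calendar_features(date(2019, 12, 25))
print(features)
```

For a short-term prediction, the mobility data observed so far on that day could be appended to the same vector, so the number of examples stays at one per day.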
Machine learning offers many ways of tackling the same problem, but its successful deployment ultimately depends on a deep understanding of the problem so that we can choose the best tools and tune the algorithms correctly.