Mobility patterns are a cornerstone of mobility demand modelling. The idea is to estimate patterns from mobility observations, such as vehicle counts, and use them to build different demand models that explain or emulate people’s mobility needs. In other words, to understand when, why and how people want to go from A to B, we need to start gathering mobility observations and find patterns.
Back in the days when road traffic data was scarce, mobility patterns were inferred from “field knowledge”, that is, the intuition of the traffic engineer plus the variables that influenced traffic according to the engineer’s experience. In other words, there were a set of variables that described the type of day. For example, the day of the week, the weather, or if it was a holiday. Then, a typical day for each type of day was sampled (usually by manually counting vehicles) and there they had the road traffic patterns for the corresponding city.
The advent of loop detectors and other sensors made it possible to automate road data gathering and provide data for each type of day in order to calculate a pattern that better described all days of the same type. So, if the explanatory variables were weekdays (Monday-Friday) and weekends/holidays, one could take all data for all weekends/holidays, calculate the average or the median or any other reduction metric, and have the pattern for weekends/holidays. The same could be done for all weekdays that were not holidays to get the pattern for workdays.
Basically, this procedure implicitly performs clustering by grouping days according to a set of features, such as day or the week, that according to the engineer’s experience, minimizes both the number of patterns and the traffic flow variance in each group. Then the pattern, or the exemplar, of each group is its centroid according to Euclidean distance if it is calculated using the mean of all of them. This can be generalized to any other method for calculating or choosing an exemplar for each group of days, so we could say that each type of day is a cluster of days, and each pattern is the exemplar that represents each cluster.
Obviously, this methodology can be systemized and automated with a top-down or bottom-up splitting/merging of the types of days. For example, we could start by taking all 14 different types of days (7 days of the week being a holiday or not a holiday, makes 14 types of day) calculate the pattern of each group, merge the most similar pair, and iterate these two steps until we meet a stopping criterion. Alternatively, we could start with a single group with all the days, calculate each pattern and select the sub-group whose pattern differs most from the parent pattern. The first methodology is more efficient, but both should reach similar results with limited explanatory variables, as in this example.
However, is this the right way to define mobility patterns? Clearly not: there is no point in using a set of predefined explanatory variables for grouping days if want to maximize the homogeneity of a set of variables, such as flow, that we already have. In such a case, the right way to extract daily patterns is to cluster days according to the metrics whose homogeneity we want to maximize. For example, if we want to minimize the distance between days according to their daily flow profile, we can use the Euclidean distance of the flow between days. Up for discussion are points like the right distance metric, or which variables to use to calculate the distance, or even how we define patterns or clusters, because these depend on the purpose the patterns.
To illustrate, let’s use the dataset described used in the previous article and sample data from 2018 and 2019, which happen to be the most homogenous sets. We could take the whole dataset, but this would mean that we would have to add year-explanatory variables if we do not use clustering. I regret to say that I cannot share the dataset and I must keep the name of the city anonymous, but you can replicate the following exercise with any dataset that includes loop detector data.
We’ll start by taking the whole dataset and try to group days according to [day-of-the-week and not holidays] and [holidays]. The 2-year dataset has a total of 22 holidays, so there should be enough to have a good pattern if holidays are similar. With this procedure we get 8 groups. For each group we can take the average, or if we think that there may be outliers, we can take the median. Once we have the 8 patterns, we can calculate the distance between them. Since we are doing patterns according to vehicle flow, I will use the GEH<5 as the similarity metric since it is widely used, and it is a relative metric that give values from 0 (lowest similarity) to 100 (highest similarity) so it’s easy for everyone to understand.
The image below shows the GEH<5 for each pair of the 8 patterns. On the left, there are the patterns calculated using the average, and on the right, patterns calculated using the median. According to that, weekdays (Monday-Friday) are very similar, Saturdays and Sundays are different, and holidays are like Sundays. Thus, we could merge weekdays and Sunday-Holidays and have 3 patterns that explain the mobility in the city’s dataset if we prefer to have fewer than 8 patterns.