Metric selection for incident detection systems

Published on July 15, 2020

Ferran Torrent

Senior Data Scientist at Aimsun

Classification problems with highly imbalanced datasets, such as incident detection, force a trade-off between the true positive rate and the false positive rate. A key step in settling that trade-off is choosing the metric used to evaluate the system. That choice should be made by putting yourself in the user's shoes and asking whether the chosen metric, and the results it produces, really reflect what the user would consider useful.

For example, the following two tables show confusion matrices for an imbalanced dataset with 9x more negative examples (no incident) than positive examples (incident). In such a situation, accuracy is not a good metric, and neither is balanced accuracy. Recall might be good if you must detect as many positives as possible, even at the risk of getting lots of false positives. Precision, on the other hand, is more oriented to answering whether we can trust predicted positives. The F1-score, the harmonic mean of precision and recall, is a trade-off between the two. In incident detection, it is important not only to detect incidents but also to avoid overwhelming the traffic operator with false incident detections. Therefore, the F1-score is the best choice among these four metrics.
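To make the comparison concrete, here is a minimal sketch that computes these metrics from the four cells of a confusion matrix. The `ConfusionMatrix` class and the counts below are hypothetical and chosen only for illustration (100 incidents against 900 non-incidents, matching the 9x imbalance); they are not the values from the article's tables. The sketch uses the standard definitions: recall = TP / (TP + FN), precision = TP / (TP + FP), and F1 as the harmonic mean of the two.

```python
from dataclasses import dataclass


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical helper holding the four cells of a binary confusion matrix."""
    tp: int  # incidents correctly detected
    fn: int  # incidents missed
    fp: int  # false alarms
    tn: int  # no-incident cases correctly left alone

    @property
    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tp + self.fn + self.fp + self.tn)

    @property
    def recall(self) -> float:
        # Share of real incidents that were detected (true positive rate).
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of no-incident cases that raised no alarm (true negative rate).
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def balanced_accuracy(self) -> float:
        return (self.recall + self.specificity) / 2

    @property
    def precision(self) -> float:
        # Share of raised alarms that were real incidents.
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in terms of counts.
        return _ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)


# Hypothetical counts, NOT taken from the article's tables.
detector = ConfusionMatrix(tp=80, fn=20, fp=90, tn=810)
never_alerts = ConfusionMatrix(tp=0, fn=100, fp=0, tn=900)  # flags nothing at all

for name, cm in [("detector", detector), ("never_alerts", never_alerts)]:
    print(
        f"{name}: accuracy={cm.accuracy:.2f} "
        f"balanced_accuracy={cm.balanced_accuracy:.2f} "
        f"precision={cm.precision:.2f} recall={cm.recall:.2f} f1={cm.f1:.2f}"
    )
```

With these illustrative counts, the system that never raises an alert has a higher accuracy (0.90) than the working detector (0.89), which is why accuracy is misleading here. The detector's balanced accuracy of 0.85 also hides the fact that more than half of its alerts are false alarms (precision of about 0.47), whereas its F1-score of about 0.59 reflects both the missed incidents and the false alarms.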
