I occasionally like to look at the ongoing Kaggle competitions to see what kind of data problems people are interested in (and the discussion boards are a good place to find out what techniques are popular.) Each competition includes a way of scoring the submissions, based on the type of problem. An interesting one that I’ve seen for a number of classification problems is the area under the Receiver Operating Characteristic (ROC) curve, sometimes shortened to the ROC score or AUC (Area Under the Curve). In this post, I want to discuss some interesting properties of this scoring system, and its relation to another similar measure – precision/recall.
View original post 2,354 more words