'Check Progress' in Classifier


I was trying out CPA’s classifier function and I trained it on approx. 800 objects. I then clicked ‘Check Progress’ out of curiosity and got a line graph with ‘50% cross-validation accuracy’ and ‘95% cross-validation accuracy’, and also ‘accuracy of random classifier’. Can someone explain what these mean? Sorry if it’s common sense — I’m really bad with statistics.


Hi Amos,

In short, the Check Progress button plots the cross-validation accuracy for the training set as an increasing number of rules are used; values closer to 1 indicate better performance.

Two features of the plot are useful for guiding further classification:

  • First, if the accuracy increases (that is, slopes upward) at larger numbers of rules, adding more rules is likely to help improve the classifier (if the line slopes downward, this may indicate more training examples are needed).
  • Second, accuracy is displayed for two versions of cross-validation, with 50% or 95% of the examples used for training and the remainder for testing. If the two accuracies are essentially the same, adding more cells to the training set is unlikely to improve performance.
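To make the second point concrete, here is a minimal sketch (not CPA’s actual code) of comparing cross-validation accuracy at the two training fractions, using scikit-learn on synthetic data. The boosted-tree classifier is a stand-in assumption for CPA’s rule-based classifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic two-class data standing in for a ~800-object training set.
X, y = make_classification(n_samples=800, n_features=20, random_state=0)
clf = GradientBoostingClassifier(random_state=0)

for train_frac in (0.50, 0.95):
    # Train on 50% (or 95%) of the examples, test on the remainder,
    # repeated over several random splits.
    cv = ShuffleSplit(n_splits=10, train_size=train_frac, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"{train_frac:.0%} training: mean accuracy = {scores.mean():.3f}")
```

If the two printed accuracies are close, nearly doubling the training data did not help, which suggests the training set is already large enough.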

Note that the accuracy in these plots should not be interpreted as the accuracy for the overall experiment. These plots tend to be pessimistic, because the training set often includes a large number of difficult-to-classify examples. The best way to judge accuracy is to request a large number of cells of a given class and count the mistakes by hand.
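That spot-check amounts to estimating a binomial proportion. As a hedged illustration (the counts here are made up, and the helper is hypothetical, not part of CPA), the empirical accuracy and a simple 95% normal-approximation confidence interval are:

```python
import math

def accuracy_estimate(n_requested, n_mistakes):
    """Empirical accuracy and 95% CI half-width from a manual spot check."""
    p = 1 - n_mistakes / n_requested
    # Normal approximation to the binomial proportion confidence interval.
    half_width = 1.96 * math.sqrt(p * (1 - p) / n_requested)
    return p, half_width

# e.g. 14 mistakes found among 200 requested cells (illustrative numbers)
acc, ci = accuracy_estimate(200, 14)
print(f"accuracy ≈ {acc:.2f} ± {ci:.2f}")  # accuracy ≈ 0.93 ± 0.04
```

The interval shrinks with the square root of the number of cells checked, which is why requesting a large sample gives a more trustworthy estimate.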

Hope this helps!