I am using the classifier and am very pleased with the results. I can separate 5-6 classes of cells with a very good level of certainty.
However, I wonder whether there is a way to train the classifier better, to reduce possible false positives. Is it feasible to train the classifier to ignore/discard ‘borderline’ objects, i.e. those that have similar scores in two (or more) different classes?
The built-in way to reduce false positives is to remove anything from the training set that resembles a borderline case. Then fetch positives and actively remove false positives, if false positives matter more to you than false negatives.
We don’t generate a likelihood score for each object, so there is no built-in way to exclude those that fall into the “DMZ” area near the decision boundary. However, I could imagine that you could do this calculation on your own and filter as you see fit. (Note: I am not a machine-learning expert, so take this advice with that caveat! Comments welcome!) The decision of whether an object falls into one class or the other is a simple summation of the rules, i.e.
“IF feature is greater/less than threshold, then add [greater-than value, less-than value]”, summed over all the features. For a 2-bin classifier, if the summed value is > 0, then the object is positive. But if you chose a cutoff greater than 0, you could be more conservative in your classification (exactly how much so, you’d have to experiment with, and no, I don’t know the behavior of this function). It is a simple MySQL query to get the feature values, and I believe you can read the rules right from the Classifier window. So feel free to calculate your own classifier function!
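To make the summation concrete, here is a minimal sketch in Python. The rule list, feature names, and score values below are all hypothetical placeholders — substitute the actual rules your Classifier window shows. The idea is just what the text describes: each rule adds one value if the feature exceeds its threshold and another if it doesn’t, and raising the cutoff above 0 makes the positive call more conservative.

```python
# Hypothetical exported rules: (feature, threshold, score_if_greater, score_if_lesser)
rules = [
    ("Intensity_MeanIntensity", 0.25, 0.8, -0.8),
    ("AreaShape_Area", 150.0, -0.4, 0.4),
    ("Texture_Contrast", 1.2, 0.6, -0.6),
]

def object_score(measurements, rules):
    """Sum the per-rule contributions for one object."""
    total = 0.0
    for feature, threshold, greater, lesser in rules:
        total += greater if measurements[feature] > threshold else lesser
    return total

def classify(measurements, rules, cutoff=0.0):
    """Positive if score > cutoff; raising the cutoff above 0 is more
    conservative, and objects inside [-cutoff, cutoff] land in the DMZ."""
    score = object_score(measurements, rules)
    if score > cutoff:
        return "positive"
    if score < -cutoff:
        return "negative"
    return "borderline"

# One made-up object's measurements
obj = {"Intensity_MeanIntensity": 0.3,
       "AreaShape_Area": 120.0,
       "Texture_Contrast": 1.5}
print(classify(obj, rules, cutoff=0.5))   # prints "positive"
print(classify(obj, rules, cutoff=2.0))   # prints "borderline"
```

In practice you would pull the measurement columns from the per-object table with a SQL query and run every row through something like `classify`, experimenting with the cutoff as suggested above.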
[quote]And on a related note - is there a way to see and export the per-class score for each object?[/quote]
Indeed, I was able to use the rules (and the score values associated with each rule) generated by ‘train classifier’ and apply them to the object table using MATLAB. I thus manually generated this per-class “object score table”, and can now exclude the objects falling into the “DMZ”.
When trying to classify the objects into ~5 classes, there often isn’t a clear cut-off, which leads to a large number of objects in the DMZ. So sometimes this data is crucial for correct classification. I don’t know how to get these values directly from the MySQL DB, but I guess it would be faster to get them directly from CPA rather than going through MATLAB…
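For the multi-class case, the DMZ check described above can be done by comparing the top two per-class scores for each object. This is a sketch under assumed, made-up score values — the class names and the `min_gap` threshold are illustrative, not anything CPA exports:

```python
def top_two_gap(scores):
    """Gap between the best and second-best class score."""
    ordered = sorted(scores.values(), reverse=True)
    return ordered[0] - ordered[1]

def assign_class(scores, min_gap=0.5):
    """Return the winning class, or None if the object sits in the
    DMZ (top two class scores closer together than min_gap)."""
    if top_two_gap(scores) < min_gap:
        return None
    return max(scores, key=scores.get)

# Hypothetical per-class score table for two objects
per_object_scores = [
    {"class_A": 2.1, "class_B": 0.3, "class_C": -1.0},  # clear class_A
    {"class_A": 1.1, "class_B": 1.0, "class_C": -0.5},  # borderline A/B
]
for scores in per_object_scores:
    print(assign_class(scores, min_gap=0.5))  # prints "class_A", then "None"
```

How wide to make `min_gap` is the same kind of experiment as the 2-bin cutoff: larger gaps discard more borderline objects at the cost of throwing away some correct calls.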