Classifier Accuracy



Hello everybody,

I’m trying an approach with the CPA Classifier. I stain PBMCs with
markers for T cells, B cells, and monocytes, then image the cells in
four colors (Hoechst, plus one color for each of the three marker
populations). Then I run a CellProfiler analysis to measure as many features
as possible for the objects. After that I go to the classifier to sort my
cells by staining into four populations: T-cell, B-cell, Monocyte, and
Negative (for the rest of the unstained PBMC population). After I’ve trained the
software, I ask the classifier to classify my cells according to nuclear
features into the four mentioned classes.

My problem is that the accuracy of the classifier is not good enough
(around 60%). I tried increasing the number of rules and/or the number of examples, but in vain.

Does anyone have a clue or a suggestion to improve the
accuracy of classification?



Can you upload your pipeline and your properties file? Thanks.


You’re not actually measuring anything in your cells, just in your nuclei. Depending on whether your markers are nuclear or cytoplasmic, the nuclear values might not be capturing all the information. Unless there’s a good reason why you can’t, I’d add “Cells” to all of the measurement modules you’re using.

Other than that, play with different classifier types and keep an eye on your confusion matrices. Given the stains and the cells you have, you may only be able to correctly classify certain types (e.g. Negative is generally sorted correctly but the other 3 are usually confused), and you may need to adjust your future experiments accordingly.
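To make the confusion-matrix check concrete, here is a minimal Python sketch of how per-class accuracy can be read off one. It is not CPA's own code; the labels are made up for illustration, and the function names `confusion_matrix` and `per_class_recall` are just names chosen here.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Count (true, predicted) pairs; rows are true classes, columns are predictions."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def per_class_recall(matrix, classes):
    """Fraction of each true class that was labeled correctly."""
    recall = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        recall[name] = matrix[i][i] / row_total if row_total else float("nan")
    return recall

# Hypothetical labels for illustration only, not data from this thread.
classes = ["T-cell", "B-cell", "Monocyte", "Negative"]
true_labels = ["T-cell", "T-cell", "B-cell", "B-cell",
               "Monocyte", "Monocyte", "Negative", "Negative"]
predicted = ["T-cell", "B-cell", "T-cell", "B-cell",
             "Monocyte", "T-cell", "Negative", "Negative"]

matrix = confusion_matrix(true_labels, predicted, classes)
recall = per_class_recall(matrix, classes)
for name in classes:
    print(f"{name}: recall = {recall[name]:.2f}")
```

In this toy case the overall accuracy hides the fact that Negative is always right while the three stained classes are each right only half the time, which is the kind of pattern worth looking for.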


Hi Ahmed,

I do not know what you meant by “I’d add “Cells” to all of the measurement modules you’re using.” I have only Hoechst staining and CD marker staining for phenotype detection.

You have a module in your pipeline that takes your “Nuclei” objects and expands them out a bit to create objects you called “Cells”, but in your downstream measurement modules you are only measuring the shape, size, and intensity in your CD channels of the “Nuclei”, not the “Cells”. My suggestion would be to measure the shape, size, and intensity of BOTH “Nuclei” and “Cells”, as it may give the classifier enough extra information to be more accurate at categorizing the PBMCs into subtypes.

You can also consider adding additional types of measurements, such as the “MeasureCorrelation” and “MeasureGranularity” modules.

I’d also double check your segmentation results: if the segmentation isn’t great, it might explain a bad classification. You may want to consider adding a SaveImages module to your pipeline to save the outlines of your segmented Nuclei; if you enable the “Record path information” box you’ll be able to see the outlines in Analyst while you’re classifying the cells, which’ll give you a hint as to whether there are segmentation issues or not.

Even if you do all of this, though, if your segmentation isn’t good enough or if the differences just aren’t clear enough, there’s no guarantee that you will be able to train a classifier that gets very high accuracy on all 4 of your subpopulations. That’s why I mentioned in my last post that you should try to figure out whether certain classes are being identified particularly poorly, and then think about whether there’s a marker or something else you could add to your experiment to improve your ability to detect them next time.


Hi Bcimini

I’m not sure I understood all of your suggestions because I started using CellProfiler only recently, but I will try to work on them. Thanks a million for the nice support.



Thank you for your great work. :slight_smile:
I am a new user and it took me a few days to get used to working with CP. However, I am having a problem with the classifier in CPA. My samples are taken from human tissue, neuroendocrine tumors, so the cells are diverse in shape, size, and placement. I am looking at Ki-67 staining. In the first step I do cell segmentation in CP, which is not perfect but acceptable, considering the complexity of the case.

The problem starts when I try to exclude the non-tumor cells in CPA. The machine learning seems to have a very difficult time distinguishing the tumor cells from the non-tumor cells.
It is a fairly hard task for a pathologist as well. I think the problem is that the machine learning algorithm only looks at the characteristics of the cells that I put as samples into the tumor and non-tumor bins, while it might be easier if it also looked at the neighbouring cells. With more training, I make the system more confused, and the accuracy not only doesn’t increase but actually goes down :confused:

Do you have any suggestions for improving my results?

OnSampleKi_67.cpproj (133.4 KB) (7.2 KB)



In case you’re still working on this: you have only very few measurements created in your pipeline, which means CPA will do a poor job in difficult cases because it’s unlikely that any one of those on its own will be a good “rule” to distinguish tumor from non-tumor cells.

If you increase the number of measurements CPA has to play with by adding many more measurement modules to your pipeline, you’ll increase the likelihood that CPA can find a measurement or combination of them that will work.
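As a rough illustration of why more measurements help, here is a small Python sketch that scores how well a single-threshold “rule” on each measurement separates two classes. It assumes a rule is a simple threshold on one measurement, which is a simplification and not CPA’s actual implementation; the function name, the feature names, and all values are hypothetical.

```python
def best_single_threshold_accuracy(values, labels):
    """Best accuracy of a rule 'value > threshold' (or its inverse) for two classes.

    labels are booleans: True for one class (e.g. tumor), False for the other.
    """
    n = len(values)
    # Baseline: always guess the majority class.
    best = max(sum(labels), n - sum(labels)) / n
    for threshold in sorted(set(values)):
        correct = sum((v > threshold) == lab for v, lab in zip(values, labels))
        # (n - correct) / n is the accuracy of the inverted rule.
        best = max(best, correct / n, (n - correct) / n)
    return best

# Hypothetical measurements; names mimic CellProfiler's style but are made up.
labels = [True, True, True, False, False, False]
features = {
    "AreaShape_Area": [310, 295, 420, 300, 415, 305],
    "Texture_Entropy": [0.62, 0.55, 0.71, 0.20, 0.31, 0.25],
}
scores = {name: best_single_threshold_accuracy(vals, labels)
          for name, vals in features.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

With only the first measurement, no threshold does much better than chance; once the second one is available, a clean rule exists. The more measurements the pipeline produces, the better the odds that at least one of them (or a combination) behaves like the second.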

Good luck!


I am working on it. I reached the same conclusion and tried to add more measurements, but I don’t have any specific plan for them; basically I added all the measurements that I thought could be relevant to my samples. There is still not much improvement with the Analyst :frowning:
I think one of the problems may be that I cannot see which cases are ambiguous for the Analyst (something like the uncertainty feature in ilastik).
If I could see the problematic cases, I would try to clarify them. But right now, adding more training not only doesn’t help but also makes the Analyst more confused.
I attached my pipeline. Do you have any suggestions to improve it further?
pipeline.cppipe (18.5 KB)
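
The kind of uncertainty view described above could be sketched in Python roughly as follows. This assumes per-class probabilities for each cell are available from some classifier, for example one trained outside CPA on exported per-cell measurements; it does not rely on any CPA feature, and the cell ids, probabilities, and function names are all hypothetical.

```python
def uncertainty_margin(probabilities):
    """Gap between the two highest class probabilities; a small gap means an ambiguous cell."""
    top_two = sorted(probabilities, reverse=True)[:2]
    return top_two[0] - top_two[1]

def most_ambiguous(cells, k=3):
    """Return the k cell ids whose predictions are least decisive."""
    ranked = sorted(cells, key=lambda cell_id: uncertainty_margin(cells[cell_id]))
    return ranked[:k]

# Hypothetical per-cell probabilities as (tumor, non-tumor).
cells = {
    "cell_1": (0.95, 0.05),
    "cell_2": (0.52, 0.48),
    "cell_3": (0.70, 0.30),
    "cell_4": (0.45, 0.55),
}
print(most_ambiguous(cells, k=2))
```

Looking at the cells this returns, rather than adding more examples at random, is one way to find the cases where extra training labels are most likely to matter.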