Same classifier behaves differently on same image depending on ROI

Hi, I am confused about how the computation works and don’t know what I am doing wrong.
I have 2 images with tumor and manually classified all the cells (tumor, stroma, fat, desmoplasia). I created 4 .ome images with “pure” classes and trained a classifier on those 4 images.

Separately, I created a superpixel classifier (many thanks to MicroscopyRA and Sra McArdle for your explanations) that creates an ROI of the more cellular area (presumably where the tumor is). The reason is to get rid of low-reward tissue like fat or normal stroma when I deal with hundreds of slides.

Then I duplicated the initial 2 images. In one I ran cell detection on the whole image, in the other only in the “dense” area.

When I apply the classifier on the whole image, the classification is pretty accurate, but when I apply it on the “dense” ROI, it’s completely off. It classifies tumor as fat, for example. Because there is no fat in the “dense ROI” image, it looks to me like the classifier is “stretching” the values instead of using absolute values.

(Fat=blue, stroma=beige, desmoplasia=lime, tumor=magenta)

Since it’s the same image, with the same classifier, I thought the same cells would be classified in the same class regardless of the ROI. What am I doing wrong?
Thank you all for your valuable time.

Is there any chance that, when you duplicated the images, you did not perform the same background correction/Estimate color vectors?

That would impact the Measurements in all of your cell detections, which are what object based classifiers are based off of. Though I am not sure whether you are using a pixel classifier or an object classifier to classify your cells.

*Any color correction needs to be done before the cells are created, as that is when the measurements are generated.
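
To illustrate why the background setting matters, here is a minimal Python sketch (not QuPath code) that uses the standard optical density formula OD = -log10(I / I0). The pixel and background values are made up; the point is only that the same pixel yields different measurements when the background value I0 differs between two copies of an image.

```python
import math

def optical_density(rgb, background):
    """Per-channel optical density, OD = -log10(I / I0).

    Intensities are clipped to the range [1, I0] to avoid log(0)
    and negative densities.
    """
    return tuple(
        -math.log10(min(max(value, 1), bg) / bg)
        for value, bg in zip(rgb, background)
    )

pixel = (120, 80, 160)  # one RGB pixel; illustrative values only

# Same pixel, two different background settings (both hypothetical)
od_default = optical_density(pixel, (255, 255, 255))
od_estimated = optical_density(pixel, (235, 230, 240))

for name, od in (("default", od_default), ("estimated", od_estimated)):
    print(name, [round(v, 3) for v in od])
```

An object classifier trained on measurements made with one background would then be applied to cells whose measurements are shifted, which is consistent with the misclassification described above.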


… that was it !
Just to underscore … again … how important it is to have the same color correction before starting any analysis. Which I HAVE read many times in the tutorials. Sorry !! :slightly_smiling_face:

Would the results be more accurate if I run “estimate stain vectors” for every slide, or would just copying the color values be enough? Copying values can be scripted, but I think the other can only be done manually?
I am using an object classifier.

Thanks again !


This is one of those areas where “Sample prep” and “Image analysis” strongly overlap. Ideally, your samples should be prepared, and images taken, in such a way that applying that one set of color vectors/background always gives you accurate enough results. However, science is messy and this is not always the case.

In some cases you may need to manually set the color deconvolution vectors per image. This will ONLY affect the cell segmentation and measurements, as the pixel classifier/thresholders carry their own color vectors in them.

It might be nice to have an option in the pixel classifier hidden under “Advanced options” that uses the color decon values for the current image rather than those from the classifier (if it doesn’t take much time/gets high enough priority @petebankhead @melvingelbard), but for now, if you have pixel classifier issues, you might need to train it per image.

You can also have mixed options with “a little bit of coding” where you keep your stain vectors, but adjust only the background. For example, take a tissue annotation, expand it 50 microns, and use the median values in that area as your background “per image.”
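
As a rough sketch of the “median of the surrounding area” idea, in plain Python rather than a QuPath script: the pixel list below is hypothetical and stands in for pixels sampled from the expanded ring around the tissue annotation. The median is used because it tolerates a few stray tissue pixels in that ring.

```python
from statistics import median

def estimate_background(pixels):
    """Return the per-channel median (R, G, B) of a list of RGB tuples."""
    reds, greens, blues = zip(*pixels)
    return (median(reds), median(greens), median(blues))

# Hypothetical pixels from the ring just outside the tissue annotation.
# One of them (120, 90, 150) is a stray tissue pixel.
ring_pixels = [
    (238, 235, 240),
    (241, 236, 242),
    (236, 233, 239),
    (120, 90, 150),
    (240, 237, 241),
]

background = estimate_background(ring_pixels)
print(background)  # (238, 235, 240)
```

The resulting triplet would then be set as that image's background before cell detection, while the stain vectors stay the same across images.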

Maybe others will have more informed opinions or better ideas on the topic…

There is a script somewhere on the forum that can automate the stain vector estimation but… how are you going to tell it which pixels to use? The whole image might… take a while.

Thanks RA, good to know.

Regarding how the program analyses the data, I wonder how having fewer or more classes affect the analysis.
Let’s imagine I want to ID cancer metastasis to lymph nodes and I don’t care which cancer is there, just whether it’s positive or negative. If I have 10 different cancers with different morphologies, does it make any difference if I have only one “positive” class, or 10 separate “positive A, positive B etc” classes?

Is it more difficult for the program to distinguish between the “positive” class and other negative classes if the positive class has a multimodal distribution, and some of the “negative” class values fall somewhere between the peaks of the “positive” class?

Thanks, Juan

I would defer to @petebankhead on how the software actually works, but, in general, specific trumps general. Something to keep in mind is that all of these classes take up a certain amount of the classifier, be it branches in the random trees or nodes in the ANN. If your classifier gets too expansive, you may find you need to go into the options for whichever classifier you are using and expand the number of options/accuracy/other measurements used for your classifier.

If your negative is always the same, you could also perform a positive/negative classifier first, and then within the positive areas classify by type with a second classifier. That way your tumor type classifier would not have a chance to get confused by negative tissue. I am a fan of subdividing things, and there was a recent paper on deep learning segmentation of biomedical imaging data that advocated a similar approach. They used non-deep learning methods to decide which deep learning algorithm to use, out of a library of DL models.
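
The two-stage idea can be sketched in a few lines of Python. Everything here is hypothetical: the measurement names, the thresholds, and the simple rules stand in for two separately trained classifiers, and are not how QuPath classifies cells.

```python
def classify_positive(cell):
    """Stage 1 (hypothetical rule): positive if the nucleus is large."""
    return cell["nucleus_area"] > 60.0

def classify_tumor_type(cell):
    """Stage 2 (hypothetical rule): split positives by hematoxylin OD."""
    return "Positive A" if cell["hematoxylin_od"] > 0.5 else "Positive B"

def classify_cascade(cell):
    """Run stage 2 only on cells that stage 1 called positive."""
    if not classify_positive(cell):
        return "Negative"
    return classify_tumor_type(cell)

cells = [
    {"nucleus_area": 35.0, "hematoxylin_od": 0.7},
    {"nucleus_area": 80.0, "hematoxylin_od": 0.7},
    {"nucleus_area": 95.0, "hematoxylin_od": 0.3},
]

labels = [classify_cascade(c) for c in cells]
print(labels)  # ['Negative', 'Positive A', 'Positive B']
```

The design point is that the second stage never sees negative cells, so it only has to separate tumor types from each other.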

Anyway, to the specific question of:

For any given model, I would say that the determination of “negative” regions would be worse the more positive classes you have, since there would likely be less of the total model devoted to determining what negative is. On the other hand, the positive regions would likely be more accurately detected, since most of the model would be devoted to them. End result? With 10 classes of tumor, you would likely end up with more negative regions falsely classified as positive. Are false positives more dangerous than false negatives? That depends on what the model is for. If the purpose of the model were to make a diagnosis, either one could be dangerous. If the purpose were to draw a pathologist’s attention to a certain slide, I would tend towards false positives, trusting the human to get it right afterwards.

*In addition to classes, the amount of training data you include will have a similar impact. My previous statements were assuming a relatively balanced selection of areas for each class. If 90% by area of your training data is Negative, you will see different effects, probably a lot more false negatives :stuck_out_tongue:
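
A small worked example of why imbalance matters, in Python with made-up numbers. It is an extreme case, not a prediction of what any particular classifier will do: a model that always answers “Negative” on data that is 90% negative looks 90% accurate while missing every positive.

```python
def accuracy(truth, predicted):
    """Fraction of labels predicted correctly."""
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)

def recall(truth, predicted, label):
    """Fraction of true `label` items that were predicted as `label`."""
    relevant = [p for t, p in zip(truth, predicted) if t == label]
    return sum(p == label for p in relevant) / len(relevant)

# Illustrative data: 90 negative and 10 positive cells
truth = ["Negative"] * 90 + ["Positive"] * 10

# A degenerate classifier that always predicts the majority class
predicted = ["Negative"] * 100

print(accuracy(truth, predicted))            # 0.9
print(recall(truth, predicted, "Positive"))  # 0.0
```

This is why checking per-class results, not just overall accuracy, is worthwhile when the training data is dominated by one class.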

TLDR: It depends as much on your training data, the distribution of your training data, the accuracy of that training data, as the class selection. The “size” of the model also comes into play. I don’t think there is an easy answer but would be happy to be wrong.

That’s really helpful to understand how to approach it. Thank you!!

When you say … “you need to go into the options for whichever classifier you are using and expand the number of options/accuracy/other measurements used for your classifier.”
Is there a general number of classes above which you would recommend people start thinking about expanding the default options?

Also, I have heard the number 10,000 mentioned as the number of annotations needed to train accurately in deep learning. Of course, it depends on the particular case, but in general, is there a number (or range) for the minimum number of annotations you like to have when working with QuPath?

In a pixel classifier annotations are irrelevant, every pixel is a training object. Likewise with object classifiers, each object is an input. That is why Pete regularly recommends, in videos and other places, not to use broad swaths of pixels as training objects. You will be overloading the classifier with approximately the same object over and over and over. It is more important to get a few examples of all possible cases. At the same time, you need enough examples that the classifier cannot make erroneous connections between measurements that are not important. The more measurements you input, the more training examples you need.
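
One way to avoid flooding the classifier with near-identical objects is to cap how many training objects each class contributes. The Python sketch below assumes each training object is a dict with a hypothetical "label" key. It uses plain random subsampling, which limits repetition but does not by itself guarantee that every case is covered, so hand-picked examples of unusual cases are still needed.

```python
import random
from collections import defaultdict

def balanced_sample(objects, per_class, seed=0):
    """Keep at most `per_class` randomly chosen objects from each class."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_class = defaultdict(list)
    for obj in objects:
        by_class[obj["label"]].append(obj)

    sample = []
    for members in by_class.values():
        k = min(per_class, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical training set: a broad swath of stroma, a few tumor cells
objects = (
    [{"label": "Stroma", "id": i} for i in range(500)]
    + [{"label": "Tumor", "id": i} for i in range(20)]
)

training = balanced_sample(objects, per_class=20)
counts = {
    label: sum(o["label"] == label for o in training)
    for label in ("Stroma", "Tumor")
}
print(counts)  # {'Stroma': 20, 'Tumor': 20}
```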

It is something important to keep in mind, since none of the classifiers truly work like a deep learning algorithm. Aside from pixel downsampling, there is no actual context, so it can be very difficult for the QuPath classifiers to pick up complex objects, or deal with non-regular shading.

Also, expanding the default options can slow things down quite a bit. Another thing to keep in mind is whether your hardware can support the complexity of the classifier you want to run. And QuPath doesn’t have any directly trainable deep learning (the StarDist script is the closest I know of), so all of the recommendations for that do not apply. Hopefully it will in the future!