Workflow for batch analysis - generation of classifiers

Hi there,

I have started to try QuPath in order to achieve a more reproducible analysis compared to conventional scoring systems. I am a bit lost regarding the best way to tackle the following project: I have around 90 whole-slide scans of tissue stained for FasL, which is expressed in tumor tissue as well as in the stromal compartment.
Ideally I would like to analyse both compartments separately, but tumor expression is the main goal.

The first problem is that I find it hard to get a classifier that works for all samples - can this be salvaged by using examples from several slides to train the classifier?

  1. Would you train the classifiers first on small annotations and afterwards use scripts to select the whole tissue and run cell detection on it? I first tried it on the whole sample (small biopsies), but there was significant lag while contouring the different compartments (128 GB RAM…)

  2. How many additional features would you try to create? I have seen two smoothed feature sets (25 µm and 70 µm) used, but is this helpful?

Another possibility is, of course, manual delineation of the tumor tissue, but I would like to avoid this if possible.

I am sorry if my questions seem a bit confusing, but I am stuck at the moment and would appreciate some input.

Best regards
daniel

The more automated you want to get, the more the staining/fixation/all the other upstream steps need to be done perfectly to produce uniform results. The recent workshop at LJI in San Diego really emphasized that. Brightfield images in general are also harder to quantify, since stains like DAB do not produce linear measurements (they are semi-quantitative). Whether you can do it? Maybe - it depends on your slides :slight_smile:

That said, you can definitely train classifiers across multiple images, and I would recommend starting with a minimal number of both training objects and measurements, then working your way up as necessary. One of the more common mistakes is selecting too many measurements that are at least partially redundant (e.g. cytoplasmic DAB and whole-cell DAB for a cytoplasmic marker) and reinforce each other, or selecting too many training objects for “easy” classes. The latter biases the machine learning classifier, since there are more training examples in unimportant categories for the decision trees (or whatever) to use. Overfitting is a real danger, and it reduces the ability of your model to generalize to new data; the same problem is a major factor in training deep learning models.
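
If you do end up with redundant measurements, you can strip them by script before training. Here is a minimal Groovy sketch for QuPath's script editor - the measurement names are examples only and depend entirely on your cell detection settings, so substitute whatever actually appears in your own measurement tables:

```groovy
// Minimal sketch: remove partially redundant per-cell measurements before
// training a classifier. The names below are placeholders - inspect a cell's
// measurement list in QuPath first and substitute your own.
import qupath.lib.objects.PathCellObject

removeMeasurements(PathCellObject,
        'Cell: DAB OD mean',   // redundant with the cytoplasmic value for a cytoplasmic marker
        'Cell: DAB OD max')
println 'Removed redundant measurements - check a cell to confirm'
```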

  1. Your lag is likely due to large annotations with many points. Smoothing the tissue annotation might help this, but extra memory will not. If you are training a classifier on cells, you would need the detected cells to be present within your annotations. It might also help to turn detection visualizations off while moving around the tissue (the D or H key, depending on your version).
  2. Smoothing can help group cells together and overcome minor variations. It frequently looks great if you simply want to compare tumor marker to no tumor marker, since such things tend to come in clumps. It can be absolutely terrible if you are looking for small or infrequent populations, like tumor-infiltrating lymphocytes - you will smooth them right out of existence.
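
If smoothed features do fit your marker, they can also be added by script. This is roughly what QuPath's workflow recorder emits for Analyze > Calculate features > Add smoothed features; the exact parameter names may differ between versions, so run the command once in the GUI and copy the recorded line to be sure:

```groovy
// Sketch: add the two smoothed feature sets mentioned above (25 µm and 70 µm FWHM).
// Assumes detections already exist inside the selected annotations; the parameter
// names follow the recorded-workflow format and should be checked against
// your own QuPath version.
selectAnnotations()
runPlugin('qupath.lib.plugins.objects.SmoothFeaturesPlugin',
        '{"fwhmMicrons": 25.0, "smoothWithinClasses": false}')
runPlugin('qupath.lib.plugins.objects.SmoothFeaturesPlugin',
        '{"fwhmMicrons": 70.0, "smoothWithinClasses": false}')
```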

Tumor detection will get a lot easier once the pixel classifier is scriptable, but you can start playing with it now in v0.2.0m2 if you want to see whether it is worth waiting for. The other option is creating SLICs, classifying them, and merging the resulting classes into annotations. So your workflow would involve two classification steps:

  1. Tissue detection
  2. Create SLICs and add measurements to them
  3. Create a classifier for the SLICs, either manually or through training; save the classifier
  4. Convert the classified SLICs into annotations
  5. Detect cells, create a classifier for the cells, and save that classifier

And as you mentioned, you would probably want to run steps 3 and 5 on multiple slides when you reach them, to ensure the classifiers work on multiple samples and will generalize well to your whole set of samples.
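
For reference, here is a very rough Groovy sketch of how steps 1-5 might look once scripted for batch use (Run > Run for project). It assumes QuPath 0.1.2-style commands; the parameter JSON is heavily abbreviated and the classifier paths are placeholders, so run each step once in the GUI and copy the exact lines from the Workflow tab rather than trusting these values:

```groovy
// Rough sketch of steps 1-5 for batch processing. All parameter values are
// illustrative placeholders - generate the real ones via the Workflow tab.
setImageType('BRIGHTFIELD_H_DAB')

// 1. Tissue detection
runPlugin('qupath.imagej.detect.tissue.SimpleTissueDetection2',
        '{"threshold": 200, "minAreaMicrons": 10000.0}')

// 2. Create SLICs inside the tissue annotation and add intensity measurements
selectAnnotations()
runPlugin('qupath.imagej.superpixels.SLICSuperpixelsPlugin',
        '{"sigmaMicrons": 5.0, "spacingMicrons": 50.0}')
runPlugin('qupath.lib.algorithms.IntensityFeaturesPlugin',
        '{"pixelSizeMicrons": 2.0, "doMean": true, "doStdDev": true}')

// 3. Classify the SLICs with a previously saved classifier (placeholder path)
runClassifier('/path/to/slic_classifier.qpclassifier')

// 4. Convert the classified SLICs into annotations - in the GUI this is
//    the "Tile classifications to annotations" command; scripting it may
//    need a small helper depending on your version, so it is left as a
//    comment here.

// 5. Detect cells within the resulting annotations and classify them
//    (you may want to select only the tumor-class annotations here)
selectAnnotations()
runPlugin('qupath.imagej.detect.cells.WatershedCellDetection',
        '{"detectionImageBrightfield": "Hematoxylin OD", "cellExpansionMicrons": 5.0}')
runClassifier('/path/to/cell_classifier.qpclassifier')
```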