Random Trees Detection Classifier (v0.2.1 and v0.2.0)

Summary: If I use the ‘random trees’ setting when creating a detection classifier, will the object classifications be slightly different each time I run it because the model is not deterministic, or is it because the classifier is deprecated or I’m not applying it correctly?

I create a project and run the following script:

//Set image type
setImageType('FLUORESCENCE');

//Clear annotations and detections
clearAnnotations();
clearDetections();

//Create a full image annotation
createSelectAllObject(true);

//Run cell detection
runPlugin('qupath.imagej.detect.cells.WatershedCellDetection', '{"detectionImage": "DAPI", "requestedPixelSizeMicrons": 0.5, "backgroundRadiusMicrons": 8.0, "medianRadiusMicrons": 0.0, "sigmaMicrons": 1.5, "minAreaMicrons": 10.0, "maxAreaMicrons": 400.0, "threshold": 1.0, "watershedPostProcess": true, "cellExpansionMicrons": 2.5, "includeNuclei": true, "smoothBoundaries": true, "makeMeasurements": true}');

//Calculate intensity features
selectCells();
runPlugin('qupath.lib.algorithms.IntensityFeaturesPlugin', '{"pixelSizeMicrons": 2.0, "region": "Square tiles", "tileSizeMicrons": 25.0, "channel1": false, "channel2": false, "channel3": false, "channel4": true, "channel5": false, "doMean": true, "doStdDev": true, "doMinMax": true, "doHaralick": true, "haralickMin": 5.0, "haralickMax": 80.0, "haralickDistance": 1, "haralickBins": 32}');

//Calculate smoothed features
selectAnnotations();
runPlugin('qupath.lib.plugins.objects.SmoothFeaturesPlugin', '{"fwhmMicrons": 25.0, "smoothWithinClasses": false, "useLegacyNames": false}');

Then I go to Classify --> Object classification --> Older classifiers --> Create detection classifier, train the classifier using a selected subset of features and the random trees model, and save as I move between images.

I then apply the same script in a new project that contains some of the same images as the classifier creation project, but with one added line:
runClassifier('ClassifierICreated');

However, I noticed slight differences in object classifications between projects, between applying the classifier with a script and loading and running it from the menu, and even when I rerun it after running another classifier in between. This may be a very stupid question, but is this lack of reproducibility simply because the classifier is stochastic? Is it because I’ve done things in the wrong order, or because previous classifications affect new ones? Or is it because the classifier is deprecated?

Any insight appreciated. Thanks! 🙂

This seems to be a bug… well, your explanation for the reason is totally correct, but QuPath should seed the random generator to enable reproducibility.

I thought this had already been addressed, but I can reproduce the problem in v0.2.1 – I don’t know if it’s a recent regression or if it also happened earlier.

In any case, I will investigate a bit more and report back.

Note that this should only affect the legacy/deprecated classifiers. It shouldn’t be an issue with the ‘new’ classifiers (if you find it is, please let me know since that would be a separate bug 🙂)

Hmmm, I see now why this is a problem… and also why I thought it was solved.

The RTrees classifier is implemented in OpenCV. When originally integrated into QuPath, the OpenCV code for RTrees set the RNG seed explicitly, so the trained classifier was consistent. QuPath also explicitly set a different RNG seed if the training data was randomly subsampled. The combination of these two things meant that everything ‘random’ was also kept consistent. I remember checking all that at the time, which is why I was surprised to see this issue…
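
To illustrate why an explicit seed keeps a ‘random’ step such as subsampling consistent, here is a minimal sketch in plain Java. It does not use OpenCV or QuPath: `SeedDemo` and `subsample` are hypothetical names made up for this example, and `java.util.Random` only stands in for whichever generator the real code uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SeedDemo {

    // Pick k indices out of 0..n-1 at random, using the supplied generator.
    // (Hypothetical helper; not QuPath's actual subsampling code.)
    static List<Integer> subsample(int n, int k, Random rng) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rng);
        return new ArrayList<>(indices.subList(0, k));
    }

    public static void main(String[] args) {
        // Same explicit seed twice: the 'random' subsample is identical.
        List<Integer> first = subsample(1000, 5, new Random(100L));
        List<Integer> second = subsample(1000, 5, new Random(100L));
        System.out.println("Seeded runs match: " + first.equals(second));

        // No explicit seed: each generator picks its own seed,
        // so the two subsamples will usually differ.
        List<Integer> third = subsample(1000, 5, new Random());
        List<Integer> fourth = subsample(1000, 5, new Random());
        System.out.println("Unseeded runs match: " + third.equals(fourth));
    }
}
```

The seed value itself (100 here) is arbitrary; what matters is that it is the same on every run.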

However, it appears that 7 months ago this behavior changed in OpenCV 😕

By updating the OpenCV dependency, the reproducibility of the legacy classifiers was broken in QuPath 🙁

Adding the following two lines to your script can get it back:

import org.bytedeco.opencv.global.opencv_core;
opencv_core.setRNGSeed(-1)

This can’t guarantee the results will be identical to the interactively trained classifier, since it will have used a different seed – but at least every time you run the script the classifier should give the same results.
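
As a rough picture of why resetting the seed at the start of the script helps, and why results can drift when something else has used the generator in between, here is a small Java sketch. It is an assumption-laden toy, not OpenCV's or QuPath's implementation: `SharedRngDemo`, `setSeed` and `train` are hypothetical names, and a single shared `java.util.Random` stands in for a library-wide default generator.

```java
import java.util.Arrays;
import java.util.Random;

class SharedRngDemo {

    // Hypothetical stand-in for a shared default generator.
    static Random sharedRng = new Random();

    // Loosely analogous to calling a 'set seed' function: resets the shared state.
    static void setSeed(long seed) {
        sharedRng = new Random(seed);
    }

    // Toy 'training' step: its result depends on values drawn from the shared generator.
    static int[] train(int draws) {
        int[] result = new int[draws];
        for (int i = 0; i < draws; i++) {
            result[i] = sharedRng.nextInt(1000);
        }
        return result;
    }

    public static void main(String[] args) {
        setSeed(7L);
        int[] runA = train(4);
        int[] runB = train(4); // no reseed: continues from wherever runA left the generator

        setSeed(7L);
        int[] runC = train(4); // reseeded: starts from the same state as runA

        System.out.println("A equals C (reseeded first): " + Arrays.equals(runA, runC));
        System.out.println("A equals B (not reseeded): " + Arrays.equals(runA, runB));
    }
}
```

In this toy version, a run only repeats an earlier one when the seed is reset immediately beforehand, which is the role the two script lines above play when placed before the classifier is run.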

I’ll look into setting this seed explicitly inside QuPath for v0.2.2, since I can see no advantages in the change implemented in OpenCV.

Edit: I’ve made a pull request to fix this in v0.2.2.
See https://github.com/qupath/qupath/issues/567
This basically does the same as the two lines of the script above automatically.

Thanks as always for your speedy and useful reply, Pete! 🙂