How to batch process thousands of images in one project?

I have around 20 thousand images to extract features from. I wrote a script and used batch processing to run it. It ran very fast at the beginning (about 0.4 s per image), but it got slower and slower, and the software finally got stuck at around the 14,000th image. How can I deal with this?

I’ve never worked with a project that big in QuPath myself… but to begin to answer, it would be necessary to see the script and to know how you are running it - ideally with some indication of the kind of images, e.g. whole slide images, microscopy images, which file format.

def name = getProjectEntry().getImageName() + '.txt'

setColorDeconvolutionStains('{"Name" : "H&E default", "Stain 1" : "Hematoxylin", "Values 1" : "0.65111 0.70119 0.29049 ", "Stain 2" : "Eosin", "Values 2" : "0.2159 0.8012 0.5581 ", "Background" : " 255 255 255 "}');


runPlugin('qupath.imagej.detect.cells.PositiveCellDetection', '{"detectionImageBrightfield": "Hematoxylin OD",  "backgroundRadius": 15.0,  "medianRadius": 0.0,  "sigma": 3.0,  "minArea": 10.0,  "maxArea": 1000.0,  "threshold": 0.1,  "maxBackground": 2.0,  "watershedPostProcess": true,  "cellExpansion": 5.0,  "includeNuclei": true,  "smoothBoundaries": true,  "makeMeasurements": true,  "thresholdCompartment": "Nucleus: Eosin OD mean",  "thresholdPositive1": 0.2,  "thresholdPositive2": 0.4,  "thresholdPositive3": 0.6000000000000001,  "singleThreshold": true}');

runPlugin('qupath.lib.algorithms.IntensityFeaturesPlugin', '{"downsample": 1.0,  "region": "ROI",  "tileSizePixels": 200.0,  "colorOD": false,  "colorStain1": true,  "colorStain2": true,  "colorStain3": false,  "colorRed": false,  "colorGreen": false,  "colorBlue": false,  "colorHue": false,  "colorSaturation": false,  "colorBrightness": false,  "doMean": false,  "doStdDev": false,  "doMinMax": false,  "doMedian": false,  "doHaralick": true,  "haralickDistance": 1,  "haralickBins": 32}');
path1 = '/G:/'

path = buildFilePath(path1, name)
print 'Results exported to ' + path

Thanks for your reply!
This is my code, which just applies tumor detection to each image and extracts texture features. I run it with “Run -> Run for project”. Each image was tiled from a whole slide image in someone else's earlier work, so each one is a very small file in PNG format.

QuPath can work directly with the whole slide image & do the tiling itself, so if that’s an option for you then the 20,000 PNGs may not be necessary… but in any case the easiest solution is probably to split your project into a couple of pieces and run them separately.
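If you do split things up, a rough Groovy sketch along these lines could move the PNG tiles into numbered sub-folders, one per smaller project. The helper name `splitIntoChunks`, the `part_NN` folder naming and the chunk size are my own assumptions and not anything QuPath provides; it uses only plain Java/Groovy file calls, and it moves files, so it would be worth trying on a copy first.

```groovy
import java.nio.file.Files

// Hypothetical helper (not part of QuPath): moves the PNG files found directly
// inside sourceDir into sub-folders named part_01, part_02, ... with at most
// chunkSize files each, and returns the list of sub-folders created.
List<File> splitIntoChunks(File sourceDir, int chunkSize) {
    def files = sourceDir.listFiles() ?: new File[0]
    def tiles = files.toList()
            .findAll { it.isFile() && it.name.toLowerCase().endsWith('.png') }
            .sort { it.name }   // sort by name so the split is reproducible
    def chunkDirs = []
    tiles.collate(chunkSize).eachWithIndex { chunk, i ->
        def dir = new File(sourceDir, String.format('part_%02d', i + 1))
        dir.mkdirs()
        chunk.each { Files.move(it.toPath(), dir.toPath().resolve(it.name)) }
        chunkDirs << dir
    }
    return chunkDirs
}
```

Each resulting folder could then be imported into its own project and run with “Run -> Run for project” separately.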

Removing the print statement might also help, as this may cause some parts of the user interface to have to store thousands of lines of text.

If using v0.2.0-m10 you could also use VisualVM to profile where the code might be slowing down, or call QuPath from the command line.

I also recall that Windows doesn't always handle very large numbers of files in the same directory well, so changing the output file path to spread the results across several batch folders might help.
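As a minimal sketch of that last idea, the helper below picks a sub-folder from a hash of the image name, so that no single directory ends up holding every output file. The name `bucketedOutputFile`, the `batch_NN` folder naming and the default of 32 folders are all assumptions for illustration; nothing here is a QuPath function.

```groovy
// Hypothetical helper (not part of QuPath): maps an image name to one of
// `buckets` sub-folders of baseDir and returns the .txt file path inside it.
File bucketedOutputFile(String baseDir, String imageName, int buckets = 32) {
    // floorMod keeps the result in 0..buckets-1 even for negative hash codes
    int bucket = Math.floorMod(imageName.hashCode(), buckets)
    File dir = new File(baseDir, String.format('batch_%02d', bucket))
    dir.mkdirs()   // create the sub-folder the first time it is needed
    return new File(dir, imageName + '.txt')
}
```

In the script above, this could stand in for the `buildFilePath(path1, name)` line, for example `def path = bucketedOutputFile(path1, getProjectEntry().getImageName()).path`. The same image name always maps to the same folder, so re-running the script overwrites rather than duplicates results.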
