QuPath takes too long to run

Thank you for sharing the wonderful program QuPath. As a pathologist, I am trying to analyze the distribution of fibroblasts and immune cells in tumor stroma using TCGA breast whole slide images. I have very little programming experience.

Analysis goals

My strategy for this study is as follows:

  1. Classify “Lesion (with tumor and stroma)” for analysis using Pixel classifier.
  2. Classify “Lesion” into tumor and stroma using superpixel (SLIC, Tile diameter 25 um).
  3. Classify “stroma” into mature and immature stroma using a pixel classifier.
  4. Finally, to determine the fibroblast and lymphocyte density in the mature and immature stroma, select the annotations of “mature, immature” stroma and measure the fibroblast and lymphocyte with a trained cell detection object classifier.

The script I use for each step for this project is very simple and is as follows.

selectObjectsByClassification("Immature", "Mature");
runPlugin('qupath.imagej.detect.cells.WatershedCellDetection', '{"detectionImageBrightfield": "Hematoxylin OD", "requestedPixelSizeMicrons": 0.5, "backgroundRadiusMicrons": 8.0, "medianRadiusMicrons": 0.0, "sigmaMicrons": 1.5, "minAreaMicrons": 10.0, "maxAreaMicrons": 400.0, "threshold": 0.1, "maxBackground": 2.0, "watershedPostProcess": true, "cellExpansionMicrons": 2.5, "includeNuclei": true, "smoothBoundaries": true, "makeMeasurements": true}');
def name = getProjectEntry().getImageName() + '_ANNO' + '.txt'
def path = buildFilePath(PROJECT_BASE_DIR, 'Results_Annotation')
path = buildFilePath(path, name)


My problem is:

In the case of a large image having many cells, it takes a very long time to execute the above script.
(It may take from 10 minutes to 4-5 hours depending on the image size.)
(I use a computer with a 24-core CPU, RAM 256g and SSD 4g.)
RAM and CPU utilization seem very low even though the number of parallel threads has been adjusted.
It can only process 5-10 images per day. It will take 3-4 months to process 1000 files.

Is it normal to take this time? Is there any way to speed it up?

Thank you.

Seoungwan Chae

Hello Seoungwan Chae,
Did you check your maximum memory settings? From your Memory Monitor screenshot, your maximum appears to be ~8 GB.
Normally you can change it in QuPath under Help / Show Setup Options / Maximum Memory, but sometimes that does not work and you need to set it manually. Try editing the QuPath-0.2.3.cfg file. For me the location is:
At the bottom, below [JVMOptions], change the value that comes after -Xmx, as explained here
I hope this helps.

Hi @smcell and @carcen

It looks like the maximum memory is > 224 GB, but the maximum QuPath has actually used is approximately 8 GB. QuPath doesn’t tend to use more than it requires, and 8 GB is probably enough. I appreciate the term ‘Total memory’ displayed in the Memory monitor is a confusing/misleading term… here, it is a value that can increase over time, up to the maximum (which is shown on the vertical axis of the plot).

Nevertheless, it’s always worth checking as @carcen describes anyway, and sometimes QuPath can’t access as much memory as it might like because too many other memory-hungry applications are running on the computer.
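To see the difference between the maximum, the currently reserved ("total") and the actually used memory, a short script along these lines can help. It is only a sketch that uses standard JVM calls and assumes the Memory monitor figures correspond to these same JVM values; the variable names are illustrative.

```groovy
// Plain JVM calls; no QuPath classes are needed
def rt = Runtime.getRuntime()
double GB = 1024d * 1024d * 1024d

long total = rt.totalMemory()                   // read once so the numbers below are consistent
double maxGB   = rt.maxMemory() / GB            // ceiling the JVM may grow to
double totalGB = total / GB                     // currently reserved; can grow up to the maximum
double usedGB  = (total - rt.freeMemory()) / GB // actually in use right now

println String.format("Max: %.1f GB, total: %.1f GB, used: %.1f GB", maxGB, totalGB, usedGB)
```

If the reported maximum is already far above what is used, raising the memory limit further is unlikely to make cell detection faster.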

My best guess about the slow performance is that the annotations (yellow) in the image look very complex and detailed, with a very large number of vertices (perhaps millions for the selected stroma annotation). This means that QuPath probably has to do some very expensive geometry calculations during the cell detection. Because these calculations involve the same annotation, it is possible that they cannot be parallelized very well – and so do not take advantage of the processing power available.

If you really need the complex annotations, you might see some improvements if you call Objects → Annotations… → Split annotations before later processing steps – but I’m not certain.

If possible, I suggest trying to change your workflow to try to use less complicated processing. QuPath should work better if you have simpler annotations, or annotations that are split into multiple pieces.
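For batch processing, the split step can probably be scripted as well. The following is a sketch only: it assumes the plugin name below is what QuPath 0.2.x records in the workflow for Split annotations, and that the split pieces keep the classification of the original annotation. Both are worth checking on one image first.

```groovy
// Run inside QuPath's script editor with an image open.
// Select the complex stroma annotations (class names taken from the original script)
selectObjectsByClassification("Immature", "Mature")
// Assumption: this is the command recorded for Objects -> Annotations... -> Split annotations
runPlugin('qupath.lib.plugins.objects.SplitAnnotationsPlugin', '{}')
// Assumption: the pieces keep their class, so they can be re-selected before cell detection
selectObjectsByClassification("Immature", "Mature")
```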

As a comparison, you might try running cell detection just within the detected tissue – without distinguishing tumour and stroma. This should give a baseline that demonstrates how fast (or slow) cell detection and classification are alone.


Thank you so much for your comment.
Split annotation seems to be effective. I will compare the time taken for cell detection after split annotation and upload the result.


Thanks for your comment.
Cell detection doesn’t seem to use much memory. I checked that the cfg is also set to 256G. Currently, about 3G of a total of 14G is used.
When classifying Tumor and Stroma by Superpixel, RAM seems to be sufficiently used as follows.

Thank you,

Seoungwan Chae

I just want to throw out there that maybe the entire workflow could be redone to reduce this problem.
It sounds like you are creating the annotations to determine what is inside them - this is not strictly necessary. If you have a pixel classifier that works, and a Lesion object, all you should need is:
Lesion annotations
Lots of cells

At that point you could use the pixel classifier and add measurements to determine what the subsets of cells are, unless I am missing something about your analysis. Which is entirely possible.

Create Lesion
Create cells inside of Lesion
Classify the cells using a pixel classifier as Tumor or Stroma

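getDetectionObjects().each {
    // Tumor cells get a "Tumor" measurement of 1; all other cells get 0
    if (it.getPathClass() == getPathClass("Tumor")) {
        it.getMeasurementList().putMeasurement("Tumor", 1)
    } else { it.getMeasurementList().putMeasurement("Tumor", 0) }
}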

So Tumor will have a Tumor measurement of 1, and Stroma will have a Tumor measurement of 0.

Next classify your cells as mature or immature stroma - I assume you only want the stroma cells as inputs.

classifyDetectionsByCentroid("Whatever The Pixel Classifier for Stroma Immature vs Mature ")

And once again, you can create a variable to determine Immature vs mature cells.

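getDetectionObjects().each {
    // Mature cells get a "Mature" measurement of 1; all other cells get 0
    if (it.getPathClass() == getPathClass("Mature")) {
        it.getMeasurementList().putMeasurement("Mature", 1)
    } else { it.getMeasurementList().putMeasurement("Mature", 0) }
}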

Now in addition to tumor and stroma measurements, you have mature and immature measurements within the Stroma population.
Finally, run your cell classifier.
You now have the information from your pixel classifiers and your cells, all together, with only one or two annotation objects. If need be, you can use a setCellIntensityClassifications("Mature", 0.5) to give the cells a subclass of Positive if they are mature, or Negative if they are immature (there may be an issue here with how the classifier works on Tumor cells!! They may all show up as Tumor-Positive, but that sounds like it would not be an issue).

Alternatively, split your script into steps and log the time for each. Though it should be somewhat obvious, you can find out which steps are slow using the built-in script timing:
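Where finer-grained timing is wanted, each step can also be wrapped in a small helper. This is a minimal sketch in plain Groovy: the timed helper and the step labels are hypothetical, and the Thread.sleep calls are placeholders for the real QuPath calls.

```groovy
// Hypothetical helper: runs a step, prints how long it took, and returns the time in seconds
def timed = { String label, Closure step ->
    long start = System.nanoTime()
    step()
    double seconds = (System.nanoTime() - start) / 1e9d
    println String.format("%s: %.2f s", label, seconds)
    return seconds
}

// Placeholder steps; in QuPath, put the real calls from your script inside the closures,
// for example selectObjectsByClassification(...) and runPlugin(...)
double selectTime = timed("Select stroma annotations", { Thread.sleep(20) })
double detectTime = timed("Cell detection", { Thread.sleep(50) })
```

Comparing the printed times across a small and a large image should show whether the selection, the detection or a later step dominates.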


Thank you so much for your advice.

I didn’t think of annotation after cell detection first, but I’m trying it now.
One problem occurred.

  1. Create Lesion. (Pass)
  2. Create cells inside of Lesion. (Pass)
  3. Classify the cells using a pixel classifier as Tumor or Stroma. (Problem: I annotated Tumor vs Stroma by an “object classifier” after “superpixel”. At this stage, all previous cell detections disappear.)

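getDetectionObjects().each {
    // Tumor cells get a "Tumor" measurement of 1; all other cells get 0
    if (it.getPathClass() == getPathClass("Tumor")) {
        it.getMeasurementList().putMeasurement("Tumor", 1)
    } else { it.getMeasurementList().putMeasurement("Tumor", 0) }
}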

To preserve cell detection information, I am trying to create a new pixel classifier for Tumor - Stroma annotation.
However, Tumor - Stroma annotation seems to work better with superpixel than pixel classifier.
Is there any way possible with superpixel + object classifier?

Thank you,

Seoungwan Chae

Ah, in that case the suggested workflow would not work - it is a pixel classifier only option.

Would it be correct to assume, then, that you are merging the SLICs into annotations once they are classified? From your annotation list it looks like you have Tumor and Stroma annotations, at least.

The closest I could suggest there is, after creating the annotation, immediately store and remove it. The ROI can still be used to classify cells as long as you are working within the same single script.

For example - and this makes the assumption you only have one tumor and one stroma annotation:

tumorAnnotation = getAnnotationObjects().findAll{it.getPathClass() == getPathClass("Tumor")}
stromaAnnotation = getAnnotationObjects().findAll{it.getPathClass() == getPathClass("Stroma")}
// Collect both lists so that annotations contains the tumor and stroma annotations
annotations = tumorAnnotation + stromaAnnotation
// Remove the annotations from the hierarchy (true keeps any child objects)
removeObjects(annotations, true)
// Other code here until you have cells
cellsInTumor = getCurrentHierarchy().getObjectsForROI(qupath.lib.objects.PathDetectionObject, tumorAnnotation[0].getROI())
cellsInStroma = getCurrentHierarchy().getObjectsForROI(qupath.lib.objects.PathDetectionObject, stromaAnnotation[0].getROI())

At this point, you could use the previous scripts to modify or classify the cells within the tumor or the stroma, without having a large complex annotation slowing down the interface.

I am not sure how much this will speed up the script, as I am not yet sure which portion of the script is slow for you - have you checked?


Also want to point out that different steps will have drastically different requirements and bottlenecks. Pixel classifiers are incredibly RAM intensive, while generating cells is usually very CPU intensive.
[Image: Cell detection]

Meanwhile the pixel classifier can quickly ramp up past 100GB of RAM while using less CPU.

Even after all of the cells or whatever have been created, writing to the hierarchy is, I think, still a single core process.

Finally, most bottlenecks I have seen for large projects actually involve the data itself, not exactly QuPath. If your 1000 images are on a network location or external drive, the images need to be copied to the computer with QuPath in order to process them - and you can be limited by network bandwidth. On the other side of the processing, all of that data still needs to be written to the disk. If you have large .QPDATA files with millions of objects, that data is written with a single CPU core and limited by your drive write speed. Big data files mean long times sitting at almost no CPU utilization, though this is slightly better if you are writing to an SSD. Writing to a network location would be even further bottlenecked.