Training Object Classifier Through Headless Scripting


Hope you’re doing well. I’m working on a project using point annotations to train an object classifier over cellular detections in QuPath. We have a large number of slides, so we’ve built a Docker container to perform detections on our cluster using Groovy scripts.

There are some caveats we’ve run into using the Dockerized QuPath. For example, to our knowledge we haven’t been able to get a “Project” loaded in so far. We’ve worked around this by having the Docker container execute the Groovy script independently for each slide and output a per-slide GeoJSON file of cellular detections.

After importing the cellular detection objects and point annotations (labels), the final step we hope to achieve is to train an ANN object classifier to classify the cells according to their corresponding point annotation labels. We’ve been able to achieve this in the QuPath GUI with a small test set of slides, but this likely won’t scale to the entire cohort of slides we have.

I wanted to ask if it’s possible to train an ANN object classifier in QuPath using Groovy commands, or if there’s another way you think we could train an object classifier given the no-project constraint we’re facing? I was thinking I could maybe concatenate all slide-level GeoJSONs into a single file (likewise with the point annotation GeoJSONs), load these two files into QuPath as detections and annotations, and potentially pass these as arguments into a training function, depending on how the ObjectClassification function is set up.
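
To make the concatenation idea concrete, here is a rough Python sketch of what I have in mind. It assumes each per-slide file is a GeoJSON FeatureCollection, a single Feature, or a bare list of features. The `slide` property is a hypothetical name I made up to keep track of which slide each object came from (coordinates from different slides share no common frame), and I don’t know whether QuPath would do anything useful with it on import.

```python
import json
from pathlib import Path


def load_features(path):
    """Return the list of GeoJSON features stored in one file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict) and data.get("type") == "FeatureCollection":
        return data["features"]
    if isinstance(data, dict):
        return [data]  # a single Feature
    return list(data)  # a bare list of features


def merge_geojson(paths):
    """Concatenate the features of several files into one FeatureCollection."""
    merged = []
    for path in paths:
        slide = Path(path).stem
        for feature in load_features(path):
            # "slide" is a hypothetical property recording provenance
            properties = feature.get("properties") or {}
            properties["slide"] = slide
            feature["properties"] = properties
            merged.append(feature)
    return {"type": "FeatureCollection", "features": merged}
```

The same function could be run once over the detection files and once over the point annotation files to get the two merged inputs.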

Any guidance or ideas would be tremendously appreciated! :slight_smile:

Thanks for your help as always,

There’s no streamlined, intended way to train a classifier through Groovy yet. Scriptable training is a tentatively planned feature, but I don’t have a timeframe for it.

In the meantime, pretty much anything should be possible through Groovy, although I haven’t tried it so don’t know how complicated the scripting gets.

Regarding projects, the command line supports passing a project but there is a bug in saving the results to a data file – see here for a workaround.

For the rest of your question, I don’t understand your setup well enough to give much more detail. For example:

  • Roughly how many images do you have to analyse?
  • Roughly how many cells do you have per image?
  • Where do your points come from for training?
  • How many classes do you have in your classifier?
  • Do you want to train one classifier across multiple images, or one classifier per image?
  • Do you plan to check the classification accuracy somehow?
  • What is the end goal of the analysis?

Mostly, for a large-but-not-too-large number of slides I’d rather just leave QuPath running overnight than deal with the complexities of running it on a cluster.

But if the detection is unrealistically slow that way and I really do need to parallelise it, I’d consider splitting up my images into multiple projects and saving the results for these multiple projects in the ‘default’ way (which internally uses .qpdata files, which should be a lot smaller and faster to write than GeoJSON). Then, I’d merge these projects together into one for later steps.

This should work because when you use the ‘Import images’ dialog for a project, you can actually select another project file instead of an image file – in which case QuPath should bring in all the images and data from the second project in one step. That way you can parallelise the detection (which I guess is the slowest part) but keep the classification limited to QuPath’s interactive tools.

Not sure if that’s a viable option in your case, but hopefully something in here is useful.

Ok, great! This was really helpful. To clarify – on model training, we’re only running cell detections over a smaller-sized ROI (a bounding box of all the point annotations generated on the slide, which we’ve tried to localize to small areas). The cell detections with a corresponding point annotation (with a designated class label) then serve as the training objects for the QuPath object classifier.
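
To illustrate the matching step, here is a minimal Python sketch of how a point label can be assigned to the detection that contains it. This is not QuPath’s implementation, just the idea. It assumes each detection is given as the outer ring of its polygon (a list of (x, y) vertices) and each point as an (x, y, label) tuple. The brute-force loop is only for illustration; for hundreds of thousands of cells a spatial index would be needed.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_detections(detections, points):
    """Return one label (or None) per detection; the first matching point wins."""
    labels = [None] * len(detections)
    for x, y, label in points:
        for i, ring in enumerate(detections):
            if labels[i] is None and point_in_polygon(x, y, ring):
                labels[i] = label
                break
    return labels
```

Detections that end up with `None` have no point annotation and would simply be left out of the training set.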

Our main concern was that as the number of slides increases, developing object classifier models would become difficult locally. But if I understand you right – QuPath should be able to support interactive object classifier model training on a relatively large-ish number of slides?

To answer your questions –

  • As of now we are in the process of annotating, so we have around 20-25 slides, but this may eventually ramp up to the order of 100s of slides.
  • Number of cells varies per slide, but for the training cells created by running detections on the bounding boxes, on the order of 100,000s.
  • The points are collected from experts through an external viewer, which we’ve exported and converted to a QuPath-compatible GeoJSON.
  • Ideally we’d like to have one classifier trained across all our training images. We’re interested right now in identifying lymphocytes (so 2 classes: lymphocyte and other) but this could change as our project continues.
  • Regarding accuracy: quantitatively evaluating performance has definitely been a goal of ours, but we’re still trying to figure out the best way to approach this, and whether it’s supported in QuPath.
  • The end goal for this is outputting a cell-wise CSV of cellular objects and their corresponding object classifier label and pixel classifier label (parent class) to perform downstream spatial analyses
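
On the accuracy point, one fallback we’ve considered is computing metrics outside QuPath from the exported labels. Below is a minimal Python sketch, assuming we hold out some annotated cells and have two equal-length lists: the expert label and the classifier’s label for each held-out cell. The function names are our own, not anything from QuPath.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def precision_recall(true_labels, predicted_labels, positive="lymphocyte"):
    """Precision and recall for one class, treated as the positive class."""
    counts = confusion_counts(true_labels, predicted_labels)
    tp = counts[(positive, positive)]
    fp = sum(n for (t, p), n in counts.items() if p == positive and t != positive)
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For this to mean anything, the held-out cells would have to come from slides (or at least regions) that were not used for training.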