addPixelClassifierMeasurements - low performance

I have QuPath 0.2.2 running on Windows 10.

I have a project with an image that was added manually, with “Auto-generate pyramids” checked.
The project contains a pixel classifier (a thresholder at very high resolution).

When I try to run

...
// Create a full-image annotation and select it, then add the classifier measurements
QP.createSelectAllObject(true)
QP.addPixelClassifierMeasurements(classifier, classifierName)

from the Script Editor, it takes a very long time to process.
The CPU load was so low that I initially suspected it wasn’t working at all…
In Windows Task Manager, none of the 8 available cores is fully occupied; only one core shows noticeable extra load.

While exploring the issue, I realized it apparently behaves the same way on my machine even when called via the GUI. But in that case there is an illusion of work, because the classifier display layer makes full use of the CPU while I’m activating the measurement (this is a rather annoying UX “feature”; having an explicit toggle for the classifier preview would have been nicer). The classifier preview finishes more quickly, though, and the measurement time is actually the same as when called from the script.
I also tried running the script via CLI, with the same result.

On a different machine, where this script is intended to be used, the results are as follows:

  • run via GUI - makes full use of the CPU cores and completes quickly;
  • run via the Script Editor - clearly uses a single core and takes about the same time as it does locally.
    That’s about a 14× difference in time.

I tried to limit the number of processors in QuPath settings from 8 to 6 - no observable difference.

I tried to run java -jar ... instead of console executable - no observable difference.

I tried to take some screenshots from VisualVM on my local machine. They’re not really clean, but I couldn’t see any details worth re-capturing. I’m not sure what exactly to look for, so let me know what else I can measure.



Here is another performance-related topic I’ve been looking through, but it didn’t help me (some of the things I tried above came from there):

Questions:

  • What may cause the difference between running from the GUI thread and the script thread on the target machine?
    (I’ll have to install a JVM before being able to use VisualVM there. I would like to have a better idea of what to look for before doing that.)

  • What may cause low performance in either case on my local machine?

I’m aiming to move from a script to an extension for other reasons. That might also help me make it work the same way it currently does when called from the GUI on the target machine. But since there is no difference on the local machine, it will still be difficult for me to debug; it also means the behaviour is not really reliable and might run slowly on other machines.
I’d love to be sure it doesn’t waste time doing nothing.

Thanks in advance for any good pointers to chase it down.

You can easily turn off the preview with the C shortcut key or the ‘C’ button in the toolbar. But it needs to be on by default, or there would be more complaints that it isn’t doing anything…

You can also turn off live prediction with the big button in the classifier training window (albeit only when training).

Do neither of these do what you want?

The preview overlay is parallelized, but adding measurements is not. Probably it should be, but I hadn’t previously heard of it being such a performance issue as it is here.

Adding measurements does however make use of tiles that are already cached, so is much faster whenever tiles have been requested for some other purpose (in this case computing the overlay).

Performance seems much worse than usual in your case because:

  • you’re applying the classifier to the entire image (I presume at a fairly high resolution), and
  • reading the image is slow. According to VisualVM, > 80% of the time is being spent reading the pixels using Bio-Formats.

With that in mind, the speed of disk access on your local machine may be a factor.

Until parallelization is added for measurements, you could take matters into your own hands by pre-requesting all tiles in your script (although this does assume the cache will be large enough to contain them). Something like this should do it:

// Build a pixel classification server wrapping the current image
def imageData = getCurrentImageData()
def classifier = loadPixelClassifier('Tissue detection')
def classifierServer = PixelClassifierTools.createPixelClassificationServer(imageData, classifier)

// Request every tile in parallel, so the results end up in the tile cache
classifierServer.getTileRequestManager().getAllTileRequests()
    .parallelStream()
    .forEach { classifierServer.readBufferedImage(it.getRegionRequest()) }
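
If memory might be tight, the same idea could be limited to just the tiles overlapping a selected annotation. This variant isn’t something I’ve tested, so treat it as a rough sketch using a simple bounding-box check:

// Untested sketch: pre-request only the tiles whose bounds intersect the
// selected annotation's bounding box, keeping the cache footprint smaller.
def roi = getSelectedObject().getROI()
double rx = roi.getBoundsX(), ry = roi.getBoundsY()
double rw = roi.getBoundsWidth(), rh = roi.getBoundsHeight()

classifierServer.getTileRequestManager().getAllTileRequests()
    .parallelStream()
    .filter { t ->
        def r = t.getRegionRequest()
        r.getX() < rx + rw && r.getX() + r.getWidth() > rx &&
            r.getY() < ry + rh && r.getY() + r.getHeight() > ry
    }
    .forEach { classifierServer.readBufferedImage(it.getRegionRequest()) }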

Not really intuitive, but OK. These letter buttons in the toolbar just don’t communicate anything until you have something to do with them, and act like mere visual clutter… ¯\_(ツ)_/¯

It’s an SSD, and I’m surprised it can take 7 minutes to read 700 MB from an SSD. But I will try to monitor disk access.

Thank you for the code - I will give it a try.

And thanks for the fast response.

How would you improve it?

(The ones with letter icons are mostly due to a lack of time to create a new icon font, but I’m not sure they are much less meaningful than the ones with the drawings. C should be mentioned more prominently in the documentation, but it is already described here, and there is always the tooltip text… and ‘c’ for toggling the classifier on/off fits the same pattern as ‘d’ for detections and ‘a’ for annotations.)

The file format and compression type will matter.

There’s also the command list described under Two essential tips in the ‘Getting started’ section, so you’d just need to type ‘pixel’ to find anything that might be relevant to showing/hiding the overlay:

[Screenshot: the command list filtered by ‘pixel’]

Genuinely curious as to what you think would be more intuitive – I’m far from an expert on UX (which is one reason why there’s a UX position advertised currently for someone to work specifically on this).


2 cents
Either a bigger button that says something like “Selection mode” and “Classifier visibility” (unfortunately, space is limited, more on some monitors than others), or some sort of visual indication that particular tools are relevant to what is currently active elsewhere in the GUI.

I.e. highlighting the C button in the GUI when one of the pixel classifier tools is running (surrounding it with a cyan box, or something similar).

Extreme: Hide interface options to limit confusion when certain things have no function. C does nothing if there is no classifier layer, so maybe it would not show up initially upon opening the program.

Mouseover text might also indicate which button toggles what when the viewer is selected (D, A, etc).

*can always split the topic into a UX/GUI thread.


It’s somewhat unfortunate that the topic is shifting towards the minor issue. I couldn’t resist mentioning it, and now it’s taking the spotlight.

I actually remember now that I had used this button before. But I easily forgot its existence the next time I used the same UI, focused on different things.

I’m not an expert in UX either.
But in my understanding, once you find yourself thinking about how to dress up a feature to gain a minor usability improvement, you’re probably doing it wrong. A better solution might be discovered when you take a step back and look at how things fit into the larger picture.
I also have an issue with toolbars and menus in general. They often work like dumps for features that don’t have a natural place to live.
I don’t think I’m familiar enough with QuPath yet to be able to propose a good way to organize things.

OK, results of my experiments with pre-requests:

What I’m trying to run:

...
def classifierServer = PixelClassifierTools.createPixelClassificationServer(imageData, classifier)

// Warm the tile cache by reading all tiles in parallel first
classifierServer.getTileRequestManager().getAllTileRequests()
    .parallelStream()
    .forEach { classifierServer.readBufferedImage(it.getRegionRequest()) }

// Then add the measurements, which should now hit the cache
QP.addPixelClassifierMeasurements(classifier, classifierName)

I also changed the percentage of memory for tile caching from 25% to 75%.

It is clear, though, that on the local machine I don’t have enough memory to fit the cache.
Most of it gets GC’d, and the image file has to be read again later. The net result is worse than without this addition.
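
A rough way to check in advance whether the cache could hold everything is to compare the total tile size against the cache allowance. An untested sketch (assuming uncompressed tiles, and that these server/tile methods behave as I expect):

// Untested sketch: estimate the total (uncompressed) size of all
// classification tiles and compare it with the configured tile cache size.
def tiles = classifierServer.getTileRequestManager().getAllTileRequests()
long bytesPerPixel = classifierServer.getPixelType().getBytesPerPixel()
long nChannels = classifierServer.nChannels()
long tileBytes = tiles.sum { it.getTileWidth() * (long) it.getTileHeight() } * nChannels * bytesPerPixel
long cacheBytes = (long) (Runtime.getRuntime().maxMemory() * 0.75)  // my 75% cache setting
println String.format('Tiles: ~%.2f GB, cache: ~%.2f GB', tileBytes / 1e9, cacheBytes / 1e9)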

But on the target machine, which has plenty of RAM, this seems to work really nicely.
All cores are busy during the buffering process, and then the actual measurement happens really quickly.

I wonder, though: we use a classifier server here, but does the tile cache depend on the classifier used to create it?


For complex software, there’s always a tricky balance between making important features prominent vs visual clutter. There is no perfect answer that works well for every possible user and every application. In my opinion, for my applications, QuPath hits this balance better than nearly any other program I’ve ever used.

That being said, if we’re making UX feature requests around the pixel classifier, I have some thoughts (how surprising 🙂):

1- I find myself wishing for a copy of the “C” classifier visibility toggle button directly in the pixel classifier training window. I do forget sometimes when I’ve turned visibility off, and then I get irritated that the classifier didn’t work. A reminder directly in my face would be helpful. The live preview toggle button isn’t exactly the same functionality, because I’d like to be able to toggle visibility while annotating without retraining (or failing to update) the classifier. Of course, it would also need to show the opacity, because having the classifications visible at 0% opacity has the same problem. And now I’ve just made it complicated 🤷‍♀️

2- Similarly, if I’m creating detection objects from an existing classifier using the Load Pixel Classifier window, I wish it did not default to calculating the classification overlay for the entire image. If I’m trying to create high-res detection objects in a small selected annotation, it’s a waste of resources to classify the whole image first. I know I can turn off classifier visibility before loading the model, but I usually remember that 5 seconds too late. The “Region” dropdown doesn’t really help here, because A) it’s placed under the Choose Model dropdown, so by the time I get there, it has already started classifying the entire image; and B) there isn’t an option for “selected object only”. I usually have a full tissue annotation, so “any annotations” is still very large. I’m sure that if it defaulted to not displaying anything, people would complain about that too. Maybe it could default to calculating only within the selected annotation when one is selected?


I admit, I feel this way about far too many aspects of the pixel classifier 🙂 Especially when I want to draw the 63rd training annotation and oh wait, I forgot to turn off the… time to go get a drink.