Export pixel selection as ImageScope XML format

Hi!

I would like to export the annotations created by pixel selection in QuPath 0.2.0-m2 (for example, Tumor and Stroma regions) as an XML file (ImageScope/Aperio, like this one) rather than as a binary mask, as explained here.

Any idea how to do it?
Thanks!

That isn’t possible because (as far as I know) that particular XML specification isn’t freely available. And I really don’t want to get into trying to reverse-engineer something like that.

There are various discussions about QuPath and ImageScope-friendly XML strewn across the internet, but they conclude as summarized above.

I have recently written some code to convert QuPath ROIs and objects to a GeoJSON-compatible representation, which should then be parseable with other tools (e.g. Shapely)*.

So if your goal is to export in a non-raster way, good news is coming in the next release. However, if your goal is to transfer annotations somewhere that only supports ImageScope’s XML format, then alas… I would prefer to promote open formats within QuPath**, and avoid any potentially thorny issues that might arise from reverse-engineering.

*-Admittedly I haven’t checked that it’s really compatible and works elsewhere yet… only that QuPath can read its own exported objects.

**-If the XML is open after all, great! Please point me towards the specification.
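Once a GeoJSON export exists, parsing it elsewhere in Python is straightforward. A minimal sketch with the standard library only — the exact feature layout and the "classification" property name are assumptions here, since QuPath's GeoJSON structure isn't shown in this thread:

```python
import json

# Hypothetical GeoJSON FeatureCollection, roughly as an annotation export
# might look; the "classification" property name is an assumption.
geojson_text = '''{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[0, 0], [100, 0], [100, 50], [0, 50], [0, 0]]]
    },
    "properties": {"classification": {"name": "Tumor"}}
  }]
}'''

collection = json.loads(geojson_text)
for feature in collection["features"]:
    name = feature["properties"]["classification"]["name"]
    ring = feature["geometry"]["coordinates"][0]  # exterior ring of the polygon
    print(name, len(ring), "vertices")
```

From there, Shapely can build geometry objects directly, e.g. `shapely.geometry.shape(feature["geometry"])`.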

Hi @petebankhead,
Thanks for the feedback. Yes, actually I found a script that can take an SVS file and tile the regions at different magnifications using XML files.

I understand the issue here, so, apologies for asking something quite naive. I will keep struggling to find a solution and will update it here :smiley:

Since XML is kind of readable, I might just be being extra-cautious :slight_smile:

You could potentially write your own export script. It’s ok-ish for polygons, but from what I recall of ImageScope it seems to diverge quite a bit from QuPath in how it handles complex shapes with disconnected pieces/holes. So to make a fully compatible exchange would be very tricky, if possible at all.

Or if this is purely for the purposes of using that tiling export script, if you could define very precisely what exactly needs to be exported then the easiest thing might be to do it directly from QuPath. I assume your images are RGB since that script looks like it only supports RGB.

The potentially-relevant import/export options I have worked on are:

  • OME-TIFF for whole slide images
  • GeoJSON for ROIs / objects

These are the best open formats I’ve found to handle the variation QuPath needs (e.g. multidimensional images, complex shapes).

What I want to export is, from every H&E slide, a set of square tiles at different magnifications (5X, 10X, 20X) of the different regions predicted by the pixel selection tool as tumor and stroma.

I found that QP has a tiling tool but I am not sure how to accomplish this task using it.

A few questions to clarify:

  • What exactly do you mean by ‘pixel selection tool’?
  • Do you want every tile to be the same size (e.g. 256x256 pixels), or every tile to correspond to the same region of interest (e.g. 1024x1024 px at 40x, 256x256 px at 10x…)?
  • How do you define the regions that you want to export as tiles, and how do you want to handle pixels that overlap with the borders of those regions?
  • With what software will you use the exported tiles? I ask because depending on your answers to the previous questions, it might be easier to achieve the downsampling elsewhere (e.g. in Python).
  • Is it only RGB images that need to be supported?

Pixel Classifier in QP 0.2.0-m2

Always same size (224x224)

Regions defined as the annotations generated from the regions identified as “Tumor” or “Stroma” using Pixel classifier. Those regions that overlap or are not square would be discarded.

I am mainly using Python

Yes

And a BIG THANKS for the interest :smiley:

Thanks, some follow up questions then :slight_smile:

  • is there any need to export masks of the annotation, or only tiles of the original pixels?
  • should unannotated tiles just be discarded?
  • do you want to export only regions that are completely within an annotation of a specific class, or those whose centroid is within an annotation of that class? If the former, I guess you could lose a lot of tiles when exporting at low resolutions.
  • do you want to export all valid (by the above criterion), non-overlapping tiles or do you want to randomly sample tiles from the annotations?
  • do you need to encode the original coordinates within your patches?
  • what export file format would you want? JPEG would introduce new compression artefacts, but PNG would be large (and TIFF larger still).

My understanding is that the ultimate plan is to train a patch classifier to distinguish tumor from stroma (for example) - have I got the right idea?

If so, I imagine that an alternative approach would be to export a (possibly lower-resolution) map showing either the pixel classification results directly or generated from annotations, and then use this to carry out the region extraction directly in Python + OpenSlide. Not sure if that makes the task easier or not…
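If going the Python + OpenSlide route, one piece of the puzzle is choosing which pyramid level to read for a target resolution. OpenSlide provides `get_best_level_for_downsample` for this; the standalone sketch below just illustrates the idea without requiring OpenSlide installed:

```python
def best_level_for_downsample(level_downsamples, target):
    """Return the index of the lowest-resolution pyramid level whose
    downsample does not exceed the target, so we only ever downscale
    further (never upscale) to reach the requested resolution."""
    best = 0
    for level, downsample in enumerate(level_downsamples):
        if downsample <= target:
            best = level
    return best

# e.g. a slide with full-resolution, 4x and 16x pyramid levels,
# exporting at an 8x downsample: read level 1 (4x), then resize by 2
levels = [1.0, 4.0, 16.0]
print(best_level_for_downsample(levels, 8.0))
```

With OpenSlide itself, the corresponding call would be `slide.get_best_level_for_downsample(8.0)` followed by `slide.read_region(...)` at that level.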

On reflection, it sounds like this script might be the most useful: https://petebankhead.github.io/qupath/scripting/2018/03/14/script-export-labelled-images.html

Not sure of its status (working vs. not…) in v0.2.0-m2, but if it does work you could run it multiple times to export at different resolutions, and then use the exported labels to sort your patches in Python later.

This has the advantage that you could also set different criteria for ‘ambiguous’ patches, e.g.

  • discard if annotated with more than one class
  • discard if less than 75% of the patch has a specified class
  • use the classification of the center pixel

Coding up each of these criteria in QuPath + Groovy would be harder, but with Numpy you can more easily count pixels with each label corresponding to the exported image patch.
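For instance, the criteria above could be checked from a labelled patch with NumPy along these lines — a sketch only, and the label values (0 = unclassified, with e.g. 1 = stroma, 2 = tumor) are assumptions that would need to match whatever the export script actually writes:

```python
import numpy as np

def classify_patch(labels, min_fraction=0.75):
    """Decide a patch's class from its labelled image.
    Returns a single label value, or None for ambiguous patches."""
    counts = np.bincount(labels.ravel())
    fractions = counts / labels.size
    # classes present, ignoring 0 (unclassified)
    present = np.nonzero(counts[1:])[0] + 1
    if len(present) != 1:                 # criterion 1: more than one class
        return None
    label = present[0]
    if fractions[label] < min_fraction:   # criterion 2: too little of that class
        return None
    return int(label)

def centre_label(labels):
    """Criterion 3: just use the classification of the centre pixel."""
    h, w = labels.shape
    return int(labels[h // 2, w // 2])

patch = np.full((224, 224), 2, dtype=np.uint8)  # entirely 'tumor'
print(classify_patch(patch), centre_label(patch))
```

Swapping between the criteria is then just a matter of which function (or threshold) you use when sorting the exported tiles.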


Yes, that is actually the main idea at this stage

Tiles are the final desired output, so no, masks are not necessary at all

Yes

The former. And yes, I lose a lot of data at low resolutions (2.5X), but I found that this magnification is not sufficient for the classifier

I want to export them all

If patches are generated directly, this information is not critical

JPEG or PNG are good enough for the task

Yes, I guess this should be a possibility, but I am not familiar enough with OpenSlide to do it easily…

It works nicely after changing import qupath.lib.scripting.QPEx to import qupath.lib.gui.scripting.QPEx

But I cannot see the numerical code (Stroma 1, Tumor 2) in the file name, should I find it somewhere else?

Just one last question… the pixel size in a WSI is 0.5 for 20X and 0.275 for 40X, am I right?

Thank you!

Good!

It varies by image, sadly… typically it’s around 0.25 µm per pixel at 40x for Aperio and Hamamatsu images I’ve worked with, but I’ve seen images from other scanners with very different pixel sizes yet the same reported magnification. There are some examples among the OpenSlide freely distributable data demonstrating this for the same slides.

And sometimes the magnification can be missing / buried in the metadata in some not-easily-accessible way.

I’d suggest always working with resolutions defined in terms of µm per pixel, and forgetting about magnification values. Some commands in QuPath v0.1.2 use magnifications because I hadn’t yet realised this is a Bad Idea.
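Concretely, working in µm per pixel means the export downsample can be computed per image from its metadata, rather than trusting a reported magnification. A tiny sketch (the pixel sizes are just example values):

```python
def downsample_for(target_um_per_px, base_um_per_px):
    """Downsample factor needed to reach a target resolution
    from an image's full-resolution pixel size."""
    return target_um_per_px / base_um_per_px

# An Aperio-like slide at ~0.25 um/px full resolution, exported at 1.0 um/px:
print(downsample_for(1.0, 0.25))
```

The same target of 1.0 µm/px would give a different downsample on a scanner with a different base pixel size, which is exactly why magnification alone is unreliable.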


Sorry, I missed this… that script should export a labelled image corresponding to each tile, where the integer label of each pixel corresponds to the classification of that pixel.

It has been a long time since I wrote the script and I’m not entirely sure that these labels will necessarily be consistent across multiple images, or if they are inferred based upon the actual classifications discovered in an image (e.g. 1=tumor, 2=stroma might sometimes be reversed - I don’t recall). So this might need to be amended if you see problems.

You may also need to be cautious in how you read the labelled image in Python to avoid it being converted to RGB (unless you’d prefer to work with RGB values, which might overcome the warning in my last paragraph).

In any case, you’d need to write some Python code that checks the labelled image and assigns the image tile as tumor or stroma based upon the number of pixels with each label. This should be straightforward when all pixels have the same label, but, as I wrote above, counting the number of pixels with each label per tile gives you the ability to make other decisions about the appropriate classification (e.g. if 99% of pixels are labelled tumor, you can probably use the tile - but if 50% is tumor then perhaps better not).


Just to post here the final processing. I finally did it in bash, looping over the images and using ImageMagick as follows:

# Get number of pixels by color
for i in $(cat sample_names); do convert "${i}.png" -define histogram:unique-colors=true -format %c histogram:info:- | awk -F "#" '{print $1}' > "${i}.txt"; done

# Get files with >75% of tissue in them
for i in *.txt; do grep '(  0,  0,  0,  0)' "$i" | awk -v var="$i" -F ":" '{if($1<37632) print var}'; done > enough

# Get tiles with >90% stroma
for i in $(cat enough); do grep '(150,200,150,255)' "$i" | awk -v var="$i" -F ":" '{if($1>33868) print var}'; done > stroma
sed 's/-labels\.txt//g' stroma > stroma_samples
# Get tiles with >90% tumor
for i in $(cat enough); do grep '(200,  0,  0,255)' "$i" | awk -v var="$i" -F ":" '{if($1>33868) print var}'; done > tumor
sed 's/-labels\.txt//g' tumor > tumor_samples

Then just mv the “.jpg” image files to whichever subfolders you prefer (e.g. tumor and stroma)

Again, thank you for the feedback!


Hi @petebankhead,

Is there a way to run the script for all the annotated images in a folder?

I annotated 50 WSIs and I would like to generate tiles at different pixel sizes; could I run the script without opening and running every single image?
Thanks!

If your images are in a project, you can use Run → Run for project.

If not, you can write the loop in Groovy (but generally it’s highly recommended to use projects in QuPath…).