Exporting tiles for deep learning - specific case

Hi, I was trying to reproduce the script from @LorenzoMD, but it didn’t work.

Is there any script someone is willing to share with me that I can use to export the generated tile annotations as tif images (no mask images)?

Here is the error message that I got:

Hi @sophia1,

You could replace line 9 with:

def filename = server.getMetadata().getName() 

thanks!

That part works, but I am still getting another error message:

This is because ImageWriterTools is already installed by default in a newer version of QuPath, isn’t it? If so, how do I solve this?

Which version of QuPath are you using? I did not see that listed.

It’s version v0.2.1.

The files were annotated in an earlier QuPath version (probably 0.2-m12 or m4), but the QuPath data I tried it with was re-saved under v0.2.1.

“No compatible writer” errors tend to appear when someone tries to write a multichannel IF image as an RGB JPEG. It looks like you have a brightfield image in the background, so I am a bit confused as to why it would be an issue here. Can you verify that your Image Type in the Image tab is H&E?
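For reference, the image type can also be set from a script; this is the sort of command QuPath records in the workflow when the type is changed to H&E in the Image tab:

setImageType('BRIGHTFIELD_H_E')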



@sophia1 the file path looks very long; Windows can have some problems with that.


thanks!!! - it works now.

I think we will still add some refinements, for example making the number that is appended to the file name equal the number shown for the tile. At the moment the number that gets appended is just the running count over all annotations, not the number of the tile itself.

I will also try to randomly choose only certain tiles out of all tiles, so that the download time is reduced, as I don’t think that using all tiles would increase the accuracy of a deep learning model.
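A rough sketch of how such a random subset could be picked in a QuPath Groovy script; the name filter ("Tile"), the sample size of 500 and the fixed seed are only illustrative assumptions, not part of the original script:

// Gather the tile annotations (assuming they are named like "Tile 164"),
// shuffle them reproducibly and keep a random subset for export
def tiles = getAnnotationObjects().findAll { it.getName() != null && it.getName().startsWith("Tile") }
Collections.shuffle(tiles, new Random(42))
def tileSubset = tiles.take(500)
print("Kept " + tileSubset.size() + " of " + tiles.size() + " tiles")

The export loop would then iterate over tileSubset instead of getAnnotationObjects().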

In case anybody is interested, this is the script that is working now:


Glad you got it working. I’m not clear on what this part means, though.

Glad you got it working. I see you both changed the file extension and shortened the export path. Which of the two solved the problem in the end?

haha I am glad too :smiley:

sorry if this wasn’t clear: in the tissue you can see a number associated with each created tile.
Like this:

But that number doesn’t match the name of each created file.

From what I have seen so far, the number that gets appended to the filename is from here:

So for example, let’s say I want to remove one tile because there is a big artefact inside. If the numbers match properly, I can just remove those tiles by hand, or via an algorithm if that can be implemented later. Removing by hand would at least be possible relatively quickly.


Changing the format from .jpg to .tif made it work. :slight_smile:

I only changed the path afterwards, because my internal SSD was immediately full after I started downloading the tiles ^^


Two things come to mind:

  1. Selecting your annotations
    Instead of
    annotation in getAnnotationObjects()
    you could use
    annotation in getAnnotationObjects().findAll{it.getPathClass() != null}
    or == getPathClass("Tumor") for example. That lets you select a subset of your tiles for export if you already tagged some of them with Artifact first.
  2. Tile names: you could change the i in the String tilename line to the name of the annotation
    String tilename = String.format("%s_%s%s.tif", filename, tiletype, annotation.getName())

That would append all of “Tile 164” to the end of the file name, but it would get the number. You could pick the number out using a regular expression, but I am not 100% fluent in those, and hopefully this will be good enough.
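If the regex route is useful, here is an untested sketch; it assumes the annotations are named like "Tile 164" and reuses the filename and tiletype variables from the script:

// Pull the digits out of a name such as "Tile 164";
// fall back to the full name if no digits are found
def name = annotation.getName() ?: ""
def matcher = name =~ /\d+/
def tileNumber = matcher.find() ? matcher.group() : name
String tilename = String.format("%s_%s%s.tif", filename, tiletype, tileNumber)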


Thanks for your suggestions!!

Suggestion no. 2 works fine. Suggestion 1 (at least in combination with suggestion 2) makes the script immediately stop.

I am already satisfied with the results, but is there maybe also an option to adjust the resolution and the pixel size independently of each other when creating these tiles within annotations?

Have a nice weekend all!


Whoops. That first one should be == null, since all of the things you want to keep are null (have no class) :slight_smile:
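So the loop header from the earlier suggestion would become:

// keep only the annotations without an assigned class (i.e. the tiles)
for (annotation in getAnnotationObjects().findAll { it.getPathClass() == null }) {
    // ... rest of the export loop unchanged
}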

I’m not sure what you mean by the second part, but I did notice that you seem to be generating tiles both in your whole tissue annotation, and in other annotations you have on top of your whole tissue annotation.
That will result in double export of those tiles.

Not 100% sure, but I think the def request line has a “1” in it that controls the downsample. You could change that to a variable to dynamically alter the downsample for any given tile. I don’t understand why you would want to, though, so not sure how to give further advice.
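A minimal sketch of what that could look like inside the existing script (the value 4.0 is only an example):

// Downsample factor: 1 = full resolution, 2 = half the width/height in pixels, and so on
double downsample = 4.0

def request = RegionRequest.createInstance(imageData.getServerPath(),
    downsample, roi)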


Yes, I had also tried it once with == getPathClass("Tumor") and it worked well, and yes, the tiles with partially the same content otherwise get exported several times.

Where can I find the def request line?

I am not really sure whether I want to downsample the tiles :smiley:

A co-supervisor sent me a diagram showing that the accuracy of a deep learning model (at least the one she is working with on similar data) decreases when the pixel size in µm increases.

So maybe there is an error in my reasoning, but she, for example, created tiles directly in Python and generated 299x299 pixel tiles at 0.5 µm resolution. When I use the “tiles script” with 10 µm or so, I get tiles of 40x40 pixels, I think. I can’t check right now whether it’s exactly 40x40, because I have been downloading something for several hours that I don’t want to interrupt, otherwise I would check now :slight_smile:.

So I just thought that I could maybe achieve a higher accuracy for the model that way.

If your downsample is 1, that is the “best” pixel size you can get. You are exporting it at the true resolution of the original image.

If the image is at 0.5um resolution, that is what a downsample of 1 will give you. If your image has 0.29um per pixel resolution, a downsample of 1 will give you that instead. I don’t think the accuracy of a model will be improved much by subdividing pixels. Usually the limitation will be the JPEG encoding, or whatever compression was used to make the whole slide image.
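If you want to check what the base pixel size of your image actually is, something like this should print it (assuming the pixel size is stored in the image metadata; the factor 2 is just an example downsample):

// Print the calibrated pixel size of the current image,
// and what a given downsample would turn it into
def cal = getCurrentServer().getPixelCalibration()
def pixelSize = cal.getAveragedPixelSizeMicrons()
print("Base pixel size: " + pixelSize + " um/px")
print("Pixel size at downsample 2: " + (pixelSize * 2) + " um/px")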


" With each iteration of the file save, the JPEG artifacts compound one another, further degrading the image. On careful examination, lower quality factor JPEG images typically have an artifact of 8 × 8 pixel squares (96), also known as a macroblock (see Fig. 7c). In many cases, a more suitable alternative to JPEG is the loss-less PNG file format, although PNG does not support conversion to CYMK for printing (102). PNG is greatly preferred over JPEG when compressing figures that have small text or fine lines, since it does not blur the edges of these small items."
Even if your final format is TIFF or PNG, if it passed through a JPG stage at some point, you will see those 8x8 blocks, which will limit the accuracy of the classifier on fine objects.

Note that JPEG compression of some sort is very common in whole slide images, despite the problems it causes, due to the file sizes.

Pixel size: If your tiles are 10um and 40 pixels on a side, that gives you a resolution of 0.25um per pixel.
This is higher than the resolution of the 299 pixels at 0.5um; those tiles are also about 150um on a side, so much larger.


The compression information regarding formats is very interesting! And I also just understood it better now regarding the resolution. Thank you!

So when exporting tiles, they automatically keep the original resolution, or rather the downsampling is 1, I guess?

I think if I still wanted to change the downsampling factor, I would have to write something like that into the script, unless there is some other quick “fix”. Then I could test myself whether there is any change in model accuracy :-).

edit: I think I can directly change the downsampling factor in the script. But I am confused. When I use the same original tile size and (apparently) a downsampling factor of 1, I get 396x396 pixels. When I used a downsampling factor of 10, I got 11x11 pixels, which is not a factor of 10 if you ask me :smiley: - it’s more like 1300x. Crazy.

This is the part of the script that I changed (the downsample value in the RegionRequest line; I tried it with 1 and with 10):
import qupath.lib.regions.RegionRequest
import qupath.lib.gui.scripting.QPEx

def path = 'C:/Users/medskillz/Desktop/TCGA Data/WSI/work_in_progress/ba_sq_2/export_tiles'

def imageData = QPEx.getCurrentImageData()
def server = imageData.getServer()

def filename = server.getMetadata().getName()

i = 1

for (annotation in getAnnotationObjects()) {

    roi = annotation.getROI()

    // The second argument is the downsample factor (tried with 1 and with 10)
    def request = RegionRequest.createInstance(imageData.getServerPath(),
        1, roi)

    String tiletype = annotation.getParent().getPathClass()

    if (!tiletype.equals("Image")) {

        String tilename = String.format("%s_%s%d.tif", filename, tiletype, i)

        writeImageRegion(server, request, path + "/" + tilename);

        print("wrote " + tilename)

        i++

    }

}

It looks like you found the right place to alter the downsample, but if you want your code to be accessible in a forum post, please post it using the preformatted text option.

If you started with a 10x10 pixel tile, a downsample of 2 will change that to a 5x5.
10 should give you a 1x1 tile, and that is how it works for me as long as I make sure I pick a tile of the same size.
Here 1 downsample gives 453, while 10 downsample gives 45.
If you compare two tiles of different sizes (e.g. where one tile is cropped by the edge of the tissue), the results will not be comparable. I have other tiles that are 45x2 or other random sizes.


OK, so here is the code from above in a better format. In this version I changed the downsampling factor: the 2 in the code represents the downsample, although I am not sure whether the downsampling changes exactly as the number does.

I have some more ideas and questions about how to improve this code:

1. You had already suggested, for example, writing only those tiles as tiff files that contain “Tumor” or whose class is not null.
Is there maybe a way to code something like: if a tile with the annotation “Tumor” and an annotation “Artefact” overlap, or share similar coordinates, ignore that tile? Then I could completely remove artefacts. A workaround would be to simply remove the artefacts from the annotations next time, but at the moment I have the artefacts as additional annotations, and one could generate new datasets for experimenting more quickly if this could also be changed with such a line of code (see the rough sketch after the script at the end of this post).

2. I will also try this myself, as I think it is doable: I want to create an additional folder for every image, which is later the output folder for its tiles. The folder should have the same ID as the file the tiles were created from.

3. Somebody else already asked in the past how such a script could be used for a whole project. So I guess that someone could manually create the overlay tiles for every slide first and then let the script run for the whole project. That should also be possible; I will give it a try.

So if someone already has a quick solution for these ideas, I will be glad to take it :slight_smile:.
But I will also try myself. The thing with overlapping tiles is the most complicated one for me.

import qupath.lib.regions.RegionRequest
import qupath.lib.gui.scripting.QPEx

def path = '/output folder of tiles'

def imageData = QPEx.getCurrentImageData()
def server = imageData.getServer()

def filename = server.getMetadata().getName() 

i = 1

for (annotation in getAnnotationObjects()) {
    
    roi = annotation.getROI()
    
    def request = RegionRequest.createInstance(imageData.getServerPath(),
        2, roi)
    
    String tiletype = annotation.getParent().getPathClass()
    
    if (!tiletype.equals("Image")) {
    
        String tilename = String.format("%s_%s%s.tif", filename,tiletype, annotation.getName())
        
        writeImageRegion(server, request, path + "/" + tilename);
        
        print("wrote " + tilename)
        
        i++
        
    }
}
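For the first idea (skipping tiles that overlap an artefact), a rough, untested sketch of how it might be done; it assumes the artefact annotations carry the class "Artefact" and reuses the loop structure from the script above:

// Collect the geometries of all annotations classified "Artefact" once, before the loop
def artefactGeoms = getAnnotationObjects()
        .findAll { it.getPathClass() == getPathClass("Artefact") }
        .collect { it.getROI().getGeometry() }

for (annotation in getAnnotationObjects()) {

    def roi = annotation.getROI()

    // Skip any tile whose ROI intersects one of the artefact annotations
    if (artefactGeoms.any { it.intersects(roi.getGeometry()) })
        continue

    // ... build the RegionRequest and write the tile exactly as in the script above
}

Note that the artefact annotations intersect their own geometries, so they would be skipped as well.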