WSI preprocessing pipeline

Dear all,

I am a pathologist interested in digital pathology and deep learning. I have basic knowledge of Python (including numpy, pandas, scikit, …) and am familiar with some machine learning theory. While trying to set up a CNN pipeline using whole slide images, I ran into several implementation problems, and since I am primarily a medical professional, I would be very happy if you could help me with them. My aim is to:

  1. Scan histopathological slides at 40X magnification and save them as .svs files.
  2. Manually annotate these slides at low resolution to include only tumor tissue (and not the surrounding normal tissue), e.g. using QuPath.
  3. Use this annotation to perform all subsequent preprocessing steps (e.g. color normalization) in Python, but only on the annotated tumor area. Here, I would like to create a pipeline that supports different magnification levels (e.g. 20X, 40X) and different tile sizes (e.g. 512x512, 256x256).
  4. Run the algorithms on these tiles in Python.
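Step 3 mostly comes down to tiling one region at several resolutions. Below is a minimal Python sketch, assuming the tumor region is given as a bounding box in full-resolution (level-0) pixels. A downsample of 2 corresponds roughly to 20X on a 40X scan. The function name `tile_grid` and the example box are hypothetical.

```python
import math

def tile_grid(x0, y0, width, height, tile_size, downsample, overlap=0):
    """Return (x, y, w, h) boxes in full-resolution (level-0) pixels.

    Each box, once read and downsampled by `downsample`, yields a
    tile_size x tile_size tile. Boxes cover the region starting at (x0, y0);
    the last row/column may extend past the region's edge.
    """
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must satisfy 0 <= overlap < tile_size")
    span = tile_size * downsample              # tile extent at full resolution
    step = (tile_size - overlap) * downsample  # stride between tile origins
    n_x = max(1, math.ceil((width - span) / step) + 1)
    n_y = max(1, math.ceil((height - span) / step) + 1)
    return [
        (round(x0 + i * step), round(y0 + j * step), round(span), round(span))
        for j in range(n_y)
        for i in range(n_x)
    ]

# Hypothetical tumor bounding box of 2048 x 1024 pixels on a 40X scan
for tile_size, downsample in [(512, 1.0), (256, 1.0), (512, 2.0)]:
    boxes = tile_grid(0, 0, 2048, 1024, tile_size, downsample)
    print(tile_size, downsample, len(boxes))
# prints: 512 1.0 8 / 256 1.0 32 / 512 2.0 2
```

The region is defined once, at full resolution, and each magnification or tile size is just a different `downsample`/`tile_size` pair. This is the same idea that lets a single annotation serve every export resolution.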

My particular questions are:

  1. How would you implement the manual annotation? Is it possible to use one annotation mask generated at low resolution for all subsequent steps? Or is it necessary to generate a PNG image at each specified resolution and annotate it separately at each magnification?
  2. Which image file formats are recommended for working in Python? Is downscaling necessary?
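On question 1: a low-resolution mask can, in principle, be reused at every magnification by scaling coordinates. You do not need to re-annotate. Here is a minimal Python sketch, assuming the annotation is available as a boolean numpy array that is `mask_downsample` times smaller than the full-resolution image (for example, a mask exported at a known downsample). The function `tumor_fraction` is hypothetical, and it is only approximate at the mask's coarse boundary pixels.

```python
import numpy as np

def tumor_fraction(mask, mask_downsample, box):
    """Fraction of a full-resolution box (x, y, w, h) that falls on True pixels
    of a low-resolution boolean annotation mask.

    mask_downsample is the factor by which the mask is smaller than the
    full-resolution (level-0) image. Coordinates are assumed non-negative.
    """
    x, y, w, h = box
    # Map the box corners from level-0 pixels onto the coarse mask grid
    c0, r0 = int(x // mask_downsample), int(y // mask_downsample)
    c1 = int(np.ceil((x + w) / mask_downsample))
    r1 = int(np.ceil((y + h) / mask_downsample))
    region = mask[r0:r1, c0:c1]  # slicing clips at the mask border
    return float(region.mean()) if region.size else 0.0

# Hypothetical 10 x 10 mask drawn at downsample 64; left half annotated as tumor
mask = np.zeros((10, 10), dtype=bool)
mask[:, :5] = True
print(tumor_fraction(mask, 64, (0, 0, 320, 640)))    # 1.0
print(tumor_fraction(mask, 64, (320, 0, 320, 640)))  # 0.0
print(tumor_fraction(mask, 64, (160, 0, 320, 64)))   # 0.5
```

A tile at any magnification could then be kept only if its fraction exceeds some threshold of your choosing (e.g. 0.5, an arbitrary example value).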

These questions may be self-explanatory for a professional in this field, so please bear with me, as I am primarily a medical practitioner.

Thanks a lot!


In QuPath, you’d create a project containing your whole slide images and annotate each image once. Then you can export tiles in batch at any resolution and any tile size you like.

Since everyone tends to want to export in a slightly different way, the export is done using scripts – but there is already documentation with examples to try to help with this:


Thank you for the quick response! This already helped me a lot.

So, my approach of making the tiles within Python is not the way to go? Which file format would you recommend for exporting the tiles in QuPath (TIF, PNG, JPEG,…)?

You could make the tiles in Python, but it can quickly become very awkward and require writing quite a lot of custom code; QuPath should spare you a lot of the pain, and work better across different file formats.

Regarding the export file format, any of the options should be OK. In general, assuming you have brightfield/RGB images:

  • TIFF will probably be uncompressed, so it requires a lot of disk space; however, it can store the pixel calibration information (i.e. pixel size in µm) if available.
  • PNG is compressed without loss of information, so files will still be fairly large, though probably much smaller than uncompressed TIFF. It will lose the µm info.
  • JPEG is compressed in a way that introduces some compression artifacts, but gives comparatively small files. These artifacts might be unwelcome (introducing artifacts always sounds like a bad thing), but they could conceivably help make the CNN more robust by giving it an extra challenge to overcome. It will lose the µm info.
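To put the uncompressed TIFF case into numbers: an 8-bit RGB tile stores 3 bytes per pixel. The quick Python estimate below ignores file headers, and the tile count of 10,000 is a hypothetical example. Compressed PNG/JPEG sizes depend on image content, so they are not estimated here.

```python
def uncompressed_size_bytes(tile_size, n_tiles, channels=3, bytes_per_channel=1):
    """Approximate size of n_tiles uncompressed square tiles (ignores file headers)."""
    return tile_size * tile_size * channels * bytes_per_channel * n_tiles

per_tile = uncompressed_size_bytes(512, 1)
total = uncompressed_size_bytes(512, 10_000)
print(per_tile)                 # 786432 bytes, i.e. 0.75 MiB
print(round(total / 2**30, 2))  # 7.32 (GiB)
```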

I’ve no evidence that using JPEG really helps a CNN in practice, but nor any evidence that it causes much trouble. For quantification JPEG is generally quite terrible, and usually a really bad idea for fluorescence images. But many brightfield WSI images are already JPEG-compressed when they are first written… the artifacts are subtle enough to be tolerated, since one doesn’t usually care too much about the precise values of individual pixels.

(Complicating things, you can use QuPath to write a TIFF that uses JPEG compression inside… but that takes more effort, the default is that TIFFs are uncompressed.)

Assuming you’ll have a lot of tiles, I’d tend to try PNG first, and then switch to JPEG if I found the PNGs take up too much disk space.
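The lossless-vs-lossy difference is easy to check from Python once tiles are exported. The sketch below assumes the Pillow library is installed and uses a synthetic gradient-plus-noise tile rather than a real slide tile. It encodes the tile in memory as PNG and as JPEG (quality 90, an arbitrary choice), then compares the decoded pixels with the original. File sizes depend heavily on content, so they are printed without expected values.

```python
import io
import random

from PIL import Image


def round_trip(img, fmt, **save_kwargs):
    """Encode img in memory, decode it again; return (decoded image, encoded size in bytes)."""
    buf = io.BytesIO()
    img.save(buf, format=fmt, **save_kwargs)
    size = len(buf.getvalue())
    buf.seek(0)
    return Image.open(buf).convert("RGB"), size


def synthetic_tile(size=256, seed=0):
    """A smooth color gradient plus mild noise, standing in for a real RGB tile."""
    rng = random.Random(seed)
    px = bytearray()
    for y in range(size):
        for x in range(size):
            n = rng.randint(-8, 8)
            px += bytes(
                min(255, max(0, v + n))
                for v in (200 - y // 4, 120 + x // 4, 180)
            )
    return Image.frombytes("RGB", (size, size), bytes(px))


tile = synthetic_tile()
png, png_size = round_trip(tile, "PNG")
jpg, jpg_size = round_trip(tile, "JPEG", quality=90)
print("PNG identical:", png.tobytes() == tile.tobytes())   # True (lossless)
print("JPEG identical:", jpg.tobytes() == tile.tobytes())  # False (lossy)
print("bytes: raw", len(tile.tobytes()), "PNG", png_size, "JPEG", jpg_size)
```

Running the same comparison on a few of your own exported tiles would show how much space JPEG actually saves for your slides.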


Thanks a lot!

I just tried to use the provided script on an example image:

```groovy
/**
 * Script to export image tiles (can be customized in various ways).
 */

// Get the current image (supports 'Run for project')
def imageData = getCurrentImageData()

// Define output path (here, relative to project)
def name = GeneralTools.getNameWithoutExtension(imageData.getServer().getMetadata().getName())
def pathOutput = buildFilePath(PROJECT_BASE_DIR, 'tiles', name)

// Define output resolution in calibrated units (e.g. µm if available)
double requestedPixelSize = 5.0

// Convert output resolution to a downsample factor
double pixelSize = imageData.getServer().getPixelCalibration().getAveragedPixelSize()
double downsample = requestedPixelSize / pixelSize

// Create an exporter that requests tiles from the original image server
new TileExporter(imageData)
    .downsample(downsample)   // Define export resolution
    .imageExtension('.tif')   // Define file extension for original pixels (often '.tif', '.jpg', '.png' or '.ome.tif')
    .tileSize(512)            // Define size of each tile, in pixels
    .annotatedTilesOnly(false) // If true, only export tiles if there is a (classified) annotation present
    .overlap(64)              // Define overlap, in pixel units at the export resolution
    .writeTiles(pathOutput)   // Write tiles to the specified directory
```
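The script converts the requested pixel size into a downsample factor with `requestedPixelSize / pixelSize`. The same arithmetic is mirrored in Python below, assuming a native pixel size of 0.25 µm for a 40X scan. That is a typical value, but the real one should be read from the image metadata. Under that assumption, `requestedPixelSize = 5.0` means a downsample of 20, so each 512-pixel tile spans about 2.56 mm, far coarser than 20X or 40X.

```python
def downsample_for(requested_pixel_size_um, pixel_size_um):
    """Downsample factor, computed as in the script: requested / native pixel size."""
    return requested_pixel_size_um / pixel_size_um

native = 0.25  # ASSUMED native pixel size (µm) of a 40X scan; check your image metadata
for requested in (0.25, 0.5, 5.0):
    ds = downsample_for(requested, native)
    # requested µm per pixel, downsample factor, width of a 512-pixel tile in µm
    print(requested, ds, 512 * requested)
# prints: 0.25 1.0 128.0 / 0.5 2.0 256.0 / 5.0 20.0 2560.0
```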

It works on the whole image when .annotatedTilesOnly(false). However, when I draw a rectangle, set its class to Tumor and then change to .annotatedTilesOnly(true), the resulting output directory is empty. Any explanation?

Thanks in advance!

Hmmm, I’m afraid I don’t – when I do those steps the export works as expected.

(Edited your post to add ``` at the top and bottom of the script for code formatting)

Addendum: If you are using Run for project then you’ll need to save your tumor annotation – Run for project ignores the current viewer, and goes directly to the data files, which is why saving first is necessary.

In v0.2 you should see a related warning in red in the Run for project dialog… although that focusses on the results. Probably we need to add a separate ‘You have an unsaved image open’ warning (if we can think of a suitably concise way to express it).

[Screenshot 2021-01-09 at 17.11.32]
