Standardization for Whole Slide Images

Hey QuPath community,
I am relatively new to image processing with QuPath and I have been asked a question about data analysis and consistency as we process our data.

As an example, if two serial sections from the same sample are stained for H&E by two different labs, and WSIs are then obtained on two different scanners, what steps are required to ensure that these images can be directly compared? And what if these images are compared to a new set obtained a year from now?

More generally, what best practices should we employ so our data is comparable in the future?

Any and all insights are greatly appreciated as always!


That is an important question and there is quite a bit of work being put into figuring out how to standardize imaging, but I do not believe there is any quick and easy answer at the moment.
I keep hoping someone will come out with an FDA-approved brightfield standard slide or something like that (one that does not bleach or age).
There has been some discussion of this, but while you can normalize a lot of image information, the moment you do that you are also losing image information. If your background “white” on one scanner is 180,180,180 and on another is 255,255,255, you could white balance the two - but even if the 180,180,180 image now has a background of 255,255,255, you have not actually added any information, just stretched the information that was there.
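To make that concrete, here is a minimal sketch (plain Python, with made-up pixel values) of white balancing by rescaling each channel so the measured background maps to 255 - and a quick check that the stretch does not add information, since the number of distinct values per channel stays the same:

```python
def white_balance(pixel, bg):
    # Scale each channel so the measured background white maps to 255.
    return tuple(min(255, round(c * 255 / b)) for c, b in zip(pixel, bg))

# Scanner with a dim background of (180, 180, 180):
print(white_balance((180, 180, 180), (180, 180, 180)))  # -> (255, 255, 255)
print(white_balance((90, 45, 120), (180, 180, 180)))

# Stretching the range 0..180 into 0..255 still leaves only
# 181 distinct values per channel - no information gained:
print(len({round(v * 255 / 180) for v in range(181)}))  # -> 181
```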

This can give you false confidence that your results are meaningful, but brightfield staining is not linear, and more information is lost the darker an area is. Deconvolution shows this very well: you can imagine that a red stain plus a blue stain should give purple pixels. But if you add so much stain that the pixel is essentially black, was that pixel 20% blue and 80% red? 50/50? You cannot tell, and the problem gets worse (non-linearly) the darker a region is.
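A rough illustration of that non-linearity, using the usual Beer-Lambert conversion from pixel intensity to optical density (plain Python, assuming 8-bit values with a background of 255):

```python
import math

def od(intensity, i0=255.0):
    # Optical density via Beer-Lambert: OD = -log10(I / I0).
    # Clamp to 1 to avoid log(0) at fully black pixels.
    return -math.log10(max(intensity, 1) / i0)

# Near white, a one-grey-level step is a tiny OD change...
print(round(od(249) - od(250), 4))  # -> 0.0017
# ...but near black the same one-level step is a huge OD jump,
# so stain proportions in very dark pixels are essentially unrecoverable.
print(round(od(1) - od(2), 4))      # -> 0.301
```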

So, I would say that while white balancing and normalizing images is probably necessary to even try to compare brightfield images in many cases, it never really fixes the problem of images not being absolutely comparable. You will get batch effects. Worse, with different scanners you can have different levels of compression. With some scanners, like the Nanozoomer (at least older models), there wasn't much of an option to scan brightfield images without JPEG compression. Comparing those images with ones from a scanner that used no compression could be very problematic, though how problematic depends on the resolution you need to answer your question - compression can have a real effect on finely detailed analysis.

As with anything, the finer the detail you need to resolve your structures of interest (the smaller the effect, percentage-wise, or the smaller the structure), the more reliable the data needs to be. That holds for everything, not just image analysis, of course!

Probably enough from me, but maybe @gabriel @VetPathologist @mesencephalon might have some thoughts or corrections.


You may also be interested in: Color standardization in whole slide imaging using a color calibration slide


This is a really important and yet under-discussed point. I think this paper discusses the challenges well and suggests a possible (fairly intensive) solution:

It's about DAB rather than H&E, but the concepts are similar.


Very good question. There are several books to be written about this 🙂 You could try determining the differences between the scanners by scanning a standard set of slides on both machines and seeing what variability you get. Knowing this, you could then attempt to correct one scanner based on the other. But I think your main problem is much bigger: having two different processing methods for H&E, on serial (but still different) sections…
And also remember that paraffin sections suffer a lot of distortion when deposited on the slide, so you are in fact not analysing the same thing. I guess it all depends on what you want to find out from the two setups.
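The scanner-correction idea above could be sketched like this (plain Python; the calibration-patch values are entirely made up, not real scanner measurements): fit a simple per-channel linear map from one scanner's readings of the standard slides to the other's, then apply it.

```python
def fit_linear(xs, ys):
    # Least-squares fit of y = a*x + b, mapping scanner A values to scanner B.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical red-channel readings of the same calibration patches:
scanner_a = [40, 90, 140, 180]
scanner_b = [60, 120, 185, 240]
a, b = fit_linear(scanner_a, scanner_b)
corrected = [round(a * x + b) for x in scanner_a]  # A mapped towards B
print(a, b, corrected)
```

A single linear map per channel is of course a strong assumption - real scanner differences can be non-linear and stain-dependent, which is part of why this problem stays hard.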