Help! How to threshold when I have large variation in staining intensity (IHC)

Dear all,
There is a very large variation in staining across my dataset, and I would like to know whether my workflow is correct.
I'm not sure about step 4…
I have already tried some machine learning approaches (thanks to @haesleinhuepf, @Pete and @Research_Associate for the help), but the researcher asked me to analyse the data with a more basic method.

  1. Duplicate the image and detect the tissue with a Gaussian filter and the Huang threshold.

  2. Colour deconvolution with Masson's trichrome vectors (thanks to @phaub and @gabriel for CDBoost).

  3. Use the Image Calculator to keep only the collagen or the nuclei within the tissue.

  4. I calculate the mean intensity of the collagen in the tissue and apply a different threshold depending on that value: if the mean of means < x, I apply the Otsu threshold, otherwise Li.
    To do that, I run Analyze Particles on the tissue and use Multi Measure to get the mean of each particle in the collagen image. Then I store these values in an array and use Array.getStatistics(array, min, max, mean, stdDev) to extract the mean of the means.
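Colour deconvolution plugins are generally based on the Ruifrok and Johnston method: convert RGB to optical density (OD), then unmix it with the inverse of the stain-vector matrix. The sketch below is a minimal NumPy version of that idea, assuming 8-bit RGB with a white background of 255. The name `colour_deconvolve` is hypothetical, and the vectors in the usage lines are illustrative placeholders (the common H-E-DAB values), not the Masson's trichrome vectors used with CDBoost. Plugins may differ in details such as how saturated pixels are handled.

```python
import numpy as np

def colour_deconvolve(rgb, stain_vectors, background=255.0):
    """Unmix an RGB image into per-stain concentration channels.

    rgb: array of shape (..., 3); stain_vectors: 3x3, one stain OD vector per row.
    """
    rgb = np.asarray(rgb, dtype=float)
    # Optical density per channel; clip at 1 to avoid log(0) on black pixels.
    od = -np.log10(np.clip(rgb, 1.0, None) / background)
    # Normalise each stain vector to unit length.
    m = np.asarray(stain_vectors, dtype=float)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    # Each pixel's OD is modelled as conc @ m, so conc = od @ inv(m).
    conc = od.reshape(-1, 3) @ np.linalg.inv(m)
    return conc.reshape(rgb.shape)

# Hypothetical placeholder vectors (H-E-DAB); substitute your Masson's trichrome vectors.
vectors = [[0.65, 0.70, 0.29], [0.07, 0.99, 0.11], [0.27, 0.57, 0.78]]
pixel = np.array([[[120.0, 80.0, 160.0]]])
stains = colour_deconvolve(pixel, vectors)  # shape (1, 1, 3), one channel per stain
```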

I attached the mean of the means of collagen staining within the tissue (not the entire image) for slides 1 to 23.

  5. I continue with the area measurements, etc.
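The per-slide switch in step 4 can be sketched outside ImageJ as below. This is a minimal NumPy sketch, not the poster's macro: `otsu_threshold`, `li_threshold` and `choose_threshold` are hypothetical re-implementations (the Li version follows the iterative Li and Tam minimum cross-entropy update, as used by scikit-image's `threshold_li`), so thresholds may differ slightly from ImageJ's Auto Threshold methods. The `cutoff` parameter stands in for the unspecified value x.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Otsu: pick the histogram split that maximises between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                                       # counts in bins <= i
    w1 = np.cumsum(hist[::-1])[::-1]                           # counts in bins >= i
    m0 = np.cumsum(hist * centers) / w0                        # mean of bins <= i
    m1 = (np.cumsum((hist * centers)[::-1]) / w1[::-1])[::-1]  # mean of bins >= i
    between = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return centers[np.argmax(between)]

def li_threshold(values, tol=1e-6, max_iter=1000):
    """Iterative Li and Tam minimum cross-entropy threshold."""
    v = np.ravel(values).astype(float)
    vmin = v.min()
    v = v - vmin                          # shift so all values are >= 0
    scale = v.max() if v.max() > 0 else 1.0
    t = v.mean()                          # initial guess
    for _ in range(max_iter):
        fore, back = v[v > t], v[v <= t]
        if fore.size == 0 or back.size == 0 or back.mean() == 0:
            break
        mf, mb = fore.mean(), back.mean()
        t_new = (mb - mf) / (np.log(mb) - np.log(mf))
        converged = abs(t_new - t) < tol * scale
        t = t_new
        if converged:
            break
    return t + vmin                       # undo the shift

def choose_threshold(collagen_values, particle_means, cutoff):
    """Step 4: Otsu if the mean of the per-particle means is below cutoff, else Li."""
    mean_of_means = float(np.mean(particle_means))
    if mean_of_means < cutoff:
        return "otsu", otsu_threshold(collagen_values)
    return "li", li_threshold(collagen_values)
```

For example, `choose_threshold(pixels, per_particle_means, cutoff=x)` returns the method name together with the threshold, which makes it easy to log which slides went down which branch.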

Can I split my dataset like this? The staining was done by different people at different times (about 10 months between the first and the last batch), and this results in different staining intensities.
This does not look very ethical, but if I don't do it, I will overestimate collagen in some slices and underestimate it in others.

Kind regards,

I am not sure what the best way is to deal with stain variation that causes a batch effect, other than normalizing the data using the overall mean and variance (i.e. mean-centring the data and scaling it to unit variance) before thresholding or any other processing. In terms of ethics, the way I have dealt with this before is to be very transparent in the manuscript or report: state that there is a batch effect within the dataset (i.e. two batches stained at different times) and that you used two different thresholds. The hard part is then convincing yourself, the reviewer or the reader that your different thresholds actually perform the same analysis, for example by finding something constant across all images, such as a known positive cell.
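A minimal sketch of that normalization, assuming it is applied per slide (or per staining batch) to the collagen-channel values inside the tissue, followed by one shared cut-off expressed in z units. The function names and the `z_cut` parameter are hypothetical.

```python
import numpy as np

def zscore(values):
    """Mean-centre and scale to unit variance (population SD)."""
    values = np.asarray(values, dtype=float)
    sd = values.std()
    if sd == 0:
        raise ValueError("constant input cannot be scaled to unit variance")
    return (values - values.mean()) / sd

def threshold_per_slide(slides, z_cut=0.0):
    """Normalise each slide separately, then apply one shared cut-off in z units."""
    return [zscore(v) > z_cut for v in slides]

# Toy example: slide_b is a brighter, higher-contrast copy of slide_a.
slide_a = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
slide_b = 2 * slide_a + 30
masks = threshold_per_slide([slide_a, slide_b])  # both masks are identical
```

Because each slide is scaled by its own mean and SD, this removes intensity differences that are a linear shift or scale of one another, but it also removes genuine differences in the average amount of collagen between slides, which is worth keeping in mind when interpreting the results.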


Dear @rdbell3
Could you help me with the normalization?

  1. I need to measure the mean of my staining per slide, and then what?

Can you post an image showing the two (or more) types of staining intensity?

And a description of your actual goal for the analysis?

Here are some crops from my dataset:
1.tif (19.0 MB), 2.tif (2.1 MB), 3.tif (4.4 MB), 4.tif (5.2 MB)

Collagen should appear blue and nuclei red. I also need to measure the holes in the tissue: when the samples were well decalcified, the holes are close to white, but sometimes they are pale blue.

This looks really difficult, if not impossible, because the stains are so different, but I would give the following a try, assuming you are using QuPath. It is very important to note that this method assumes the stain intensities follow a normal distribution, and that the maximum stain intensity of one slide is equivalent to the maximum stain intensity of another slide:

Draw an annotation around the whole tissue.

Use Analyze -> Calculate Features -> Calculate Intensity Features.

Then select the channel that corresponds to the blue (collagen) stain and calculate the mean and SD.

Then decide on an appropriate threshold. For example, take the mean, subtract 1 SD, and use this number as your threshold. On a normal curve, about 34% of the values lie between the mean and 1 SD below it, so this threshold sits at roughly the 16th percentile of the slide's intensities.
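The mean - 1 SD rule can be checked with a few lines of standard-library Python. This is a sketch, assuming the mean and SD come from the pixel values inside the tissue annotation and that the population SD is used; the function names are hypothetical. Under the normality assumption, about 84% of the pixels lie above this threshold.

```python
from statistics import NormalDist, fmean, pstdev

def mean_minus_k_sd(values, k=1.0):
    """Threshold = mean - k * SD of the stain values in the tissue (population SD)."""
    return fmean(values) - k * pstdev(values)

def normal_fraction_above(k=1.0):
    """Fraction of a normal distribution lying above mean - k * SD."""
    return 1.0 - NormalDist().cdf(-k)

# Usage with toy values (not real stain data):
print(mean_minus_k_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 3.0
print(round(normal_fraction_above(1.0), 4))       # 0.8413
```

Changing `k` moves the threshold by whole standard deviations, which is a convenient way to report the rule in a methods section.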

You can then use Classify -> Pixel Classifier -> Create Thresholder, with this value as the threshold, to create collagen-positive and collagen-negative annotations.

Once you have these annotations, you should be able to perform downstream analysis.
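The split into collagen-positive and collagen-negative areas can be sketched as below. This is an illustration of the equivalent calculation, not QuPath's implementation, assuming a per-pixel stain image, a boolean tissue mask and a known pixel area (e.g. in µm²); pixels strictly above the threshold count as positive. The function name and the `pixel_area` parameter are hypothetical.

```python
import numpy as np

def positive_negative_areas(stain, tissue_mask, threshold, pixel_area=1.0):
    """Return (positive_area, negative_area, positive_fraction) within the tissue."""
    stain = np.asarray(stain, dtype=float)
    tissue = np.asarray(tissue_mask, dtype=bool)
    positive = tissue & (stain > threshold)   # collagen-positive pixels inside tissue
    negative = tissue & ~positive             # remaining tissue pixels
    pos_area = float(positive.sum()) * pixel_area
    neg_area = float(negative.sum()) * pixel_area
    total = pos_area + neg_area
    fraction = pos_area / total if total > 0 else float("nan")
    return pos_area, neg_area, fraction

# Toy 2x2 example: three tissue pixels, two of them above 0.3.
stain = [[0.1, 0.5], [0.9, 0.2]]
tissue = [[True, True], [True, False]]
print(positive_negative_areas(stain, tissue, threshold=0.3, pixel_area=0.25))
```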

Hope this helps