Quantifying colours in histology

Dear CellProfiler team,

For my research I am currently looking for ways of easily and reproducibly quantifying colours in my histology images. I was told that CellProfiler could be a useful tool for this, so I would like to ask whether you could have a look at the attached images and help me design a pipeline for assessing and quantifying the black, purple, and pink areas in the images. I would be grateful for your help.

Best wishes,
William



Hi William,

Histological images can be difficult to work with, since color is a very subjective quantity that varies according to not only the imaging device but also the observer. For example, what you define as “purple”, “blue” and “black” may be different from what I would define those colors to be.

I’ve attached a pipeline that seems to work on the images that you provided, but I can’t guarantee its effectiveness on all the images that you throw at it. The basic workflow is the following:
(1) Combine the RGB channels into a grayscale image, invert it, and threshold to find the complete tissue region.
(2) Mask the blue channel and threshold to find the colored region. I chose the blue channel since it seemed to give the highest contrast between the colored regions and the black areas.
(3) Subtract the colored region from the complete tissue area to get the black tissue region.
(4) Divide the red channel by the green channel to enhance the region that I’m assuming is the purple area. I multiplied the result by 0.25 to bring it into the range 0 to 1 so it could be thresholded to get the purple tissue region. I should note that this is completely empirical and could easily change if the color balance is different in another set of images.
(5) Measure the area of the black, blue and purple regions, save the outlines of each of the regions and export the measurements. You can find the measurements you want in the columns labeled “(mean, <Color>Tissue, AreaShape_Area)”, where <Color> is either “Purple”, “Black” or “Blue”.
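
For anyone who wants to prototype this channel arithmetic outside CellProfiler, steps (1)-(4) can be sketched in NumPy as below. The `find_regions` helper and both threshold values are illustrative placeholders, not settings from the attached pipeline:

```python
import numpy as np

def find_regions(rgb, tissue_thresh=0.2, purple_thresh=0.5):
    """Rough sketch of the workflow above; thresholds are placeholders.

    rgb: float array in [0, 1] with shape (H, W, 3).
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # (1) Combine channels to grayscale, invert, threshold -> tissue region.
    gray = rgb.mean(axis=2)
    tissue = (1.0 - gray) > tissue_thresh

    # (2) Threshold the blue channel within the tissue -> colored region.
    colored = tissue & (b > tissue_thresh)

    # (3) Black region = tissue minus colored region.
    black = tissue & ~colored

    # (4) R/G ratio scaled by 0.25 into [0, 1], thresholded -> purple region.
    ratio = np.clip(0.25 * r / np.maximum(g, 1e-6), 0.0, 1.0)
    purple = tissue & (ratio > purple_thresh)

    return tissue, colored, black, purple
```

In practice the thresholds would come from an automatic method (e.g. Otsu) rather than fixed constants, which is what the pipeline's IdentifyPrimAutomatic modules do.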

Another note is that I’ve adjusted the size thresholds in the three IdentifyPrimAutomatic modules to obtain the largest object found of each color. This setting might also need to be adjusted in other images.

With these caveats in mind, I’d be interested in hearing how well the pipeline works for you.

Regards,
-Mark
2009_09_23_PIPE.mat (2.03 KB)

Dear Mark,

Thank you very much for your help. I will try using this pipeline for my whole set of images to find out how the data relate to the assessment-by-eye. I’ll keep you informed about the results.

Best wishes,
William

Dear Mark

I have read your answer to William and tried to apply your pipeline to my immunohistological images. I am trying to quantify staining (with Vulcan Fast Red) for advanced glycosylation end products and CML in paraffin IHC sections from venous bypass grafts. My first problem is that my images don’t seem to fit the pipeline with regard to recognizing the tissue borders. My second problem is that I don’t understand how to modify the settings in the pipeline to recognize the red stain at the right threshold for my images. I attach a pair of my images.

I would be very grateful if you could help me.

Best regards
Jonas Malmstedt
Karolinska Institute
Sweden

PS I run CellProfiler 1.0.7522 on Windows XP Pro SP3.

Hi Jonas,

Attached is a pipeline which may suit your needs. You are correct; the original pipeline was inappropriate for your application, which is not unusual for histological sections. The basic workflow is the following:

  • Split the RGB channels into their individual components. I looked at the channels to see which one seemed to give the best contrast between the red specks and the tissue, and chose the green channel (I should note that this is completely empirical and could easily change if the color balance is different in another set of images). The green channel was inverted and thresholded to find the complete tissue region.

  • The image is then corrected by applying a small amount of smoothing to obtain the tissue background without the specks; this background image is then subtracted from the original to get the specks alone.

  • The image is then cropped to the tissue region so that thresholding is performed on the tissue region only (per-object thresholding).

  • Measure the morphology of the tissue region, save the outlines of the tissue and red stained regions and export the measurements.
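
The smooth-and-subtract step in the second bullet can be sketched as follows. This assumes the channel has already been inverted so the specks are bright against the tissue, and the `sigma` value is a placeholder to be tuned against the speck size, not a value from the attached pipeline:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance_specks(channel, sigma=5.0):
    """Subtract a smoothed background estimate to isolate small bright specks.

    channel: 2-D float array (e.g. an inverted green channel).
    sigma should comfortably exceed the speck radius so the specks are
    blurred away in the background estimate but the tissue shading survives.
    """
    background = gaussian_filter(channel, sigma=sigma)  # tissue without specks
    specks = channel - background                       # specks alone
    return np.clip(specks, 0.0, None)                   # drop negative residue
```

The same idea underlies CellProfiler's smoothing-based illumination correction: anything smaller than the smoothing scale survives the subtraction.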

Since you didn’t specify what type of quantification you wanted on the red stained regions, I omitted those measurements from the pipeline. But you are free to add any additional measurement modules you wish; their output will be exported by the ExportToExcel module.

Hope this helps!
-Mark
2009_12_02_PIPE.mat (1.78 KB)

Dear Mark!

Thank you so much for your help. I’m eager to test the pipeline. I will keep you informed!

Yours sincerely,

Jonas

Dear Mark!

I have memory problems when trying to use your pipeline. I use the latest 32-bit version of CellProfiler.
At work I run it on a Dell Latitude E5500, Intel Celeron CPU 900 @ 2.20 GHz, 1.95 GB RAM (PAE), XP Pro SP3.

CellProfiler stops during the first cycle at the IdentifyPrimAutomatic module. I used the same image that I have posted in this forum (cml102_0007.tif, 1600x1200 pixels, 5662 kB). I paste the error message from CellProfiler at the end of this post.

I have read the memory section in the CellProfiler help and some posts in the forum, but I can’t fix it. I tried the following (written down in detail in case anyone else finds it helpful):

  1. Optimize the computer for best performance according to
    informit.com/articles/article.aspx?p=25448
    (Right-click My Computer and select Properties. Click the Advanced tab. Click Performance. Choose “Optimize for best performance”.)

  2. Set the startup switch to /3GB according to http://technet.microsoft.com/en-us/library/bb124810(EXCHG.65).aspx .
    (Right-click My Computer and select Properties. Click the Advanced tab. Startup and Recovery. Click Advanced. Click Edit. Add the switch /3GB after /fastdetect.)

  3. Turned down the hardware acceleration for the graphics card:
    (Right-click My Computer and select Properties. Click the Advanced tab. Click Performance. Click Advanced. Click Edit under Virtual memory. Choose Managed by Windows.)

  4. Tried adding the SpeedUpCellProfiler module at the end of the pipeline; no effect, because the analysis is interrupted in the first cycle at the IdentifyPrimAutomatic module. Tried moving SpeedUpCellProfiler before IdentifyPrimAutomatic, but that caused the pipeline to terminate because it couldn’t find the image (I told the SpeedUp module to skip OrigRed, OrigBlue and OrigColor).

  5. Tried changing the diameter from 3,15 to 10,100 in IdentifyPrimObjects, according to the CellProfiler help, to reduce the number of small objects.

  6. From the MathWorks website, mathworks.com/support/tech-n … /1107.html, I only managed to set the boot.ini switch to /3GB and reduce the memory for the graphics card. The other steps were too complicated: I couldn’t figure out how to start MATLAB and issue the commands proposed on the website.

  7. I have closed all other applications when running CellProfiler.

  8. At home I have a Dell Precision dual core with 4 GB RAM and 32-bit XP Pro. I applied all of the above, but with the same result. On that machine I have installed a dual-boot system so I can run 64-bit Debian Lenny. I tried to install the 64-bit Linux version of CellProfiler, but that was too complicated for my scarce knowledge of Linux. I have only used the GNOME GUI tool Synaptic to install packages.

Can you help me get this working on XP, or help me install CellProfiler on my 64-bit Linux system?

Yours sincerely,
Jonas
PS Sorry for my bad English.

Error message:

There was a problem running the image analysis. Sorry, it is unclear what the problem is. It would be wise to close the entire CellProfiler program in case something strange has happened to the settings. The output file may be unreliable as well. Matlab says the error is: Out of memory. Type HELP MEMORY for your options. in the MeasureObjectAreaShape module, which is module #10 in the pipeline.

Stack:
calculate_zernike in C:\Program\Cellprofiler\CompiledCellProfiler_7522\CellProfiler_mcr\Modules\MeasureObjectAreaShape.m (454)
MeasureObjectAreaShape in C:\Program\Cellprofiler\CompiledCellProfiler_7522\CellProfiler_mcr\Modules\MeasureObjectAreaShape.m (291)
AnalyzeImagesButton_Callback in C:\Program\Cellprofiler\CompiledCellProfiler_7522\CellProfiler_mcr\CellProfiler\CellProfiler.m (10890)
gui_mainfcn in C:\Program\Cellprofiler\CompiledCellProfiler_7522\CellProfiler_mcr\CellProfiler\CellProfiler.m (12633)
CellProfiler in C:\Program\Cellprofiler\CompiledCellProfiler_7522\CellProfiler_mcr\CellProfiler\CellProfiler.m (57)

Hi Jonas,

A couple of suggestions not on the list:

  • Insert a Resize module immediately after the LoadImages module, with both the input and
    output set to OrigColor. Downsample by 0.5 with bilinear interpolation. My guess is that your images are much larger than they need to be for this purpose, and so can be downsized without losing much. Remember that the size-dependent parameters in IDPrimAuto also need to be changed accordingly.
  • Turn off the Zernike measurements in MeasureObjectAreaShape. As the help suggests, they’re quite slow and memory-intensive, and I suspect you will not be using them.
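
The memory payoff of the Resize suggestion is easy to see: halving each dimension quarters the pixel count. As a rough stand-in for a 0.5x bilinear downsample (2x2 block averaging, which is nearly equivalent at this exact factor), assuming a 2-D grayscale array:

```python
import numpy as np

def downsample_half(img):
    """Approximate a 0.5x bilinear downsample by 2x2 block averaging.

    A quarter of the pixels means roughly a quarter of the memory.
    Remember to halve any size-dependent settings downstream
    (e.g. the diameter range in IdentifyPrimAutomatic).
    """
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # trim odd edges
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])
```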

Regards,
-Mark

Dear Mark!

Many thanks for your quick and effective answer! My memory problems are gone after resizing and turning off the Zernike measurements.
Just some questions about the measurement and output. I am primarily interested in the percentage stained area of the tissue area and the mean intensity of the stained area. I have four images per patient, and ideally I would get one output file for every four images. Is this possible?

Again, thank you for the very valuable help provided!

Sincerely,

-Jonas
2009_12_07_PIPE_mod2.mat (2.05 KB)

Hi Jonas,

Re: measurement - Attached is a revised pipeline that obtains the requested measurements. In the Image.xls spreadsheet, the column ‘Math_AreaRatio’ will have the percentage stained area (area occupied by red stain divided by thresholded tissue area) and the column ‘Intensity_MeanIntensity_MaskRed’ will have the mean intensity for the stained area.
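
For reference, the two exported values amount to the following computation; the function and argument names here are illustrative, not CellProfiler's internals:

```python
import numpy as np

def stain_measurements(intensity, tissue_mask, stain_mask):
    """Sketch of the two exported values, assuming boolean masks.

    Returns (area_ratio, mean_intensity): the stained fraction of the
    thresholded tissue area, and the mean intensity within the stain mask.
    """
    area_ratio = stain_mask.sum() / tissue_mask.sum()
    mean_intensity = intensity[stain_mask].mean()
    return area_ratio, mean_intensity
```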

Re: output - Unfortunately, no, the output cannot be split up by patient, at least not in this version of CP, although our upcoming release should allow this if the images are grouped a certain way.

One additional recommendation: Rather than setting the input directory in LoadImages, it is usually better to leave the setting as a period (’.’) and set it instead in the default input folder at the bottom of CellProfiler. This way, the pipeline is more portable.

Regards,
-Mark
2009_12_08_PIPE.mat (2.34 KB)

Hi Mark!

I hope you had happy holidays and a great new year! Thank you very much again for your pipeline from 2009-12-08. I have tested it on my collection of 137 images and made stepwise modifications. It is now working reasonably well, but some issues remain. I will try to explain below; my apologies that some of these are due to my insufficient knowledge of image analysis.

1a. CorrectIllumination order. You have chosen to run the IdentifyPrimAuto module (identify tissue) on the uncorrected, inverted OrigGreen image. Can you explain why you use the uncorrected image instead of the corrected one? I modified the pipeline so that the IdentifyPrimAuto module uses the corrected image, and I think the identification of tissue improved.

1b. CorrectIlluminationCalculate settings. I have read the manual and changed the Block size setting from 10 to 250 so that the tissue area is not visible in the illumination correction image. I also added Smoothing (Fit Polynomial) for the same purpose (the manual also says this is necessary with the “Each” option).
The problem is that there are still visible areas of uneven illumination in the corners of the corrected image, which makes the IdentifyPrimAutomatic module fail to classify empty corners as background (see attached image). Do you have any other suggestions?

1c. CorrectIllumination method: “Each” + pipeline, or “All” + LoadImages? Do you recommend building a separate pipeline using the “All” and “Load images” options to first calculate the illumination function based on all images? Would this help solve the problem with the corners?

2. Choice of color channel for tissue/stain identification. I think this is of paramount importance for correct quantification of immunohistology images; sorry for the somewhat lengthy section below.
2a. Is there any way to use information from more than a single channel to improve the recognition of stain/tissue? I tried the Combine option in the ColorToGray module to produce an OrigGray image with 2% R, 73% G, and 25% B. The reason is that I would like to include more of the pink color instead of purple in the stain recognition. I have the impression that the blue channel can also help identify some of the stain, based on using the “ShowPixelData” tool. I also used the software JMicroVision (http://www.jmicrovision.com) to get thresholding values directly from the RGB color image. Is there a method to translate RGB or IHS range values into thresholds for the grayscale images in CellProfiler?

2b. I also wonder if you have any systematic method for choosing and combining channel(s). Do you think M. Grundland’s algorithm for color-to-gray transformation could be useful? See http://www.cl.cam.ac.uk/~mg290/Portfolio/TurnColorsGray.html#Portfolio (citation: Decolorize: Fast, Contrast Enhancing, Color to Grayscale Conversion by Mark Grundland and Neil A. Dodgson, Pattern Recognition, vol. 40, no. 11, pp. 2891-2896, 2007. ISSN 0031-3203).

2c. Are there reasons to use different settings for tissue identification versus stain identification? I also noticed that you use the uncorrected red channel to measure intensity. Could you explain why the red channel (when you use the green for stain detection), and why the uncorrected one?

3. I attach my revised pipeline. In summary, the revisions made are:
  • changed the ColorToGray option from Split to Combine, using the image OrigGray instead of OrigGreen for downstream analysis.
  • moved CorrectIlluminationCalculate and CorrectIlluminationApply to be performed before all Identify modules.
  • set the Block size to 250 instead of 10 and added Smoothing (Fit Polynomial) to the CorrectIlluminationCalculate module.
  • changed the threshold correction to 0.9 and added threshold limits [0.025, 0.041] in the first IdentifyPrimAutomatic module (tissue).
  • changed the threshold correction to 1.3 and added threshold limits [0.25, 0.30] in the second IdentifyPrimAutomatic module (stain).
  • changed the image used in MeasureImageAreaOccupied from OrigColor to RedStainBinary (to get Math_AreaRatio right).
Can you suggest any further changes to improve performance? Besides the problems with stain identification and the corners missing in the background, I also get some images with marked misclassification of tissue (large tissue areas classified as background); see attached image2.

I also attach some more example images in a new reply (board quota limit).

I also wish to express my gratitude and thanks to you and the CellProfiler team for excellent software and outstanding support!

Yours sincerely,

Jonas Malmstedt, MD
Karolinska Institute
Stockholm, Sweden



2010_01_19 FINAL_PIPE.mat (2.38 KB)

My initial impression was that the uncorrected image gave a better foreground/background contrast in order to pick up the tissue. But if using the corrected image improved things, then great!

Good catch on adding smoothing (I had forgotten to add it). However, Fit Polynomial is probably not what you want in this case, since it smooths the entire image, which is not what I want at this point (and may be responsible for the corners getting caught). Since I want to remove the tissue region from consideration (since I’ve already identified it) in order to capture the red regions, it would be better to smooth in a smaller region, using median or Gaussian filtering.

It may be appropriate, though, to perform illumination correction globally at first to capture the tissue, and then again locally to capture the red regions.

One important item to note is that we are using CorrectIllumination_Calc in a somewhat unorthodox way. If we were using the module for a fluorescent plate assay such as it was originally designed for, then yes, using ‘All’ might be an option. But here we are using it for the express purpose of increasing the contrast of the red regions, not correcting the illumination as such. Since this needs to be performed on an image-by-image basis, using ‘All’ would not be appropriate here.

This is a big question and difficult to answer, as it is very assay-specific. In short, yes, but the problem is figuring out what information will be useful. I have found that division of color channels (as well as addition with various weights, as you have begun exploring) can do the trick, but I do not have a particular workflow other than trial and error at this point.

Not at this point.

We use MATLAB’s method for RGB-to-grayscale conversion, which is standard (it eliminates the hue and saturation information while retaining the luminance, using the formula 0.2989 * R + 0.5870 * G + 0.1140 * B). The enhanced method in the paper is one we would have to look into to see if it’s useful. As it is, the ColorToGray module is used here only for splitting the color channels into their individual components, not for combining them.
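
As a concrete illustration, the conversion described above is just a fixed weighted sum of the three channels:

```python
import numpy as np

# The standard luminance weights quoted above (MATLAB's rgb2gray).
WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def rgb_to_gray(rgb):
    """Weighted channel sum: 0.2989*R + 0.5870*G + 0.1140*B.

    Works on a single pixel (shape (3,)) or a whole image (shape (H, W, 3)).
    """
    return rgb @ WEIGHTS
```

Jonas's custom 2%/73%/25% combination is the same operation with different weights, which is why the Combine option can be tuned per stain.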

The reason is basically that different settings are typically used to detect different features; the features that describe the tissue are not the same as those I use to capture the red regions. With this in mind, I optimize the settings to maximize the contrast between the tissue and everything else for tissue detection, and then between the red regions and everything else for red region detection.

Since the “corrected” image is more an image enhancement than a correction, I was unsure what a measurement from that image would actually mean; it’s not exactly in units of “red” (whatever that is), and it wasn’t specified in the original forum post. But you are free to measure the intensity from whichever image you wish, if you deem it useful. Just keep in mind that for these sorts of images, the units of measurement are basically arbitrary.

For the tissue, I would go back to the original settings and re-adjust. The MoG method uses a user-provided estimate of the fraction of foreground area, so if you are working with a wide variety of histological sections, another thresholding method may be better suited.

Regards,
-Mark

Dear Mark!

Thanks for your latest suggestions. Most things are now working very well. I have been struggling with the task of excluding some large objects identified in IdentifyPrimAutomatic by setting the maximum value of the typical diameter. During this process I thought it would be helpful to run a couple of images and get detailed information on each object’s area. I used MeasureObjectAreaShape and looked at the variable AreaShape_Area. The problem is that these areas seem to be way too large compared with the approximate areas I calculate by measuring directly in the images with the image tool (typically the diameters of objects are around 20 pixels, giving an area of approximately 314 pixels). The smallest area in the Objects OUT file is around 600,000 pixels(?), corresponding to a diameter of 880.

I think I must have gotten something wrong, but I can’t figure it out.

Sincerely,

Jonas
2010_02_18_double_illumcorr_artefacts_PIPE.mat (2.61 KB)

Hi Jonas,

I believe the problem lies in the fact that the pixel size (the text box to the left of the output filename in the main CellProfiler window) is set to “0,785”. I assume you intended to use a decimal point instead of a comma here; as it is, the field is being read by CellProfiler as “785”, which is then squared and used to scale the pixel area. So all your areas are being multiplied by 616,225!

To solve this, you can either adjust the value to 0.785, if that is what you intended, or just set it to the default of 1 and adjust the scaling in the spreadsheet. I would recommend the second option, since this parameter will no longer be used in future releases, and we have found that it leads to confusion (like this current problem, for example).
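
The arithmetic behind the inflated areas can be checked directly:

```python
# The comma makes CellProfiler parse "0,785" as 785, and every area is
# scaled by the square of the pixel size.
misread_pixel_size = 785
scale = misread_pixel_size ** 2   # 616225

# A roughly one-pixel object would therefore report an area near 616225,
# consistent with the ~600,000-pixel minimum Jonas saw in the output file.
# Dividing the reported areas by this factor recovers the true pixel counts:
true_area = 616225 / scale        # back to 1 pixel
```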

Regards,
-Mark

I have been using CellProfiler to analyze Masson’s Trichrome (red and blue staining) data on heart sections at 2x. A few problems I have come across: I have both white and black areas in the background of the images, and the light is very uneven, making it difficult to threshold the images. I have played around with enhancing the contrast by inverting the red image and multiplying it by the original blue image, and it seems to work pretty well for getting the blue area, but I have been having trouble consistently getting the red area. Some samples work with the inverted green image and some work best when I invert green and multiply by the original red. Sometimes I need to threshold manually, and sometimes MoG Global or Robust Background will give me the area I am looking for. However, I think some of these problems would go away if I could figure out how to background-correct properly, which so far I have not been able to do. Also, I was wondering if there is a way to get an output of total tissue area when thresholding results in multiple objects, and a way to get percent blue tissue compared to total tissue area.

I am attaching the pipeline I have been working with and a couple images.

Thank you so much!

p.s. The pipeline I have been using is CellProfiler 1 but I also have 2 on my computer, so feel free to change it to the newer version if that will work better.

Sarah




2009_09_23_needbackgroundcorr_PIPE.mat (1.51 KB)

Hi,

I’ve taken a look at your pipeline and attached a modified version made in CP 2.0. The module notes explain the workflow; basically, the background correction issue was handled by a judicious choice of division of individual channels to enhance the desired features. The following features are detected, in this order:

  • The circular field of view
  • The full tissue
  • The blue stained tissue
  • The red stained tissue

The fraction of the blue stained tissue to the full tissue area is then calculated.

Cheers,
-Mark
2010_07_03.cp (15 KB)

Thanks for your help! This seems to work really well, and I am getting results comparable to the other pipeline, except I don’t have to change the threshold method every few pictures.

Sarah

For those following this thread: Our latest release of CP 2.0 (r10415) now includes a module called UnmixColors, which performs color deconvolution. This module should remove some of the guesswork in determining what color operations are needed to separate stains in an image.
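
For the curious, color deconvolution in the spirit of UnmixColors (the Ruifrok & Johnston method) can be sketched as below. This is an illustrative NumPy version, not CellProfiler's actual implementation:

```python
import numpy as np

def unmix_stains(rgb, stain_vectors):
    """Separate stains by inverting the Beer-Lambert mixing model.

    rgb: float image in (0, 1] with shape (H, W, 3).
    stain_vectors: (n_stains, 3) array, one unit-length optical-density
    vector per stain (e.g. measured from singly stained control slides).
    Returns an (H, W, n_stains) array of per-stain concentrations.
    """
    od = -np.log(np.clip(rgb, 1e-6, 1.0))   # optical density per channel
    # Concentrations c satisfy od = c @ stain_vectors, so invert with pinv.
    conc = od.reshape(-1, 3) @ np.linalg.pinv(stain_vectors)
    return conc.reshape(rgb.shape[:2] + (stain_vectors.shape[0],))
```

Given good stain vectors, this yields one grayscale "concentration" image per stain, replacing the trial-and-error channel arithmetic discussed earlier in the thread.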

Cheers,
-Mark