Speeding up Color Deconvolution

This topic is a split.
It started originally in @haesleinhuepf post here

Some interesting information are

Then I asked some questions:

@Alex.h: Are you working with build-in color vectors or with your individual but ‘stable’ color vectors?
Or do you use unique color vector for each image?

The background of my question is that when you always use the same color vectors you can pre-calculate the color deconvolution, store it as an image and apply as a lookup table. This is much faster then processing every pixel of an image and even faster as using a GPU.

The color deconvolution plugin works with RGB color images with 8bit samples per color channel. So there exist 256x256x256 color combinations - in total 16777216 combinations.
An image with 8500x8000 pixels has in total 68000000 pixels.
So to pre-calculated the CD values and using them as LUT is beneficial already for a single image.
If you assume that your image with 8000x8500 only contains much less different color combinations then 16777216 then you can assume that there exists additional strategies to speed up Color Deconvolution without using GPU processing.

Alex.h answered:

@phaub: Usually, I build a macro for one staining. So I process all my H-DAB with one macro and all my Masson Trichrome staining with a second macro.

The macro is made like this:

I use NDPI-tools to extract the 10x data from .ndpi and convert it in .ome.tiff
If .ome.tiff is too big I cut the file in 4 parts or in 9 parts (With bioformat extension) because ImageJ can’t handle more than 2Go of pixels
I correct the White balance because I have often some color variation in the background in my slice
I measure the total area with color threshold
I split the .ome.tiff in 3 with the plugin colour deconvolution (Masson Trichrome)
Set measurement and measure color 1 area, color 2 area with a classical approach like Ostu threshold
IN total area (with analyse particles and image calculator).

So, here we are.


Hi @Alex.h
I created three plugins packed in the CDBoost.jar.
CDBoost_0.0.1.jar (7.2 KB)

Copy it to your ImageJ\plugins folder and restart ImageJ.
You will find an additional entry in the Plugins menu: CDBoost

To use it and to speed up your color deconvolution you first have to create a ColorDeconvolution-LUT:

  1. Use ‘CD-Boost\Create RGB Template’ to create a 4096x4096 pixel RGB image contain all possible colors
  2. Use the Color Deconvolution plugin with your color vectors to separate the channels of this Template
  3. Use ‘CD-Boost\Create CD-LUT’ to compute an LUT image for your separated channels. Make sure that you select the 3 channels in the correct sequence.
  4. Save this LUT image as tif file (!! TIF !!)

Now you are prepared to do ColorDeconvolution on huge images VERY fast.

Open one of your ome.tiffs.
Open the LUT image.
Then use ‘CD-Boost\Apply CD-LUT’ to do the color deconvolution.

In a macro it will look like that, e.g.:

// Load the Color Deconvolution LUT
LUTPath = "C:\\Test\\RGB24bit_LUT.tif";
open( LUTPath );
titleLUT = getTitle();

// for all your image do loop 
imagePath = "C:\\Test\\OME-Tiff-to-process.tif";
open( imagePath );
title =  getTitle();

run("Apply CD-LUT", "image="+title + " lut="+titleLUT);

I hope it is clear how to integrate this steps into your full macro.

The ‘Apply CD-LUT’ will display the result of the color deconvolution as a stack of 3 gray channels.
If you need them separated and/or colorized then let me know.

I recommend to do the steps above and compare the result with those you get when using ColorDeconvolution directly.


1 Like

Just a comment. You should define your own Masson Trichrome vectors as that particular stain is quite difficult to separate, specially if you use iron haematoxylin. Please do not rely on the built-in vectors (I know because I determined those included in the plugin and are not particularly robust).

Thanks for the great hint! Curious: How can one do this? Can one calibrate that vector somehow?

You need to determine those yourself from single-dye stained images, using the “Determining new vectors” section here:


However the “compiling instructions” applies to the standalone version of the plugin in my site. For the Fiji version you can update the text file holding the vectors.


Perhaps now you could help me with this issue:


Regarding the performance, I strongly suspect that main factors are the use of Math.log and Math.exp.

The Fiji plugin calls Math.log three times for every pixel:

However, it appears that the input to this can only be one of 256 possible values at most – since at this point R, G and B will always be an integer in the range 0–255. Therefore it would be straightforward to precompute these values in a single 256 element array, and generate Rlog, Glog and Blog almost ‘instantly’ by an array lookup instead.

This does not deal with the Math.exp calls at line 305 – @phaub’s ‘full LUT’ method is needed for this. However, I expect that removing the Math.log calls would already have a very substantial impact on performance.

@gabriel, if you are updating the plugin, perhaps this would be worth including? The same idea is already used for log255.

Sidenote: QuPath uses the ‘precompute log values’ trick, which allows it to apply color deconvolution extremely quickly while viewing an image… but QuPath doesn’t rescale to 8-bit, and therefore dispenses with the exponentials.

Well, I wrote a quite benchmark as a Groovy script and it seems that Math.exp costs about twice as much time as Math.log (regardless of which is tested first)…

Total time for Math.log: 315
Total time for Math.exp: 814

Still, if the performance may be improved by almost 1/3 by a small change it might still be worth it.
Or alternatively that may be the point where CLIJ gets involved… since there is no real need to compute the logarithms for every pixel.

double[] input = new double[256*256*256 as int]
for (int i = 0; i < input.length; i++)
	input[i] = i % 256

long startTime, endTime

startTime = System.currentTimeMillis()
endTime = System.currentTimeMillis()
println 'Total time for Math.log: ' + (endTime - startTime)

startTime = System.currentTimeMillis()
endTime = System.currentTimeMillis()
println 'Total time for Math.exp: ' + (endTime - startTime)

double[] testLog(double[] input) {
	int n = input.length
	double[] output = new double[n]
	for (int i = 0; i < n; i++) {
		output[i] = Math.log(input[i]);
	return output

double[] testExp(double[] input) {
	int n = input.length
	double[] output = new double[n]
	for (int i = 0; i < n; i++) {
		output[i] = Math.exp(input[i]);
	return output
1 Like

Thanks @petebankhead for your hint.

In addition to the log-trick the lines 307 to 309 (Math.exp and byte conversion) can be optimized in a similar way since the output is in {0 … 255}.
It will be not the same as the precomputed logX-Table.
Instead there must be a case switch depending on the
sum = Rscaled + Gscaled + Bscaled.
The improvement will be less effective but at least a little speed improvement can be realize this way.

1 Like

I will take these points into consideration when updating the current code. Thanks.

1 Like

Hey all,

in case you’re still interested in GPU-accelerating this. I just took @petebankhead example and translated it to CLIJ2:

// init GPU
run("CLIJ2 Macro Extensions", "cl_device=[GeForce RTX 2070]");

// create an image
width = 256.0;
height = 256.0;
depth = 256.0;
bit_depth = 32.0; // image type 32 bit float: 
Ext.CLIJ2_create3D(input, width, height, depth, bit_depth);

// set all pixels to their x-coordinate (0, 1, 2, ...)

// benchmark logarithm
for (i = 0; i < 10; i++) {
	time = getTime();
	result = "result";
	Ext.CLIJ2_logarithm(input, result);
	print("Log took " + (getTime() - time) + " ms");

// benchmark exponential
for (i = 0; i < 10; i++) {
	time = getTime();
	result = "result";
	Ext.CLIJ2_exponential(input, result);
	print("Exp took " + (getTime() - time) + " ms");

// clean up by the end

It outputs on an NVidia RTX 2070:

The first execution of operations is a bit slower because of the warmup-effect where code is compiled. This should also be measurable when using Java or Groovy.

Btw. the CLIJ2_logarithm and CLIJ2_exponential were programmed by @phaub - thanks Peter!


1 Like

Robert, maybe there was as misunderstanding.
The logX table does not has to be 3 dimensional.
It is a simple list with 256 entries.

1 Like

Alright, good to know. That might mean working on this little table might not make sense on the GPU - it’s too small :slight_smile:

I think I have to withdraw this idea.
It was a little hasty - not to say stupied :slightly_frowning_face:

Searching for an entry in a non-linear expX-table takes longer then calculating Math.exp.

1 Like

I still like the idea :slight_smile: In my mind it used an epic list of if else statement, but it feels like there must be some efficient way of doing it… but perhaps not any better than the 16 MB LUT or the GPU.

I guess color deconvolution of this kind could be very nicely CLIJable – although I do still wonder if the back conversion to 8-bit is really necessary at all. I guess most of the time the next step is a threshold, and applying that to the 32-bit output seems fine to me.

You are absolutely right.
Since the result of CD is a (kind of) concentration the values can be and should be handled as 32bit output.
The only reason to map back to 8bit is a technical one, e.g. if the resulting images are used in environments which cannot handle 32bit values.

1 Like

Excellent, thanks @phaub that’s how I was also seeing it.

With that in mind, and though I realise no-one was asking for it, here’s a QuPath script to export 32-bit (optionally) downsampled color deconvolved tiles from a whole slide image directly:

The output is a directory containing multichannel ImageJ TIFF stacks, with metadata set appropriately (including the ability to relate coordinates back to the whole slide image if needed). No GPU, but it should be reasonably fast, thanks to parallelizing the export and using the lookup table to avoid using many log computations. Bottleneck is likely to be reading/writing the tiles.

I understand the OP was already exporting using NDPITools before using Fiji’s deconvolution, so this would potentially combine steps. I don’t know exactly what the ‘some issue with QuPath’ was, but if it was related to the 8-bit/32-bit difference this won’t help… since QuPath only provides 32-bit out here.

Edit: Two additions after re-reading about the original macro:

  • QuPath supports setting background values other than 255 when defining the color deconvolution stains, so that the white balance adjustment might not be needed
  • If a fixed-threshold is used, this could potentially do most of what is required, with no export needed at all… https://qupath.readthedocs.io/en/latest/docs/tutorials/measuring_areas.html - but I’m afraid it doesn’t currently include support for automatic thresholds, e.g. Otsu’s method

Ah, but white balance is not only for the background colour, but for even illumination along the whole field, too. You need background correction unless the microscope has a completely flat illumination source and you are only aiming to correct the colour temperature of the light source.


Dear @phaub,

when you set the color in CD-LUT should I keep the same order or concider a final RGB image and set
Red (color 2) in channel 1
Green (Color 3) in channel 2
Blue (Color 1) in channel 1

input = getDirectory("choose the input directory"); 
output= input + "/Images/";
output2= input + "/Images_Corrected/";
output3= input + "/Mask/";
output4= input + "/Excel/";
output5 = input + "/Tmp/";

run("Create RGB Template");
run("Colour Deconvolution", "vectors=[Masson Trichrome] hide");
run("Create CD-LUT", "channel1=RGB24bit-(Colour_1) channel2=RGB24bit-(Colour_2) channel3=RGB24bit-(Colour_3)");
LutName = getTitle();
saveAs("Tiff", output5 + File.separator + LutName );
LutNameTif = getTitle();

list = getFileList(output2);   
for (i=0; i<list.length; i++) {
	if (endsWith(list[i],".ome.tif")){
	inputPath= output2 + list[i];
run("Bio-Formats Importer", "open=[" + output2  + File.separator + list[i] + "] autoscale color_mode=Default rois_import=[ROI manager] view=Hyperstack stack_order=XYCZT");
run("Stack to RGB");
// Open The Created LUT for Fast colour deconvolution CDBoost
LutPath = output5 + File.separator + LutNameTif;
open (LutPath);
LutName = getTitle();
run("Apply CD-LUT", "image="+imagesName + " lut="+LutName);

It would be much more easy if you could split all your specific parts of the macro and concentrate just on the CD-LUT related parts.
( What is function processFolder(output2); ? )

Ideally we should concentrate on a situation where the all your images are located in a folder as RGB tif files and the RGB24bit_LUT is already exsiting and tested.

Some questions:

  • Have you used the 3 tools manually?
  • What was the result?
  • Have you compared the result with the direct color deconvolution?
  • Are the two results identical?

Can you describe this?

I would recommend to keep the original order from the color deconvolution plugin (1-2-3). Otherwise you will get mixed up one day.
But in principle it should be possible to use any sequence. (It just a lookup table.)

Can you provide a (small) test image as tif file and the stain vectors.