Python scripts and ImageJ / Fiji - Speed up pixelwise calculations

I am having a problem with using ImageJ from Python in an Anaconda environment. I hesitated to install Anaconda as it adds yet another platform to my toolbox that I have to learn :-). It could bring me some nice stuff and then I will go for it. To determine if that is really the case, I will first sketch the original problem that I want to solve and then return to the Python / Anaconda question.

I develop tool for handling a set of image frames. I use ImageJ 1.53b in a Fiji installation. My code is an ijm macro. In a part where I need to do pixel-wise calculations with pixels from two stacks this code really becomes too slow. So I am looking for a solution where I can hopefully re-use most of the code that I have now.

So, I wondered how it would be possible to perform the pixelwise calculation in Python, as this is a language that I have been using before. I could not find directions on how to use Python scripts within ImageJ/Fiji (also because the site is down…) but I hit upon pyimagej.

I understand that with pyimagej I can run ImageJ WITH the GUI. Is this a GUI similar to that of the standalone versions? If so, this would probably fullfil my requirement to re-use the most of my current code.

Now my problem with pyimagej and Anaconda:

Yesterday I had a notification that something was wrong with mvn and imagej.net… I could not reproduce this today. [EDIT: i got the same notification as described on Github by AdvancedImagingUTSW (https://github.com/imagej/pyimagej/issues/68). Here I describe what I encountered now.

I have:

  • installed Anaconda succesfully I opened a CMD.exe prompt.

  • followed the instructions under https://github.com/imagej/pyimagej

    REMARK: when opening the cmd.exe prompt when trying to reproduce the error there was a notification: path not found (in some other words)
    This was resolved with conda init CMD.exe

-start python
python

Then I tried:

import imagej
ij = imagej.init()

and

import imagej
ij = imagej.init(‘sc.fiji:fiji+net.imagej:imagej-legacy’)

both without succcess (See screenshot).

[EDIT 2:]
As it is completely new to me I opened the Documentation in Anaconda Navigator (AN). I discovered something about environments, so I went to Environments in AN and selected the pyimagej environment. I had to install the CMD.exe prompt and after that I started the prompt and runned python in it. Here is the result of trying to use pyimagej:

1 Like

@Fred_ckx Thanks for the detailed info about what you tried! As you probably know, I’m the main PyImageJ developer but also the main ImageJ servers maintainer, and we’re currently in the midst of some very involved servers work. So apologies in advance if I’m less responsive helping you at the moment. The good news is that our top priorities this summer after the servers include PyImageJ, so there will be improvements coming over the next 1-3 months.

Totally understandable! I’ll just say that conda is great, and worth learning. As a Python developer, I find it indispensable, and if you’ve been using virtualenv you won’t need that anymore because conda is built around virtual environments. Also: note the difference between Anaconda and Miniconda: I personally use Miniconda, because I don’t need the whole distribution. I just want the conda tool, and then install the packages I want into the environments I want. :+1:

For that scenario, I think adding Python to the mix will make your life more complicated, and may not be necessary. There are performant ways to do pixelwise operations in ImageJ directly. For example: the Process › Math menu contains several common operations applied to a whole image; there are many user plugins for lots of different things; if you are willing to dip your toes into the ImageJ2 world, the ImageJ Ops library has quite a few things; and finally the CLIJ library lets you run pixelwise ops orders of magnitude faster using your GPUs.

People here on the forum can help you decide upon and pursue any of those directions, depending on your needs. Does your macro currently use for loops to iterate the pixels? What sorts of operations are you trying to do? Can you post your macro here, or at least the part doing the pixelwise operations?

Yes, if you raise the GUI from PyImageJ, it’s literally the ImageJ GUI—not a separate GUI. There is an issue right now with raising the GUI on macOS, but since you’re on Windows, shouldn’t be a problem for you.

The server-side issues with maven.scijava.org should be fully resolved. So if you get any more errors with mvn or maven.scijava.org that look like server problems (e.g. 404s, 500s or timeouts), please let us know here on the forum, or in the ImageJ Gitter channel, and I’ll prioritize fixing it.

I believe you have hit upon this bug:

The problem is fixed, but not yet released. We’ll be doing new releases later this month or next. (Was planned for July, but with the urgent server work, might end up happening in August instead.)

Fortunately, there is a workaround. With your pyimagej environment active, do this:

conda install imglyb=0.3.5

That will downgrade your imglyb library to the latest working version.

While you’re mucking with package versions, I’d also recommend you do:

pip install pyjnius

Or:

conda install pyjnius=1.2.0

Because the newest pyjnius on conda-forge, version 1.2.1, has a regression (since fixed in 1.3.0, but 1.3.0 is not yet available on conda-forge) that makes PyImageJ work less well.

I apologize that pyimagej doesn’t currently “just work” out of the box—these issues are of course exactly what we plan on fixing later this summer.

But again: you may not need PyImageJ at all for what you’re trying to do. If you can post your macro, many folks here will be able to give advice on how you might speed up the processing! :smile: :racing_car: :dash:

3 Likes

Thanks for your extensive answer Curtis! I am going to study this :wink: to discover what is my best path forward.

I definitely need some fexibility in my pixelwise operations. I do them now in two nested for loops. To make things worse: I am handling two stacks (images from a movie) and (even worse) they contain a number of channels. They are probability maps, output of the WEKA segmentation tool and I want to label/segment the images by combining the two probability maps.

The actual function where I combine the two sets of probabilities is still under construction, but even with a simple scheme I noticed that it will be terribly slow in a ijm macro.

As said: I am going to study your suggestions and choose a path forward based on that.

Thanks!
Fred

2 Likes

I will post the current routine for the pixelwise operations later… have to prep it a bit before I can share it :wink:

1 Like

Sounds good @Fred_ckx. Here are the ways I know about to speed up pixelwise looping in ImageJ:

  1. When possible, instead of looping in macro, call built-in ImageJ functions to do whole-image operations (e.g. the Process › Math menu).
  2. Implement the loop part in Java, because it will be much faster than macro, thanks to Java’s Just-in-Time (JIT) compiler.
  3. Use @haesleinhuepf’s CLIJ library to speed up execution with GPUs.
  4. Use Jython plus @albertcardona’s Weaver (example here) to hot-compile the pixelwise loops for better performance.

Hopefully others can chime in here too with approaches they’ve used to improve performance.

2 Likes

Thanks again,
This is the kind of answer that I was hoping for. It makes no sense investing into learning a new, complex tool, how great it might be, if my speed problem could be saved in an easier way. But I think that I see the advantages of conda and like the idea of embedding imagej in a python environment. So I think that I will follow your progressions![

Here is what I have now. The actual loop is starting in line 56. The code is heavily under construction. I have to discover on how to combine both probabilities.

One other route is to use classified stacks output by the WEKA tool . But going via the intermediate probability stack offer me more flexibility…

function probs_to_seg_combi(probs1, probs2) {



	//convert 2 WEKA generated probability map stacks to segmented/labeled stack 
	p_Exp_Log("Convert WEKA generated probability maps to segmented image stack, based on two probs stacks");

	// let probs1 be the maps stack with the highest number of classes (so the one with class1 AND class2)


	selectImage(probs1);	
	
    //Derive the title for the segmented stack from the name of the probabilities stack  YOU MAY HAVE TO DO THIS TASK IN THE MAIN CODE
	probs_Title = getTitle();		
	seg_Title = probs_Title.replace("probability_maps","seg");  

	//Get class names, retrieve them from the probability stack
	getDimensions(probs_width, probs_height, channels, nrOfSlices, nrOfProbsFrames);
	nClasses = channels;
    p_Exp_Log("Retrieve class names from probability stack");
    
	for( c = 1; c <= nClasses; c++ ) {
		slice = 0;
		frame = 0;
		Stack.setPosition(c, slice, frame) 
       sliceLabel[c-1] = getInfo("slice.label");
       p_Exp_Log(sliceLabel[c-1]);
	};


    //create empty classified image stack and set class names 
	//newImage("Classified image", "8-bit color-mode label",probs_width, probs_height, 1, 0, 0);
	newImage("Classified image", "8-bit noise", probs_width, probs_height, nrOfSlices);
	setBatchMode("hide");
	//run("glasbey on dark");
	//run("glasbey inverted");
	run("glasbey");
	//set LUT    do something more clever later
	Property.set("FCKXinfo","FCKX");
	Property.set("probmap1","pmap1");
    Property.set("probmap2","pmap2");
    Property.set("conv_algorithm","alg6");
	Property.set("nClasses", nClasses);
	for( c = 1; c <= nClasses; c++ ) {
		Property.set("class"+c-1,sliceLabel[c-1]);
	}

	selectImage(probs1);
	run("Duplicate...", "duplicate");
	rename("MYPROBS");
	setBatchMode("hide");
	selectImage(probs2);
	setBatchMode("hide");


   //here starts the actual calculation
   //now start classifying the pixels in each slices
	for( s = 1; s <= nrOfSlices; s++ )
		{

	 		showProgress(s, nrOfSlices);
	 		selectImage("MYPROBS");
			setSlice( s );
	  
			print("Slice ", s);
            //selectImage("MYPROBS");
			for( x=0; x<probs_width; x++ ){
	  			for( y=0; y<probs_height; y++ ){
	  			    maxProb = 0;
	  			    cmax = 0;
	  			    selectImage("MYPROBS");  //takes less time than selectWindow CHECK CHECK CHECK "probs" or probs  					
	  				for( c = 1; c <= nClasses; c++ ) {
	  					
	  				    frame = 0;
	  					Stack.setPosition(c, s, frame);
	  					if (getValue(x,y) > maxProb) { maxProb=getValue(x,y); cmax = c;}

	  					}	  					

                    //now get the probability from the second probs stack and convolute
                    selectImage(probs2);
                    wetClass = 0;
                    dryClass = 1;
                    bgClass = 2;
                    Stack.setPosition(wetClass+1, s, frame); wetProb = getValue(x,y);
                    Stack.setPosition(dryClass+1, s, frame); dryProb = getValue(x,y);
                    if (dryProb > wetProb) {
                    	
                    	Stack.setPosition(bgClass+1, s, frame);	bgProb = getValue(x,y);
                    	if (bgProb > dryProb) {
                    		cmax =3;
                    		} else {cmax=2;} 	
                    		
                    	}
	  					
	  				selectImage("Classified image");
	  				setSlice(s);
	  				setPixel(x, y, cmax-1);   //let pixel value in segemented stack start at 0 and correspond to channel-1. Then useful content in hist also starts at 0  
	  				//waitForUser("4 Inspect Image "+probs);
	  			
	  				}; 
	  			
	  			};
		}	

selectImage("MYPROBS");
close();

	
}; //END function probs_to_seg_combi

probs_to_seg_combi_.ijm (3.5 KB)

2 Likes

I think it will be either option 2 or option 3. At first I will give it a thought on how to cast my problem in to fit into the available basic operations …

[EDIT] It is almost embarrassing… when you say Java… is that Java or Javascript? Maybe I have to use Google to learn which of the two relates to a JIT-compiler …

1 Like

Consider using JavaScript, which is JIT-compiled. It runs the Sphere benchmark (Help>Examples>JavaScript>Sphere, with size=4096) 70 times faster that Python.

2 Likes

OK thanks. I am not well into Java or JavaScript. But if I am correct, JavaScript is closer to ij macro script. So, faster to learn for me…

In the mean time I am experimenting with CLIJ / CLIJ2 . Very fast , but I have to cast my problem into the available functions…

2 Likes

It’s confusing because they are similarly named, but completely different languages. I meant Java, but Wayne is right about using JavaScript. So you could try doing either or both of those approaches. The barrier to entry for JavaScript may be lower than Java, because it’s also a scripting language that you can develop inside ImageJ’s script editor. But either way (Java or JavaScript): it will take some time and effort to translate your IJM code, because the ImageJ macro functions are not directly available in the other script languages. On the other hand, the other script languages besides macro can directly call any of the ImageJ public API (Application Programming Interface), which ends up being, in general, more powerful than macro. People here can help with specific questions about how to translate this-or-that macro function into e.g. JavaScript code.

2 Likes

I have been thinking that it is possible to e.g. call a JavaScript routine from IJM script. Is that correct?
If I have to translate all of my IJM script, I will be thrown way back…

Yes! I should have mentioned that earlier. You can use eval("js", script); from the macro language to invoke JavaScript snippets.

Here is an example macro that does some per-pixel math normally:

Per_Pixel_Math.ijm
newImage("Untitled", "8-bit noise", 4000, 4001, 1);
start = getTime();
print("Calculating pixels...");

for (y=0; y<getHeight(); y++) {
	for (x=0; x<getWidth(); x++) {
		value = 128 * (cos(x / 20.0) + sin(y / 20.0) + 1);
		setPixel(x, y, value);
	}
}

end = getTime();
print("Calculation took " + ((end - start) / 1000.0) + " seconds")

On my system, it runs in 21.2 seconds, or ~1.32 microseconds per pixel.

And here is the same example, but with the per-pixel loop wrapped in JavaScript using eval:

Per_Pixel_Math_Optimized_with_JS.ijm
newImage("Untitled", "8-bit noise", 4000, 4001, 1);
start = getTime();
print("Calculating pixels...");

script = "" +
	"importClass(Packages.ij.IJ)\n" +
	"importClass(Packages.java.lang.Math)\n" +
	"imp = IJ.getImage();\n" +
	"ip = imp.getProcessor();\n" +
	"for (y=0; y<ip.getHeight(); y++) {\n" +
	"	for (x=0; x<ip.getWidth(); x++) {\n" +
	"		value = 128 * (Math.cos(x / 20.0) + Math.sin(y / 20.0) + 1);\n" +
	"		ip.setf(x, y, value);\n" +
	"	}\n" +
	"}\n" +
	"imp.updateAndDraw();\n";
eval("js", script);

end = getTime();
print("Calculation took " + ((end - start) / 1000.0) + " seconds")

On my system, this version runs in 3.2 seconds, or ~0.2 microseconds per pixel, more than 6x faster than pure IJM.

2 Likes

Hi @Fred_ckx,

If you are still willing to try out a mix with Python, you will see that all the loops and conditionals can be written in a very succinct way using Numpy which directly works at the matrix level. I believe the code below does more or less the same thing as your main loop which mixes the two probability maps:

import numpy as np

## This first part just sets-up some fake data with appropriate dimensions
# probs1 and probs2 are the probability maps

nbpixelsx = 30
nbpixelsy = 50
nbchannels1 = 4
nbchannels2 = 3
nbslices = 10

wetClass = 0;
dryClass = 1;
bgClass = 2;

probs1 = np.random.rand(nbpixelsx,nbpixelsy,nbchannels1,nbslices)
probs2 = np.random.rand(nbpixelsx,nbpixelsy,nbchannels2,nbslices)

## This implements your main loop (no guarantees)

# find channel with highest probability for all pixels and slices in one go
# cmax is an array of nnbpixelsx * nbpixelsy * nbslices dimensions
cmax = np.argmax(probs1, axis=2)
# in the second probability map, locate pixels where dryClass > wetClass.
# output is a logical array that can be used for indexing
select_dry_wet = probs2[:, :, dryClass,:] > probs2[:, :, wetClass, :]
# same for bgClass > wetClass.
select_bg_dry = probs2[:, :, bgClass,:] > probs2[:, :, dryClass, :]
# using logical indexing, set all pixels in cmax where dryClass > wetClass to value 1
cmax[select_dry_wet] = 1
# using logical indexing, set all pixels in cmax where dryClass > wetClass AND bgClass > wetClass to value 2
# cmax has dimensions nb
cmax[select_dry_wet&select_bg_dry] = 2
# nbpixelsx * nbpixelsy * nbslices
cmax.shape

Cheers,
Guillaume

3 Likes

Hi everyone,

that’s a super interesting and cool discussion over here!

I didn’t know that. Wow! Thanks :slight_smile:

Custom OpenCL-code is very uncommon in CLIJ, but doable (thanks to the customOperation suggested by @maarzt). If you keep it simple, you can circumvent really learning OpenCL, that’s the language we use in #clij to manipulate pixels on the GPU. I’m also happy to assist @Fred_ckx.

Here comes an example, adapted from @ctruedens code:

start = getTime();
print("Calculating pixels...");

// initialize GPU
run("CLIJ2 Macro Extensions", "cl_device=2080");
Ext.CLIJ2_clear();

// create an image on the GPU
width = 4000;
height = 4001;
bit_depth = 8;
Ext.CLIJ2_create2D(image, width, height, bit_depth);

// define parameter images:
// name1, value1, name2, value2, ...
image_parameters = newArray("opencl_image", image);

// that's custom OpenCL code. Read more: https://clij.github.io/clij2-docs/reference_customOperation
script = "" +
	"float value = 128 * (cos(x / 20.0) + sin(y / 20.0) + 1); \n" +
	"WRITE_IMAGE(opencl_image, POS_opencl_image_INSTANCE(x, y, z, 0), CONVERT_opencl_image_PIXEL_TYPE(value));\n";

// execture OpenCL-Kernel
Ext.CLIJ2_customOperation(script, "", image_parameters);

// pull result image from GPU
Ext.CLIJ2_pull(image);

end = getTime();
print("Calculation took " + ((end - start) / 1000.0) + " seconds");

Executed twice on an NVidia RTX 2080 Ti, it outputs:
image

The first run is slower because the OpenCL code needs to be compiled. The second and subsequent runs are faster then. The script will also become a little slower when input images are pushed to the GPU. Rule of thumb: it takes one second per GB. Also please note: Single operations executed on the GPU may not pay off. It’s recommended to execute whole workflows with multiple steps.

Let me know if you need anthing :slight_smile:

Cheers,
Robert

1 Like

Wow, a tsunami of support :slight_smile: thank you Wayne, Curtis, Guillaume, Robert!

Some of my questions via this forum, I would have been trying to answer by reading the docs. But that was unfortunately not possible because the site is down.

Your answers go beyond what I could have found myself! I have to digest the suggestions. At first I need a solution runs reasonably fast compared with the probably hours with my initial attempt. The other thing is that I want to re-use my existing code (except for the part that we are discussing here).

Give me some time to digest. I will probably be back with the next hurdle to take…

Fred

2 Likes

Hi Robert,

I have been gazing at CLIJ2_argMaximumZProjection and tried to apply it to my stack. I have to swap c and z variables first and then I note that the result is a single image. If I want to handle a stack, do I have to call the function for each slice individually? So that would require one external loop.

Is it possible to write a custom function that handles a multidemensional stack in one go?

Thanks
Fred

2 Likes

Hi Fred @Fred_ckx,

you’re right, CLIJ does not directly support multi-channel timelapse data. That’s because of optimization in OpenCL towards distributing tasks for processing 2D and 3D data. In order to process multiple stacks (in time or different channels), I recommend using the pushCurrentZStack method. Here comes an example for processing the mitosis data set which is 5D:

Let me know if you need anything else.

Cheers,
Robert

2 Likes

OK, so this is the way to push the right substack to the GPU. Can you please comment a bit on taking values from two of these input substacks (input1 and input2) and calculating some results that go to a stack output.

The last part you have already shown, I believe. If I’m correct (hard to read your message, while typing this one ) the results in your example of a custom CLIJ routine depend only on pixel coordinates x and y and not on pixel values of inputs.

It would help even more if input1 and input2 were added to the GPU with CLIJ2_pushCurrentZStack. So I can learn how to access / readpixel values from such stacks.

I have to discover still if I can do my job with the maximum projection methods alone or that I have to write a custom routine…

1 Like

Hey @Fred_ckx.

Sure. You push two stacks, and process them:


// push two channel stacks to the GPU
Stack.setChannel(1);
stack1 = getTitle();
Ext.CLIJ2_push(stack1);

Stack.setChannel(2);
stack2 = "stack2"; // we have to rename the stack because otherwise it would overwrite the first one
rename(stack2);
Ext.CLIJ2_push(stack2);

// determine the max of both
Ext.CLIJ2_maximumZProjection(stack1, max1);
Ext.CLIJ2_maximumZProjection(stack2, max2);

// at that point max1 and max2 exist and can be combined, e.g. you can determine the which channel was higher in the maximum projection
Ext.CLIJ2_greater(max1, max2, binary_max1_larger_than_max2);

Edit: The rename step is tricky. If you push the same stack twice, the second overwrites the first. One could copy it in the GPU (which costs time) or rename it on the CPU before pushing…

That’s more or less what you’re trying to do right? Determining which channel (probability) was greater than the other…

If you want to process three channels (you original code suggests that), consider the argMaximumZProjection (you found already), e.g. in combination with getDimensions create3D and copySlice:

// create a new stack as large as the maximum projections in X and Y with 3 slices:
Ext.CLIJ2_getDimensions(max1, width, height, depth);
num_slices = 3;
bit_depth = 32;
Ext.CLIJ2_create3D(max_collection, width, height, num_slices, bit_depth);

// fill these three slices with the three maximum projections
Ext.CLIJ2_copySlice(max1, max_collection, 0);
Ext.CLIJ2_copySlice(max2, max_collection, 1);
Ext.CLIJ2_copySlice(max3, max_collection, 2);

// determine the slice with the highest maximum pixel-by-pixels
Ext.CLIJ2_argMaximumZProjection(max_collection, collection_max, collection_arg_max);

// show result
Ext.CLIJ2_pull(collection_arg_max);

I know pushing pixels around on the GPU feels a bit like going around your elbow to get to your ear. But it should still be super fast because it’s minimalistic operations.

Does it do what you’re trying to achieve?

Cheers,
Robert

1 Like

Yes and no:-)

This helps when I succeed to express the desired transformation such that it can be handled with these stack-wise operations. I will aim for that as I guess that it is advantageous to use non-custom operations on stacks, speedwise.

It may however be necessary to be more flexible and use a custom script acting on both input stacks. In a previous message you gave this example:

// that's custom OpenCL code. Read more: https://clij.github.io/clij2-docs/reference_customOperation
script = "" +
	"float value = 128 * (cos(x / 20.0) + sin(y / 20.0) + 1); \n" +
	"WRITE_IMAGE(opencl_image, POS_opencl_image_INSTANCE(x, y, z, 0), CONVERT_opencl_image_PIXEL_TYPE(value));\n";

I still need to read the docs that you suggest… If I understand it well, in this example the value result does only depend on x and y. How would it look if value were to depend on pixel values in two input stacks.

In other words: how do I read a pixel value from input stacks, use that in an expression for value and then put the result in the output stack? Maybe the answer is in the suggested docs. I am going to read them! [EDIT, Yes it is! I see a READ command. And in fact I already ran the example code successfully yesterday :-)]

I hope that doing things like that does not completely destroy the advantage of GPU processing.

1 Like