Python scripts and ImageJ / Fiji - Speed up pixelwise calculations

Yes! I should have mentioned that earlier. You can use eval("js", script); from the macro language to invoke JavaScript snippets.

Here is an example macro that does some per-pixel math normally:

Per_Pixel_Math.ijm
newImage("Untitled", "8-bit noise", 4000, 4001, 1);
start = getTime();
print("Calculating pixels...");

for (y=0; y<getHeight(); y++) {
	for (x=0; x<getWidth(); x++) {
		value = 128 * (cos(x / 20.0) + sin(y / 20.0) + 1);
		setPixel(x, y, value);
	}
}

end = getTime();
print("Calculation took " + ((end - start) / 1000.0) + " seconds")

On my system, it runs in 21.2 seconds, or ~1.32 microseconds per pixel.

And here is the same example, but with the per-pixel loop wrapped in JavaScript using eval:

Per_Pixel_Math_Optimized_with_JS.ijm
newImage("Untitled", "8-bit noise", 4000, 4001, 1);
start = getTime();
print("Calculating pixels...");

script = "" +
	"importClass(Packages.ij.IJ)\n" +
	"importClass(Packages.java.lang.Math)\n" +
	"imp = IJ.getImage();\n" +
	"ip = imp.getProcessor();\n" +
	"for (y=0; y<ip.getHeight(); y++) {\n" +
	"	for (x=0; x<ip.getWidth(); x++) {\n" +
	"		value = 128 * (Math.cos(x / 20.0) + Math.sin(y / 20.0) + 1);\n" +
	"		ip.setf(x, y, value);\n" +
	"	}\n" +
	"}\n" +
	"imp.updateAndDraw();\n";
eval("js", script);

end = getTime();
print("Calculation took " + ((end - start) / 1000.0) + " seconds")

On my system, this version runs in 3.2 seconds, or ~0.2 microseconds per pixel, more than 6x faster than pure IJM.


Hi @Fred_ckx,

If you are still willing to try out a mix with Python, you will see that all the loops and conditionals can be written in a very succinct way using Numpy which directly works at the matrix level. I believe the code below does more or less the same thing as your main loop which mixes the two probability maps:

import numpy as np

## This first part just sets up some fake data with appropriate dimensions
# probs1 and probs2 are the probability maps

nbpixelsx = 30
nbpixelsy = 50
nbchannels1 = 4
nbchannels2 = 3
nbslices = 10

wetClass = 0
dryClass = 1
bgClass = 2

probs1 = np.random.rand(nbpixelsx, nbpixelsy, nbchannels1, nbslices)
probs2 = np.random.rand(nbpixelsx, nbpixelsy, nbchannels2, nbslices)

## This implements your main loop (no guarantees)

# find the channel with the highest probability for all pixels and slices in one go;
# cmax is an array of dimensions nbpixelsx * nbpixelsy * nbslices
cmax = np.argmax(probs1, axis=2)
# in the second probability map, locate pixels where dryClass > wetClass;
# the output is a logical array that can be used for indexing
select_dry_wet = probs2[:, :, dryClass, :] > probs2[:, :, wetClass, :]
# same for bgClass > dryClass
select_bg_dry = probs2[:, :, bgClass, :] > probs2[:, :, dryClass, :]
# using logical indexing, set all pixels in cmax where dryClass > wetClass to value 1
cmax[select_dry_wet] = 1
# set all pixels where dryClass > wetClass AND bgClass > dryClass to value 2
cmax[select_dry_wet & select_bg_dry] = 2
# cmax has dimensions nbpixelsx * nbpixelsy * nbslices
cmax.shape

Cheers,
Guillaume


Hi everyone,

that’s a super interesting and cool discussion over here!

I didn’t know that. Wow! Thanks :slight_smile:

Custom OpenCL code is very uncommon in CLIJ, but doable (thanks to the customOperation suggested by @maarzt). If you keep it simple, you can get around really learning OpenCL, the language we use in #clij to manipulate pixels on the GPU. I’m also happy to assist @Fred_ckx.

Here comes an example, adapted from @ctrueden's code:

start = getTime();
print("Calculating pixels...");

// initialize GPU
run("CLIJ2 Macro Extensions", "cl_device=2080");
Ext.CLIJ2_clear();

// create an image on the GPU
width = 4000;
height = 4001;
bit_depth = 8;
Ext.CLIJ2_create2D(image, width, height, bit_depth);

// define parameter images:
// name1, value1, name2, value2, ...
image_parameters = newArray("opencl_image", image);

// that's custom OpenCL code. Read more: https://clij.github.io/clij2-docs/reference_customOperation
script = "" +
	"float value = 128 * (cos(x / 20.0) + sin(y / 20.0) + 1); \n" +
	"WRITE_IMAGE(opencl_image, POS_opencl_image_INSTANCE(x, y, z, 0), CONVERT_opencl_image_PIXEL_TYPE(value));\n";

// execute the OpenCL kernel
Ext.CLIJ2_customOperation(script, "", image_parameters);

// pull result image from GPU
Ext.CLIJ2_pull(image);

end = getTime();
print("Calculation took " + ((end - start) / 1000.0) + " seconds");

Executed twice on an NVidia RTX 2080 Ti, it outputs:
[image: Log window showing the timings of both runs]

The first run is slower because the OpenCL code needs to be compiled. The second and subsequent runs are faster. The script will also become a little slower when input images are pushed to the GPU. Rule of thumb: it takes one second per GB. Also please note: single operations executed on the GPU may not pay off. It’s recommended to execute whole workflows with multiple steps.

Let me know if you need anything :slight_smile:

Cheers,
Robert


Wow, a tsunami of support :slight_smile: thank you Wayne, Curtis, Guillaume, Robert!

Some of the questions I asked on this forum I would have tried to answer by reading the docs, but unfortunately that was not possible because the site is down.

Your answers go beyond what I could have found myself! I have to digest the suggestions. First of all I need a solution that runs reasonably fast, compared with the hours my initial attempt would probably take. The other thing is that I want to re-use my existing code (except for the part that we are discussing here).

Give me some time to digest. I will probably be back with the next hurdle to take…

Fred


Hi Robert,

I have been looking at CLIJ2_argMaximumZProjection and tried to apply it to my stack. I have to swap the c and z axes first, and then I note that the result is a single image. If I want to handle a stack, do I have to call the function for each slice individually? That would require one external loop.

Is it possible to write a custom function that handles a multidimensional stack in one go?

Thanks
Fred


Hi Fred @Fred_ckx,

you’re right, CLIJ does not directly support multi-channel time-lapse data. That’s because CLIJ is optimized towards distributing OpenCL tasks for processing 2D and 3D data. In order to process multiple stacks (in time or across channels), I recommend using the pushCurrentZStack method. Here comes an example for processing the mitosis data set, which is 5D:
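A sketch of such a per-frame, per-channel loop, assuming the CLIJ2 Macro Extensions are installed (untested; the maximum projection is only a placeholder processing step):

// open the 5D mitosis sample first, e.g. via File / Open Samples / Mitosis
run("CLIJ2 Macro Extensions", "cl_device=");
Ext.CLIJ2_clear();

input = getTitle();
getDimensions(width, height, channels, slices, frames);
for (f = 1; f <= frames; f++) {
	for (c = 1; c <= channels; c++) {
		// re-activate the input window, select channel and frame on the CPU side
		selectWindow(input);
		Stack.setFrame(f);
		Stack.setChannel(c);
		// push the z-stack of the currently selected channel/frame to the GPU
		Ext.CLIJ2_pushCurrentZStack(input);
		// placeholder processing step: maximum projection of the pushed stack
		Ext.CLIJ2_maximumZProjection(input, maximum);
		Ext.CLIJ2_pull(maximum);
	}
}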

Let me know if you need anything else.

Cheers,
Robert


OK, so this is the way to push the right substack to the GPU. Can you please comment a bit on taking values from two of these input substacks (input1 and input2) and calculating some results that go to an output stack?

The last part you have already shown, I believe. If I’m correct (it is hard to read your message while typing this one), the results in your example of a custom CLIJ routine depend only on the pixel coordinates x and y, and not on the pixel values of any inputs.

It would help even more if input1 and input2 were added to the GPU with CLIJ2_pushCurrentZStack, so I can learn how to access / read pixel values from such stacks.

I still have to find out whether I can do my job with the maximum projection methods alone, or whether I have to write a custom routine…


Hey @Fred_ckx.

Sure. You push two stacks, and process them:


// push two channel stacks to the GPU
Stack.setChannel(1);
stack1 = getTitle();
Ext.CLIJ2_push(stack1);

Stack.setChannel(2);
stack2 = "stack2"; // we have to rename the stack because otherwise it would overwrite the first one
rename(stack2);
Ext.CLIJ2_push(stack2);

// determine the max of both
Ext.CLIJ2_maximumZProjection(stack1, max1);
Ext.CLIJ2_maximumZProjection(stack2, max2);

// at that point max1 and max2 exist and can be combined, e.g. you can determine which channel was higher in the maximum projection
Ext.CLIJ2_greater(max1, max2, binary_max1_larger_than_max2);

Edit: The rename step is tricky. If you push the same stack twice, the second push overwrites the first. One could copy it on the GPU (which costs time) or rename it on the CPU before pushing…
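For completeness, the copy-on-GPU variant could look like this (a sketch mirroring the push pattern above; it costs extra GPU memory and time):

// duplicate the first pushed channel on the GPU instead of renaming on the CPU
Stack.setChannel(1);
title = getTitle();
Ext.CLIJ2_push(title);
Ext.CLIJ2_copy(title, channel1); // keep channel 1 under a new GPU image name
Stack.setChannel(2);
Ext.CLIJ2_push(title); // this push overwrites the GPU image named after the window
// from here on, 'channel1' holds channel 1 and 'title' holds channel 2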

That’s more or less what you’re trying to do, right? Determining which channel (probability) was greater than the other…

If you want to process three channels (your original code suggests that), consider the argMaximumZProjection (which you already found), e.g. in combination with getDimensions, create3D and copySlice:

// create a new stack as large as the maximum projections in X and Y with 3 slices:
Ext.CLIJ2_getDimensions(max1, width, height, depth);
num_slices = 3;
bit_depth = 32;
Ext.CLIJ2_create3D(max_collection, width, height, num_slices, bit_depth);

// fill these three slices with the three maximum projections
Ext.CLIJ2_copySlice(max1, max_collection, 0);
Ext.CLIJ2_copySlice(max2, max_collection, 1);
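// max3 is assumed to be the maximum projection of a third channel, pushed and projected in the same way as max1 and max2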
Ext.CLIJ2_copySlice(max3, max_collection, 2);

// determine the slice with the highest maximum, pixel by pixel
Ext.CLIJ2_argMaximumZProjection(max_collection, collection_max, collection_arg_max);

// show result
Ext.CLIJ2_pull(collection_arg_max);

I know pushing pixels around on the GPU feels a bit like going around your elbow to get to your ear. But it should still be super fast because these are minimalistic operations.

Does it do what you’re trying to achieve?

Cheers,
Robert


Yes and no :-)

This helps if I succeed in expressing the desired transformation such that it can be handled with these stack-wise operations. I will aim for that, as I guess that using non-custom operations on stacks is advantageous speed-wise.

It may however be necessary to be more flexible and use a custom script acting on both input stacks. In a previous message you gave this example:

// that's custom OpenCL code. Read more: https://clij.github.io/clij2-docs/reference_customOperation
script = "" +
	"float value = 128 * (cos(x / 20.0) + sin(y / 20.0) + 1); \n" +
	"WRITE_IMAGE(opencl_image, POS_opencl_image_INSTANCE(x, y, z, 0), CONVERT_opencl_image_PIXEL_TYPE(value));\n";

I still need to read the docs that you suggest… If I understand it correctly, in this example the resulting value depends only on x and y. How would it look if value were to depend on pixel values in two input stacks?

In other words: how do I read a pixel value from the input stacks, use it in an expression for value and then put the result in the output stack? Maybe the answer is in the suggested docs. I am going to read them! [EDIT: Yes it is! I see a READ command. And in fact I already ran the example code successfully yesterday :-)]

I hope that doing things like that does not completely destroy the advantage of GPU processing.


The documentation gives this example:

float value = READ_IMAGE(image, sampler, POS_image_INSTANCE(x, y, z, 0)).x; 
WRITE_IMAGE(image, POS_image_INSTANCE(x, y, z, 0), CONVERT_image_PIXEL_TYPE(value));
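Building on that, a sketch for combining two pushed input stacks into one output might look like the following (untested; the image names, the 32-bit output and the simple comparison are placeholder assumptions):

// push two probability stacks and create an output image of the same size
input1 = "probs1"; // assumed window titles of the two input stacks
input2 = "probs2";
Ext.CLIJ2_push(input1);
Ext.CLIJ2_push(input2);
Ext.CLIJ2_getDimensions(input1, width, height, depth);
bit_depth = 32;
Ext.CLIJ2_create3D(output, width, height, depth, bit_depth);

// name1, value1, name2, value2, ... as in the earlier customOperation example
image_parameters = newArray("src1", input1, "src2", input2, "dst", output);

script = "" +
	"float a = READ_IMAGE(src1, sampler, POS_src1_INSTANCE(x, y, z, 0)).x;\n" +
	"float b = READ_IMAGE(src2, sampler, POS_src2_INSTANCE(x, y, z, 0)).x;\n" +
	"float value = (a > b) ? 1 : 2;\n" +
	"WRITE_IMAGE(dst, POS_dst_INSTANCE(x, y, z, 0), CONVERT_dst_PIXEL_TYPE(value));\n";
Ext.CLIJ2_customOperation(script, "", image_parameters);
Ext.CLIJ2_pull(output);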

Let us know if you get stuck.


OK, thanks. I think that it is time to do some coding and testing!


@haesleinhuepf

Just to let you know that I am making progress! I have intermediate CLIJ code in place that handles a single probabilities stack, which I can benchmark against my existing IJM code (for speed but, more importantly right now, for result quality). The code returns results very fast!

I'll just show you the work in progress. Maybe you have some CLIJ-related comments? :slight_smile:

I am still struggling with slices vs frames vs channels… which is not really CLIJ related.

I have to re-order my input stack as the CLIJ function finds the maximum over the z-axis and my input has probabilities on the c-axis (channels).

To make things even more complicated: in the very beginning I import movie frames with File / Import / Image Sequence, which means that the z-axis (slices) in my case is in fact a time axis. I do not want to re-order right after import, otherwise I would have to change many things in my code. I’ll figure out which re-ordering step gives me the desired labeling (max. prob) result.
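Something like the following, recorded from Image / Hyperstacks / Re-order Hyperstack…, might do the c/z swap (the exact parameter strings may differ between ImageJ versions):

// swap channels and slices so that the probabilities end up on the z-axis
run("Re-order Hyperstack ...", "channels=[Slices (z)] slices=[Channels (c)] frames=[Frames (t)]");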

FredCLI_label.ijm (2.8 KB)


Hey @Fred_ckx,

thanks for the update. So far your code looks great! Also happy that it’s fast on your Intel HD 4000.

Just two minor comments: After activating the CLIJ2 Macro Extensions, I would call Ext.CLIJ2_clear(); see here why.
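In macro terms, that is just (a minimal snippet, with the device string left as a placeholder):

run("CLIJ2 Macro Extensions", "cl_device=");
Ext.CLIJ2_clear(); // start from a clean GPU memory state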

After 25 years of coding I can tell you: if you see that something is “wrong” in your code (e.g. z is not z, but time), change it. Change it as early as possible. The time you invest now will be saved later on. As of yet, your script has only 87 lines.

Let us know if you need further support.

Cheers,
Robert


Thanks for your comments!

You are absolutely right about making things right as early as possible. I’ll give it a go: make a branch and try whether a swift “find and replace” of the keyword “slice” is sufficient for a translation. I am afraid that it is not… and I have some 2000 lines of code.

I am pretty new to ImageJ, so I saw only one option to import movie frames (File / Import / Image Sequence…). I discovered re-ordering much later…

Thanks again,

After fixing the stack-order issue I hope that I can show you some two-probability-stack results, whether with standard routines or with a custom one.

Now that I am writing this, I think it may be a good idea to start with the standard routines, even if they don't give the desired result. Then switch to “custom”, first reproducing the result of the standard routines and then adapting the custom version to get the desired result.


@haesleinhuepf

I have been scrolling through your excellent docs on CLIJ/CLIJ2 and see that you also implemented (are implementing?) WEKA routines.

  1. Are these faster than the normal plugins that I call from my IJM script?

  2. I also learn that the tools accept a 3D stack!!! Unfortunately I cannot check whether this is also the case for the normal plugin (plain or 3D). This opens up a perspective to solve my problem in a completely different way:

I have to deal with 2 WEKA-generated probability maps because I use 2 stacks of differently pre-processed movie frames. If the WEKA tool accepts 3D input with the differently pre-processed stacks in the third dimension (as I read from your docs), I can end up with only 1 probability stack where the properties of all pre-processing types have been taken into account.

What is the status of your WEKA routines? I ask this because I don’t see examples as with your other routines.

Cheers
Fred


Hey @Fred_ckx,

hehe, welcome to the dark side. The X in CLIJx stands for “eXperimental”. Feel free to use it, but use it with care. An example for clijx-weka can be found here.

Maybe you misunderstood the documentation. It mentions 3D feature stacks. As in Trainable Weka Segmentation, this means one 2D slice per feature. You will also see that training takes a 2D ground truth image only. But coming back to an earlier discussion: is your data actually 3D? Isn’t it 2D over time? You could then process it frame by frame - and you should, btw.

If you are in an experimental mood, you could take two 2D images out of your two input stacks, put them into a feature stack as two slices, add some more features derived from these images, and apply clijx-weka training for three classes. But again, the whole thing is in an experimental stage, so please treat everything with care.
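In macro terms, assembling such a two-slice feature stack might look like this (a sketch reusing getDimensions, create3D and copySlice from earlier; image1 and image2 are placeholders for the two pushed 2D images, and the clijx-weka training call itself is omitted because that API is still experimental):

// placeholders: the two pre-processed 2D images, already pushed to the GPU
image1 = "blurred_frame";
image2 = "subtracted_frame";

// put one 2D image from each input stack into a two-slice feature stack
Ext.CLIJ2_getDimensions(image1, width, height, depth);
bit_depth = 32;
Ext.CLIJ2_create3D(feature_stack, width, height, 2, bit_depth);
Ext.CLIJ2_copySlice(image1, feature_stack, 0);
Ext.CLIJ2_copySlice(image2, feature_stack, 1);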

Cheers,
Robert


It is probably my lack of understanding of the term feature stack. For me that is within the TWS black box :slight_smile:

Haha, I was asking for something where I could expect this answer. I need to be careful with using this. But let me investigate whether it is interesting to follow your progress on this :wink:

I will describe the basic objective and the alternative approaches that I now see. [This has become quite lengthy… excuse me].

My primary data is a movie, so a set of images (x, y) vs t. I preprocess those. The first way is that I only blur. That blurred stack (x, y, t) is the input for TWS. This gives me a segmented (x, y, t) stack. Because the quality of this segmentation is poor, I tried a second way of pre-processing: first I blur, then I subtract all frames from the first frame. This is powerful for detecting all changes with respect to the first frame.

Of course the WEKA tool is trained differently for both inputs: I have two arff files and two model files.

My subject is a microfluidic device. This accepts a liquid that can also change (liquid1, liquid2).
The first set delivers the pixel classes liquid1, liquid2, dry (and background), but the quality of the segmentation is poor. Segmentation of the second image stack delivers the classes wet (so liquid1 or liquid2) and dry. This segmentation is very reliable, but it cannot distinguish liquid1 and liquid2.

I saw 3 ways to combine both ways of pre-processing:

  1. take both segmented stacks and do a logical operation based on wet equals liquid1 or liquid2

  2. create two probability stacks x y c1 t and x y c2 t (c1 and c2 contain different classes) and do the logical operation (this is the basis for our ongoing discussion).

  3. feed TWS with two input stacks x y t. These two could in fact form a higher-dimensional stack x, y, c, t (3D + time). This route takes only one training, one .arff and one .model file.

If TWS does not accept such a stack, I could use x, y, c and do t in a loop?

What we have been discussing until now is route 2), where I let TWS produce the two probability maps instead of the directly segmented stack and then combine both probability maps to deliver a segmented stack.

Cheers
Fred


" All this could be done with Fijis Trainable Weka Segmentation as well btw"

And what does your routine add to that? Is it intrinsically faster than Fiji's TWS?

You should gain some basic understanding of the algorithms you use, e.g. by reading the documentation, by watching YouTube videos, or at least by taking a look at the feature stack of your data. In TWS, feature selection is key. You can improve segmentation quality, but also processing speed, by selecting the right features.

If you process an x-y-t stack in TWS 3D, it will generate features over time. Not sure if this is intentional. Not sure if this is good. Do you want to blur over time? What does a Difference-of-Gaussian in time look like? (-:

That’s a pretty cool strategy!

I removed my comment because it’s actually not true. You cannot combine two images in a feature stack in TWS. Sorry for the confusion.

Feature stack generation is potentially faster on the GPU. Furthermore we’re working on doing the prediction on the GPU as well. This will remove the need for push/pull and save even more time. :wink:

Cheers,
Robert


Thanks again, also for the links.

Bottom line: I think that I will first finish the approach that we have been discussing. I am pretty close to a result!

No, I don’t want information from separate time frames to contribute to a result, so I should be careful about applying 3D segmentation.

Fred