CLIJ does not seem to be using GPU during test

Hi @haesleinhuepf !

I finally got to try CLIJ on my computer. I have a large collection of images to crunch, so I saw a good opportunity to give it a shot and test it out. I have an ASUS VivoBook S running Windows 10, with an NVIDIA GeForce 940MX (I also have an on-board Intel GPU).

The thing is, I could not get past the initial tests. When I run a benchmarking test (two 1000-iteration loops, one on the GPU, one on the CPU), CLIJ takes 10+ times longer than the standard CPU processing, and all the load seems to be on the CPU during the whole test.

Here’s a couple of shots from the results for repeated 2° rotation (I know…) and 10 px Gaussian blur:

Here’s the test code:

run("Close All");
run("Blobs (25K)");
run("32-bit"); 
run("Invert LUT");

time=getTime();
run("CLIJ Macro Extensions", "cl_device=[GeForce 940MX]"); // GeForce 940MX
Ext.CLIJ_clear();
title=getTitle();
title2=title+"_r";
Ext.CLIJ_push(title);
for (i = 0; i < 1000; i++) {
	Ext.CLIJ_rotate2D(title, title2, 2, true);  
	//Ext.CLIJ_blur2D(title, title2, 10, 10); // alternative for blur
	Ext.CLIJ_copy(title2, title);
	}
Ext.CLIJ_pull(title2);
rename("GPU_img");

print("CLIJ:", getTime()-time, "ms");

selectWindow(title);

time=getTime();

setBatchMode(true);
run("Duplicate...", "title=CPU_img");
for (i = 0; i < 1000; i++) {
	run("Rotate... ", "angle=2 grid=1 interpolation=Bilinear"); 
	//run("Gaussian Blur...", "sigma=10"); // alternative for blur
	}
setBatchMode(false);
print("CPU:", getTime()-time, "ms");
resetMinAndMax();
run("Tile");

Forcing the Intel GPU instead of GeForce doesn’t seem to change anything.

Am I missing something in the setup?

Thanks a lot!
Nico

Hey @NicoDF,

thanks for testing! You know, I like feedback :star_struck:

My first guess is that you are working with very small images, where the GPU cannot outperform the CPU (blobs.gif is only 64 kB). I tried your macro on my Intel UHD 620 GPU and its Intel i7-8650U CPU (the test laptop from the paper, but in battery mode) and added a line after loading blobs.gif to test it with a bigger image:

run("Scale...", "x=10 y=10 width=2560 height=2540 interpolation=Bilinear average create");

Furthermore, I had to turn down the number of rotations to 100, because the CPU took sooo long :wink:
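Put together, the adapted benchmark might look roughly like the sketch below. It only reuses calls from the original macro plus the scaling line; the variables `n` and `device` are introduced here for illustration, and the device name has to be replaced with one that exists on your system.

```ijm
// Sketch of the adapted benchmark. Assumptions: n and device are
// illustrative variables; set device to a GPU available on your system.
n = 100;                   // fewer rounds, since the CPU loop is slow on large images
device = "GeForce 940MX";  // placeholder taken from the original macro

run("Close All");
run("Blobs (25K)");
// enlarge the 256x254 blobs image tenfold (creates a new, active window)
run("Scale...", "x=10 y=10 width=2560 height=2540 interpolation=Bilinear average create");
run("32-bit");
run("Invert LUT");
title = getTitle();
title2 = title + "_r";

// GPU: push once, loop on the device, pull once
time = getTime();
run("CLIJ Macro Extensions", "cl_device=[" + device + "]");
Ext.CLIJ_clear();
Ext.CLIJ_push(title);
for (i = 0; i < n; i++) {
	Ext.CLIJ_rotate2D(title, title2, 2, true);
	Ext.CLIJ_copy(title2, title);
}
Ext.CLIJ_pull(title2);
rename("GPU_img");
print("CLIJ:", getTime() - time, "ms");

// CPU: same number of rotations with the built-in command
selectWindow(title);
time = getTime();
setBatchMode(true);
run("Duplicate...", "title=CPU_img");
for (i = 0; i < n; i++) {
	run("Rotate... ", "angle=2 grid=1 interpolation=Bilinear");
}
setBatchMode(false);
print("CPU:", getTime() - time, "ms");
```

Note that the CLIJ timing here still includes device initialization and the push/pull transfers, just as in the original macro.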

The log window then shows these timings:
image

The explanation might be this: very small images fit into the CPU’s cache, which has a high access speed. The CPU then does not need to access RAM while processing the image and thus outperforms the GPU. You can see that a bit in the benchmark plot for the Rotate2D method:
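As a back-of-the-envelope check of the cache argument, the memory footprint of both images can be estimated from their dimensions. The sketch assumes 32-bit pixels (4 bytes each, as after `run("32-bit")`) and takes 256×254 as the size of blobs.gif, which follows from the 2560×2540 result of the tenfold scaling.

```ijm
// Rough memory footprint of the test images, assuming 4 bytes per pixel.
bytesPerPixel = 4;
w = 256;   // blobs.gif width
h = 254;   // blobs.gif height
small = w * h * bytesPerPixel;                 // original size
large = (10 * w) * (10 * h) * bytesPerPixel;   // after scaling by 10 in x and y
print("blobs, 32-bit:", small / 1024, "KiB");
print("blobs x10, 32-bit:", large / (1024 * 1024), "MiB");
```

The original comes out at about 254 KiB, which can plausibly sit in the cache of a laptop CPU, while the scaled image needs about 24.8 MiB, which is more than the few megabytes of cache such CPUs typically have.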

Read more about that in the CLIJ FAQ.

Let me know if it works with bigger images on your system.

I hope that helps :slightly_smiling_face:

Cheers,
Robert

Addendum: With this adaptation in the macro, you should then see a peak in the GPU load:
image

Yes, indeed! Image size was the key. Now we are talking! :grin:

I’m a little bit curious about the small size of the GPU blip in the performance graph (while CPU gets a higher, sustained hit). Still, this doesn’t change the actual performance. :wink:

I’m going to run some more tests, and I’ll keep you posted.

Thanks a lot!
Nico

You’re welcome. Regarding the blip in the GPU usage: rotating images hardly involves any computation. In classical image processing in general, CPUs and GPUs are pretty bored. It’s all about reading and writing pixels. Only the RAM is busy. If you find a computationally heavy algorithm that might be worth implementing on the GPU, let me know :wink:
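To get a feeling for how much pixel traffic is involved, here is a rough estimate for the GPU loop on the enlarged image. It is only a sketch under stated assumptions: a 2560×2540 image with 32-bit pixels, 100 rounds, each round reading and writing every pixel once for the rotation and once for the copy, and caches ignored.

```ijm
// Rough estimate of memory traffic in the GPU loop. Assumptions:
// 2560x2540 image, 4 bytes per pixel, 100 rounds, caches ignored.
bytesPerImage = 2560 * 2540 * 4;
passesPerRound = 4;   // rotate2D: read + write, copy: read + write
rounds = 100;
totalBytes = bytesPerImage * passesPerRound * rounds;
print("Data moved (rough estimate):", totalBytes / (1024 * 1024 * 1024), "GiB");
```

That is on the order of 10 GB of data moved, against only a handful of arithmetic operations per pixel for the bilinear interpolation.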

Cheers,
Robert
