CLIJ resource errors when handling large numbers of files

Hello all,

I have been building an ImageJ pipeline to analyze histology images for scar. It works, but it was taking quite a while to run, since I have ~5000 stitched mosaic images spread across ~80 directories. I recently became aware of CLIJ, got it up and running, and was able to accelerate most of the pipeline functions that CLIJ implements (Otsu autothresholding, erode/dilate binary operations, image subtraction, and image downsampling). I tested it on a sample of 2 image directories and it worked beautifully (running ~3X as fast as without CLIJ).

However, when I run the pipeline on the full dataset, I get a resource error after it processes ~3 directories’ worth of images (log output below). After this, it doesn’t fully crash; it just starts outputting blank images. If I kill the script, restart ImageJ, and continue where it left off, it runs for another 2-3 directories before it throws the same error again. This doesn’t seem to be a problem with specific directories, since after a restart it runs fine again for a couple of directories before breaking down again.

I am not sure whether this is a problem with resource handling/releasing in CLIJ itself, or whether there is some command I should be running periodically in my code to help clean everything up. At the conclusion of each push/pull sequence for CLIJ, I also run “Ext.CLIJ_clear();” to clear the buffer. So, in theory, that should not be the issue unless the memory isn’t actually being released properly.

Any advice would be much appreciated. The pipeline with CLIJ is so much faster than the pipeline without it, but it is a bit annoying to have to keep splitting my images into sets of 3 directories, running it, and then pointing it at 3 new directories.

I am running ImageJ 1.52p on Windows 10. Thanks in advance for any advice here.

Error when trying to create kernel histogram_image_2d
net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -5 -> CL_OUT_OF_RESOURCES
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkOpenCLErrorCode(BackendUtils.java:352)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.lambda$getKernelPeerPointer$19(ClearCLBackendJOCL.java:596)
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkExceptions(BackendUtils.java:156)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.getKernelPeerPointer(ClearCLBackendJOCL.java:588)
	at net.haesleinhuepf.clij.clearcl.ClearCLCompiledProgram.createKernel(ClearCLCompiledProgram.java:137)
	at net.haesleinhuepf.clij.clearcl.ClearCLProgram.createKernel(ClearCLProgram.java:684)
	at net.haesleinhuepf.clij.utilities.CLKernelExecutor.getKernel(CLKernelExecutor.java:470)
	at net.haesleinhuepf.clij.utilities.CLKernelExecutor.enqueue(CLKernelExecutor.java:303)
	at net.haesleinhuepf.clij.CLIJ.lambda$execute$0(CLIJ.java:272)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:28)
	at net.haesleinhuepf.clij.CLIJ.execute(CLIJ.java:249)
	at net.haesleinhuepf.clij.kernels.Kernels.fillHistogram(Kernels.java:1372)
	at net.haesleinhuepf.clij.kernels.Kernels.automaticThreshold(Kernels.java:386)
	at net.haesleinhuepf.clij.kernels.Kernels.automaticThreshold(Kernels.java:368)
	at net.haesleinhuepf.clij.macro.modules.AutomaticThreshold.executeCL(AutomaticThreshold.java:32)
	at net.haesleinhuepf.clij.macro.CLIJHandler.lambda$handleExtension$0(CLIJHandler.java:144)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:28)
	at net.haesleinhuepf.clij.macro.CLIJHandler.handleExtension(CLIJHandler.java:51)
	at ij.macro.ExtensionDescriptor.dispatch(ExtensionDescriptor.java:288)
	at ij.macro.Functions.doExt(Functions.java:4787)
	at ij.macro.Functions.getStringFunction(Functions.java:276)
	at ij.macro.Interpreter.getStringTerm(Interpreter.java:1407)
	at ij.macro.Interpreter.getString(Interpreter.java:1385)
	at ij.macro.Interpreter.doStatement(Interpreter.java:329)
	at ij.macro.Interpreter.doBlock(Interpreter.java:671)
	at ij.macro.Interpreter.doStatement(Interpreter.java:320)
	at ij.macro.Interpreter.doFor(Interpreter.java:593)
	at ij.macro.Interpreter.doStatement(Interpreter.java:302)
	at ij.macro.Interpreter.doBlock(Interpreter.java:671)
	at ij.macro.Interpreter.doStatement(Interpreter.java:320)
	at ij.macro.Interpreter.doFor(Interpreter.java:593)
	at ij.macro.Interpreter.doStatement(Interpreter.java:302)
	at ij.macro.Interpreter.doStatements(Interpreter.java:261)
	at ij.macro.Interpreter.run(Interpreter.java:157)
	at ij.macro.Interpreter.run(Interpreter.java:91)
	at ij.macro.Interpreter.run(Interpreter.java:102)
	at ij.plugin.Macro_Runner.runMacro(Macro_Runner.java:161)
	at ij.IJ.runMacro(IJ.java:148)
	at ij.IJ.runMacro(IJ.java:137)
	at net.imagej.legacy.IJ1Helper$3.call(IJ1Helper.java:1108)
	at net.imagej.legacy.IJ1Helper$3.call(IJ1Helper.java:1104)
	at net.imagej.legacy.IJ1Helper.runMacroFriendly(IJ1Helper.java:1055)
	at net.imagej.legacy.IJ1Helper.runMacro(IJ1Helper.java:1104)
	at net.imagej.legacy.plugin.IJ1MacroEngine.eval(IJ1MacroEngine.java:147)
	at org.scijava.script.ScriptModule.run(ScriptModule.java:160)
	at org.scijava.module.ModuleRunner.run(ModuleRunner.java:168)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:127)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:66)
	at org.scijava.thread.DefaultThreadService$3.call(DefaultThreadService.java:238)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Hey @xiphius,

that sounds like a solvable issue. Would you mind sharing the macro you programmed and some example images? Otherwise it’s hard to guess what might be the issue :wink:

Thanks!
Robert

Thanks for the quick response @haesleinhuepf ! I would be more than happy to share the macro and some example images for troubleshooting. I cannot upload them here because of size limitations, but have uploaded some examples to Box along with the CLIJ macro and a previous version of the macro that does not use CLIJ (and works regardless of the number of files/directories).

https://virginia.box.com/s/ygnz6mvhq6ag0fo374j0ne1lrfzk83wy

I included about 5 directories of images, which may be more than you need. Unfortunately, this makes the upload quite large, since each directory is ~10 GB. However, since the macro seems to run fine on anything less than 3 directories, I was unsure how else to reproduce the errors. The total dataset is about 80 directories. The macro expects/requires the following structure:

Image directory: …/mouse_number_folders/images(.tif)

Filenames: numeric code (4 digits)_slide number(4 digits)_specimen number(2 digits)

Expected outputs are an RGB thumbnail of the specimen for each image, a thumbnail of the binary mask for the detected region, and a log file containing the measurements.

*Apologies in advance if it is something stupid in my code that I have overlooked. It is probably far from as efficient as it could be (but at least there are comments throughout :stuck_out_tongue: ).


Hey @xiphius,

great. I will dive into this. One more question: What GPU are you using?

Thanks!

Cheers,
Robert

Awesome @haesleinhuepf.

The system I am using is equipped with an AMD Radeon R5 430 and also has an integrated Intel HD Graphics 630. I assume CLIJ is using the AMD, but am not completely sure.


Hey @xiphius,

if you open any CLIJ filter from the Plugins>ImageJ on GPU (CLIJ) menu, its dialog shows the default GPU.

Right away one more question. I’m not sure if I understand this piece of code:


//Use CLIJ to erode
for (o = 0; o < 14; o++) {
	Ext.CLIJ_push(in);
	Ext.CLIJ_erodeBox(in, out);
	Ext.CLIJ_pullBinary(out);
	Ext.CLIJ_clear();
	//Swap out the current image for the one processed by CLIJ
	selectWindow(duplTrans);
	close();
	selectWindow(binName);
	rename(duplTrans);
}

Do you think it might be the same as this?

	//Use CLIJ to erode
	Ext.CLIJ_push(in);
	for (o = 0; o < 7; o++) {
		Ext.CLIJ_erodeBox(in, out);
		Ext.CLIJ_erodeBox(out, in);
	}
	Ext.CLIJ_copy(in, out);
	Ext.CLIJ_pullBinary(out);
	selectWindow(duplTrans);
	close();
	selectWindow(binName);
	rename(duplTrans);

I’m asking because all the push and pull calls take time. If you keep the image in the GPU and process it there without pushing and pulling, it will become much faster. Furthermore, 14 box-erosions are actually the same as one minimum filter with radius 14:

	Ext.CLIJ_minimum2DBox(in, out, 14, 14);

So you might be able to replace the for loop with that single line…

Furthermore, all the “clear” calls might not be necessary. Actually, if all processed images have the same size or different names (and you give them names with the in and out variables), I would recommend removing the clear calls and reusing memory instead of clearing and re-allocating. We have a section on optimizing performance/speedup in our FAQ that you might find interesting.

I’ll run your macro now on a few thousand images and see if I can reproduce the issue with the error message. Stay tuned!

Thanks for the code so far, it looks like an interesting problem :slight_smile:

Cheers,
Robert


Interesting! It was actually using the Intel chip. Guess I should have been more careful about specifying the GPU in the macro.
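For reference, a minimal sketch of pinning the device when the macro extensions are initialized. The device string below is only a placeholder: it has to match (part of) the device name that CLIJ itself lists on the machine, which may differ from the marketing name of the card.

```ijm
// Initialize CLIJ on a specific OpenCL device instead of the default one.
// ASSUMPTION: "Radeon" is a hypothetical placeholder; replace it with the
// device name shown in the CLIJ dialogs on your system.
run("CLIJ Macro Extensions", "cl_device=[Radeon]");

// Start from an empty GPU memory state.
Ext.CLIJ_clear();
```

Leaving the value empty (`cl_device=`) falls back to whatever CLIJ picks as the default device.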

Also, yes. It looks like that is equivalent code. Like I mentioned, this is probably far from as efficient as it could be since I tried to quickly port over any functions I could from my existing macro (rather than writing a new one from scratch). I am also very new to the concept of GPU computing, so I didn’t know how to make things most efficient. I will definitely give that resource a read. Even with all of the inefficiencies, it was still running much faster than regular ImageJ. Thanks for all the helpful tips!

I am going to run it on the AMD as well and see if it produces a similar issue.


Aaand some more comments:

Instead of

selectWindow(duplFluor);
fluorescent = getTitle();
Ext.CLIJ_push(fluorescent);

you can also call

Ext.CLIJ_push(duplFluor);

This looks a bit dangerous to me:

Ext.CLIJ_pullBinary(binScar);
selectWindow(binName);
rename(binScar);

as you pull an image binScar from the GPU (it will pop up with window title saved in the variable binScar). Then, you select a (different?) window called binName and rename it to binScar.


Similar here:

Ext.CLIJ_push(in);
Ext.CLIJ_downsample2D(in, out, downFac, downFac);
Ext.CLIJ_pullBinary(out);
Ext.CLIJ_clear();
selectWindow(binScar);
close();
selectWindow(binName);
rename(binScar);

I think, you would like to do this:

in = getTitle();
Ext.CLIJ_push(in);
close();
Ext.CLIJ_downsample2D(in, out, downFac, downFac);
Ext.CLIJ_pullBinary(out);

In general, I would recommend using descriptive variable names for the images, such as thresholded or mask, so that the code becomes easier to read. For example, in order to threshold an image and apply an opening to it, you can do something like this:

// push data to GPU
input = getTitle();
Ext.CLIJ_push(input);

// cleanup ImageJ
run("Close All");

// create a mask using a threshold method
mask = "mask";
Ext.CLIJ_automaticThreshold(input, mask, "Otsu");

// binary opening: erosion followed by dilation
temp = "temp";
Ext.CLIJ_erodeBox(mask, temp);
Ext.CLIJ_dilateBox(temp, mask);

// show result
Ext.CLIJ_pullBinary(mask);

There are more examples online, where you can see that I basically never push, pull or clear between the steps:
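As a rough sketch of that pattern applied to a whole folder: each image is pushed once, processed entirely on the GPU, pulled once, and GPU memory is cleared once per image rather than after every operation. The folder path, the ".tif" filter, and the image names are placeholders, not taken from your macro.

```ijm
// ASSUMPTIONS: "dir" is a hypothetical folder; "mask" and "temp" are
// arbitrary names for GPU images.
run("CLIJ Macro Extensions", "cl_device=");
Ext.CLIJ_clear();

dir = "/path/to/images/";   // hypothetical, must end with a separator
files = getFileList(dir);

for (i = 0; i < files.length; i++) {
	if (endsWith(files[i], ".tif")) {
		open(dir + files[i]);
		input = getTitle();

		// one push per image, then close the ImageJ copy
		Ext.CLIJ_push(input);
		close();

		// all processing stays on the GPU
		mask = "mask";
		temp = "temp";
		Ext.CLIJ_automaticThreshold(input, mask, "Otsu");
		Ext.CLIJ_erodeBox(mask, temp);
		Ext.CLIJ_dilateBox(temp, mask);

		// one pull per image; rename right away
		Ext.CLIJ_pullBinary(mask);
		rename("mask_" + files[i]);
		// ... measure or save the result here ...
		close();

		// images may differ in size, so free GPU memory once per image
		Ext.CLIJ_clear();
	}
}
```

If all images had the same size, the per-image clear could probably be dropped and the "mask" and "temp" buffers reused, as described above.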


Thanks for all the tips! This will definitely help me in cleaning it up. A lot of the breaks between steps stem from the conversion of the existing macro. Based on your recommendations, I will definitely go through and try to clean these up to help performance.

Also, binName was added as a quick fix to handle the difference between the “pull()” and “pullBinary()” commands. Originally, I had written the code using pull(), which keeps the image title as whatever the variable name was (so if out=“Image1”, the pulled image has the title “Image1”). But pull() was causing errors when trying to do binary operations on gpu output. So I swapped to pullBinary(). For pullBinary(), the pulled image seemed to be titled “slice” regardless of what the input or output variable was called. So I added “binName=slice” as a quick-fix to juggle the name change. Probably not the best way to handle it, but it fixed the issue at the time.

In the original script, the image subtraction was handled by ImageJ, which automatically names the computed image using the format:

"Result of " + [original image name] (which I stored as “binScar” for the binarized scar image)

This variable is referenced quite a few times throughout the original script.

In the CLIJ version of the script, this no longer existed since the computation was now being handled by CLIJ. So, initially there is no image with this name as the output from CLIJ pullBinary() is called “slice.”

Ext.CLIJ_pullBinary(binScar);
selectWindow(binName);
rename(binScar);

^This first bit of code takes the output from CLIJ, (which is just called “slice”) and renames it to match the ImageJ convention. This meant that I didn’t have to comb through the entire code and rename all instances of that variable throughout.

selectWindow(binScar);
close();
selectWindow(binName);
rename(binScar);

^This second bit of code was just for juggling the name change after each CLIJ operation. After CLIJ output a new image from the operation, the current image with that title was closed and the CLIJ output (named “slice”) was renamed to match what the original image was called. If I rewrite the code to get rid of superfluous pull() commands, then most of this probably becomes unnecessary :P.


Also, the code just ran through 8 directories of images without throwing any errors using the AMD chip. It looks like the problem may have been specific to CLIJ trying to use the Intel chip. I have it running on the rest of the dataset now and will see how it fares!


Oh great! I have had pretty good experience with the Intel chips. Basically 99% of CLIJ development was done on an Intel UHD 620… Anyway: Great if it works now!

That pullBinary always titles the pulled image “slice” is obviously a bug! I’ll fix it as soon as possible. Thanks for reporting! In the meantime: After calling pull or pullBinary, you can immediately call rename and the just-pulled window/image will get that title.

One more tip:

You can simplify this:

selectWindow(duplTrans);
in = getTitle();
out = in+"_1";
//Note: If you use CLIJ pullBinary, the name of the pulled image is called "slice" rather than the title you named it
binName = "slice";
	
//Use CLIJ to threshold
Ext.CLIJ_push(in);
Ext.CLIJ_automaticThreshold(in, out, "Otsu");
Ext.CLIJ_pullBinary(out);
Ext.CLIJ_clear();
//Swap out the current image for the one processed by CLIJ
selectWindow(duplTrans);
close();
selectWindow(binName);
rename(duplTrans);
		
//Use CLIJ to erode
for (o = 0; o < 14; o++) {
	Ext.CLIJ_push(in);
	Ext.CLIJ_erodeBox(in, out);
	Ext.CLIJ_pullBinary(out);
	Ext.CLIJ_clear();
	//Swap out the current image for the one processed by CLIJ
	selectWindow(duplTrans);
	close();
	selectWindow(binName);
	rename(duplTrans);
}

// Cleanup at the end
Ext.CLIJ_clear();

to this (again, just continue processing in the GPU; don’t pull/push in between):

//Use CLIJ to threshold
Ext.CLIJ_push(duplTrans);
close();
thresholded = "thresholded";
Ext.CLIJ_automaticThreshold(duplTrans, thresholded, "Otsu");
// Use CLIJ for erosion
eroded = "eroded";
Ext.CLIJ_minimum2DBox(thresholded, eroded, 14, 14);
Ext.CLIJ_pullBinary(eroded);
rename(duplTrans);

Furthermore, in your non-CLIJ workflow you did 15 erosions, while in the CLIJ version you did 14. I’m not sure whether this is relevant. However, the overall results of the CLIJ and non-CLIJ workflows are different. I’d recommend checking intermediate results.

Let me know if you need more support!

Cheers,
Robert


Hey @xiphius.

This bug is fixed now. Update your Fiji to get the recent CLIJ version.

Thanks again for reporting the issue!

Cheers,
Robert


Hello @haesleinhuepf

Thanks for fixing that! And thanks for all the tips. Now that it is working on the AMD chip, I’ll incorporate your comments to make a more polished version that is more tailored specifically to using CLIJ. Sorry it turned out to be something as trivial as using the wrong chip!

It’s interesting that the Intel chip was causing the issues for me then. As long as it is working on one of the chips, that is good enough for me. :slight_smile:

Regarding the 15 versus 14 erosions: not a big issue, but thanks for pointing that out. I had tweaked one of them more recently and must not have changed the other one to match. Ideally, I will now only be using the CLIJ version of the pipeline and can retire the slower one.


Hey @xiphius,

you’re welcome! When you’re done streamlining the code and run it again, would you be so kind as to post an estimate of the time saved or the speedup factor here? I might take a screenshot of that and post it on twitter :wink:

If you need any further support, let me know!

Cheers,
Robert

No problem @haesleinhuepf

I ran the baseline script and the CLIJ script on one directory of images and then averaged the run-time per image. Since I have ~80 directories and all are about equivalent in terms of the number of images and distribution of sizes for the images, this should be somewhat representative of the time savings for the entire data set.

ImageJ script took 11851 msec per image
CLIJ script took 6429 msec per image (w/minimum2DBox)
CLIJ script took 5904 msec per image (w/erodeBox)

…so it looks like about twice as fast! (bear in mind that there is still a lot of non-CLIJ stuff in the script like routines for Analyze particles and manipulations of the ROI manager)

The AMD GPU seems to run a bit slower than the Intel GPU, but the AMD GPU seems to work fine and still results in very significant time savings! (with a time savings of 5.9 sec/image, this cuts the run time on the entire data set by ~6.5 hours)
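The arithmetic behind these numbers, as a small macro sketch. The per-image timings are the ones measured above; the image count is an assumed round figure for illustration and should be replaced with the real number of images.

```ijm
// Measured run times in msec per image (from the list above)
baseline  = 11851;   // plain ImageJ script
clijMin   = 6429;    // CLIJ script with minimum2DBox
clijErode = 5904;    // CLIJ script with erodeBox loop

// Speedup factors relative to the plain ImageJ script
print("Speedup (minimum2DBox): " + d2s(baseline / clijMin, 2));
print("Speedup (erodeBox):     " + d2s(baseline / clijErode, 2));

// Time saved per image, in seconds
savedSec = (baseline - clijErode) / 1000;
print("Saved per image: " + d2s(savedSec, 1) + " sec");

// ASSUMPTION: imageCount is a hypothetical dataset size
imageCount = 4000;
print("Saved in total: " + d2s(savedSec * imageCount / 3600, 1) + " hours");
```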

As an interesting side note: you mentioned in your post above that the for loop of 14 box erosions could be replaced with a single minimum2DBox call with radius 14.

However, when I did this originally, it seemed like the code was running slower. So I tested it both with the minimum2DBox() and also with the for loop and erodeBox(). Turns out, it is actually faster to use the for loop with erodeBox(). Guess that fewer lines of code doesn’t always equal more efficient run time :stuck_out_tongue:
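A sketch of how such a comparison can be timed in isolation, assuming a binary image is open and active. The names "a", "b", and "out" are arbitrary, and each timing includes the pull so that the result is definitely computed. The first call to a given CLIJ operation may include some one-time setup cost, so it is probably worth running the macro twice and comparing the second run.

```ijm
// ASSUMPTION: the active image is the binary image to erode.
run("CLIJ Macro Extensions", "cl_device=");
Ext.CLIJ_clear();

in = getTitle();
Ext.CLIJ_push(in);

// Variant A: one minimum filter with radius 14
out = "out";
start = getTime();
Ext.CLIJ_minimum2DBox(in, out, 14, 14);
Ext.CLIJ_pullBinary(out);
print("minimum2DBox: " + (getTime() - start) + " msec");
close();   // close the pulled result

// Variant B: 14 box erosions, alternating between two GPU images
a = "a";
b = "b";
Ext.CLIJ_copy(in, a);
start = getTime();
for (o = 0; o < 7; o++) {
	Ext.CLIJ_erodeBox(a, b);
	Ext.CLIJ_erodeBox(b, a);
}
Ext.CLIJ_pullBinary(a);
print("14 x erodeBox: " + (getTime() - start) + " msec");
close();   // close the pulled result

Ext.CLIJ_clear();
```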

Thanks again!
