CLIJ - memory allocation failure / image memory limits?

Hi @haesleinhuepf

Another CLIJ challenge for you. I'm trying to run a Blur2D on an image stack (~500 MB) without much success, owing to a memory allocation failure. If I resample down to ~380 MB it works, so I'm wondering: is there a guide to the CLIJ memory limit for each function versus available GPU memory?

Trying to run “Blur2D on GPU” on the 500 MB stack returns:

conversion target ClearCLImage [mClearCLContext=ClearCLContext [device=ClearCLDevice [mClearCLPlatform=ClearCLPlatform [name=NVIDIA CUDA], name=GRID P40-8Q]], mImageType=IMAGE3D, mImageChannelOrder=R, mImageChannelDataType=UnsignedInt8, mDimensions=[788, 788, 940], getMemAllocMode()=Best, getHostAccessType()=ReadWrite, getKernelAccessType()=ReadWrite, getBackend()=net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL@62ad780, getPeerPointer()=net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@44bdfea1]
conversion target ClearCLImage [mClearCLContext=ClearCLContext [device=ClearCLDevice [mClearCLPlatform=ClearCLPlatform [name=NVIDIA CUDA], name=GRID P40-8Q]], mImageType=IMAGE3D, mImageChannelOrder=R, mImageChannelDataType=UnsignedInt8, mDimensions=[788, 788, 940], getMemAllocMode()=Best, getHostAccessType()=ReadWrite, getKernelAccessType()=ReadWrite, getBackend()=net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL@62ad780, getPeerPointer()=net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@78a19e14]
java.lang.RuntimeException: net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -4 -> CL_MEM_OBJECT_ALLOCATION_FAILURE
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:114)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:64)
	at net.haesleinhuepf.clij.clearcl.ClearCLKernel.run(ClearCLKernel.java:480)
	at net.haesleinhuepf.clij.clearcl.ClearCLKernel.run(ClearCLKernel.java:462)
	at net.haesleinhuepf.clij.utilities.CLKernelExecutor.lambda$enqueue$0(CLKernelExecutor.java:333)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:28)
	at net.haesleinhuepf.clij.utilities.CLKernelExecutor.enqueue(CLKernelExecutor.java:331)
	at net.haesleinhuepf.clij.CLIJ.lambda$execute$0(CLIJ.java:272)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:28)
	at net.haesleinhuepf.clij.CLIJ.execute(CLIJ.java:249)
	at net.haesleinhuepf.clij.CLIJ.execute(CLIJ.java:229)
	at net.haesleinhuepf.clij.kernels.Kernels.executeSeparableKernel(Kernels.java:705)
	at net.haesleinhuepf.clij.kernels.Kernels.blur(Kernels.java:558)
	at net.haesleinhuepf.clij.macro.modules.Blur2D.executeCL(Blur2D.java:30)
	at net.haesleinhuepf.clij.macro.AbstractCLIJPlugin.run(AbstractCLIJPlugin.java:339)
	at ij.plugin.filter.PlugInFilterRunner.processOneImage(PlugInFilterRunner.java:266)
	at ij.plugin.filter.PlugInFilterRunner.<init>(PlugInFilterRunner.java:114)
	at ij.IJ.runUserPlugIn(IJ.java:232)
	at ij.IJ.runPlugIn(IJ.java:193)
	at ij.Executer.runCommand(Executer.java:137)
	at ij.Executer.run(Executer.java:63)
	at java.lang.Thread.run(Thread.java:748)
Caused by: net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -4 -> CL_MEM_OBJECT_ALLOCATION_FAILURE
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkOpenCLError(BackendUtils.java:346)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.lambda$enqueueKernelExecution$21(ClearCLBackendJOCL.java:736)
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkExceptions(BackendUtils.java:171)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.enqueueKernelExecution(ClearCLBackendJOCL.java:735)
	at net.haesleinhuepf.clij.clearcl.ClearCLKernel.lambda$run$0(ClearCLKernel.java:492)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	... 23 more

clInfo details in case that helps:

Available CL backends:
  * net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL@23994466
    Functional backend:net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL@3e48d0f9
    Best backend:net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL@60bd1c78
Used CL backend: net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL@275ccee8
ClearCL: ClearCLBase [mClearCLBackendInterface=net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL@275ccee8, mPeerPointer=null]
  Number of platforms:1
  [0] NVIDIA CUDA
     Number of devices: 1
     Available devices: 
     [0] GRID P40-8Q 
        NumberOfComputeUnits: 30 
        Clock frequency: 1531 
        Version: 1.2 
        Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer 
        GlobalMemorySizeInBytes: 8589934592 
        LocalMemorySizeInBytes: 49152 
        MaxMemoryAllocationSizeInBytes: 2147483648 
        MaxWorkGroupSize: 1024 
        Compatible image types: [SignedNormalizedInt8, SignedNormalizedInt16, UnsignedNormalizedInt8, UnsignedNormalizedInt16, SignedInt8, SignedInt16, SignedInt32, UnsignedInt8, UnsignedInt16, UnsignedInt32, HalfFloat, Float]
Best GPU device for images: GRID P40-8Q
Best largest GPU device: GRID P40-8Q
Best CPU device: GRID P40-8Q
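As a sanity check on those numbers (a back-of-the-envelope sketch in plain Python, not CLIJ code): the stack dimensions from the error message reproduce the reported stack size, and the clInfo contains a per-buffer cap that is easy to overlook.

```python
# Dimensions taken from the error message above; limits from the clInfo.
width, height, slices = 788, 788, 940
bytes_per_voxel = 1  # UnsignedInt8

stack_bytes = width * height * slices * bytes_per_voxel
print(stack_bytes / 1024**2)  # ~556.7 MiB, matching the "~500 MB" estimate

global_mem = 8589934592        # GlobalMemorySizeInBytes: 8 GiB on the card
max_single_alloc = 2147483648  # MaxMemoryAllocationSizeInBytes: 2 GiB per buffer
print(max_single_alloc / global_mem)  # 0.25: one buffer may use only a quarter of the card
```

So even with 8 GB on the card, no single buffer may exceed 2 GiB, which means a workflow can hit CL_MEM_OBJECT_ALLOCATION_FAILURE well before total GPU memory is exhausted.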

Hey @Matt,

interesting question. What immediately jumps out at me: Blur2D on an image stack is maybe not a good idea. Try Blur3D and set sigmaZ=0 to do a slice-by-slice Gaussian blur.

Furthermore, your GPU has 8 GB of memory and runs out of memory when processing a 500 MB stack. I conclude that you ran other operations beforehand. Let me mimic your scenario by modifying the excludeLabelsOnEdges macro. I have four processing steps and afterwards take a look at memory consumption on the GPU:

Ext.CLIJ_blur2D(input, blurred, 2, 2);
Ext.CLIJ_automaticThreshold(blurred, mask, "Otsu");
Ext.CLIJx_connectedComponentsLabeling(mask, labelmap);
Ext.CLIJx_excludeLabelsOnEdges(labelmap, labelmap_without_edges);
// report memory consumption afterwards
Ext.CLIJ_reportMemory();

The output is:

GPU contains 7 images.
- labelmap[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@6d09673f] 254.0 kb
- labelmap_without_edges[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@2236e7fc] 254.0 kb
- blobs.gif[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@16620acf] 254.0 kb
- blobs.gif[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4a3ec73c]* 254.0 kb
- blurred[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@55c79cf4] 254.0 kb
- blurred[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@209658cb]* 254.0 kb
- mask[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@405b98b7] 254.0 kb
= 1.7 Mb
* some images are stored twice for technical reasons.

You see some images exist twice “for technical reasons”. This might be the issue you’re trying to tackle. The background is a performance optimization that improves processing speed by spending more memory. This strategy is under revision for clij2, but we need to live with it for the moment…
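The arithmetic behind that report can be sketched as follows (plain Python, nothing CLIJ-specific; the image list is copied from the report above): with the duplicates counted, the total is simply the per-image size times the number of entries.

```python
# Reproduce the memory report above: 7 entries of 254.0 kb each.
kb_per_image = 254.0
entries = ["labelmap", "labelmap_without_edges", "blobs.gif", "blobs.gif (dup)",
           "blurred", "blurred (dup)", "mask"]  # (dup) = stored twice "for technical reasons"

total_kb = kb_per_image * len(entries)
print(round(total_kb / 1024, 1))  # 1.7 (Mb), as reported
```

The practical consequence: images that serve as filter sources or destinations may be held twice, so plan for roughly double the naive memory budget.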

I see two options for dealing with this:

a) release memory as soon as you don’t need it anymore:

Ext.CLIJ_blur2D(input, blurred, 2, 2);
Ext.CLIJ_release(input);
Ext.CLIJ_automaticThreshold(blurred, mask, "Otsu");
Ext.CLIJ_release(blurred);
Ext.CLIJx_connectedComponentsLabeling(mask, labelmap);
Ext.CLIJ_release(mask);
Ext.CLIJx_excludeLabelsOnEdges(labelmap, labelmap_without_edges);

Ext.CLIJ_reportMemory();

Prints out:

GPU contains 2 images.
- labelmap[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@2fbacca] 254.0 kb
- labelmap_without_edges[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4b2f99d5] 254.0 kb
= 508.0 kb

Releasing and re-allocating memory leads to a mild performance drop, as you can read in the FAQ. Reusing memory is faster. Thus, you can try:

b) The flip-flop strategy

Define only two image names/variables (flip and flop; sometimes you need a third, flap) for processing and use only them:

run("Blobs (25K)");
run("32-bit"); // necessary for the flip-flop strategy because some steps produce 32-bit images
rename("flip");
flip = getTitle();
flop = "flop";

// Init GPU
run("CLIJ Macro Extensions", "cl_device=");
Ext.CLIJ_clear();

// push data to GPU
Ext.CLIJ_push(flip);

// actual processing workflow
Ext.CLIJ_blur2D(flip, flop, 2, 2);
Ext.CLIJ_automaticThreshold(flop, flip, "Otsu");
Ext.CLIJx_connectedComponentsLabeling(flip, flop);
Ext.CLIJx_excludeLabelsOnEdges(flop, flip);

Ext.CLIJ_reportMemory();

This will print:

GPU contains 4 images.
- flop[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@78492648] 254.0 kb
- flop[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@686f57d9]* 254.0 kb
- flip[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@15c3cca] 254.0 kb
- flip[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@42a76972]* 254.0 kb
= 1016.0 kb
* some images are stored twice for technical reasons.

The flip-flop strategy is a bit dangerous because it’s easy to make a mistake. However, if your workflow runs fine on smaller images and you just want to make it work on bigger images, you should totally use it.

Let me know if it helps!

Cheers,
Robert


Hi Robert,

Thanks for the quick response!
Working from a completely fresh boot of FIJI I go through the following steps:

run("Image Sequence...", "open=[*XXXXXXX*/DimpleSample_Med__rec0071.bmp] file=_r sort");
// run("CLIJ Macro Extensions", "cl_device=[GRID P40-8Q]");
// Ext.CLIJ_reportMemory();
run("Report about GPU memory usage", "cl_device=[GRID P40-8Q]");
// Ext.CLIJ_push("DimpleSample_Med_Rec");
// Ext.CLIJ_blur3D("DimpleSample_Med_Rec", "CLIJ_blur3D_destination_DimpleSample_Med_Rec", 2.0, 2.0, 0.0);
// Ext.CLIJ_pull("CLIJ_blur3D_destination_DimpleSample_Med_Rec");
run("Blur3D on GPU", "cl_device=[GRID P40-8Q] source=DimpleSample_Med_Rec sigmax=2 sigmay=2 sigmaz=0");
selectWindow("CLIJ_blur3D_destination_DimpleSample_Med_Rec");
// Ext.CLIJ_reportMemory();
run("Report about GPU memory usage", "cl_device=[GRID P40-8Q]");

Looking at the log I get:

GPU contains 0 images. *as expected*
= 0.0 b
 
GPU contains 0 images. *not as expected*
= 0.0 b

It seems like the image isn’t getting pushed at all.
Manually pushing via macro seems to at least get the image into the GPU:

run("CLIJ Macro Extensions", "cl_device=[GRID P40-8Q]");
input = getTitle();
Ext.CLIJ_push(input);
Ext.CLIJ_reportMemory();

**Log:**
GPU contains 1 images.
- DimpleSample_Med_Rec[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1ac076d7] 556.6 Mb
= 556.6 Mb

Extending this further, I can see that the images are all being created on the GPU even though the resulting pull is a blank stack, and I’m only at 2.2 GB of GPU usage:

run("CLIJ Macro Extensions", "cl_device=[GRID P40-8Q]");
Ext.CLIJ_clear();
Ext.CLIJ_reportMemory();

input = getTitle();
blur = "blur";
Ext.CLIJ_push(input);
Ext.CLIJ_reportMemory();

Ext.CLIJ_blur3D(input, blur, 2, 2, 0);

Ext.CLIJ_reportMemory();
Ext.CLIJ_pull(blur);

**Log:**
GPU contains 0 images.
= 0.0 b
 
GPU contains 1 images.
- DimpleSample_Med_Rec[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@563fd914] 556.6 Mb
= 556.6 Mb
 
GPU contains 4 images.
- DimpleSample_Med_Rec[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@563fd914] 556.6 Mb
- DimpleSample_Med_Rec[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7f63ce6f]* 556.6 Mb
- blur[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@6a88390e] 556.6 Mb
- blur[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@39be695a]* 556.6 Mb
= 2.2 Gb
* some images are stored twice for technical reasons.

Any other steps that I can run to troubleshoot further?

Thanks in advance!
Cheers,
Matt


Hey @Matt,

that’s a tough case. First of all: by clicking CLIJ’s menus, or calling them via the run("..."); method, no image is expected to remain in GPU memory afterwards. All menu commands do push(), whateverAction(), pull() and clear(). That’s not recommended if you want to gain a speedup.

Two ideas:

Thanks for your support!

Cheers,
Robert

Hi @haesleinhuepf,

I realised soon after posting that there was probably a hidden clear() going on in the background - makes perfect sense really (a quick way to fill the memory otherwise!).

The macro console is attached below:
CLIJ_DebugConsole_Macro.txt (73.4 KB)

The benchmarking log:

CPU mean filter no 1 took 948 msec
CPU mean filter no 2 took 794 msec
CPU mean filter no 3 took 774 msec
CPU mean filter no 4 took 750 msec
CPU mean filter no 5 took 772 msec
CPU mean filter no 6 took 743 msec
CPU mean filter no 7 took 724 msec
CPU mean filter no 8 took 720 msec
CPU mean filter no 9 took 710 msec
CPU mean filter no 10 took 701 msec
Pushing one image to the GPU took 24 msec
GPU mean filter no 1 took 2708 msec
GPU mean filter no 2 took 37 msec
GPU mean filter no 3 took 42 msec
GPU mean filter no 4 took 38 msec
GPU mean filter no 5 took 38 msec
GPU mean filter no 6 took 39 msec
GPU mean filter no 7 took 40 msec
GPU mean filter no 8 took 42 msec
GPU mean filter no 9 took 40 msec
GPU mean filter no 10 took 39 msec
Pulling one image from the GPU took 154 msec

No errors in the console as far as I can tell for the benchmark macro.

Could the fact that the system is a virtual machine using 1/3 of a P40 have any impact, or explain this behaviour at all?

Thanks again for looking into this for me.

Cheers,
Matt


Hi @Matt,

it appears clij runs on that GPU in general. However, every GPU has limitations imposed by its hardware and driver. To analyse your situation a bit further, would you mind executing some memory-allocation stress tests? They should tell us what you can do with your GPU and what you can’t. Both tests are designed to crash at some point. Just saying :wink:

a) Allocate many images

// init GPU
run("CLIJ Macro Extensions", "cl_device=");
Ext.CLIJ_clear();

// Get test data
width = 1024;
height = 1024;
slices = 100;

for (i = 0; i < 100; i++) {
	
	newImage("image" + i, "8-bit ramp", width, height, slices);
	
	// push image to GPU
	Ext.CLIJ_push("image" + i);
	
	if(i > 0) {
		Ext.CLIJ_blur3D("image" + i, "image" + (i-1), 1, 1, 0);
	}
	
	// cleanup imagej
	run("Close All");
	
	Ext.CLIJ_reportMemory();
}

b) allocate images of increasing size

run("Close All");
// init GPU
run("CLIJ Macro Extensions", "cl_device=");
Ext.CLIJ_clear();

for (i = 0; i < 100; i++) {
	// Get test data
	width = 1024;
	height = 1024;
	slices = 100 * i;
	newImage("image" + i, "8-bit ramp", width, height, slices);
	
	// push image to GPU
	Ext.CLIJ_push("image" + i);
	
	if(i > 0) {
		Ext.CLIJ_blur3D("image" + i, "image" + (i-1), 1, 1, 0);
	}
	
	// cleanup imagej
	run("Close All");
	
	Ext.CLIJ_reportMemory();
	Ext.CLIJ_clear();
}

Last but not least, have you tried the blur3DSliceBySlice() method? Does it also cause these errors?

Cheers,
Robert


Hi @haesleinhuepf,

Thanks for the tips.
I added a line to each script so I could pin down exactly where the failure happens in the console/log:
eval("script", "System.err.println('Loop "+i+" of 100');"); and print("Loop "+i);

For the many-images test, the crash occurs between loops 23 and 24 (~5 GB load on the GPU):

Loop 23
GPU contains 48 images.
- image13[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4ede2df3] 100.0 Mb
- image13[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@133b4e5c]* 100.0 Mb
- image12[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@eef3466] 100.0 Mb
- image12[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@5f8ca898]* 100.0 Mb
- image15[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@32eea0f8] 100.0 Mb
- image15[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1e27b323]* 100.0 Mb
- image14[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@f0be8] 100.0 Mb
- image14[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1f92dccc]* 100.0 Mb
- image11[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1e9ebc4b] 100.0 Mb
- image11[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@32231274]* 100.0 Mb
- image10[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@2b719b2b] 100.0 Mb
- image10[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@320b24c2]* 100.0 Mb
- image5[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1aea8b4b] 100.0 Mb
- image5[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4eb228c]* 100.0 Mb
- image6[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@abb740f] 100.0 Mb
- image6[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@5c587f35]* 100.0 Mb
- image3[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@351b0138] 100.0 Mb
- image3[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4f2dfc2e]* 100.0 Mb
- image4[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@415f5416] 100.0 Mb
- image4[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7a824696]* 100.0 Mb
- image9[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@593787a7] 100.0 Mb
- image9[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4cf67f7a]* 100.0 Mb
- image7[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@764f9f1c] 100.0 Mb
- image7[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@369b47ad]* 100.0 Mb
- image8[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@60dd7e33] 100.0 Mb
- image8[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1375a2de]* 100.0 Mb
- image23[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@257e5973] 100.0 Mb
- image23[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7cdf589]* 100.0 Mb
- image1[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7203dff4] 100.0 Mb
- image1[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@6a5c5173]* 100.0 Mb
- image20[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@3c2ba82b] 100.0 Mb
- image20[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7cc4cd04]* 100.0 Mb
- image2[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@6f99b344] 100.0 Mb
- image2[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@40aceb41]* 100.0 Mb
- image22[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1f363e4e] 100.0 Mb
- image22[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@19674f8]* 100.0 Mb
- image0[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@ed99ab9] 100.0 Mb
- image0[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@557c909f]* 100.0 Mb
- image21[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@56fc697f] 100.0 Mb
- image21[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@52d61afa]* 100.0 Mb
- image17[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@6dd530ac] 100.0 Mb
- image17[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@c5b5ac8]* 100.0 Mb
- image16[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1f6fff76] 100.0 Mb
- image16[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@53bb8c74]* 100.0 Mb
- image19[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@18f986b2] 100.0 Mb
- image19[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@8b4ad04]* 100.0 Mb
- image18[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@72336917] 100.0 Mb
- image18[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4dd3a5b5]* 100.0 Mb
= 4.7 Gb
* some images are stored twice for technical reasons.
 
Loop 24
GPU contains 50 images.
- image13[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4ede2df3] 100.0 Mb
- image13[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@133b4e5c]* 100.0 Mb
- image12[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@eef3466] 100.0 Mb
- image12[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@5f8ca898]* 100.0 Mb
- image15[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@32eea0f8] 100.0 Mb
- image15[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1e27b323]* 100.0 Mb
- image14[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@f0be8] 100.0 Mb
- image14[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1f92dccc]* 100.0 Mb
- image11[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1e9ebc4b] 100.0 Mb
- image11[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@32231274]* 100.0 Mb
- image10[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@2b719b2b] 100.0 Mb
- image10[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@320b24c2]* 100.0 Mb
- image5[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1aea8b4b] 100.0 Mb
- image5[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4eb228c]* 100.0 Mb
- image6[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@abb740f] 100.0 Mb
- image6[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@5c587f35]* 100.0 Mb
- image3[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@351b0138] 100.0 Mb
- image3[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4f2dfc2e]* 100.0 Mb
- image4[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@415f5416] 100.0 Mb
- image4[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7a824696]* 100.0 Mb
- image9[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@593787a7] 100.0 Mb
- image9[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4cf67f7a]* 100.0 Mb
- image7[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@764f9f1c] 100.0 Mb
- image7[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@369b47ad]* 100.0 Mb
- image8[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@60dd7e33] 100.0 Mb
- image8[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1375a2de]* 100.0 Mb
- image24[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1b9724b9] 100.0 Mb
- image24[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@3b1c8c8a]* 100.0 Mb
- image23[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@257e5973] 100.0 Mb
- image23[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7cdf589]* 100.0 Mb
- image1[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7203dff4] 100.0 Mb
- image1[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@6a5c5173]* 100.0 Mb
- image20[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@3c2ba82b] 100.0 Mb
- image20[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7cc4cd04]* 100.0 Mb
- image2[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@6f99b344] 100.0 Mb
- image2[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@40aceb41]* 100.0 Mb
- image22[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1f363e4e] 100.0 Mb
- image22[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@19674f8]* 100.0 Mb
- image0[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@ed99ab9] 100.0 Mb
- image0[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@557c909f]* 100.0 Mb
- image21[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@56fc697f] 100.0 Mb
- image21[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@52d61afa]* 100.0 Mb
- image17[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@6dd530ac] 100.0 Mb
- image17[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@c5b5ac8]* 100.0 Mb
- image16[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@1f6fff76] 100.0 Mb
- image16[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@53bb8c74]* 100.0 Mb
- image19[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@18f986b2] 100.0 Mb
- image19[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@8b4ad04]* 100.0 Mb
- image18[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@72336917] 100.0 Mb
- image18[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@4dd3a5b5]* 100.0 Mb
= 4.9 Gb
* some images are stored twice for technical reasons.

For the increasing-size test, the limit seems to be between loops 5 and 6 (~2.0 to 2.3 GB):

Loop 5
GPU contains 4 images.
- image5[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@555830c9] 500.0 Mb
- image5[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@584ff96a]* 500.0 Mb
- image4[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@15879ee9] 500.0 Mb
- image4[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7f03f119]* 500.0 Mb
= 2.0 Gb
* some images are stored twice for technical reasons.
 
Loop 6
GPU contains 4 images.
- image5[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@7b16e7a2] 600.0 Mb
- image5[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@50cbd85a]* 600.0 Mb
- image6[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@72441a87] 600.0 Mb
- image6[net.haesleinhuepf.clij.clearcl.ClearCLPeerPointer@ede3ff8]* 600.0 Mb
= 2.3 Gb
* some images are stored twice for technical reasons.

Both times the error is CL_MEM_OBJECT_ALLOCATION_FAILURE.

The good news is that the blur3DSliceBySlice() method works on the 600MB stack - thank you for the suggestion!

Interestingly, if I run

run("CLIJ Macro Extensions");
Ext.CLIJ_clear();

input = getTitle();
output = "output";

Ext.CLIJ_push(input);

Ext.CLIJ_blur3DSliceBySlice(input, output, 2, 2);
eval("script", "System.err.println('"+output+"');");
Ext.CLIJ_reportMemory();
output = "out_2";
Ext.CLIJ_blur3DSliceBySlice(input, output, 2, 2);
eval("script", "System.err.println('"+output+"');");
Ext.CLIJ_reportMemory();
output = "out_3";
Ext.CLIJ_blur3DSliceBySlice(input, output, 2, 2);
eval("script", "System.err.println('"+output+"');");
Ext.CLIJ_reportMemory();
output = "out_4";
Ext.CLIJ_blur3DSliceBySlice(input, output, 2, 2);
eval("script", "System.err.println('"+output+"');");
Ext.CLIJ_reportMemory();
output = "out_5";
Ext.CLIJ_blur3DSliceBySlice(input, output, 2, 2);
eval("script", "System.err.println('"+output+"');");
Ext.CLIJ_reportMemory();
output = "out_6";
Ext.CLIJ_blur3DSliceBySlice(input, output, 2, 2);
eval("script", "System.err.println('"+output+"');");
Ext.CLIJ_reportMemory();

Ext.CLIJ_pull(output);

I also get up to ~5.4 GB on the GPU (it crashes out around out_4 or so), so that might be the limit for the GPU (which I can certainly work with!), but I’m still puzzled as to why I cap out at ~2.0/2.3 GB for the blur3D method and for increasing individual image stacks.

Thanks for the assistance so far - I’ll proceed with using the slice-by-slice filtering for now - and apologies for the wall of text!

Cheers,
Matt


Hey @Matt,

happy to hear your workflow works now. The thing about Blur3D is: internally, it is a separable filter which has to store intermediate results and therefore needs more memory. All for the purpose of speed.
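To put rough numbers on that (a back-of-the-envelope sketch; the assumption that the intermediate buffer is 32-bit float is mine and not confirmed anywhere above): a float intermediate with the same dimensions as an 8-bit input is four times the input size, which would place the failure threshold right between the 500 MB and 600 MB stacks seen in the stress test, given the 2 GiB MaxMemoryAllocationSizeInBytes from the clInfo.

```python
# Hypothetical: assume the separable blur keeps a 32-bit float intermediate
# with the same dimensions as the 8-bit input stack.
MAX_ALLOC = 2147483648  # MaxMemoryAllocationSizeInBytes from the clInfo (2 GiB)

def intermediate_bytes(input_mb):
    """Size of a hypothetical float32 intermediate for an 8-bit stack of input_mb MiB."""
    return input_mb * 1024**2 * 4  # 4 bytes per voxel instead of 1

print(intermediate_bytes(500) <= MAX_ALLOC)  # True: ~2.0 GiB fits, loop 5 passed
print(intermediate_bytes(600) <= MAX_ALLOC)  # False: ~2.4 GiB exceeds the cap, loop 6 failed
```

If this guess holds, the ~2.0/2.3 GB ceiling is a per-buffer limit, not a total-memory limit, which is consistent with the many-small-images test reaching ~5 GB without trouble.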

Again: Let me know how it goes!

Cheers,
Robert
