CL_MEM_OBJECT_ALLOCATION_FAILURE

Hello @haesleinhuepf
I’m having the CL_MEM_OBJECT_ALLOCATION_FAILURE error when attempting to load a 4GB image.

I see in the documentation:

[screenshot of the documentation]

In my registry I see that

TdrDelay

already exists and is set to 8.

[screenshot of the registry editor]

Am I supposed to edit it as follows:

"TdrDelay"=dword:0000003c

Or add a new key?

Thank you very much


Hey @LPUoO,

You should change the value if it already exists. It may not be possible to create another one with the same name. Afterwards, restart your machine.

Furthermore, a question: what hardware are you running on? I’m asking because the maximum image size limit is hardware-dependent. You can run this macro to figure out what the limits of your hardware are:
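(A minimal sketch of such a macro; leaving cl_device empty should let CLIJ2 pick a device automatically.)

run("CLIJ2 Macro Extensions", "cl_device=");
Ext.CLIJ2_clInfo();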

Let me know if I can help further!

Cheers,
Robert


Thank you @haesleinhuepf
So I created the entry for:

TdrDdiDelay

[screenshots of the registry editor]

Hope this is good now?

According to

Ext.CLIJ2_clInfo();
     [0] GeForce RTX 2060 
        NumberOfComputeUnits: 30 
        Clock frequency: 1680 
        Version: 1.2 
        Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics 
        GlobalMemorySizeInBytes: 6442450944 
        LocalMemorySizeInBytes: 49152 
        MaxMemoryAllocationSizeInBytes: 1610612736 
        MaxWorkGroupSize: 1024 
        Compatible image types: [SignedNormalizedInt8, SignedNormalizedInt16, UnsignedNormalizedInt8, UnsignedNormalizedInt16, SignedInt8, SignedInt16, SignedInt32, UnsignedInt8, UnsignedInt16, UnsignedInt32, HalfFloat, Float]

So I’m guessing this would be sufficient to work with 4GB images? (for a Gaussian blur, for example)

EDIT: I see MaxMemoryAllocationSizeInBytes: 1610612736. What does this mean? Does it mean that the maximum image size is 1.6 GB?


Yes. It also makes sense in most workflows, as you will need memory for other intermediate result images. If you had a 4 GB image on a 6 GB graphics card, the only things you could effectively do would be downsampling or maximum projections. For anything else, there is no space :wink:

May I ask what you plan to do with the image? Maybe it’s possible to process it in tiles or slice by slice?

Cheers,
Robert


Thank you very much @haesleinhuepf,

I actually just wanted to do a simple

// Gaussian blur
run("CLIJ2 Macro Extensions", "cl_device="); // initialise the CLIJ2 macro extensions
image1 = "for seg";
Ext.CLIJ2_push(image1);
image2 = "gaussian_blur-870825594";
sigma_x = 2.0;
sigma_y = 2.0;
sigma_z = 0.0;
Ext.CLIJ2_gaussianBlur3D(image1, image2, sigma_x, sigma_y, sigma_z);
Ext.CLIJ2_pull(image2);

I sorted out the initial CL_MEM_OBJECT_ALLOCATION_FAILURE problem by drastically cropping the image, reducing it to about 300 MB.



I see that MaxMemoryAllocationSizeInBytes (1610612736) is exactly GlobalMemorySizeInBytes (6442450944) divided by 4.

I remember that in the GPU-accelerated Image Processing with CLIJ2 - [NEUBIASAcademy@Home] Webinar you said that the image should be about a quarter of the GPU memory.

So is MaxMemoryAllocationSizeInBytes: 1610612736 a hard limit, meaning I won’t be able to push images larger than that? Or can I load larger images and, for example, just do maximum projections on them?

Thank you


Yes, that’s the limit set by the GPU driver.

What I mentioned in the webinar relates more to practically implementing workflows: if you want to do background subtraction, for example, you need space for three images: input, background, and result. Other operations need even more. Thus, if you want to develop workflows on GPUs conveniently, without cleaning up memory after every single step, you want to have enough free memory. If I were you, I would process the 4 GB dataset in 200-500 MB blocks, potentially downsampled depending on the scientific goal.
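For example, a minimal sketch of block-wise processing in the ImageJ macro language (window names, the block size and the sigmas are placeholders; since sigma_z is 0 in your macro, blurring blocks of slices gives the same result as blurring the whole stack):

// Sketch: blur a large stack block-by-block so only a small part sits on the GPU at a time.
// Assumes the big stack is the active image.
run("CLIJ2 Macro Extensions", "cl_device=");
input = getTitle();
getDimensions(width, height, channels, slices, frames);
blockSize = 100; // slices per block; tune this to your GPU memory
for (start = 1; start <= slices; start += blockSize) {
    end = minOf(start + blockSize - 1, slices);
    selectWindow(input);
    run("Duplicate...", "title=block duplicate range=" + start + "-" + end);
    Ext.CLIJ2_push("block");
    result = "blurred_" + start + "_" + end;
    Ext.CLIJ2_gaussianBlur3D("block", result, 2.0, 2.0, 0.0);
    Ext.CLIJ2_pull(result);
    Ext.CLIJ2_clear(); // free GPU memory before the next block
    close("block");
}
// The pulled blocks can then be re-assembled, e.g. via Image > Stacks > Tools > Concatenate.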

Btw: You may remember that from the webinar, but for others reading this:
Depending on sigma, a Gaussian blur on the CPU might be faster than pushing the image to the GPU, blurring it there, and pulling the result back. Push and pull take time. GPU acceleration makes the most sense when multiple steps are executed on the GPU between push and pull :slightly_smiling_face:
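If you want to check this on your own data, a rough timing sketch (window names are hypothetical; the GPU pass here runs on the already-blurred image, which does not change the timing comparison):

// Rough timing comparison: CPU Gaussian blur vs. push + GPU blur + pull, same sigmas.
run("CLIJ2 Macro Extensions", "cl_device=");
input = getTitle();

time = getTime();
run("Gaussian Blur...", "sigma=2 stack");
print("CPU blur: " + (getTime() - time) + " ms");

time = getTime();
Ext.CLIJ2_push(input);
Ext.CLIJ2_gaussianBlur3D(input, "blurred_gpu", 2.0, 2.0, 0.0);
Ext.CLIJ2_pull("blurred_gpu");
print("GPU blur incl. push and pull: " + (getTime() - time) + " ms");
Ext.CLIJ2_clear();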


Just FYI: There are GPUs where this factor is not 4 but 1.something. I just saw an AMD Vega 7 (integrated GPU) with access to 6 GB of memory where 4 GB can be allocated in one image. Side note: The computer has 16 GB RAM and an AMD Ryzen 4700U CPU.
