Problems and questions about CLIJ

Hi @haesleinhuepf,

I’m trying to install CLIJ/CLIJ2 in Fiji, but when I try to run the benchmarking macro, I get stuck on this error message:

(Fiji Is Just) ImageJ 2.1.0/1.53c; Java 1.8.0_172 [64-bit]; Linux 5.4.0-70-generic; 143MB of 11000MB (1%)
 
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: Could not initialize class net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL
	at net.imagej.legacy.LegacyService.runLegacyCompatibleCommand(LegacyService.java:307)
	at net.imagej.legacy.DefaultLegacyHooks.interceptRunPlugIn(DefaultLegacyHooks.java:166)
	at ij.IJ.runPlugIn(IJ.java)
	at ij.Executer.runCommand(Executer.java:150)
	at ij.Executer.run(Executer.java:68)
	at ij.IJ.run(IJ.java:317)
	at ij.IJ.run(IJ.java:328)
	at ij.macro.Functions.doRun(Functions.java:686)
	at ij.macro.Functions.doFunction(Functions.java:98)
	at ij.macro.Interpreter.doStatement(Interpreter.java:278)
	at ij.macro.Interpreter.doStatements(Interpreter.java:264)
	at ij.macro.Interpreter.run(Interpreter.java:160)
	at ij.macro.Interpreter.run(Interpreter.java:93)
	at ij.macro.Interpreter.run(Interpreter.java:104)
	at ij.plugin.Macro_Runner.runMacro(Macro_Runner.java:161)
	at ij.IJ.runMacro(IJ.java:153)
	at ij.IJ.runMacro(IJ.java:142)
	at net.imagej.legacy.IJ1Helper$3.call(IJ1Helper.java:1148)
	at net.imagej.legacy.IJ1Helper$3.call(IJ1Helper.java:1144)
	at net.imagej.legacy.IJ1Helper.runMacroFriendly(IJ1Helper.java:1095)
	at net.imagej.legacy.IJ1Helper.runMacro(IJ1Helper.java:1144)
	at net.imagej.legacy.plugin.IJ1MacroEngine.eval(IJ1MacroEngine.java:145)
	at org.scijava.script.ScriptModule.run(ScriptModule.java:157)
	at org.scijava.module.ModuleRunner.run(ModuleRunner.java:165)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:124)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:63)
	at org.scijava.thread.DefaultThreadService.lambda$wrap$2(DefaultThreadService.java:225)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: Could not initialize class net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at net.imagej.legacy.LegacyService.runLegacyCompatibleCommand(LegacyService.java:303)
	... 30 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL
	at net.haesleinhuepf.clij.clearcl.backend.ClearCLBackends.getBestBackend(ClearCLBackends.java:117)
	at net.haesleinhuepf.clij.CLIJ.getAvailableDeviceNames(CLIJ.java:212)
	at net.haesleinhuepf.clij.macro.CLIJMacroExtensions.run(CLIJMacroExtensions.java:32)
	at org.scijava.command.CommandModule.run(CommandModule.java:196)
	... 8 more

I already installed ‘ocl-icd-opencl-dev’ because I had a similar error just before this one. I looked in the troubleshooting section (Troubleshooting | CLIJ) and found this advice there, but there seem to be other problems anyway.

Also, I looked at the reference section (CLIJ reference for ImageJ Jython | CLIJ) and the examples section (clij-docs/src/main/jython at master · clij/clij-docs · GitHub), and I wanted to know whether it’s possible to perform an operation on a list of numbers, or to use custom math formulas. In fact, I would like to use CLIJ to speed up a plugin I made by using the GPU. This plugin is meant to be run on a batch of images to make montages or composites, hence the need for speed. You can download it here if you want to have a look:

Best regards, Marc.


Hi @mmongy ,

To solve your first problem first: can you install and run clinfo and post its output here?

Hi Robert (@haesleinhuepf),

Here is the output of clinfo:

mmongy@mmongy-HP-Z400-Workstation:~/Téléchargements$ clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 9.1.84
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     Quadro 2000
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.1 CUDA
  Driver Version                                  390.141
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0f:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               4
  Max clock frequency                             1251MHz
  Compute Capability (NV)                         2.1
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              1006108672 (959.5MiB)
  Error Correction support                        No
  Max memory allocation                           251527168 (239.9MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        65536 (64KiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        32768
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
  clCreateContext(NULL, ...) [default]            Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

This suggests your driver is very old (the device reports OpenCL 1.1 with driver 390.141). Please update it. Current versions should be OpenCL 1.2 and 460…
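The check done by eye here can also be scripted. This is a plain-Python sketch (the helper names are hypothetical, not part of clinfo or CLIJ) that extracts the `Device Version` lines from clinfo’s text output and flags devices below OpenCL 1.2, the version mentioned above. It assumes the line layout shown in the clinfo output above.

```python
import re

def opencl_device_versions(clinfo_text):
    """Return (major, minor) for every 'Device Version   OpenCL X.Y ...' line."""
    versions = []
    for line in clinfo_text.splitlines():
        # 'Device OpenCL C Version' lines do not match, because the pattern
        # requires 'Device Version' directly followed by whitespace.
        match = re.match(r"\s*Device Version\s+OpenCL (\d+)\.(\d+)", line)
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return versions

def devices_below(clinfo_text, required=(1, 2)):
    """List device versions older than `required` (tuples compare element-wise)."""
    return [v for v in opencl_device_versions(clinfo_text) if v < required]

# Usage: an excerpt of the clinfo output above.
sample = """
  Device Name                                     Quadro 2000
  Device Version                                  OpenCL 1.1 CUDA
  Driver Version                                  390.141
  Device OpenCL C Version                         OpenCL C 1.1
"""
print(devices_below(sample))
```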

Let me know if the benchmark script works then!

Hi, Robert,

This PC has a fairly old Quadro 2000, which is not supported by any driver newer than 390.141, so I tried another computer with a Quadro K4200 running Windows 10. After following the advice on

and

I installed the latest driver for Windows 10, and now the benchmarking macro outputs:

CPU mean filter no 1 took 7562 msec
CPU mean filter no 2 took 2922 msec
CPU mean filter no 3 took 1484 msec
CPU mean filter no 4 took 2125 msec
CPU mean filter no 5 took 5593 msec
CPU mean filter no 6 took 5109 msec
CPU mean filter no 7 took 6968 msec
CPU mean filter no 8 took 5343 msec
CPU mean filter no 9 took 10999 msec
CPU mean filter no 10 took 5469 msec
Pushing one image to the GPU took 46 msec
CLIJ2 GPU mean filter no 1 took 1343 msec
CLIJ2 GPU mean filter no 2 took 31 msec
CLIJ2 GPU mean filter no 3 took 30 msec
CLIJ2 GPU mean filter no 4 took 32 msec
CLIJ2 GPU mean filter no 5 took 31 msec
CLIJ2 GPU mean filter no 6 took 31 msec
CLIJ2 GPU mean filter no 7 took 31 msec
CLIJ2 GPU mean filter no 8 took 16 msec
CLIJ2 GPU mean filter no 9 took 31 msec
CLIJ2 GPU mean filter no 10 took 32 msec
Preparing the convolution kernel in GPU memory took 359 msec
CLIJ2 GPU mean filter using convolution no 1 took 891 msec
CLIJ2 GPU mean filter using convolution no 2 took 265 msec
CLIJ2 GPU mean filter using convolution no 3 took 250 msec
CLIJ2 GPU mean filter using convolution no 4 took 257 msec
CLIJ2 GPU mean filter using convolution no 5 took 245 msec
CLIJ2 GPU mean filter using convolution no 6 took 262 msec
CLIJ2 GPU mean filter using convolution no 7 took 250 msec
CLIJ2 GPU mean filter using convolution no 8 took 264 msec
CLIJ2 GPU mean filter using convolution no 9 took 250 msec
CLIJ2 GPU mean filter using convolution no 10 took 257 msec
CLIJ GPU mean filter no 1 took 7775 msec
CLIJ GPU mean filter no 2 took 31 msec
CLIJ GPU mean filter no 3 took 32 msec
CLIJ GPU mean filter no 4 took 26 msec
CLIJ GPU mean filter no 5 took 30 msec
CLIJ GPU mean filter no 6 took 31 msec
CLIJ GPU mean filter no 7 took 32 msec
CLIJ GPU mean filter no 8 took 31 msec
CLIJ GPU mean filter no 9 took 31 msec
CLIJ GPU mean filter no 10 took 27 msec
Pulling one image from the GPU took 438 msec
GPU: Quadro K4200
Memory in GB: 4
OpenCL version: 1.2
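These timings are easier to compare once the first GPU run is set aside: it is much slower than the others, presumably because of one-time setup such as compiling the OpenCL kernel. A minimal plain-Python sketch using the CLIJ2 mean filter numbers above; the resulting speedup only applies to this benchmark, image size and GPU, not to every operation.

```python
def mean(values):
    return float(sum(values)) / len(values)

# Timings in msec, copied from the benchmark output above.
cpu = [7562, 2922, 1484, 2125, 5593, 5109, 6968, 5343, 10999, 5469]
clij2_gpu = [1343, 31, 30, 32, 31, 31, 31, 16, 31, 32]

# Skip GPU run 1: it presumably includes one-time setup (e.g. kernel compilation).
gpu_warm = clij2_gpu[1:]

print("CPU mean: %.1f msec" % mean(cpu))
print("CLIJ2 mean after warm-up: %.1f msec" % mean(gpu_warm))
print("Speedup: about %.0fx" % (mean(cpu) / mean(gpu_warm)))
```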

There are no errors, so I guess CLIJ works now, which is a good thing, but it doesn’t help with my plugin yet. Maybe you could point me to a good place to start…

Thanks anyway, Marc.


Great to hear that! Using a more recent GPU would have been my other hint anyway. The Quadro K4200 has a memory bandwidth of 173 GB/s. You could use a recent gaming card (RTX or RX), which typically comes with a memory bandwidth of 400-900 GB/s. Your current GPU is good; I’m just saying there is potential when using GPUs with GDDR6 or HBM2e memory.
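Why bandwidth matters: a simple filter or projection mostly reads and writes pixels, so its run time is bounded below by the number of bytes moved divided by the memory bandwidth. A back-of-the-envelope sketch; the stack size (1080 x 1080 x 30, 16-bit) and the assumption that the data is read once and written once are illustrative, and real kernels will not reach this bound.

```python
def min_time_ms(bytes_moved, bandwidth_gb_per_s):
    """Lower bound on run time when every byte passes through GPU memory once."""
    return bytes_moved / (bandwidth_gb_per_s * 1e9) * 1e3

# Illustrative stack: 1080 x 1080 pixels, 30 slices, 16-bit (2 bytes per pixel),
# read once and written once by a simple filter.
stack_bytes = 1080 * 1080 * 30 * 2
bytes_moved = 2 * stack_bytes

for name, bandwidth in [("Quadro K4200, 173 GB/s", 173), ("card with 900 GB/s", 900)]:
    print("%s: at least %.2f ms" % (name, min_time_ms(bytes_moved, bandwidth)))
```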

Sure! I’m happy to help now that it works on your PC :slight_smile:

In case you haven’t seen it already, the introduction for Java/Jython/Groovy developers is a good start.

For operations on vectors and matrices, you may find this tutorial interesting.
Furthermore, there is a clij2 customOperation which allows you to execute custom OpenCL code. However, for getting started with CLIJ, I would recommend exploring what operations are available. Otherwise you may reinvent the wheel :wink:

There is an example notebook showing how to make montages. I’m not sure which math or filters you apply to build a composite image, but you may find colorDeconvolution or convertRGBStackToGraySlice useful.

I looked at the code but I’m a bit overwhelmed. It’s 4000 lines of code :wink: If you point me to a code section that you know is slow, I’m happy to give more detailed hints.

Let me know if you need more pointers to specific operations.

Best,
Robert

@haesleinhuepf, you were right about my code: there was a problem. A bad copy/paste had duplicated almost the entire code.

I just posted the fixed script on GitHub if you want to test it. On my side, I will experiment a bit with OpenCL/CLIJ2.

About the script: the idea is to make montages/composites, but also to apply the same set of adjustments (min/max pixel values and pseudocolor for each channel) to all the images in the directory.

Best regards, Marc.


In fact, the “slow” part is the generation of the resulting images (montage or composite). For one or two images it’s not too bad (2-3 minutes if you don’t resize), but for a full batch the time grows with the number of files. A user who wants to process a full directory might have to run it overnight and leave the PC on, which is tedious if they need results quickly during the day.


Also, for Z-projections, are there alternatives other than “Ext.CLIJ2_argMaximumZProjection(in, result, arg_z)” and “Ext.CLIJ2_standardDeviationZProjection(in,result)”, such as minimum, average, median, sum…?

Hey @mmongy ,

Yes, there are currently 32 projections available in CLIJ. You can find the full list here.

You can also explore the CLIJ API using the auto-completion in Fiji. Just type “project” and browse the operations that have it in their name.
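To make the projection types concrete, here is a plain-Python sketch (not CLIJ code) of what they compute per pixel along z, on a tiny nested-list stack. As the macro call in the question suggests (`in, result, arg_z`), the arg-maximum variant returns the z-index of the maximum in addition to the maximum itself. The population standard deviation, the averaging of the two middle values for even-depth medians, and first-occurrence tie-breaking are choices of this sketch, not necessarily what CLIJ2 does.

```python
import math

def z_project(stack, method):
    """Project a z-stack (list of planes, each a list of rows) along z, pixel by pixel."""
    depth = len(stack)
    height, width = len(stack[0]), len(stack[0][0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            column = sorted(stack[z][y][x] for z in range(depth))
            mean = float(sum(column)) / depth
            if method == "max":
                value = column[-1]
            elif method == "min":
                value = column[0]
            elif method == "sum":
                value = sum(column)
            elif method == "mean":
                value = mean
            elif method == "median":
                mid = depth // 2
                value = column[mid] if depth % 2 else (column[mid - 1] + column[mid]) / 2.0
            elif method == "sd":  # population standard deviation
                value = math.sqrt(sum((v - mean) ** 2 for v in column) / depth)
            else:
                raise ValueError("unknown projection method: " + method)
            result[y][x] = value
    return result

def arg_max_z_project(stack):
    """Return (maximum projection, z-index of the maximum) for every pixel."""
    depth = len(stack)
    height, width = len(stack[0]), len(stack[0][0])
    max_image = [[0] * width for _ in range(height)]
    arg_image = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [stack[z][y][x] for z in range(depth)]
            best = max(values)
            max_image[y][x] = best
            arg_image[y][x] = values.index(best)  # first z wins on ties
    return max_image, arg_image

# Usage: a stack of 3 planes, each 1 row x 2 columns.
stack = [[[1, 5]], [[4, 2]], [[3, 8]]]
print(z_project(stack, "median"))
print(arg_max_z_project(stack))
```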

Hi, Robert,

I read the documentation a bit. Let’s say I want to rework my function “makeZProjectOnSingleChannel(channelImage, methodName)”.

Is it all right if I write the new one this way:

def makeZProjectOnSingleChannel_CLIJ2(channelImagePlus, methodName, clij2_instance):

    # convert the ImageJ image to a CL buffer (ready for the GPU)
    channelImageBuffer = clij2_instance.push(channelImagePlus)
    # allocate memory for the result: a 2D image (width x height) with the input's pixel type
    resultImageBuffer = clij2_instance.create([channelImageBuffer.getWidth(), channelImageBuffer.getHeight()], channelImageBuffer.getNativeType())

    try:
        if methodName == "max":
            clij2_instance.maximumZProjection(channelImageBuffer, resultImageBuffer) # difference with argMaximumZProjection?
        elif methodName == "min":
            clij2_instance.minimumZProjection(channelImageBuffer, resultImageBuffer)
        elif methodName == "sum":
            clij2_instance.sumZProjection(channelImageBuffer, resultImageBuffer)
        elif methodName == "average":
            clij2_instance.meanZProjection(channelImageBuffer, resultImageBuffer)
        elif methodName == "median":
            clij2_instance.medianZProjection(channelImageBuffer, resultImageBuffer)
        elif methodName == "sd":
            clij2_instance.standardDeviationZProjection(channelImageBuffer, resultImageBuffer)
        else:
            raise ValueError("Unknown projection method: " + methodName)

        # convert the result back to an ImagePlus
        newZProjectChannelImagePlus = clij2_instance.pull(resultImageBuffer)
    finally:
        # free memory on the GPU - needs to be done explicitly, even if an error occurred
        channelImageBuffer.close()
        resultImageBuffer.close()

    return newZProjectChannelImagePlus

Have I understood correctly how it works?

Regards, Marc.

In principle, yes. However, you should avoid push and pull commands where possible. For example, if you have a workflow like this one, combining a projection and a montage:

push()
maximumZProjection
pull()
push()
paste()
pull()

It will be substantially slower than

push()
maximumZProjection
paste()
pull()

Thus, I recommend not pulling/pushing intermediate results: push once at the beginning, pull once at the end, and keep intermediate results on the GPU. You can find a more detailed explanation on YouTube.
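The difference between the two workflows above is simply the number of copies between host memory and GPU memory. A toy plain-Python sketch that only counts transfers; `TransferLog`, `project` and `paste` are hypothetical placeholders, not CLIJ calls.

```python
class TransferLog(object):
    """Hypothetical stand-in for a GPU context that only counts host<->GPU copies."""

    def __init__(self):
        self.copies = 0

    def push(self, image):
        self.copies += 1
        return image  # pretend the image now lives in GPU memory

    def pull(self, buffer):
        self.copies += 1
        return buffer  # pretend the result is back in host memory

def project(buffer):
    return buffer  # placeholder for a projection running on the GPU

def paste(buffer):
    return buffer  # placeholder for pasting into a montage on the GPU

def round_trip_workflow(gpu, image):
    # push, project, pull, push, paste, pull
    projection = gpu.pull(project(gpu.push(image)))
    return gpu.pull(paste(gpu.push(projection)))

def gpu_resident_workflow(gpu, image):
    # push, project, paste, pull
    return gpu.pull(paste(project(gpu.push(image))))

for workflow in (round_trip_workflow, gpu_resident_workflow):
    gpu = TransferLog()
    workflow(gpu, "image")
    print("%s: %d copies" % (workflow.__name__, gpu.copies))
```

With real images, each avoided copy saves roughly the push/pull time per GB times the image size.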

Hi, Robert,

I reworked some functions in my plugin so that CLIJ can be used: the one that makes a Z-projection, another one that makes the montages, and a last one that makes a merged-channels image. But I don’t see any difference in speed, and in some cases it doesn’t work at all. I put the resulting script on GitHub if you want to examine it.

For the montages, it seems CLIJ2 cannot work with RGB (24-bit) images. Also, I noticed that the function for adding images only takes two images at a time, so I used recursion to get around this without writing loops to merge the images. It works, but the result is a single-channel image, not a composite.
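Summing more than two images with a two-input add does not need recursion: a loop that feeds the running sum back in (here `functools.reduce`) does the same. The sketch also shows why the result is single-channel: addition merges all channels into one plane, whereas a composite keeps them separate. `add_images` is a hypothetical CPU stand-in for a two-input GPU add, operating on nested lists.

```python
from functools import reduce

def add_images(a, b):
    """Hypothetical two-input pixel-wise add (CPU stand-in for a GPU add)."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def add_all(images):
    """Sum any number of same-sized images using only the two-input add."""
    return reduce(add_images, images)

# Usage: three 2x2 "channels"; the sum is one plane, the channel identities are lost.
channels = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
print(add_all(channels))
```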

Also, you advised me against repeatedly pushing/pulling images (it takes time), but it is the only way to get an “ImagePlus” for the images and use ImageJ functions on them.

It seems CLIJ2 functions only operate on “buffers”, i.e. images stored on the GPU. So it seems I can’t reuse any of my functions or the ImageJ API while the images are buffered. Is that right?

Also, when the images are buffered, should I use loops to process them (put the buffered images in a Python list and iterate over them)?

Best regards, Marc.

Hi @mmongy ,

Can you provide any details? How long do operations take in ImageJ, and how long when executing them on a GPU using CLIJ? Feel free to check out our detailed benchmarks online. As you can see there, performance depends on image size, operation parameters and GPU hardware. For example, the maximum-z projection on GPUs (orange, red) and CPUs (green, blue) performs like this depending on image size:


and push/pull take that much time per GB:

If your benchmarks / performance measurements look different from ours, we should investigate further.

Can you provide details? What does not work? Under which circumstances? Feel free to share code snippets that produce wrong results, screenshots etc. Otherwise I cannot help you.

Correct. CLIJ only supports 2D images and 3D stacks, not RGB (24-bit) images. If the third dimension holds the channels (RGB stacks), you can process them as a stack. If not, I recommend processing the channels one by one in a for-loop.

Again, if you can provide a code snippet that does result in a wrong result, I’m happy to spend some time and try to fix it.

Yes, it is an art to translate workflows from ImageJ to CLIJ with minimal push/pull. Let me know where in particular you struggle and I can maybe give additional hints.

Correct. ImageJ and its operations cannot run on GPUs; that is why CLIJ is a partial rewrite of ImageJ that runs on GPUs. However, your own functions could take GPU images (buffers) as parameters and operate on them.

Sure, you can do that. Just take care of memory consumption: if your GPU has 2 GB of memory and you put five 500 MB images in a list, the GPU will run out of memory before you even start processing them. I actually recommend not doing this:

step1(imageA)
step1(imageB)
step1(imageC)

step2(imageA)
step2(imageB)
step2(imageC)

but rather this, to use the power of GPUs optimally:

step1(imageA)
step2(imageA)

step1(imageB)
step2(imageB)

step1(imageC)
step2(imageC)
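The memory argument can be made concrete with a small bookkeeping sketch. Assumptions (not CLIJ behaviour): each step writes into a new buffer of the same size as its input, buffers are released as soon as they are no longer needed, and a “buffer” is represented only by its size in MB. With the numbers above (five 500 MB images), the step-by-step order peaks far above a 2 GB card, while the image-by-image order never holds more than two buffers.

```python
class MemoryTracker(object):
    """Hypothetical GPU-memory bookkeeping: sizes in MB, peak usage recorded."""

    def __init__(self):
        self.in_use = 0
        self.peak = 0

    def allocate(self, size_mb):
        self.in_use += size_mb
        self.peak = max(self.peak, self.in_use)
        return size_mb  # a "buffer" is represented by its size only

    def release(self, size_mb):
        self.in_use -= size_mb

def step(gpu, input_mb):
    """Any processing step: writes into a new buffer the size of its input."""
    return gpu.allocate(input_mb)

def step_by_step(gpu, sizes_mb):
    inputs = [gpu.allocate(mb) for mb in sizes_mb]  # push all images
    firsts = [step(gpu, mb) for mb in inputs]       # step1 on every image
    for mb in inputs:
        gpu.release(mb)
    seconds = [step(gpu, mb) for mb in firsts]      # step2 on every image
    for mb in firsts + seconds:                     # pull results, then release
        gpu.release(mb)

def image_by_image(gpu, sizes_mb):
    for mb in sizes_mb:
        image = gpu.allocate(mb)   # push one image
        first = step(gpu, image)   # step1
        gpu.release(image)
        second = step(gpu, first)  # step2
        gpu.release(first)
        gpu.release(second)        # pull the result, then release

for order in (step_by_step, image_by_image):
    gpu = MemoryTracker()
    order(gpu, [500] * 5)
    print("%s: peak %d MB" % (order.__name__, gpu.peak))
```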

There is a detailed explanation on YouTube.

Again, I’m happy to assist. I can most efficiently help you if we focus on small code snippets and how to optimize them. Step by step :wink:

Cheers,
Robert

Hi, Robert,

I rewrote my plugin to use CLIJ2 correctly, and now it works, at least technically.

I say “technically” because when I try to use it on my images (for montages or composites) with CLIJ2 enabled, I get this message:

CLIJ Error: Creating an image or kernel failed because your device ran out of memory. 
You can check memory consumption in CLIJ2 by calling these methods from time to time and see which images live in memory at specific points in your workflow:
Ext.CLIJ2_reportMemory(); // ImageJ Macro
print(clij2.reportMemory()); // Java/groovy/jython
For support please contact the CLIJ2 developers via the forum on https://image.sc or create an issue on https://github.com/clij/clij2/issues .
Therefore, please report the complete error message, the code snippet or workflow you were running, an example image if possible and details about your graphics hardware.

Exception in thread "AWT-EventQueue-0" net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -4 -> CL_MEM_OBJECT_ALLOCATION_FAILURE
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkOpenCLError(BackendUtils.java:346)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.lambda$enqueueKernelExecution$23(ClearCLBackendJOCL.java:769)
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkExceptions(BackendUtils.java:171)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.enqueueKernelExecution(ClearCLBackendJOCL.java:768)
	at net.haesleinhuepf.clij.clearcl.ClearCLKernel.lambda$run$0(ClearCLKernel.java:489)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:64)
	at net.haesleinhuepf.clij.clearcl.ClearCLKernel.run(ClearCLKernel.java:477)
	at net.haesleinhuepf.clij.clearcl.ClearCLKernel.run(ClearCLKernel.java:459)
	at net.haesleinhuepf.clij.clearcl.util.CLKernelExecutor.lambda$enqueue$1(CLKernelExecutor.java:267)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:28)
	at net.haesleinhuepf.clij.clearcl.util.CLKernelExecutor.enqueue(CLKernelExecutor.java:266)
	at net.haesleinhuepf.clij2.CLIJ2.lambda$executeSubsequently$1(CLIJ2.java:502)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
	at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:28)
	at net.haesleinhuepf.clij2.CLIJ2.executeSubsequently(CLIJ2.java:492)
	at net.haesleinhuepf.clij2.CLIJ2.executeSubsequently(CLIJ2.java:479)
	at net.haesleinhuepf.clij2.CLIJ2.executeSubsequently(CLIJ2.java:474)
	at net.haesleinhuepf.clij2.CLIJ2.execute(CLIJ2.java:459)
	at net.haesleinhuepf.clij2.plugins.AddImageAndScalar.addImageAndScalar(AddImageAndScalar.java:61)
	at net.haesleinhuepf.clij2.CLIJ2Ops.addImageAndScalar(CLIJ2Ops.java:5384)
	at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:190)
	at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:208)
	at org.python.core.PyObject.__call__(PyObject.java:512)
	at org.python.core.PyObject.__call__(PyObject.java:517)
	at org.python.core.PyMethod.__call__(PyMethod.java:171)
	at org.python.pycode._pyx14.calculateTargetImage_CLIJ2$180(<string>:1833)
	at org.python.pycode._pyx14.call_function(<string>)
	at org.python.core.PyTableCode.call(PyTableCode.java:173)
	at org.python.core.PyBaseCode.call(PyBaseCode.java:306)
	at org.python.core.PyFunction.function___call__(PyFunction.java:474)
	at org.python.core.PyFunction.__call__(PyFunction.java:469)
	at org.python.core.PyFunction.__call__(PyFunction.java:459)
	at org.python.pycode._pyx14.makeChannelTuningPipeline_CLIJ2$188(<string>:1994)
	at org.python.pycode._pyx14.call_function(<string>)
	at org.python.core.PyTableCode.call(PyTableCode.java:173)
	at org.python.core.PyBaseCode.call(PyBaseCode.java:306)
	at org.python.core.PyFunction.function___call__(PyFunction.java:474)
	at org.python.core.PyFunction.__call__(PyFunction.java:469)
	at org.python.core.PyFunction.__call__(PyFunction.java:459)
	at org.python.pycode._pyx14.makeTunedChannelImages$189(<string>:2020)
	at org.python.pycode._pyx14.call_function(<string>)
	at org.python.core.PyTableCode.call(PyTableCode.java:173)
	at org.python.core.PyBaseCode.call(PyBaseCode.java:150)
	at org.python.core.PyFunction.__call__(PyFunction.java:426)
	at org.python.pycode._pyx14.makeCompositePipeline$204(<string>:2403)
	at org.python.pycode._pyx14.call_function(<string>)
	at org.python.core.PyTableCode.call(PyTableCode.java:173)
	at org.python.core.PyBaseCode.call(PyBaseCode.java:168)
	at org.python.core.PyFunction.__call__(PyFunction.java:437)
	at org.python.pycode._pyx14.globalPipeline$152(<string>:1273)
	at org.python.pycode._pyx14.call_function(<string>)
	at org.python.core.PyTableCode.call(PyTableCode.java:173)
	at org.python.core.PyBaseCode.call(PyBaseCode.java:306)
	at org.python.core.PyFunction.function___call__(PyFunction.java:474)
	at org.python.core.PyFunction.__call__(PyFunction.java:469)
	at org.python.core.PyFunction.__call__(PyFunction.java:464)
	at org.python.core.PyCompoundCallable.__call__(PyCompoundCallable.java:26)
	at org.python.core.PyObject.__call__(PyObject.java:433)
	at org.python.core.PyObject._jcallexc(PyObject.java:3565)
	at org.python.core.PyObject._jcall(PyObject.java:3598)
	at org.python.proxies.java.awt.event.ActionListener.actionPerformed(Unknown Source)
	at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
	at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
	at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
	at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
	at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
	at java.awt.Component.processMouseEvent(Component.java:6539)
	at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
	at java.awt.Component.processEvent(Component.java:6304)
	at java.awt.Container.processEvent(Container.java:2239)
	at java.awt.Component.dispatchEventImpl(Component.java:4889)
	at java.awt.Container.dispatchEventImpl(Container.java:2297)
	at java.awt.Component.dispatchEvent(Component.java:4711)
	at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4904)
	at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4535)
	at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4476)
	at java.awt.Container.dispatchEventImpl(Container.java:2283)
	at java.awt.Window.dispatchEventImpl(Window.java:2746)
	at java.awt.Component.dispatchEvent(Component.java:4711)
	at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:760)
	at java.awt.EventQueue.access$500(EventQueue.java:97)
	at java.awt.EventQueue$3.run(EventQueue.java:709)
	at java.awt.EventQueue$3.run(EventQueue.java:703)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:84)
	at java.awt.EventQueue$4.run(EventQueue.java:733)
	at java.awt.EventQueue$4.run(EventQueue.java:731)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
	at java.awt.EventQueue.dispatchEvent(EventQueue.java:730)
	at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
	at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
	at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)

To be able to use it, I have to rescale the destination images by a factor of 0.05 (0.05 x original height and 0.05 x original width). Originally, my rescale feature was a countermeasure to reduce the CPU time needed to generate the destination images, for cases where the size doesn’t matter.

I updated my code on GitHub, but I suspect the problem comes from this function, or at least from the way it uses memory:

def calculateTargetImage_CLIJ2(originalChannelImagePlus, minOriginalImageShownPixelValue, maxOriginalImageShownPixelValue, maxPossibleTargetImagePixelValue, gammaCorrection_A_linear_factor, gammaCorrection_gamma_power_factor, clij2_instance):
    # Counterpart of calculateRealPixelValue, but for the whole image and on the GPU.
    # Explanation of the float() conversions: https://stackoverflow.com/questions/10768724/why-does-python-return-0-for-simple-division-calculation
    A_pixel_equation_coefficient = float(maxPossibleTargetImagePixelValue)/(float(maxOriginalImageShownPixelValue)-float(minOriginalImageShownPixelValue))
    B_pixel_equation_factor = -1.0*(float(maxPossibleTargetImagePixelValue)*float(minOriginalImageShownPixelValue))/(float(maxOriginalImageShownPixelValue)-float(minOriginalImageShownPixelValue))
    print("A_pixel_equation_coefficient: "+str(A_pixel_equation_coefficient)+" B_pixel_equation_factor: "+str(B_pixel_equation_factor))
    originalChannelImageBuffer = clij2_instance.push(originalChannelImagePlus)
    tempBuffer1 = clij2_instance.create(originalChannelImageBuffer)
    tempBuffer2 = clij2_instance.create(originalChannelImageBuffer)
    tempBuffer3 = clij2_instance.create(originalChannelImageBuffer)
    targetChannelImageBuffer = clij2_instance.create(originalChannelImageBuffer)
    clij2_instance.multiplyImageAndScalar(originalChannelImageBuffer, tempBuffer1, A_pixel_equation_coefficient)
    clij2_instance.addImageAndScalar(tempBuffer1, tempBuffer2, B_pixel_equation_factor)

    # Gamma correction (1.0/float(...) avoids integer division in Jython 2.7 when the factors are ints)
    clij2_instance.power(tempBuffer2, tempBuffer3, 1.0/float(gammaCorrection_gamma_power_factor))
    clij2_instance.multiplyImageAndScalar(tempBuffer3, targetChannelImageBuffer, 1.0/float(gammaCorrection_A_linear_factor))

    targetChannelImagePlus = clij2_instance.pull(targetChannelImageBuffer)
    return targetChannelImagePlus
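For spot-checking the GPU output on a few pixels, the same computation can be written per pixel in plain Python (a hypothetical helper mirroring the function above). Two details are worth noting. In Jython 2.7, `1/gamma` is integer division when `gamma` is an int, hence the `1.0/float(...)` form. And pixels below the displayed minimum give a negative linear value, for which a fractional power is undefined: Python 2 raises `ValueError`, Python 3 returns a complex number, and on the GPU the result is likely NaN. Clamping those values to 0 is an assumption of this sketch, not something the original function does.

```python
def target_pixel_value(x, min_shown, max_shown, max_target, gamma_a, gamma_power):
    """Per-pixel version of calculateTargetImage_CLIJ2: linear stretch, then gamma."""
    span = float(max_shown) - float(min_shown)
    a = float(max_target) / span                      # A_pixel_equation_coefficient
    b = -float(max_target) * float(min_shown) / span  # B_pixel_equation_factor
    linear = a * x + b                                # == max_target * (x - min_shown) / span
    linear = max(0.0, linear)                         # assumption: clamp below-minimum pixels to 0
    return linear ** (1.0 / float(gamma_power)) / float(gamma_a)

# Usage: display range 100..200 stretched to 0..255, no gamma (both factors 1).
for x in (50, 100, 150, 200):
    print(x, target_pixel_value(x, 100, 200, 255, 1, 1))
```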

So, is there a way to optimize memory consumption? I’m not sure my colleagues would want to work with postage-stamp-sized images… :wink:

Best regards, Marc.

I see one issue immediately. You should release memory on the GPU once you don’t need it anymore. In a simple scenario like the code snippet you posted, you should call this at the end of the function:

originalChannelImageBuffer.close()
tempBuffer1.close()
tempBuffer2.close()
tempBuffer3.close()
targetChannelImageBuffer.close()

Basically, you should close() all images that have been push()ed or create()d.
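Closing at the end of the function is enough when nothing goes wrong; if a CLIJ call throws (as in the out-of-memory error above), the `close()` calls are skipped. Wrapping the work in `try`/`finally` avoids that. A plain-Python sketch with a hypothetical `FakeBuffer` standing in for a GPU buffer; the same `try`/`finally` structure should carry over to Jython with real buffers, as long as the pull happens inside the `try` block.

```python
class FakeBuffer(object):
    """Hypothetical stand-in for a GPU buffer; only records whether close() was called."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

def run_with_buffers(names, work):
    """Create one buffer per name, run work(buffers), and always close all of them."""
    buffers = []
    try:
        for name in names:
            buffers.append(FakeBuffer(name))  # stands in for push()/create()
        return work(buffers)
    finally:
        for buffer in buffers:
            buffer.close()  # runs on success and on error

# Usage: a step that fails half-way, like an out-of-memory error.
seen = []

def failing_step(buffers):
    seen.extend(buffers)
    raise RuntimeError("simulated OpenCL failure")

try:
    run_with_buffers(["original", "temp1", "target"], failing_step)
except RuntimeError:
    pass
print([(b.name, b.closed) for b in seen])
```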

See also the basics tutorial introducing memory management and the introduction for Java/Jython/Groovy developers.

Let me know if this helps.

Cheers,
Robert

@haesleinhuepf, it helped.

Here is a comparison for converting 3 spinning-disk images (1080x1080 pixels, 4 channels each, 25-30 z-slices each) to composites:

with GPU: End after 120.766000032 seconds
no GPU: End after 2287.27799988 seconds

So I guess it helped: that is roughly 19 times faster.

Also, do you have any idea how to process RGB images? Maybe by computing not on buffered images but on buffered matrices, and then trying to “force” the resulting matrix into an RGB ImagePlus with ImageJ/Fiji functions… I’m still looking for a solution.

Regards, and thanks, Marc.


Fantastic! Congrats!

You can turn RGB images into RGB stacks and process them as any other stack.
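Turning an RGB image into an RGB stack (in Fiji, Image > Type > RGB Stack) amounts to splitting each packed 24-bit pixel into three 8-bit planes, which can then be processed as a 3-slice stack. A plain-Python sketch of the split and the reverse packing on a flat list of pixels (hypothetical helpers). RGB pixels read as Java ints may have the top (alpha) byte set and therefore arrive as negative numbers in Jython; the shift-and-mask still extracts the right bytes, as the second usage line shows.

```python
def rgb_to_planes(pixels):
    """Split packed 0xRRGGBB pixels into red, green and blue 8-bit planes."""
    red = [(p >> 16) & 0xFF for p in pixels]
    green = [(p >> 8) & 0xFF for p in pixels]
    blue = [p & 0xFF for p in pixels]
    return red, green, blue

def planes_to_rgb(red, green, blue):
    """Pack three 8-bit planes back into 0xRRGGBB pixels (alpha byte left at 0)."""
    return [(r << 16) | (g << 8) | b for r, g, b in zip(red, green, blue)]

# Usage: pure red, pure green and a mixed color.
pixels = [0xFF0000, 0x00FF00, 0x123456]
print(rgb_to_planes(pixels))
# The same mixed color with the alpha byte set (negative as a signed 32-bit int).
print(rgb_to_planes([-16777216 | 0x123456]))
```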

What kind of processing are we talking about and what kind of images (stacks?)?

In this context, I want to turn microscopy images (8-16 bit, grayscale, optionally pseudocolored) into RGB montages showing the individual channels plus the fused image.

You told me I couldn’t process RGB 24-bit images directly with CLIJ2. So my basic idea is to create a grayscale image (8-16 bit) with tuned minimum and maximum pixel values (with setMinAndMax, or setDynamicRange, I think), and then to make a fused image. In my case the fused image is RGB 24-bit (the function I wrote builds it byte by byte with “for” loops, which is costly in CPU time).

Doing this with CLIJ2 would probably be hard, because I would have to use a blank 32-bit image for the fused image (the only image type large enough to hold 3 x 8-bit values), and not through “documented methods”: the 32-bit image is grayscale, so I would probably have to manipulate it with Python bit-shift operators (“<<” or “>>”, depending on which end you shift the bytes from) and cut off the unwanted part of the dynamic range to make it “24-bit-like”. I don’t know if that’s possible, maybe with matrices, and then set the matrix (an array of values) on a 24-bit RGB ImagePlus with setPixels(). I would probably also have to do the same with the channel images to convert them to 24-bit RGB, and then build an array of 24-bit images.
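The fused-image idea can be sketched per pixel in plain Python: scale each channel to 0..255 from its display min/max, weight it by a pseudocolor, add the channels with clipping, and pack the result as 0xRRGGBB. Additive merging with clipping is an assumption about what the plugin wants, and all helper names are hypothetical. On the 32-bit question: a 32-bit float holds every integer up to 2**24 exactly, so packed 24-bit RGB values (at most 0xFFFFFF) survive a float32 round trip, which `fits_in_float32` checks.

```python
import struct

def to_8bit(value, min_shown, max_shown):
    """Map [min_shown, max_shown] linearly to 0..255, clipping values outside the range."""
    scaled = (float(value) - min_shown) * 255.0 / (max_shown - min_shown)
    return int(min(255.0, max(0.0, scaled)) + 0.5)

def fuse(channels, colors, ranges):
    """Additively merge gray channels into packed 0xRRGGBB ints.

    channels: one flat pixel list per channel (all the same length)
    colors:   one (r, g, b) weight triple in 0..1 per channel, e.g. (1, 0, 1) for magenta
    ranges:   one (min_shown, max_shown) pair per channel
    """
    fused = []
    for i in range(len(channels[0])):
        rgb = [0.0, 0.0, 0.0]
        for pixels, color, (lo, hi) in zip(channels, colors, ranges):
            level = to_8bit(pixels[i], lo, hi)
            for k in range(3):
                rgb[k] += level * color[k]
        r, g, b = [min(255, int(c + 0.5)) for c in rgb]  # clip the additive sum
        fused.append((r << 16) | (g << 8) | b)
    return fused

def fits_in_float32(value):
    """True if an integer survives a round trip through a 32-bit float unchanged."""
    return struct.unpack("f", struct.pack("f", float(value)))[0] == value

# Usage: two 12-bit channels shown in red and green, three pixels each.
channels = [[0, 1000, 4095], [4095, 0, 2048]]
colors = [(1, 0, 0), (0, 1, 0)]
ranges = [(0, 4095), (0, 4095)]
print([hex(p) for p in fuse(channels, colors, ranges)])
print(fits_in_float32(0xFFFFFF), fits_in_float32(2 ** 24 + 1))
```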

Then I would use my own functions to turn the array into a montage; I can’t use CLIJ2 at this step because the images are 24-bit.

It looks tricky, but I want to shave as much CPU time as possible off my pipeline.

Regards, Marc.

I’m very sorry but I can’t follow. Feel free to post a script here which does that on an example image and then we will see how to translate it to clij. :slight_smile: