CLIJ headless benchmarking errors

Hi @haesleinhuepf, I’m trying to run your benchmarks on a P5000 and an RTX 2080. I downloaded Fiji locally, added the clij update site, and copied it to the cluster (all running Ubuntu).

I checked that OpenCL is installed:

(base) atyson@gpu-380-13:~/fiji_clij_benchmark$  ls -alRt /opt/nvidia/cuda_10.0.130_410.48/lib64/libO*
lrwxrwxrwx 1 root root    14 Nov 22  2018 /opt/nvidia/cuda_10.0.130_410.48/lib64/libOpenCL.so -> libOpenCL.so.1
-rwxr-xr-x 1 root root 27096 Nov 22  2018 /opt/nvidia/cuda_10.0.130_410.48/lib64/libOpenCL.so.1.1
lrwxrwxrwx 1 root root    16 Nov 22  2018 /opt/nvidia/cuda_10.0.130_410.48/lib64/libOpenCL.so.1 -> libOpenCL.so.1.1

Then tried to run the benchmarks, which gives this error:

(base) atyson@gpu-380-13:~/fiji_clij_benchmark$ Fiji.app/ImageJ-linux64 --headless -macro benchmarking_orig.ijm 
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
java.lang.RuntimeException: java.util.concurrent.ExecutionException: net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -1001 -> Unknown OpenCL error:-1001
	at net.imagej.legacy.LegacyService.runLegacyCompatibleCommand(LegacyService.java:309)
	at net.imagej.legacy.DefaultLegacyHooks.interceptRunPlugIn(DefaultLegacyHooks.java:163)
	at ij.IJ.runPlugIn(IJ.java)
	at ij.Executer.runCommand(Executer.java:137)
	at ij.Executer.run(Executer.java:66)
	at ij.IJ.run(IJ.java:312)
	at ij.IJ.run(IJ.java:323)
	at ij.macro.Functions.doRun(Functions.java:624)
	at ij.macro.Functions.doFunction(Functions.java:97)
	at ij.macro.Interpreter.doStatement(Interpreter.java:275)
	at ij.macro.Interpreter.doStatements(Interpreter.java:261)
	at ij.macro.Interpreter.run(Interpreter.java:157)
	at ij.macro.Interpreter.run(Interpreter.java:91)
	at ij.macro.Interpreter.run(Interpreter.java:102)
	at ij.plugin.Macro_Runner.runMacro(Macro_Runner.java:161)
	at ij.plugin.Macro_Runner.runMacroFile(Macro_Runner.java:145)
	at ij.IJ.runMacroFile(IJ.java:160)
	at net.imagej.legacy.IJ1Helper$4.call(IJ1Helper.java:1125)
	at net.imagej.legacy.IJ1Helper$4.call(IJ1Helper.java:1121)
	at net.imagej.legacy.IJ1Helper.runMacroFriendly(IJ1Helper.java:1055)
	at net.imagej.legacy.IJ1Helper.runMacroFile(IJ1Helper.java:1121)
	at net.imagej.legacy.LegacyCommandline$Macro.handle(LegacyCommandline.java:187)
	at org.scijava.console.DefaultConsoleService.processArgs(DefaultConsoleService.java:102)
	at org.scijava.AbstractGateway.launch(AbstractGateway.java:97)
	at net.imagej.Main.main(Main.java:55)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at net.imagej.launcher.ClassLauncher.launch(ClassLauncher.java:279)
	at net.imagej.launcher.ClassLauncher.run(ClassLauncher.java:186)
	at net.imagej.launcher.ClassLauncher.main(ClassLauncher.java:77)
Caused by: java.util.concurrent.ExecutionException: net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -1001 -> Unknown OpenCL error:-1001
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at net.imagej.legacy.LegacyService.runLegacyCompatibleCommand(LegacyService.java:305)
	... 31 more
Caused by: net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -1001 -> Unknown OpenCL error:-1001
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkOpenCLError(BackendUtils.java:346)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.lambda$getNumberOfPlatforms$0(ClearCLBackendJOCL.java:83)
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkExceptions(BackendUtils.java:156)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.getNumberOfPlatforms(ClearCLBackendJOCL.java:81)
	at net.haesleinhuepf.clij.clearcl.ClearCL.getNumberOfPlatforms(ClearCL.java:44)
	at net.haesleinhuepf.clij.clearcl.ClearCL.getAllDevices(ClearCL.java:232)
	at net.haesleinhuepf.clij.CLIJ.getAvailableDeviceNames(CLIJ.java:198)
	at net.haesleinhuepf.clij.macro.CLIJMacroExtensions.run(CLIJMacroExtensions.java:30)
	at org.scijava.command.CommandModule.run(CommandModule.java:199)
	at org.scijava.module.ModuleRunner.run(ModuleRunner.java:168)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:127)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:66)
	at org.scijava.thread.DefaultThreadService$3.call(DefaultThreadService.java:238)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

This worked locally, but not on the cluster. Is it an OpenCL issue, or do I need to install Fiji onto the cluster properly?

Thanks!


Hey @adamltyson,

sorry for the inconvenience.
This looks like an OpenCL installation issue: error -1001 is CL_PLATFORM_NOT_FOUND_KHR, which usually means the OpenCL ICD loader couldn’t find any platform. But as you said, the library is installed. Could you please run clinfo on the command line (on Ubuntu it can be installed with sudo apt install clinfo) and share the output with us?

Thanks!

Cheers,
Robert

I would also involve @frauzufall, as she is my local Ubuntu-clij mastermind :sunglasses:

clinfo isn’t installed. I can try to get it installed tomorrow, or let me know if I can share any other cluster/GPU info.


CLIJ crashes while parsing the device tree of the installed GPUs, so it would be interesting to see that tree. clinfo prints it out…
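
If getting clinfo installed takes a while: you can also ask CLIJ itself from a Jython script (via Fiji’s script editor, or headless with ImageJ-linux64 --headless --run script.py). A minimal sketch using the same call that fails in your stack trace:

from ij import IJ;
from net.haesleinhuepf.clij import CLIJ;

# list all OpenCL devices CLIJ can discover - this is the call
# that throws the -1001 error in the stack trace above
IJ.log(str(CLIJ.getAvailableDeviceNames()));

If this crashes with the same -1001, the problem sits in the OpenCL platform discovery itself, independent of headless mode.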

I’ve asked our sysadmin to install clinfo for me. I should hopefully be able to let you know tomorrow.


Just for the record: By chance I had a very similar error message on a Windows workstation.

Updating the NVidia driver solved the problem here…

@adamltyson if I read it correctly on Twitter, you just tried it for fun. So feel free to drop out now.
No pressure :wink: If other users run into similar issues, I can follow up with them.

But thanks for your time! :slight_smile:

I’ll see if I can still run these benchmarks. I’d like to use clij on the cluster, and I also want to benchmark some other GPU processing things to compare to my local machine. This is falling down my to-do list, but I’ll hopefully send you some results soon.

Are you interested in any other benchmarks, or just Turing vs Pascal?


I’ve never used an Nvidia RTX GPU, so knowing anything about how it executes CLIJ would be interesting! Furthermore, for cluster support we were actually thinking of using the Java way rather than going via Fiji headless. If you want to dive deeper here, have a look at this repository:
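
The basic idea is to initialize CLIJ directly from Java (or, equivalently, from a Jython script) instead of going through the macro extensions. A minimal sketch - method names as in current clij versions, with "RTX" just as an example device-name filter:

from ij import IJ;
from net.haesleinhuepf.clij import CLIJ;

# initialize CLIJ on a device whose name contains "RTX";
# calling getInstance() without an argument picks a default device
clij = CLIJ.getInstance("RTX");
IJ.log("Using GPU: " + clij.getGPUName());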

With new NVIDIA drivers on the cluster, OpenCL works fine. I ran the benchmarking macro on the cluster with a P5000 and RTX 2080, along with the two machines I had in front of me (with a TITAN RTX and a laptop RTX 2080).

There doesn’t seem to be much difference. I see big differences between these GPUs using CUDA, but I think that has a lot to do with memory size, bandwidth, hard drive speed, etc. on the various machines. Let me know if it’s useful to run anything else.

Desktop workstation:
Dual Intel Xeon 6132 @ 3.7 GHz 14 core (28 thread)
384GB RAM
NVIDIA TITAN RTX
Ubuntu 18.04

Clearing 
0.350182 ms for Whole extension handling 
CPU mean filter no 1 took 1088 msec
CPU mean filter no 2 took 453 msec
CPU mean filter no 3 took 422 msec
CPU mean filter no 4 took 438 msec
CPU mean filter no 5 took 472 msec
CPU mean filter no 6 took 461 msec
CPU mean filter no 7 took 426 msec
CPU mean filter no 8 took 440 msec
CPU mean filter no 9 took 433 msec
CPU mean filter no 10 took 436 msec
23.1088 ms for Whole extension handling 
Pushing one image to the GPU took 24 msec
57.4624 ms for Whole extension handling 
GPU mean filter no 1 took 58 msec
7.02761 ms for Whole extension handling 
GPU mean filter no 2 took 7 msec
6.68363 ms for Whole extension handling 
GPU mean filter no 3 took 7 msec
11.0009 ms for Whole extension handling 
GPU mean filter no 4 took 12 msec
13.8493 ms for Whole extension handling 
GPU mean filter no 5 took 14 msec
8.66943 ms for Whole extension handling 
GPU mean filter no 6 took 9 msec
8.21591 ms for Whole extension handling 
GPU mean filter no 7 took 8 msec
6.27409 ms for Whole extension handling 
GPU mean filter no 8 took 7 msec
7.16023 ms for Whole extension handling 
GPU mean filter no 9 took 7 msec
6.36035 ms for Whole extension handling 
GPU mean filter no 10 took 7 msec
16.5268 ms for Whole extension handling 
Pulling one image from the GPU took 16 msec
Clearing 
Releasing buffer Blurred
Releasing buffer t1-head.tif
0.900909 ms for Whole extension handling 


########################################################
########################################################
Laptop:
Intel i9-9900K @ 5.0 GHz 8 core (16 thread)
32GB RAM
NVIDIA RTX 2080 (laptop)
Ubuntu 18.04

Clearing 
0.135337 ms for Whole extension handling 
CPU mean filter no 1 took 936 msec
CPU mean filter no 2 took 818 msec
CPU mean filter no 3 took 837 msec
CPU mean filter no 4 took 792 msec
CPU mean filter no 5 took 806 msec
CPU mean filter no 6 took 799 msec
CPU mean filter no 7 took 854 msec
CPU mean filter no 8 took 796 msec
CPU mean filter no 9 took 795 msec
CPU mean filter no 10 took 795 msec
6.84478 ms for Whole extension handling 
Pushing one image to the GPU took 7 msec
1294.63 ms for Whole extension handling 
GPU mean filter no 1 took 1295 msec
5.89057 ms for Whole extension handling 
GPU mean filter no 2 took 6 msec
6.25283 ms for Whole extension handling 
GPU mean filter no 3 took 6 msec
6.45460 ms for Whole extension handling 
GPU mean filter no 4 took 7 msec
5.77895 ms for Whole extension handling 
GPU mean filter no 5 took 6 msec
5.80706 ms for Whole extension handling 
GPU mean filter no 6 took 6 msec
6.08888 ms for Whole extension handling 
GPU mean filter no 7 took 6 msec
5.64306 ms for Whole extension handling 
GPU mean filter no 8 took 6 msec
5.73713 ms for Whole extension handling 
GPU mean filter no 9 took 5 msec
5.63303 ms for Whole extension handling 
GPU mean filter no 10 took 6 msec
11.1459 ms for Whole extension handling 
Pulling one image from the GPU took 11 msec
Clearing 
Releasing buffer Blurred
Releasing buffer t1-head.tif
1.06831 ms for Whole extension handling 


########################################################
########################################################
Cluster GPU node:
Intel Xeon E5-2660 v3 @ 2.60GHz 20 core
660GB RAM
NVIDIA Quadro P5000 & GeForce RTX 2080
Ubuntu 16.04

Quadro P5000
-------------------------------------------------------
Clearing 
0.318576 ms for Whole extension handling 
CPU mean filter no 1 took 1289 msec
CPU mean filter no 2 took 861 msec
CPU mean filter no 3 took 806 msec
CPU mean filter no 4 took 801 msec
CPU mean filter no 5 took 820 msec
CPU mean filter no 6 took 814 msec
CPU mean filter no 7 took 835 msec
CPU mean filter no 8 took 812 msec
CPU mean filter no 9 took 812 msec
CPU mean filter no 10 took 801 msec
22.6014 ms for Whole extension handling 
Pushing one image to the GPU took 23 msec
44.4803 ms for Whole extension handling 
GPU mean filter no 1 took 45 msec
10.3943 ms for Whole extension handling 
GPU mean filter no 2 took 11 msec
10.0987 ms for Whole extension handling 
GPU mean filter no 3 took 10 msec
10.7331 ms for Whole extension handling 
GPU mean filter no 4 took 11 msec
10.0027 ms for Whole extension handling 
GPU mean filter no 5 took 10 msec
9.93740 ms for Whole extension handling 
GPU mean filter no 6 took 10 msec
10.2144 ms for Whole extension handling 
GPU mean filter no 7 took 10 msec
10.1720 ms for Whole extension handling 
GPU mean filter no 8 took 11 msec
10.0405 ms for Whole extension handling 
GPU mean filter no 9 took 10 msec
10.4083 ms for Whole extension handling 
GPU mean filter no 10 took 11 msec
23.9805 ms for Whole extension handling 
Pulling one image from the GPU took 24 msec
Clearing 
Releasing buffer Blurred
Releasing buffer t1-head.tif
1.06213 ms for Whole extension handling


RTX 2080
-------------------------------------------------------
Clearing 
0.354838 ms for Whole extension handling 
CPU mean filter no 1 took 1331 msec
CPU mean filter no 2 took 901 msec
CPU mean filter no 3 took 878 msec
CPU mean filter no 4 took 920 msec
CPU mean filter no 5 took 893 msec
CPU mean filter no 6 took 899 msec
CPU mean filter no 7 took 868 msec
CPU mean filter no 8 took 858 msec
CPU mean filter no 9 took 884 msec
CPU mean filter no 10 took 867 msec
19.1582 ms for Whole extension handling 
Pushing one image to the GPU took 20 msec
40.1346 ms for Whole extension handling 
GPU mean filter no 1 took 40 msec
10.4258 ms for Whole extension handling 
GPU mean filter no 2 took 11 msec
10.2674 ms for Whole extension handling 
GPU mean filter no 3 took 10 msec
10.1658 ms for Whole extension handling 
GPU mean filter no 4 took 11 msec
10.5585 ms for Whole extension handling 
GPU mean filter no 5 took 11 msec
10.0785 ms for Whole extension handling 
GPU mean filter no 6 took 10 msec
10.1109 ms for Whole extension handling 
GPU mean filter no 7 took 10 msec
10.7083 ms for Whole extension handling 
GPU mean filter no 8 took 11 msec
9.98351 ms for Whole extension handling 
GPU mean filter no 9 took 11 msec
9.84945 ms for Whole extension handling 
GPU mean filter no 10 took 10 msec
33.0178 ms for Whole extension handling 
Pulling one image from the GPU took 33 msec
Clearing 
Releasing buffer Blurred
Releasing buffer t1-head.tif
1.30920 ms for Whole extension handling 

Hey @adamltyson,

super cool! Thanks for testing!

It’s a bit suspicious that the GPUs on the cluster run “so slow” (11 ms versus 6 ms). But that’s a cluster-versus-laptop comparison. The laptop CPU is much faster in single-thread performance. Thus, it could be that the difference we are observing comes from the overhead: the procedures around the actual operation on the GPU.
When comparing the GPUs on the cluster, it boils down to memory bandwidth. I would expect a speedup going from the P5000 to the RTX 2080 because of the difference between GDDR5 and GDDR6. It’s hard to look inside the GPU on the cluster to find out where these 5 ms go. I might need to get my hands on an RTX or another GDDR6 GPU to track this down.
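
Just as a thought for later: to separate this overhead from the actual kernel execution, one could time push, filter and pull individually from a Jython script. A rough sketch - assuming the T1 Head sample, that clij.op().meanBox() and createCLBuffer() are available in your clij version, and with a kernel size of 3 chosen arbitrarily:

import time;
from ij import IJ;
from net.haesleinhuepf.clij import CLIJ;

clij = CLIJ.getInstance();

# load the same sample image the benchmark macro uses
IJ.run("T1 Head (2.5M, 16-bits)");
imp = IJ.getImage();

# time the transfer to the GPU
start = time.time();
input = clij.push(imp);
IJ.log("push took " + str((time.time() - start) * 1000) + " ms");

# time the mean filter alone
output = clij.createCLBuffer(input);
start = time.time();
clij.op().meanBox(input, output, 3, 3, 3);
IJ.log("meanBox took " + str((time.time() - start) * 1000) + " ms");

# time the transfer back to the CPU
start = time.time();
result = clij.pull(output);
IJ.log("pull took " + str((time.time() - start) * 1000) + " ms");

# release the GPU memory
input.close();
output.close();

If the pure kernel time were identical on both machines while push/pull differ, the missing 5 ms would come from the surroundings rather than from memory bandwidth.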

For the moment: I’m happy to see that it runs fine on Ubuntu. :slight_smile:

Again, thanks a lot for your efforts!

Cheers,
Robert


I’m getting the same OpenCL -1001 error as described above for the Windows machine, except that updating the NVidia driver didn’t fix the problem. I’m using a Quadro RTX 8000 GPU. Any ideas?


Hey @bglancy,

wow, an RTX 8000 - a rare piece!

Would you mind sharing a screenshot of the Windows device manager showing the GPU config / driver version?

Cheers,
Robert


Alright @bglancy thanks. I’m just guessing: Did you by chance download the driver via Windows update?

I’m asking because I would suspect a different driver date: according to the Nvidia website, there is a more recent driver available:
https://www.nvidia.com/Download/index.aspx?lang=en-us

QUADRO DESKTOP/QUADRO NOTEBOOK DRIVER RELEASE 440
 
Version:	R440 U3 (441.28)  WHQL
Release Date:	2019.11.18
Operating System:	Windows 10 64-bit
Language:	English (US)
File Size:	415.44 MB

Would you mind trying this one? Or if you downloaded it from the Nvidia site already, what driver type did you choose?

Cheers,
Robert

We had similar issues earlier with display drivers from Windows update:

I had updated through the NVidia control panel earlier today and just updated through the website to the 441.28 version, but it still isn’t working.


Ok. Sad. Obviously, the OpenCL device listing / device discovery doesn’t work.

I’ll try to build a little debug script for your system and send it to you tomorrow. Stay tuned.

And thanks for your patience!

Cheers,
Robert

Hey @bglancy,

ok, here you go. Please open Fiji’s script editor, select “Python” from the “Language” menu, execute the following snippets, and copy-paste their output. If one script snippet crashes, please restart Fiji and run the next one.

Retrieve general information from the installed OpenCL devices:

from ij import IJ;
from net.haesleinhuepf.clij import CLIJ;
IJ.log("\\Clear");
IJ.log(CLIJ.clinfo());

Check if the backend can be initialized:

from ij import IJ;
from net.haesleinhuepf.clij.clearcl.backend import ClearCLBackends;
from net.haesleinhuepf.clij.clearcl import ClearCL;

backend = ClearCLBackends.getBestBackend();

IJ.log("\\Clear");
IJ.log("backend: " + str(backend));

clearCL = ClearCL(backend);
IJ.log("ClearCL: " + str(clearCL));

Test if we can access installed platforms:

from ij import IJ;
from net.haesleinhuepf.clij.clearcl.backend import ClearCLBackends;
from net.haesleinhuepf.clij.clearcl import ClearCL;

backend = ClearCLBackends.getBestBackend();

firstPlatformPointer = backend.getPlatformPeerPointer(0);
IJ.log("\\Clear");
IJ.log("First platform name: " + backend.getPlatformName(firstPlatformPointer));

Test if we can access the second platform:

from ij import IJ;
from net.haesleinhuepf.clij.clearcl.backend import ClearCLBackends;
from net.haesleinhuepf.clij.clearcl import ClearCL;

backend = ClearCLBackends.getBestBackend();

secondPlatformPointer = backend.getPlatformPeerPointer(1);
IJ.log("\\Clear");
IJ.log("Second platform name: " + backend.getPlatformName(secondPlatformPointer));

In case one of the platforms worked, let’s see if we can access the devices inside the platform:

working_platform_index = 1; # please enter the index of the working platform

from ij import IJ;
from net.haesleinhuepf.clij.clearcl.backend import ClearCLBackends;
from net.haesleinhuepf.clij.clearcl import ClearCL;

backend = ClearCLBackends.getBestBackend();
clearCL = ClearCL(backend);
platform = clearCL.getPlatform(working_platform_index);

IJ.log("\\Clear");
for i in range(0, platform.getNumberOfDevices()):
	IJ.log("Device: " + str(platform.getDevice(i)));

Let me know what the snippets output :slight_smile:

And thanks again for your patience!

Cheers,
Robert

Hey @haesleinhuepf,

Here are the outputs of the first four scripts:

Thanks for all of your help.

Best regards,
Brian