CLIJ2 on Windows Server 2019

Hi @haesleinhuepf,

Quick question: have you ever heard of CLIJ/OpenCL issues on Windows Server 2019?

I’m trying to get CLIJ2 to work on our nice Acquifer HIVE system (via remote desktop), but I get
Unknown OpenCL error:-1001
already when trying
run("CLIJ2 Macro Extensions"); (or run("CLIJ2 Macro Extensions", "cl_device=");).

Apparently it cannot find or open OpenCL? The graphics card is a NVIDIA Quadro RTX 6000. I updated the driver to the latest version (451.48 WHQL).
Sorry to bother you (again) with such issues. I’ve tried to solve it, but I can’t find anything online, and I’m rather stuck at this point.

Here is the full error:

(Fiji Is Just) ImageJ 2.1.0/1.53c; Java 1.8.0_172 [64-bit]; Windows Server 2016 10.0; 213MB of 9000000MB (<1%)
 
java.lang.RuntimeException: java.util.concurrent.ExecutionException: net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -1001 -> Unknown OpenCL error:-1001
	at net.imagej.legacy.LegacyService.runLegacyCompatibleCommand(LegacyService.java:307)
	at net.imagej.legacy.DefaultLegacyHooks.interceptRunPlugIn(DefaultLegacyHooks.java:166)
	at ij.IJ.runPlugIn(IJ.java)
	at ij.Executer.runCommand(Executer.java:150)
	at ij.Executer.run(Executer.java:68)
	at ij.IJ.run(IJ.java:317)
	at ij.IJ.run(IJ.java:328)
	at ij.macro.Functions.doRun(Functions.java:686)
	at ij.macro.Functions.doFunction(Functions.java:98)
	at ij.macro.Interpreter.doStatement(Interpreter.java:278)
	at ij.macro.Interpreter.doStatements(Interpreter.java:264)
	at ij.macro.Interpreter.run(Interpreter.java:160)
	at ij.macro.Interpreter.run(Interpreter.java:93)
	at ij.macro.Interpreter.run(Interpreter.java:104)
	at ij.plugin.Macro_Runner.runMacro(Macro_Runner.java:161)
	at ij.IJ.runMacro(IJ.java:153)
	at ij.IJ.runMacro(IJ.java:142)
	at net.imagej.legacy.IJ1Helper$3.call(IJ1Helper.java:1148)
	at net.imagej.legacy.IJ1Helper$3.call(IJ1Helper.java:1144)
	at net.imagej.legacy.IJ1Helper.runMacroFriendly(IJ1Helper.java:1095)
	at net.imagej.legacy.IJ1Helper.runMacro(IJ1Helper.java:1144)
	at net.imagej.legacy.plugin.IJ1MacroEngine.eval(IJ1MacroEngine.java:145)
	at org.scijava.script.ScriptModule.run(ScriptModule.java:157)
	at org.scijava.module.ModuleRunner.run(ModuleRunner.java:165)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:124)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:63)
	at org.scijava.thread.DefaultThreadService.lambda$wrap$2(DefaultThreadService.java:225)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -1001 -> Unknown OpenCL error:-1001
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at net.imagej.legacy.LegacyService.runLegacyCompatibleCommand(LegacyService.java:303)
	... 30 more
Caused by: net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -1001 -> Unknown OpenCL error:-1001
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkOpenCLError(BackendUtils.java:346)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.lambda$getNumberOfPlatforms$0(ClearCLBackendJOCL.java:87)
	at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkExceptions(BackendUtils.java:156)
	at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.getNumberOfPlatforms(ClearCLBackendJOCL.java:82)
	at net.haesleinhuepf.clij.clearcl.ClearCL.getNumberOfPlatforms(ClearCL.java:57)
	at net.haesleinhuepf.clij.clearcl.ClearCL.getAllDevices(ClearCL.java:245)
	at net.haesleinhuepf.clij.CLIJ.getAvailableDeviceNames(CLIJ.java:214)
	at net.haesleinhuepf.clij2.utilities.CLIJ2MacroExtensions.run(CLIJ2MacroExtensions.java:33)
	at org.scijava.command.CommandModule.run(CommandModule.java:196)
	... 8 more

Any help is appreciated!
Best regards,
Bram
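
For context on the number itself: -1001 is defined by the cl_khr_icd extension as CL_PLATFORM_NOT_FOUND_KHR, the status the OpenCL ICD loader returns from clGetPlatformIDs when it finds no usable platform. That fits the bottom of the stack trace, where getNumberOfPlatforms is the call that fails. A minimal Python sketch for translating such codes follows; the helper name describe_opencl_error is hypothetical (it is not part of CLIJ or ClearCL) and the table covers only a handful of codes.

```python
# Small lookup of OpenCL status codes (values as defined in the Khronos headers).
# describe_opencl_error is a hypothetical helper, not part of CLIJ or ClearCL.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -32: "CL_INVALID_PLATFORM",
    -33: "CL_INVALID_DEVICE",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",  # from the cl_khr_icd extension
}


def describe_opencl_error(code):
    """Return the symbolic name for a status code, or a fallback string."""
    return OPENCL_ERRORS.get(code, "unknown OpenCL status %d" % code)


if __name__ == "__main__":
    # The code reported in the stack trace above.
    print(describe_opencl_error(-1001))
```

Read this way, the message says that the loader used by the JVM saw no platform at all, which is a narrower statement than "OpenCL is broken".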

Hey Bram @bramvdbroek,

Yes, I’ve heard about this one. It’s definitely a driver issue, likely related to the specific Windows Server version. You can find a discussion about a similar issue here in the forum. I think @LThomas also experienced a similar issue at some point. Laurent, did you find a solution? Any hint is appreciated! I’m also happy to update the troubleshooting section on the website once we have found a suitable solution to this issue.

Thanks for reporting this issue Bram!

Cheers,
Robert

Hey @bramvdbroek and @haesleinhuepf,
Indeed, I came across a similar issue.
I don’t have a solution yet. The only new info is that I managed to run OpenCL code via OpenCV on those machines, either in “standalone” Python or within Fiji in Jython via IJ-OpenCV.
Could it be something with java/the jvm?

I wanted to get back to you Robert but with the thesis writing… the priorities kind of shifted :sweat_smile:
I think by the end of September would be better for me, but if you have some time to dedicate to this problem earlier I can surely arrange some time too !

That sounds good! Can you share that with us? If we have something that runs OpenCL from Fiji, that would help us track down the issue!

Sounds like fun! I can just motivate you: there is a great time between the day you hand in and the day you get notified when the defense will be.

Have a great weekend and good luck!

Sure, I will by next week at the latest. I’m just coming back from vacation :wink:

There you go: two code examples using OpenCL via OpenCV, one in Python, the other in Jython within Fiji.
Both work fine on a HIVE with Windows Server 2019 and a Quadro P5000.
You might have to dig into the C++ code of OpenCV to find out the difference, though :exploding_head:
I think most of the OpenCL-related code is there

Python

# -*- coding: utf-8 -*-
"""
Benchmark OpenCL python with OpenCV
requirements:
- opencv
- skimage
"""

import cv2, timeit
from skimage.data import coins

coin = coins()

print ("Has OpenCL: ", cv2.ocl.haveOpenCL())
print ("Use OpenCL: ", cv2.ocl.useOpenCL())

device = cv2.ocl_Device.getDefault()
name = cv2.ocl_Device.name(device)
print ("Name: ", name)


CPU = "cv2.blur(coin, ksize=(3,3))"
timeCPU = timeit.timeit(CPU, number=10, globals=globals())

GPU = """
umat = cv2.UMat(coin)
cv2.blur(umat, ksize=(3,3))
"""
timeGPU = timeit.timeit(GPU, number=10, globals=globals())

print("CPU:", timeCPU)
print("GPU:", timeGPU)

Jython in Fiji

Requires the IJ-OpenCV update site and an open image.
This is the documentation code I shared on the wiki of IJ-OpenCV btw :stuck_out_tongue:

#@ImagePlus imp
import org.bytedeco.javacpp.opencv_core as cv2
from org.bytedeco.javacpp.opencv_imgproc import blur
from ijopencv.ij     import ImagePlusMatConverter as ImpToMat
from ijopencv.opencv import MatImagePlusConverter as MatToImp
from ij import ImagePlus

# Check that has opencl and use it
print "Has opencl: ", cv2.haveOpenCL()
print "Use opencl: ", cv2.useOpenCL()

# Get device name
dev = cv2.Device.getDefault()
name = dev.name()
print "OpenCL device: ", name.getString()

# I - Convert the ImagePlus to an opencv matrix
imCV = ImpToMat.toMat(imp.getProcessor())
#print imCV

# Classical CPU processing
blured = cv2.Mat() # allocate free Mat
kernel = cv2.Size(5,5)
blur(imCV, blured, kernel)
print "OpenCV Mat: ", blured

# Processing on GPU using OpenCL
#imCL = cv2.UMat(imCV)           # Convert to UMat - this works in pure python, but not for Java apparently
imCL = imCV.getUMat(cv2.ACCESS_READ) # Convert to UMat - alternative in java 
bluredCL = imCL.clone()          # Assign UMat memory (same size and type)

blur(imCL, bluredCL, kernel)
print "OpenCV UMat", bluredCL

# UMat back to Mat
bluredCV = bluredCL.getMat(cv2.ACCESS_READ)

# Display convert UMat in Fiji
imProc = MatToImp.toImageProcessor(bluredCV)
impNew = ImagePlus("testCL", imProc)
impNew.show()

Hi @LThomas,

Thanks for picking this up again. I ran the Fiji Jython script on our Hive system (with Windows Server 2019, NVIDIA RTX 6000 GPU, IJ-OpenCV plugins update site active). I get the following output:

Has opencl:  False
Use opencl:  False
OpenCL device:
OpenCV Mat:  org.bytedeco.javacpp.opencv_core$Mat[width=256,height=254,depth=8,channels=1]
OpenCV UMat org.bytedeco.javacpp.opencv_core$UMat[address=0x3060f0b0,position=0,limit=1,capacity=1,deallocator=org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0x3060f0b0,deallocatorAddress=0x7ffa44056b30]]

It returns a blurred image, but I guess this is only the CPU-processed version…
Any ideas?

Best regards,
Bram

Hey @bramvdbroek,

I take over as @LThomas needs to finish his thesis :wink:

Apparently, there is an issue with OpenCL on Windows Server 2019. Would you mind trying the procedure described in this forum thread? In particular, running clinfo and posting its output here would be helpful.

I also read a bit online about Windows Server and OpenCL and found some hints pointing towards RemoteFX. My suspicion is that Windows Server provides a mechanism for running OpenCL-based tools from virtual machines, and that this mechanism (which might be RemoteFX, or not) hinders proper discovery of OpenCL devices. But again, this is only a suspicion.

Keep us posted and let me know if I can help further.

Cheers,
Robert

Indeed, I also wanted to double check before replying right away :stuck_out_tongue:
And thanks for the feedback Robert !

@bramvdbroek this is actually not expected. I was getting true/true and the correct name of the card on one of our Windows Server 2019 machines with a Quadro P5000, while still not being able to run CLIJ operations, though… But let’s first try to find out why this is already different!

Normally, OpenCL support is shipped with the NVIDIA drivers; in my case it was version 442.92. I usually use the GeForce Experience software to check the version and get notified when new drivers are available.

Did you update your driver to one of the latest versions?
I can log in remotely via the Acquifer support services if you want me to have a closer look at some point. Conversely, I can give you remote access to one of our machines, @haesleinhuepf, if you want to test things in more depth :wink:

Indeed, I believe OpenCV falls back to the CPU automatically if no OpenCL-capable GPU is found.
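
Because of that silent fallback, a blurred image on its own does not prove the GPU was used. A minimal sketch for standalone Python (not the Jython/JavaCPP bindings used in Fiji) that reports which path OpenCV is likely to take. The function name opencl_status and the "backend" label are assumptions for illustration; only cv2.ocl.haveOpenCL, cv2.ocl.setUseOpenCL and cv2.ocl.useOpenCL are called on OpenCV.

```python
def opencl_status(cv2_module):
    """Report whether OpenCV sees OpenCL and whether it is enabled.

    cv2_module is passed in so the check can also be exercised without
    OpenCV installed (hypothetical design choice for this sketch).
    """
    ocl = cv2_module.ocl
    available = bool(ocl.haveOpenCL())
    if available:
        ocl.setUseOpenCL(True)  # request the OpenCL path explicitly
    in_use = bool(ocl.useOpenCL())
    backend = "OpenCL" if (available and in_use) else "CPU fallback"
    return {"available": available, "in_use": in_use, "backend": backend}


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("OpenCV (cv2) is not installed; nothing to check.")
    else:
        print(opencl_status(cv2))
```

A result of "CPU fallback" would correspond to the False/False output reported for the RTX 6000 machine.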

Hi @haesleinhuepf and @LThomas,

Thanks for making this a separate topic, and sorry, I was away for a few days.
I had the driver version 442.92, but since CLIJ2 did not work I updated to the latest version about two weeks ago (451.48), without any luck.

Anyway, I ran clinfo.exe and here are the results:

C:\Users\b.vd.broek.NKI\Downloads>clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 11.0.197
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_kernel_attribute_nv
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     Quadro RTX 6000
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  451.48
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, d8:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               72
  Max clock frequency                             1770MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              25769803776 (24GiB)
  Error Correction support                        No
  Max memory allocation                           6442450944 (6GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2359296 (2.25MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_kernel_attribute_nv

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

C:\Users\b.vd.broek.NKI\Downloads>

The top of the list seems OK to me. The bottom does not (?).

Thanks,
Bram
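
When comparing several machines, the long clinfo dump can be reduced to the few fields that matter here. A minimal Python sketch, assuming the two-column text layout shown above (key, a run of spaces, value); the function name summarize_clinfo and the choice of fields are hypothetical.

```python
import re

# Key and value are separated by at least two spaces in clinfo's text output.
_LINE = re.compile(r"^\s*(\S.*?)\s{2,}(\S.*)$")


def summarize_clinfo(text):
    """Extract platform count, device names and driver versions from clinfo text."""
    summary = {"platforms": None, "devices": [], "drivers": []}
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key == "Number of platforms" and summary["platforms"] is None:
            if value.isdigit():
                summary["platforms"] = int(value)
        elif key == "Device Name":
            summary["devices"].append(value)
        elif key == "Driver Version":
            summary["drivers"].append(value)
    return summary


if __name__ == "__main__":
    # A few lines copied from the output above.
    excerpt = (
        "Number of platforms                               1\n"
        "  Device Name                                     Quadro RTX 6000\n"
        "  Driver Version                                  451.48\n"
    )
    print(summarize_clinfo(excerpt))
```

To use it on a real machine, the output of clinfo can be saved to a text file and passed in as a string.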

I wondered about that too the first time, but I think it’s actually testing what the output would be when the platform is NULL. So no worries on this side…
Well, it all looks fine to me; the GPU is recognized… It’s quite puzzling.

One thing, when you tested the script did you use remote desktop ?
In my case I went via TeamViewer, and the Jython OpenCL script was working. But maybe when it goes through Windows Remote Desktop (which is what we use on the customer side for multi-user access), the GPU does not accept OpenCL computation.

That might sound crazy, but could you try testing the Jython script again while logged in directly on the HIVE, without a remote login, if that was not the case before :stuck_out_tongue:
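
To keep track of which test ran in which kind of session, the session type can be logged next to the results. A minimal Python sketch, under the assumption that Windows sets the SESSIONNAME environment variable to something like "RDP-Tcp#0" in a Remote Desktop session and to "Console" for a local login; this convention is not guaranteed in every context (services, scheduled tasks), and the function name session_kind is hypothetical.

```python
import os


def session_kind(env=None):
    """Guess the session type from SESSIONNAME (assumed Windows convention)."""
    env = os.environ if env is None else env
    name = env.get("SESSIONNAME", "")
    if name.upper().startswith("RDP-"):
        return "remote desktop"
    if name.lower() == "console":
        return "local console"
    return "unknown"


if __name__ == "__main__":
    print("Session:", session_kind())
```

A TeamViewer connection attaches to the console session, so it would probably be reported as "local console" here; that, too, is an assumption.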

Just found a resource online, FYI: It doesn’t directly mention OpenCL in the context of Server 2019. Hence, I’m wondering if support might officially have been dropped:


Hmm what’s puzzling is that we have the issue on our Win Server 2016 machines but not 2012.
But since they also backported some updates to the 2016 version, it might well be the problem indeed.
I have a colleague testing things at the moment, like manually reactivating RemoteFX vGPU and going into the Hyper-V parameters. I will let you know if we have new insights!

One thing, when you tested the script did you use remote desktop ?

Yes, most of the testing I did was via remote desktop. This is indeed how the use of the HIVE is intended. I’ll try the script directly on the HIVE tomorrow.

@haesleinhuepf That doesn’t look too promising indeed. On the same page I found:

As you and Robert already found out, I realize now that GPU acceleration does not work by default on virtual machines. This seems to affect other programs as well (like the ZEN software from Zeiss). I’m not even sure whether the HIVE is using Hyper-V. Do you know, @LThomas?

By the way, did you also see this?

And thanks for the offer to remotely login via Acquifer support. I’ll keep it in mind. We can of course involve the people from Acquifer as well at some point.

Bram

You mean when you use remote desktop ?

Actually I am wondering if it really has to do with virtualization :thinking:
I am not 100% sure, but I don’t think using remote desktop involves virtualization, which is more about running an OS within another OS.
I tested on the host machine directly, so without any remote desktop, and I still had the issue.

And if OpenCL were not supported on Windows Server 2019, then clinfo should not be able to list the devices. It also would not explain why I could apparently run OpenCL code via OpenCV…

I thought maybe some virtualization of the hardware happens through the Java Virtual Machine, and the problem could be that access to OpenCL via Java is somehow not working.

Yes, we are also checking whether that could be the cause.
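
One way to separate "Java cannot reach OpenCL" from "the OpenCL loader itself finds nothing" is to make the same first call that fails in the stack trace (querying the number of platforms) from outside the JVM and outside OpenCV. A minimal Python sketch using ctypes; it assumes the loader is installed under its usual name (OpenCL.dll on Windows), and the function names are hypothetical. Only clGetPlatformIDs from the OpenCL C API is called.

```python
import ctypes
import ctypes.util
import sys


def load_opencl_library():
    """Try to load the system OpenCL ICD loader; return None if unavailable."""
    if sys.platform == "win32":
        name, loader = "OpenCL.dll", ctypes.WinDLL
    else:
        name, loader = ctypes.util.find_library("OpenCL"), ctypes.CDLL
    if name is None:
        return None
    try:
        return loader(name)
    except OSError:
        return None


def count_platforms(lib):
    """Call clGetPlatformIDs(0, NULL, &n) and return (status, n)."""
    fn = lib.clGetPlatformIDs
    fn.restype = ctypes.c_int
    fn.argtypes = [ctypes.c_uint, ctypes.c_void_p,
                   ctypes.POINTER(ctypes.c_uint)]
    count = ctypes.c_uint(0)
    status = fn(0, None, ctypes.pointer(count))
    return status, count.value


if __name__ == "__main__":
    library = load_opencl_library()
    if library is None:
        print("No OpenCL loader library could be loaded.")
    else:
        status, platforms = count_platforms(library)
        print("status:", status, "platforms:", platforms)
```

If this reports status 0 and one platform while Fiji still reports -1001, the difference would more plausibly lie in how the JVM process loads OpenCL than in the driver.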

I don’t think virtualization is a direct issue. But when I was debugging a similar issue, the OpenCL device discovery mechanism failed. It appeared as if there was an additional device (or platform) which caused a crash when being asked for its name. I concluded that an additional OpenCL device driver (for virtualization?) was installed and broken.
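
To look for such an extra driver, the OpenCL drivers registered with the Khronos ICD loader can be listed. A minimal Python sketch that reads the loader's classic registry location, HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors, where each value name is a driver DLL and a DWORD of 0 marks it as enabled. Newer drivers may register elsewhere, so an empty or short list is not conclusive; the function name list_registered_icds is hypothetical.

```python
import sys

VENDORS_KEY = r"SOFTWARE\Khronos\OpenCL\Vendors"


def list_registered_icds():
    """Return (dll_path, enabled) pairs from the vendors key; [] off Windows."""
    if sys.platform != "win32":
        return []
    import winreg  # Windows-only standard library module

    entries = []
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, VENDORS_KEY) as key:
            value_count = winreg.QueryInfoKey(key)[1]
            for index in range(value_count):
                name, data, _value_type = winreg.EnumValue(key, index)
                entries.append((name, data == 0))  # 0 means enabled
    except OSError:
        # Key missing or not readable: report what was collected so far.
        return entries
    return entries


if __name__ == "__main__":
    for dll_path, enabled in list_registered_icds():
        print("enabled " if enabled else "disabled", dll_path)
```

An entry that does not belong to the NVIDIA driver would be a candidate for the broken additional platform described above.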

All right, running the script directly on the HIVE does not solve the issue; I get the same output as via remote desktop:

Started Macro.ijm.py at Fri Aug 28 11:57:56 CEST 2020
Has opencl:  False
Use opencl:  False
OpenCL device:
OpenCV Mat:  org.bytedeco.javacpp.opencv_core$Mat[width=256,height=254,depth=8,channels=1]
OpenCV UMat org.bytedeco.javacpp.opencv_core$UMat[address=0x7d36c0,position=0,limit=1,capacity=1,deallocator=org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0x7d36c0,deallocatorAddress=0x7ffa5b976b30]]

Perhaps @Olaf (Selchow) would also like to take a look at this issue at some point? Or do you still have ideas left?

Best regards,
Bram

Hi Bram, Laurent, Robert,

All I can add to this is that virtualization is unlikely to be the problem, and neither are TeamViewer or RDP, unless it is a CLIJ2-related problem.
I have been using GPU/CUDA processing in VMs and on the host, via RDP and TeamViewer, in ZEISS ZEN and Huygens, and I think others have used the Microvolution (Fiji) plugin just fine on various HIVEs (including Windows Server 2019 with RTX GPUs, but also older hardware and OSes). Even Python-based CARE… The latter not in a VM (yet), though…

So, since I have no clue about Java or CLIJ2, I am afraid I am not of much help here. Laurent would be the person I would ask…

However, if you want me to support you with some comparative troubleshooting and get people from other software involved (Zeiss Zen developers, arivis developers, … ) I could give it a try. If you think that helps.

Best regards,
Olaf

I have started a detailed comparison between our configs to see if I can find something.
But it seems that it affects only Java; with OpenCL via OpenCV/Python it seems to work fine.
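
Such a comparison can be kept mechanical by diffing two small dictionaries of settings. A minimal Python sketch; the helper name diff_configs and the chosen keys are hypothetical, and the example values are only the ones mentioned earlier in this thread.

```python
def diff_configs(left, right):
    """Return {key: (left_value, right_value)} for keys whose values differ."""
    keys = sorted(set(left) | set(right))
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }


if __name__ == "__main__":
    # Values as reported in the thread for the two machines.
    p5000_machine = {
        "os": "Windows Server 2019",
        "gpu": "Quadro P5000",
        "driver": "442.92",
        "opencv_has_opencl": True,
    }
    rtx6000_machine = {
        "os": "Windows Server 2019",
        "gpu": "Quadro RTX 6000",
        "driver": "451.48",
        "opencv_has_opencl": False,
    }
    for key, (a, b) in diff_configs(p5000_machine, rtx6000_machine).items():
        print(key, "|", a, "|", b)
```

Adding further keys (Java version, Fiji version, installed update sites) would extend the comparison without changing the helper.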

Thanks @Olaf! Just checking.
So it seems that the issue is relatively narrow indeed.

I (or Marjolijn) will surely not hesitate to contact you on issues with other software. :grinning:

Best regards,
Bram
