Hi,
I’m trying to set up a server for deep-learning image analysis at my institute. I want it to run all the popular DL tools: CARE, U-Nets, etc. Computing have set me up with a VM with access to a GRID V100 GPU. For the GPU to work on the VM, I have to use the same driver version as the host server (442.06 in this case). I’ve had no joy at all getting CUDA working. I suspect the TensorFlow 1.x that most image analysis software uses might not support the CUDA version that goes with my driver. I’ve tried many different CUDA toolkit/cuDNN versions. Does anyone have any ideas? More troubleshooting information below…
When I run…
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
I get
2020-09-29 14:50:46.880025: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
device_lib.list_local_devices()
2020-09-29 14:50:55.491271: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-09-29 14:50:55.496631: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-09-29 14:50:55.536935: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-09-29 14:50:55.541056: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: Y95-JHAL-W-V
2020-09-29 14:50:55.541661: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: Y95-JHAL-W-V
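Since the error is CUDA_ERROR_NO_DEVICE, it may also be worth checking whether the driver itself sees the GPU from inside the VM, independently of TensorFlow. Below is a minimal sketch of such a check. It assumes nvidia-smi is installed with the driver and is on the PATH (on some Windows installs it is not), and the function names parse_gpu_rows and query_driver are just illustrative helpers, not part of any library.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Split 'name, driver_version' CSV rows into (name, driver) tuples."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The driver version is the last comma-separated field.
        name, _, driver = line.rpartition(",")
        rows.append((name.strip(), driver.strip()))
    return rows


def query_driver():
    """Return the GPUs reported by nvidia-smi, or None if it is missing or fails."""
    exe = shutil.which("nvidia-smi")  # assumption: nvidia-smi is on the PATH
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    print(query_driver())
```

If this returns None or an empty list, the problem would seem to sit at the driver/VM level rather than with the CUDA toolkit or cuDNN version.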
Many thanks for your help!