Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

Hi,

I am running into a problem that has been raised a few times on this forum, but I cannot find a fix for it. I get the following errors when I run the train_network command in my dlc-windowsGPU environment:

Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

  • CUDA: 10.0 (according to the cudatoolkit package in my dlc Anaconda environment and the output of conda list cudnn, although nvidia-smi reports CUDA 10.2?)
  • tensorflow-gpu: 1.13.1
  • cuDNN: 7.6.5
  • GPU: GeForce GTX 1650 SUPER
  • NVIDIA driver: 442.19
  • OS: Windows 10

According to the compatibility table (https://www.tensorflow.org/install/source_windows#gpu), TF 1.13 is compatible with CUDA 10 and cuDNN 7.4. Although I have cuDNN 7.6.5 installed on this PC, the GPU version of dlc runs without problems on my office computer with the same versions of CUDA, cuDNN and TF (the only differences being that the second machine runs Windows 7 and has a different GPU and driver). This is what confuses me, as it suggests the problem is not necessarily a CUDA/cuDNN/TF incompatibility, since these versions work together fine on the other machine.
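To make the comparison against the table explicit, here is a minimal Python sketch. The names TESTED, major_minor and check_stack are hypothetical helpers written for this post, not part of TensorFlow or DeepLabCut, and the only table entry is the one quoted above (TF 1.13 with CUDA 10 and cuDNN 7.4). A reported mismatch only means the combination differs from the listed one; it does not prove that it is the cause of the error.

```python
# Hypothetical helper (not a TensorFlow or DeepLabCut API): compare the
# installed versions with the configuration listed for a TF release.

# Configuration quoted in this post: TF 1.13 -> CUDA 10, cuDNN 7.4.
TESTED = {"1.13": {"cuda": "10.0", "cudnn": "7.4"}}


def major_minor(version):
    """Return the first two numeric components of a version string as a tuple."""
    parts = version.split(".")
    return tuple(int(p) for p in parts[:2])


def check_stack(tf_version, cuda_version, cudnn_version):
    """Return a list of human-readable mismatches (empty if there are none)."""
    key = ".".join(str(n) for n in major_minor(tf_version))
    tested = TESTED.get(key)
    if tested is None:
        return ["no tested configuration recorded for TF " + key]
    problems = []
    if major_minor(cuda_version) != major_minor(tested["cuda"]):
        problems.append(
            "CUDA {} differs from tested {}".format(cuda_version, tested["cuda"])
        )
    if major_minor(cudnn_version) != major_minor(tested["cudnn"]):
        problems.append(
            "cuDNN {} differs from tested {}".format(cudnn_version, tested["cudnn"])
        )
    return problems


# Versions reported in this post.
print(check_stack("1.13.1", "10.0", "7.6.5"))
```

With the versions listed above, only the cuDNN minor version is flagged, which matches the observation that CUDA and TF line up with the table while cuDNN is newer than the listed 7.4.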

Any advice you could give here would be much appreciated as I have really hit a wall.

Thanks!

Likely your driver is too new: NVIDIA driver 442.19.

Thanks for the reply, @MWMathis. Looking now, my other computer has a GeForce GTX 750 running NVIDIA driver 430.86. The dlc-gpu version works fine on that machine. According to NVIDIA, the earliest compatible driver for the GeForce GTX 1650 SUPER is 441.08.
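The driver numbers above can be compared with a short Python sketch. The helper names parse_driver and supports_card are hypothetical, and the minimum of 441.08 is simply the figure quoted in this thread, not something looked up programmatically.

```python
# Hypothetical helpers for comparing NVIDIA driver version strings.

# Earliest driver for the GeForce GTX 1650 SUPER, as quoted in this thread.
MIN_DRIVER_GTX_1650_SUPER = "441.08"


def parse_driver(version):
    """Turn a driver string such as '442.19' into a comparable tuple (442, 19)."""
    return tuple(int(part) for part in version.split("."))


def supports_card(driver, minimum=MIN_DRIVER_GTX_1650_SUPER):
    """Return True if the driver is at least the minimum version for the card."""
    return parse_driver(driver) >= parse_driver(minimum)


print(supports_card("442.19"))  # driver on this PC
print(supports_card("430.86"))  # driver on the office machine
```

Under these assumptions, the 430.86 driver that works on the office machine falls below the minimum for the GTX 1650 SUPER, so rolling back to it on this PC does not look like an option.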

I suppose this would mean that I need to change my graphics card? Are you able to advise on what the latest compatible driver would be?

Thanks again!