DLC2.1.9 seems to analyze videos on GPU, but writes no .h5 files (Windows 10)

Dear all,

I am struggling to get single-animal DLC 2.1.9 to work on Windows 10 with a GeForce RTX 3070 GPU.

For some background: I first installed the latest GPU drivers (461.92) using NVIDIA GeForce Experience. Next, I installed the latest CUDA toolkit, following the guide here. I also had to manually install Microsoft Visual Studio 2019 with the added workload "Desktop Development with C++". CUDA installed fine, which I checked by running nvcc -V in cmd; I attach the output here for reference:
[Screenshot: output of nvcc -V]
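
For anyone who wants to repeat the nvcc check from Python rather than from a screenshot, here is a minimal sketch. It assumes nvcc is on the PATH and that its output contains a "release X.Y" token; the helper names `parse_nvcc_release` and `installed_cuda_release` are made up for illustration and are not part of DeepLabCut or CUDA.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA release (e.g. '11.0') found in `nvcc -V` output, or None."""
    # Assumption: the version line contains a token of the form "release X.Y".
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run `nvcc -V` if nvcc is on the PATH and return the parsed release."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # CUDA toolkit not installed, or not on the PATH
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit release:", installed_cuda_release())
```

Note that this only reports which toolkit is installed system-wide. It says nothing about which CUDA version the TensorFlow build inside the conda environment expects.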

Lastly, I installed deeplabcut through the provided anaconda environment: conda env create -f DLC-GPU.yaml.

Here is where the issues start, however. I am running my code from a Jupyter notebook: I first activate the DLC-GPU environment from the Anaconda command prompt, then simply call jupyter notebook from my work directory. When I run deeplabcut.analyze_videos, I notice very strange behavior: it looks like the GPU is engaged and the video is found and being analyzed, as seen in the screenshot below:

For reference, I also attach the terminal output:

However, once the process finishes, no .h5 files are written to the specified directory. Additionally, the whole process seems incredibly slow: the program appears to be frozen both at Initializing Resnet and at Starting to extract posture. I made sure to format file paths in Windows syntax (besides, if any paths were wrong, the program would complain about missing files).
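
To confirm which videos actually ended up without an output file, something like the following sketch can help. It assumes that each result file sits in the video folder (or a separate destination folder) and that its name starts with the video's file name without extension; the function name `find_h5_outputs` and the list of video extensions are illustrative choices, not DeepLabCut API.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: these are the video types of interest; adjust as needed.
VIDEO_EXTENSIONS = {".avi", ".mp4", ".mov"}


def find_h5_outputs(video_dir, dest_dir=None) -> Dict[str, List[str]]:
    """Map each video file name to the .h5 files whose names start with its stem."""
    video_dir = Path(video_dir)
    dest_dir = Path(dest_dir) if dest_dir is not None else video_dir
    h5_names = sorted(p.name for p in dest_dir.glob("*.h5"))
    outputs: Dict[str, List[str]] = {}
    for video in sorted(video_dir.iterdir()):
        if video.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        # Plain prefix match: "vid1" would also match files written for "vid10".
        outputs[video.name] = [n for n in h5_names if n.startswith(video.stem)]
    return outputs
```

Videos that map to an empty list have no matching .h5 file, which is the symptom described above.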

I tried running the same code from IPython, to test whether running through Jupyter might be the problem; as above, the program simply stays frozen at Starting to extract posture for a very long time:

After about 5-10 min, the program suddenly shoots through all the videos in the directory, but judging from the progress bar, it doesn’t seem to have analyzed them:

For reference, I use exactly the same code (also run from a Jupyter notebook) on another machine running Ubuntu 20 with an NVIDIA TITAN V GPU (Driver Version: 450.102.04, CUDA Version: 11.0) and out-of-the-box deeplabcut (that is, installed from the DLC-GPU.yaml environment file). The DLC network was trained on that machine, after which I moved it to the Windows machine in question by copying the config.yaml file and the dlc-model that I am using for analysis into my work directory. I kept the DLC project folder structure the same as when it was first created with dlc.create_new_project; in other words, I only have dlc_project/config.yaml and dlc_project/dlc-models/iteration-X on the Windows machine. Lastly, I also changed the project path to the Windows path (and path format) in the config.yaml file.
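
A quick sanity check of a copied project along these lines could look like the sketch below. It only verifies the two items mentioned above (config.yaml and a dlc-models/iteration-* folder) plus the project_path entry. The function name `check_project_layout` is hypothetical, and the config is read with naive line parsing rather than a YAML parser, so quoted or escaped paths may be reported as mismatches.

```python
from pathlib import Path
from typing import List


def check_project_layout(project_dir) -> List[str]:
    """Return a list of problems found in a copied DLC project folder."""
    project_dir = Path(project_dir)
    problems: List[str] = []

    config = project_dir / "config.yaml"
    if not config.is_file():
        problems.append("missing config.yaml")
    else:
        declared = None
        for line in config.read_text(encoding="utf-8").splitlines():
            if line.startswith("project_path:"):
                # Split on the first colon only, so "C:\..." stays intact.
                declared = line.split(":", 1)[1].strip().strip("'\"")
                break
        if declared is None:
            problems.append("config.yaml has no project_path entry")
        elif Path(declared) != project_dir:
            problems.append(f"project_path points to {declared!r}, not {str(project_dir)!r}")

    models = project_dir / "dlc-models"
    if not models.is_dir() or not any(models.glob("iteration-*")):
        problems.append("no dlc-models/iteration-* folder found")
    return problems
```

An empty list means the layout matches what is described here; it does not prove the model files themselves are complete.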

I haven’t got a clue why it doesn’t work, especially because no specific errors seem to crop up. Any ideas?

Many, many thanks for the awesome support and work you are doing!

UPDATE: I replaced Windows 10 with Ubuntu 20.04 LTS. Next, I tried 2 things:

  1. I tried downgrading the driver to 450, as this automatically installs CUDA 11.0. However, the GPU apparently only supports driver 460, so that’s not an option.
  2. I downgraded CUDA to 11.0, keeping the driver at 460. The strange behavior described above persists (the code seems frozen for several minutes at Initializing ResNet and Starting to extract posture), and the following error now also crops up: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Replied on Gitter. There is no way for tensorflow 1.x to work with a 3000-series card. If you want to use the 3070, you have to use deeplabcut-core.