A few weeks ago I reinstalled DLC using the .yaml file from GitHub because I had issues updating to the new version. I ran some test videos through the whole process and it worked fine. This week I tried training on a coworker's labelled videos, and then on my own videos, but the training kept stopping. There are no error messages. GPU usage just drops from 5-6% to 0% and training stops completely. The most successful run got to 112,000 iterations before freezing, and it did save snapshots at 50,000 and 100,000. When I tried restarting from the last snapshot, it froze at around 25,000. In some training runs it doesn't even make it to 10,000.
I run the Anaconda Prompt as an administrator and use the GUI. Following a GitHub comment on a similar issue, I also unchecked "Quick Edit Mode" in the Anaconda Prompt. During several of the training runs I monitored CPU and memory usage, and they stayed at around 35% and 50%, respectively, the whole time.
In a possibly unrelated issue, my coworker has a very similar setup and uses Jupyter to run DLC. Her training got to around 13,000 iterations before the kernel stopped. When she tried to start it again, she got an error saying that cuDNN was not properly initialized. I'm not sure these are connected, but these are the issues we have both run into.
System specs: Windows 10, NVIDIA GeForce GTX 1060 6GB
DLC specs (from "conda list" in the environment):
Any help is appreciated, thanks!