How to stop running out of VRAM

I have an RTX 2070 with 8 GB of VRAM. The X server needs about 800 MB, and the rest should be available to DLC. However, I keep running out of memory, even with the demo. What should I do?

You can change how much memory TensorFlow allocates; check this out: https://riptutorial.com/tensorflow/example/31879/control-the-gpu-memory-allocation
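For reference, the linked page describes capping TensorFlow's allocation at a fixed fraction of GPU memory instead of letting it grab nearly all of it up front. A minimal sketch with the TF1-style API (the 0.7 fraction is just an example value, not a recommendation):

```python
import tensorflow as tf

# Cap TensorFlow at a fraction of total GPU memory instead of
# letting it pre-allocate (almost) everything on the device.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.7  # example value

sess = tf.Session(config=config)
```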

I don’t get a CUDA out-of-memory error. I get something like:

UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[{{node resnet_v1_50/conv1/Conv2D}}]]
      [[ConstantFoldingCtrl/absolute_difference/weighted_loss/assert_broadcastable/AssertGuard/Switch_0/_564]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[{{node resnet_v1_50/conv1/Conv2D}}]]
0 successful operations. 0 derived errors ignored.
But I can see that GPU memory fills up completely as that happens.

I fixed it, but it’s very hacky. Apparently, for the RTX 2060/2070/2080 there is a TensorFlow bug that makes it run out of memory all the time. What I had to do is add something like:


config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate VRAM incrementally instead of all at once

then, in the train function:

sess = tf.Session(config=config)

in deeplabcut/pose_estimation_tensorflow/train.py

Is there a better way of doing this?

As far as I know, that’s the best way to set it, yes. Environment variables don’t work: https://github.com/tensorflow/tensorflow/issues/8040

So it needs to be set wherever TF is started and used (i.e. train/predict/evaluate).
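As an aside: if you are on TensorFlow 2.x (where ConfigProto and Session no longer exist), the equivalent switch is per-device memory growth. This is only a sketch for newer TF versions, not part of the fix discussed here, since DeepLabCut used the TF1 API at the time:

```python
import tensorflow as tf

# TF2 equivalent of allow_growth: enable memory growth per GPU so
# TensorFlow allocates VRAM on demand instead of all at once.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```

This has to run before any GPU work starts, because memory growth cannot be toggled after the device has been initialized.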

In my case, only training causes issues. I did not modify the predict/evaluate source files and they worked fine. Maybe this option should be enabled by default, since I believe a lot of people use a 2060/2070/2080 for this. Not everyone can afford a 2080 Ti.

Ok, @zandimna, thanks for your PR.

For everybody else: we updated the code, and you can now set this when calling training: https://github.com/AlexEMG/DeepLabCut/pull/458