Most Important GPU features and settings for speed

We’ve benchmarked a similar model on two machines with slightly different GPUs and seen little difference in the time per 100 iterations. In general, we’re at something like 12-14 hours to reach 200k iterations (not bad but not great).

As we get things working and consider upgrading to better GPUs and optimizing, I’m curious: which GPU features or settings have others found most effective for reducing training time? I have some assumptions about this, but I’d love to hear from anyone who has tested it explicitly or has experience with two very different GPUs. Any other tips for speeding up training with different input parameters, etc., would also be appreciated!

Typically I don’t worry about network training times so much, since you likely only do this step a few times; the bulk of the processing is running inference (i.e. analyze_videos). The idea is to make a robust network once (maybe with one refinement step, etc.), and then just use it going forward (so you might only train once or twice a year!). A rough sketch of that workflow is below.
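
For concreteness, here is a minimal sketch of that train-once-then-reuse workflow using the DeepLabCut Python API; the project path and video filenames are placeholders, not anything from this thread:

```python
# Sketch: train the network once (or rarely), then reuse it for inference on many videos.
# "my-project/config.yaml" and the video paths below are placeholders for your own project.
import deeplabcut

config_path = "my-project/config.yaml"

# One-time (or occasional) training step
deeplabcut.train_network(config_path)

# Afterwards, the bulk of the processing is just inference on new videos
deeplabcut.analyze_videos(config_path, ["videos/session1.mp4", "videos/session2.mp4"])
```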

That being said, the biggest factor in training speed is the pixel size of the images: the smaller, the faster. You can use global_scale to downsample your data (set in pose_cfg.yaml; the defaults are listed here: https://github.com/AlexEMG/DeepLabCut/blob/2b0745e74e436448c69e251d7c77845a82c0da72/deeplabcut/pose_estimation_tensorflow/default_config.py).
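
As a hedged example of what that could look like, one way is to edit global_scale in the shuffle's train/pose_cfg.yaml before training starts; the path below is a placeholder for your own project, and 0.5 is just an illustrative value (the default in the linked file is 0.8):

```python
# Sketch: lower global_scale in an existing pose_cfg.yaml so training images are
# downsampled (smaller images -> faster iterations). Path and value are placeholders.
import yaml

pose_cfg_path = "my-project/dlc-models/iteration-0/my-shuffle/train/pose_cfg.yaml"

with open(pose_cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["global_scale"] = 0.5  # smaller than the 0.8 default -> smaller, faster images

with open(pose_cfg_path, "w") as f:
    yaml.dump(cfg, f, default_flow_style=False)
```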

Please see our preprint on speed and robustness (here: https://www.biorxiv.org/content/early/2018/10/30/457242), as these principles generally also apply to training.