DeepLabCut 2.2 inference is 10x slower than realtime

Hi

We trained a DLC network on 4 animals, 14 features each, on videos recorded at 1280 x 1024 resolution @ 25 FPS.

The training was very fast, but the “extract posture” step was about 10 times slower than realtime. We have many hours of recordings and are wondering if there is something we could do to speed this up?

Here’s the log from conversion. It took 30 minutes to convert a 3-minute (4500 frame) video.

Thanks!


 'weigh_part_predictions': False,
 'weight_decay': 0.0001,
 'x1': None,
 'x2': None,
 'y1': None,
 'y2': None}
Using snapshot-50000 for model /media/cat/14TB/insync_cm5636/march_2/video/dlc_2.26b_results/march_2_redo-cat-2020-06-17/dlc-models/iteration-0/march_2_redoJun17-trainset95shuffle1
Initializing ResNet
Activating extracting of PAFs
2020-06-18 14:45:04.104479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-06-18 14:45:04.104556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-18 14:45:04.104569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2020-06-18 14:45:04.104578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2020-06-18 14:45:04.104756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10663 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:03:00.0, compute capability: 6.1)
Starting to analyze %  /media/cat/14TB/insync_cm5636/march_2/video/dlc_2.26b_results/march_2_redo-cat-2020-06-17/videos/2020-3-8_12_08_57_943006_compressed_3min_same.avi
/media/cat/14TB/insync_cm5636/march_2/video/dlc_2.26b_results/march_2_redo-cat-2020-06-17/videos  already exists!
Loading  /media/cat/14TB/insync_cm5636/march_2/video/dlc_2.26b_results/march_2_redo-cat-2020-06-17/videos/2020-3-8_12_08_57_943006_compressed_3min_same.avi
Duration of video [s]:  180.0 , recorded with  25.0 fps!
Overall # of frames:  4500  found with (before cropping) frame dimensions:  1280 1024
Starting to extract posture
  0%|                                                                                                                                         | 0/4500 [00:00<?, ?it/s]2020-06-18 14:45:33.771904: W tensorflow/core/framework/allocator.cc:124] Allocation of 125829120 exceeds 10% of system memory.
2020-06-18 14:45:39.854302: W tensorflow/core/framework/allocator.cc:124] Allocation of 125829120 exceeds 10% of system memory.
2020-06-18 14:45:44.694814: W tensorflow/core/framework/allocator.cc:124] Allocation of 125829120 exceeds 10% of system memory.
2020-06-18 14:45:49.393273: W tensorflow/core/framework/allocator.cc:124] Allocation of 125829120 exceeds 10% of system memory.
2020-06-18 14:45:53.788871: W tensorflow/core/framework/allocator.cc:124] Allocation of 125829120 exceeds 10% of system memory.
4545it [30:38,  2.87it/s]                                                                                                                                              Detected frames:  4500
4545it [30:41,  2.47it/s]
Saving results in /media/cat/14TB/insync_cm5636/march_2/video/dlc_2.26b_results/march_2_redo-cat-2020-06-17/videos...

Hi again! You can speed it up by downsampling your videos, or by using a different network.

Downsampling: the networks are trained with 0.5-1.5x scale augmentation, so you can scale the videos down and performance stays the same. We have speed benchmarks available; for 2.2, we will release more information when the paper comes out.
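As a rough back-of-envelope for the downsampling suggestion above (my own assumption, not a benchmark from this thread): if per-frame inference cost scales roughly with the number of pixels, halving both dimensions should give about a 4x speedup. Real speedups can differ, as the measurements later in this thread show.

```python
# Back-of-envelope estimate of inference speed after downsampling.
# Assumption: per-frame inference cost scales roughly with pixel count,
# so halving both dimensions gives ~4x more frames per second.

def estimated_fps(base_fps, base_res, new_res):
    """Predict frames/s at new_res, given measured base_fps at base_res."""
    base_pixels = base_res[0] * base_res[1]
    new_pixels = new_res[0] * new_res[1]
    return base_fps * base_pixels / new_pixels

# Measured above: ~2.5 it/s at 1280 x 1024.
print(estimated_fps(2.5, (1280, 1024), (640, 512)))  # -> 10.0 (predicted FPS)
```

This is only a linear model; memory pressure and batching effects can make the observed speedup larger or smaller than predicted.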

Re: downsampling, it’s not an option (for now) since some of the features are small and challenging to label. But DLC seems to do relatively well now!

Re: using a different network, the latest DLC GUI doesn’t have the NN dropdown (or I can’t find it). Should I just enter it in the config file? And where are the names listed? (I did read through this link: https://github.com/DeepLabCut/DeepLabCut/wiki/What-neural-network-should-I-use%3F)

For example, how do I call this network: MobileNetV2_0.35?

For 2.2 we have only released our modified ResNet-50 backbone for now. Re: downsampling, you don’t need to re-label; just downsample the video, run video analysis, and see if it looks fine to you :wink:
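On the earlier question of how to select a network such as MobileNetV2_0.35: in the DLC Python API the backbone is chosen via the `net_type` argument of `deeplabcut.create_training_dataset` (names are lowercase with underscores). A minimal sketch; the exact list of valid names is an assumption drawn from the DLC docs, so check your installed version.

```python
# Hypothetical helper: validate a net_type string before passing it to
# deeplabcut.create_training_dataset(config, net_type=...).
# KNOWN_NET_TYPES is an assumption based on the DLC docs, not this thread.
KNOWN_NET_TYPES = [
    "resnet_50",
    "resnet_101",
    "mobilenet_v2_1.0",
    "mobilenet_v2_0.75",
    "mobilenet_v2_0.5",
    "mobilenet_v2_0.35",
]

def dataset_kwargs(config_path, net_type):
    """Build the keyword arguments for create_training_dataset."""
    if net_type not in KNOWN_NET_TYPES:
        raise ValueError(f"unknown net_type: {net_type!r}")
    return {"config": config_path, "net_type": net_type}

# Usage (requires deeplabcut to be installed):
# import deeplabcut
# deeplabcut.create_training_dataset(**dataset_kwargs(config_path, "mobilenet_v2_0.35"))
```

Note that the network is fixed when the training dataset is created, so switching backbones means retraining.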

Awesome, that would be amazing if it’s as simple as that. Will try!


Hi. The inference was much faster: 72 FPS (640x512) vs. 2 FPS (1280x1024). But the quality of the annotations is significantly worse.

  1. It seems there’s some nonlinear scaling issue here; maybe it’s related to insufficient memory? Do you recommend getting a 24 GB GPU for this frame size?

  2. When will the MobileNets be available? And will they give us a 10x speedup?

Thanks so much

Overall # of frames: 4498 found with (before cropping) frame dimensions: 1280 1024
Starting to extract posture
4532it [50:55, 1.96it/s] # 1024 x 1280 FRAME SIZES Detected frames: 4498
4532it [51:05, 1.48it/s]

Creating labeled video for 2020-3-8_12_08_57_943006_compressed_3min_lower
100%| … | 4498/4500
[01:07<00:00, 72.41it/s] # 512 x 640 FRAME SIZES
Error: %s axis 2 is out of bounds for array of dimension 0
4498 no data
4498 frame writing error.
Error: %s axis 2 is out of bounds for array of dimension 0
4499 no data
4499 frame writing error.
100%| 4500/4500 [01:07<00:00, 66.83it/s]