Recommended Settings for Tracking Fine Parts


I have been struggling to get accurate labeling of small body parts (Drosophila leg tips) despite using an extensive training dataset (800 frames from 5 different animals, both males and females). I tried lowering pos_dist_thresh to 9 and setting global_scale to 1 instead of 0.8 (also with imgaug). The resulting network was far too conservative: it had a hard time labeling even the obvious body parts that worked with the default settings. I trained for 800k iterations.

My video is 1300x900 and the leg tips are usually 12-15 pixels in diameter. Any suggestions to improve labeling?

Thanks so much,


Can you share some test images for default parameters & (pos_dist_thresh=9)?

Yes, below is with the default settings. The leg tip tracking is not accurate. Maybe this is related to the fact that I use both left and right leg tips but not all of them are visible all the time.

Here’s the result with pos_dist_thresh = 9. Even though the leg tips are still visible, the network did not pick them up:

Here’s a better example of what I mentioned in my first post. The labeling confidence drops even though the tracked animal is in a very similar position.

Here’s a frame from the same video 500 frames later (~10 sec); the likelihood drops to 0.4:

Thanks again for the help!!

Your first image with standard parameters looks great. Can you explain what exactly you want improved here?

What I see is that for all bodyparts your ground truth (+) is really close to the confident predictions (the dots). There is one additional x on top of the front leg, which I suppose is a prediction for the occluded left hind leg tip? But since it is drawn as an x, its low confidence already tells you that it is occluded. Is the accuracy of the other images too low?
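To make the dot-versus-x distinction concrete, here is a minimal sketch of thresholding predictions by likelihood. The function name `mask_low_confidence`, the array layout, the coordinates and the 0.6 cutoff are all assumptions for illustration; in a DLC project the cutoff corresponds to `pcutoff` in config.yaml.

```python
import numpy as np

def mask_low_confidence(xy, likelihood, cutoff=0.6):
    """Return a copy of xy with rows below the likelihood cutoff set to NaN.

    xy: array of shape (n_frames, 2); likelihood: array of shape (n_frames,).
    Hypothetical helper, not part of the DeepLabCut API.
    """
    xy = np.asarray(xy, dtype=float).copy()
    likelihood = np.asarray(likelihood, dtype=float)
    xy[likelihood < cutoff] = np.nan
    return xy

# Made-up predictions for one leg tip over three frames.
xy = np.array([[100.0, 200.0], [101.0, 199.0], [300.0, 50.0]])
likelihood = np.array([0.98, 0.95, 0.40])

masked = mask_low_confidence(xy, likelihood)
print(masked)  # the third frame is treated as occluded / unreliable
```

Downstream analysis can then skip or interpolate the NaN rows instead of trusting a low-confidence guess.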

As a side note, if pos_dist_thresh is too small, the network cannot be trained, which seems to have happened in your second case. Note that due to the locref layer and the architecture of DLC, the scoremaps can have peaks larger than the objects, but the regression vector is used to make accurate predictions.
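A simplified sketch of why a small pos_dist_thresh starves training, and why a broad scoremap peak does not hurt accuracy. It assumes an output stride of 8 pixels and compares a threshold of 17 (taken here as the default, which is an assumption) against 9. The function `scoremap_targets` is hypothetical and only mimics the idea of the targets, not DLC's actual implementation.

```python
import numpy as np

def scoremap_targets(joint_xy, image_wh, stride=8, pos_dist_thresh=17):
    """Build simplified scoremap and location-refinement targets for one joint.

    Each scoremap cell stands for a stride x stride patch of the image.
    A cell is positive if its center lies within pos_dist_thresh pixels
    of the labeled joint; its offset is the vector from center to joint.
    """
    width, height = image_wh
    xs = np.arange(width // stride) * stride + stride / 2
    ys = np.arange(height // stride) * stride + stride / 2
    grid_x, grid_y = np.meshgrid(xs, ys)
    offset_x = joint_xy[0] - grid_x
    offset_y = joint_xy[1] - grid_y
    positive = offset_x**2 + offset_y**2 <= pos_dist_thresh**2
    return positive, grid_x, grid_y, offset_x, offset_y

joint = (650.0, 450.0)  # made-up leg tip position in a 1300x900 frame
counts = {}
for thresh in (17, 9):
    positive, gx, gy, ox, oy = scoremap_targets(joint, (1300, 900), pos_dist_thresh=thresh)
    counts[thresh] = int(positive.sum())
    print(f"pos_dist_thresh={thresh}: {counts[thresh]} positive cells")

# Every positive cell, however far from the joint, decodes to the same
# location once its offset (the regression vector) is added back.
rows, cols = np.nonzero(positive)
decoded = np.stack([gx[rows, cols] + ox[rows, cols],
                    gy[rows, cols] + oy[rows, cols]], axis=1)
```

With the smaller threshold only a handful of cells per joint are positive, so the positive signal is very sparse, while the offsets keep the final prediction precise even when the peak is wider than the leg tip.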


When I analyzed a novel video and extracted the outlier frames, I noticed that the labels for the left and right leg tips sometimes overlap, with the wrong one having a higher likelihood value. I primarily wanted to improve that by labeling more frames.
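One way to find such frames systematically is to flag every frame where the two predicted tips land almost on top of each other. This is a minimal sketch with made-up coordinates; the helper `flag_overlapping_tips` and the 10-pixel separation (chosen because the tips are 12-15 pixels across) are assumptions, not part of DeepLabCut.

```python
import numpy as np

def flag_overlapping_tips(left_xy, right_xy, min_separation=10.0):
    """Return indices of frames where two predictions are closer than min_separation.

    left_xy, right_xy: arrays of shape (n_frames, 2) in pixels.
    Hypothetical helper for picking candidate frames to relabel.
    """
    left_xy = np.asarray(left_xy, dtype=float)
    right_xy = np.asarray(right_xy, dtype=float)
    distance = np.linalg.norm(left_xy - right_xy, axis=1)
    return np.nonzero(distance < min_separation)[0]

# Made-up predictions for the left and right tips of one leg pair.
left = np.array([[100.0, 200.0], [150.0, 210.0], [160.0, 215.0]])
right = np.array([[300.0, 205.0], [152.0, 211.0], [310.0, 220.0]])

overlapping_frames = flag_overlapping_tips(left, right)
print(overlapping_frames)  # frames where both labels sit on the same tip
```

The flagged frames are good candidates for manual relabeling, since they are exactly the cases where the network confuses the two sides.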

Thanks for the side note!! I did not realize that. I think I will switch back to the default settings and try again with more frames (the first one has 400 labeled frames whereas the pos_dist_thresh = 9 one has 800).

To your question in the first paragraph: Yes! Some were very low, like in the example above.

Absolutely, I recommend pooling as many frames as possible. Also try the imgaug loader; it gives better performance (and do full rotation augmentation).

You can easily set it like this before training (this reads pose_cfg.yaml, edits it, and writes it back):

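import deeplabcut

# Assumes path_config_file, shuffle and trainingsetindex are already defined.
cfg = deeplabcut.auxiliaryfunctions.read_config(path_config_file)
trainposeconfigfile, testposeconfigfile, snapshotfolder = deeplabcut.return_train_network_path(path_config_file, shuffle=shuffle, trainFraction=cfg["TrainingFraction"][trainingsetindex])

# Read the training pose_cfg.yaml into a dict.
cfg_dlc = deeplabcut.auxiliaryfunctions.read_plainconfig(trainposeconfigfile)

cfg_dlc['scale_jitter_lo'] = 0.5

cfg_dlc['batch_size'] = 8  # pick this as large as your GPU can handle
cfg_dlc['motion_blur'] = True
cfg_dlc['optimizer'] = "adam"
cfg_dlc['multi_step'] = [[1e-4, 7500], [5 * 1e-5, 12000], [1e-5, 50000]]

# Write the edited settings back to pose_cfg.yaml.
deeplabcut.auxiliaryfunctions.write_plainconfig(trainposeconfigfile, cfg_dlc)

print("TRAIN NETWORK", shuffle)
deeplabcut.train_network(path_config_file, shuffle=shuffle, saveiters=5000, displayiters=500, max_snapshots_to_keep=11)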
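For readers unfamiliar with the multi_step entry: each pair is a learning rate and the iteration up to which it applies. The sketch below shows that reading of the schedule in plain Python. The function `learning_rate_at` is hypothetical, and the exact handling of the boundary iterations inside DLC may differ from this illustration.

```python
def learning_rate_at(iteration, multi_step):
    """Return the learning rate for a given iteration.

    multi_step is a list of [learning_rate, last_iteration] pairs in
    increasing order of last_iteration. Hypothetical helper that only
    illustrates how the schedule is read.
    """
    for rate, last_iteration in multi_step:
        if iteration <= last_iteration:
            return rate
    # Past the final step: keep the last learning rate.
    return multi_step[-1][0]

multi_step = [[1e-4, 7500], [5 * 1e-5, 12000], [1e-5, 50000]]

for iteration in (1, 7500, 7501, 12000, 12001, 50000):
    print(iteration, learning_rate_at(iteration, multi_step))
```

Read this way, the schedule above trains for 50,000 iterations in total, far fewer than the 800k mentioned earlier in the thread, with the rate dropping in two steps.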

I see. Certainly make sure that you have multiple such challenging images. Also, labeling multiple bodyparts per leg can greatly help accuracy. This reminds me a bit of our original data from Kevin Cury: cool stuff! We were fine with ~500 diverse, labeled images there!