Training settings for low resolution/contrast and thin features

Sample image

cam2_20200303_0414153606-obj1.tif (39.2 KB)


  • I have high-framerate videos of mosquitoes (see image above) but with low contrast and low resolution (200 x 200 px) because I had to crop around the individual.

Analysis goals

  • I want to track the tips of the legs, the body tips, as well as the wing hinges and tips.


  • For now I am using the default settings in the train cfg files with 150 labeled images and ResNet-50. It gives me decent results for a first try, but the accuracy is not great yet: the labels are often off by a few pixels, which in my case makes a big difference.
  • Is there a tutorial/documentation about how to find the best settings for the training?
  • For example, multi_step seems to be important. Here: Recommended Settings for Tracking Fine Parts it is recommended to set multi_step with the following values. Would that make sense in my case? And if so, why?

cfg_dlc['multi_step'] = [[1e-4, 7500], [5*1e-5, 12000], [1e-5, 50000]]

Yes, the recommended settings from the link make sense.
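For intuition: multi_step is a piecewise-constant learning-rate schedule, where each `[lr, until_iteration]` pair holds that rate until the given iteration. A minimal sketch of how such a schedule is interpreted (the `lr_at` helper is hypothetical, for illustration only, not the DeepLabCut API):

```python
# Hypothetical helper illustrating a multi_step schedule:
# each entry is [learning_rate, hold_until_iteration].
MULTI_STEP = [[1e-4, 7500], [5e-5, 12000], [1e-5, 50000]]

def lr_at(iteration, multi_step=MULTI_STEP):
    """Return the learning rate in effect at a given training iteration."""
    for lr, until in multi_step:
        if iteration < until:
            return lr
    # Past the last boundary, keep the final (smallest) rate.
    return multi_step[-1][0]
```

The idea is to start with a larger rate for fast initial progress, then decay it so the network can settle into fine-grained, pixel-level refinements, which matters when you care about few-pixel accuracy.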

In your case, I would also recommend cropping further (get rid of the gray pixels) and then upsampling the images. This will greatly improve the spatial accuracy. You could also perhaps change your lens when you record your data.
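As a rough sketch of that preprocessing (assuming the padding is a uniform gray value, here guessed as 128, and using plain NumPy with nearest-neighbor upsampling; your actual background value and choice of interpolation may differ):

```python
import numpy as np

def crop_to_content(img, bg_value=128):
    """Crop away rows/columns that contain only the (assumed) gray background."""
    rows = np.any(img != bg_value, axis=1)
    cols = np.any(img != bg_value, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return img[r0:r1 + 1, c0:c1 + 1]

def upsample_nn(img, scale=2):
    """Nearest-neighbor upsampling: each pixel becomes a scale x scale block."""
    return np.kron(img, np.ones((scale, scale), dtype=img.dtype))
```

Remember that any existing label coordinates must be shifted by the crop offset and multiplied by the upsampling factor to stay aligned with the new images.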

If you annotate the same image twice, can you actually annotate it with 1-pixel/subpixel accuracy?

Thanks for your quick answer!
Cropping and upsampling seem like a good idea, I will try that. I can't, however, change the lens.

Can you tell me more about why those recommended settings make sense?

Also, I don't understand why annotating the same image twice would get me subpixel accuracy, because I thought that was already the case (using the DeepLabCut GUI, I generate coordinates with subpixel accuracy).

What Alex is asking is whether your human-applied labels are < 2 pixels in error; as you can see in our original report, the best labeler we had was at 2.7 pixels of error, which becomes about as good as the network can do. So, you are asking for < 2 pixels of accuracy, so the real question is how good your input data is ;). Secondly, the default resolution downsamples, so you should set global_scale to 1 in the pose_cfg.yaml in the train folder before you start training…

Oh okay. In my case, the labeling accuracy is definitely sub-pixel. I didn't realize that the training used down-sampled images by default (which totally makes sense). I will try without downsampling; I guess it will come at the cost of speed.

What I mean by accuracy is comparing yourself labeling the exact same frame two different times; sub-pixel consistency is essentially impossible :wink: Of course I understand you can zoom in our GUI and label one image with sub-pixel precision, but it's the variability that really matters, as collectively this comes across in the dataset you amass for training.
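One simple way to measure that variability is to label a handful of frames twice and compute the mean Euclidean distance between the two passes. A sketch (assuming each pass is a list of (x, y) keypoint coordinates in matching order; the helper name is made up for illustration):

```python
import numpy as np

def labeling_error(pass1, pass2):
    """Mean Euclidean distance (in pixels) between two labeling passes
    of the same keypoints on the same frames."""
    d = np.asarray(pass1, dtype=float) - np.asarray(pass2, dtype=float)
    return np.sqrt((d ** 2).sum(axis=1)).mean()
```

If this number comes out above a couple of pixels, the network's accuracy will be limited by the labels themselves rather than by the training settings.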

Just a small update, I tried up-sampling and the recommended settings. It did improve the tracking quite a lot, so thanks again!!

I didn’t take the time yet to find the labeling accuracy.
