Train RMSE increases as number of frames increases


I’m working on a multi-animal project with monkey videos; each video contains two monkeys.
I defined 31 body parts with 2 individuals (monkey 1 & monkey 2) and defined the skeletons following the over-connect rule in maDLC. I train up to 103K iterations.


I retrain each time I add a few frames, and I notice that the train_rmse (calculated at the cross-validation step) does not show a downward trend as the number of frames increases, which is counterintuitive to me. Here is the trend:

#frames    train_rmse
164        3.145755
209        11.779597
263        3.578877
298        5.168994
338        5.737633
353        3.862328
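For context on what the numbers above measure: train_rmse is essentially the root-mean-square Euclidean distance (in pixels) between the labeled and predicted keypoints on the training frames. DeepLabCut's exact implementation may differ in details (e.g. confidence cutoffs), so this is only an illustrative sketch of the metric:

```python
import numpy as np

def keypoint_rmse(labeled, predicted):
    """RMSE in pixels between labeled and predicted keypoints.

    Both arrays have shape (n_frames, n_bodyparts, 2) for x/y coordinates.
    NaNs (unlabeled or undetected body parts) are ignored.
    """
    d2 = np.sum((labeled - predicted) ** 2, axis=-1)  # squared distances per keypoint
    return float(np.sqrt(np.nanmean(d2)))

# Toy example: 2 frames, 3 body parts, every prediction off by (3, 4) px.
labeled = np.zeros((2, 3, 2))
predicted = labeled + np.array([3.0, 4.0])
print(keypoint_rmse(labeled, predicted))  # -> 5.0
```

Note that a single RMSE pools all body parts and frames, so a few badly tracked frames can dominate the value.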


  1. What might be the possible reasons behind this? I double-checked all the labels in all the frames, so there should be no label-quality issue.
  2. Is train_rmse a good indicator for evaluating the model’s performance at all?

Any advice or comment is appreciated.

If you are creating a new shuffle each time with different frame index IDs, then the train/test split is randomized every time, so you should expect some noise here. The 11.78 looks like an outlier; the others are all in a similar range. How many images are you using? If you want to truly compare models, you should always train multiple shuffles per set of labeled data; read more on how to compare models here: What neural network should I use? · DeepLabCut/DeepLabCut Wiki · GitHub
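To see why re-randomized splits alone can move the RMSE around, here is a small simulation (all numbers are made up for illustration): a fixed pool of frames with a few "hard" frames (occlusions, odd poses) that carry much larger error, resampled into fresh train splits the way a new shuffle would.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame errors (pixels) for a pool of 164 labeled frames;
# a handful of hard frames carry much larger error.
n_frames = 164
errors = rng.gamma(shape=2.0, scale=2.0, size=n_frames)
hard = rng.choice(n_frames, size=5, replace=False)
errors[hard] += 40.0

def train_rmse_for_shuffle(errors, train_frac=0.95, seed=0):
    """Train RMSE for one random train/test split (one 'shuffle')."""
    idx = np.random.default_rng(seed).permutation(len(errors))
    n_train = int(train_frac * len(errors))
    train = errors[idx[:n_train]]
    return float(np.sqrt(np.mean(train ** 2)))

# Ten fresh shuffles over the SAME frames and SAME underlying errors:
# the RMSE still fluctuates depending on where the hard frames land.
rmses = [train_rmse_for_shuffle(errors, seed=s) for s in range(10)]
print(round(min(rmses), 2), "to", round(max(rmses), 2))
```

Nothing about the model or the labels changes between these runs; the spread comes purely from the split. Training several shuffles per dataset, as suggested above, averages over exactly this noise.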

Thanks for the response! I have been using the default network and augmentation method, so I think it’s definitely a good idea to run a benchmark model comparison. I’m running it right now.

Beyond that, what I really want to compare is model performance as I increase the training dataset size. Is there a way to fix the train/test split as the training data grows? Maybe I can just fix the test set and look at how the test_rmse changes?
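Freezing the test set is exactly the right idea for a learning curve. Recent DeepLabCut versions let you pass explicit index lists when creating a training dataset (check your version's docstring for the exact parameter names, e.g. `trainIndices`/`testIndices`). Whatever the API, the bookkeeping looks like this sketch, using the frame counts from the post (the test-set size of 16 is arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(42)

# Frame counts from the post; frames are added over six labeling rounds.
pool_sizes = [164, 209, 263, 298, 338, 353]

# Freeze the test set ONCE, drawn only from frames that exist in round 1,
# so every later round is evaluated on exactly the same frames.
test_idx = set(rng.choice(pool_sizes[0], size=16, replace=False).tolist())

splits = []
for n in pool_sizes:
    # Newly labeled frames always join the training split;
    # the frozen test set never changes.
    train_idx = [i for i in range(n) if i not in test_idx]
    splits.append((sorted(test_idx), train_idx))

# The test split is identical across rounds, so test_rmse values computed
# on it are directly comparable as the training set grows.
print([len(tr) for _, tr in splits])
```

With this setup, test_rmse across rounds is an apples-to-apples comparison; only the training set changes. Still combine it with multiple shuffles per round to average out training stochasticity.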