Test error worse after extracting outlier frames & refining labels

I wanted to retrain my network with some new videos, so I extracted outlier frames and refined the labels before retraining.

Here’s the plot of test (orange) and train (blue) error vs. training iterations from the first training run.

And here is the same plot for the second network.

I was also surprised to see the test error spike.

Any thoughts?


Did you retrain from scratch or from the previous snapshot?

I retrained from scratch.

I just redid the training with a new project and dataset, and I see something similar.

The only difference I’ve come up with is that the first time I trained, I used the imgaug loader, and the last two times I’ve used the default loader. Could that cause the spikes in test error?

Hi ijh,
Sadly I can’t answer your questions, however you can probably answer mine ;).
I would like to generate a similar RMSE versus iteration # graph. Could you explain how you did it?

Check out this forum post Plotting RMSE and # iter


  1. Set max_snapshots_to_keep = None while training
  2. Set snapshotindex: all in config.yaml before evaluating
  3. Take the CSV with the results after evaluation
  4. Make the graph
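For steps 3 and 4, here is a minimal, stdlib-only sketch of turning the evaluation CSV into (iteration, train error, test error) points ready for plotting. The column names are assumptions based on typical DLC evaluation output; check the header of your own CSV and adjust:

```python
import csv
import io

def parse_eval_results(csv_text,
                       iter_col="Training iterations:",
                       train_col=" Train error(px)",
                       test_col=" Test error(px)"):
    """Parse DLC evaluation results into (iteration, train_err, test_err) tuples,
    sorted by iteration. Column names are assumptions; pass your own if they differ."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append((int(row[iter_col]),
                     float(row[train_col]),
                     float(row[test_col])))
    rows.sort()
    return rows

# Made-up example data, just to show the shape of the output:
sample = (
    "Training iterations:, Train error(px), Test error(px)\n"
    "50000,2.1,3.4\n"
    "10000,3.5,4.9\n"
)
points = parse_eval_results(sample)
iterations = [p[0] for p in points]
train_err = [p[1] for p in points]
test_err = [p[2] for p in points]
```

From there, two matplotlib plot calls (iterations vs. train_err and iterations vs. test_err) give the train/test curves shown earlier in the thread.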

Thanks for your answer!!

I tried running the evaluation with snapshot_index = all, and it does evaluate all snapshots.
However, I can’t find the .csv with the RMSE vs. iteration #. Are you also using 2.2b7, with a multi-animal project? If so, where is the .csv supposed to be?

Hi ijh,
I am getting back to you about the RMSE vs. iteration number. Are you also using version 2.2b7, with a multi-animal project? If so, where is the .csv supposed to be?

@Ant1, you’d find the CSV file under the dlc-models/test folder after running evaluate_multianimal_crossvalidate. Note that with DLC 2.2b8, the file was renamed to results.csv and is now stored in the evaluation-results folder

Okay, thanks!! However, in the results.csv (I am using 2.2b8 now), there are only cross-validation results for the last iteration. I am using the GUI to do the cross-validation; should I run evaluate_multianimal_crossvalidate manually? Is the idea to run it for each iteration separately?

Hi @jeylau

Any thoughts on why some snapshots had way higher test error than the ones before or after?
The only difference I can think of is that the first time, I used the imgaug loader, and the second and third time, I used the default loader because I forgot to change it.

If it matters, I have deeplabcut, and it’s not a multianimal project.