Is there a way to evaluate a DeepLabCut network on data not present in the test/training set?

I am interested in the performance (RMSE in pixels) of my network on a separate annotated dataset entirely from the one it’s trained on. The separate dataset is generated by DLC and so is in the correct format, but has no overlap with the training dataset. My question is: is there a simple way to run DLC’s evaluate_network function on a chosen dataset? I have tried changing the dataset parameter in the test pose_cfg file to the desired filepath, but it still seems to evaluate on the training dataset.

Alternatively, is there a way to control specifically which images in the training dataset are used for training, and which for testing? As far as I understand it, DLC chooses test and training images at random according to the Training Fraction.

I think there is some confusion about what evaluate is doing. Evaluation compares the human-labeled position in an image to the network's position in that same image. Thus, you cannot evaluate on data without human labels. If you just want to analyze new videos, run deeplabcut.analyze_videos.
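To make the distinction concrete, here is a minimal sketch (plain NumPy, not DLC's actual implementation) of what evaluation measures: the pixel distance between human labels and network predictions on the same frames. Without ground-truth labels there is nothing to subtract, which is why unlabeled videos go through analyze_videos instead.

```python
import numpy as np

def rmse(human_xy, pred_xy):
    """Root-mean-square Euclidean error in pixels.

    human_xy, pred_xy: arrays of shape (n_frames, n_bodyparts, 2),
    i.e. (x, y) per bodypart per frame, for the SAME images.
    """
    # Per-frame, per-bodypart pixel distance between label and prediction
    d = np.linalg.norm(human_xy - pred_xy, axis=-1)
    return float(np.sqrt(np.mean(d ** 2)))
```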

Firstly, you can evaluate a model on a new set of images: see "How to move partial labeled frame to a new project & how to quickly evaluate a trained dataset on analyzing new frames without retraining"

Secondly, you can then compare the performance with the functionality linked here:

Thirdly, yes, you can also pick specific images for training/testing by passing trainIndexes and testIndexes; see:
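For illustration, here is one way to build a reproducible train/test split to pass in. Only the index construction is shown; the actual create_training_dataset call (and the exact shape those arguments expect, which may vary by DLC version) is left as a comment.

```python
import numpy as np

def split_indices(n_frames, train_fraction=0.95, seed=0):
    """Return disjoint, sorted train/test index lists covering all frames."""
    rng = np.random.default_rng(seed)      # fixed seed -> reproducible split
    order = rng.permutation(n_frames)
    n_train = int(round(train_fraction * n_frames))
    return sorted(order[:n_train].tolist()), sorted(order[n_train:].tolist())

train_idx, test_idx = split_indices(100, train_fraction=0.8)
# Assumed usage (check your DLC version's signature):
# deeplabcut.create_training_dataset(config_path,
#                                    trainIndexes=[train_idx],
#                                    testIndexes=[test_idx])
```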

These are exactly what I needed! To clarify the first solution, what format should the "DataCombined" parameter of the pairwisedistances function be in? From the source code I would guess I pass it a pandas DataFrame with data stored under two different scorers, one being a human labeller and the other the network itself?

Thanks for the quick and thorough response!

Exactly, you can just stack the two dataframes:
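A hedged sketch of that stacking: DLC's labeled-data files use a (scorer, bodyparts, coords) column MultiIndex, so concatenating the human and network DataFrames along the columns yields one combined frame with two scorer levels. The scorer and bodypart names here are illustrative.

```python
import numpy as np
import pandas as pd

def make_df(scorer, xy):
    """Build a toy DLC-style DataFrame for one scorer and one bodypart."""
    cols = pd.MultiIndex.from_product(
        [[scorer], ["nose"], ["x", "y"]],
        names=["scorer", "bodyparts", "coords"])
    return pd.DataFrame(xy, columns=cols)

human = make_df("human", [[10.0, 20.0], [30.0, 40.0]])
model = make_df("DLC_model", [[13.0, 24.0], [30.0, 40.0]])

# Stack along columns: same rows (images), two scorer levels
DataCombined = pd.concat([human, model], axis=1)

# Per-image, per-bodypart pixel error between the two scorers
diff = DataCombined["human"].to_numpy() - DataCombined["DLC_model"].to_numpy()
pixel_err = np.linalg.norm(diff.reshape(len(DataCombined), -1, 2), axis=-1)
```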