Combining Labeled Datasets

Hi, I am trying to expand my labeled dataset because the model from my first training run didn't perform as well as I had hoped. This may not have been the correct approach, but I created a new project with the videos I was planning to add. After labeling those, I moved the labeled data and video files from my previous project into the new project folder. I also added the corresponding video paths to the new config file; however, I get this error when I try to create a training dataset.
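For illustration, here is a minimal sketch of the "add the corresponding video paths" step. It assumes the standard DeepLabCut config.yaml layout, in which `video_sets` maps each video path to an entry such as `{"crop": "0, 640, 0, 480"}`, and it works on config files that have already been loaded into dictionaries. The helper name `merge_video_sets` is hypothetical, not a DeepLabCut function.

```python
def merge_video_sets(new_cfg: dict, old_cfg: dict) -> dict:
    """Return the union of both projects' video_sets (hypothetical helper).

    Entries already present in new_cfg win; videos that exist only in
    old_cfg are added unchanged, so their paths must still be valid.
    """
    merged = dict(new_cfg.get("video_sets") or {})
    for video_path, entry in (old_cfg.get("video_sets") or {}).items():
        # setdefault keeps the new project's entry if the same path appears in both
        merged.setdefault(video_path, entry)
    return merged


new_cfg = {"video_sets": {"/data/new_project/videos/b.avi": {"crop": "0, 640, 0, 480"}}}
old_cfg = {"video_sets": {"/data/old_project/videos/a.avi": {"crop": "0, 640, 0, 480"}}}
print(sorted(merge_video_sets(new_cfg, old_cfg)))
# → ['/data/new_project/videos/b.avi', '/data/old_project/videos/a.avi']
```

If the old videos were moved into the new project folder, their paths in the merged entry would also need to be updated to the new location.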

I also used the same body parts in the same order when labeling the different image sets.

Please see the documentation on active learning for how to improve the models and merge data:

This process is also outlined in the Nature Protocols paper; see Figure 1.

I attempted to use deeplabcut.merge_datasets (from the documentation); however, I still got the same error when I tried to create a training dataset.

Are all the datasets, etc., on Linux?

I created them both on Linux (as two separate projects), and then I combined the labeled datasets and video folders and added the new video paths to the config file.
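When projects do come from different operating systems, one plausible culprit (an assumption here, not something confirmed in this thread) is the path separator: relative image paths recorded on Windows use backslashes, while Linux expects forward slashes. A minimal standard-library sketch for spotting and normalizing such paths; both helper names are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_style(paths):
    """Return the paths that contain backslash separators."""
    return [p for p in paths if "\\" in p]


def to_posix(path: str) -> str:
    """Rewrite a relative path with either separator using forward slashes."""
    # PureWindowsPath accepts both "\\" and "/" as separators,
    # so paths that are already POSIX-style pass through unchanged.
    return PurePosixPath(*PureWindowsPath(path).parts).as_posix()


paths = ["labeled-data\\mouse1\\img001.png", "labeled-data/mouse2/img007.png"]
print(windows_style(paths))
print([to_posix(p) for p in paths])
```

Since both projects here were created on Linux, this is unlikely to explain the error in this case, but it is a cheap check when data has been labeled on different machines.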

I just dragged the labeled-images folders (with the extracted frames and .csv files) from a project I had already trained into a new project folder that only had other labeled frames in it and the corresponding videos.

The CSV files are not used within DLC, only the H5 files. But this is really hard for me to debug, as it's not how we recommend using DLC: we suggest you don't start a new project but simply fix the existing one, and that is how merging works (see the links I posted above). If you are combining data across projects, you have to be absolutely sure that the experimenter (scorer) name and the bodyparts are identical.
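A quick pre-merge check along those lines can be sketched as follows. It assumes each project's config.yaml has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`) and compares only the `scorer` and `bodyparts` keys, including bodypart order. `compare_projects` is a hypothetical helper, not part of DeepLabCut.

```python
def compare_projects(cfg_a: dict, cfg_b: dict) -> list:
    """List the differences that would block merging two projects' labels."""
    problems = []
    if cfg_a.get("scorer") != cfg_b.get("scorer"):
        problems.append(f"scorer differs: {cfg_a.get('scorer')!r} vs {cfg_b.get('scorer')!r}")
    parts_a = list(cfg_a.get("bodyparts", []))
    parts_b = list(cfg_b.get("bodyparts", []))
    if parts_a != parts_b:
        if sorted(parts_a) == sorted(parts_b):
            # Same names, different order: still not safe to combine.
            problems.append("bodyparts match but are in a different order")
        else:
            problems.append(f"bodyparts differ: {sorted(set(parts_a) ^ set(parts_b))}")
    return problems


a = {"scorer": "alice", "bodyparts": ["nose", "tail"]}
b = {"scorer": "bob", "bodyparts": ["tail", "nose"]}
print(compare_projects(a, b))
```

An empty list means the two configs agree on both points; anything else should be resolved before combining the labeled-data folders.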

And what if the datasets were created by different scorers? I suppose it's impossible to merge them at that point without going in and editing the HDF5 keys? We have some nice training sets created by someone who updated the experimenter name for each training set he created. It would be unfortunate if this mistake cost him all that work.

There may not be a cleaner (built-in) way to merge them all than the brute-force approach of dragging and dropping folders, but you don't have to edit the HDF5 keys directly. You can run deeplabcut.convertcsv2h5(config, userfeedback=False, scorer="newname") (see here) on that project.
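To see why no manual HDF5 editing is needed: assuming the standard layout, the scorer name is stored as the top row of the column header in each labeled-data annotation file (`CollectedData_<scorer>.csv`, mirrored in the `.h5`), so renaming it amounts to rewriting that header row. The sketch below is hypothetical and only illustrates the idea on CSV text; in practice, the scorer-renaming conversion suggested above also regenerates the H5 files, which are what DLC actually reads.

```python
import csv
import io


def rename_scorer(csv_text: str, new_scorer: str) -> str:
    """Replace the scorer name in a DLC-style annotation CSV (hypothetical helper).

    Assumes header rows labeled 'scorer', 'bodyparts' and 'coords',
    with the row label in the first cell of each header row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    for row in rows:
        if row and row[0] == "scorer":
            # Leave empty cells alone (e.g. if the image path spans several index columns).
            row[1:] = [new_scorer if cell else cell for cell in row[1:]]
            break
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


example = (
    "scorer,alice,alice,alice,alice\n"
    "bodyparts,nose,nose,tail,tail\n"
    "coords,x,y,x,y\n"
    "labeled-data/mouse1/img001.png,10.5,20.0,30.5,40.0\n"
)
print(rename_scorer(example, "newname"))
```

Only the scorer row changes; bodyparts, coordinates and labels are carried over as they are.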
I suggest making a backup of all the data before doing that, though. For example, you might create a project from scratch with your desired scorer name (let's say it's "newname"). Then copy (don't move, copy) the folders in labeled-data from the other project(s). Add the video paths and otherwise update your config.yaml. Then run deeplabcut.convertcsv2h5(config, userfeedback=False, scorer="newname"), which will change the scorer in all the files in that project folder automatically. If you set userfeedback=True, it will ask you to confirm each one.
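The copy step can be scripted with the standard library. A minimal sketch, assuming both projects use the usual `labeled-data/<video name>/` layout; `copy_labeled_data` is a hypothetical helper. It uses `shutil.copytree` (copy, never move, so the source project remains intact as a backup) and skips any folder that already exists in the destination rather than overwriting it.

```python
import shutil
from pathlib import Path


def copy_labeled_data(src_project, dst_project) -> list:
    """Copy each labeled-data/<video> folder from src_project into dst_project.

    Returns the names of the folders that were copied.
    """
    src = Path(src_project) / "labeled-data"
    dst = Path(dst_project) / "labeled-data"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for folder in sorted(p for p in src.iterdir() if p.is_dir()):
        target = dst / folder.name
        if target.exists():
            # Do not overwrite frames that are already labeled in the new project.
            continue
        shutil.copytree(folder, target)  # copies; the source folder stays intact
        copied.append(folder.name)
    return copied
```

For example, `copy_labeled_data("/data/old_project", "/data/new_project")` copies the frames and annotation files only; the video paths in config.yaml and the scorer name still have to be updated afterwards.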

Note that this will not merge any trained models, just the labeled data. You would need to start training from scratch.

Excellent. Thank you, Brandon.