Multi-animal refining training workflow suggestions

I’m looking for suggestions for speeding the refining portion for a training data set for a complex multi-animal project. I’m sure my questions are based on me knowing just enough about this to get myself into trouble, but not enough to solve my own problems.

Currently, I’m training the project on one video with one animal and 38 bodyparts. Some bodyparts are obscured frequently; others are always visible. I haven’t added a multi-animal video (yet).

After analyzing the video, I can create a video with

dlc.create_video_with_all_detections(config, vids, scorername)

That video has all of the markers, though not necessarily in the correct place, as expected.
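For anyone new to the thread, that first checkpoint spelled out as a sketch (the function wrapper and the bare `analyze_videos` call are my own framing, mirroring the poster's snippet rather than an exact API reference):

```python
def detection_preview(config, vids, scorername):
    """Sketch of the first maDLC checkpoint: run pose estimation, then
    render every raw detection onto the video. Arguments mirror the
    poster's snippet; check your installed DeepLabCut version's docs
    for the exact signatures."""
    import deeplabcut as dlc  # deferred import so the sketch stands alone

    dlc.analyze_videos(config, vids)  # writes the *_full.pickle detections
    dlc.create_video_with_all_detections(config, vids, scorername)
```

This detection video is the first sanity check before any tracklet conversion: if the raw detections already look bad here, tracklets will look worse.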

For single animal projects, I would extract outliers, refine, and merge. If there were points missing that I needed to add but that did not show in the refine GUI (or I deleted points by mistake), I would reload the image in label frames, and save. Then move on to retraining.

With 2.2b6, if I convert to tracklets and open the refine_tracklets GUI, there are problems:

  1. Some bodyparts are not marked (maybe off screen?) in some frames, so I can’t refine those frames properly. If I change various things in inference_cfg.yaml or max_gaps, then some points show up, but not necessarily all of them. So, again, I can’t properly refine without suggesting a negative image for specific points.
  2. Yet other bodyparts are marked when they shouldn’t be, or appear in a different place than the detection video would suggest. And there is no way to delete them in the refine tracklets window (or is there? Some of the keyboard shortcuts don’t work on my Windows workstation. I’ve changed a few and will submit a PR to provide some options that work).

So, I’ve tried the old style of extracting outlier frames. But that can’t find an unfiltered track file immediately after analyzing the video.

My first workaround was to convert to tracklets, refine the tracklets, save without changing anything, then extract outlier frames and refine.

That leaves a couple of questions, some of which might be answered in a future paper from you all:

  1. Are there specific parameters to change in the config file(s) that will let refine_tracklets show the raw detections for this kind of early-in-the-process training? In that way, the refine tracklets GUI would almost act like a manual extraction of outliers and refining?

  2. Is there a function that would convert the output of analyze_videos (as used in create_video_with_all_detections) to a format usable by extract_outlier_frames? I have tried convert_raw_tracks_to_h5 (on a whim), and it doesn’t accept the full.pickle (KeyError: ‘header’, so obviously not the right format). Passing it a tracklet pickle works, and lets me extract outlier frames. But the labels have the same problem as the refine_tracklets (some are missing completely).

  3. Is there a better way to refine such errors? Am I missing something completely obvious?


Hey Brandon,
I can answer more later, but this is also really helpful to see your workflow, etc!

You can hit the “delete” key, but also be aware that the tracklet GUI shows even low-confidence points, which might not appear if you create a labeled video.

– you can do the same in maDLC, i.e. if the video you create at the pose estimation step is not good, then going to tracklets will indeed be bad.

i.e. you can label more data at this point to be sure the pose estimation looks good; here, you get many more evaluation metrics than a normal pre-2.2 project: you can run the evaluation and see the RMSE and the PCK (percentage of correct keypoints) for train and test data. If this looks bad, abort and label more data for sure.

Thanks for the reply.

Ok. The delete key wasn’t working, but then again, I’m using a mac laptop (small keyboard) to remotely access a windows 10 workstation, so my key mapping may all be a bit wonky.

I guess then there are two options in the workflow. Option 1 is abort, extract more frames and label. Manual frame extraction would probably be best to try to find different appearances. I’m kind of hoping for option 2: find poor fits with extract outliers. I think then my main question is if there is a way to extract outliers from that initial pose estimation?

I’m guessing you all have been a bit busy with other things and haven’t written the function to convert from the full.pickle to the style expected by extract outliers. I just want to make sure I’m not missing something that already exists.


good q! “I’m guessing you all have been a bit busy with other things and haven’t written the function to convert from the full.pickle to the style expected by extract outliers. I just want to make sure I’m not missing something that already exists.”

@jeylau - do we have such a way to do this right now, i.e. post-full.pickle --> extract outlier? This is a good idea to do before full 2.2 🙂

@MWMathis Yes (although this is not yet wrapped in a single function)! One can call deeplabcut.convert_detections2tracklets, deeplabcut.convert_raw_tracks_to_h5, and deeplabcut.extract_outlier_frames in succession 🙂
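A sketch of that three-call succession chained into one helper (the wrapper and the argument names here are illustrative placeholders; check the signatures in your installed DeepLabCut version before running):

```python
def outliers_from_detections(config, videos, videotype=".mp4", shuffle=1):
    """Chain the three steps @jeylau lists: raw detections -> tracklets ->
    .h5 tracks -> outlier extraction. A sketch, not an exact API reference."""
    import deeplabcut  # deferred import so the sketch stands alone

    # 1) Link the raw per-frame detections (full.pickle) into tracklets
    deeplabcut.convert_detections2tracklets(
        config, videos, videotype=videotype, shuffle=shuffle
    )
    # 2) Convert the tracklet pickle into the .h5 tracks file that
    #    downstream functions expect
    deeplabcut.convert_raw_tracks_to_h5(config, videos)
    # 3) extract_outlier_frames can now find a tracks file to work from
    deeplabcut.extract_outlier_frames(
        config, videos, videotype=videotype, shuffle=shuffle
    )
```

As the thread notes below, step 1 applies assembly/tracking thresholds, so the resulting tracks can differ noticeably from the raw detection video.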

@jeylau, thanks. I’ve tried that. But I was hoping to avoid the convert_detections2tracklets part. The results from that are pretty different from the video based on the full.pickle.

But maybe there’s something I can change in inference_cfg.yaml to help? I’ve tried modifying a number of things already (by trial and error). I want the tracklets to mimic the full.pickle as much as possible (no filtering, no fitting, etc., just the best, even if bad, detection for each bodypart in each frame).

Basically, the full.pickle pose estimation (with a very low pcutoff) shows that most of the tracking is at least on the animal (just the wrong place at times), with an occasional jump to something else in the view. That’s what I want to work with to find where I need to add labels.

The output from the tracklets looks like those data are smoothed or somewhat ignored. There are frames where none of the points are even on the animal (which makes dragging them all into place a real pain).


I second backyardbiomech: in the refine tracklets GUI (after all the filtering, fitting, etc.), many body parts appear incorrectly labeled, although when I look at the labeled video created during the analyze_videos step (i.e. before the refine tracklets step), everything looks more or less OK (at least, some bodyparts that appear correctly labeled in the video appear wrongly positioned in the tracklets GUI).

There are also way too many ID swaps observed in the tracklets GUI, as well as many frames where the labels from both animals are all on one animal.

Seconding backyardbiomech, what can I change in the inference params to improve these issues? (I’ve tried changing most of the parameters by trial and error, but see little improvement). What else can I try? Do I need to go back and label more frames and retrain the network?

And is there a way to get the next/previous frame buttons to work in the refine tracklets GUI? It’s incredibly hard to do so many corrections without this feature.

Just for reference, we trained the network on 9 videos of 2 mice in a cage (and labeled 8 bodyparts per animal). The cross-validation errors were: RMSE train 3.8 and RMSE test 29.83.


For what it’s worth, my evaluation RMSEs are in the range of 3–5 for train and test, and I still have those problems with the refine tracklets. I definitely don’t have good sampling of views with my labeling, though.

And @LJacinto, I submitted some minor code changes that should help with the next/previous keys on Windows.

Does anyone have a solution for this? Or should I ask, has anyone been able to get good results with the refine tracklets stage for multi-animal videos? No matter how much you train the network (or whatever errors you get when you evaluate the network) or how many parameters you change before loading the tracklets GUI, the refine tracklets output is always poor (especially compared with the video output created at the analyze_videos stage): way too many ID swaps and wandering/mis-positioned points, which makes it impossible to refine by hand.

Hey @LJacinto, I am digging into the issue!
Meanwhile, could you have a look at deeplabcut.utils.make_labeled_video.create_video_from_pickled_tracks() and let me know if the video it produces looks fine? That would allow me to better diagnose the tracking 🙂


Hi @Jeylau, thank you for the reply and help.

I’ve run the code. But in the resulting video, the labels just keep popping on and off throughout, and changing color continuously (when they do show up, they appear to be in the correct place).

Thanks for letting me know! Then there is definitely something going on with tracking. Did you run cross-validation, or did you use manually tuned parameters in the inference_cfg.yaml? Also, do both tracking methods perform the same?

I’ve done the cross-validation.

And I’ve also tried manually tuning the inference_cfg.yaml parameters. No matter what parameters I change, the result is always poor (I would say even poorer when manually editing).

I’ve also tried both tracking methods. Box performs better.

Is there anything I can do to improve the tracking? Judging from other comments here and elsewhere, this issue seems to be happening to more people…

I’ve done some more testing and tweaking, and gotten things to work a bit better.

I manually extracted some frames that represented times in the videos where the tracklets seemed especially off. That got my evaluation errors down.
Then, the default parameters led to better tracklets. It’s still not perfect, though.
Until the multi-animal paper is published and clearly describes all of the extra parameters, I think the best strategy is to focus on getting the best tracking for each point, and worry about jumping between individuals later.
This means my training workflow is now a bit different than for single-animal DLC:
Extract > label > train > evaluate > use refine tracklets ONLY to find problematic frames > extract frames (not extract outliers) > label frames > train > etc.

Once the tracking is pretty solid, the refine tracklets GUI is far more useful.


Thank you @backyardbiomech for your input. The problem with the approach of extracting those frames is that there are way too many tracking issues, and finding and extracting all of those frames would take a really long time, but we’ll give it a try for a couple of videos. We are also trying to crop the videos to segments where there are fewer tracking issues (but that partly defeats the purpose).

We’re also looking into social interaction between animals, and having the constant ID swaps renders any further analysis impossible.

I agree, we need a clear description of each inference parameter and how it possibly affects the output.


Yes, I’m tracking crawfish fights, so they really “interact”.
You don’t have to extract every frame with an issue. If you have a length of video where a few points are not tracked well, just extract one or two of those frames.

Once each point tracks pretty well, then work on the jumps between individuals. I’ve found that once things are tracking well, I can increase the connections threshold (I start with it pretty low) and get fewer jumps. Once it’s up to about 0.75 * the number of multianimalparts, I get far fewer jumps between individuals.
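To make that rule of thumb concrete, a toy calculation (the exact parameter name in inference_cfg.yaml isn’t confirmed in this thread, so verify the key in your own config before applying it):

```python
# Rule of thumb from this thread: once per-point tracking is decent,
# raise the connections threshold to roughly 0.75 * the number of
# multianimal bodyparts to cut down on jumps between individuals.
num_multianimal_parts = 8  # e.g. 8 bodyparts per mouse, as mentioned above
connections_threshold = round(0.75 * num_multianimal_parts)
print(connections_threshold)  # -> 6
```

The idea is that requiring more linked bodyparts per assembly makes it harder for a partial detection on the wrong animal to be accepted as a match.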

Hi All, I also worked on a document specifically for maDLC that I will continue to update as modes/suggestions pop up. Feedback welcomed here:

I just had a look. This is very helpful, especially noting the break point after pose estimation!


Hi all,

I am experiencing the same issues presented above. Background: I am using maDLC (latest 2.2v8 version) for single-animal tracking. As already described above, I get a decent labeled video after calling analyze_videos and create_video_with_all_detections; however, upon converting to tracklets (convert_detections2tracklets + convert_raw_tracks_to_h5) and checking the created video afterwards (utils.make_labeled_video.create_video_from_pickled_tracks), the labels are always mixed up. This includes trying out both box and skeleton tracking and fiddling with the inference parameters (both with cross-validation and manual setting).

I am posting here due to an observation that wasn’t noted above and might help with debugging. I’ve noticed strange behavior with labels after converting to tracklets: in labeled videos after conversion, some frames seem to be labeled correctly; however, in the frames immediately following such correctly labeled ones, the labels start drifting away from the animal in linear trajectories. The best analogy to what I see is breaking a cluster of billiard balls at the start of a game of pool: at the beginning, the balls are nice and ordered (= correctly placed labels), but after hitting them with the white ball, they scatter in all directions on straight (linear) trajectories. This seems to happen throughout the labeled video: scatter from correct positions, snap back to correct positions, scatter from new correct positions, etc.

I know you are incredibly busy trying to resolve this, however I would at this point like to ask for 2 clarifications that might provide a temporary solution:

  1. Does convert_detections2tracklets + convert_raw_tracks_to_h5 save the labels seen in labeled videos before conversion to tracklets (saved in the full.pickle file), or the labels seen in the tracklet GUI? I’m afraid that since the labels are misplaced in the GUI, those are the ones saved, and thus working with that .h5 would give wrong pose estimates.
  2. On the Overview page of your docs, you state 3 scenarios for using maDLC. I don’t understand the difference between the first two. If I only wish to track a single animal in my videos, do I need to use maDLC (multianimal=True) or not (multianimal=False) in order to get the DLC 2.2 benefits (faster training with fewer iterations, the use of the skeleton in training, etc.)? Maybe having only a single animal but using a multi-animal project structure is somehow messing with the tracklets GUI.

I thank you sincerely for your continued support and hard work!

EDIT: I am not sure if people are aware of this. pandas contains a read_pickle function with which you can easily read the pickle file into Python and get the label coordinates that you see in the full labeled videos. This can be a temporary workaround for the tracklet issue at the moment.
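A minimal illustration of that workaround, using a mock dictionary (the real full.pickle layout varies by DeepLabCut version, so inspect the keys of your own file first):

```python
import pickle

import pandas as pd

# Mock of a full.pickle-like structure: per-frame entries holding
# coordinates and confidences (the real layout may differ by version).
mock = {
    "metadata": {"all_joints_names": ["nose", "tailbase"]},
    "frame0000": {
        "coordinates": [[(12.0, 34.0)], [(56.0, 78.0)]],
        "confidence": [[0.98], [0.91]],
    },
}
with open("full_mock.pickle", "wb") as f:
    pickle.dump(mock, f)

# pandas.read_pickle falls back to plain pickle loading, so it reads
# non-DataFrame pickles like this one just fine:
data = pd.read_pickle("full_mock.pickle")
print(sorted(data.keys()))  # -> ['frame0000', 'metadata']
```

From there you can pull out the per-frame coordinates and confidences directly, without going through the tracklet conversion at all.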

This is because of the gap-filling function. When the tracklets are being built, detections in individual frames are linked together into tracklets if they pass certain thresholds (way more than just the pcutoff). You can find references to those parameters in other posts; I’ve found that dropping pafthreshold is especially helpful. When detections (even decent ones) in one frame can’t be linked to detections in the previous frames, those detections may be built into a new “individual” or ignored (depending on parameters like minimum tracklet length). If they are ignored, you have a “gap” (even if that individual isn’t detected again).

So, if a frame doesn’t have detections that can be linked to a previous track, you have the max-gap-to-fill parameter to “fill” those gaps. I haven’t explored that part of the code, but I’m assuming there is some spline or polynomial fit to the pre-gap track, and those drifting points you see are just that gap being filled. It works really well if you have a gap under 5 frames, or very slow-moving points, but it does what you describe for larger gaps or fast-moving points. Try setting max gaps very low (1 or 2) and that behavior will go away.
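A toy version of that capped gap filling, to make the behavior concrete (my own sketch using simple linear interpolation, not DeepLabCut's actual implementation):

```python
import numpy as np
import pandas as pd

def fill_small_gaps(track, max_gap):
    """Linearly interpolate NaN runs of length <= max_gap; leave longer
    runs untouched. A toy illustration of capped gap filling."""
    s = pd.Series(track, dtype=float)
    filled = s.interpolate(limit_area="inside")  # fill every interior gap
    # Find each NaN run's length and restore runs longer than max_gap
    isnan = s.isna()
    run_id = (isnan != isnan.shift()).cumsum()
    run_len = isnan.groupby(run_id).transform("sum")
    filled[isnan & (run_len > max_gap)] = np.nan
    return filled.to_numpy()

track = [1.0, np.nan, np.nan, 4.0, np.nan, np.nan, np.nan, 8.0]
result = fill_small_gaps(track, max_gap=2)
# the 2-frame gap is interpolated (2.0, 3.0); the 3-frame gap stays NaN
```

With a fast-moving point, those interpolated values land far from where the animal actually is, which is exactly the "billiard break" drift described above; a low max_gap simply leaves the gap empty instead.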
