Poor tracking of multiple animals behind a wire fence (DLC 2.2b)

Sample image and/or code

Background

Hello, everyone
The project I’m working on right now is to track the movement of sheep in a series of videos like this and extract the x and y coordinates. However, I found that DLC identifies the two goats poorly when they are partially covered by the iron bars. Could you please tell me how to reduce the impact of the fence?
In addition, if I have a large sample of different sheep, how many of them do I need to include in training? Or should I train on all of them and then track?
Thank you very much! I look forward to your advice.

Challenges

Occlusion by the fence

  • What have you tried already?
  • Have you found any related forum topics? If so, cross-link them.
    yep, I tried replacing it with a skeleton, but that produced points offset far from the animal
  • What software packages and/or plugins have you tried?

The first thing I would personally try is to use EfficientNet (update to the newest version of DLC) instead of ResNet for somewhat more accurate tracking, since the image is dusted with small, sharp edges of other objects. Also, I'm not sure how representative the photos are, but they look out of focus (maybe the focal length of the lens is too long, since everything looks similarly sharp, or rather not sharp). If you set the camera focus closer to the distance the tracked objects are at, it will help a bit.

Feed the model a properly sized dataset, refine it by extracting outliers, and remember to connect the skeleton properly. The fence will probably cause a lot of short occlusions, so interpolate the tracking after video analysis.

Train on as many different frames, animals, conditions etc. as possible. For instance, 20 frames per video from 10 different videos will be better than 100 frames from 2 videos. Consider cropping the input for efficiency (the bottom and top of the image together take up about 2/3 of the view but contain no important information, assuming the goats are only behind the fence). With maDLC, train for ~50k iterations.
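As a minimal sketch of the cropping idea (the frame size and the row band here are made-up assumptions, not values from this project; DLC also supports per-video cropping in config.yaml, which is usually more convenient):

```python
import numpy as np

# Hypothetical 900x1600 RGB frame; keep only the horizontal band where the
# animals are, dropping the uninformative top and bottom of the image.
frame = np.zeros((900, 1600, 3), dtype=np.uint8)

top, bottom = 300, 600  # assumed rows bounding the fence/animal region
cropped = frame[top:bottom, :, :]
print(cropped.shape)  # (300, 1600, 3)
```

The same slicing applies whether you crop frames yourself or let DLC do it; the point is simply that the network never sees the empty 2/3 of the view.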

1 Like

Thank you for your suggestion; I have now updated to the newest version. It is true that the focus is not on the animals, and it is impossible to re-record the videos now, so I have to figure out another way to solve this problem. I have tried extracting outliers, but the result is not ideal: the labels of the two animals end up misplaced.
How should the skeleton be connected correctly? The workbook tells me to add redundant connections, but when I tried connecting only the backbone, it worked better than with redundant connections. In addition, it seems that a smaller dot size gives better recognition (less affected by the iron mesh).
Thank you for the reminder. Before training, I used cropping to improve efficiency. I used 5-10 videos, usually with 10 frames per video, and trained for nearly 50,000 iterations.
I also have a question: if I train on 10 videos together, involving 20 different individual sheep, should they be set as individual1-individual20 in the config file, or should each video only use individual1 and individual2?

According to the documentation, the skeleton should be connected “everything to everything”, covering all possible connections. Dot size is only a plotting feature; it doesn’t change the model’s performance in any way. Though a bigger dot might make your labelling performance worse (I tend to use a very small dot size when labelling for that exact purpose, so my accuracy is better).
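A quick sketch of what “everything to everything” means in practice: one skeleton edge per unordered pair of bodyparts. The bodypart names below are placeholders; a real project would use the ones in its config.yaml, and the pair list could be generated rather than typed out:

```python
from itertools import combinations

# Placeholder bodypart names, not the actual ones from this project.
bodyparts = ["nose", "neck", "back", "tailbase"]

# "Everything to everything": one edge per unordered pair, n*(n-1)/2 in total.
skeleton = [list(pair) for pair in combinations(bodyparts, 2)]
print(len(skeleton))  # 6 edges for 4 bodyparts
```

For a larger bodypart set the pair count grows quadratically, which is exactly why the redundancy helps the assembler disambiguate animals.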

You should label at least 200-250 frames before the first training. Otherwise the model isn’t even close to being robust enough for proper pose estimation. Also remember to use augmentation.

There should be as many individuals specified as can be seen at once in the video. If you can see 2 goats at most, 2 individuals is how many you want. The model doesn’t know the goat; it just keeps identity consistent within one analyzed video, one track per individual. Think of it as looking at the first frame and going “There are two objects I can detect features of, and I’ll keep track of which labels belong to which one” rather than “There’s Josh and Kevin but no Andrew and Frank” :smiley:
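In config.yaml terms, that would look something like this (a sketch; the names are arbitrary placeholders):

```yaml
# Only as many individuals as can appear simultaneously in one frame,
# regardless of how many distinct animals exist across all videos:
individuals:
- individual1
- individual2
```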

1 Like

Hahaha, that’s an interesting example. So I can train the model with 40 sheep, and then use it to analyze all the videos, with 200 sheep in total? (Maximum of 2 per video)

Yes, the model should figure out what features, around where your labels are placed, represent the broad idea of a certain animal. I thought these were goats and only just now read that they are in fact sheep (sheared, I assume) :smiley:

Anyway, don’t be discouraged by missing detections due to the fence occluding bodyparts (if skeleton won’t solve this for you), these should be easy enough to interpolate.

1 Like

Thank you Konrad, your advice was very useful!

Hello, sorry to bother you again. In my multi-animal analysis, 6,000 iterations track better than 100,000. Why is this happening? (I haven’t changed anything except training for more iterations.) :sob:

Hard to say without knowing all the details, but if your dataset isn’t big enough, the model might be overfitted at the 100k point, resulting in poor detections on test data and analyzed videos. Check previous snapshots, maybe around the 50k point (if you have one saved), and consider expanding your training dataset.
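One way to compare snapshots, sketched here under the assumption of a standard maDLC project layout, is to tell DLC to evaluate every saved snapshot rather than only the latest one, via config.yaml:

```yaml
# Evaluate with every saved snapshot instead of only the last one;
# set back to -1 (latest) or a specific index once you pick a winner.
snapshotindex: all
```

Then the evaluation results let you see at which iteration count the test error starts diverging from the train error.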

Could you provide plots from a video analyzed with 6k iters and 100k iters for comparison and evaluation results for those two snapshots?

2 Likes

6000_sk_bp_labeled 118000_sk_bp_labeled
The top one is 6,000 iterations, the bottom one is 118,000.
I tried connecting the full skeleton, but that caused the labels of the two animals to be misplaced

My best guess is that it’s overfitted, since some detections are correct (perhaps those most similar to the training data). The top one looks quite all right. If the likelihood of detections is low, or the test error is far from the train error, I’d expand the dataset by extracting and labelling outliers and train for more iterations (or simply use a snapshot around 40-50k). Since there seem to be some problems with detecting the bigger sheep, retraining with some corrected outliers might help. Anyway, you’re pretty much good to go; filtering and interpolation will solve the rest.

1 Like

Thank you for your advice, I will try that. I’m sorry, but I couldn’t find an introduction to interpolation; what does this word mean?

It’s an estimation of missing data using neighbouring data. For instance, if a bodypart is detected in frames 1 and 3 but not in 2, frame 2 can be interpolated based on how the label moved from 1 to 3. Different methods interpolate in different ways: linear interpolation would just draw a line between 1 and 3 and take the midpoint as the frame 2 detection, while a cubic spline would try to make the signal more analog-like, creating a curve between the points. You can read more about this in the scipy documentation, for instance (Interpolation (scipy.interpolate) — SciPy v1.6.1 Reference Guide); there are some examples that make it easier to understand.
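The frames 1 and 3 example above, sketched with scipy (the coordinate values are made up for illustration; cubic needs a few extra surrounding frames to work with):

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical x-coordinates of one bodypart; frame 2 is occluded by the fence.
frames = np.array([0.0, 1.0, 3.0, 4.0])
x = np.array([8.0, 10.0, 20.0, 26.0])

linear = interp1d(frames, x, kind="linear")
cubic = interp1d(frames, x, kind="cubic")

# Linear: midpoint of the segment between frames 1 and 3.
print(linear(2.0))  # 15.0
# Cubic: a smooth curve through the neighbouring frames, slightly different.
print(cubic(2.0))
```

The same call works on a whole array of missing frame indices at once, so filling every fence-induced gap in a trajectory is one vectorized call per coordinate.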

1 Like

did you use refine detections? if you don’t delete bad points in the GUI, then these will go into the training set. you should check carefully the output of check_labels before creating a new trainingset as well :slight_smile:

1 Like

Thank you for explaining the principle. May I ask how to perform this kind of estimation in DLC? Or do I need to write additional code to process the data?

Hello, I did run Check Labels before creating the dataset. Maybe I was not sensitive to its result, so I did not make any improvements. How should I modify the bad points in Label Frames based on check_labels? Sometimes I don’t know which of my labels are wrong.

There is a “fill gap size” option in Refine Tracklets; do you mean this one?

Yes, that’s one way you can do this. Check User Guide for more information DeepLabCut/maDLC_AdvUserGuide.md at b683f78394c86d3f008d676cf24606eab9fea558 · DeepLabCut/DeepLabCut · GitHub

If you want to use a particular method for filtering and interpolation, use deeplabcut.convert_raw_tracks_to_h5(config_path, picklefile) and write custom code to run on the raw data of the model’s detections.
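As a rough sketch of what that custom code might do (the table below is a fake, flat stand-in; real DLC .h5 output has a MultiIndex of scorer/individual/bodypart/coords, so you would loop over or stack those columns):

```python
import pandas as pd
import numpy as np

# Fake per-frame detections for one bodypart: x, y, and likelihood.
df = pd.DataFrame({
    "x": [10.0, np.nan, 20.0, 26.0],
    "y": [5.0, np.nan, 9.0, 11.0],
    "likelihood": [0.99, 0.10, 0.98, 0.97],
})

# Treat low-likelihood detections (e.g. behind the fence) as missing,
# then fill the gaps linearly from the neighbouring frames.
mask = df["likelihood"] < 0.6
df.loc[mask, ["x", "y"]] = np.nan
df[["x", "y"]] = df[["x", "y"]].interpolate(method="linear", limit_direction="both")
print(df["x"].tolist())  # [10.0, 15.0, 20.0, 26.0]
```

The 0.6 threshold is an arbitrary placeholder; pick it by looking at the likelihood distribution of your own detections.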

Thank you! I still have a lot to learn about this. Thank you for your patient guidance.

Also, I took another look at the videos you posted. I’m not sure if this is due to how the model was trained or due to labelling, so a piece of advice: even the better tracking shows high variation in label placement. I know you might sometimes want to place a label close to where it should be when the bodypart is occluded, but in reality it’s better to skip it and focus on being as consistent as possible in the way you label bodyparts. For instance, when I label my rats’ ears, I consistently label the tip of the ear and not other parts, even though the rest is still the ear.

Being consistent gives the model information that allows it to also be more consistent in labelling, reducing jumps and mismatches, because the information it used to learn the prediction was very specific. It of course depends on your use case and your accuracy threshold for certain bodyparts; maybe just smoothing what you have here is good enough for how you’re using the data.
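If smoothing turns out to be enough, a median filter is a simple first thing to try, since it removes single-frame jumps without shifting the rest of the trajectory (the trajectory below is made up, with one fence-induced jump at index 3):

```python
import numpy as np
from scipy.signal import medfilt

# Hypothetical x-trajectory of one bodypart with a single-frame outlier.
x = np.array([10.0, 11.0, 12.0, 55.0, 13.0, 14.0, 15.0])

# Each output sample is the median of a 3-frame window, so the lone
# spike at index 3 is replaced by its plausible neighbours.
smoothed = medfilt(x, kernel_size=3)
print(smoothed)
```

A wider kernel suppresses longer glitches but also rounds off genuinely fast movements, so keep it as small as the occlusions allow.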