Best way to deal with false labelling (2 place preference with mice)

Sample image and/or code

Background

I have a MobileNetV2-1.0 model trained to detect the ears, snout, tail base, and tail tip of mice. I am running a two-place preference (2PP) test under dim light with a cover on top (which picks up reflections from the ceiling), which is why I ended up using DeepLabCut instead of other options.

I trained the network on frames from 10 videos of different mice and it performed well. Yesterday I did a sanity check to see whether it still performs as it used to and was surprised to see a lot of mislabelling. A few weeks ago the whole setup was moved a few centimeters back, which seems to have introduced new reflections from above that are confusing the model.

Below are two examples from the same animal in one video. In one, a reflection is detected as the head/snout of the mouse, while in the other everything is detected reasonably well.

Analysis goals

Given the “bad” imaging conditions I am working with, I accept that the labeling will never be good enough to analyze full animal pose. For now, I am happy with finding the center of the animal :slight_smile:

I am doing that by taking the centroid of the snout, the two ears, and the tail base, leaving out the tail tip since its labeling performance is rather poor.
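For reference, a minimal sketch of that centroid computation from a DeepLabCut `.h5` output file, dropping low-likelihood detections first. The filename, body-part names, and cutoff below are assumptions; substitute your own:

```python
import numpy as np
import pandas as pd

PCUTOFF = 0.9  # likelihood threshold; tune to your data
PARTS = ["snout", "leftear", "rightear", "tailbase"]  # assumed body-part names

# Hypothetical filename; DeepLabCut writes one .h5 per analyzed video
df = pd.read_hdf("video1DLC_resnet50_2PPshuffle1_50000.h5")
scorer = df.columns.get_level_values(0)[0]

xs, ys = [], []
for part in PARTS:
    x = df[(scorer, part, "x")].copy()
    y = df[(scorer, part, "y")].copy()
    p = df[(scorer, part, "likelihood")]
    x[p < PCUTOFF] = np.nan  # mask low-confidence detections
    y[p < PCUTOFF] = np.nan
    xs.append(x)
    ys.append(y)

# Frame-wise mean over whichever parts survived the cutoff
center_x = pd.concat(xs, axis=1).mean(axis=1)
center_y = pd.concat(ys, axis=1).mean(axis=1)
```

Masking on the likelihood column before averaging means a single mislabeled reflection with low confidence will not drag the center point off the animal.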

I have 350 videos recorded. Might be relevant.

Challenges

What is the best approach to solving such issues? Here are some options I have thought about:

  1. Retrain the model: This is the obvious first option. But does that mean I have to retrain every time there is a slight alteration to the setup?

  2. Crop videos: Cropping each video to only the area of interest reduces the number of stray reflections in the frame. A quick test with a single video improved the situation, but this still requires retraining the model (see the batch-cropping sketch after this list).

  3. Add a skeleton: I did not build a skeleton for the model. Would it make sense to add one? Maybe I am misunderstanding, but the skeleton seems to define the relations between the different points, so adding it should improve prediction and prevent such widely separated detections. Or am I mistaken?

  4. Switch to ResNet-50/101: Would switching to another backbone perform better than MobileNet? I originally chose MobileNet because we didn’t have a graphics card. Now we have an RTX 2080 Ti, so things are at least fast :smile:
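On option 2, since 350 videos would need cropping, here is a hedged batch-cropping sketch using ffmpeg. The crop rectangle and folder name are placeholders; measure the region once from a representative frame:

```python
import subprocess
from pathlib import Path

# Hypothetical crop window as width:height:x:y in pixels
CROP = "640:480:100:60"

for video in sorted(Path("videos").glob("*.mp4")):
    out = video.with_name(video.stem + "_cropped.mp4")
    # ffmpeg's crop filter keeps only the region of interest
    subprocess.run(
        ["ffmpeg", "-i", str(video), "-filter:v", f"crop={CROP}", str(out)],
        check=True,
    )
```

Note that recent DeepLabCut versions also accept a `cropping=[x1, x2, y1, y2]` argument in `analyze_videos`, which restricts analysis to a region without re-encoding, in case cropping alone (without retraining) turns out to help.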

What is the best option? Anything else I am missing?

Thanks in advance!

If the conditions changed, you will probably have to label additional frames. I’d go for labeling an additional 50 frames, and since you have already analyzed videos with the model, refine some of the labels from those videos. After that, retrain the model for 50k iterations and evaluate again. Switching to ResNet will probably help at least a little, so you could label the additional frames and train a new ResNet-50 model in one go. Also, is it necessary for you to have the cover? Is the mouse able to escape when the setup is uncovered? Those reflections are really bad.
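For what it’s worth, that refinement loop might look roughly like this in DeepLabCut (the project path, video path, and shuffle number are placeholders; `refine_labels` opens the labeling GUI):

```python
import deeplabcut

config = "/path/to/2PP-project/config.yaml"      # hypothetical project path
videos = ["/path/to/videos/problem_video.avi"]   # a video the model struggles on

# Pull frames where predictions look suspicious (e.g. implausible jumps)
deeplabcut.extract_outlier_frames(config, videos)

# Correct the machine labels in the GUI, then merge into the training set
deeplabcut.refine_labels(config)
deeplabcut.merge_datasets(config)

# New shuffle with a ResNet-50 backbone, so the MobileNet shuffle stays intact
deeplabcut.create_training_dataset(config, Shuffles=[2], net_type="resnet_50")
deeplabcut.train_network(config, shuffle=2, maxiters=50000)
deeplabcut.evaluate_network(config, Shuffles=[2], plotting=True)
```

Creating the ResNet-50 dataset as a separate shuffle lets you compare both backbones on the same evaluation frames before committing to re-analyzing all 350 videos.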