# What to do with likelihoods < 0.95?

Hi everyone!

First of all, I should note I have been programming for less than a year, so there are still so many things I need to learn and understand.

I’ve been using DeepLabCut for a while and I’m currently working on some Python scripts to analyse common variables in several typical behavioural tasks in neuroscience, such as object recognition, the Y-maze, the elevated plus maze, etc. Because this tool is so cool, everyone is now developing scripts to analyse data from DLC. In fact, a paper was recently published in which they developed an R package to analyse different values from DLC trackings: https://doi.org/10.1038/s41386-020-0776-y.

Together with @EMerinoM, we’ve been looking at the results from this script, and we think all the ideas they present are great. They’ve done a great job. However, we have one small doubt.

In order to clean up the rows that contain a likelihood < 0.95, I assume they delete those rows (or that’s what we’ve understood). I was not deleting any rows, but rather correcting and retraining with DLC. However, I see that not 100% of the estimations have a likelihood > 0.95; even after retraining, I still get some likelihoods < 0.95. I was keeping those predictions because I understood they were made from the rows below and above. Looking at the ‘x’ and ‘y’ coordinates, they did not change much from the coordinates above and below. Here’s an example:

I noticed that, for the nose tracking for example, when mice are grooming (I work with mice), the nose is not visible in the frame, which I understand is why it gives a low likelihood value. However, I thought the coordinates were estimated based on the previous and following frames. I hope I’m explaining myself…

I want to understand what kind of prediction is made when likelihoods are below 0.95, as I don’t think I really understand it. And I’d also like to know which error you think is smaller: a) deleting rows containing bad likelihood values, or b) keeping those rows because they are estimations. I hope I’ve explained it well…

Thank you, everyone, for your help and suggestions.

The per-frame data is the x, y coordinates of the keypoint in pixel space, plus the likelihood from the network. When you have occlusions, the likelihood should drop, assuming you do NOT label a point when you cannot see it. Therefore, the likelihood is not computed from the previous and next frames, i.e. n-1 to n+1; it is computed from frame n alone.

Also note, the value of 0.95 is completely dependent on your data. You should plot the likelihood over time. You can use `plot_trajectories` to do so:
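In case it helps, you can also plot the likelihood trace yourself with pandas/matplotlib. A minimal sketch with made-up data shaped like a DLC output file (the scorer name, body part, and output file name here are all hypothetical; in practice you would load your own `.h5` with `pd.read_hdf`):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# DLC saves predictions as a DataFrame with MultiIndex columns:
# (scorer, bodypart, coords), where coords is 'x', 'y' or 'likelihood'.
# We fake a small one here; normally you would do something like
# df = pd.read_hdf("videoDLC_resnet50_demo.h5")  # hypothetical file name
scorer = "DLC_resnet50_demo"  # hypothetical scorer name
frames = 100
rng = np.random.default_rng(0)
cols = pd.MultiIndex.from_product(
    [[scorer], ["nose"], ["x", "y", "likelihood"]],
    names=["scorer", "bodyparts", "coords"],
)
data = np.column_stack([
    rng.uniform(0, 640, frames),   # x in pixels
    rng.uniform(0, 480, frames),   # y in pixels
    rng.uniform(0, 1, frames),     # per-frame likelihood
])
df = pd.DataFrame(data, columns=cols)

# Plot the nose likelihood over time, with the candidate cutoff drawn in
likelihood = df[scorer]["nose"]["likelihood"]
fig, ax = plt.subplots()
ax.plot(likelihood.index, likelihood.values)
ax.axhline(0.95, linestyle="--", label="pcutoff = 0.95")
ax.set_xlabel("frame")
ax.set_ylabel("likelihood")
ax.legend()
fig.savefig("nose_likelihood.png")
```

Seeing where the trace dips (e.g. during grooming bouts) is usually the quickest way to pick a sensible cutoff for your own data.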

You can also, of course, set a p-cutoff (a threshold on the likelihood for a point to be included) for your analysis, just as you do for the labelled videos: either one for all body parts, or a different pcutoff per body part.
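A per-body-part pcutoff could be applied in pandas like this (a sketch with made-up data; the scorer name and cutoff values are hypothetical, but the column layout matches what DLC writes out):

```python
import numpy as np
import pandas as pd

scorer = "DLC_resnet50_demo"  # hypothetical scorer name
cols = pd.MultiIndex.from_product(
    [[scorer], ["nose", "tailbase"], ["x", "y", "likelihood"]],
    names=["scorer", "bodyparts", "coords"],
)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(0, 1, (50, 6)), columns=cols)
df.loc[:, (scorer, slice(None), ["x", "y"])] *= 400  # coords in pixels

# One pcutoff per body part: stricter for the nose (often occluded
# during grooming) than for the tail base.
pcutoffs = {"nose": 0.95, "tailbase": 0.6}

for bp, cutoff in pcutoffs.items():
    bad = df[(scorer, bp, "likelihood")] < cutoff
    # Mask rather than delete: keep the frame alignment, set coords to NaN
    df.loc[bad, (scorer, bp, "x")] = np.nan
    df.loc[bad, (scorer, bp, "y")] = np.nan
```

Masking to NaN (instead of dropping rows) keeps row n of the table aligned with frame n of the video.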

I have not used the DLC R analyzer from the ETH group (also note that many helper packages are listed here: https://github.com/DeepLabCut/DLCutils).

But you can use all points, or thresholded data, in your analysis. I would not (for good practice) delete data; just threshold and see how this affects your results…

hope that helps


Thank you very much for your answer!
