Basic questions on the User Guide

Hi everyone! I am very new to DLC and am struggling a lot. If someone could answer some basic questions I have, it would be very helpful.

  1. Could someone explain what num_shuffles and shuffle index mean? I understand that num_shuffles is used to create multiple splits, but why would someone want to do that?

  2. While following the modules posted on the summer course 2020, I came across this video: https://www.youtube.com/watch?v=WXCVr6xAcCA. Why is shuffle index selected to be 2 in step 5 train network? How do you decide by looking at the log which is the best model?

  3. How do you decide what p-cutoff value is optimal?

Thank you so much in advance for taking the time to answer these questions! I really appreciate it.


Hi Sarah, welcome to DLC and the forum.

Could someone explain what num_shuffles and shuffle index mean? I understand that num_shuffles is used to create multiple splits, but why would someone want to do that?

The idea is to make replicates to test how good a network is, just like replicating an experiment in the lab. Here, shuffles are different random splits of your labeled images into training and test pools. When you label and then create a training set, by default 95% of your labeled data goes to training and 5% is held out for testing. That is how you get the blue and red lines in the plots in Mathis et al. 2018. The multiple fainter +'s are the 3 replicates, i.e. shuffles. So you would want to do this if you are trying to test performance.
You can then train shuffle=1 or shuffle=2, etc., and run evaluation on each. The index is just that: the position of a shuffle within your list of shuffles. If the shuffles are [1, 2, 3] or [4, 5, 6], fetching index=0 grabs 1 or 4, respectively.
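To make the idea of shuffles concrete, here is a minimal sketch in plain Python. It does not call DeepLabCut; the function name make_shuffles and the image count of 200 are made up for illustration, and the 95/5 split is the default mentioned above. Each shuffle is just a different random train/test split of the same labeled images.

```python
import random


def make_shuffles(n_labeled, num_shuffles=3, train_fraction=0.95, seed=0):
    """Hypothetical helper: build several random train/test splits.

    Returns {shuffle_number: (train_indices, test_indices)},
    with shuffle numbers starting at 1.
    """
    rng = random.Random(seed)
    n_train = int(round(train_fraction * n_labeled))
    shuffles = {}
    for shuffle in range(1, num_shuffles + 1):
        order = list(range(n_labeled))
        rng.shuffle(order)  # a new random ordering for every shuffle
        shuffles[shuffle] = (sorted(order[:n_train]), sorted(order[n_train:]))
    return shuffles


# 200 labeled images is an assumed number, not taken from the thread.
splits = make_shuffles(n_labeled=200, num_shuffles=3)
for shuffle, (train, test) in splits.items():
    print(f"shuffle={shuffle}: {len(train)} train, {len(test)} test; held out: {test}")

# The "index" is only a position in the list of shuffle numbers.
shuffle_numbers = [1, 2, 3]
first_shuffle = shuffle_numbers[0]
print("index=0 ->", first_shuffle)
```

Because every shuffle holds out a different 5% of the images, training and evaluating each one gives you replicates of the same experiment rather than a single measurement.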

While following the modules posted on the summer course 2020, I came across this video: https://www.youtube.com/watch?v=WXCVr6xAcCA. Why is shuffle index selected to be 2 in step 5 train network? How do you decide by looking at the log which is the best model?

This is partly answered above: I just used shuffle=2 rather than 1, and there is nothing else special about it. You decide which model is best during evaluation by looking at the errors (RMSE) and inspecting the plotted images. We also discuss this quite a bit in the Nature Protocols paper (Nath et al., 2019), so check that out.
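As a rough sketch of the numeric half of that comparison, the snippet below ranks shuffles by their held-out test error. The error values are invented for illustration and do not come from any real run, and the dictionary keys are hypothetical names, not DeepLabCut output columns. The numbers are only a starting point; you would still inspect the plotted images.

```python
# Made-up errors in pixels, purely illustrative (not real results).
results = [
    {"shuffle": 1, "train_error_px": 2.4, "test_error_px": 5.1},
    {"shuffle": 2, "train_error_px": 2.6, "test_error_px": 4.3},
    {"shuffle": 3, "train_error_px": 2.2, "test_error_px": 6.0},
]

# Rank by test error: the test images were never seen during training,
# so this is the fairer estimate of how the network generalizes.
ranked = sorted(results, key=lambda r: r["test_error_px"])
best = ranked[0]

for r in ranked:
    gap = r["test_error_px"] - r["train_error_px"]
    print(f"shuffle={r['shuffle']}: test={r['test_error_px']} px, train/test gap={gap:.1f} px")

print("lowest test error: shuffle", best["shuffle"])
```

A large gap between train and test error is worth noting too, since it suggests the network fits the training images much better than unseen ones.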

How do you decide what p-cutoff value is optimal?

Run plot_trajectories and look at the plots. Here is an example (the file is called likelihood) where the animal becomes occluded, so the confidence (likelihood) drops. If I wanted to analyze this file or create a video, I would set the p-cutoff to 0.7 to keep some lower-confidence points without keeping points that should not be tracked; setting it below 0.5 would not be great. You can also try out different p-cutoffs. It doesn't change the .h5 file that is used; you simply delete the video and re-run it.
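The effect of the p-cutoff can be sketched in a few lines of plain Python. The likelihood and x values below are invented to mimic a brief occlusion, and apply_pcutoff is a hypothetical helper, not a DeepLabCut function. The strict ">" comparison is also an assumption of this sketch. The point is that the cutoff only masks points at plotting or video time, while the stored predictions stay untouched.

```python
# Invented per-frame values for one body part; the dip mimics an occlusion.
likelihood = [0.99, 0.98, 0.97, 0.85, 0.72, 0.40, 0.10, 0.05, 0.65, 0.95]
x = [10.0, 11.0, 12.0, 13.0, 14.0, 30.0, 55.0, 2.0, 18.0, 19.0]


def apply_pcutoff(values, likelihood, pcutoff):
    """Hypothetical helper: hide points whose likelihood is not above pcutoff."""
    return [v if p > pcutoff else None for v, p in zip(values, likelihood)]


kept_counts = {}
for pcutoff in (0.5, 0.7, 0.9):
    masked = apply_pcutoff(x, likelihood, pcutoff)
    kept_counts[pcutoff] = sum(v is not None for v in masked)
    print(f"pcutoff={pcutoff}: kept {kept_counts[pcutoff]} of {len(x)} frames -> {masked}")

# The original predictions are unchanged, so any cutoff can be retried later.
print("original x still has", len(x), "values")
```

Raising the cutoff drops more of the uncertain frames around the occlusion, which is the trade-off you are judging when you look at the likelihood plot.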


Thank you so much, Dr. Mathis, for writing out such a detailed response. This helps me a lot! I am sorry for getting back to you so late; I was unwell for some time and not working. Thank you again!
