Which measurement should we report when only a small portion of the dataset has ground-truth annotations?

We have trained for 1M iterations with ResNet50 on the Pranav mouse open-field dataset, which has 116 of its 2330 frames annotated (about 5% of the data). What measurement should we report when comparing different hyperparameters for deep learning?

Considering that we don’t have ground truth for the entire dataset, I am unsure how the quality of the predicted labels can be verified apart from eyeballing them.
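
In case it helps the discussion, my understanding is that DeepLabCut's own evaluation step computes the train/test pixel error on the annotated frames (here the 116 labeled ones), and that is the number usually reported when comparing hyperparameters. A minimal sketch, assuming the standard workflow; the config path below is a placeholder:

import deeplabcut

config_path = '/path/to/openfield-filtered/config.yaml'  # hypothetical path to the project config
# Writes the train/test pixel error (with and without the p-cutoff) for shuffle 1
# into the evaluation-results CSV referenced below.
deeplabcut.evaluate_network(config_path, Shuffles=[1], plotting=True)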

This is basically what I have, which shows the training loss as far as I can tell:

iteration: 999970 loss: 0.0008 lr: 0.001
iteration: 999980 loss: 0.0005 lr: 0.001
iteration: 999990 loss: 0.0009 lr: 0.001
iteration: 1000000 loss: 0.0006 lr: 0.001
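
For reference, the loss values above can be parsed into a curve to confirm that training has plateaued, although this only checks convergence and says nothing about generalization. A minimal sketch, assuming the log lines were saved to a plain-text file (the file name is hypothetical):

import re
import matplotlib.pyplot as plt

iterations, losses = [], []
with open('train.log') as f:  # hypothetical log file holding lines like the ones above
    for line in f:
        m = re.match(r'iteration:\s*(\d+)\s+loss:\s*([\d.]+)\s+lr:', line)
        if m:
            iterations.append(int(m.group(1)))
            losses.append(float(m.group(2)))

plt.plot(iterations, losses)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.show()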

Specifically, after analysis is done and the trajectories are plotted, the following results file is empty (it contains only the header row):

[jalal@goku openfield-filteredDec4-trainset95shuffle1]$ cat DLC_resnet50_openfield-filteredDec4shuffle1_1000000-results.csv 
,Training iterations:,%Training dataset,Shuffle number, Train error(px), Test error(px),p-cutoff used,Train error with p-cutoff,Test error with p-cutoff
[jalal@goku openfield-filteredDec4-trainset95shuffle1]$ pwd
/scratch3/3d_pose/animalpose/experiments/mouse1M_resnet50_DONE/openfield-filtered/evaluation-results/iteration-0/openfield-filteredDec4-trainset95shuffle1
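
Separately, since the evaluation CSV only covers the annotated frames, one rough sanity check on the remaining unannotated frames is the distribution of the per-frame likelihoods in the analysis output. A minimal sketch, assuming the .h5 file written by deeplabcut.analyze_videos; the file name and p-cutoff below are placeholders:

import pandas as pd

h5_path = 'mouse_videoDLC_resnet50_openfield-filteredDec4shuffle1_1000000.h5'  # hypothetical file name
df = pd.read_hdf(h5_path)

# Columns are a MultiIndex (scorer, bodyparts, coords); keep only the likelihood columns
# (the level name 'coords' is assumed from the standard DeepLabCut output).
likelihoods = df.xs('likelihood', axis=1, level='coords')

# Fraction of frames per bodypart whose confidence exceeds the p-cutoff.
p_cutoff = 0.6  # assumed value; use the pcutoff from your config.yaml
print((likelihoods > p_cutoff).mean())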

Please review our paper: https://www.nature.com/articles/s41593-018-0209-y

Also, note from our demo file names that the labeled data does not come from the test video we provide.

Also, please see the email I sent you yesterday.