Editing .h5 files to reduce training set size after labelling?

Hi,

I’ve labeled a bunch of frames from 4 videos, with NumFramesToPick initially set to 300. I understood this as the total number of frames that would be picked across all videos, but it actually picked 300 frames from each video. I have labelled 100 from each, and would like to use only those for training.

It seems that I could set NumFramesToPick to 100, delete the unlabelled frames from the labeled-data folder, and remove the corresponding lines from each of the CollectedData.csv files. However, these folders also contain CollectedData.h5 files, which are not easily opened (they appear empty when viewed with HDFViewer, for example). Do these need to be edited as well?

Thanks, and apologies if I’ve missed this somewhere.

Marcus Watson

The correct way would be to use dlc.dropimagesduetolackofannotation(config), which deletes unlabeled images, and then dlc.dropannotationfileentriesduetodeletedimages(config) to remove the entries for the deleted images (I don’t think the first function does that after deleting images). In your case you should use the second function now, since the .h5 probably still contains those entries.
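For reference, the two calls named above can be wrapped in a small helper so they always run in the suggested order. This is only a sketch: prune_unlabeled and its dlc parameter are hypothetical names introduced here (the parameter exists so the helper can be tried with a stand-in object), and the comments describe the functions as they are explained in this thread rather than from their source code.

```python
def prune_unlabeled(config_path, dlc=None):
    """Run the two clean-up calls from the reply above, in order."""
    if dlc is None:
        # Imported lazily so the sketch can be read without DeepLabCut installed.
        import deeplabcut as dlc
    # Step 1 (as described in the thread): delete images that have no
    # entry in the annotation file.
    dlc.dropimagesduetolackofannotation(config_path)
    # Step 2 (as described in the thread): remove annotation entries whose
    # image files no longer exist.
    dlc.dropannotationfileentriesduetodeletedimages(config_path)
```

With DeepLabCut installed, calling prune_unlabeled with the path to the project's config.yaml would run both steps. It is worth backing up the labeled-data folders first, since the first step deletes files.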


OK, thanks for this. I will read the help files for these functions, try them out later, and mark your response as the solution assuming it works as expected.

Another question: Is it perhaps useful to leave some unlabelled images in the set where none of the body parts are in view? I have several images before the animal has entered the environment, and hence there are no labels, but it seems it might be useful for the model to incorporate the positive information that there is no interesting information in those images. However my understanding of the deep learning process may be naive.

As far as I know, the model gets nothing from “not seeing stuff”. To simplify: it’s a math equation, and if there is no X in your equation there is nothing for you to find. What a pose estimation or object detection model does, again very simplified, is perform many matrix calculations (filter x pixel values of the frame) and try to amplify the most prominent features that help answer your question (then there’s weight updating, backpropagation and a lot of stuff I don’t even pretend to understand :smiley: ). From my understanding, empty frames do nothing for performance, if they don’t actually hurt it: the model still has to perform calculations on them, but there are no human labels to compare its predictions with, and so no way to adjust the weights.
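To make the "filter x pixel values" idea concrete, here is a toy sketch in plain NumPy. It is not DeepLabCut's code: the function name filter_response, the 4x4 frame and the hand-picked kernel are all made up for illustration, and in a real network the filter values are learned during training rather than chosen by hand.

```python
import numpy as np

def filter_response(frame, kernel):
    """Slide `kernel` over `frame` and return the weighted sum at each position."""
    kh, kw = kernel.shape
    out_h = frame.shape[0] - kh + 1
    out_w = frame.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch by the filter element-wise and sum the result.
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 4x4 "frame": dark on the left, bright on the right.
frame = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hand-picked filter that responds to a dark-to-bright vertical edge.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

response = filter_response(frame, kernel)
print(response)
```

The response is largest in the middle column, where the filter sits on the edge, and zero over the uniform regions on either side.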

Hmmm… having tried these functions, they don’t seem to do anything. E.g. the output of deeplabcut.dropimagesduetolackofannotation('/home/marcus/DLC/BardTest/BardTest-Marcus-2021-01-24/config.yaml') is

Annotated images: 300 In folder: 300
PROCESSED: /home/marcus/DLC/BardTest/BardTest-Marcus-2021-01-24/labeled-data/SceneB now # of annotated images: 300 in folder: 300

However I know for a fact that only about 85 of these images have any annotations at all. Anything I’m doing wrong here?

There is not much that can go wrong using this function. It just checks the .h5 file to see whether each image file is listed there with annotations and, if not, deletes it from the folder (looping over all folders in the project). Can you open the CollectedData_***.h5 and check the listed images and their annotations? A screenshot of the file content would be great.
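If a viewer shows the file as empty, one way to check the listed images and their annotations is to load the table into pandas and count the rows that contain no coordinates at all. The sketch below uses a small in-memory frame as a stand-in for a real file; the body part names, the values and the second image name are hypothetical, and count_labeled is a helper name introduced here.

```python
import numpy as np
import pandas as pd

def count_labeled(df):
    """Return (rows with at least one coordinate, rows that are entirely NaN)."""
    empty = df.isna().all(axis=1)
    return int((~empty).sum()), int(empty.sum())

# Hypothetical stand-in for one CollectedData table: 3 images, 2 body parts.
columns = pd.MultiIndex.from_product(
    [["Marcus"], ["nose", "ear"], ["x", "y"]],
    names=["scorer", "bodyparts", "coords"],
)
index = [
    "labeled-data/FaceA/img000761.png",
    "labeled-data/FaceA/img000900.png",  # made-up file name
    "labeled-data/FaceA/img001407.png",
]
df = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, np.nan],  # listed, but nothing labeled
        [np.nan, np.nan, np.nan, np.nan],  # listed, but nothing labeled
        [481.8, 278.9, np.nan, np.nan],    # one body part labeled
    ],
    index=index,
    columns=columns,
)

labeled, empty = count_labeled(df)
print(f"rows with labels: {labeled}, rows with no labels: {empty}")
```

On a real project the frame would come from pd.read_hdf(path_to_h5) instead of being built by hand; that call needs the PyTables package (tables) to be installed.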

Hi Konrad,

Sorry for the delay in response, it’s been a hectic couple of weeks.

I wonder if the problem is caused by the fact that there are multiple collected_data.h5 files? E.g. in my project’s config.yaml file, I have

# Annotation data set configuration (and individual video cropping parameters)
video_sets:
  /media/marcus/Data1/Videos/Bard/ncam-2020-12-15-11-59/FaceA.avi:
    crop: 0, 1920, 0, 1080
  /media/marcus/Data1/Videos/Bard/ncam-2020-12-15-11-59/SceneA.avi:
    crop: 0, 1920, 0, 1080
  /media/marcus/Data1/Videos/Bard/ncam-2020-12-15-11-59/SceneB.avi:
    crop: 0, 1920, 0, 1080
  /media/marcus/Data1/Videos/Bard/ncam-2020-12-15-11-59/SceneC.avi:
    crop: 0, 1920, 0, 1080

I can dig into one of these files and find that the 1st entry has no values, but the 3rd does:

In [25]: file = "/home/marcus/DLC/BardTest/BardTest-Marcus-2021-01-24/labeled-data/FaceA/CollectedData_Marcus.h5"

In [26]: h5 = h5py.File(file, 'r')

In [27]: list(h5.keys)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-5ab44f6e42bd> in <module>
----> 1 list(h5.keys)

TypeError: 'method' object is not iterable

In [28]: list(h5.keys())
Out[28]: ['df_with_missing']

In [29]: grp = h5['df_with_missing']

In [30]: grp.keys()
Out[30]: <KeysViewHDF5 ['_i_table', 'table']>

In [31]: grp['table'][0]
Out[31]: (b'labeled-data/FaceA/img000761.png', [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])

In [32]: grp['table'][2]
Out[32]: (b'labeled-data/FaceA/img001407.png', [ 481.77758007,  278.85548864,           nan,           nan,  467.06090884,  435.83331508,  611.7748426 ,  426.02220093,           nan,           nan,  511.21092253,  477.53055023,  687.81097728,  109.6137695 ,           nan,           nan,  766.2998905 ,  249.42214618,           nan,           nan,  913.46660279,  364.70273748,           nan,           nan, 1038.55830824,  676.20561183,           nan,           nan,  572.53038598, 1002.4251574 ,  619.13317821,  673.75283329,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan,           nan])

But when I run dropimagesduetolackofannotation I get the results described above, with all images staying in place.

Any idea what I’ve done?

Could you clarify what you mean by more than one .h5 file? Is it more than one in a folder with frames extracted from one video or just one in every video labeled-data folder?

This is what the contents of one labeled-data folder should look like: one csv, one h5 and your extracted images. There can be additional files (machinelabels) if you extracted outlier frames.

Also, did you run dlc.dropannotationfileentriesduetodeletedimages(config)? Since you said you deleted images manually you should run this.

Could you clarify what you mean by more than one .h5 file? Is it more than one in a folder with frames extracted from one video or just one in every video labeled-data folder?

The latter. I also didn’t extract outlier frames.

Also, did you run dlc.dropannotationfileentriesduetodeletedimages(config)? Since you said you deleted images manually you should run this.

I didn’t actually delete them manually yet. I was trying to do it through dropimagesduetolackofannotation first.

One possible source of issues is that I don’t think there’s any single frame at all in which all the body parts I’ve selected are in view (there is a lot of occlusion in our environment), so there’s never a frame with no NaNs. But I don’t see why that would affect this issue.
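On the point about occlusion, the distinction that matters is between a frame where every coordinate is NaN and a frame where only some are. A small pandas sketch of the difference follows; the column names, values and the third image name are hypothetical, and nothing in this thread establishes which test, if either, DeepLabCut's function applies.

```python
import numpy as np
import pandas as pd

# Hypothetical table: one fully empty frame and two partly occluded frames.
df = pd.DataFrame(
    {
        "nose_x": [np.nan, 481.8, np.nan],
        "nose_y": [np.nan, 278.9, np.nan],
        "ear_x": [np.nan, np.nan, 611.8],
        "ear_y": [np.nan, np.nan, 426.0],
    },
    index=["img000761.png", "img001407.png", "img002000.png"],
)

# Keeps every frame that has at least one labeled coordinate.
kept_if_any_label = df.dropna(how="all")
# Keeps only frames where every coordinate is labeled.
kept_if_complete = df.dropna(how="any")

print(list(kept_if_any_label.index))
print(list(kept_if_complete.index))
```

With how="all" the partly occluded frames survive and only the fully empty one is dropped; with how="any" nothing is left, which is what would happen in a data set where no frame shows all body parts.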

Cheers,

Marcus