Training set (or config?) screwed up

Hey guys,

I am just starting to write my Master’s thesis on detecting and tracking bees, and in the process I came across your amazing tool.

I basically created a skeleton with 4 points (blue: head, pink: torso, orange: abdomen, yellow: stinger) for my bees.
My labeled images look like this:

Then I trained the network on a few annotated frames.

Applying the trained network to a test video resulted in the following detections, which were perfectly fine given that I had trained on only a handful of frames:

I then annotated more frames and trained again to improve the performance, but the network somehow performed worse rather than better. The detections now look as follows:

Evaluation shows ~0.83 pck_train, ~0.77 rpck_train, ~0.15 pck_test and ~0.17 rpck_test.
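
For readers unfamiliar with the metric: PCK is commonly defined as the fraction of keypoints whose prediction falls within some distance threshold of the label. Below is a minimal sketch of that generic definition, assuming pixel coordinates and a fixed pixel threshold. The function name, the threshold and the coordinates are made up for illustration; the tool's own pck computation may differ, and rpck is not covered here.

```python
import math


def pck(predicted, ground_truth, threshold_px):
    """Fraction of labeled keypoints predicted within threshold_px of the label.

    Both arguments are lists of (x, y) tuples, or None where a keypoint is
    missing. Unlabeled keypoints (e.g. occluded body parts) are skipped.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    correct = 0
    total = 0
    for pred, truth in zip(predicted, ground_truth):
        if truth is None:  # no label for this body part, so it cannot be scored
            continue
        total += 1
        if pred is not None and math.dist(pred, truth) <= threshold_px:
            correct += 1
    return correct / total if total else float("nan")


# Hypothetical coordinates for head, torso, abdomen, stinger of one bee.
labels = [(10.0, 10.0), (20.0, 12.0), (30.0, 15.0), None]
preds = [(11.0, 10.5), (35.0, 40.0), (30.5, 15.5), (50.0, 50.0)]
score = pck(preds, labels, threshold_px=5.0)
print(f"PCK: {score:.2f}")
```

Computing such a score separately on training and held-out frames gives the kind of train/test comparison quoted above.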

This was the result of training for 50k epochs on 64 annotated frames from 4 videos (1440x1080p), with cropping augmentation (20 x 200x200).
My config file looks like this: config.yaml

My question is: How do I fix this mess again?
Is the video quality too poor?

  • motion blur
    • some frames (extracted by kmeans) contain a lot of motion blur
    • tried both keeping and removing them for training → no visible difference
    • Is the best practice to keep or discard them?
  • hardly any background contrast
    • hard to tell the difference between the background (honeycomb) and the bee
    • Is the best practice to label those barely recognizable bees, or should I only annotate the clearly visible ones?
  • occlusions
    • the bees often occlude each other, so I only labeled the visible bodyparts
  • config
    • do the original or cropped videos still have an impact during training, or only the labeled_data?
    • I only have the original videos in the videos directory and no cropped ones
    • in the config, both the cropped and the original videos have the crop parameter set to “0, 200, 0, 200” (probably left over from the crop setting of the last training run), although the original videos are 1440x1080p and were cropped manually during extract_frames
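
The crop mismatch in the last bullet could be checked with something like the sketch below. It assumes the crop string is ordered "x1, x2, y1, y2" and that the video entries of config.yaml have already been loaded into a plain dict. The function names, the video paths and the 50% cut-off are hypothetical. Whether these values affect training at all is exactly the open question above.

```python
def parse_crop(crop):
    """Parse a crop string such as "0, 200, 0, 200" into four integers.

    The order x1, x2, y1, y2 is an assumption of this sketch.
    """
    x1, x2, y1, y2 = (int(part) for part in crop.split(","))
    return x1, x2, y1, y2


def suspicious_crops(video_sets, frame_size, min_fraction=0.5):
    """Return (path, covered_fraction) for crops covering little of the frame."""
    width, height = frame_size
    flagged = []
    for path, entry in video_sets.items():
        x1, x2, y1, y2 = parse_crop(entry["crop"])
        covered = ((x2 - x1) * (y2 - y1)) / (width * height)
        if covered < min_fraction:
            flagged.append((path, covered))
    return flagged


# Hypothetical stand-in for the video entries of a loaded config.yaml.
video_sets = {
    "videos/hive_a.mp4": {"crop": "0, 200, 0, 200"},
    "videos/hive_b.mp4": {"crop": "0, 1440, 0, 1080"},
}
flagged = suspicious_crops(video_sets, frame_size=(1440, 1080))
for path, covered in flagged:
    print(f"{path}: crop covers only {covered:.1%} of the frame")
```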

Thanks in advance for your help!
If you need more input, just hit me up.


Could you provide plots from one of the analyzed videos? Your use case seems particularly hard since the image is super crowded with a lot of possible individuals.

Could you explain what you did? That is, did you add more frames to the original training set, or create a new iteration, e.g. with “merge_datasets”? It sounds like there is an issue with the formatting of the old and new data.

cc @jeylau