Issue aligning 'detected frames' with the time scale

My original video has 6252 frames. After training the network and analyzing the video, only 6241 frames were detected, and the result contains just 6241 values (x, y, and likelihood). Even though the trace of 6241 frames still reflects the animal's movement, the frame numbers can no longer be aligned to time. There is also light stimulation in my video, and the on/off times of the stimulation may be inaccurate if 11 frames were lost. Is there any way to keep the undetected frames in the final results?

Thank you very much, in advance, for your help!

Undetected frames are not dropped, so it's likely an encoding error in the video. If you open the video with OpenCV, how many frames does it have? Or in ImageJ, for example?

(OpenCV is installed alongside DeepLabCut.)
https://www.learnopencv.com/read-write-and-display-a-video-using-opencv-cpp-python/
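
Here is a minimal sketch of that check (the file name is a placeholder; the point is that the metadata count and the decoded count can disagree, which is exactly what you want to test):

```python
import cv2

video_path = "your_video.avi"  # placeholder; point this at your original video

cap = cv2.VideoCapture(video_path)

# Frame count according to the container metadata (fast, but can be wrong).
meta_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# Frame count by actually decoding every frame (slow, but reliable).
decoded_count = 0
while True:
    ret, _ = cap.read()
    if not ret:
        break
    decoded_count += 1
cap.release()

print(f"Metadata frame count: {meta_count}")
print(f"Decoded frame count:  {decoded_count}")
```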

Hi Mathis, thank you very much for your advice. I opened another test video and checked the frame count with OpenCV. My original video (.avi) has 6543 frames, but the labeled video (.mp4) has only 6532 frames, and the .csv file also has only 6532 values. Is this caused by the change of video format? I have a separate TTL file that tracks the onset of the light, and that file has 6543 frames. How can I map the values in the .csv file onto the original 6543 frames?

The inference (which gives you the X,Y points) should run frame-by-frame on your "original" video (the tracked, re-encoded video is just for visualization). When you run deeplabcut.analyze_videos, what number of frames is identified? When you make this call, DLC should print "Overall # of frames: xx found with (before cropping) frame dimensions...". Is the number it prints consistent with your original video length, or with the length of the analyzed data?
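
For reference, that call looks something like this (the paths are placeholders; adjust them to your project; `save_as_csv=True` writes the x, y, likelihood table alongside the .h5 results):

```python
import deeplabcut

# Placeholder paths; substitute your own project config and video file.
config_path = r"C:\Users\Helen\.spyder-py3\m-s-Helen-2019-08-16\config.yaml"
video = r"D:\DeepLabCut-D\training2\Result\nos1Gz3-an4-T2.avi"

# Prints "Overall # of frames: ..." (read from the metadata) before posture
# extraction, and "Detected frames: ..." (frames actually decoded) at the end.
deeplabcut.analyze_videos(config_path, [video], save_as_csv=True)
```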

After running deeplabcut.analyze_videos, I got this:

    Config:
    {'all_joints': [[0], [1], [2], [3], [4], [5], [6]],
     'all_joints_names': ['FLpaw',
                          'FRpaw',
                          'HLpaw',
                          'HRpaw',
                          'tail1',
                          'tail2',
                          'tail3'],
     'batch_size': 8,
     'bottomheight': 400,
     'crop': True,
     'crop_pad': 0,
     'cropratio': 0.4,
     'dataset': 'training-datasets\iteration-0\UnaugmentedDataSet_m-sAug16\m-s_Helen95shuffle1.mat',
     'dataset_type': 'default',
     'deterministic': False,
     'display_iters': 1000,
     'fg_fraction': 0.25,
     'global_scale': 0.8,
     'init_weights': 'C:\Users\Helen\Anaconda3\envs\dlc-windowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\models\pretrained\resnet_v1_50.ckpt',
     'intermediate_supervision': False,
     'intermediate_supervision_layer': 12,
     'leftwidth': 400,
     'location_refinement': True,
     'locref_huber_loss': True,
     'locref_loss_weight': 0.05,
     'locref_stdev': 7.2801,
     'log_dir': 'log',
     'max_input_size': 1500,
     'mean_pixel': [123.68, 116.779, 103.939],
     'metadataset': 'training-datasets\iteration-0\UnaugmentedDataSet_m-sAug16\Documentation_data-m-s_95shuffle1.pickle',
     'min_input_size': 64,
     'minsize': 100,
     'mirror': False,
     'multi_step': [[0.005, 10000],
                    [0.02, 430000],
                    [0.002, 730000],
                    [0.001, 1030000]],
     'net_type': 'resnet_50',
     'num_joints': 7,
     'num_outputs': 1,
     'optimizer': 'sgd',
     'pos_dist_thresh': 17,
     'project_path': 'C:\Users\Helen\.spyder-py3\m-s-Helen-2019-08-16',
     'regularize': False,
     'rightwidth': 400,
     'save_iters': 50000,
     'scale_jitter_lo': 0.5,
     'scale_jitter_up': 1.25,
     'scoremap_dir': 'test',
     'shuffle': True,
     'snapshot_prefix': 'C:\Users\Helen\.spyder-py3\m-s-Helen-2019-08-16\dlc-models\iteration-0\m-sAug16-trainset95shuffle1\test\snapshot',
     'stride': 8.0,
     'topheight': 400,
     'weigh_negatives': False,
     'weigh_only_present_joints': False,
     'weigh_part_predictions': False,
     'weight_decay': 0.0001}
    Using snapshot-300000 for model C:\Users\Helen\.spyder-py3\m-s-Helen-2019-08-16\dlc-models\iteration-0\m-sAug16-trainset95shuffle1
    num_outputs = 1
    INFO:tensorflow:Restoring parameters from C:\Users\Helen\.spyder-py3\m-s-Helen-2019-08-16\dlc-models\iteration-0\m-sAug16-trainset95shuffle1\train\snapshot-300000
    Starting to analyze % D:\DeepLabCut-D\training2\Result\nos1Gz3-an4-T2.avi
    Loading D:\DeepLabCut-D\training2\Result\nos1Gz3-an4-T2.avi
    Duration of video [s]: 65.43, recorded with 100.0 fps!
    Overall # of frames: 6543 found with (before cropping) frame dimensions: 640 338
    Starting to extract posture
    6565it [09:06, 11.73it/s] Detected frames: 6532
    Saving results in D:\DeepLabCut-D\training2\Result...
    Saving csv poses!
    The videos are analyzed. Now your research can truly start!
    You can create labeled videos with 'create_labeled_video'.
    If the tracking is not satisfactory for some videos, consider expanding the training set. You can use the function 'extract_outlier_frames' to extract any outlier frames!

Even though the "Overall # of frames" is 6543 (the same as my original video), the number of detected frames is only 6532, and my .csv file also has only 6532 rows.

How can I fix this problem? Thanks!

I think the first number, "Overall # of frames," is determined by OpenCV from the video metadata (from a frame-count entry if there is one, otherwise by multiplying the frame rate by the recording duration). The second number, "Detected frames," is determined by actually decoding each frame individually. There are a few more details in this Stack Overflow exchange (look especially at the solution and the responses that follow).

There are (at least) two reasons why the numbers might not match: some frames of the original video may be corrupted and impossible to decode, or the original video may not actually contain the number of frames the metadata suggests. If the first is the case, you could try re-encoding your original video before analysis, although I'm not sure that would help. I think the second option is more likely: how confident are you that the frame rate is actually exactly 100.0 fps? Actual camera frame rates are often 99.9% of the "advertised" rate (see this explanation), although that doesn't quite get you from 6543 to 6532…
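
To make the arithmetic concrete, here is a quick sanity check of the frame-rate hypothesis, using the numbers from the log above (plain arithmetic, not a DLC or OpenCV call):

```python
# Numbers from the analyze_videos log above.
duration_s = 65.43      # "Duration of video [s]"
advertised_fps = 100.0  # frame rate reported in the metadata
decoded_frames = 6532   # "Detected frames" (frames actually decoded)

# If 6532 frames really span 65.43 s, the true rate would be ~99.83 fps,
# while the common "99.9% of advertised" rate would predict ~6536 frames:
print(decoded_frames / duration_s)          # ~99.83 fps
print(0.999 * advertised_fps * duration_s)  # ~6536.5 frames, not 6532
```

As for keeping the results aligned with the 6543-frame TTL record: if (and only if) the missing frames can be assumed to sit at the end of the recording, the results table can be padded back to the TTL length with NaN rows. This is a hedged sketch using pandas, not a built-in DeepLabCut feature, and the file names are placeholders; if the undecodable frames are scattered through the video, their positions cannot be recovered from the CSV alone:

```python
import pandas as pd

# Placeholder file name; DLC CSVs have a three-row header
# (scorer / bodyparts / coords) and the frame index in the first column.
df = pd.read_csv("dlc_results.csv", header=[0, 1, 2], index_col=0)

n_ttl_frames = 6543  # frame count of the separate TTL record

# Pad to the TTL length: the 11 missing frames become NaN rows.
# NOTE: this assumes (!) the frames were dropped at the end of the video.
df_padded = df.reindex(range(n_ttl_frames))
df_padded.to_csv("dlc_results_padded.csv")
```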
