Delta frames in h264/5 compression

I have an issue where frames from my videos are sparse representations of the video data (delta frames) rather than full frames due to compression.

My videos are compressed on a Hikvision DVR using h264+ or h265+ or on a Raspberry Pi with h264.

These videos look completely normal on a video player such as VLC (due to the way they are decoded there), but they look weird as individual frames extracted for DLC because they are the raw delta frames, not the full interpolation from the key frame.

If possible, I would like to know how to get each frame to look “normal” without having to transcode the videos to another format.

Examples :

Key Frame

Delta Frame

Very Sparse Delta Frame

“Ghost Rat” w/ cable

All extracted with DLC’s interface.

Interesting issue !! I also happen to use all h264 compressed videos for DLC, and have never had this issue. Can you tell me more; i.e. what operating system are you using? which DLC version?

== OS ==

$ uname -a

Linux 5.8.0-44-generic #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu is fully up to date.

== DeepLabCut ==

I installed and run the DLC GUI with the following commands :

$ conda create -n dlc python=3.7
$ conda activate dlc
$ pip install tensorflow-gpu==1.15
$ pip install deeplabcut
$ conda install -c anaconda wxpython
$ conda activate dlc
$ python3 -m deeplabcut
Starting GUI…
$ python3
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import deeplabcut
>>> deeplabcut.__version__

== Training ==

I use the docker instance because installing everything by hand (as the docs warn) was taking too long. This is probably irrelevant to this issue anyway.

== Video Recording & Encoding ==

Hikvision Embedded Net DVR DS-7204HQH1-K1

This device has something on it called H.264/H.265/H.264+/H.265+ encoding available

I am not 100% certain which of those it is using, nor am I certain how to determine this. The interface to the device is not especially specific, but I did attempt to enable the “plus” versions due to their reportedly superior performance for the same bitrate. I think the “plus” versions are some sort of customized, proprietary variant that Hikvision provides and that, according to a product sheet for that specific encoding technology, conforms to the standards.

Corporate blurb & whitepaper : H.265+ Hikvision Core Technology

I also use a Raspberry Pi 4 Model B 2019 Quad Core 64 Bit WiFi Bluetooth (4GB), which offers H.264 videos. I am not sure if they also have this problem. These videos behave a little bit strangely in terms of compatibility with available media players.

If I run the Linux program “mediainfo” on the Hikvision DVR files, I obtain the following output :

Format : MPEG-PS
File size : 84.0 MiB
Duration : 43 min 15 s
Overall bit rate : 271 kb/s
FileExtension_Invalid : mpeg mpg m2p vob vro pss evo

ID : 224 (0xE0)
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L3@Main
Duration : 43 min 15 s
Bit rate : 266 kb/s
Width : 960 pixels
Height : 576 pixels
Display aspect ratio : 5:3
Frame rate : 50.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Bits/(Pixel*Frame) : 0.010
Stream size : 82.3 MiB (98%)
Color range : Full
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709

ID : 189 (0xBD)
Format : RLE
Format/Info : Run-length encoding
Duration : 43 min 14 s

== ffmpeg ==

I don’t know if this matters.

$ ffmpeg
ffmpeg version 4.2.4-1ubuntu0.1 Copyright © 2000-2020 the FFmpeg developers
built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]… {[outfile options] outfile}…

Let me know if I can provide further details about the setup. The whole system was set up from a clean installation in December 2020, so things are generally up to date.

Hi @neuralcoding, I am afraid that properly decoding delta frames is not doable with opencv (which is what we use under the hood to read video frames) :confused: Another approach would have been to flag I and P frames, but opencv doesn’t give access to these metadata either. You could extract the ‘normal’ frames manually using our GUI, but I believe that re-encoding the video is just simpler.
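One way to flag key frames outside of opencv is to ask ffprobe (shipped with ffmpeg) for the picture type of every frame. The following is only a minimal sketch and not part of DeepLabCut: the helper names are hypothetical, and it assumes that ffprobe can read the file and that its frame order matches the frame indices used during extraction, which has not been verified for these Hikvision files.

```python
import subprocess


def ffprobe_frame_type_cmd(video_path):
    """Build an ffprobe command that prints one picture type (I, P or B) per frame."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",            # first video stream only
        "-show_entries", "frame=pict_type",  # report only the picture type
        "-of", "csv=p=0",                    # one bare value per line
        str(video_path),
    ]


def parse_frame_types(ffprobe_output):
    """Turn ffprobe's text output into a list such as ['I', 'P', 'P', ...]."""
    types = []
    for line in ffprobe_output.splitlines():
        token = line.strip().strip(",")  # tolerate blank lines and trailing commas
        if token:
            types.append(token)
    return types


def key_frame_indices(frame_types):
    """Return the zero-based indices of the frames flagged as I (key) frames."""
    return [i for i, t in enumerate(frame_types) if t == "I"]


def probe_frame_types(video_path):
    """Run ffprobe (assumed to be on PATH) and return the list of frame types."""
    result = subprocess.run(
        ffprobe_frame_type_cmd(video_path),
        capture_output=True, text=True, check=True,
    )
    return parse_frame_types(result.stdout)
```

With such a list, `key_frame_indices(probe_frame_types("video.mp4"))` (with a hypothetical file name) would give candidate frames to pick manually in the GUI.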

Did this also happen when using regular H.265 or H.264? OpenCV supports those codecs, but I don’t suppose it supports the “+” versions, which seem to be used exclusively by Hikvision.

The superiority of the “+” version might be true for surveillance, where the image is mostly the same in every frame and the encoder can emit almost empty delta (inter) frames everywhere. Also, security systems are very specific in how they operate, using high resolution at a low frame rate, so the codecs may be tuned specifically for this use case. When you introduce a higher fps (for security systems even 30 fps is super high; they usually operate around 10, some even as low as 5) and constant movement, it may not work as intended, I guess.

I wondered if it was something to do with opencv.

Does anyone know how VLC performs this decompression? Do they use a different library?

I’m curious because the video appears normal in VLC, so it seems like it isn’t being tripped up by the plus version, although it’s not the best way to confirm.

It could be a function of several things.

I have a few videos from another postdoc, which use I think the default H.264 codec and less than half the bitrate with fairly low illumination. DLC struggles on these videos because they are so dark and grainy.

My lab didn’t like the DVR system anyway because the videos were so dark and grainy, so I embarked on a process of figuring out how we could improve the video quality, adding two different cameras and some sources of illumination as well as tweaking the settings on the devices.

(The Wyze cameras are OK for monitoring, but not so great for DLC due to blurring during fast motion.)

(The raspberry pi camera is great, but it’s difficult to set up and would be expensive to put in 12 boxes.)

Over about a week on the DVR, I :

(1) added diffuse IR illumination
(2) tweaked some image processing options
(3) boosted the bitrate substantially
(4) switched to the H.265+ codec
(5) modified the cam perspective to remove an annoying obfuscating object

I believe that it started doing this without the “plus” codec, but I am not sure. This means the behavior of the codec they use in general might depend on things like bitrate and dynamic range (overall illumination) also.

The information and control over which specific codec is being used isn’t very specific as far as I can tell, but perhaps I don’t know how to spot the difference in the metadata between the codec versions.

This DVR seems to be designed with a large amount of flexibility, and it also seems to be intended for use with a large number of cameras, as it supports IP and analog cameras simultaneously. The interface allows for selection of FPS, and this one is currently set to “camera max” or something similar to that, which it seems to interpret as “50 fps”. I am not certain how to check if it really is 50 FPS or not, but it does take what appear to be substantially clearer looking frames compared to the Wyze Cam @ 10 or 15 FPS, which does not take clear frames when the animal moves rapidly.
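One rough way to check the real frame rate is to count the decoded frames with ffprobe and divide by the duration. This is a sketch under assumptions: the helper names are hypothetical, ffprobe must be on PATH, and the duration (43 min 15 s = 2595 s) is simply taken from the mediainfo report above.

```python
import subprocess


def count_frames_cmd(video_path):
    """Build an ffprobe command that decodes the stream and reports the frame count."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-count_frames",                          # actually decode and count
        "-show_entries", "stream=nb_read_frames",
        "-of", "csv=p=0",
        str(video_path),
    ]


def effective_fps(frame_count, duration_s):
    """Average frames per second over the whole recording."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return frame_count / duration_s


def count_frames(video_path):
    """Run ffprobe and return the number of frames it could read."""
    result = subprocess.run(
        count_frames_cmd(video_path),
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip().strip(","))
```

For a 2595 s file, a true 50 FPS recording should contain about 129750 frames, while 64875 frames would correspond to 25 FPS.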

The cameras are some sort of thimble sized Foxeer brand model, but I can’t figure out exactly which one they are to learn more about its specs.

In any event, I tried feeding DLC these weird looking frames to see what it would do, as it almost looks like the compression algorithm sort of preselects a set of features for the network to learn while removing background noise. DLC did a fairly decent job of detecting the animal and pose without a lot of special effort on my end. (I basically ignored the very low contrast delta frames, but labeled most relatively good looking ghost rat frames.)

The only weird thing it did was something I have seen reported elsewhere on the forum. During analysis and video generation, it seems to have only processed a little over 50% of the frames. This still makes an OK looking data set and useable telemetry data series, but it is a little weird.

I have not yet tried this with the raspberry pi camera, which was often recording at 80 or 110 fps next to this camera because the files are huge, even with the h264 compression that the pi defaults to. (This is like 3+ GB for 30 minutes on the pi vs around 80 MB on the DVR.) The other potential problem is that the pi is heavily saturated by the indicator LEDs on the headstage of the wired rats, and DLC seems to sometimes find this a bit confusing, because there are other lights and panels in the chamber that come on during the behavior. Finally, I run the Pi FPS so high to keep the frames clear and not blurred during any fast motion, but I don’t expect that I would actually need that level of temporal granularity. However, I have not found a way to perform the equivalent of a downsample function on the video in DLC. (My concern here is that sometimes the rat is on for an hour, which is a lot of frames @ 80 or 110.)
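As far as this thread shows, the downsampling would have to happen outside DLC, and ffmpeg's `fps` filter is one option. The sketch below only builds the command. The function names, the target rate and the choice of libx264 with crf 18 are assumptions for illustration, and it assumes the Pi recording sits in a container with correct timestamps (a raw .h264 stream would need its input frame rate declared to ffmpeg first). Note that this does re-encode the video.

```python
def frames_recorded(fps, duration_s):
    """Number of frames produced by recording at `fps` for `duration_s` seconds."""
    return int(round(fps * duration_s))


def build_downsample_cmd(src, dst, target_fps, crf=18):
    """Build an ffmpeg command that drops frames down to `target_fps`."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return [
        "ffmpeg", "-i", str(src),
        "-vf", f"fps={target_fps}",  # keep roughly one frame per 1/target_fps s
        "-c:v", "libx264",
        "-crf", str(crf),
        str(dst),
    ]


# One hour of recording at the frame rates mentioned above:
print(frames_recorded(80, 3600))   # 288000
print(frames_recorded(110, 3600))  # 396000
print(frames_recorded(20, 3600))   # 72000
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths are filled in.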

I wanted to try the DVR videos first because it would be easier to distribute to my lab as a solution, but maybe I’ll just have to mess around with this some more.

I should also try grabbing frames in Matlab and see what it does with the DVR file. (It might refuse entirely.)

Correction : The video looks really weird in VLC and appears to effectively run at 25 FPS.

If your resolution on the Raspberry Pi camera is above 720p, you could use H.265 to compress the videos more (probably around 5 times more than H.264 when using the slow preset, although my go-to is crf 18 with the veryfast preset; it depends on what’s most important to you). If your scene is bright enough, and it seems to be, you can go for a faster shutter speed (if the camera settings allow it) instead of a higher fps; this will help with file size while preventing the video from becoming blurry.

The r.pi allows me to take 960x600 @ 80 FPS, so I have been accepting that, and scheming to downsample the frames.

I haven’t figured out how to get the raspivid function to operate at a lower framerate with a fast shutter speed.

I have a sufficient amount of IR lighting that I can add more photons if necessary. (Hopefully the rat doesn’t feel too warm. At least the boxes have fans.)

I did try writing some Python code to capture at a lower framerate with a short shutter speed, but I determined that the pause function is not the right way to go about doing that for short intervals. I haven’t gotten back to figuring out the right call to use, but that idea had occurred to me.
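For reference, raspivid documents a `-fps` option for the frame rate and a `-ss` option for the shutter speed in microseconds, so a low frame rate with a short exposure can be requested directly instead of pausing between captures. The sketch below only assembles such a command. The function name and default values are hypothetical, and whether the camera honors a given combination at 960x600 has not been tested here.

```python
def build_raspivid_cmd(output, fps, shutter_us, width=960, height=600,
                       duration_ms=10000):
    """Build a raspivid command with an explicit frame rate and shutter time."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_period_us = 1_000_000 / fps
    if shutter_us > frame_period_us:
        # The exposure cannot last longer than the time between two frames.
        raise ValueError("shutter time exceeds the frame period")
    return [
        "raspivid",
        "-w", str(width),
        "-h", str(height),
        "-fps", str(fps),        # frames per second
        "-ss", str(shutter_us),  # shutter time in microseconds
        "-t", str(duration_ms),  # recording length in milliseconds
        "-o", str(output),
    ]


# Example: 20 FPS with a 2 ms exposure (values chosen only for illustration).
print(" ".join(build_raspivid_cmd("test.h264", fps=20, shutter_us=2000)))
```

A shorter exposure needs more light, so the extra IR illumination mentioned above would likely matter here.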

I am also factoring in the idea that it won’t just be me using this, so I’m hoping to keep the degree of technical wizardry required to do this as low as possible (i.e. good functions, push button hardware and good documentation.)

I have compressed the videos from this with H.265, but it’s so slow on the r.pi that I have done it on a multicore Linux machine instead. (Another task is figuring out how to get the CUDA enabled codecs to work correctly on the GPU…)

(I end up installing and removing the r.pi every time I record with it, which is a bit tedious.)

Do you use ffmpeg to transcode on the GPU, or the Nvidia-provided API? I have no experience with the API, but I have used ffmpeg to encode videos with h264_nvenc during recording (it should be easy enough to adapt the command to transcode an existing video).

My guess would be that VLC uses some very low level ffmpeg/libavcodec operations, unlike opencv.
As for the video analysis stopping early, this is characteristic of a frame that opencv cannot decode. We’ve fixed that behavior recently, though, so from the next release on the analysis will carry on: see “Avoid premature video analysis stopping in the case of a corrupted frame” by jeylau (Pull Request #1105, DeepLabCut/DeepLabCut on GitHub).

I haven’t been able to install the GPU enabled libraries with ffmpeg successfully yet. It runs OK on the multicore processor, but it would certainly be nice to use the idle GPU.

(I forget at which stage exactly I quit trying to install that. I should go look in my notes and figure out what was wrong with it. It might have to do with an attempt to natively install DLC so I didn’t have to run the GUI in the OS … GUI and the GPU enabled learning in the Docker instance. I messed around with different combinations of versions of the software stack and manually installed things, even building TensorFlow with Bazel manually at one point before I became impatient and moved on to the Docker build. I’m balancing a lot of things at once, so sometimes I just have to move on if something is working well enough.)

I believe I disabled the “+” version of the implementation on the DVR, but it doesn’t seem to completely fix the delta frame issue or the weird decoding on VLC. (It appears to reduce the number of delta frames.) Changing the settings on the DVR alters the available menu options, which is why I think I succeeded in changing the settings. The files don’t have an obvious difference to indicate “+” or regular. (According to VLC.) Both the plus and regular files are H.265.

It seems that Totem is the default Ubuntu player. (The unhelpfully named “Videos” player.)

I also looked at installed decoders and discovered that I have a libx265 and a libde265. I suspect that VLC uses one and that Totem uses the other, and that this perhaps accounts for the differences in how the streams look when they are decoded. (Maybe opencv uses the implementation that doesn’t work so well also, but I’m not sure how to check that out yet.)

Meanwhile, I’m pestering the Hikvision tech support people to see if they can explain anything. I forgot about this fact, but they provide a player with every exported video file, so maybe they have done some weird things that standard decoders don’t like, although they claim compliance with the standard in the white paper.

Have you thought about just using a custom ffmpeg command to record with the cameras instead of using the Hikvision software? They probably specify the allowed number of delta frames in the encoding to lower the file size, and I assume their software offers no way to change this. It seems like the easiest solution is to just not use their software.

The Hikvision device is a closed proprietary system that does not offer a command line interface anywhere, so it isn’t possible to do this, as far as I am aware.

My current solution is to re-encode the videos using ffmpeg.

The hikvision DVR still encodes to h.265+ at around 2.3 Mbps.

To make this less annoying, I built myself a script that automates the process and can be run with cron without me needing to think about it.

== CPU version ==

ffmpeg -i /<inputpath>/<filename> -c:a aac -c:v libx265 -crf 18 /<outputpath>/<filename>.mp4

== GPU version ==

ffmpeg -hwaccel cuvid -i /<inputpath>/<filename> -c:a aac -vcodec hevc_nvenc -pix_fmt p010le -preset slow -rc vbr_hq -b:v 1M -maxrate:v 2M /<outputpath>/<filename>.mp4

The CUDA enabled encoder blazes through the computations very fast without burdening either the CPU or the GPU heavily.
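For anyone wanting to automate the same re-encoding, here is a minimal Python sketch of a batch wrapper around the CPU command above. It is not the script mentioned in this post: the function names and the skip-if-already-converted rule are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def output_path_for(src, out_dir):
    """Map an input video to an .mp4 file of the same name in `out_dir`."""
    return Path(out_dir) / (Path(src).stem + ".mp4")


def build_reencode_cmd(src, dst, crf=18):
    """Mirror the CPU command above: AAC audio, libx265 video, constant quality."""
    return [
        "ffmpeg", "-i", str(src),
        "-c:a", "aac",
        "-c:v", "libx265",
        "-crf", str(crf),
        str(dst),
    ]


def pending_jobs(sources, out_dir):
    """Return (src, dst) pairs whose output file does not exist yet."""
    jobs = []
    for src in sources:
        dst = output_path_for(src, out_dir)
        if not dst.exists():
            jobs.append((Path(src), dst))
    return jobs


def reencode_all(sources, out_dir):
    """Re-encode every pending video; stops at the first ffmpeg failure."""
    for src, dst in pending_jobs(sources, out_dir):
        subprocess.run(build_reencode_cmd(src, dst), check=True)
```

Calling `reencode_all(sorted(Path(in_dir).iterdir()), out_dir)` from a small script is the kind of entry point a cron job could run; `in_dir` and `out_dir` stand for whatever paths are used locally.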

Edit : I would consider this “closed”.
