Video object detection without a GPU: post-processing with FFMPEG and DarkNet YoloV2
Without a good GPU card it is impossible to detect the content of each frame at high speed, i.e. in real time. With post-processing it is achievable, but even so it takes a large amount of time until the output is generated. A short film, 2 minutes long, could take a few hours with YoloV3 from DarkNet.
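To see where a figure of a few hours can come from, here is a rough back-of-the-envelope sketch. The function name estimate_minutes and the 10 seconds per image are assumptions for illustration, not measurements; 6 frames per second is the extraction rate used in the ffmpeg command further down.

```shell
#!/usr/bin/env bash
# Rough estimate of the total detection time for a clip.
# Arguments: clip length in seconds, extracted frames per second,
#            assumed detection time per image in seconds (hypothetical value).
estimate_minutes() {
    local duration_s=$1 fps=$2 sec_per_frame=$3
    local frames=$((duration_s * fps))
    # round up to whole minutes
    echo $(( (frames * sec_per_frame + 59) / 60 ))
}

# 2 minute clip, 6 images per second, 10 s per image (assumed)
estimate_minutes 120 6 10
```

With these assumed numbers a 2 minute clip gives 720 images and about 120 minutes of processing; extracting every frame of the original video instead of 6 per second would multiply that accordingly.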
To improve speed I chose YoloV2, because its configuration has fewer convolutional layers than YoloV3, so each image passes through roughly 40 layers instead of roughly 100. It is not better than YoloV3, but it is faster. Another trick is to scale down the images, from 1920x1080 or 1280x720 to 640x360 or 320x180. YOLO is configurable, so you can also reduce the number of convolutional layers.
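To compare how many convolutional layers two configurations define, one can count the [convolutional] section headers in the cfg files. This is a minimal sketch; the helper name count_conv_layers is made up here, and the cfg paths assume the layout of the DarkNet git folder.

```shell
#!/usr/bin/env bash
# Count the [convolutional] sections in a DarkNet cfg file.
# count_conv_layers is a hypothetical helper, not part of DarkNet.
count_conv_layers() {
    grep -c '^\[convolutional\]' "$1"
}

# Paths assume the script is run from the DarkNet git folder.
for cfg in cfg/yolov2.cfg cfg/yolov3.cfg; do
    if [ -f "$cfg" ]; then
        echo "$cfg: $(count_conv_layers "$cfg") convolutional layers"
    fi
done
```

The count only covers convolutional sections; other section types in the cfg (for example route or shortcut) are not included.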
Here is a small example of output, picture in picture, processed with FFMPEG and DarkNet YoloV2:
https://www.youtube.com/watch?v=D2AHMIupEro&t=24s
Here is a comparison video of YOLO v2, YOLO v3, Mask RCNN and Deeplab Xception:
https://www.youtube.com/watch?v=s8Ui_kV9dhw
How it works:
- the video is extracted into individual images
- YoloV2 object detection runs on every image
- the images produced by the detection are merged back into a single video file
Here is the Bash code:
# create a new folder inside the yolo (darknet) git folder
mkdir jpg2
# extract images from the video into the folder, downscaling from 1280x720 (or 1920x1080) to 640x360
# start at second 26 of the video, process the next 33 seconds, keep 6 images per second
ffmpeg -ss 00:00:26.000 -i myvid.mp4 -s 640x360 -t 00:00:33 -r 6 jpg2/vid_%04d.jpg
# detect objects in every image
# dry run that only lists the files:
# for f in jpg2/*.jpg; do echo "$f"; done
# run yolo detection on each image with the yolov3 config and weights
for f in jpg2/*.jpg; do ./darknet detect cfg/yolov3.cfg yolov3.weights "$f" -out "$f"_out; done
# version with the yolov2 config and weights
# for f in jpg2/*.jpg; do ./darknet detect cfg/yolov2.cfg yolov2.weights "$f" -thresh 0.1 -out "$f"_out; done
# tiny yolo for video in 320x180 format - but nothing was detected :/
# for f in jpg2/*.jpg; do ./darknet detect cfg/yolov1-tiny.cfg yolo-tiny.weights "$f" -out "$f"_out; done
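# merge the detection output images back into a video
# earlier variant, reading one image per second:
# ffmpeg -framerate 1 -pattern_type glob -i '*.jpg' -c:v libx264 -r 30 -pix_fmt yuv420p out.mp4
# the output images land next to the input frames in jpg2/, here as PNG files written by darknet's -out option
# -framerate 10 reads the images at 10 fps (use 6 to match the extraction rate above); the output is encoded at 30 fps
ffmpeg -framerate 10 -pattern_type glob -i 'jpg2/*.png' -c:v libx264 -r 30 -pix_fmt yuv420p out4.mp4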
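Because a run can take hours, it may help to be able to stop and resume it. The following is a sketch only, not part of the original script: detect_frames is a hypothetical helper that skips images which already have an output file. It assumes that darknet appends its own extension (.png or .jpg, depending on the build) to the name given with -out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a detector command on every frame in a folder,
# skipping frames that already have a detection output.
# Usage: detect_frames <folder> <detector command and arguments...>
detect_frames() {
    local dir="$1"
    shift
    local f count=0
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue              # folder contains no .jpg files
        case "$f" in
            *_out.jpg) continue ;;           # an output image, not an input frame
        esac
        # assumption: the detector writes "<frame>_out.<ext>"
        if ls "${f}_out".* >/dev/null 2>&1; then
            continue                         # already processed in an earlier run
        fi
        "$@" "$f" -out "${f}_out"
        count=$((count + 1))
    done
    echo "processed $count new image(s)"
}

# Example call (run from the darknet folder):
# detect_frames jpg2 ./darknet detect cfg/yolov2.cfg yolov2.weights
```

Running the same call again after an interruption should only process the frames that are still missing an output image.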
The same code as a gist:
https://gist.github.com/maranemil/1b52b6e1566d273002f357769f646579
Resources
https://pjreddie.com/darknet/yolo/
https://pjreddie.com/media/files/papers/YOLOv3.pdf
https://ffmpeg.org