Python OpenCV: Splitting video into frames

Introduction

In this tutorial, we will learn how to split a video into frames using Python and OpenCV, saving each frame as an image in a folder of our file system.

After learning how to split a video into individual frames, we will also check how to apply some processing to each frame. For illustration purposes, we will try to identify hands in each frame and, if any are found, estimate their landmarks, as covered in this previous tutorial, which uses MediaPipe for this task.

This tutorial was tested on Windows 8.1, with version 4.1.2 of OpenCV and version 0.8.3.1 of MediaPipe.

Splitting video into frames

As usual, we start by importing the cv2 module.

import cv2

Then we will create an object of class VideoCapture, which will allow us to obtain frames from a video. As input, we pass a string containing the path to the video in our file system.

In my case, I’ll use a .avi file I’ve previously recorded from my webcam using OpenCV, following the procedure described in this previous tutorial. Naturally, you can test with any other video you want.

capture = cv2.VideoCapture('C:/Users/N/Desktop/video.avi')

We will also define a variable that will track the number of the current frame we are processing, starting at zero.

frameNr = 0

Then we will start reading frames in an infinite loop that will break when there are no more frames to read.

while True:
    # read, save and count frames here, breaking out when read fails
    ...

To grab a frame from the video, we call the read method on our VideoCapture object, without any arguments. It returns a tuple: the first value is a Boolean indicating whether the frame was read successfully, and the second value is the frame itself (None if the read failed).

success, frame = capture.read()

If the frame was read successfully, we write it to the file system. The process is explained in detail here. In short, we simply need to call the imwrite function from the cv2 module, passing as first input a string with the name of the file to be saved and as second input the frame. The image format is chosen from the file extension, so using .jpg saves the frames as JPEG images.

Since we have multiple frames to save, we will use a name pattern like the following:

frame_[frameNumber].jpg

The code to save the file is shown below. Note that the string includes the path to the folder where we want to save the frames. You should adapt it to point to a folder on your computer.

cv2.imwrite(f'C:/Users/N/Desktop/output/frame_{frameNr}.jpg', frame)
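One thing to keep in mind with the frame_[frameNumber].jpg pattern is that Python’s sorted function, and any other tool that sorts names alphabetically, compares them character by character, so frame_10.jpg comes before frame_2.jpg. If the order matters, a possible variation (not used in the rest of this tutorial) is to zero-pad the number; frame_file_name and the five digits below are arbitrary choices for illustration.

```python
def frame_file_name(frame_nr, digits=5):
    """Build a frame file name with a zero-padded frame number."""
    # f'{7:05d}' gives '00007', so alphabetical order matches frame order
    return f'frame_{frame_nr:0{digits}d}.jpg'


names = [f'frame_{n}.jpg' for n in (1, 2, 10)]
padded = [frame_file_name(n) for n in (1, 2, 10)]
print(sorted(names))   # ['frame_1.jpg', 'frame_10.jpg', 'frame_2.jpg']
print(sorted(padded))  # ['frame_00001.jpg', 'frame_00002.jpg', 'frame_00010.jpg']
```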

At the end of each loop iteration, we increment the frame number.

frameNr = frameNr+1

The complete code is shown in the snippet below. Note that, after the loop breaks, we release the VideoCapture object.

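import cv2

# Open the video file (adapt the path to your own video)
capture = cv2.VideoCapture('C:/Users/N/Desktop/video.avi')

# Number of the frame being processed, used in the file names
frameNr = 0

while True:

    # success is False when no more frames can be read
    success, frame = capture.read()

    if success:
        # The output folder must already exist; imwrite does not create it
        cv2.imwrite(f'C:/Users/N/Desktop/output/frame_{frameNr}.jpg', frame)

    else:
        break

    frameNr += 1

# Free the resources associated with the video file
capture.release()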

As usual, to test the code, simply run it using any tool of your choice. I’ll be using PyCharm, a Python IDE. After the execution finishes, open the folder that you specified as the destination of your frames. You should get a result similar to figure 1.

Figure 1 – Output of the program.

Identifying hands in frames

Now that we have learned how to split a video into frames and save them in a folder, we will check how to apply some processing to each frame. In particular, as mentioned in the introductory section, we will perform hand detection and landmark estimation on each frame.

We start with the module imports. We need the cv2 module, for the operations we have already seen in the previous section, and the mediapipe module, to detect hands in the frames and estimate their landmarks.

For convenience, we will also keep references to the hands and drawing_utils submodules from MediaPipe, which expose the functionality to estimate hand landmarks and to draw them on an image.

import cv2
import mediapipe

drawingModule = mediapipe.solutions.drawing_utils
handsModule = mediapipe.solutions.hands

After this, we create a VideoCapture object, passing as input the path to the file we want to process. We also define a variable to hold the frame number, like we did before.

capture = cv2.VideoCapture('C:/Users/N/Desktop/video.avi')

frameNr = 0

After that, we create an object of class Hands, which allows us to perform hand tracking and landmark estimation on the frames. For this test, we keep the default values, meaning that we won’t pass any parameter to the constructor. We create the object in a with statement, so it is closed automatically once we leave the block.

with handsModule.Hands() as hands:
    # Read and process frames
    ...

Then, as in the previous section, we start reading the frames in a loop, breaking out of it when no more frames can be read.

while True:

    success, frame = capture.read()

    if not success:
        break

    # process the frame

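After reading the current frame, we perform hand detection by calling the process method of our Hands object. Since OpenCV reads frames in BGR order and MediaPipe expects RGB images, we first convert the frame with the cvtColor function. If any hands are found, the multi_hand_landmarks attribute of the result holds one set of landmarks per detected hand, which we draw on the original frame with the draw_landmarks function; otherwise, the attribute is None.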

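# MediaPipe expects RGB images, while OpenCV reads frames in BGR order
results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

# multi_hand_landmarks is None when no hands were detected
if results.multi_hand_landmarks is not None:
    for handLandmarks in results.multi_hand_landmarks:
        # Draw the landmarks and the connections between them on the BGR frame
        drawingModule.draw_landmarks(frame, handLandmarks, handsModule.HAND_CONNECTIONS)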

Then, we write the frame to the file system and increment the frame number.

cv2.imwrite(f'C:/Users/N/Desktop/output/frame_{frameNr}.jpg', frame)

frameNr = frameNr+1

The complete code is shown below. It also releases the VideoCapture object at the end.

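import cv2
import mediapipe

# Shortcuts to the MediaPipe submodules used below
drawingModule = mediapipe.solutions.drawing_utils
handsModule = mediapipe.solutions.hands

capture = cv2.VideoCapture('C:/Users/N/Desktop/video.avi')

frameNr = 0

# Hands object with default parameters, closed automatically at the end of the with block
with handsModule.Hands() as hands:

    while True:

        success, frame = capture.read()

        # No more frames to read
        if not success:
            break

        # MediaPipe expects RGB images, while OpenCV reads frames in BGR order
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        # multi_hand_landmarks is None when no hands were detected
        if results.multi_hand_landmarks is not None:
            for handLandmarks in results.multi_hand_landmarks:
                drawingModule.draw_landmarks(frame, handLandmarks, handsModule.HAND_CONNECTIONS)

        # The output folder must already exist; imwrite does not create it
        cv2.imwrite(f'C:/Users/N/Desktop/output/frame_{frameNr}.jpg', frame)

        frameNr += 1

capture.release()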

Once again, to test the code, simply run it and check the destination folder. Figure 2 shows the new result. This time, the estimated landmarks are drawn over the hands detected in each frame, as expected.

Figure 2 – Target folder with the frames, showing the detected hands.
