Python MediaPipe: tracking a finger relative to a region of interest

Introduction

In this tutorial we will learn how to check if the index finger of a hand detected in an image stream is inside a given area. We will be using Python, OpenCV and MediaPipe.

We will define our area of interest as a circle with a radius of 40 pixels, positioned at the center of the frame, and change its color depending on whether the index finger tip is inside or outside that circle.

We are also going to use the SciPy module in this tutorial to calculate the Euclidean distance between the tip of the finger and the center of the area of interest. You can find the installation instructions in the SciPy documentation.
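For reference, the Euclidean distance between two points (x1, y1) and (x2, y2) is sqrt((x1 - x2)^2 + (y1 - y2)^2). The sketch below computes it with only the standard library so the formula is easy to see; the function name euclidean_distance is hypothetical, and in the tutorial itself SciPy's distance.euclidean plays this role.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two 2D points p=(x1, y1) and q=(x2, y2).

    Computes sqrt((x1 - x2)**2 + (y1 - y2)**2), the same quantity that
    scipy.spatial.distance.euclidean returns for two 2D points.
    """
    return math.hypot(p[0] - q[0], p[1] - q[1])


# A fingertip at (323, 244) and a circle center at (320, 240) form a 3-4-5 triangle.
print(euclidean_distance((323, 244), (320, 240)))  # 5.0
```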

This tutorial was tested on Windows 8.1, with version 4.1.2 of OpenCV and version 0.8.3.1 of MediaPipe. The Python version used was 3.7.2.

The code

We will start our code with the module imports. We will need the following modules:

  • cv2: allows us to grab frames from a camera and draw content on those frames.
  • mediapipe: exposes the functions we need to detect hands in an image and estimate the landmarks.
  • scipy.spatial: SciPy module that contains spatial algorithms. We will use it to perform Euclidean distance calculations.
import cv2
import mediapipe
import scipy.spatial

Then, for convenience, we will access some sub-modules of the modules we imported.

drawingModule = mediapipe.solutions.drawing_utils
handsModule = mediapipe.solutions.hands
distanceModule = scipy.spatial.distance

After this we will create a VideoCapture object, so we can access the frames of a camera attached to the computer. As input, we need to pass the index of the camera we want to use. In my case, since I only have one camera, I’ll pass the value 0.

capture = cv2.VideoCapture(0)

Now that we have access to the camera, we will get the width and the height of the frames it returns. We do this with a call to the get method on our VideoCapture object, passing as input the constant that identifies the property we want to obtain.

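# get() returns floats; convert them to int so the pixel coordinates derived from them are integers, as cv2.circle expects
frameWidth = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
frameHeight = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))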

As mentioned in the introductory section, our region of interest will be a circle positioned at the center of the frame. So, we will use the previously obtained width and height of the camera frames to calculate the x and y coordinates of the center, which we save in a tuple.

circleCenter = (round(frameWidth/2), round(frameHeight/2))

We will also define a variable containing the radius of the circle.

circleRadius = 40
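The center computation can be checked in isolation, without a camera. This is a minimal sketch assuming a 640x480 frame (the real resolution depends on your device); frame_center is a hypothetical helper, not part of the tutorial's code. Note that Python's round uses round-half-to-even, so for odd dimensions the center may land on either neighboring pixel, which makes no practical difference here.

```python
def frame_center(frame_width, frame_height):
    """Return the (x, y) pixel center of a frame, computed as in the tutorial."""
    # round() returns an int in Python 3, which is what cv2.circle expects
    # for its center argument.
    return (round(frame_width / 2), round(frame_height / 2))


print(frame_center(640, 480))  # (320, 240)
print(frame_center(641, 481))  # (320, 240): 320.5 and 240.5 round to the even neighbor
```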

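Now we will create an object of class Hands, which we will use to perform hand tracking and landmark estimation on the frames obtained from the camera. Since we only want to track a single hand, we set max_num_hands to 1. We also set static_image_mode to False, because we are processing a video stream, and use 0.7 as the minimum detection and tracking confidence.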

with handsModule.Hands(static_image_mode=False, min_detection_confidence=0.7, min_tracking_confidence=0.7, max_num_hands=1) as hands:
    # process each frame in an infinite loop, until the user presses ESC
    pass

In an infinite loop, we grab each frame from the camera and process it until the user presses the ESC key. Note that we flip each frame horizontally (around the y axis), so our movements are mirrored in the image and thus feel more natural. You can read more about image flipping with OpenCV in a previous post.

with handsModule.Hands(static_image_mode=False, min_detection_confidence=0.7, min_tracking_confidence=0.7, max_num_hands=1) as hands:

    while True:

        ret, frame = capture.read()

        # check if frame grabbed successfully
        if not ret:
            continue

        frame = cv2.flip(frame, 1)

        # perform hand detection and process result

        if cv2.waitKey(1) == 27:
            break
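With a flip code of 1, cv2.flip mirrors the image around the vertical (y) axis, so the pixels of every row are reversed; a flip code of 0 flips around the x axis, and a negative value flips around both. The sketch below models the horizontal case on a small nested list instead of a real frame; flip_horizontal and mirror_x are hypothetical names used only for illustration.

```python
def mirror_x(x, frame_width):
    """Map a pixel column x to its position after a horizontal flip."""
    # Column 0 swaps with column frame_width - 1, column 1 with
    # frame_width - 2, and so on. Rows (y values) are unchanged.
    return frame_width - 1 - x


def flip_horizontal(image):
    """Pure-Python model of cv2.flip(image, 1) on a nested-list image."""
    return [list(reversed(row)) for row in image]


image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_horizontal(image))  # [[3, 2, 1], [6, 5, 4]]
print(mirror_x(0, 640))        # 639
```

Because the flip happens before hand detection, the landmarks MediaPipe returns are already expressed in the mirrored image, so no coordinate conversion is needed afterwards.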

Now we perform the hand landmark estimation with a call to the process method on our Hands object. As covered in the introductory MediaPipe tutorial, this method expects an image in RGB format, but OpenCV stores images in BGR, so we need to convert the frame first.

results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
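For a 3-channel image, converting with COLOR_BGR2RGB simply swaps the first and third channels of every pixel. The per-pixel sketch below shows the idea; bgr_to_rgb is a hypothetical helper, and in practice cv2.cvtColor does this for the whole frame at once.

```python
def bgr_to_rgb(pixel):
    """Reorder one (blue, green, red) pixel into (red, green, blue)."""
    blue, green, red = pixel
    return (red, green, blue)


# Pure red in OpenCV's BGR order is (0, 0, 255); in RGB order it is (255, 0, 0).
print(bgr_to_rgb((0, 0, 255)))  # (255, 0, 0)
print(bgr_to_rgb((0, 255, 0)))  # (0, 255, 0): green is unaffected
```

The same convention explains the colors used later in this tutorial: since we draw on the BGR frame, red is written as (0,0,255) and blue as (255,0,0).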

Before processing the detected landmarks, we define a variable to hold the color of the circle (a tuple with the BGR values), initialized to black. If no hand is detected in the image, the circle keeps this default black color.

circleColor = (0, 0, 0)

We must also first check whether any hand was detected.

if results.multi_hand_landmarks is not None:
    # process detection
    pass

If a hand was detected, we access the normalized coordinates of the index finger tip and convert them to pixel coordinates. Note that we directly access element zero of the list of detected hands, because we configured the Hands object to detect at most one hand. Also note that _normalized_to_pixel_coordinates returns None when a normalized coordinate falls outside the [0, 1] range, which can happen when the fingertip is at the edge of the frame.

normalizedLandmark = results.multi_hand_landmarks[0].landmark[handsModule.HandLandmark.INDEX_FINGER_TIP]
pixelCoordinatesLandmark = drawingModule._normalized_to_pixel_coordinates(normalizedLandmark.x, normalizedLandmark.y, frameWidth, frameHeight)
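Because _normalized_to_pixel_coordinates has a leading underscore, it is a private helper of drawing_utils, and its behavior could change between MediaPipe versions. The sketch below re-implements the conversion under the assumption that it matches the helper in the MediaPipe version used here: scale by the frame size, round down, clamp to the last pixel, and return None for values outside [0, 1]. The name normalized_to_pixel is hypothetical.

```python
import math


def normalized_to_pixel(normalized_x, normalized_y, image_width, image_height):
    """Convert normalized landmark coordinates in [0, 1] to pixel coordinates.

    Returns None when either coordinate falls outside [0, 1], for example
    when a landmark is estimated slightly beyond the frame border.
    """
    def is_valid(value):
        # Accept values in [0, 1], tolerating float error at the endpoints.
        return (value > 0 or math.isclose(0, value)) and (value < 1 or math.isclose(1, value))

    if not (is_valid(normalized_x) and is_valid(normalized_y)):
        return None
    # Scale to pixels, then clamp so that 1.0 maps to the last column/row.
    x_px = min(math.floor(normalized_x * image_width), image_width - 1)
    y_px = min(math.floor(normalized_y * image_height), image_height - 1)
    return x_px, y_px


print(normalized_to_pixel(0.5, 0.5, 640, 480))   # (320, 240)
print(normalized_to_pixel(1.0, 1.0, 640, 480))   # (639, 479)
print(normalized_to_pixel(1.05, 0.5, 640, 480))  # None
```

The None case matters in practice: passing None to cv2.circle or to distance.euclidean raises an error, so a robust loop checks for it before using the coordinates.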

To facilitate testing, we draw a small circle on the index finger tip landmark. Since the landmark is a single point, visualizing it makes testing easier. In a real application, we could be more sophisticated and consider, for example, a circular area around the landmark (covering the whole fingertip) and detect when the two circles intersect. Nonetheless, we will keep this tutorial simple.

cv2.circle(frame, pixelCoordinatesLandmark, 2, (255,0,0), -1)
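The circle-intersection idea mentioned above reduces to a single comparison: two circles overlap (or touch) when the distance between their centers is at most the sum of their radii. This is a sketch assuming a fingertip radius of 10 pixels, a value chosen only for illustration; circles_overlap is a hypothetical helper and not part of the tutorial's code.

```python
import math


def circles_overlap(center_a, radius_a, center_b, radius_b):
    """True if two filled circles share at least one point (overlap or touch)."""
    distance = math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])
    return distance <= radius_a + radius_b


region_center, region_radius = (320, 240), 40
fingertip_radius = 10  # assumed size of the fingertip area, in pixels

print(circles_overlap((365, 240), fingertip_radius, region_center, region_radius))  # True
print(circles_overlap((371, 240), fingertip_radius, region_center, region_radius))  # False
```

Compared with the point test used in this tutorial, this version reacts as soon as the edge of the fingertip reaches the region, rather than waiting for the tip landmark itself to cross the boundary.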

Then we calculate the Euclidean distance between the index finger tip landmark and the center of the circle. If the result is less than the circle radius, the fingertip is inside the circle, and we set its color to green. Otherwise, it is outside, and we set its color to red.

if distanceModule.euclidean(pixelCoordinatesLandmark, circleCenter) < circleRadius:
    circleColor = (0,255,0)

else:
    circleColor = (0,0,255)
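The color decision can also be pulled out of the loop into a pure function, which makes it testable without a camera. In this sketch, region_color and the color constants are hypothetical names. It compares squared distances instead of calling distance.euclidean, a swapped-in variant that gives the same answer because, for non-negative d and r, d < r exactly when d² < r². Like the tutorial's code, it uses a strict less-than, so a fingertip exactly on the boundary counts as outside.

```python
BLACK = (0, 0, 0)
GREEN = (0, 255, 0)  # BGR order, as used by OpenCV
RED = (0, 0, 255)


def region_color(fingertip, center, radius):
    """Pick the circle color for a fingertip pixel position (None if no hand)."""
    if fingertip is None:
        return BLACK
    dx = fingertip[0] - center[0]
    dy = fingertip[1] - center[1]
    # Squared comparison avoids the square root and gives the same result.
    return GREEN if dx * dx + dy * dy < radius * radius else RED


center, radius = (320, 240), 40
print(region_color(None, center, radius))        # (0, 0, 0)
print(region_color((330, 250), center, radius))  # (0, 255, 0)
print(region_color((360, 240), center, radius))  # (0, 0, 255), exactly on the boundary
```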

Finally, we draw the circle at the center of the frame and display the frame in a window.

cv2.circle(frame, circleCenter, circleRadius, circleColor, -1)
cv2.imshow('Test image', frame)

The complete code can be seen below. We also added the calls to release the VideoCapture and destroy the window after the user breaks the frame-grabbing loop.

import cv2
import mediapipe
import scipy.spatial

drawingModule = mediapipe.solutions.drawing_utils
handsModule = mediapipe.solutions.hands
distanceModule = scipy.spatial.distance

capture = cv2.VideoCapture(0)

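# get() returns floats; convert them to int so the pixel coordinates derived from them are integers, as cv2.circle expects
frameWidth = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
frameHeight = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))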

circleCenter = (round(frameWidth/2), round(frameHeight/2))
circleRadius = 40

with handsModule.Hands(static_image_mode=False, min_detection_confidence=0.7, min_tracking_confidence=0.7, max_num_hands=1) as hands:

    while True:

        ret, frame = capture.read()

        if not ret:
            continue

        frame = cv2.flip(frame, 1)

        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        circleColor = (0, 0, 0)

        if results.multi_hand_landmarks is not None:

            normalizedLandmark = results.multi_hand_landmarks[0].landmark[handsModule.HandLandmark.INDEX_FINGER_TIP]
            pixelCoordinatesLandmark = drawingModule._normalized_to_pixel_coordinates(normalizedLandmark.x,
                                                                                      normalizedLandmark.y,
                                                                                      frameWidth,
                                                                                      frameHeight)

            # the helper returns None if the landmark falls outside the frame
            if pixelCoordinatesLandmark is not None:

                cv2.circle(frame, pixelCoordinatesLandmark, 2, (255,0,0), -1)

                if distanceModule.euclidean(pixelCoordinatesLandmark, circleCenter) < circleRadius:
                    circleColor = (0,255,0)

                else:
                    circleColor = (0,0,255)

        cv2.circle(frame, circleCenter, circleRadius, circleColor, -1)

        cv2.imshow('Test image', frame)

        if cv2.waitKey(1) == 27:
            break

cv2.destroyAllWindows()
capture.release()

Testing the code

To test the code, simply run it using a tool of your choice. I’ll be using PyCharm, a Python IDE. You should get a result similar to the one shown in figure 1 below. As can be seen, the circle starts black, becomes red when a hand is detected and the index finger is outside, and turns green if the index finger is inside.

Figure 1 – Detecting when index finger enters region of interest.
