Python MediaPipe: Face Landmarks estimation

Introduction

In this tutorial we will learn how to use MediaPipe and Python to perform face landmarks estimation. To do so, we will use MediaPipe's Face Mesh solution, which estimates 468 face landmarks. For comparison, the dlib-based solution we analyzed in a previous tutorial estimates only 68 landmarks.

If you are interested in learning more about the face landmark model used by MediaPipe, you can consult the paper linked from the MediaPipe Face Mesh documentation [1].

This tutorial was tested on Windows 8.1, with version 4.1.2 of OpenCV and version 0.8.3.1 of MediaPipe. The Python version used was 3.7.2.

The code for the face landmarks estimation

We will start by taking care of the module imports. We will need the cv2 module to read and display the image, and the mediapipe module, which exposes the functionality needed to perform the face landmarks estimation.

import cv2
import mediapipe

Next, we will store references to two mediapipe sub-modules, drawing_utils and face_mesh, in variables. This is just a convenience that saves us from typing the full path every time we want to access something from one of these sub-modules.

The drawing_utils sub-module exposes a function that allows us to draw the detected face landmarks over the image. The face_mesh sub-module exposes the FaceMesh class, which performs the face detection and landmarks estimation.

drawingModule = mediapipe.solutions.drawing_utils
faceModule = mediapipe.solutions.face_mesh

Next, we will create two objects of class DrawingSpec, which allow us to customize how MediaPipe draws the detected face landmarks on the image. The constructor of this class receives the following arguments:

  • thickness: Thickness for drawing the annotation. Defaults to 2 pixels.
  • circle_radius: Circle radius for drawing the landmarks. Defaults to 2 pixels.
  • color: Color for drawing the annotation. Defaults to green.

The auxiliary function we will use draws the detected landmarks as small circles and also connects some of them with lines. As such, the first DrawingSpec we instantiate will customize the circles representing the landmarks, and the second will customize the lines that connect them.

circleDrawingSpec = drawingModule.DrawingSpec(thickness=1, circle_radius=1, color=(0,255,0))
lineDrawingSpec = drawingModule.DrawingSpec(thickness=1, color=(0,255,0))
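A detail worth keeping in mind when picking other colors: draw_landmarks draws directly on the image we pass to it, and since that image comes from cv2.imread, it is in BGR order. The color tuples are therefore interpreted as (blue, green, red). Green, (0,255,0), is the same in both orders, which is why the specs above behave as expected. The small helper below, rgb_to_bgr, is hypothetical (it is not part of MediaPipe or OpenCV) and simply reorders a familiar RGB tuple before it is passed to DrawingSpec.

```python
def rgb_to_bgr(color):
    """Reverse an (R, G, B) tuple into the (B, G, R) order used by OpenCV images.

    Hypothetical helper for illustration, not part of MediaPipe or OpenCV.
    """
    red, green, blue = color
    return (blue, green, red)


# Example: a red annotation color for a BGR image loaded with cv2.imread.
red_bgr = rgb_to_bgr((255, 0, 0))
print(red_bgr)  # (0, 0, 255)

# It could then be used like this (requires mediapipe):
# drawingModule.DrawingSpec(thickness=1, circle_radius=1, color=red_bgr)
```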

Then we will create an object of class FaceMesh, which we will use to process the image and estimate the face landmarks. The constructor of this class supports the following parameters:

  • static_image_mode: Boolean indicating whether the input images should be treated as unrelated images (True) or as a video stream (False). If set to True, face detection runs on every input image [1]. If set to False, it detects faces in the first input images and, upon a successful detection, simply tracks those landmarks without invoking another detection until it loses track of any of the faces [1]. Defaults to False.
  • max_num_faces: Maximum number of faces to detect in the image. Defaults to 1.
  • min_detection_confidence: Minimum confidence value (in the interval [0.0, 1.0]) from the face detection model for the detection to be considered successful [1]. Defaults to 0.5.
  • min_tracking_confidence: Minimum confidence value (in the interval [0.0, 1.0]) from the landmark-tracking model for the face landmarks to be considered tracked successfully; otherwise, face detection will be invoked automatically on the next input image [1]. Setting a higher value can increase robustness at the cost of higher latency. This parameter is ignored if static_image_mode is set to True [1]. Defaults to 0.5.

For our use case, we will only set static_image_mode to True (since we are going to work with a static image) and leave the remaining constructor arguments with their default values.

We are going to wrap the creation of this object in a with statement. This ensures that the object's resources are released once we no longer need it. You can check the implementation of the __enter__ and __exit__ methods in the parent class of FaceMesh, which is called SolutionBase.

with faceModule.FaceMesh(static_image_mode=True) as face:
    # Face landmarks estimation code goes here
    pass
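To make the role of the with statement more concrete, here is a minimal sketch of the protocol it relies on. FakeSolution is a hypothetical class, not MediaPipe code; it only assumes the usual shape of this pattern, where __enter__ returns the object itself and __exit__ calls a close method that frees the resources.

```python
class FakeSolution:
    """Hypothetical stand-in for a MediaPipe solution object (not real MediaPipe code)."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real solution object would release its underlying resources here.
        self.closed = True

    def __enter__(self):
        # The value bound by "as" in the with statement.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends, even if an exception was raised inside it.
        self.close()
        return False  # do not suppress exceptions


with FakeSolution() as solution:
    print(solution.closed)  # False: still usable inside the block

print(solution.closed)  # True: close() was called on exit

# Equivalent explicit form, without a with statement:
other = FakeSolution()
try:
    pass  # use the object here
finally:
    other.close()
```

The try/finally form at the end is what the with statement saves us from writing by hand.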

Inside the with block, we start by reading an image with the imread function from OpenCV. As input, it receives a string with the path to the file in our file system. Note that imread does not raise an exception if the file cannot be read; it returns None instead.

image = cv2.imread("C:/Users/N/Desktop/Test.jpg")

After obtaining the image, we perform the landmark estimation by calling the process method on our FaceMesh object. This method expects the image in RGB format, but OpenCV reads images in BGR format, so we need to convert the image we just read before passing it to process.

As output, this method returns a NamedTuple with a field called multi_face_landmarks, which contains the landmarks of each detected face. We will store the result in a variable.

results = face.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
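The color conversion itself is simple: for a 3-channel image, cv2.COLOR_BGR2RGB swaps the first and third channels. The NumPy sketch below uses a tiny made-up 1x2 image instead of a real photo and shows the equivalent channel reversal with slicing, which can help when reasoning about what process receives.

```python
import numpy as np

# A made-up 1x2 BGR image: one pure blue pixel and one pure red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis turns BGR into RGB, which is what
# cv2.cvtColor(image, cv2.COLOR_BGR2RGB) does for a 3-channel image.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]: the blue pixel, now in RGB order
print(rgb[0, 1].tolist())  # [255, 0, 0]: the red pixel, now in RGB order
```

One difference is that slicing returns a view with a negative stride rather than a new contiguous array, which is why the tutorial code sticks with cvtColor; np.ascontiguousarray(rgb) would produce a regular copy if one were needed.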

We will then iterate through each element of the multi_face_landmarks field, where each element contains the landmarks of one detected face. This field is None when no face is detected, which is why we check it before iterating. In our case, we expect at most one face, since we kept the default value of max_num_faces, which is 1.
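Each detected face is a list of landmarks, accessible through its landmark field, and each landmark has x, y and z attributes. According to the MediaPipe documentation [1], x and y are normalized to [0.0, 1.0] by the image width and height, so they must be scaled to obtain pixel positions. The sketch below does that conversion. The helper name to_pixel_coordinates is hypothetical, and to keep the example runnable without MediaPipe it uses SimpleNamespace objects as stand-ins for real landmarks.

```python
from types import SimpleNamespace


def to_pixel_coordinates(landmarks, image_width, image_height):
    """Convert normalized (x, y) landmark values into integer pixel coordinates.

    `landmarks` is any iterable of objects with `x` and `y` attributes, such as
    faceLandmarks.landmark from the MediaPipe results (hypothetical helper).
    """
    points = []
    for lm in landmarks:
        # Scale by the image size, then clamp in case a value falls slightly
        # outside [0.0, 1.0], so the result is always a valid pixel index.
        px = min(max(int(lm.x * image_width), 0), image_width - 1)
        py = min(max(int(lm.y * image_height), 0), image_height - 1)
        points.append((px, py))
    return points


# Stand-ins for real landmarks, assuming a 640x480 image.
fake_landmarks = [
    SimpleNamespace(x=0.5, y=0.5, z=0.0),
    SimpleNamespace(x=0.25, y=0.75, z=-0.01),
    SimpleNamespace(x=1.02, y=-0.01, z=0.0),
]
print(to_pixel_coordinates(fake_landmarks, 640, 480))
# [(320, 240), (160, 360), (639, 0)]
```

With real results, it would be called inside the loop as to_pixel_coordinates(faceLandmarks.landmark, width, height), after obtaining the size with height, width = image.shape[:2]. Clamping is a choice made in this sketch; skipping out-of-range landmarks would be an equally valid alternative.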

To draw the landmarks, we simply call the draw_landmarks function of the drawing_utils module, passing the following parameters:

  • Image where to draw the landmarks.
  • Detected landmarks.
  • The FACE_CONNECTIONS frozenset, available in the face_mesh module, which specifies how the landmarks connect with each other.
  • A drawing spec specifying how to draw the landmarks.
  • A drawing spec specifying how to draw the connections between the landmarks.

if results.multi_face_landmarks is not None:
    for faceLandmarks in results.multi_face_landmarks:
        drawingModule.draw_landmarks(image, faceLandmarks, faceModule.FACE_CONNECTIONS, circleDrawingSpec, lineDrawingSpec)
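To get a feel for what FACE_CONNECTIONS is used for: it is a frozenset of (start_index, end_index) pairs, and for each pair draw_landmarks draws a line between the two corresponding landmarks. The sketch below reproduces only that idea in plain Python, turning pixel points and a small made-up connection set (not the real FACE_CONNECTIONS contents) into the list of line segments that would be drawn. The function name connection_segments is hypothetical.

```python
def connection_segments(points, connections):
    """Return the (start_point, end_point) pairs that would be joined by lines.

    `points` is a list of pixel coordinates indexed by landmark number and
    `connections` is an iterable of (start_index, end_index) pairs
    (hypothetical helper, for illustration only).
    """
    segments = []
    # Sorted only to make the output order deterministic; a frozenset has no order.
    for start_idx, end_idx in sorted(connections):
        # Reject pairs that reference landmarks that do not exist.
        if not (0 <= start_idx < len(points) and 0 <= end_idx < len(points)):
            raise ValueError(f"Connection ({start_idx}, {end_idx}) is out of range")
        segments.append((points[start_idx], points[end_idx]))
    return segments


# A made-up triangle of three landmarks and its connections.
points = [(100, 100), (200, 100), (150, 180)]
connections = frozenset([(0, 1), (1, 2), (2, 0)])
print(connection_segments(points, connections))
# [((100, 100), (200, 100)), ((200, 100), (150, 180)), ((150, 180), (100, 100))]
```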

Finally, we display the image and wait for the user to press any key before closing the window and ending the program.

cv2.imshow('Test image', image)

cv2.waitKey(0)
cv2.destroyAllWindows()

The complete code can be seen below.

import cv2
import mediapipe

drawingModule = mediapipe.solutions.drawing_utils
faceModule = mediapipe.solutions.face_mesh

circleDrawingSpec = drawingModule.DrawingSpec(thickness=1, circle_radius=1, color=(0,255,0))
lineDrawingSpec = drawingModule.DrawingSpec(thickness=1, color=(0,255,0))

with faceModule.FaceMesh(static_image_mode=True) as face:
    image = cv2.imread("C:/Users/N/Desktop/Test.jpg")

    results = face.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.multi_face_landmarks is not None:
        for faceLandmarks in results.multi_face_landmarks:
            drawingModule.draw_landmarks(image, faceLandmarks, faceModule.FACE_CONNECTIONS, circleDrawingSpec, lineDrawingSpec)

    cv2.imshow('Test image', image)

    cv2.waitKey(0)
    cv2.destroyAllWindows()

Testing the code

To test the code, simply run it in a tool of your choice. I'll be using PyCharm, a Python IDE. You should get a result similar to figure 1, which shows the estimated face landmarks drawn on the image, with some of them connected by lines.

Figure 1 – Estimated face landmarks drawn on the image.

References

[1] https://google.github.io/mediapipe/solutions/face_mesh
