Face detection is one of the most important fields in computer vision: it’s the first step for further face analysis and data manipulation. In this post I will explain how to detect faces in an image using Python and Mediapipe, Google’s framework of ready-made computer vision solutions, which can be used from Python.

I will divide this post into three stages: Obtain, Preprocess and Analyze.

Obtain

When working on a Machine Learning (ML) or Computer Vision project we need to follow a basic structure, and the first step is usually to obtain our data. In this example the data is a single image, which we will use for face detection.

Before continuing, make sure the following libraries are installed, using pip install <library-to-install>: opencv-python (imported as cv2), numpy and mediapipe.
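If you want to double-check the installation from Python itself, a small sketch like the one below reports the installed version of each package, or None if it is missing. It assumes the pip distribution names opencv-python, numpy and mediapipe (other OpenCV builds, such as opencv-contrib-python, also provide cv2 under a different distribution name); installed_versions is a hypothetical helper written only for this check.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each pip distribution name to its installed version, or None if missing."""
    versions: dict[str, str | None] = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # not installed in the current environment
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions(["opencv-python", "numpy", "mediapipe"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```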

Let’s write our first lines of code. In your working directory, create a new file called main.py, where we’ll define some basic functions.

    
import cv2
import numpy as np

def show_img(img: np.ndarray) -> None:
    """Show image in a window
    Args:
        img: image to be shown
    """
    cv2.imshow("image", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


def main() -> None:
    """Main function
    """
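    image_path = "img.png"
    img = cv2.imread(image_path)  # Returns None instead of raising if the file can't be read
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    show_img(img)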


if __name__ == "__main__":
    main()
    
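cv2.imread() returns the image as a NumPy array of shape (height, width, 3) with the channels in BGR order, and it returns None rather than raising when the file can’t be read. If you don’t have a test image at hand, the hypothetical helper make_test_image below builds a synthetic array with the same layout, which also makes the rows-first shape convention used later in draw_prediction() easy to see.

```python
from __future__ import annotations

import numpy as np


def make_test_image(width: int, height: int,
                    bgr: tuple[int, int, int] = (255, 0, 0)) -> np.ndarray:
    """Create a solid-colour uint8 image laid out like cv2.imread() output."""
    img = np.zeros((height, width, 3), dtype=np.uint8)  # rows (height) come first
    img[:, :] = bgr  # broadcast one BGR triple to every pixel; (255, 0, 0) is blue
    return img


img = make_test_image(640, 480)
img_height, img_width, channels = img.shape  # same unpacking as draw_prediction()
print(img_height, img_width, channels)  # 480 640 3
```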

Preprocess

In the case of Mediapipe face detection, the only preprocessing we need is converting the image from BGR (the channel order OpenCV uses) to RGB before passing it to the model. Let’s create a function that returns the model’s prediction for later use. We also need to add some lines to the main() function.
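OpenCV stores channels as BGR, while the model expects RGB, which is why get_prediction() calls cv2.cvtColor(). For a 3-channel image that conversion amounts to reversing the last axis. The NumPy sketch below (bgr_to_rgb is a hypothetical helper, not part of OpenCV or Mediapipe) only illustrates what the conversion does; cv2.cvtColor() remains the straightforward choice in the post’s code.

```python
import numpy as np


def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an (H, W, 3) image: BGR -> RGB."""
    # [..., ::-1] flips only the last axis and returns a view with a negative
    # stride; ascontiguousarray copies it into a regular C-contiguous buffer.
    return np.ascontiguousarray(img[..., ::-1])


pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # one blue pixel in BGR order
print(bgr_to_rgb(pixel)[0, 0].tolist())  # [0, 0, 255]
```

The copy is there because some libraries reject arrays with negative strides, which is what the plain slice produces.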

    
import mediapipe as mp # Add this line next to the previous imports

def get_prediction(img: np.ndarray, model: mp.solutions.face_detection.FaceDetection) -> list:
    """Get prediction of the model
    Args:
        img: image to be predicted
        model: model to be used
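    Returns:
        list of detections (empty if no face was found)
    """
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Mediapipe expects RGB input

    # .detections is None when no face is found, so fall back to an empty list
    return model.process(img_rgb).detections or []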

def main() -> None:
    """Main function
    """
    # ... Our previous code

    with mp.solutions.face_detection.FaceDetection(
        model_selection=1, min_detection_confidence=0.5
    ) as face_detection:
        prediction = get_prediction(img, face_detection)
        print(prediction) # This line will print the predictions from the model

    

Analyze

Analyze is the stage in which we interpret the model output. For face detection, the output is a list of detections, each holding the bounding box of a face (relative to the image size), its keypoints and a confidence score. As part of the analysis, I’ll also include a function that draws a rectangle around each face.
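Each element of the prediction list is a Mediapipe Detection message. The fields draw_prediction() relies on are detection.location_data.relative_bounding_box (xmin, ymin, width and height, each as a fraction of the image size), and detection.score is a repeated field holding the confidence. The sketch below collects them into plain tuples; to keep it runnable without a model, it feeds types.SimpleNamespace stand-ins that only mimic those attribute names. summarize_detections is a hypothetical helper.

```python
from __future__ import annotations

from types import SimpleNamespace


def summarize_detections(detections) -> list[tuple[float, float, float, float, float]]:
    """Return (xmin, ymin, width, height, score) for each detection.

    Works with Mediapipe Detection objects or any object exposing the same
    attribute names. A None input (no faces found) yields an empty list.
    """
    summary = []
    for det in detections or []:
        box = det.location_data.relative_bounding_box
        score = det.score[0] if det.score else 0.0  # score is a repeated field
        summary.append((box.xmin, box.ymin, box.width, box.height, score))
    return summary


# Hypothetical stand-in that mimics the shape of one Mediapipe detection
fake = SimpleNamespace(
    score=[0.93],
    location_data=SimpleNamespace(
        relative_bounding_box=SimpleNamespace(xmin=0.25, ymin=0.1, width=0.5, height=0.6)
    ),
)
print(summarize_detections([fake]))  # [(0.25, 0.1, 0.5, 0.6, 0.93)]
print(summarize_detections(None))    # []
```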

    
def draw_prediction(img: np.ndarray, coordinates: list) -> np.ndarray:
    """Draw prediction on the image
    Args:
        img: image to be drawn
        coordinates: coordinates of the prediction
    Returns:
        img: image with the prediction drawn
    """
    img_height, img_width, _ = img.shape # Get image dimensions

    for detection in coordinates:
        location_data = detection.location_data # Get location data of the prediction
        bbox = location_data.relative_bounding_box # Get bounding box of the prediction

        x1, y1, w, h = bbox.xmin, bbox.ymin, bbox.width, bbox.height # Get coordinates of the bounding box

        # Convert coordinates from relative to absolute
        x1 = int(x1 * img_width)
        y1 = int(y1 * img_height)
        w = int(w * img_width)
        h = int(h * img_height)

        # Adjust the region of interest if it exceeds the image boundaries
        if y1 < 0:
            h += y1  # Reduce the height by the excess amount
            y1 = 0  # Set y1 to 0 to start from the top

        if x1 < 0:
            w += x1  # Reduce the width by the excess amount
            x1 = 0  # Set x1 to 0 to start from the left

        if y1 + h > img_height:
            h = img_height - y1  # Reduce the height if it exceeds the image height

        if x1 + w > img_width:
            w = img_width - x1  # Reduce the width if it exceeds the image width

        # Draw bounding box on the image
        img = cv2.rectangle(img, (x1, y1), (x1 + w, y1 + h), (0, 255, 0), 5)

    return img

def main() -> None:
    """Main function
    """
    # ... Our previous code

    show_img(draw_prediction(img, prediction)) # Show image with prediction drawn
    

Don’t be intimidated by the size of the draw_prediction() function: its logic is simple. Take your time to read the code and comments and you will understand it. Remember to add the new line to the main() function.
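If the four if blocks feel heavy, the same clipping can be expressed with max() and min() once the box is converted to corner coordinates. The pure-Python sketch below (to_pixel_box is a hypothetical name) truncates with int() just like draw_prediction(), so it should produce the same corners that are passed to cv2.rectangle().

```python
from __future__ import annotations


def to_pixel_box(
    xmin: float, ymin: float, width: float, height: float,
    img_width: int, img_height: int,
) -> tuple[int, int, int, int]:
    """Convert a relative bounding box into clipped pixel corners (x1, y1, x2, y2)."""
    # Relative -> absolute, truncating with int() as draw_prediction() does
    x1 = int(xmin * img_width)
    y1 = int(ymin * img_height)
    x2 = x1 + int(width * img_width)
    y2 = y1 + int(height * img_height)
    # Clamp each corner to the image, equivalent to the four if blocks
    return max(x1, 0), max(y1, 0), min(x2, img_width), min(y2, img_height)


# A box that sticks out of the left and bottom edges of a 640x480 image
print(to_pixel_box(-0.125, 0.25, 0.5, 0.875, 640, 480))  # (0, 120, 240, 480)
```

The returned corners could then be passed directly as cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 5).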

Result

Here is my test image after being passed through the model, with the prediction drawn:

Conclusion

In my experience working with face detection models, Mediapipe offers fast response times with acceptable detection confidence. It also works very well with real-time video, for example from a webcam.

I really recommend this model to beginners in face detection because of the aspects above and its ease of use. If you are looking for a more powerful face detection model, I recommend reading about RetinaFace.

Final code

Here is the complete code for our face detection script, in case you need it.

    
import cv2
import numpy as np
import mediapipe as mp

def show_img(img: np.ndarray) -> None:
    """Show image in a window
    Args:
        img: image to be shown
    """
    cv2.imshow("image", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


def get_prediction(img: np.ndarray, model: mp.solutions.face_detection.FaceDetection) -> list:
    """Get prediction of the model
    Args:
        img: image to be predicted
        model: model to be used
    Returns:
        prediction: list of detections (empty if no face was found)
    """
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Mediapipe expects RGB input

    # .detections is None when no face is found, so fall back to an empty list
    return model.process(img_rgb).detections or []


def draw_prediction(img: np.ndarray, coordinates: list) -> np.ndarray:
    """Draw prediction on the image
    Args:
        img: image to be drawn
        coordinates: coordinates of the prediction
    Returns:
        img: image with the prediction drawn
    """
    img_height, img_width, _ = img.shape # Get image dimensions

    for detection in coordinates:
        location_data = detection.location_data # Get location data of the prediction
        bbox = location_data.relative_bounding_box # Get bounding box of the prediction

        x1, y1, w, h = bbox.xmin, bbox.ymin, bbox.width, bbox.height # Get coordinates of the bounding box

        # Convert coordinates from relative to absolute
        x1 = int(x1 * img_width)
        y1 = int(y1 * img_height)
        w = int(w * img_width)
        h = int(h * img_height)

        # Adjust the region of interest if it exceeds the image boundaries
        if y1 < 0:
            h += y1  # Reduce the height by the excess amount
            y1 = 0  # Set y1 to 0 to start from the top

        if x1 < 0:
            w += x1  # Reduce the width by the excess amount
            x1 = 0  # Set x1 to 0 to start from the left

        if y1 + h > img_height:
            h = img_height - y1  # Reduce the height if it exceeds the image height

        if x1 + w > img_width:
            w = img_width - x1  # Reduce the width if it exceeds the image width

        # Draw bounding box on the image
        img = cv2.rectangle(img, (x1, y1), (x1 + w, y1 + h), (0, 255, 0), 5)

    return img


def main() -> None:
    """Main function
    """
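    image_path = "img.png"
    img = cv2.imread(image_path)  # Returns None instead of raising if the file can't be read
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")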

    with mp.solutions.face_detection.FaceDetection(
        model_selection=1, min_detection_confidence=0.5
    ) as face_detection:
        prediction = get_prediction(img, face_detection)

    show_img(draw_prediction(img, prediction)) # Show image with prediction drawn

if __name__ == "__main__":
    main()