  • Introduction
  • Libraries and data
  • Face detection
    • Bounding boxes detection
    • Extract detected faces
  • Embedding of faces
  • Measuring face similarity with embeddings
  • Collection of reference embeddings
  • Pipeline for face detection and recognition
    • Pipeline for face recognition in images
    • Pipeline for face recognition in videos
    • Pipeline for face recognition in webcam
  • Session Information
  • How to Cite



Introduction

Deep learning models have become the reference standard across many fields, one of them being computer vision (also known as artificial vision). A rapidly expanding application of this technology is facial recognition, that is, the automated identification of people appearing in an image or video.

Much as humans do, a computer system needs several stages to identify the people shown in an image:

  1. Detect the faces in the image.

  2. Use a neural network capable of mapping the features of a human face into a numerical representation. This step is known as embedding or encoding.

  3. Measure the similarity between the numerical representation of the detected faces and the reference representations available in a database.

  4. Determine whether they are sufficiently similar to be considered the same person and assign the corresponding identity.

Throughout this document, each of these steps is described and applied using OpenFaceKit, a Python package developed by the author of this document that provides tools for face detection and recognition using deep learning.

Diagram of the steps followed in a facial recognition system.

Libraries and data

# Data manipulation
# ==============================================================================
import os
import urllib
import zipfile
import numpy as np
from urllib.request import urlretrieve

# Image processing
# ==============================================================================
from PIL import Image
import cv2
import matplotlib.pyplot as plt

# Models
# ==============================================================================
import torch
from scipy.spatial.distance import euclidean
from openfacekit import (
    FaceRecognizer,
    convert_to_matplotlib_rgb,
    ReferenceEmbeddings,
)

For the examples in this document, images of the actors from the hilarious series Modern Family are used. As a first step, the images are downloaded into a local folder. This can be easily done in Python using the urlretrieve function from the urllib.request module of the standard library.

# Download images
# ==============================================================================
if not os.path.exists('images'):
    os.makedirs('images')
# Image with a single face
url = ('https://github.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/' +
       'raw/master/images/phil_dunphy.jpg')
urlretrieve(url=url, filename='images/image_1.jpg')

# Image with multiple faces
url = ('https://github.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/'
       'raw/master/images/modernfamily.jpg')
urlretrieve(url=url, filename='images/image_2.png');

There are several libraries in Python that allow image processing (reading, writing, resizing, cropping, etc.). Three of the most commonly used are OpenCV (cv2), PIL, and matplotlib. It is important to note that OpenCV uses the BGR color format, while PIL and matplotlib use RGB. Fortunately, it is easy to switch between these formats using the functions cv2.cvtColor(image, cv2.COLOR_BGR2RGB) and cv2.cvtColor(image, cv2.COLOR_RGB2BGR).
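
As a minimal sketch of what this conversion amounts to: for a 3-channel image stored as a NumPy array of shape (height, width, 3), switching between BGR and RGB is equivalent to reversing the channel axis. The tiny array below is made-up data used only for illustration.

```python
import numpy as np

# Hypothetical 1x2 image in BGR order: one pure-blue pixel and one pure-red pixel.
image_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels (BGR -> RGB).
# For 3-channel images the pixel values match those returned by
# cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB).
image_rgb = image_bgr[..., ::-1]

# Applying the same reversal again recovers the original BGR image.
image_back = image_rgb[..., ::-1]

print(image_rgb.tolist())  # [[[0, 0, 255], [255, 0, 0]]]
```

The slice returns a view with reversed strides. If a contiguous array is needed, for instance before passing it to another library, `image_rgb.copy()` can be used.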

# Reading images
# ==============================================================================
image_1 = Image.open('./images/image_1.jpg')
image_2 = Image.open('./images/image_2.png')

# image_1 = cv2.imread('images/image_1.jpg')
# image_2 = cv2.imread('images/image_2.png')

# Plot images
# ==============================================================================
plt.figure(figsize=(5, 4))
plt.imshow(image_1)
plt.axis('off')

plt.figure(figsize=(10, 6))
plt.imshow(image_2)
plt.axis('off');

Face detection

The first step in the facial recognition process is to detect where faces are located within an image. Many strategies and detection methods have been developed since the early research in this field, two of the most notable ones being:

  • MultiTask Cascaded Convolutional Neural Network (MTCNN): This detector combines three neural network models that sequentially refine the detections. Several MTCNN detectors are available for Python. One of the most efficient implementations uses PyTorch and is accessible via the facenet-pytorch library.

  • YuNet: a Convolutional Neural Network (CNN) based face detector developed by Shiqi Yu in 2018 and open-sourced in 2019. It is optimized for real-time applications and has been included in OpenCV since version 4.5.4.

OpenFaceKit includes implementations of both detectors, allowing users to choose the one that best suits their needs.

✏️ Note

In order to use the YuNet detector, weights must be downloaded first. They can be obtained from the opencv/face_detection_yunet repository on Hugging Face. The file to download is named `face_detection_yunet_2023mar.onnx`. Once downloaded, it should be placed in a local folder, and its path must be provided when initializing the FaceRecognizer using the argument opencv_yunet_model_path. Users may also download the weights automatically by using the download_opencv_yunet_model() function from the openfacekit library.

# Initialize face detector
# ==============================================================================
face_detector = FaceRecognizer(
    detector                = "MTCNN", #"OpenCV_Yunet",
    encoder                 = None,
    min_face_size           = 20,
    thresholds              = [0.6, 0.7, 0.7],
    min_confidence_detector = 0.5,
    similarity_threshold    = 0.5,
    similarity_metric       = "cosine",
    keep_all                = True,
    verbose                 = True
)
face_detector
--------------
FaceRecognizer
--------------
Detector type: MTCNN
Encoder type: InceptionResnetV1
Device: cpu
Number of reference identities: 0
Similarity metric: cosine
Similarity threshold: 0.5
Minimum confidence detector: 0.5

Bounding boxes detection

The detect_bboxes method of the FaceRecognizer class is used to detect the bounding boxes of the faces present in an image. This method returns the coordinates of the bounding boxes and their associated probabilities.

# Detection of bounding boxes and their probabilities
# ==============================================================================
boxes, probs = face_detector.detect_bboxes(
    image    = image_2,
    fix_bbox = True
)
boxes
----------------
Scanned image
----------------
Detected faces: 12
Detected faces with minimum confidence: 12
Bounding box correction applied: True
Bounding box coordinates: [[293, 64, 402, 194], [505, 89, 605, 224], [108, 95, 210, 227], [427, 207, 529, 333], [47, 235, 145, 361], [1069, 134, 1165, 262], [682, 126, 778, 248], [659, 291, 750, 402], [886, 128, 968, 250], [239, 245, 326, 355], [931, 496, 1012, 613], [816, 663, 889, 751]]
Bounding box confidence: [0.9999438524246216, 0.9982789754867554, 0.999267041683197, 0.9998809099197388, 0.9999357461929321, 0.9999068975448608, 0.9999818801879883, 0.9996474981307983, 0.9995021820068359, 0.9993504881858826, 0.9991905093193054, 0.9989155530929565]

array([[ 293,   64,  402,  194],
       [ 505,   89,  605,  224],
       [ 108,   95,  210,  227],
       [ 427,  207,  529,  333],
       [  47,  235,  145,  361],
       [1069,  134, 1165,  262],
       [ 682,  126,  778,  248],
       [ 659,  291,  750,  402],
       [ 886,  128,  968,  250],
       [ 239,  245,  326,  355],
       [ 931,  496, 1012,  613],
       [ 816,  663,  889,  751]])
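
The coordinates printed above can be turned into box sizes with simple arithmetic. The sketch below assumes each box is stored as [x1, y1, x2, y2] (top-left and bottom-right corners). That layout is an assumption inferred from the printed values, and `box_size` is a hypothetical helper, not part of the library.

```python
# First three boxes from the output above. The [x1, y1, x2, y2] corner
# layout is an assumption based on the printed values.
boxes = [
    [293, 64, 402, 194],
    [505, 89, 605, 224],
    [108, 95, 210, 227],
]

def box_size(box):
    """Return (width, height) in pixels of a box given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1

sizes = [box_size(box) for box in boxes]
print(sizes)  # [(109, 130), (100, 135), (102, 132)]
```

Under the same assumption, a PIL image could be cropped to a single face with `image.crop((x1, y1, x2, y2))`, which expects (left, upper, right, lower).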

It is common to plot the detected bounding boxes on the original image to visually verify the detection results. This can be done using the detect_faces method of the FaceRecognizer class.

# Plot detected bounding boxes on the original image
# ==============================================================================
face_detector.detect_faces(
    image    = image_2,
    fix_bbox = True
)
----------------
Scanned image
----------------
Detected faces: 12
Detected faces with minimum confidence: 12
Bounding box correction applied: True
Bounding box coordinates: [[293, 64, 402, 194], [505, 89, 605, 224], [108, 95, 210, 227], [427, 207, 529, 333], [47, 235, 145, 361], [1069, 134, 1165, 262], [682, 126, 778, 248], [659, 291, 750, 402], [886, 128, 968, 250], [239, 245, 326, 355], [931, 496, 1012, 613], [816, 663, 889, 751]]
Bounding box confidence: [0.9999438524246216, 0.9982789754867554, 0.999267041683197, 0.9998809099197388, 0.9999357461929321, 0.9999068975448608, 0.9999818801879883, 0.9996474981307983, 0.9995021820068359, 0.9993504881858826, 0.9991905093193054, 0.9989155530929565]
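
The line "Detected faces with minimum confidence" in the output reflects a filter on the detection probabilities, controlled by min_confidence_detector. A rough sketch of that kind of filter is shown below. `filter_by_confidence` is a hypothetical helper, and whether the library compares with `>=` or `>` is an assumption.

```python
def filter_by_confidence(boxes, probs, min_confidence=0.5):
    """Keep only the (box, prob) pairs whose probability reaches the threshold."""
    kept = [(box, prob) for box, prob in zip(boxes, probs) if prob >= min_confidence]
    kept_boxes = [box for box, _ in kept]
    kept_probs = [prob for _, prob in kept]
    return kept_boxes, kept_probs

# Made-up detections: the second one falls below the threshold.
boxes = [[293, 64, 402, 194], [10, 10, 40, 40], [505, 89, 605, 224]]
probs = [0.9999, 0.31, 0.9983]

kept_boxes, kept_probs = filter_by_confidence(boxes, probs, min_confidence=0.5)
print(len(kept_boxes), kept_probs)  # 2 [0.9999, 0.9983]
```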

Extract detected faces

The extract_faces method of the FaceRecognizer class is used to extract the portions of the image that contain faces. The returned object is a tensor with the pixel values of the cropped faces (3 color channels x image_size x image_size). If more than one face is detected, then a tensor of dimensions (number of faces x 3 color channels x image_size x image_size) is returned.

# Extract detected faces
# ==============================================================================
faces, probs = face_detector.extract_faces(image=image_2)
print(f"Shape: {faces.shape}")
Shape: torch.Size([12, 3, 160, 160])

The returned image from the detector is a tensor with dimensions [3, 160, 160], meaning that the color channels are in the first position. To display the image using matplotlib, the channels need to be moved to the last position [160, 160, 3] and converted from a tensor object to a numpy array.
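
The channel reordering described above can be sketched with NumPy as follows. The internals of convert_to_matplotlib_rgb are not shown in this document, so the rescaling from [-1, 1] to [0, 1] is an assumption, and `chw_to_hwc_image` is a hypothetical stand-in rather than the library function.

```python
import numpy as np

def chw_to_hwc_image(face):
    """Convert a (3, H, W) array in [-1, 1] to an (H, W, 3) array in [0, 1]."""
    # A torch tensor would typically be converted first,
    # e.g. with face.detach().cpu().numpy().
    face = np.asarray(face, dtype=np.float32)
    hwc = np.transpose(face, (1, 2, 0))  # move the channels to the last axis
    hwc = (hwc + 1.0) / 2.0              # assumed rescaling from [-1, 1] to [0, 1]
    return np.clip(hwc, 0.0, 1.0)

# Hypothetical stand-in for one extracted face (random values, not real data).
rng = np.random.default_rng(0)
fake_face = rng.uniform(-1.0, 1.0, size=(3, 160, 160))

image = chw_to_hwc_image(fake_face)
print(image.shape)  # (160, 160, 3)
```

An array in this layout and value range can be passed directly to `plt.imshow`.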

# Plot extracted faces with matplotlib
# ==============================================================================
fig, axs = plt.subplots(nrows=2, ncols=int(np.ceil(len(faces)/2)), figsize=(10, 4))
axs = axs.flatten()
for i in range(faces.shape[0]):
    face = convert_to_matplotlib_rgb(faces[i])
    # add a title with the probability
    axs[i].set_title(f'Prob: {probs[i]:.4f}')
    axs[i].imshow(face)
    axs[i].axis('off')
fig.tight_layout()
print(faces[0])
tensor([[[ 0.9414,  0.9336,  0.9336,  ...,  0.9102,  0.9258,  0.9570],
         [ 0.9492,  0.9414,  0.9414,  ...,  0.9180,  0.9180,  0.9258],
         [ 0.9570,  0.9570,  0.9570,  ...,  0.9492,  0.9258,  0.9023],
         ...,
         [-0.2617, -0.2227, -0.1914,  ..., -0.2305, -0.2539, -0.2852],
         [-0.2539, -0.2148, -0.1914,  ..., -0.1914, -0.2148, -0.2539],
         [-0.2695, -0.2383, -0.2227,  ..., -0.1602, -0.1758, -0.1992]],

        [[ 0.9727,  0.9648,  0.9648,  ...,  0.9102,  0.9258,  0.9570],
         [ 0.9805,  0.9727,  0.9648,  ...,  0.9180,  0.9180,  0.9258],
         [ 0.9883,  0.9805,  0.9727,  ...,  0.9336,  0.9180,  0.9023],
         ...,
         [-0.4023, -0.3633, -0.3398,  ..., -0.5117, -0.5273, -0.5586],
         [-0.4023, -0.3633, -0.3477,  ..., -0.4961, -0.5273, -0.5664],
         [-0.4180, -0.3945, -0.3867,  ..., -0.4883, -0.5039, -0.5273]],

        [[ 0.9961,  0.9883,  0.9883,  ...,  0.8867,  0.9180,  0.9570],
         [ 0.9961,  0.9961,  0.9961,  ...,  0.8945,  0.9102,  0.9258],
         [ 0.9961,  0.9961,  0.9961,  ...,  0.9102,  0.9102,  0.9023],
         ...,
         [-0.5273, -0.4961, -0.4727,  ..., -0.6445, -0.6602, -0.6914],
         [-0.5195, -0.4961, -0.4805,  ..., -0.6289, -0.6602, -0.6992],
         [-0.5508, -0.5273, -0.5195,  ..., -0.6211, -0.6289, -0.6523]]])

Embedding of faces

Once the faces in the image have been identified, the next step is to obtain a numerical transformation that represents each face’s unique characteristics. The resulting numerical vector is known as an embedding or encoding.

Deep learning models (convolutional neural networks) capable of generating face embeddings are not easy to train. Fortunately, several pre-trained models are available in Python.

To build this type of model, a classification network is first trained on a dataset containing many individuals. Once the network is trained, the final softmax layer is removed so that the model’s output becomes a numerical vector.

In this document, the model used is InceptionResnetV1, specifically the version trained on the VGGFace2 dataset.
For more details about this type of model, refer to the VGGFace2 paper.

Diagram of a face embedding process.
# Embedding of faces
# ==============================================================================
embeddings = face_detector.calculate_embeddings(face_images=faces)
print(f"Shape: {embeddings.shape}")
embeddings
Shape: torch.Size([12, 512])
tensor([[ 7.3119e-02,  1.3807e-02, -2.0866e-02,  ..., -5.1862e-02,
          5.5240e-02,  4.7610e-03],
        [-6.3946e-02, -1.7960e-02, -3.9687e-02,  ...,  6.9325e-02,
         -9.6542e-02, -9.4992e-02],
        [ 3.1438e-02,  6.8282e-02, -6.3409e-03,  ..., -9.6854e-05,
          3.9739e-02, -1.2795e-02],
        ...,
        [ 4.7704e-02, -1.6753e-02,  3.7802e-02,  ..., -2.9759e-02,
         -3.5267e-02, -1.2076e-02],
        [ 4.3862e-02, -7.2438e-03,  2.1455e-02,  ..., -1.0147e-02,
          1.3584e-02,  4.7367e-03],
        [-4.3593e-03, -2.1744e-02,  3.1877e-02,  ..., -6.5899e-02,
          2.2239e-02,  1.8709e-02]])

Measuring face similarity with embeddings

The goal of obtaining a numerical representation of faces (embeddings) is to quantify how similar they are to one another.
Two common ways to calculate this similarity are by using the Euclidean distance or the cosine distance between embeddings. The smaller the distance, the greater the similarity between the faces.

$$ \text{similarity} = 1 - \text{distance} $$
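
To make the two distances concrete, here is a small pure-Python sketch. The function names and the 2-dimensional vectors are made up for illustration. Real embeddings in this document have 512 dimensions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 2-dimensional "embeddings" (made up for illustration).
u = [1.0, 0.0]
v = [0.0, 1.0]

print(euclidean_distance(u, u))            # 0.0
print(cosine_distance(u, v))               # 1.0
print(round(euclidean_distance(u, v), 4))  # 1.4142
```

With either distance, identical vectors give a distance of 0 and therefore a similarity of 1.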

Below is an example where one image is compared against two others: the first belongs to the same person, Phil Dunphy, and the second to Cameron Tucker.

# Extraction of faces
# ==============================================================================
phil_1 = face_detector.extract_faces(image=image_1)[0][0]
phil_2 = face_detector.extract_faces(image=image_2)[0][1]
cameron = face_detector.extract_faces(image=image_2)[0][2]
# Plot extracted faces
# ==============================================================================
fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(10, 6))

face = convert_to_matplotlib_rgb(phil_1)
axs[0].imshow(face)
axs[0].set_title('Phil 1')
axs[0].axis('off')

face = convert_to_matplotlib_rgb(phil_2)
axs[1].imshow(face)
axs[1].set_title('Phil 2')
axs[1].axis('off')

face = convert_to_matplotlib_rgb(cameron)
axs[2].imshow(face)
axs[2].set_title('Cameron')
axs[2].axis('off');

Once the 3 faces are extracted from the images, their embeddings are created, and the similarities between them are calculated using the Euclidean distance.

# Embeddings
# ==============================================================================
embedding_phil_1 = face_detector.calculate_embeddings(face_images=phil_1.reshape((1, 3, 160, 160))).flatten()
embedding_phil_2 = face_detector.calculate_embeddings(face_images=phil_2.reshape((1, 3, 160, 160))).flatten()
embedding_cameron = face_detector.calculate_embeddings(face_images=cameron.reshape((1, 3, 160, 160))).flatten()
# Similarity between embeddings
# ==============================================================================
print(f"Similarity between the same image Phil: {1 - euclidean(embedding_phil_1, embedding_phil_1)}")
print(f"Similarity between the two images of Phil: {1 - euclidean(embedding_phil_1, embedding_phil_2)}")
print(f"Similarity between Phil and Cameron: {1 - euclidean(embedding_phil_1, embedding_cameron)}")
Similarity between the same image Phil: 1.0
Similarity between the two images of Phil: 0.40968143939971924
Similarity between Phil and Cameron: -0.3885413408279419

It can be observed that the similarity between the two images of Phil is significantly higher than the similarity between Phil and Cameron, indicating that the embeddings effectively capture the facial features that differentiate individuals.
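
The negative value obtained for Phil and Cameron is not an error. Assuming the embeddings a and b have unit length (an assumption: this document does not state it, although the values above are compatible with it), the squared Euclidean distance can be written in terms of the angle θ between the two vectors:

$$ \lVert a - b \rVert^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b = 2 - 2\cos\theta $$

Under that assumption the distance lies between 0 and 2, so 1 - distance lies between -1 and 1, and it becomes negative whenever the distance exceeds 1, that is, when the angle between the embeddings is larger than 60°.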

Collection of reference embeddings

To identify who a face belongs to, it is necessary to compare it against a database that contains a reference embedding for each identity.

The newly detected face is compared against all the reference embeddings in the database. If the similarity between the new embedding and any of the reference embeddings is above a certain threshold, the identity associated with that reference embedding is assigned to the detected face. If no similarities are above the threshold, the face is classified as "unknown."
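
The decision rule described above can be sketched in a few lines. This is an illustrative implementation using cosine similarity and toy 3-dimensional vectors. `identify` and the reference values are hypothetical and do not reproduce the library's internal code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(embedding, references, threshold=0.5):
    """Return (identity, similarity) of the best match, or "unknown" if it is too weak."""
    best_identity, best_similarity = "unknown", -1.0
    for identity, reference in references.items():
        similarity = cosine_similarity(embedding, reference)
        if similarity > best_similarity:
            best_identity, best_similarity = identity, similarity
    # The best match is accepted only if it is above the threshold.
    if best_similarity <= threshold:
        return "unknown", best_similarity
    return best_identity, best_similarity

# Made-up reference embeddings, one per identity.
references = {"PhilDunphy": [1.0, 0.0, 0.0], "CameronTucker": [0.0, 1.0, 0.0]}

print(identify([0.9, 0.1, 0.0], references)[0])  # PhilDunphy
print(identify([0.0, 0.0, 1.0], references)[0])  # unknown
```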

OpenFaceKit provides the class ReferenceEmbeddings to facilitate the management of a database of reference embeddings. Given a path to a folder with images of known individuals, this class can automatically create and store the reference embeddings for each identity.
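
How ReferenceEmbeddings combines several images of the same person is not described here. One common option, shown below purely as an assumption, is to average the embeddings of that person and rescale the result to unit length. `mean_embedding` is a hypothetical helper.

```python
import math

def mean_embedding(embeddings):
    """Average several equal-length embeddings and rescale the result to unit length."""
    n = len(embeddings)
    mean = [sum(values) / n for values in zip(*embeddings)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0:
        raise ValueError("The mean embedding has zero length and cannot be normalized.")
    return [x / norm for x in mean]

# Two made-up embeddings of the same hypothetical person.
person_embeddings = [[1.0, 0.0], [0.0, 1.0]]

reference = mean_embedding(person_embeddings)
print([round(x, 4) for x in reference])  # [0.7071, 0.7071]
```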

The image or images of each person are assumed to be located in a subfolder named after that person's identity. For this example, a folder structure like the following is used:

./images/reference_images

├── AlexDunphy
│   ├── 5e8f3e2373d0c84a052dc5e2.jpg
│   ├── AlexDunphy.png
│   └── alex-black-floral-ruffle-blouse.jpg
├── CameronTucker
│   ├── CameronTucker.png
│   └── descarga.jpg
├── ClaireDunphy
│   ├── ClaireDunphy.png
│   ├── descarga (1).jpg
│   └── frantic-claire-modern-family-s11e17.jpg
├── GloriaPritchett
│   ├── 46623-1532336916.jpg
│   ├── GloriaPritchett.png
│   └── descarga.jpg
├── HaleyDunphy
│   ├── HaleyDunphy.png
│   ├── descarga (1).jpg
│   ├── descarga.jpg
│   └── sarah-hyland-modern-family-1557405158.jpg
├── JayPritchett
│   ├── JayPritchett.png
│   └── descarga (2).jpg
├── JoePritchett
│   ├── JoePritchett.png
│   ├── Joe_Pritchett.jpg
│   ├── descarga.jpg
│   ├── images (1).jpg
│   └── images.jpg
├── LilyTucker-Pritchett
│   ├── LilyTucker-Pritchett.png
│   ├── descarga (1).jpg
│   ├── descarga (2).jpg
│   └── descarga.jpg
├── LukeDunphy
│   ├── LukeDunphy.png
│   ├── descarga (1).jpg
│   ├── descarga (3).jpg
│   └── descarga.jpg
├── MannyDelgado
│   ├── 58.jpg
│   ├── Manny-S11.jpg
│   ├── MannyDelgado.png
│   ├── descarga (2).jpg
│   ├── fca262d92ce831635d991deeb61fed1b.png
│   └── modern_family_manny_delgado_1_201215_glv63ple3v.jpeg
├── MitchellPritchett
│   ├── Jesse_Tyler_Fergeuson_Muppets_Most_Wanted_Premiere_(cropped).jpg
│   ├── MitchellPritchett.png
│   └── descarga.jpg
└── PhilDunphy
    ├── 1503333585_649717_1503333744_noticia_normal.jpg
    ├── 186708-4.jpg
    ├── Phil-S11.jpg
    └── PhilDunphy.png
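
A folder laid out like the tree above can be scanned with the standard library alone. The sketch below only lists the image files per identity. `list_reference_images` and the set of accepted extensions are assumptions made for illustration, not the library's own logic.

```python
from pathlib import Path

# Assumed set of image extensions, based on the files listed in the tree.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def list_reference_images(folder_path):
    """Map each identity (subfolder name) to the sorted image paths it contains."""
    images_per_identity = {}
    for subfolder in sorted(Path(folder_path).iterdir()):
        if not subfolder.is_dir():
            continue
        images = sorted(
            path for path in subfolder.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        )
        if images:
            images_per_identity[subfolder.name] = images
    return images_per_identity

# Only runs if the reference folder is already present on disk.
reference_dir = Path("./images/reference_images")
if reference_dir.is_dir():
    for identity, images in list_reference_images(reference_dir).items():
        print(identity, len(images))
```
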
# Download reference images
# ==============================================================================
url = (
    "https://github.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/"
    "raw/master/images/imagenes_referencia_reconocimiento_facial.zip"
)
extract_dir = "./images/reference_images"
os.makedirs(extract_dir, exist_ok=True)
zip_path, _ = urllib.request.urlretrieve(url)
with zipfile.ZipFile(zip_path, "r") as f:
    f.extractall(extract_dir)

print("Images successfully downloaded and extracted to:", extract_dir)
Images successfully downloaded and extracted to: ./images/reference_images
# Create reference embeddings from a folder of images
# ==============================================================================
reference_embeddings = ReferenceEmbeddings(
    folder_path='./images/reference_images'
)

reference_embeddings.calculate_reference_embeddings()
reference_embeddings
Processing identity: JayPritchett
  Reading image: ./images/reference_images/JayPritchett/descarga (2).jpg
  Reading image: ./images/reference_images/JayPritchett/JayPritchett.png
Processing identity: AlexDunphy
  Reading image: ./images/reference_images/AlexDunphy/5e8f3e2373d0c84a052dc5e2.jpg
  Reading image: ./images/reference_images/AlexDunphy/alex-black-floral-ruffle-blouse.jpg
  Reading image: ./images/reference_images/AlexDunphy/AlexDunphy.png
Processing identity: CameronTucker
  Reading image: ./images/reference_images/CameronTucker/descarga.jpg
  Reading image: ./images/reference_images/CameronTucker/CameronTucker.png
Processing identity: JoePritchett
  Reading image: ./images/reference_images/JoePritchett/images (1).jpg
  Reading image: ./images/reference_images/JoePritchett/images.jpg
  Reading image: ./images/reference_images/JoePritchett/Joe_Pritchett.jpg
  Reading image: ./images/reference_images/JoePritchett/descarga.jpg
  Reading image: ./images/reference_images/JoePritchett/JoePritchett.png
Processing identity: MitchellPritchett
  Reading image: ./images/reference_images/MitchellPritchett/Jesse_Tyler_Fergeuson_Muppets_Most_Wanted_Premiere_(cropped).jpg
  Reading image: ./images/reference_images/MitchellPritchett/descarga.jpg
  Reading image: ./images/reference_images/MitchellPritchett/MitchellPritchett.png
Processing identity: HaleyDunphy
  Reading image: ./images/reference_images/HaleyDunphy/sarah-hyland-modern-family-1557405158.jpg
  More than 2 faces detected in image, The face with the highest confidence will be used: ./images/reference_images/HaleyDunphy/sarah-hyland-modern-family-1557405158.jpg
  Reading image: ./images/reference_images/HaleyDunphy/descarga (1).jpg
  Reading image: ./images/reference_images/HaleyDunphy/descarga.jpg
  Reading image: ./images/reference_images/HaleyDunphy/HaleyDunphy.png
Processing identity: PhilDunphy
  Reading image: ./images/reference_images/PhilDunphy/186708-4.jpg
  Reading image: ./images/reference_images/PhilDunphy/Phil-S11.jpg
  Reading image: ./images/reference_images/PhilDunphy/1503333585_649717_1503333744_noticia_normal.jpg
  Reading image: ./images/reference_images/PhilDunphy/PhilDunphy.png
Processing identity: JoaquinAmat
  Reading image: ./images/reference_images/JoaquinAmat/joaquin_amat.jpg
Processing identity: MannyDelgado
  Reading image: ./images/reference_images/MannyDelgado/Manny-S11.jpg
  Reading image: ./images/reference_images/MannyDelgado/descarga (2).jpg
  Reading image: ./images/reference_images/MannyDelgado/58.jpg
  Reading image: ./images/reference_images/MannyDelgado/modern_family_manny_delgado_1_201215_glv63ple3v.jpeg
  Reading image: ./images/reference_images/MannyDelgado/fca262d92ce831635d991deeb61fed1b.png
  Reading image: ./images/reference_images/MannyDelgado/MannyDelgado.png
Processing identity: LilyTucker-Pritchett
  Reading image: ./images/reference_images/LilyTucker-Pritchett/descarga (1).jpg
  More than 2 faces detected in image, The face with the highest confidence will be used: ./images/reference_images/LilyTucker-Pritchett/descarga (1).jpg
  Reading image: ./images/reference_images/LilyTucker-Pritchett/descarga (2).jpg
  Reading image: ./images/reference_images/LilyTucker-Pritchett/descarga.jpg
  Reading image: ./images/reference_images/LilyTucker-Pritchett/LilyTucker-Pritchett.png
Processing identity: ClaireDunphy
  Reading image: ./images/reference_images/ClaireDunphy/descarga (1).jpg
  Reading image: ./images/reference_images/ClaireDunphy/frantic-claire-modern-family-s11e17.jpg
  More than 2 faces detected in image, The face with the highest confidence will be used: ./images/reference_images/ClaireDunphy/frantic-claire-modern-family-s11e17.jpg
  Reading image: ./images/reference_images/ClaireDunphy/ClaireDunphy.png
Processing identity: GloriaPritchett
  Reading image: ./images/reference_images/GloriaPritchett/46623-1532336916.jpg
  Reading image: ./images/reference_images/GloriaPritchett/descarga.jpg
  Reading image: ./images/reference_images/GloriaPritchett/GloriaPritchett.png
Processing identity: LukeDunphy
  Reading image: ./images/reference_images/LukeDunphy/descarga (1).jpg
  Reading image: ./images/reference_images/LukeDunphy/descarga (3).jpg
  Reading image: ./images/reference_images/LukeDunphy/descarga.jpg
  Reading image: ./images/reference_images/LukeDunphy/LukeDunphy.png
-------------------
ReferenceEmbeddings
-------------------
Number of identities: 13
Number of images per identity: {'JayPritchett': 2, 'AlexDunphy': 3, 'CameronTucker': 2, 'JoePritchett': 5, 'MitchellPritchett': 3, 'HaleyDunphy': 4, 'PhilDunphy': 4, 'JoaquinAmat': 1, 'MannyDelgado': 6, 'LilyTucker-Pritchett': 4, 'ClaireDunphy': 3, 'GloriaPritchett': 3, 'LukeDunphy': 4}
Source folder: ./images/reference_images
Save path: None
Device: None
Minimum face size: 20
Detection thresholds: [0.6, 0.7, 0.7]
Minimum confidence for detection: 0.5
Verbose: True

Once the reference embeddings have been created, they can be loaded into the FaceRecognizer instance using the load_reference_embeddings method. Then, the identify_faces method can be used to recognize faces in new images from their embeddings.

# Load reference embeddings into the face_detector
# ==============================================================================
face_detector.load_reference_embeddings(reference_embeddings)
# Detect and recognize faces in an image
# ==============================================================================
identities, similarities = face_detector.identify_faces(embeddings=embeddings)
print(f"Identities: {identities}")
print(f"Similarities: {similarities}")
----------------
Identified faces
----------------
Face 0: Identity: JayPritchett, Similarity: 0.69
Face 1: Identity: PhilDunphy, Similarity: 0.92
Face 2: Identity: CameronTucker, Similarity: 0.80
Face 3: Identity: MannyDelgado, Similarity: 0.87
Face 4: Identity: HaleyDunphy, Similarity: 0.75
Face 5: Identity: ClaireDunphy, Similarity: 0.78
Face 6: Identity: MitchellPritchett, Similarity: 0.61
Face 7: Identity: AlexDunphy, Similarity: 0.62
Face 8: Identity: GloriaPritchett, Similarity: 0.68
Face 9: Identity: LukeDunphy, Similarity: 0.79
Face 10: Identity: LilyTucker-Pritchett, Similarity: 0.80
Face 11: Identity: JoePritchett, Similarity: 0.77
Identities: ['JayPritchett', 'PhilDunphy', 'CameronTucker', 'MannyDelgado', 'HaleyDunphy', 'ClaireDunphy', 'MitchellPritchett', 'AlexDunphy', 'GloriaPritchett', 'LukeDunphy', 'LilyTucker-Pritchett', 'JoePritchett']
Similarities: [0.6871325969696045, 0.9191560745239258, 0.7987002730369568, 0.874779462814331, 0.746529757976532, 0.7806506752967834, 0.6057980060577393, 0.6189405918121338, 0.6765539646148682, 0.7890835404396057, 0.7986713647842407, 0.7654215693473816]

It is also possible to detect, identify, and plot the bounding boxes of the recognized faces in a single step using the detect_and_identify_faces method.

face_detector.detect_and_identify_faces(image=image_2)
----------------
Scanned image
----------------
Detected faces: 12
Detected faces with minimum confidence: 12
Bounding box correction applied: True
Bounding box coordinates: [[293, 64, 402, 194], [505, 89, 605, 224], [108, 95, 210, 227], [427, 207, 529, 333], [47, 235, 145, 361], [1069, 134, 1165, 262], [682, 126, 778, 248], [659, 291, 750, 402], [886, 128, 968, 250], [239, 245, 326, 355], [931, 496, 1012, 613], [816, 663, 889, 751]]
Bounding box confidence: [0.9999438524246216, 0.9982789754867554, 0.999267041683197, 0.9998809099197388, 0.9999357461929321, 0.9999068975448608, 0.9999818801879883, 0.9996474981307983, 0.9995021820068359, 0.9993504881858826, 0.9991905093193054, 0.9989155530929565]

----------------
Identified faces
----------------
Face 0: Identity: JayPritchett, Similarity: 0.68
Face 1: Identity: PhilDunphy, Similarity: 0.92
Face 2: Identity: CameronTucker, Similarity: 0.80
Face 3: Identity: MannyDelgado, Similarity: 0.88
Face 4: Identity: HaleyDunphy, Similarity: 0.74
Face 5: Identity: ClaireDunphy, Similarity: 0.77
Face 6: Identity: MitchellPritchett, Similarity: 0.61
Face 7: Identity: AlexDunphy, Similarity: 0.63
Face 8: Identity: GloriaPritchett, Similarity: 0.67
Face 9: Identity: LukeDunphy, Similarity: 0.79
Face 10: Identity: LilyTucker-Pritchett, Similarity: 0.80
Face 11: Identity: JoePritchett, Similarity: 0.76

Pipeline for face detection and recognition

All the steps described above can be combined into a single pipeline for face recognition in images, videos, or real-time video streams.

Pipeline for face recognition in images

# Create reference embeddings from a folder of images
# ==============================================================================
reference_embeddings = ReferenceEmbeddings(
    folder_path ='./images/reference_images',
    verbose     = False
)

reference_embeddings.calculate_reference_embeddings()
reference_embeddings
-------------------
ReferenceEmbeddings
-------------------
Number of identities: 13
Number of images per identity: {'JayPritchett': 2, 'AlexDunphy': 3, 'CameronTucker': 2, 'JoePritchett': 5, 'MitchellPritchett': 3, 'HaleyDunphy': 4, 'PhilDunphy': 4, 'JoaquinAmat': 1, 'MannyDelgado': 6, 'LilyTucker-Pritchett': 4, 'ClaireDunphy': 3, 'GloriaPritchett': 3, 'LukeDunphy': 4}
Source folder: ./images/reference_images
Save path: None
Device: None
Minimum face size: 20
Detection thresholds: [0.6, 0.7, 0.7]
Minimum confidence for detection: 0.5
Verbose: False
# Load reference embeddings into the face detector
# ==============================================================================
face_detector.load_reference_embeddings(reference_embeddings)
# Detect and recognize faces in an image
# ==============================================================================
face_detector.detect_and_identify_faces(image=image_2)
----------------
Scanned image
----------------
Detected faces: 12
Detected faces with minimum confidence: 12
Bounding box correction applied: True
Bounding box coordinates: [[293, 64, 402, 194], [505, 89, 605, 224], [108, 95, 210, 227], [427, 207, 529, 333], [47, 235, 145, 361], [1069, 134, 1165, 262], [682, 126, 778, 248], [659, 291, 750, 402], [886, 128, 968, 250], [239, 245, 326, 355], [931, 496, 1012, 613], [816, 663, 889, 751]]
Bounding box confidence: [0.9999438524246216, 0.9982789754867554, 0.999267041683197, 0.9998809099197388, 0.9999357461929321, 0.9999068975448608, 0.9999818801879883, 0.9996474981307983, 0.9995021820068359, 0.9993504881858826, 0.9991905093193054, 0.9989155530929565]

----------------
Identified faces
----------------
Face 0: Identity: JayPritchett, Similarity: 0.68
Face 1: Identity: PhilDunphy, Similarity: 0.92
Face 2: Identity: CameronTucker, Similarity: 0.80
Face 3: Identity: MannyDelgado, Similarity: 0.88
Face 4: Identity: HaleyDunphy, Similarity: 0.74
Face 5: Identity: ClaireDunphy, Similarity: 0.77
Face 6: Identity: MitchellPritchett, Similarity: 0.61
Face 7: Identity: AlexDunphy, Similarity: 0.63
Face 8: Identity: GloriaPritchett, Similarity: 0.67
Face 9: Identity: LukeDunphy, Similarity: 0.79
Face 10: Identity: LilyTucker-Pritchett, Similarity: 0.80
Face 11: Identity: JoePritchett, Similarity: 0.76

Pipeline for face recognition in videos

Processing a video means handling every one of its frames, which makes it computationally intensive, so using a GPU is recommended. The video used for this example can be downloaded from the following link.

# Detect and recognize faces in a video
# ==============================================================================
face_detector.detect_and_identify_faces_video(
    video_path  = './videos/video_modern_family.mp4',
    output_path = './videos/output_test.mp4'
)

Pipeline for face recognition in webcam

The argument capture_index specifies which camera to use (0 for the default camera, 1 for an external camera, etc.). The argument skip_frames allows skipping a certain number of frames between each processing step to improve performance. The argument show indicates whether to display the video with the detected and identified faces in real-time.

# Detect and recognize faces in webcam (real-time streaming)
# ==============================================================================
face_detector.detect_and_identify_faces_webcam(
    capture_index = 0, # change to 1 or 2 if you have an external webcam
    skip_frames   = 2,
    show          = True
)
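
The effect of skip_frames can be illustrated with a small, library-independent sketch. It assumes that one frame is analyzed and the next skip_frames frames reuse that result. The exact behavior of detect_and_identify_faces_webcam may differ, and `process_stream` and `fake_detect` are hypothetical names.

```python
def process_stream(frames, detect, skip_frames=2):
    """Run `detect` on one frame, then reuse its result for the next `skip_frames` frames."""
    results = []
    last_result = None
    for index, frame in enumerate(frames):
        if index % (skip_frames + 1) == 0:
            last_result = detect(frame)  # expensive step: detection and recognition
        results.append(last_result)      # skipped frames reuse the latest result
    return results

# Fake detector that records which frames it was actually called on.
calls = []

def fake_detect(frame):
    calls.append(frame)
    return f"faces@{frame}"

# Seven fake frames, represented by their index.
results = process_stream(range(7), fake_detect, skip_frames=2)
print(calls)    # [0, 3, 6]
print(results)  # ['faces@0', 'faces@0', 'faces@0', 'faces@3', 'faces@3', 'faces@3', 'faces@6']
```

With skip_frames = 2 in this sketch, only one frame in three goes through the detector, which reduces the computational load at the cost of slightly stale bounding boxes.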

Session Information

import session_info
session_info.show(html=False)
-----
PIL                 12.0.0
cv2                 4.12.0
matplotlib          3.10.7
numpy               2.2.6
openfacekit         0.2.0
scipy               1.16.3
session_info        v1.0.1
torch               2.7.1+cu126
-----
IPython             9.6.0
jupyter_client      8.6.3
jupyter_core        5.9.1
-----
Python 3.12.12 | packaged by conda-forge | (main, Oct 22 2025, 23:25:55) [GCC 14.3.0]
Linux-6.14.0-34-generic-x86_64-with-glibc2.39
-----
Session information updated at 2025-11-04 11:25

How to Cite

How to cite this document?

If you use this document or any part of it, we appreciate your citation. Thank you very much!

Face detection and recognition with deep learning and python by Joaquín Amat Rodrigo, available under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) license at https://cienciadedatos.net/documentos/py34-face-detection-and-recognition-python.html

Creative Commons Licence

This document created by Joaquín Amat Rodrigo is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.

You are allowed to:

  • Share: copy and redistribute the material in any medium or format.

  • Adapt: remix, transform, and build upon the material.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  • Non-Commercial: You may not use the material for commercial purposes.

  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.