
A Gentle Introduction to Face-Sim: Evaluating Face Similarity in Image and Video Generation


👇️ You can also listen to this article as a podcast:
https://youtu.be/g0PZ6xTrYQQ

Hello everyone! 🐰 here.

Recently, AI image and video generation technologies have evolved rapidly, making the task of generating human faces particularly important. However, objectively evaluating "how similar the generated face is to the original face" is a difficult problem.

So today, I will explain Face-Sim, a metric for evaluating face similarity. Face-Sim is a powerful tool that leverages the latest face recognition technology to quantitatively measure the similarity between a generated face image and the original face image.

By reading this article, you should gain a solid understanding of everything from the basic concepts to the implementation methods of Face-Sim, which are useful for evaluating image and video generation models. Let's dive right in!


Background

In the fields of image and video generation, evaluating the "quality" of generated results is extremely important. Generally, metrics like FID (Fréchet Inception Distance) and LPIPS (Learned Perceptual Image Patch Similarity) are used for image generation evaluation, but these are not necessarily specialized for "face" similarity.

Especially in face generation, humans are very sensitive to subtle differences in faces, so general image evaluation metrics are often insufficient. For example, when generating a person's face with different movements or expressions, it is required that the facial identity is preserved while the expression or angle changes.

Against this background, Face-Sim was developed as an evaluation metric specialized for faces, and it plays an important role in the research and application development of image and video generation.

What is Face-Sim?

Face-Sim Concept

Face-Sim is a metric for quantitatively evaluating the similarity between face images. In the literature it is also referred to as CSIM, short for cosine similarity, after the way the score is computed. The metric is primarily used to evaluate whether the original face image and the generated face image show the same person.

Features of Face-Sim

Face-Sim has the following features:

  1. Specialized for Facial Identity: Face-Sim is optimized for evaluating facial identity. Even if the pose or expression of the face changes, it can accurately evaluate whether it is the same person's face.

  2. Interpretable Score Range: Face-Sim is a cosine similarity, so in principle it ranges from -1 to 1, but for face embeddings it typically falls between 0 and 1, where a value closer to 1 indicates higher facial similarity. A score of 0.5 or higher is often treated as the same person, though the actual threshold is adjusted depending on the application.

  3. Deep Learning-Based: Face-Sim uses state-of-the-art face recognition models like ArcFace to extract features and calculates the cosine similarity in a high-dimensional feature space.

  4. Correlation with Human Perception: Face-Sim scores are designed to have a high correlation with human face recognition ability; pairs of faces that humans judge as "the same person" show high Face-Sim scores.

Main Uses of Face-Sim

Face-Sim is utilized in scenarios such as:

  • Evaluation of Face Swap Technology: Quality evaluation of face replacement (face swapping).
  • Verification of Identity Preservation: Checking if the original person's characteristics are maintained after style or age transformation.
  • Consistency Evaluation of Video Generation: Measuring facial consistency within a generated video.
  • Identity Verification Across Different Poses/Expressions: Evaluating whether it is the same person despite different poses or expressions.

Because Face-Sim measures similarity based on the essential features of the face rather than just the visual similarity of the image, it has become a very important metric in the evaluation of image generation technology.

Technical Mechanism of Face-Sim


Let's take a closer look at the technical mechanism behind how Face-Sim evaluates facial similarity.

Step 1: Face Detection and Alignment

First, the face region is detected from the image being evaluated. At this stage, high-precision face detection models such as RetinaFace are used. The detected face undergoes the following pre-processing:

  • Face Region Cropping: Removing excess background around the face.
  • Face Alignment: Aligning the face based on facial landmarks (feature points) such as the eyes and mouth.
  • Size Unification: Resizing to a fixed size, typically 112x112 pixels.
  • Normalization: Normalizing pixel values to an appropriate range (e.g., -1 to 1).

Through these pre-processing steps, face images under various conditions (brightness, angle, size, etc.) can be handled uniformly.
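
To make the alignment step concrete, here is a minimal NumPy sketch that estimates the similarity transform (scale, rotation, translation) mapping detected landmarks onto the commonly used 5-point template for 112x112 ArcFace crops, using Umeyama's method. This is an illustrative sketch: the "detected" landmarks below are made-up values, and in practice libraries such as InsightFace ship their own alignment code and apply the resulting matrix with cv2.warpAffine.

```python
import numpy as np

# Widely used 5-point landmark template for 112x112 ArcFace crops
# (left eye, right eye, nose tip, left/right mouth corner)
ARCFACE_TEMPLATE = np.array([
    [38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
    [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float64)

def estimate_similarity_transform(src, dst):
    """Estimate a 2x3 similarity transform (scale, rotation, translation)
    mapping src landmarks onto dst landmarks (Umeyama's method)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt                            # optimal rotation
    scale = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]]) # 2x3 affine matrix

# Hypothetical landmarks "detected" in a larger photo: the same face,
# 3x bigger and shifted
detected = ARCFACE_TEMPLATE * 3.0 + np.array([40.0, 25.0])
M = estimate_similarity_transform(detected, ARCFACE_TEMPLATE)
aligned = detected @ M[:, :2].T + M[:, 2]
print(np.allclose(aligned, ARCFACE_TEMPLATE, atol=1e-6))  # True
```

The same 2x3 matrix is what you would hand to cv2.warpAffine to produce the aligned 112x112 crop.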

Step 2: Feature Extraction (ArcFace)

The pre-processed face image is fed into a face recognition model such as ArcFace (Additive Angular Margin Loss for Deep Face Recognition). ArcFace is a model with the following characteristics:

  • ResNet-50/100 Based Architecture: A deep convolutional neural network.
  • Angular Margin Loss Function: Converts facial features into highly discriminative representations.
  • 512-Dimensional Feature Vector Output: Represents facial features as a 512-dimensional vector.

The ArcFace model is pre-trained on massive face datasets (millions of images), enabling robust feature extraction across various factors such as race, age, and lighting conditions.

Step 3: Cosine Similarity Calculation

The Face-Sim score is obtained by calculating the cosine similarity between the 512-dimensional feature vectors extracted from two face images:

import numpy as np

def calculate_face_sim(embedding1, embedding2):
    # L2 normalization (unify vector lengths to 1)
    embedding1 = embedding1 / np.linalg.norm(embedding1)
    embedding2 = embedding2 / np.linalg.norm(embedding2)
    
    # Cosine similarity: dot product of the unit vectors
    similarity = np.dot(embedding1, embedding2)
    
    return similarity  # In [-1, 1]; values near 1 mean very similar faces

Cosine similarity measures similarity by calculating the cosine of the angle between two vectors. A value closer to 1 means the directions of the two vectors are closer (i.e., the faces are more similar).

ArcFace Details

ArcFace is the core technology of Face-Sim and has the following characteristics:

  • Angular Margin: By adding an angular margin to the standard softmax loss, it improves the separation performance of each class (person) in the feature space.
  • Feature Normalization: By projecting feature vectors onto a hypersphere, similarity is judged based on direction rather than vector length.
  • High Accuracy: Achieves a high accuracy of 99.83% on the LFW (Labeled Faces in the Wild) dataset, a standard benchmark for face recognition.

In this way, Face-Sim leverages ArcFace's powerful feature extraction capabilities to capture the essential characteristics of a face, achieving robust similarity evaluation even across changes in expression and angle.
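
To make the angular margin concrete, here is a small NumPy sketch of the ArcFace logit computation (not the full training loop): features and class weight vectors are L2-normalized, the margin m is added to the angle between each sample and its ground-truth class, and the result is converted back to a cosine and scaled by s. All data below is toy random data for illustration.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """ArcFace-style logits: cosine similarities to each class, with an
    additive angular margin m applied to the ground-truth class, scaled by s."""
    # Project features and class weights onto the unit hypersphere
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    W = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = emb @ W.T                               # plain cosine logits
    theta = np.arccos(np.clip(cos, -1.0, 1.0))    # angle to each class center
    rows = np.arange(len(labels))
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta[rows, labels] + m)  # add the margin
    return s * cos, s * logits                    # (plain, with margin)

# Toy data: 4 samples, 10 identity classes, 512-dim features
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 512))
W = rng.normal(size=(10, 512))
labels = np.array([0, 3, 3, 7])

plain, margin = arcface_logits(emb, W, labels)
rows = np.arange(4)
# The margin lowers the ground-truth logit, so training has to pull samples
# closer to their class center to compensate -> tighter identity clusters
print(np.all(margin[rows, labels] < plain[rows, labels]))  # True
```

This penalized logit is what gets fed into a standard softmax cross-entropy during training; at inference time only the plain cosine similarity between embeddings is used, which is exactly what Face-Sim computes.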

How to Implement Face-Sim

Face-Sim Implementation and Usage Flow

I will explain how to actually implement and use Face-Sim. Let's look at an implementation example using Python.

Required Libraries and Tools

The following libraries are required to implement Face-Sim:

# Install required libraries
pip install numpy opencv-python insightface onnxruntime

  • NumPy: A library for numerical computation.
  • OpenCV: A library for image processing.
  • InsightFace: A library providing face recognition models such as ArcFace.
  • ONNX Runtime: An inference engine for the models.

Basic Implementation Code

Below is an example implementation of Face-Sim using InsightFace:

import cv2
import numpy as np
import insightface
from insightface.app import FaceAnalysis

# Initialize FaceAnalysis (class for face detection and feature extraction)
face_analyzer = FaceAnalysis(name='buffalo_l', root='./models')
face_analyzer.prepare(ctx_id=0, det_size=(640, 640))

def extract_face_embedding(image_path):
    """Function to extract face feature vectors from an image"""
    # Load image
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Could not read image: {image_path}")
    
    # BGR -> RGB conversion (InsightFace assumes RGB)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    # Face detection and feature extraction
    faces = face_analyzer.get(img)
    
    if len(faces) == 0:
        raise ValueError("No face detected")
    
    # If multiple faces are present, use the one with the widest
    # bounding box (assumed to be the main subject)
    faces = sorted(faces, key=lambda x: x.bbox[2] - x.bbox[0], reverse=True)
    embedding = faces[0].embedding
    
    return embedding

def calculate_face_sim(embedding1, embedding2):
    """Calculate Face-Sim (cosine similarity) between two face feature vectors"""
    # May already be normalized, but normalize just in case
    embedding1 = embedding1 / np.linalg.norm(embedding1)
    embedding2 = embedding2 / np.linalg.norm(embedding2)
    
    # Calculate cosine similarity
    similarity = np.dot(embedding1, embedding2)
    
    return similarity

# Usage Example
def compare_faces(image1_path, image2_path):
    """Calculate the facial similarity of two images"""
    # Feature extraction
    embedding1 = extract_face_embedding(image1_path)
    embedding2 = extract_face_embedding(image2_path)
    
    # Face-Sim calculation
    sim_score = calculate_face_sim(embedding1, embedding2)
    
    return sim_score

# Execution Example
if __name__ == "__main__":
    # Paths to the original and generated images
    original_image = "original_face.jpg"
    generated_image = "generated_face.jpg"
    
    # Calculate similarity
    similarity = compare_faces(original_image, generated_image)
    
    print(f"Face-Sim Score: {similarity:.4f}")
    
    # Judgment by threshold (e.g., considered same person if 0.5 or higher)
    threshold = 0.5
    if similarity >= threshold:
        print("Determined to be the same person")
    else:
        print("Determined to be a different person")

Adjusting Parameters

When actually using Face-Sim, you can adjust the following parameters to configure the settings according to the evaluation accuracy and purpose:

  1. Similarity Threshold: The threshold score for determining whether it is the same person. It is usually adjusted in the range of 0.3 to 0.7 depending on the application.
  2. Detection Size: The image size used during face detection. Larger sizes allow for the detection of smaller faces but increase processing time.
  3. Model Selection: You can choose from models of different sizes, such as buffalo_l, buffalo_s, or buffalo_m.
  4. Pre-processing Method: You can customize the face alignment and resizing methods.

Efficiency through Batch Processing

When evaluating a large number of image pairs, it is also possible to improve efficiency through batch processing:

def batch_compare_faces(original_images, generated_images):
    """Batch process multiple image pairs"""
    results = []
    
    # Perform feature extraction all at once
    original_embeddings = [extract_face_embedding(img) for img in original_images]
    generated_embeddings = [extract_face_embedding(img) for img in generated_images]
    
    # Similarity calculation
    for orig_emb, gen_emb in zip(original_embeddings, generated_embeddings):
        sim = calculate_face_sim(orig_emb, gen_emb)
        results.append(sim)
    
    return results

Notes on Implementation

  • Face Detection Accuracy: If the face cannot be detected accurately, the precision of the similarity score will decrease. Use images with sufficient resolution and brightness.
  • Multiple Faces: If there are multiple faces in a single image, you need to clarify which face is the target for evaluation.
  • Computational Cost: Consider using a GPU when processing a large volume of images.
  • Consistency in Pre-processing: It is important to apply the same pre-processing to both the original and generated images.
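
On the multiple-faces point, the evaluation target has to be made explicit. The snippet below sketches one common disambiguation rule, choosing the detection whose bounding box center is closest to the image center; picking the widest box, as in the earlier example, is another. The bounding boxes here are made-up values for illustration.

```python
def pick_target_face(bboxes, img_w, img_h):
    """Given bounding boxes (x1, y1, x2, y2) of all detected faces,
    return the index of the face closest to the image center."""
    cx, cy = img_w / 2, img_h / 2
    def center_dist(b):
        bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return (bx - cx) ** 2 + (by - cy) ** 2
    return min(range(len(bboxes)), key=lambda i: center_dist(bboxes[i]))

# Hypothetical detections in a 640x480 image
faces = [(10, 10, 60, 60), (300, 200, 380, 300), (500, 40, 540, 90)]
print(pick_target_face(faces, 640, 480))  # 1 (the most central face)
```

Whichever rule you choose, the important thing is to apply it consistently to both the original and the generated image.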

Through the methods above, you can implement Face-Sim and utilize it for the quality evaluation of generated face images. Next, let's look at actual use cases.

Actual Use Cases

Face-Sim is used for evaluating various image and video generation tasks. Let's look at specific examples.

Evaluation of Face Swap Technology

In face swap technology, one person's face is replaced with another person's face. Face-Sim is used in evaluating this technology from the following perspectives:

  1. Evaluation of Target Preservation: How much the generated face resembles the target (the person to be replaced).
  2. Evaluation of Consistency: Facial consistency between consecutive frames in a video.

In actual research cases, Face-Sim is used to compare the performance of face generation models such as DaGAN, FOMM, and LIA. These models are evaluated as achieving high-quality face swapping when the Face-Sim score is 0.8 or higher.

Evaluation of Age Transformation

In age transformation technology (rejuvenation or aging simulation), Face-Sim is used from the following perspectives:

  1. Identity Preservation: Whether the original person's characteristics are maintained even after the age is changed.
  2. Evaluation of Naturalness: Whether the age transformation looks natural.

For example, a certain AI aging application judges that identity is appropriately preserved when the Face-Sim score is 0.7 or higher.

Consistency Evaluation of Video Generation

In AI video generation, facial consistency between consecutive frames is important. Using Face-Sim, evaluations such as the following are performed:

  1. Inter-frame Consistency: Facial similarity between consecutive frames.
  2. Consistency with Reference Image: Similarity between the original reference image and the face in each generated frame.

For example, a certain study employs the following evaluation method:

def evaluate_video_consistency(video_frames, reference_image):
    # Feature extraction from the reference image
    ref_embedding = extract_face_embedding(reference_image)
    
    # Consistency evaluation for each frame
    frame_scores = []
    for frame in video_frames:
        frame_embedding = extract_face_embedding(frame)
        score = calculate_face_sim(ref_embedding, frame_embedding)
        frame_scores.append(score)
    
    # Calculate average score and standard deviation (a measure of variability)
    avg_score = np.mean(frame_scores)
    std_score = np.std(frame_scores)
    
    return {
        "average_similarity": avg_score,
        "consistency": 1 - std_score  # Smaller standard deviation means higher consistency
    }

With this approach, consistency scores for videos can be evaluated in the range of 0 to 1.

Examples of Threshold Settings for Commercial Use

In actual commercial applications, threshold settings like the following are observed:

  • Identity Verification Systems: Judged as the person if 0.6 or higher (emphasizing security).
  • Face Swap Apps: Judged as high-quality similarity if 0.5 or higher (emphasizing user experience).
  • Video Generation: Targeting inter-frame consistency of 0.7 or higher (emphasizing high quality).
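
Since no universal threshold exists, a practical approach is to calibrate it on a validation set of same-person (genuine) and different-person (impostor) pairs. The sketch below uses synthetic score distributions purely for illustration and picks the threshold that maximizes balanced verification accuracy:

```python
import numpy as np

def best_threshold(genuine, impostor):
    """Pick the similarity threshold maximizing balanced accuracy over
    genuine (same-person) and impostor (different-person) score samples."""
    candidates = np.sort(np.concatenate([genuine, impostor]))
    best_t, best_acc = 0.5, 0.0
    for t in candidates:
        tpr = np.mean(genuine >= t)   # same-person pairs accepted
        tnr = np.mean(impostor < t)   # different-person pairs rejected
        acc = 0.5 * (tpr + tnr)       # balanced accuracy
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Synthetic Face-Sim scores for illustration only (not real model output)
rng = np.random.default_rng(42)
genuine = np.clip(rng.normal(0.72, 0.08, 1000), -1, 1)
impostor = np.clip(rng.normal(0.25, 0.10, 1000), -1, 1)
t, acc = best_threshold(genuine, impostor)
print(f"threshold={t:.3f}, balanced accuracy={acc:.3f}")
```

On real data you would also inspect the false-accept and false-reject rates separately, since security-oriented and entertainment-oriented applications weigh those errors differently.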

Research Case: "A Comprehensive Framework for Evaluating Deepfake Generators"

At ICCV (International Conference on Computer Vision) 2023, research proposing an evaluation framework for deepfake generation technology was presented. This study proposes a framework that combines multiple evaluation metrics, including Face-Sim (CSIM).

The evaluation results reported that the latest face generation models achieve scores around 0.85 in Face-Sim, showing a high correlation with human perceptual evaluation.

In this way, Face-Sim is widely utilized in the fields of face image and video generation, playing an important role in quantitatively evaluating the evolution of technology.

Comparison with Other Face Similarity Metrics

Comparison of Face Similarity Metrics

Face-Sim is not the only metric for evaluating face similarity. Let's examine the characteristics of Face-Sim while comparing it with other representative image evaluation metrics.

SSIM (Structural Similarity Index Measure)

SSIM is a metric used to evaluate the structural similarity of images:

  • Calculation Method: Calculates similarity based on three elements: luminance, contrast, and structure.
  • Features: Based on comparison at the pixel level.
  • Pros: Relatively simple to calculate and widely used.
  • Cons: Vulnerable to changes in pose and expression; unsuitable for facial identity evaluation.

SSIM differs from Face-Sim in the following ways:

  • While Face-Sim is an evaluation based on the essential features of a face, SSIM evaluates the visual similarity of the image appearance.
  • Face-Sim can accurately evaluate the same person even if the pose or expression changes, whereas SSIM is sensitive to changes in pixel positions.
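
The pixel-shift sensitivity is easy to demonstrate. Below is a simplified single-window SSIM in NumPy (real SSIM averages the statistic over local windows, as in skimage.metrics.structural_similarity): shifting an image by a few pixels leaves the content identical but collapses the score, whereas a face embedding would be nearly unchanged.

```python
import numpy as np

def global_ssim(x, y, L=255.0):
    """Simplified SSIM computed over the whole image as a single window
    (the real metric averages this statistic over local windows)."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # standard stability constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(64, 64))
shifted = np.roll(img, 3, axis=1)   # same content, shifted 3 pixels

print(round(global_ssim(img, img), 3))      # identical images -> 1.0
print(global_ssim(img, shifted) < 0.1)      # pixel shift tanks the score
```

A face recognition embedding, by contrast, is computed after detection and alignment, so a small translation of the face barely moves the Face-Sim score.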

LPIPS (Learned Perceptual Image Patch Similarity)

LPIPS is a perceptual similarity metric based on deep learning:

  • Calculation Method: Measures distance in the feature space of a pre-trained CNN (such as VGG).
  • Features: Similarity evaluation with a high correlation to human perception.
  • Pros: Excellent for evaluating textures and local features.
  • Cons: Not specialized for facial identity.

Main differences between LPIPS and Face-Sim:

  • While LPIPS evaluates the perceptual similarity of general images, Face-Sim is specialized for facial identity.
  • LPIPS is applicable to a wider range of image similarity tasks, but Face-Sim is more appropriate for evaluating facial identity preservation.

FID (Fréchet Inception Distance)

FID is a metric that evaluates the overall distribution of generated images:

  • Calculation Method: The Fréchet distance between the distributions of real and generated images in the feature space of InceptionNet.
  • Features: Used for evaluating an entire set of images.
  • Pros: Can evaluate the overall quality and diversity of a generative model.
  • Cons: Unsuitable for individual image pair evaluation; requires a large number of image samples.

Comparison between FID and Face-Sim:

  • While FID is used to evaluate an entire image set, Face-Sim is used to evaluate individual face image pairs.
  • FID is widely used for general image generation evaluation, but Face-Sim is more suitable for facial identity evaluation.

Guidelines for Selection

It is important to use these evaluation metrics selectively according to the purpose:

  1. Facial Identity Evaluation: Face-Sim (CSIM)
  2. Overall Image Structural Similarity: SSIM
  3. Similarity Evaluation Close to Human Perception: LPIPS
  4. Comprehensive Evaluation of Image Generative Models: FID

In actual research and application development, it is common to combine these metrics for a multi-faceted evaluation. For example:

  • Evaluation of face image generation: Face-Sim + LPIPS + FID
  • Evaluation of face video generation: Face-Sim + Temporal Consistency Metrics + FVD (Fréchet Video Distance)

In this way, Face-Sim maintains a complementary relationship with other evaluation metrics, playing a crucial role especially in evaluations regarding facial identity.

Limitations and Challenges of Face-Sim

While Face-Sim is a powerful face similarity evaluation metric, several limitations and challenges exist. It is important to understand these points when using it in practice.

1. Face Detection Dependency

Face-Sim is heavily dependent on the accuracy of face detection:

  • Small Faces or Low Resolution: Detection accuracy decreases when faces are small or the resolution is low.
  • Occlusions or Extreme Angles: Detection is difficult if part of the face is hidden or at an extreme angle.
  • Lighting Conditions: Extreme lighting conditions (too dark or too bright) affect detection accuracy.

For example, in cases like faces wearing masks or extreme angles (such as profiles), the face detection itself might fail, making it impossible to calculate Face-Sim.

2. Computational Cost and Implementation Complexity

Calculating Face-Sim requires relatively high computational resources:

  • Model Size: ArcFace models can exceed 100MB, resulting in high memory usage.
  • Computation Speed: While calculation is possible without a GPU, it becomes slow when processing a large number of images.
  • Implementation Complexity: Implementation requires expert knowledge for proper pre-processing and model selection.

Computational cost can be a challenge, especially in applications requiring real-time processing.

3. Support for Different Races and Ages

Depending on the training data of the ArcFace model, performance biases may occur for certain demographic groups:

  • Racial Bias: Biases in training data can lead to decreased accuracy for faces of specific races.
  • Age Impact: It is difficult to evaluate cases with large age differences (e.g., transformation from child to adult).
  • Gender Differences: There may be differences in how features are captured depending on gender.

While these issues have improved in the latest models through training on diverse datasets, they are not completely resolved.

4. Ignoring Background and Non-Facial Elements

Face-Sim focuses solely on facial features, meaning the following elements are not included in the evaluation:

  • Background: Similarity of the background is not evaluated.
  • Hairstyle: Changes in hairstyle are either not included or have minimal impact on the evaluation.
  • Accessories: Similarity of accessories such as glasses or hats is not considered.

While this is an advantage from the perspective of evaluating essential facial features, it is a limitation when you want to evaluate the similarity of the entire image.

5. Difficulty of Threshold Setting

The appropriate threshold for Face-Sim (the boundary value for determining the same person) varies by use case and can be difficult to set:

  • Security Use: Requires strict evaluation with a high threshold (0.7 or higher).
  • Entertainment Use: A moderate threshold (around 0.5) is preferred to prioritize user experience.
  • Research Use: An appropriate threshold must be set according to the target and purpose of the evaluation.

No standard threshold exists, and it must be determined through verification experiments for each application.

Case Study: Actual Edge Cases

It is known that Face-Sim performance decreases in the following situations:

  1. Distinguishing Twins: Distinguishing between very similar faces (such as twins) can be difficult.
  2. Extreme Makeup: When facial features are significantly altered by heavy makeup.
  3. Intentional Disguise: When features are intentionally changed through disguise or special effects makeup.
  4. Extreme Expressions: Highly exaggerated expressions may affect feature extraction.

It is important to utilize Face-Sim appropriately while understanding these limitations. In the next section, we will look at future directions and practical implications for Face-Sim.

Future Directions and Implications

Considering the current limitations of Face-Sim, let's look at future development directions and implications for practical use.

Technical Development Directions

Face-Sim technology is expected to evolve in the following directions:

1. Development of More Robust Models

  • Multimodal feature extraction: Evaluation by combining multiple features such as face shape, texture, and expression.
  • Utilization of self-supervised learning: Improving model generalization performance through training using unlabeled data.
  • Lightweight and high-precision models: Efficient model architectures that can operate even on edge devices.

For example, recent research has also proposed feature extraction models using Transformer-based architectures, which are becoming capable of capturing more complex facial features.

2. Improvement of Diversity and Fairness

  • More diverse datasets: Training with datasets encompassing various races, ages, and genders.
  • Introduction of fairness metrics: Evaluating performance fairness across different demographic groups.
  • Consideration of cultural context: Models that consider differences in face recognition and expression across cultures.

This direction is linked to the major challenge of AI fairness and is expected to become increasingly important in the future.

3. Integrated Evaluation Frameworks

  • Integration of multiple metrics: Comprehensive evaluation combining several metrics like Face-Sim, LPIPS, and FID.
  • Task-specific evaluation: Evaluation methods optimized for specific tasks such as face swapping, age transformation, and expression manipulation.
  • Improved alignment with human perception: Development of evaluation metrics that have a high correlation with human subjective evaluation.

In fact, several recent studies have proposed frameworks that combine multiple evaluation metrics.

Implications for Practical Application

Here is a summary of implications for effectively utilizing Face-Sim technology:

1. Selection of Appropriate Use Cases

Use cases where Face-Sim is particularly effective:

  • Face recognition systems: Spoofing detection and identity verification.
  • Evaluation of face generation models: Verification of facial identity preservation.
  • Facial consistency evaluation in videos: Verification of facial consistency within long-duration videos.

Use cases for which it is not suitable:

  • Evaluation of full-body images: Non-facial parts cannot be evaluated.
  • Image generation evaluation including backgrounds: Another metric is needed for background evaluation.
  • Lightweight apps requiring real-time processing: Computational cost constraints.

2. Optimization During Implementation

The following optimizations should be considered during practical implementation:

  • Model quantization: Optimization of model size and inference speed.
  • Batch processing: Improving efficiency through batch processing when handling large numbers of images.
  • Optimization of preprocessing pipelines: Efficient integration of face detection and feature extraction.
  • Threshold tuning: Setting appropriate thresholds based on the application.

# Model optimization example (using ONNX Runtime)
import onnxruntime as ort

# Acceleration settings
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.intra_op_num_threads = 4  # Specify the number of threads to use

# Optimized session
session = ort.InferenceSession("arcface_model.onnx", sess_options)

3. Ethical and Legal Considerations

The application of Face-Sim technology requires the following considerations:

  • Privacy considerations: Great care is needed in the handling of face data.
  • Obtaining consent: Explicit user consent regarding the use of face recognition technology.
  • Monitoring and mitigating bias: Ensuring fairness across different user groups.
  • Ensuring transparency: Appropriate information disclosure, including the limitations of the technology.

In parallel with the development and application of technology, sufficient consideration of these ethical and legal aspects is required.

4. Advice for Practitioners

Specific advice for practitioners utilizing Face-Sim technology:

  • Comparative verification of multiple models: Compare multiple implementations such as InsightFace, FaceNet, and DeepFace.
  • Appropriate data preprocessing: Optimization of face alignment and normalization methods.
  • Continuous monitoring: Continuously evaluate performance under different conditions.
  • Utilization of user feedback: Improvements incorporating actual user experiences.

By referring to these implications and effectively utilizing Face-Sim technology, you can contribute to improving the quality of face image and video generation applications.

Conclusion

In this article, we have explained the Face-Sim evaluation metric in detail. Face-Sim is a powerful tool for evaluating facial similarity in image and video generation, leveraging the latest face recognition technology to quantitatively measure how much facial identity is preserved.

Key Points of Face-Sim

  1. Definition and Principle: Face-Sim uses deep learning-based face recognition models (primarily ArcFace) to extract facial feature vectors and represents facial similarity as a value from 0 to 1 by calculating their cosine similarity.

  2. Technical Mechanism: Evaluation is performed through a flow of face detection and alignment → feature extraction (512-dimensional vector) → cosine similarity calculation.

  3. Implementation Method: It can be implemented relatively easily in Python using libraries such as InsightFace.

  4. Use Cases: It is utilized for evaluating various face generation tasks, such as face swapping, age transformation, and video generation consistency evaluation.

  5. Comparison with Other Metrics: A major characteristic is its specialization in facial identity evaluation compared to general image evaluation metrics like SSIM, LPIPS, and FID.

  6. Limitations and Challenges: While there are challenges such as dependency on face detection, computational cost, and handling diversity, continuous improvements are being made.

  7. Future Potential and Implications: Future developments are expected in areas such as developing more robust and diverse models and building integrated evaluation frameworks.

Summary Code: Simple Implementation Example

Finally, here is a simple summary of a basic implementation example of Face-Sim:

import cv2
import numpy as np
from insightface.app import FaceAnalysis

# Initialize model
app = FaceAnalysis(name='buffalo_l')
app.prepare(ctx_id=0)  # -1 for CPU, 0 or higher for GPU

def face_sim(img1_path, img2_path):
    # Load images
    img1 = cv2.imread(img1_path)
    img2 = cv2.imread(img2_path)
    
    # Convert to RGB
    img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2RGB)
    img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)
    
    # Face detection and feature extraction
    faces1 = app.get(img1)
    faces2 = app.get(img2)
    
    if len(faces1) == 0 or len(faces2) == 0:
        raise ValueError("Face not detected")
    
    # Get feature vectors
    embedding1 = faces1[0].embedding
    embedding2 = faces2[0].embedding
    
    # Calculate cosine similarity
    sim = np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))
    
    return sim

# Usage example
similarity = face_sim("original.jpg", "generated.jpg")
print(f"Face-Sim Score: {similarity:.4f}")

By utilizing the Face-Sim evaluation metric, you can objectively evaluate the quality of face generation technology and use it for improvements. Especially in the latest image and video generation technologies that need to capture subtle features of human faces, Face-Sim has become an indispensable tool.

Using this article as a reference, please try utilizing Face-Sim in your own face generation projects! 🐰
