
【XAI Method】About Grad-CAM

Published on 2024/07/22

1. What is CAM?

Class Activation Mapping (CAM) methods are techniques used in computer vision to visualize which parts of an image are important for a convolutional neural network's (CNN) prediction.

The original CAM requires the model to have a Global Average Pooling (GAP) layer in its architecture. This restriction is why the method is not widely used.
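To make the GAP requirement concrete, here is a minimal NumPy sketch of how the original CAM heat map is computed, assuming a network whose last convolutional layer feeds GAP and then a single fully connected layer. The function name `compute_cam`, the array shapes, and the random values are illustrative assumptions, not taken from any particular model.

```python
import numpy as np

def compute_cam(feature_maps, fc_weights, target_class):
    """Original CAM: weight each feature map by the FC weight of the target class.

    feature_maps: (K, H, W) activations of the last conv layer
    fc_weights:   (num_classes, K) weights of the FC layer that follows GAP
    """
    # Weighted sum over the K channels -> (H, W) heat map
    return np.tensordot(fc_weights[target_class], feature_maps, axes=1)

rng = np.random.default_rng(0)
feature_maps = rng.random((8, 7, 7)).astype(np.float32)       # hypothetical: K=8 channels, 7x7 maps
fc_weights = rng.standard_normal((3, 8)).astype(np.float32)   # hypothetical: 3 classes

cam = compute_cam(feature_maps, fc_weights, target_class=1)

# With GAP followed by a linear layer (bias omitted), the class score
# equals the spatial mean of the CAM for that class.
gap = feature_maps.mean(axis=(1, 2))
score = fc_weights[1] @ gap
print(cam.shape, np.isclose(cam.mean(), score, atol=1e-4))
```

The heat map reuses the classifier's own weights, which is why this approach only works when the classifier is exactly GAP followed by a linear layer.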

2. Grad-CAM

Grad-CAM is a popular CNN visualization method that solves the above problem: it does not require a GAP layer in the model.

Grad-CAM uses, as its heat map, the feature maps of a chosen layer weighted by the global-average-pooled gradients of the class score. Gradients serve as weights on the assumption that the regions important to the prediction also have larger partial derivatives (recent research has shown that this is not necessarily the case).
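In the notation of the original Grad-CAM paper, with $A^k$ the $k$-th feature map of the target layer, $y^c$ the score for class $c$ before the softmax, and $Z$ the number of spatial positions in a feature map, the weighting described above can be written as:

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c A^k \right)
```

The ReLU keeps only the regions that contribute positively to class $c$.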

3. Implementation

・Apply Grad-CAM

import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import cv2

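# Load a pre-trained model (the `pretrained=True` argument is deprecated in torchvision >= 0.13)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)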
model.eval()

# Define the image preprocessing function
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load and preprocess the input image
def load_image(image_path):
    # PNG files may be RGBA or grayscale; the model expects 3 channels
    img = Image.open(image_path).convert("RGB")
    img = preprocess(img)
    img = img.unsqueeze(0)  # Add batch dimension
    return img

# Grad-CAM implementation
class GradCAM:
    def __init__(self, model, target_layer):
        self.model = model
        self.target_layer = target_layer
        self.gradients = None
        self.activations = None
        
        self.hook_layers()
    
    def hook_layers(self):
        def backward_hook(module, grad_in, grad_out):
            self.gradients = grad_out[0]
        
        def forward_hook(module, input, output):
            self.activations = output
        
        self.target_layer.register_forward_hook(forward_hook)
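        # register_backward_hook is deprecated; the full hook reports gradients w.r.t. the layer output
        self.target_layer.register_full_backward_hook(backward_hook)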
    
    def generate_cam(self, input_image, target_class=None):
        output = self.model(input_image)
        
        if target_class is None:
            target_class = output.argmax().item()
        
        self.model.zero_grad()
        output[:, target_class].backward()
        
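        # Drop the batch dimension: both arrays have shape (C, H, W)
        gradients = self.gradients[0].detach().cpu().numpy()
        activations = self.activations[0].detach().cpu().numpy()

        # Channel weights: global-average-pool the gradients over H and W
        weights = np.mean(gradients, axis=(1, 2))

        # Weighted sum of the feature maps over channels -> (H, W)
        cam = np.tensordot(weights, activations, axes=1).astype(np.float32)

        cam = np.maximum(cam, 0)  # ReLU: keep positive evidence only
        cam = cv2.resize(cam, (224, 224))
        # Normalize to [0, 1]; the epsilon avoids division by zero for an all-zero map
        cam = cam - np.min(cam)
        cam = cam / (np.max(cam) + 1e-8)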
        
        return cam

# Example usage
if __name__ == "__main__":
    image_path = "/kaggle/input/a-simple-dog/dog.png"
    input_image = load_image(image_path)
    
    target_layer = model.layer4[2].conv3
    grad_cam = GradCAM(model, target_layer)
    cam_mask = grad_cam.generate_cam(input_image)
    
    # Load the original image for visualization
    img = cv2.imread(image_path, 1)  # OpenCV loads images as BGR
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # matplotlib expects RGB
    img = cv2.resize(img, (224, 224))
    img = np.float32(img) / 255
    
    # Display the original image
    plt.imshow(img)
    plt.axis('off')

    # Overlay the Grad-CAM heat map and add a color bar for it
    plt.imshow(cam_mask, cmap='jet', alpha=0.5)
    plt.colorbar()
    
    plt.show()

・Output

The ResNet appears to have captured the features of the input image (a dog).
Methods like this are helpful for understanding and interpreting model behavior.

Other CAM methods

There are various other CAM methods besides Grad-CAM. I will explain the details in another article.

  • ScoreCAM
  • Eigen-CAM
  • Ablation-CAM
  • Layer-CAM

Class activation mapping can be used not only to explain a model's decisions but also to improve the model. Other papers have proposed methods that use saliency maps for data augmentation.

When class activation mapping is used to improve a model, Eigen-CAM and Ablation-CAM are the methods applied. Eigen-CAM checks whether the backbone's feature maps are good features, and Ablation-CAM checks whether the fully connected layer is learning well.
Quote (translated): [1]
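As a rough illustration of why Eigen-CAM can inspect backbone features without any class score or gradient, the sketch below projects a layer's activations onto their first principal component. It is a minimal NumPy sketch under assumptions: the function name `eigen_cam` and the random activations are hypothetical, and the mean-centering step is one common implementation choice rather than something stated in the quoted article.

```python
import numpy as np

def eigen_cam(activations):
    """Project (K, H, W) activations onto their first principal component."""
    k, h, w = activations.shape
    # Rows are spatial positions, columns are channels: (H*W, K)
    flat = activations.reshape(k, -1).T
    # Center each channel (an implementation choice, not from the source article)
    flat = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions in channel space
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    projection = flat @ vt[0]
    return projection.reshape(h, w)

rng = np.random.default_rng(0)
activations = rng.random((16, 7, 7)).astype(np.float32)  # hypothetical feature maps
saliency = eigen_cam(activations)
print(saliency.shape)
```

The sign of a singular vector is arbitrary, so the resulting map may need sign handling and normalization before it is displayed as a heat map.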

In this way, it is possible to improve a model by combining various methods.

Reference

[1] Grad-CAMだけじゃない画像認識におけるCAM手法を徹底解説 (in Japanese; roughly "Not Just Grad-CAM: A Thorough Guide to CAM Methods in Image Recognition")
