【XAI Method】About Grad-CAM
1. What is CAM?
Class Activation Mapping (CAM) methods are techniques used in computer vision to visualize which parts of an image are important for a convolutional neural network's (CNN) prediction.
The original CAM, however, requires the model to have a Global Average Pooling (GAP) layer in its architecture, and this constraint has kept the method from being used more widely.
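Concretely, in the original CAM formulation the map for class $c$ is a weighted sum of the last convolutional feature maps $A^k$, where the weights $w^c_k$ are the classifier weights that follow the GAP layer:

$$M_c = \sum_k w^c_k A^k$$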
2. Grad-CAM
Grad-CAM is a popular CNN visualization method that removes this constraint.
Grad-CAM uses, as a heat map, the feature maps of a chosen convolutional layer weighted by their globally average-pooled gradients. Gradients are used as weights based on the reasoning that the parts of the image that are important for the prediction also receive larger partial derivatives (recent research has shown that this is not necessarily the case).
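Written out, with $y^c$ the score for class $c$, $A^k$ the $k$-th feature map of the chosen layer, and $Z$ the number of spatial positions, Grad-CAM computes

$$\alpha^c_k = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A^k_{ij}}, \qquad L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_k \alpha^c_k A^k\right)$$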
3. Implementation
・Apply Grad-CAM
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import cv2
# Load a pre-trained model
model = models.resnet50(pretrained=True)
model.eval()
# Define the image preprocessing function
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# Load and preprocess the input image
def load_image(image_path):
    img = Image.open(image_path).convert("RGB")  # Drop alpha channel / handle grayscale inputs
    img = preprocess(img)
    img = img.unsqueeze(0)  # Add batch dimension
    return img
# Grad-CAM implementation
class GradCAM:
    def __init__(self, model, target_layer):
        self.model = model
        self.target_layer = target_layer
        self.gradients = None
        self.activations = None
        self.hook_layers()

    def hook_layers(self):
        def backward_hook(module, grad_in, grad_out):
            # Gradient of the class score w.r.t. the target layer's output
            self.gradients = grad_out[0]

        def forward_hook(module, input, output):
            # Feature maps produced by the target layer
            self.activations = output

        self.target_layer.register_forward_hook(forward_hook)
        self.target_layer.register_full_backward_hook(backward_hook)

    def generate_cam(self, input_image, target_class=None):
        output = self.model(input_image)
        if target_class is None:
            target_class = output.argmax().item()
        self.model.zero_grad()
        output[:, target_class].backward()
        gradients = self.gradients[0].detach().cpu().numpy()
        activations = self.activations[0].detach().cpu().numpy()
        # Channel weights: global average pooling of the gradients
        weights = np.mean(gradients, axis=(1, 2))
        cam = np.zeros(activations.shape[1:], dtype=np.float32)
        for i, w in enumerate(weights):
            cam += w * activations[i]
        cam = np.maximum(cam, 0)  # ReLU: keep only positively contributing regions
        cam = cv2.resize(cam, (224, 224))
        cam = cam - np.min(cam)
        cam = cam / (np.max(cam) + 1e-8)  # Normalize to [0, 1]
        return cam
# Example usage
if __name__ == "__main__":
    image_path = "/kaggle/input/a-simple-dog/dog.png"
    input_image = load_image(image_path)
    target_layer = model.layer4[2].conv3  # Last convolution of the final residual block
    grad_cam = GradCAM(model, target_layer)
    cam_mask = grad_cam.generate_cam(input_image)

    # Load the original image for visualization
    img = cv2.imread(image_path, 1)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert for matplotlib
    img = cv2.resize(img, (224, 224))
    img = np.float32(img) / 255

    # Overlay the Grad-CAM heat map on the image and add a color bar
    plt.imshow(img)
    plt.axis('off')
    plt.imshow(cam_mask, cmap='jet', alpha=0.5)
    plt.colorbar()
    plt.show()
・Output
It seems that the ResNet was able to capture the features of the input image (a dog).
Methods like this are helpful for understanding and interpreting model behavior.
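Because generate_cam accepts a target class, the maps for different classes can also be compared on the same input to see how class-discriminative the explanation is. Below is a minimal sketch that continues from the script above; the ImageNet indices 207 ("golden retriever") and 281 ("tabby cat") are only illustrative choices, not part of the original code.

# Compare Grad-CAM maps for two different ImageNet classes on the same input
cam_dog = grad_cam.generate_cam(input_image, target_class=207)  # illustrative: "golden retriever"
cam_cat = grad_cam.generate_cam(input_image, target_class=281)  # illustrative: "tabby cat"
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, cam, title in zip(axes, [cam_dog, cam_cat], ["class 207", "class 281"]):
    ax.imshow(img)                        # original image, RGB in [0, 1]
    ax.imshow(cam, cmap='jet', alpha=0.5)
    ax.set_title(title)
    ax.axis('off')
plt.show()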
4. Other CAM methods
There are various other CAM methods besides Grad-CAM. I will explain the details in another article.
- Score-CAM
- Eigen-CAM
- Ablation-CAM
- Layer-CAM
Class activation mapping can be used not only to provide a basis for the model's judgments but also to improve models. Other papers have proposed methods that use saliency maps for data augmentation.
When using class activation mapping to improve a model, Eigen-CAM and Ablation CAM are used. Eigen-CAM is used to check whether the backbone feature map is a good feature, and Ablation CAM is used to check whether the fully connected layer is learning well.
Quote (translated from [1])
In this way, it is possible to improve a model by combining various methods.
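As one illustration of the Eigen-CAM idea mentioned above, here is a minimal sketch (my own simplification, not an official implementation): the map is essentially the dominant spatial pattern (first singular vector) of the target layer's activations, so it needs no gradients or class scores. It reuses the activations captured by the GradCAM forward hook after a forward pass.

# Eigen-CAM sketch: summarize the feature maps by their first singular vector
def eigen_cam(activations):
    # activations: (C, H, W) numpy array of feature maps from the target layer
    C, H, W = activations.shape
    flat = activations.reshape(C, H * W)             # one row per channel
    _, _, vT = np.linalg.svd(flat, full_matrices=False)
    cam = vT[0].reshape(H, W)                        # dominant spatial pattern (first right-singular vector)
    if cam.sum() < 0:                                # SVD sign is arbitrary; keep the dominant part positive
        cam = -cam
    cam = np.maximum(cam, 0)
    cam = cv2.resize(cam, (224, 224))
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Example: reuse the activations stored by the GradCAM hooks above
eigen_mask = eigen_cam(grad_cam.activations[0].detach().cpu().numpy())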