
Solving Sample Problems from the International Olympiad in Artificial Intelligence (IOAI)


As the 1st International Olympiad in Artificial Intelligence (IOAI) is set to take place this year, I've decided to try solving some of the sample problems provided on the official website.

About IOAI

It is one of the international science olympiads, and this year marks its first edition, which will be held in Burgas, Bulgaria.

https://ioai-official.org/

The contest consists of two main parts: the Scientific Round and the Practical Round. In the Scientific Round, contestants solve problems provided as ipynb notebooks concerning machine learning and deep learning. In the Practical Round, contestants examine scientific problems using GUI applications, including ChatGPT.

It seems the former, the Scientific Round, is where Kaggle-like coding elements are required.

While Kaggle is the dominant name in competitive AI, I was curious to see what kind of questions IOAI would ask.

The official website listed three questions:

  1. NLP Task (Training a language model, re-implementing a paper)
  2. NLP Task (Removing bias from word embeddings)
  3. Image Task (Adversarial Attack) <- I'll be doing this one

The problem I'm working on can be found at the following Google Colab link. (There don't seem to be any official solutions.)

https://colab.research.google.com/drive/1yFzMkHsmnLPVPilrJo9oF5usEfGeXRii?usp=sharing

Task 1: Creating a CNN Model

Requirements:

  • Load the CIFAR-10 dataset.
  • Train a ResNet-18 model using the CIFAR-10 training set.
  • Evaluate the model's performance on the CIFAR-10 evaluation set.
  • Achieve an accuracy of over 80% on the evaluation set.
  • Do not modify the model architecture.

Only this code snippet is provided initially:

from torchvision.models import resnet18
net = resnet18(num_classes=10).cuda()

In a preliminary experiment, I tried training without data augmentation or a scheduler, but the performance was below 70%, so it seems I'll need to put in some proper effort.

I struggled for about three hours, but surprisingly, it just wouldn't cross the 80% mark. Why?

I eventually trained under the following conditions:

  • Data Augmentation
    • Crop, Flip, Rotation, Affine Transformation, Color Jitter, Normalization
  • Adam (lr: 1e-1) + OneCycleLR Scheduler
  • Batch Size: 512, Epochs: 100
  • NLLLoss

I finally passed 80%! (๑•̀ㅂ•́)و✧

Log
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 98/98 [00:19<00:00,  5.16it/s]
Epoch 100, Loss: 0.36403972488276815

Accuracy on the test set: 82.61%
Code
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import torch.nn.functional as F
from tqdm import tqdm

stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))

# Define transformations for the CIFAR-10 dataset
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),

    transforms.RandomHorizontalFlip(),                       # random left-right flip
    transforms.RandomRotation((-7, 7)),                      # rotate within ±7 degrees
    transforms.RandomAffine(0, shear=10, scale=(0.8, 1.2)),  # random shear and zoom
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # random color jitter
    transforms.ToTensor(),
    transforms.Normalize(*stats)
])
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(*stats)
])

# Load CIFAR-10 dataset
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=512, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=512, shuffle=False, num_workers=4)
net = models.resnet18(num_classes=10).cuda()
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # kept for the function signature; the loop uses the equivalent nll_loss on log-softmax
optimizer = optim.Adam(net.parameters(), lr=1e-1)
epochs = 100
sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, 1e-1, epochs=epochs,
                                                steps_per_epoch=len(train_loader))

def train_model(model, sched, train_loader, criterion, optimizer, num_epochs=10):
    for epoch in range(num_epochs):
        # re-enter train mode each epoch (evaluate_model switches to eval mode)
        model.train()
        running_loss = 0.0
        for images, labels in tqdm(train_loader):
            images, labels = images.cuda(), labels.cuda()
            outputs = model(images)
            # NLLLoss on log-softmax outputs (equivalent to cross-entropy)
            loss = F.nll_loss(F.log_softmax(outputs, dim=1), labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # OneCycleLR was configured with steps_per_epoch=len(train_loader),
            # so it must be stepped once per batch, not once per epoch
            sched.step()

            running_loss += loss.item()
        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')

        evaluate_model(model, test_loader)

def evaluate_model(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.cuda(), labels.cuda()
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f'Accuracy on the test set: {100 * correct / total}%')

# Train and evaluate the model
evaluate_model(net, test_loader)
train_model(net, sched, train_loader, criterion, optimizer, num_epochs=epochs)
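As an aside, the loss used here (NLLLoss applied to log-softmax outputs) is mathematically identical to cross-entropy on the raw logits, so nothing is lost by this choice. A quick self-contained check with dummy logits:

```python
import torch
import torch.nn.functional as F

# NLLLoss on log-softmax outputs vs. cross-entropy on raw logits
torch.manual_seed(0)
logits = torch.randn(4, 10)          # dummy batch of 4 samples, 10 classes
labels = torch.tensor([0, 3, 7, 9])

nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)
ce = F.cross_entropy(logits, labels)
print(torch.allclose(nll, ce))  # True
```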

Task 2: Creating Adversarial Examples

I set ε to [0.25, 1.0, 1.5] and report the accuracy on the evaluation set after generating adversarial examples. (ε is defined as the l2 distance between the original image and its adversarial example.)
The lower the accuracy, the better.

Reading the literature linked in the Colab, I decided to try the Fast Gradient Sign Method (FGSM). FGSM is the simplest adversarial attack: it adds noise to the sample in the direction of the sign of the loss gradient, i.e., the direction of gradient ascent.

https://adversarial-ml-tutorial.org/adversarial_examples/
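In symbols, FGSM perturbs an input x with label y by a single signed gradient step on the loss L of the model f:

```latex
\delta = \epsilon \cdot \mathrm{sign}\!\left( \nabla_x L(f(x), y) \right), \qquad x_{\mathrm{adv}} = x + \delta
```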

When I applied the tutorial's FGSM (eps=0.1) as-is, almost all samples were misclassified.

Before

After

The results were as follows. Remarkably, the accuracy is even worse than random guessing (10% for 10 classes).

Come to think of it, was it okay to apply this to normalized data? I'm uneasy about whether my handling of epsilon is correct. It might be wrong...
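One concrete source of that unease: FGSM's ε bounds the l∞ norm (each element moves by at most ε), while this task defines ε as an l2 distance. A full sign perturbation of ε per element on a 3×32×32 image has l2 norm ε·√D with D = 3072, so in l2 terms these perturbations are far larger than the nominal ε. A back-of-the-envelope check, assuming every element is perturbed by exactly ε:

```python
import math

eps = 0.25                        # nominal per-element (l-infinity) epsilon
D = 3 * 32 * 32                   # elements in one 3x32x32 CIFAR-10 image
l2_distance = eps * math.sqrt(D)  # l2 norm of a full sign perturbation
print(round(l2_distance, 2))      # ~13.86
```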

=== eps=0.25 ===
Accuracy on the test set: 7.06%
=== eps=1.0 ===
Accuracy on the test set: 6.33%
=== eps=1.5 ===
Accuracy on the test set: 7.02%
Code
def fgsm(model, X, y, epsilon):
    """ Construct FGSM adversarial examples on the examples X"""
    delta = torch.zeros_like(X, requires_grad=True)
    loss = nn.CrossEntropyLoss()(model(X + delta), y)
    loss.backward()
    return epsilon * delta.grad.detach().sign()

# Function to evaluate the model under FGSM attack
def evaluate_adversarial_model(model, test_loader, eps=0.01):
    model.eval()
    correct = 0
    total = 0
    # no torch.no_grad() here: fgsm() needs gradients w.r.t. the input
    for images, labels in test_loader:
        images, labels = images.cuda(), labels.cuda()
        delta = fgsm(model, images, labels, eps)

        outputs = model(images + delta)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print(f'=== {eps=} ===')
    print(f'Accuracy on the test set: {100 * correct / total}%')

evaluate_adversarial_model(net, test_loader, eps=0.25)
evaluate_adversarial_model(net, test_loader, eps=1.0)
evaluate_adversarial_model(net, test_loader, eps=1.5)
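Regarding the normalization question: the attack above operates in normalized space, where one unit corresponds to one standard deviation of the channel statistics, so an ε of 0.25 there is not 0.25 in raw pixel units. A small sketch of the conversion, assuming the CIFAR-10 stats used during training:

```python
# CIFAR-10 per-channel standard deviations (same stats as the training code)
stds = (0.2023, 0.1994, 0.2010)

eps_normalized = 0.25
# a step of eps in normalized space moves a pixel by eps * std in [0, 1] units
eps_pixel = [eps_normalized * s for s in stds]
print(eps_pixel)  # roughly 0.05 per channel, i.e. about 13/255
```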

Task 3: Creating Advanced Adversarial Examples

This task is a variation of the usual attack that adds uniform noise everywhere: here, small noise is applied to the center of the image and larger noise to the outer edges.

Requirements

  • ε = 2/255 for the center 16x16 pixels of the image
  • ε = 8/255 elsewhere
  • ε is defined by the l∞ distance.

I will create a coefficient mask using a matrix and multiply it by the sign.

import matplotlib.pyplot as plt

epsilon = 8/255
epsilon_mask = torch.ones((3, 32, 32)) * epsilon
epsilon_mask[:, 8:24, 8:24] /= 4  # center 16x16 becomes 2/255
plt.imshow(epsilon_mask[0])

The accuracy dropped from 82% to 61%.

Accuracy on the test set: 61.38%
Code
# Function to evaluate the model under the masked FGSM attack
def evaluate_adversarial_custom_model(model, test_loader, eps_mask):
    model.eval()
    correct = 0
    total = 0
    # no torch.no_grad() here: fgsm() needs gradients w.r.t. the input
    for images, labels in test_loader:
        images, labels = images.cuda(), labels.cuda()
        # take the raw gradient sign (eps=1), then scale it per pixel with the mask
        delta = fgsm(model, images, labels, 1) * eps_mask

        outputs = model(images + delta)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print(f'Accuracy on the test set: {100 * correct / total}%')

evaluate_adversarial_custom_model(net, test_loader, eps_mask=epsilon_mask.cuda())

Task 4: Attack on a Custom Model

WIP

Conclusion

If you have any questions, please feel free to contact me. I am not at all confident about the way I handled epsilon.

Also, it seems that IOAI is accepting applications until April 17th.

https://ioai-japan.notion.site/1-IOAI2024-7ee01907740141dcbcad2e3bc70b7210
