
【Kaggle】ISIC2024 240th Solution


In this post, I explain my 240th-place solution for the Kaggle ISIC2024 competition.
I got a bronze medal in this competition.

1. Overview

Source code: https://github.com/yuto-m12/Kaggle_ISIC2024

・Ensemble of a tabular model and an image model in a 1:1 ratio
・Used a public notebook as the tabular model
・Used a ViT as the image model

# 1:1 blend of the tabular and image model predictions
df_subm['target'] = (df_table['target'].to_numpy() * 0.5
                     + df_vit_sub['target'].to_numpy() * 0.5)
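The blend above combines the two prediction arrays by position, so it relies on `df_table` and `df_vit_sub` listing the images in the same order. As a minimal sketch under that concern, here is an ID-aligned 1:1 blend using plain dicts keyed by image ID; `blend_by_id` and the sample IDs are hypothetical and not part of the original code.

```python
def blend_by_id(table_preds, image_preds, w_table=0.5, w_image=0.5):
    """Blend two {image_id: score} mappings with fixed weights, matching by ID.

    Raises instead of silently mixing scores that belong to different images.
    """
    if table_preds.keys() != image_preds.keys():
        mismatched = sorted(table_preds.keys() ^ image_preds.keys())
        raise ValueError(f"ID mismatch, e.g. {mismatched[:5]}")
    return {
        image_id: w_table * table_preds[image_id] + w_image * image_preds[image_id]
        for image_id in table_preds
    }


table = {"ISIC_a": 0.2, "ISIC_b": 0.8}
vit = {"ISIC_b": 0.6, "ISIC_a": 0.4}  # same IDs, different order
blended = blend_by_id(table, vit)
print(blended)
```

Matching by key means a row-order difference between the two submission files cannot shift scores onto the wrong image.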

2. Image model

This section describes the image model.

2.1 Configuration

model: "maxvit_rmlp_pico_rw_256.sw_in1k"
batch_size: 128
max_epoch: 9
n_folds: 5
optimizer: optim.AdamW
scheduler: OneCycleLR
lr: 1.0e-04
weight_decay: 1.0e-02
img_size: 256
interpolation: cv2.INTER_LINEAR
CV: StratifiedGroupKFold grouped by "patient_id"
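StratifiedGroupKFold (scikit-learn) keeps all images of one patient in a single fold while trying to keep the positive rate similar across folds, which matters when only 0.098% of images are positive. Whatever splitter produces the folds, a quick check of the resulting assignment is cheap. A minimal plain-Python sketch; the helper names and the toy data are hypothetical.

```python
from collections import defaultdict


def check_no_patient_leakage(patient_ids, fold_ids):
    """Return True if every patient's images land in exactly one fold."""
    folds_per_patient = defaultdict(set)
    for pid, fold in zip(patient_ids, fold_ids):
        folds_per_patient[pid].add(fold)
    return all(len(folds) == 1 for folds in folds_per_patient.values())


def positive_rate_per_fold(targets, fold_ids):
    """Fraction of positive targets in each fold, sorted by fold index."""
    counts = defaultdict(lambda: [0, 0])  # fold -> [positives, total]
    for t, fold in zip(targets, fold_ids):
        counts[fold][0] += t
        counts[fold][1] += 1
    return {fold: pos / total for fold, (pos, total) in sorted(counts.items())}


patients = ["p1", "p1", "p2", "p3", "p3", "p4"]
targets = [0, 1, 0, 0, 0, 1]
good_folds = [0, 0, 1, 0, 0, 1]
bad_folds = [0, 1, 1, 0, 0, 1]  # p1 is split across folds 0 and 1

print(check_no_patient_leakage(patients, good_folds), check_no_patient_leakage(patients, bad_folds))
print(positive_rate_per_fold(targets, good_folds))
```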

2.2 Model

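self.model = timm.create_model(
    ...  # remaining arguments omitted
)
dim = CFG.output_dim_models[CFG.model_name]  # feature dimension of the backbone
self.dropout = nn.ModuleList([
    nn.Dropout(0.5) for i in range(5)  # 5 parallel dropout layers (multi-sample dropout)
])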
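A list of five `nn.Dropout(0.5)` layers is the usual setup for multi-sample dropout: the same linear head is applied to several differently dropped-out copies of the backbone features and the outputs are averaged. The forward pass is not shown above, so this is a sketch of that common pattern, assuming averaging; it uses NumPy instead of PyTorch to stay self-contained, and the function names and shapes are hypothetical.

```python
import numpy as np


def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale the rest by 1/(1-p)."""
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def multi_sample_dropout_logits(features, W, b, p=0.5, n_samples=5, training=True, rng=None):
    """Average a shared linear head over n_samples independent dropout masks."""
    if not training:
        # Dropout is the identity at inference, so every sample would be identical.
        return features @ W + b
    rng = rng if rng is not None else np.random.default_rng()
    logits = [dropout(features, p, rng) @ W + b for _ in range(n_samples)]
    return np.mean(logits, axis=0)


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # 4 pooled feature vectors (dim=8 here)
W = rng.normal(size=(8, 1))         # shared head, 1 output logit
b = np.zeros(1)

train_out = multi_sample_dropout_logits(features, W, b, rng=rng)
eval_out = multi_sample_dropout_logits(features, W, b, training=False)
print(train_out.shape, eval_out.shape)
```

Averaging over several masks gives a lower-variance training signal per step than a single dropout mask, at the small cost of running the head several times.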

2.3 What worked well

2.3.1 2-stage learning

I used 2-stage learning, and it was the most effective method I tried.
・First stage
Trained on all of the past ISIC data from the great dataset by @tomooinubushi.
・Second stage
Continued training on the ISIC2024 dataset only.
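The key point of 2-stage learning is that the second stage starts from the first-stage weights rather than from scratch. A toy sketch of that idea, with NumPy logistic regression on synthetic data standing in for the image model and the real datasets; all data, names, and hyperparameters here are hypothetical.

```python
import numpy as np


def logistic_loss(w, X, y):
    # Mean binary cross-entropy written as log(1 + e^z) - y*z for numerical stability.
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)


def train(w, X, y, lr=0.05, steps=200):
    # Full-batch gradient descent; returns new weights and leaves the input untouched.
    w = w.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w


rng = np.random.default_rng(0)

# Stage 1 data: a larger synthetic pool standing in for the past ISIC images.
X_past = rng.normal(size=(2000, 3))
y_past = (X_past @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=2000) > 0).astype(float)

# Stage 2 data: a smaller synthetic set with a shifted rule, standing in for ISIC2024.
X_new = rng.normal(size=(300, 3))
y_new = (X_new @ np.array([1.2, -0.8, 0.0]) + rng.normal(scale=0.5, size=300) > 0).astype(float)

w_stage1 = train(np.zeros(3), X_past, y_past)        # stage 1: all past data
w_stage2 = train(w_stage1, X_new, y_new, steps=100)  # stage 2: start from stage-1 weights

print(logistic_loss(w_stage1, X_new, y_new), logistic_loss(w_stage2, X_new, y_new))
```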

2.3.2 Augmentations

I picked augmentations that looked promising from past ISIC2020 competition solutions and from my own experiments.
Because there are very few positive samples (only 0.098% positive), I thought a wide variety of augmentations would help prevent overfitting.

import albumentations as A

# argument names follow albumentations 1.x (var_limit, max_holes, min_holes)
augmentations_train = A.Compose([
    A.ColorJitter(brightness=0.2, contrast=0.2, p=0.3),
    A.OneOf([  # assumed A.OneOf: apply one of the blurs with p=0.2
        A.MotionBlur(blur_limit=5, p=0.5),
        A.MedianBlur(blur_limit=5, p=0.5),
        A.GaussianBlur(blur_limit=5, p=0.5),
    ], p=0.2),
    A.GaussNoise(var_limit=(5.0, 30.0), p=0.1),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, border_mode=0, p=0.3),
    A.CoarseDropout(max_holes=20, min_holes=10, p=0.3),
    A.Resize(CFG.img_size, CFG.img_size),
])
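Each transform fires independently with its own `p`, so the per-image augmentation load can be worked out directly. This sketch assumes the three blurs are wrapped in `A.OneOf`, which applies exactly one child with the container's `p=0.2`; `A.Resize` always runs and is left out.

```python
from math import prod

# Top-level probabilities of the stochastic transforms in augmentations_train.
aug_probs = {
    "ColorJitter": 0.3,
    "OneOf(blur)": 0.2,
    "GaussNoise": 0.1,
    "ShiftScaleRotate": 0.3,
    "CoarseDropout": 0.3,
}

p_untouched = prod(1 - p for p in aug_probs.values())  # no stochastic transform fires
expected_n = sum(aug_probs.values())                    # mean number of transforms per image
print(f"untouched: {p_untouched:.3f}, expected transforms: {expected_n:.1f}")
# → untouched: 0.247, expected transforms: 1.2
```

So roughly a quarter of the training images reach the model with only resizing, and the rest see about one or two perturbations each.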

2.3.3 Standardization

Instead of A.Normalize, I applied min-max scaling to [0, 255] after augmentation (called "standardization" in my config). It boosted the CV score a lot.

# A.Normalize(
#     mean=[0.485, 0.456, 0.406],
#     std=[0.229, 0.224, 0.225],
#     max_pixel_value=255.0,
#     p=1.0
# ),

x, t = batch  # images, targets
if CFG.standardization:
    # rescale the whole batch to [0, 255] using the batch-wide min and max
    x = (x - x.min()) / (x.max() - x.min() + 1e-6) * 255
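Note that `x.min()` and `x.max()` reduce over the entire batch tensor, so all images in a batch share one scale and relative brightness between images is preserved. The sketch below reproduces that arithmetic in NumPy and contrasts it with a per-image variant, which is a hypothetical alternative and not what this solution used (in PyTorch the per-image reductions would be `amin`/`amax` with `dim=(1, 2, 3), keepdim=True`). The images are synthetic.

```python
import numpy as np


def batch_minmax_to_255(x, eps=1e-6):
    # Same arithmetic as the CFG.standardization snippet: one global min/max per batch.
    return (x - x.min()) / (x.max() - x.min() + eps) * 255


def per_image_minmax_to_255(x, eps=1e-6):
    # Hypothetical alternative (not used here): min/max per image of a (B, C, H, W) batch.
    lo = x.min(axis=(1, 2, 3), keepdims=True)
    hi = x.max(axis=(1, 2, 3), keepdims=True)
    return (x - lo) / (hi - lo + eps) * 255


rng = np.random.default_rng(0)
dark = rng.uniform(0, 50, size=(1, 3, 8, 8))       # synthetic dark image
bright = rng.uniform(150, 255, size=(1, 3, 8, 8))  # synthetic bright image
batch = np.concatenate([dark, bright])             # shape (2, 3, 8, 8)

print(batch_minmax_to_255(batch)[0].max())      # dark image stays well below 255
print(per_image_minmax_to_255(batch)[0].max())  # dark image is stretched to about 255
```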

2.3.4 HSV input

I fed the model both RGB and HSV channels, giving a 6-channel input.

import torch

def F_rgb2hsv(rgb: torch.Tensor) -> torch.Tensor:
    """Convert a (B, 3, H, W) RGB tensor to HSV.

    H and S are in [0, 1]; V keeps the scale of the input (per-pixel channel max).
    """
    cmax, cmax_idx = torch.max(rgb, dim=1, keepdim=True)  # channel holding the max (0=R, 1=G, 2=B)
    cmin = torch.min(rgb, dim=1, keepdim=True)[0]
    delta = cmax - cmin
    hsv_h = torch.empty_like(rgb[:, 0:1, :, :])
    cmax_idx[delta == 0] = 3  # gray pixels: hue is undefined
    hsv_h[cmax_idx == 0] = (((rgb[:, 1:2] - rgb[:, 2:3]) / delta) % 6)[cmax_idx == 0]
    hsv_h[cmax_idx == 1] = (((rgb[:, 2:3] - rgb[:, 0:1]) / delta) + 2)[cmax_idx == 1]
    hsv_h[cmax_idx == 2] = (((rgb[:, 0:1] - rgb[:, 1:2]) / delta) + 4)[cmax_idx == 2]
    hsv_h[cmax_idx == 3] = 0.
    hsv_h /= 6.
    hsv_s = torch.where(cmax == 0, torch.tensor(0.).type_as(rgb), delta / cmax)
    hsv_v = cmax
    return torch.cat([hsv_h, hsv_s, hsv_v], dim=1)
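To sanity-check the hue, saturation, and value formulas in `F_rgb2hsv`, here is a scalar version of the same branches compared against Python's standard-library `colorsys.rgb_to_hsv` on a few pixels. The test pixels avoid ties between channels, where the choice of "max channel" can differ between implementations; `rgb_to_hsv_scalar` is a hypothetical helper written for this check.

```python
import colorsys


def rgb_to_hsv_scalar(r, g, b):
    """Scalar version of the same hue/saturation/value formulas as F_rgb2hsv."""
    cmax, cmin = max(r, g, b), min(r, g, b)
    delta = cmax - cmin
    if delta == 0:
        h = 0.0  # gray pixel: hue undefined, set to 0 (the cmax_idx == 3 case)
    elif cmax == r:
        h = ((g - b) / delta) % 6
    elif cmax == g:
        h = (b - r) / delta + 2
    else:
        h = (r - g) / delta + 4
    s = 0.0 if cmax == 0 else delta / cmax
    return h / 6.0, s, cmax


for rgb in [(0.9, 0.2, 0.1), (0.1, 0.8, 0.3), (0.2, 0.3, 0.7), (0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]:
    ours = rgb_to_hsv_scalar(*rgb)
    ref = colorsys.rgb_to_hsv(*rgb)
    assert all(abs(u - v) < 1e-9 for u, v in zip(ours, ref)), (rgb, ours, ref)
print("scalar HSV matches colorsys")
```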

2.3.5 Changing Dataset

Towards the end of training (after 60% of the epochs), I switched to training data without augmentation.
Given the risk of overfitting, this step may not have been necessary. For competitions with plenty of positive data, switching to a cleaner dataset midway should act like a fine-grained form of 2-stage learning.

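# use `and`: with `&`, Python parses `(CFG.change_dataset & epoch) >= ...` because `&` binds tighter than `>=`
if CFG.change_dataset and epoch >= ((CFG.max_epoch + 1) * 0.6):
    train_loader = train_loader_noaugment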
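With `max_epoch: 9` the switch threshold is `(9 + 1) * 0.6 = 6.0`. This sketch lists which epochs would use the non-augmented loader, assuming epochs are counted from 0 to `max_epoch - 1` (the loop itself is not shown), and demonstrates the `&` precedence pitfall. `uses_clean_loader` is a hypothetical helper.

```python
MAX_EPOCH = 9          # from the config above
SWITCH_FRACTION = 0.6  # switch after 60% of training

threshold = (MAX_EPOCH + 1) * SWITCH_FRACTION  # 6.0


def uses_clean_loader(epoch: int, change_dataset: bool = True) -> bool:
    # `and` is required: `change_dataset & epoch >= threshold` parses as
    # `(change_dataset & epoch) >= threshold`, i.e. `(epoch & 1) >= 6.0`.
    return change_dataset and epoch >= threshold


# Assumption: epochs run from 0 to MAX_EPOCH - 1.
schedule = {epoch: uses_clean_loader(epoch) for epoch in range(MAX_EPOCH)}
print([e for e, clean in schedule.items() if clean])  # → [6, 7, 8]
print(True & 7 >= threshold)                          # → False
```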

2.3.6 TTA (test-time augmentation)

TTA didn't boost my CV or LB (the scores were almost the same with and without it), but I used it to improve the model's generalization, and it worked well on the private leaderboard!
Concretely, I blended the normal predictions with predictions on the same images after applying the training-time augmentations at inference time.

TTA_rate = {'None':0.7, 'with_train_aug':0.3}

# get_dataloader
val_transform_TTA, val_transform = get_transforms()
val_df = train_meta[train_meta["fold"] == fold_id]

# plain validation transform
val_dataset = ISICDataset(df=val_df, fp_hdf=CFG.TRAIN_HDF5_COMBINED, transform=val_transform)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=CFG.batch_size, num_workers=4, shuffle=False, drop_last=False)

# same images with the training-time augmentations
val_dataset_TTA = ISICDataset(df=val_df, fp_hdf=CFG.TRAIN_HDF5_COMBINED, transform=val_transform_TTA)
val_loader_TTA = torch.utils.data.DataLoader(val_dataset_TTA, batch_size=CFG.batch_size, num_workers=4, shuffle=False, drop_last=False)

# prediction
oof_pred_arr_merged = (oof_pred_arr * CFG.TTA_rate['None']) + (oof_pred_arr_TTA * CFG.TTA_rate['with_train_aug'])
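The merge is a weighted average whose weights come from `TTA_rate`. A small NumPy sketch of the same blend that also checks the weights cover every view and sum to 1; `blend_tta` and the prediction values are hypothetical.

```python
import numpy as np

TTA_rate = {"None": 0.7, "with_train_aug": 0.3}


def blend_tta(preds_by_view, weights):
    """Weighted average of predictions from several inference views.

    'None' is the plain validation transform, 'with_train_aug' the
    training-time augmentations applied at inference.
    """
    if preds_by_view.keys() != weights.keys():
        raise ValueError("every view needs exactly one weight")
    total = sum(weights.values())
    if not np.isclose(total, 1.0):
        raise ValueError(f"weights should sum to 1, got {total}")
    return sum(weights[k] * np.asarray(preds_by_view[k]) for k in weights)


oof_pred_arr = np.array([0.10, 0.90, 0.40])
oof_pred_arr_TTA = np.array([0.30, 0.50, 0.40])
merged = blend_tta({"None": oof_pred_arr, "with_train_aug": oof_pred_arr_TTA}, TTA_rate)
print(merged)
```

Because the augmented view is random, its predictions (and therefore the merged score) vary slightly with the random seed.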

2.3.7 Removal of anomalous data

Following a discussion on the competition forum, I dropped these images from the training data:

ids_to_drop = ['ISIC_0573025', 'ISIC_1443812', 'ISIC_5374420', 'ISIC_2611119', 'ISIC_2691718', 'ISIC_9689783', 'ISIC_9520696', 'ISIC_8651165', 'ISIC_9385142', 'ISIC_9680590', 'ISIC_2346081']
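In pandas this filter is typically `train_meta[~train_meta["isic_id"].isin(ids_to_drop)]`, assuming the image ID column is named `isic_id`. The standard-library sketch below does the same with a set lookup and also reports requested IDs that were not found, which usually points to a typo or the wrong split. `drop_ids` and the sample records are hypothetical.

```python
def drop_ids(records, ids_to_drop, id_key="isic_id"):
    """Return (kept records, requested IDs that were not present)."""
    drop = set(ids_to_drop)  # O(1) membership checks
    kept = [r for r in records if r[id_key] not in drop]
    missing = drop - {r[id_key] for r in records}
    return kept, missing


records = [
    {"isic_id": "ISIC_0573025", "target": 0},
    {"isic_id": "ISIC_0000001", "target": 0},  # hypothetical ID
]
kept, missing = drop_ids(records, ["ISIC_0573025", "ISIC_1443812"])
print([r["isic_id"] for r in kept], missing)
```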

3. Environment

I used my RTX 4070 Ti SUPER for the whole competition.
Near the end, I also tried vast.ai and runpod.io. Both are good: vast.ai is low-cost, and runpod.io is scalable and easy to use.

4. What didn't work for me


5. Summary

I used a ViT image model and a tabular model as the components of the ensemble.
I also took great care to avoid overfitting throughout the competition.

Finally, well done to all the participants!
