
【Kaggle】ISIC2024 240th Solution

Published on 2024/09/07

In this post, I explain my 240th-place solution for the Kaggle ISIC2024 competition.
I got a bronze medal in this competition.

1. Overview

Source code: https://github.com/yuto-m12/Kaggle_ISIC2024

・Ensemble of a tabular model and an image model in a 1:1 ratio
・Using a public notebook as the tabular model
・Using a ViT as the image model

# Blend the tabular-model and image-model predictions 50/50.
df_subm['target'] = ((df_table['target'].to_numpy() * 0.5) + \
                     (df_vit_sub["target"].to_numpy() * 0.5))

2. Image model

From here, I'll explain the image model.

2.1 Configuration

model: "maxvit_rmlp_pico_rw_256.sw_in1k"
batch_size: 128
max_epoch: 9
n_folds: 5
optimizer: optim.AdamW
scheduler: OneCycleLR
lr: 1.0e-04
weight_decay: 1.0e-02
img_size: 256
interpolation: cv2.INTER_LINEAR
CV: StratifiedGroupKFold grouped by "patient_id"
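
The CV split groups images by patient so that no patient leaks across folds. Below is a minimal sketch of building such a split with scikit-learn; the column names target, patient_id, and fold match the metadata used later in this post, while the helper add_folds and its arguments are my own assumption, not the repository's exact code.

import pandas as pd
from sklearn.model_selection import StratifiedGroupKFold

def add_folds(train_meta: pd.DataFrame, n_folds: int = 5, seed: int = 42) -> pd.DataFrame:
    # Stratify on the target and group by patient_id so one patient never spans two folds.
    sgkf = StratifiedGroupKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    train_meta = train_meta.copy()
    train_meta["fold"] = -1
    for fold_id, (_, val_idx) in enumerate(
            sgkf.split(train_meta, y=train_meta["target"], groups=train_meta["patient_id"])):
        train_meta.loc[train_meta.index[val_idx], "fold"] = fold_id
    return train_meta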

2.2 Model

# Inside the model's __init__:
self.model = timm.create_model(
    model_name=model_name,
    pretrained=pretrained,
    in_chans=in_channels,
    num_classes=num_classes,
    global_pool=''
)
dim = CFG.output_dim_models[CFG.model_name]
self.dropout = nn.ModuleList([
    nn.Dropout(0.5) for _ in range(5)
])
self.target = DynamicLinear(out_size=1)
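
The five Dropout layers suggest a multi-sample-dropout head on top of the timm backbone. The snippet below is only a sketch of how such a head is commonly wired up, with self.target standing in for the author's custom DynamicLinear; the pooling and averaging details are my assumption, not the exact forward() from the repository.

import torch
import torch.nn.functional as F

# Hypothetical forward pass (assumption, not the repository's exact code):
def forward(self, x):
    feat = self.model(x)                      # backbone features (global_pool='')
    if feat.ndim == 4:                        # (B, C, H, W) -> (B, C)
        feat = F.adaptive_avg_pool2d(feat, 1).flatten(1)
    # Average the head output over the five dropout masks (multi-sample dropout).
    out = torch.stack([self.target(drop(feat)) for drop in self.dropout]).mean(dim=0)
    return out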

2.3 What worked well

2.3.1 2-stage learning

I used 2-stage learning; this was the most effective method I used (a rough sketch follows the list below).
・First stage
Used all of the past ISIC data from this great dataset by @tomooinubushi.
・Second stage
Used only the ISIC2024 dataset.
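
In outline, the two stages share the same training loop and only swap the training data. The sketch below assumes hypothetical helpers build_loader and train_one_stage and dataframes df_all_isic and df_isic2024; the repository structures this differently.

# Stage 1: pretrain on all past ISIC data (the combined dataset by @tomooinubushi).
loader_all = build_loader(df_all_isic, augment=True)
model = train_one_stage(model, loader_all, epochs=CFG.max_epoch)

# Stage 2: fine-tune on ISIC2024 only, starting from the stage-1 weights.
loader_2024 = build_loader(df_isic2024, augment=True)
model = train_one_stage(model, loader_2024, epochs=CFG.max_epoch)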

2.3.2 Augmentations

I searched for promising augmentations in past ISIC 2020 competition solutions and in my own experiments.
I thought heavier, more varied augmentation to prevent overfitting would work better because there is very little positive data (only 0.098% positive).

import albumentations as A
from albumentations.pytorch import ToTensorV2

augmentations_train = A.Compose([
    A.Transpose(p=0.5),
    A.VerticalFlip(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.2, contrast=0.2, p=0.3),
    A.OneOf([
        A.MotionBlur(blur_limit=5, p=0.5),
        A.MedianBlur(blur_limit=5, p=0.5),
        A.GaussianBlur(blur_limit=5, p=0.5),
    ], p=0.2),
    A.GaussNoise(var_limit=(5.0, 30.0), p=0.1),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, border_mode=0, p=0.3),
    A.CoarseDropout(max_holes=20, min_holes=10, p=0.3),
    A.Resize(CFG.img_size, CFG.img_size),
    ToTensorV2(p=1)
])
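
For comparison, the inference-time pipeline only needs the deterministic steps. A minimal sketch of a plain validation transform follows; this is my assumption, and the repository's get_transforms() may differ.

augmentations_val = A.Compose([
    A.Resize(CFG.img_size, CFG.img_size),
    ToTensorV2(p=1)
])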

2.3.3 Standardization

I used my own standardization step after augmentation instead of A.Normalize: each batch is min-max rescaled to [0, 255], as shown below. It boosted the CV score a lot.

# A.Normalize(
#     mean=[0.485, 0.456, 0.406],
#     std=[0.229, 0.224, 0.225],
#     max_pixel_value=255.0,
#     p=1.0
# ),

x, t = batch
if CFG.standardization:
    # Min-max rescale the batch to [0, 255] instead of ImageNet mean/std normalization.
    x = (x - x.min()) / (x.max() - x.min() + 1e-6) * 255

2.3.4 HSV input

I used both RGB and HSV as input, so the model takes 6 input channels (the concatenation is sketched after the conversion function below).

# RGB (B, 3, H, W) -> HSV; H and S are scaled to [0, 1], V keeps the input scale.
def F_rgb2hsv(rgb: torch.Tensor) -> torch.Tensor:
    cmax, cmax_idx = torch.max(rgb, dim=1, keepdim=True)
    cmin = torch.min(rgb, dim=1, keepdim=True)[0]
    delta = cmax - cmin
    hsv_h = torch.empty_like(rgb[:, 0:1, :, :])
    cmax_idx[delta == 0] = 3
    hsv_h[cmax_idx == 0] = (((rgb[:, 1:2] - rgb[:, 2:3]) / delta) % 6)[cmax_idx == 0]
    hsv_h[cmax_idx == 1] = (((rgb[:, 2:3] - rgb[:, 0:1]) / delta) + 2)[cmax_idx == 1]
    hsv_h[cmax_idx == 2] = (((rgb[:, 0:1] - rgb[:, 1:2]) / delta) + 4)[cmax_idx == 2]
    hsv_h[cmax_idx == 3] = 0.
    hsv_h /= 6.
    hsv_s = torch.where(cmax == 0, torch.tensor(0.).type_as(rgb), delta / cmax)
    hsv_v = cmax
    return torch.cat([hsv_h, hsv_s, hsv_v], dim=1)
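
The HSV channels are concatenated with the RGB channels before the forward pass, which is why the backbone is created with in_chans=6. A minimal sketch, assuming the conversion is applied to the batch inside the training step; the exact placement in the repository may differ.

# x: (B, 3, H, W) RGB batch
x = torch.cat([x, F_rgb2hsv(x)], dim=1)   # (B, 6, H, W), matching in_chans=6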

2.3.5 Changing the dataset

Towards the end of training (after 60% of the epochs), I trained on data without applying data augmentation.
Because of the risk of overfitting, I may not have needed this method. For competitions with plenty of positive data, switching to a cleaner dataset midway should work like a fine-grained form of 2-stage learning.

# Switch to the augmentation-free loader after 60% of the epochs.
if CFG.change_dataset and epoch >= ((CFG.max_epoch + 1) * 0.6):
    train_loader = train_loader_noaugment
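
train_loader_noaugment is just a second DataLoader over the same training split with the augmentations stripped down to resize and tensor conversion. A minimal sketch reusing the ISICDataset and CFG names from this post; the exact construction is my assumption.

train_dataset_noaug = ISICDataset(
    df=train_meta[train_meta["fold"] != fold_id],
    fp_hdf=CFG.TRAIN_HDF5_COMBINED,
    transform=A.Compose([A.Resize(CFG.img_size, CFG.img_size), ToTensorV2(p=1)]),
)
train_loader_noaugment = torch.utils.data.DataLoader(
    train_dataset_noaug, batch_size=CFG.batch_size,
    num_workers=4, shuffle=True, drop_last=True,
)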

2.3.6 TTA (test-time augmentation)

TTA didn't boost my CV or LB scores (they were almost the same before and after applying it), but I used it to improve the generalization performance of the model, and it worked well on the private score!
Simply put, I mixed the normal output with the output obtained by applying the training-time data augmentation to the inference-time dataset.

TTA_rate = {'None':0.7, 'with_train_aug':0.3}

# get_dataloader
val_transform_TTA, val_transform = get_transforms()
val_dataset = ISICDataset(df=train_meta[train_meta["fold"] == fold_id], fp_hdf=CFG.TRAIN_HDF5_COMBINED, transform=val_transform)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=CFG.batch_size, num_workers=4, shuffle=False, drop_last=False)
if CFG.TTA:
    val_dataset_TTA = ISICDataset(df=train_meta[train_meta["fold"] == fold_id], fp_hdf=CFG.TRAIN_HDF5_COMBINED, transform=val_transform_TTA)
    val_loader_TTA = torch.utils.data.DataLoader(val_dataset_TTA, batch_size=CFG.batch_size, num_workers=4, shuffle=False, drop_last=False)

# prediction
oof_pred_arr_merged = (oof_pred_arr * CFG.TTA_rate['None']) + (oof_pred_arr_TTA * CFG.TTA_rate['with_train_aug'])

2.3.7 Removal of anomalous data

The list of ids below came from this discussion.

ids_to_drop = [
    'ISIC_0573025', 'ISIC_1443812', 'ISIC_5374420', 'ISIC_2611119',
    'ISIC_2691718', 'ISIC_9689783', 'ISIC_9520696', 'ISIC_8651165',
    'ISIC_9385142', 'ISIC_9680590', 'ISIC_2346081',
]
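
These ids are simply filtered out of the training metadata before building the folds; a one-line sketch, assuming the standard isic_id column from the competition metadata:

train_meta = train_meta[~train_meta["isic_id"].isin(ids_to_drop)].reset_index(drop=True)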

3. Environment

I used my RTX 4070 Ti SUPER for the whole competition.
Close to the end, I also tried vast.ai and runpod.io. Both are good: vast.ai is low-cost, and runpod.io is scalable and easy to use.

4. What didn't work for me

・Oversampling
・Auxiliary loss

5. Summary

I used a ViT model and a tabular model as the elements of the ensemble.
I also took great care to avoid overfitting throughout the competition.

Finally, well done to all the participants!
