【Kaggle Method】ISIC2024 9th solution explained
This time, I'll explain the 9th-place solution from Kaggle's ISIC 2024 competition.
0. Highlights
・Chapter 2: A simple but strong strategy: validation with multiple-seed CV and the LB.
・Chapter 4.1: For diversity, he chose different backbones.
・Two efficientnet_b0 with image size 256. LB: 15.5 and 15.8 with TTA 16.1
・swinv2_tiny_window8_256 with image size 256. LB: 15.9
・convnextv2_tiny.fcmae_ft_in22k_in1k with image size 224. LB: 15.8
・Chapter 4.1: Augmentation similar to the previous competition winner's was used. (Many winners of this competition also used augmentations similar to the previous competition's.)
・Chapter 4.2: Model ensemble weights came from the hill-climbing algorithm, with the implementation from here.
1. Overview
He built 2 (+1 from a public notebook) different tabular pipelines with various feature, data, and model combinations, paired with predictions from various image-model backbones.
Then he blended the results of these pipelines to make his final predictions. The weights for the different pipelines were found via the hill-climbing algorithm.
2. CV Strategy
・5-fold StratifiedGroupKFold for all experiments
For each feature he added, he first tried it with 5 different seeds. If it improved the CV score, he submitted it to the LB, and if it improved the score there as well, he kept that feature.
3. Tabular Models
- Pipeline 1
This is his original script where he tried to extract as many contextual features as possible from the patient data.
He extracted the Z-score, range, skewness, and kurtosis of the features, feature / max(feature), and the mean of the efficientnet predictions as an image feature.
He subsampled the negatives with a ratio of 0.02.
One LGBM and one CatBoost model were used here.
This pipeline gave 18.3 on its own on the public LB with a CV of 17.7.
- Pipeline 2
With his own code, he was swinging between 10th and 20th place on the LB until @greysky 's notebook came out. Quite luckily, they were using the same CV setup. He wanted to see how a blend would perform, so he added a couple of his features to that notebook, such as
.with_columns(
    n_images_per_location = pl.col("isic_id").count().over(["patient_id", "tbp_lv_location_simple"])
)
and his swin-transformer model's predictions as an image feature. On its own, this pipeline gave a 17.5 CV and 18.3 LB. The blend with pipeline 1 gave his first significant boost, to 18.6.
- Pipeline 3
For more diversity, he created rank and various group-by features on top of the previous features and used a different subset of the data (0.05 negative-sampling ratio). This pipeline gave a 17.5 CV and 18.0 LB on its own.
4.1 Image models
For diversity, he chose different backbones.
・Two efficientnet_b0 with image size 256. LB: 15.5 and 15.8 with TTA 16.1
・swinv2_tiny_window8_256 with image size 256. LB: 15.9
・convnextv2_tiny.fcmae_ft_in22k_in1k with image size 224. LB: 15.8
It seems the small models were performing well.
After many experiments, the configurations that stayed consistent were:
・Use 5% of the negative data, and upsample the positives 10x.
・3 epochs; more epochs made the results worse.
・lr: 1e-4 for efficientnet and swin-transformer
・lr for convnext: the scheduler below
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
・optimizer: Adam
・loss function: BCE.
・augmentation: similar to the previous competition's winner
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transforms = A.Compose([
    A.VerticalFlip(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.75),
    A.OneOf([
        A.MotionBlur(blur_limit=5),
        A.MedianBlur(blur_limit=5),
        A.GaussianBlur(blur_limit=5),
        A.GaussNoise(var_limit=(5.0, 30.0)),
    ], p=0.7),
    A.OneOf([
        A.OpticalDistortion(distort_limit=1.0),
        A.GridDistortion(num_steps=5, distort_limit=1.0),
        A.ElasticTransform(alpha=3),
    ], p=0.7),
    A.CLAHE(clip_limit=4.0, p=0.7),
    A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20, val_shift_limit=10, p=0.5),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, border_mode=0, p=0.85),
    A.CoarseDropout(p=0.7),
    A.Resize(CFG.img_size, CFG.img_size),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
])
valid_transforms = A.Compose([
    A.Resize(CFG.img_size, CFG.img_size),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
])
Something funny here, at least for people with a vision background: his image results were stuck at 14.8, and whatever he tried did not work. But moving A.Resize() to the end, just before the normalization, improved his scores to the 15.4-15.5 range.
4.2 Final Blend
To find the weights of the final models, he used the hill-climbing algorithm, with the implementation here by @cdeotte.
model            weight
pipe3_pred_lgb   0.407660
pipe3_pred_xgb   0.272537
pipe1_pred_lgb   0.152162
pipe1_pred_cat   0.142441
pipe_2_pred      0.124199
pipe3_pred_cat  -0.099000
The CV of this setup was 18.2 and the LB was 18.7, which were his best CV and LB at the same time.
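A minimal hill-climbing sketch for finding blend weights (this is my own simplified version, not @cdeotte's exact implementation, and it scores with plain ROC AUC rather than the competition's partial AUC): greedily add whichever model-weight step most improves the metric on out-of-fold predictions, allowing negative weights, as the pipe3_pred_cat row above shows the real blend did.

```python
# Greedy hill-climbing over ensemble weights on out-of-fold predictions.
import numpy as np
from sklearn.metrics import roc_auc_score

def hill_climb(preds, y, steps=50, weight_grid=(-0.1, 0.1, 0.2, 0.3)):
    """preds: dict name -> OOF prediction array. Returns a weight per model."""
    blend = np.zeros(len(y), dtype=float)
    weights = {name: 0.0 for name in preds}
    best = roc_auc_score(y, blend)  # 0.5 baseline for the all-zero blend
    for _ in range(steps):
        best_move, best_score = None, best
        for name, p in preds.items():
            for w in weight_grid:  # negative steps allowed
                score = roc_auc_score(y, blend + w * p)
                if score > best_score:
                    best_move, best_score = (name, w), score
        if best_move is None:  # no single step improves the metric: stop
            break
        name, w = best_move
        blend += w * preds[name]
        weights[name] += w
        best = best_score
    return weights
```

Because AUC is invariant to the overall scale of the blend, only the relative weights matter, which is why weights like those in the table need not sum to 1.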
5. Things that did not work
・Many many features
・For image models, he tried using hard negatives, i.e. the samples where the models made the largest errors.
・Focal loss always performed worse than BCE; the setup was difficult.
・Mixup was a nice addition for diversity but usually performed poorly when added to the ensemble.
・Stacking an additional ExtraTreeClassifier/LogisticRegression model on top of the solution
・Scaling predictions with ** 0.5, or rank ensembling.
・Fixing tbp_lv_y. There were 5 patients in the dataset, all from the same hospital, with negative tbp_lv_y values. He added the per-patient minimum to fix it. The CV was slightly better, but there was no improvement on the LB.
・Dullrazor algorithm for hair removal
・Hair augmentations
・Sample weights based on lesion id
・Training image models from scratch with the dataset's mean & std instead of ImageNet initialization
6. Summary
There were so many things to learn. Looking back at a competition's solutions is a little hard, but it is worth spending time on; please try it if you are interested.