【Kaggle Method】ISIC2024 2nd solution explained
This article explains the 2nd place solution for Kaggle's ISIC2024 competition.
Original Solution Link / GitHub Code: https://github.com/uchiyama33/isic-2024-2nd-place
0. Highlights
・Chapter 2.2: Patient-wise standardization as a feature of GBDT
・Chapter 3.1: Selected models with low variance for stable prediction
・Chapter 3.1: Nine image models using five different training setups for diversity
・Chapter 3.2: Diagnosis was vectorized using tf-idf, followed by clustering via k-means. The model was trained to predict the distance from each data point to the cluster centroids as an auxiliary loss.
・Chapter 3.2: Self-supervised pre-training with tabular data, following a multimodal learning paper [2]
・Chapter 3.2: AdamW was used with learning rates set to 1e-5 to 8e-6 for the backbone and 1e-3 for the head, alongside a warmup and cosine scheduler.
・Chapter 5: Patients are binned based on their number of images, which is used for stratification.
1. Overview
Model: Image model predictions are incorporated into the tabular data as features, and inference is then performed with multiple GBDTs.
They implemented several enhancements to both the GBDT and image models, building on public notebooks.
2. GBDT
2.1 General
・Used LGBM, XGBoost, and CatBoost
・Created 18 variations of each algorithm, resulting in a total of 54 models
・Employed seed averaging (n=5) using models trained on the full dataset
2.2 Feature Engineering
From public notebooks:
・LGBM Baseline with New Features
・LightGBM CatBoost with New Features
・ISIC 2024 LGBM ImageNet v5a
They engineered several patient-related features to capture different aspects of the data:
・Patient-wise standardization (see the sketch after this list)
・Standardization by patient and tbp_lv_location
・Standardization by patient and tbp_lv_location_simple
・Standardization by patient and anatom_site_general
・Implemented the Tabular Ugly Ducklings technique (as described in a discussion on the competition forum)
・Varying number of image model predictions used as features (0-3)
To introduce diversity, some models used only a subset of these features.
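As an illustration of what the patient-wise standardization features look like, here is a minimal sketch using pandas. The exact feature set follows the public notebooks above; the numeric columns chosen here are only examples.

```python
import pandas as pd

def add_standardized_features(df: pd.DataFrame, num_cols: list[str],
                              group_cols: list[str] = ["patient_id"]) -> pd.DataFrame:
    """Standardize numeric lesion features within each group (patient, or patient x location)."""
    grp = df.groupby(group_cols)
    suffix = "_".join(group_cols)
    for col in num_cols:
        mean = grp[col].transform("mean")
        std = grp[col].transform("std")
        df[f"{col}_norm_by_{suffix}"] = (df[col] - mean) / (std + 1e-6)
    return df

# Patient-wise, then patient x tbp_lv_location standardization:
# df = add_standardized_features(df, ["clin_size_long_diam_mm", "tbp_lv_areaMM2"])
# df = add_standardized_features(df, ["clin_size_long_diam_mm", "tbp_lv_areaMM2"],
#                                ["patient_id", "tbp_lv_location"])
```

The same helper covers the other three variants simply by changing group_cols.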
2.3 Hyperparams
・num_boost_round: 200-300
・Conducted separate hyperparameter tuning for different combinations of "Number of image features used" and "Number of patient features used"
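For concreteness, here is a rough sketch of training one LightGBM variant on the full dataset with seed averaging (parameters other than num_boost_round are placeholders, not the team's tuned values):

```python
import numpy as np
import lightgbm as lgb

def train_seed_averaged_lgbm(X_train, y_train, X_test, n_seeds: int = 5,
                             num_boost_round: int = 250) -> np.ndarray:
    """Train on the full dataset with several seeds and average the test predictions."""
    preds = []
    for seed in range(n_seeds):
        params = {
            "objective": "binary",
            "learning_rate": 0.05,  # placeholder; tuned separately per feature combination
            "seed": seed,
            "verbosity": -1,
        }
        booster = lgb.train(params, lgb.Dataset(X_train, label=y_train),
                            num_boost_round=num_boost_round)
        preds.append(booster.predict(X_test))
    return np.mean(preds, axis=0)
```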
3. Image Models
3.1 Overview
They created a total of nine image models using five different training setups for diversity. Specifically, they integrated auxiliary losses for predicting tabular data and implemented self-supervised learning to enhance accuracy. Additionally, by selecting models with low variance across folds, they aimed for stable performance.
3.2 Training Setups
There are 5 variations for diversity.
- Standard Training: Models were trained using basic configurations.
- Mixup Augmentation: Mixup was added as a data augmentation technique during training.
- Auxiliary Loss for Predicting Tabular Data: They introduced an auxiliary task for predicting tabular data to encourage learning from multiple modalities.
- Auxiliary Loss for Predicting iddx_full Clusters: iddx_full shows the fully classified lesion diagnosis (the detailed diagnosis label). iddx_full was vectorized using tf-idf, followed by clustering via k-means. The model was trained to predict the distance from each data point to the cluster centroids as an auxiliary loss (see the sketch after this list).
- Self-Supervised Pre-training with Tabular Data: Following a recent multimodal learning paper [2], they conducted self-supervised pre-training with tabular data, then fine-tuned the image models.
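A minimal sketch of how such auxiliary targets could be built with scikit-learn (the number of clusters and the vectorizer settings are assumptions for illustration):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def build_iddx_cluster_targets(iddx_full_texts: list[str], n_clusters: int = 8) -> np.ndarray:
    """Vectorize iddx_full with tf-idf, cluster with k-means, and return each sample's
    distance to every cluster centroid (used as auxiliary regression targets)."""
    tfidf = TfidfVectorizer()
    vectors = tfidf.fit_transform(iddx_full_texts)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(vectors)
    # KMeans.transform returns the distance from each sample to each centroid.
    return kmeans.transform(vectors).astype(np.float32)  # shape: (n_samples, n_clusters)
```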
Image Models List
Common Training Configurations
・Undersampling: Each epoch applied undersampling at a ratio of 1:3 or 1:5.
・Epoch Count: Each model was trained for 50 to 200 epochs without early stopping, to avoid overfitting to the validation set (perhaps fully trained, varied models also provide more diversity and therefore stability?).
・Data Augmentation: Data augmentation strategies were adjusted based on the top solution from ISIC 2020 [3], with augmentation intensity varying depending on the model.
・Optimizer: AdamW was used with learning rates set to 1e-5 to 8e-6 for the backbone and 1e-3 for the head, alongside a warmup and cosine scheduler.
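A minimal sketch of this optimizer setup. The `backbone`/`head` attribute names and the hand-rolled warmup-plus-cosine lambda are assumptions; the repository may use a library scheduler instead.

```python
import math
import torch

def build_optimizer_and_scheduler(model, total_steps: int, warmup_steps: int,
                                  backbone_lr: float = 1e-5, head_lr: float = 1e-3):
    """AdamW with separate learning rates for backbone and head, plus warmup + cosine decay."""
    optimizer = torch.optim.AdamW([
        {"params": model.backbone.parameters(), "lr": backbone_lr},
        {"params": model.head.parameters(), "lr": head_lr},
    ])

    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return step / max(1, warmup_steps)                    # linear warmup
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))         # cosine decay to zero

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```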
4. Inference
・Models were trained on the full dataset and used for inference.
・Automatic Mixed Precision was enabled for faster inference.
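A minimal sketch of mixed-precision inference with PyTorch autocast (the model and dataloader here are placeholders):

```python
import torch

@torch.no_grad()
def predict_amp(model: torch.nn.Module, loader, device: str = "cuda") -> torch.Tensor:
    """Run inference under automatic mixed precision for faster prediction."""
    model.eval().to(device)
    outputs = []
    for images in loader:
        images = images.to(device, non_blocking=True)
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            logits = model(images)
        outputs.append(torch.sigmoid(logits.float()).cpu())
    return torch.cat(outputs)
```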
5. CV Strategy
They implemented a Triple Stratified Leak-Free KFold CV strategy, inspired by an approach used in a previous Kaggle competition. This method ensures robust model validation while preventing data leakage.
The key aspects of this CV strategy are:
- Patient Isolation: All images from a single patient are kept in the same fold, preventing leakage during cross-validation.
- Malignant Image Balance: The stratification considers the proportion of malignant images for each patient.
- Patient Image Count Distribution: Patients are binned based on their number of images, which is used for stratification.
For the original inspiration and more detailed explanation, refer to: SIIM-ISIC Melanoma Classification - Triple Stratified CV
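The exact implementation follows the linked notebook, but a rough approximation of the idea, keeping patients grouped while stratifying on malignancy and a binned image count, could look like the sketch below. Column names such as patient_id, isic_id, and target come from the competition data; StratifiedGroupKFold is used here as a stand-in for the original triple-stratified code.

```python
import pandas as pd
from sklearn.model_selection import StratifiedGroupKFold

def make_folds(df: pd.DataFrame, n_splits: int = 5, seed: int = 42) -> pd.DataFrame:
    """Keep all images of a patient in one fold, stratifying on a combined label of
    patient-level malignancy and binned image count."""
    patient = df.groupby("patient_id").agg(
        n_images=("isic_id", "count"),
        has_malignant=("target", "max"),
    )
    patient["count_bin"] = pd.qcut(patient["n_images"], q=4, labels=False, duplicates="drop")
    patient["strat"] = patient["has_malignant"].astype(str) + "_" + patient["count_bin"].astype(str)

    df = df.merge(patient[["strat"]], left_on="patient_id", right_index=True, how="left")
    df["fold"] = -1
    sgkf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (_, val_idx) in enumerate(sgkf.split(df, df["strat"], groups=df["patient_id"])):
        df.iloc[val_idx, df.columns.get_loc("fold")] = fold
    return df
```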
6. What Didn't Work Out
・Using data from past competitions
Integrating preprocessed data from past competitions did not improve accuracy, and a classifier distinguishing past data from this year's data easily achieved an AUC of 0.99, so they concluded that the data distributions differ and did not use the past data.
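The write-up only states the AUC; as an illustration, one common way to run such a check (an adversarial-validation-style classifier that separates past from current data) might look like this:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def data_shift_auc(current: pd.DataFrame, past: pd.DataFrame, feature_cols: list[str]) -> float:
    """Train a classifier to tell current-competition rows from past-competition rows.
    An AUC near 0.99 means the datasets are easy to separate, i.e. a large distribution gap."""
    X = pd.concat([current[feature_cols], past[feature_cols]], ignore_index=True)
    y = np.concatenate([np.ones(len(current)), np.zeros(len(past))])
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    model = lgb.train({"objective": "binary", "verbosity": -1},
                      lgb.Dataset(X_tr, label=y_tr), num_boost_round=200)
    return roc_auc_score(y_va, model.predict(X_va))
```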
7. Code: https://github.com/uchiyama33/isic-2024-2nd-place
8. Summary
In this article, I looked back at the 2nd place solution for ISIC2024.
They used many great ideas and techniques. Winning solutions give us useful knowledge (even more so if you are taking part in the competition), so looking back at them is a worthwhile exercise.
Finally, please try to check and understand the highlights of what they did.
Thanks to the 2nd place team for the great solution.
References
[1] 2nd Place Solution
[2] Du, Siyi, Zheng, Shaoming, Wang, Yinsong, Bai, Wenjia, O'Regan, Declan P., and Qin, Chen. "TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data." In 18th European Conference on Computer Vision (ECCV 2024).
[3] https://www.kaggle.com/competitions/siim-isic-melanoma-classification/discussion/175412