【Kaggle】BELKA NeurIPS 2024 Look Back
This post looks back on my participation in this competition.
0. Kaggle BELKA NeurIPS 2024
Result: 508th place out of 1982.
The goal of this competition was to predict whether a small molecule binds to a given target protein.
A more detailed description is here.
1. What I did
1.1 Reading discussions and code
I summarized what I read here.
1.2 Trying various methods
Many different approaches were proposed in this competition, such as:
・XGB
・AutoML
・1D CNN
・ChemBERTa
・Mamba
・GNN
I tried all of them and kept only the methods that gave good results.
1.3 My solution
I used a Mamba model and ensembled it with CNN models.
However, the final LB (leaderboard) result was not good.
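One common way to ensemble models like these is a weighted average of their predicted binding probabilities. Below is a minimal sketch assuming that scheme; the helper `blend_predictions`, the model names, the predictions, and the weights are all hypothetical and not necessarily the setup of my solution.

```python
import numpy as np


def blend_predictions(preds: dict[str, np.ndarray], weights: dict[str, float]) -> np.ndarray:
    """Weighted average of per-model binding probabilities (hypothetical helper)."""
    total_weight = sum(weights[name] for name in preds)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    blended = sum(weights[name] * np.asarray(p, dtype=float) for name, p in preds.items())
    # Dividing by the total weight normalizes the weights, so they need not sum to 1.
    return blended / total_weight


# Hypothetical predictions for four molecules from two models.
preds = {
    "mamba": np.array([0.9, 0.1, 0.4, 0.7]),
    "cnn": np.array([0.8, 0.3, 0.2, 0.5]),
}
weights = {"mamba": 0.6, "cnn": 0.4}  # hypothetical weights, e.g. tuned on validation data
blended = blend_predictions(preds, weights)
print(blended)
```

Tuning the weights on a reliable CV set rather than on the public LB alone makes a blend like this less likely to overfit the LB.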
2. What went badly
・Slow model development. I could not try everything I wanted (ResNet, CNN, augmentation, using building blocks, etc.).
・No reliable CV metric. I evaluated every model only on the public LB, and the LB shook up a lot.
・Not enough domain knowledge. I should have read more background material and related discussions in the field of chemistry.
3. What went well
・Summarizing the competition data. This was very useful to me near the end of the competition.
・Reading and upvoting almost all discussions and code. They contained a great deal of information and knowledge.
・Building my own model. It made me better and taught me a lot. However, improving public models is also important, both to win a medal and to understand the models.
4. What to do next time
・Read all discussions until I understand them.
・Build my own model and improve public models.
・Get a reliable, robust CV (reliable validation data and metric).
・Try hypothesis testing and randomized experiments
5. Things learned
・How to handle a huge table with 295,246,830 (about 300 million) rows.
・Mamba architecture
・ChemBERTa
・How to structure an ML pipeline for tabular data
・How to use cloud GPUs
・How to use Git/GitHub
There is still more to learn from the top solutions.