[Kaggle] BirdCLEF2025 65th Place 🥈 Retrospective
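1. Introduction
I won a silver medal in BirdCLEF2025, also known as the bird competition. I had never entered a competition on anything other than tabular data, so almost everything was new to me, but with help from ChatGPT and the wisdom of those who came before, I managed to reach my goal of a silver medal. This post is a write-up of the experience.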
2. Competition overview
A competition to classify birds from recordings of their calls. Most of the species are birds, but reptiles, insects, and mammals are also included.
3. Solution
In this competition I preprocessed the audio data and trained the models with the following pipeline.
Audio data → removal of synthetic speech → random 5-second crop → conversion to a mel spectrogram image → model training
Some of the training recordings contained a synthetic voice reading out the name of the animal. Because this could mislead the model, I added a preprocessing step that removes human voice components, using past Kaggle notebooks as a reference.
The training recordings vary widely in length: some are shorter than 5 seconds and others run longer than a minute. To give every input the same size, part of each recording has to be cut out. There are several ways to do that.
・Take the beginning
・Take the middle
・Take a random segment
・RMS (Root Mean Square): prefer the parts where the sound energy is highest
In the end I chose random cropping. When I validated across several seeds, RMS gave a slight accuracy improvement (+0.02), but I doubted whether this step was still needed once the synthetic speech had been removed, and I had a feeling it would overfit to the Public LB. I therefore prioritized generalization and sampled randomly.
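The two cropping strategies compared above can be sketched as follows. This is a minimal illustration and not the code used in the competition: it works on plain Python lists of samples, the function names are hypothetical, and zero-padding short clips is an assumption. In practice the samples would be a NumPy array and the crop length would be 5 seconds times the sample rate.

```python
import math
import random


def pad_to_length(samples, length):
    """Zero-pad a clip that is shorter than the target length (assumed policy)."""
    return samples + [0.0] * max(0, length - len(samples))


def random_crop(samples, length, rng):
    """Return `length` consecutive samples starting at a random offset."""
    samples = pad_to_length(samples, length)
    start = rng.randint(0, len(samples) - length)
    return samples[start:start + length]


def rms_crop(samples, length, hop):
    """Return the window (stepping by `hop` samples) with the highest RMS energy."""
    samples = pad_to_length(samples, length)
    best_start, best_rms = 0, -1.0
    for start in range(0, len(samples) - length + 1, hop):
        window = samples[start:start + length]
        rms = math.sqrt(sum(x * x for x in window) / length)
        if rms > best_rms:
            best_start, best_rms = start, rms
    return samples[best_start:best_start + length]


# Toy signal: silence, a loud burst, then silence again.
signal = [0.0] * 10 + [1.0, -1.0] * 5 + [0.0] * 10
rng = random.Random(0)
crop_random = random_crop(signal, length=10, rng=rng)
crop_rms = rms_crop(signal, length=10, hop=1)
```

On the toy signal, the RMS crop lands exactly on the burst, while the random crop can fall anywhere in the clip. That difference is the trade-off described above: RMS favors loud regions, random cropping shows the model more varied context.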
My final submission was an ensemble of models with the following backbones.
・efficientnet_b0 (simple CNN)
・eca_nfnet_l0 (SED)
4. What worked
・Model Soup
Model Soup is a technique that builds a single new model by taking a simple average of the weights of several models: https://arxiv.org/abs/2203.05482
Unlike an ensemble, it does not run several models at inference time, so it brings in model diversity without increasing inference time.
This technique was very effective in this competition.
As in previous years, the domain shift in this competition was large, which made it hard to design a proper validation scheme. So rather than relying on a model tied to one particular epoch and seed, I averaged the weights of the models from seeds 0 to 4 and epochs 1 to 20, aiming to improve generalization while avoiding overfitting.
As a result, I believe Model Soup let me incorporate the diverse features learned at different epochs.
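A uniform weight average of this kind can be sketched as below. This is only an illustration under stated assumptions: plain dicts of float lists stand in for PyTorch `state_dict` objects, and the parameter names and values are made up. With real checkpoints the same loop would average tensors, and the result would be loaded back into a single model.

```python
def uniform_soup(state_dicts):
    """Average parameters with the same name across all checkpoints."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    soup = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter from every checkpoint.
        columns = zip(*(sd[name] for sd in state_dicts))
        soup[name] = [sum(values) / n for values in columns]
    return soup


# Three hypothetical checkpoints (e.g. different seeds or epochs).
checkpoints = [
    {"conv.weight": [1.0, 2.0], "fc.bias": [0.0]},
    {"conv.weight": [3.0, 4.0], "fc.bias": [1.0]},
    {"conv.weight": [5.0, 6.0], "fc.bias": [2.0]},
]
soup = uniform_soup(checkpoints)
```

Because the output is a single set of weights, inference cost is the same as for one model, no matter how many checkpoints went into the average.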
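・SED model
I knew from the discussions and from past competitions that this was effective, but the implementation looked difficult, so I had put it off. In the end I worked through it until I understood it and got it implemented (drawing on past competitions, of course).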
・Faster inference with ONNX and OpenVINO
Inference reportedly becomes about 40% faster. However, the conversion did not work well when the backbone was NF_NET.
・Label smoothing
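For readers unfamiliar with label smoothing, here is a minimal sketch of its common textbook form, where a fraction `epsilon` of the target mass is spread evenly over all classes. The value of `epsilon` and the exact variant used in this solution are not stated in this post, so both are assumptions here.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Blend a one-hot target with a uniform distribution over the classes."""
    k = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / k for y in one_hot]


# Four classes, true class at index 1 (hypothetical example).
smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0], epsilon=0.1)
```

The true class keeps most of the probability mass and every other class receives a small non-zero target, which discourages over-confident predictions.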
5. What did not work
・Establishing a validation scheme
The number of samples per species was extremely imbalanced in this competition, and I struggled a great deal with designing a validation strategy.
For example, the most common species has 990 samples, while the rarest has only 2. With class imbalance this extreme, I could not split the data with ordinary cross-validation (Stratified K-Fold and the like).
I spent a considerable amount of time on this problem, but unfortunately it did not lead to any final result.
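The core obstacle can be shown with a small check: with k folds, a species needs at least k samples to appear in every validation fold. The sketch below counts samples per class and lists the classes that fall short. The class names and the choice of 5 folds are hypothetical; the counts 990 and 2 are the ones quoted above.

```python
from collections import Counter


def classes_too_rare_for_stratification(labels, n_splits):
    """Return the classes with fewer samples than the number of folds."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n < n_splits)


# Hypothetical label list mirroring the extremes mentioned in the text.
labels = ["common"] * 990 + ["rare"] * 2
too_rare = classes_too_rare_for_stratification(labels, n_splits=5)
```

Any class returned here is guaranteed to be missing from at least one validation fold, so per-fold scores are not computed over the same set of species.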
Approaches I tried, and reflections
- Data augmentation for minority classes
I tried to pad out the data by applying transformations to the audio, such as adding noise and pitch shifting.
- Sampling minority classes more often
With this method, however, the model ends up repeatedly training on "data with similar features", so it most likely had little effect.
- Adding extra data for minority classes
I added extra data only for the minority classes so that every fold contained every species, but the validation score and the LB score diverged.
→ The low quality of the extra data may have been the main cause.
The 2nd place solution also touches on how difficult validation was. In particular, its observation that the CV and LB scores become almost uncorrelated once AUC improvements drop below 1% matches my own trial and error.
"After some major improvements, we saw significant positive changes in both metrics, but when tweaking things within ~1% AUC, the correlation was nearly absent."
Source: 2nd Place Discussion
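- Time Flip
Temporal structure seems to matter for bird calls, and reversing time apparently prevents the model from capturing the features properly.
- Running inference on 10 seconds, including the 2.5 seconds before and after each segment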
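The 10-second inference idea in the last item can be sketched as a window calculation: each 5-second segment is scored using a window that extends 2.5 seconds on either side. How the clip boundaries were handled is not described in this post, so shifting the window inward at the edges is an assumption, as are the function name and the 60-second clip length in the example.

```python
def context_windows(clip_s, segment_s=5.0, context_s=2.5):
    """Return (segment_start, window_start, window_end) for each segment.

    Each window covers the segment plus `context_s` on both sides. At the
    clip boundaries the window is shifted inward (assumed policy) so that it
    keeps its full length whenever the clip is long enough.
    """
    window_s = segment_s + 2 * context_s
    windows = []
    start = 0.0
    while start < clip_s:
        w_start = max(0.0, min(start - context_s, clip_s - window_s))
        w_end = min(clip_s, w_start + window_s)
        windows.append((start, w_start, w_end))
        start += segment_s
    return windows


# Hypothetical 60-second clip split into 5-second segments.
windows = context_windows(60.0)
```

Interior segments get a centered 10-second window, while the first and last segments reuse the first and last 10 seconds of the clip.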
6. Solutions from the top finishers
- Multi-stage training
The top solutions ran Pseudo Labeling several times to improve accuracy. One notable detail was that they applied a power transform to each class's predicted probability to keep the noise in the pseudo labels from growing.
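A power transform on predicted probabilities can be sketched as below. The exponent is not given in this post, so `gamma=2.0` is purely a hypothetical value, and the function name is illustrative. With an exponent above 1, low-confidence predictions shrink proportionally more than confident ones.

```python
def power_transform(probs, gamma=2.0):
    """Raise each class probability to the power `gamma`."""
    return [p ** gamma for p in probs]


# Hypothetical per-class probabilities from a previous-round model.
pseudo = power_transform([0.9, 0.5, 0.1], gamma=2.0)
```

In this example the ratio between the most and least confident class grows from 9 to 81, so weak and probably noisy predictions contribute much less to the next round of training.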
- Sampling method
The sampling strategies common to many of the top finishers were:
Crop length: 5 or 10 seconds.
Extraction method: random sampling, or energy-based extraction using RMS (Root Mean Square).
I had also tried cropping longer segments (10 and 15 seconds), but the LB score barely changed, so I settled on random 5-second sampling.
Bird calls should often last longer than 5 seconds, but given the concern that the silent portions would act as noise, and the trade-off with processing speed, I decided to shelve the idea for the time being. I still think it is worth revisiting.
What struck me most was the following method described in the 2nd place solution.
"Randomly extract 5 seconds from the first or last 7 seconds of the audio."
This method is based on a human behavior pattern: people press the record button while the bird is calling and stop once it has finished. I thought it was an excellent idea that connects human intuition with the structure of the real data. The implementation skill to turn that kind of insight into working code right away is something I definitely want to develop.
"We trained our models on 5s and randomly selected segments. Based on experience from last year, we initially tried three approaches to picking 5s segments: random 5s from the whole audio, random 5s from the first 7s, and random 5s from the first or last 7s. The reasoning for the last two approaches is that often the recorder starts the audio when the animal is vocalizing and stops it when the animal stops vocalizing. That would help the model avoid false positives. The 7s was just to add some diversity."
Source: 2nd Place Discussion
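The "random 5s from the first or last 7s" rule can be sketched as follows. This is an illustration of the quoted description, not the 2nd place team's code: the function name, the 50/50 choice between the two ends, and the handling of clips shorter than the crop are all assumptions.

```python
import random


def crop_from_first_or_last(duration_s, rng, crop_s=5.0, edge_s=7.0):
    """Return (start, end) in seconds of a crop from the first or last `edge_s`."""
    if duration_s <= crop_s:
        return 0.0, duration_s  # too short to crop; the caller would pad
    edge = min(edge_s, duration_s)
    if rng.random() < 0.5:
        region_start = 0.0                # first `edge` seconds
    else:
        region_start = duration_s - edge  # last `edge` seconds
    start = region_start + rng.uniform(0.0, edge - crop_s)
    return start, start + crop_s


rng = random.Random(0)
start, end = crop_from_first_or_last(30.0, rng)
```

For a 30-second recording, the crop always starts either within the first 2 seconds or between 23 and 25 seconds, which keeps it close to where the recordist most likely started or stopped the recording.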
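7. Thoughts
My action goals for this competition were the following three.
・Keep the GPU running
・Submit up to the limit
・Read the Discussion thoroughly
I picked up the first two from a blog somewhere, and they were very helpful for building implementation skill through sheer volume. The last one, reading the Discussion thoroughly, also turned out to be very important.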
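I put about 3 hours after getting home on weekdays, and almost all of my time on weekends, into this competition, so the frustration is all the greater. Looking at the top finishers' solutions, I also felt keenly that there is a wall that sheer volume alone cannot get past.
Even so, this experience was extremely valuable to me. Next year I will be going for the gold medal without fail.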