[Kaggle] BirdCLEF2024 29th Place Solution: Retrospective
I took part in Kaggle's BirdCLEF2024 competition and finished 29th, earning a silver medal, so this post looks back on my solution.
・Competition
・Code
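0. Competition Overview
Birds are highly mobile and widely distributed, so they make it possible to track changes in biodiversity and serve as a good indicator for things like climate change and nature restoration projects.
The goal of this competition is to identify bird species from audio recordings of their calls, predicting which of 182 species are present.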
- Training data:
xenocanto.org is a site where researchers and birdwatchers around the world upload recordings of wildlife. Calls from 182 bird species on that site are provided as training data.
The competition host provides more than 240,000 call samples of random length, and the number of samples differs by species (some species have many recordings, others few).
- Training metadata:
For each audio sample, metadata is provided such as secondary labels (other species heard in the recording besides the primary label; may be empty), call type (call, song, alarm, and so on), latitude and longitude, author, and an audio quality rating.
- Test data:
The test data consists of 1,100 audio files, all 4 minutes long. They contain not only bird calls but also other sounds heard in the forest and ambient noise.
For prediction, each file is split into 5-second segments and inference is run on each segment.
- Unlabeled data:
In addition to the training data labeled with bird names, 8,444 unlabeled recordings were provided.
- Evaluation:
Submissions are scored with macro-averaged ROC-AUC. A score is computed for each of the 182 predicted columns, skipping classes that have no true positives, and the average of those scores is the final metric.
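To make the metric concrete, here is a minimal pure-Python sketch of a macro-averaged ROC-AUC that skips classes without positives. It is not the competition's scoring code: the function names and toy values are made up for illustration, and skipping classes whose labels are all positive is an extra assumption of this sketch (AUC is undefined there).

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via the rank-sum (Mann-Whitney U) formulation; ties share an average rank."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pos_rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def macro_auc_skip_empty(y_true_cols, y_score_cols):
    """Average the per-class AUC over classes that have both positives and negatives."""
    scores = []
    for name, labels in y_true_cols.items():
        if sum(labels) == 0 or sum(labels) == len(labels):
            continue  # AUC is undefined for this class, so it is skipped
        scores.append(roc_auc(labels, y_score_cols[name]))
    return sum(scores) / len(scores)


# Toy labels and scores (made up); "ashdro1" has no positives and is ignored
y_true = {"asbfly": [1, 0, 0, 1], "ashdro1": [0, 0, 0, 0], "ashpri1": [0, 1, 0, 0]}
y_score = {
    "asbfly": [0.9, 0.2, 0.4, 0.3],
    "ashdro1": [0.1, 0.1, 0.1, 0.1],
    "ashpri1": [0.2, 0.8, 0.1, 0.3],
}
score = macro_auc_skip_empty(y_true, y_score)
```

In this toy case the two scored classes get 0.75 and 1.0, so the macro average is 0.875.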
・Submission file format
row_id, asbfly, ashdro1, ashpri1, ...
soundscape_1446779_5, 0.0054, 0.0054, 0.0054, ...
soundscape_1446779_10, 0.0054, 0.0054, 0.0054, ...
soundscape_1446779_15, 0.0054, 0.0054, 0.0054, ...
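The row IDs above can be generated with a short helper. This is a sketch that assumes the numeric suffix is the end time of each 5-second segment in seconds (consistent with the sample rows starting at 5) and that each file is 4 minutes long; `make_row_ids` is a hypothetical name.

```python
def make_row_ids(soundscape_id, duration_sec=240, segment_sec=5):
    """Row IDs for consecutive segments, labelled by segment end time in seconds."""
    return [
        f"soundscape_{soundscape_id}_{end}"
        for end in range(segment_sec, duration_sec + 1, segment_sec)
    ]


row_ids = make_row_ids(1446779)
print(len(row_ids), row_ids[:3])
```

Under these assumptions a 4-minute file yields 48 rows, one per 5-second segment.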
- Characteristics:
This competition has the following characteristics.
・The data has several problems that need handling: class imbalance across species, sounds other than bird calls in the recordings, and stretches where no bird is calling
・Only CPU inference is allowed, and it must finish within 120 minutes
The restriction on GPU use in particular made the inference time budget very tight.
- Competition flow
The provided data is one-dimensional audio, but many public notebooks generated spectrograms (a visualization of frequency content over time) from the raw data and trained image models on them.
※ I explained spectrograms in an earlier post, together with the Fourier transform.
・Understanding the Fourier transform
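As a rough illustration of how a spectrogram turns 1D audio into a 2D array, here is a dependency-free sketch using a naive DFT on Hann-windowed frames. It is far too slow for real audio (libraries use the FFT), and the frame length, hop size, sample rate, and test tone are all made-up values, not settings from this solution.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of frame_len // 2 + 1 magnitudes per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for frequency bin k
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Toy signal: a 1 kHz tone sampled at 8 kHz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# The strongest bin in the first frame should sit at the tone's frequency
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time step and each column one frequency bin, which is what lets an image model treat the result as a picture.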
1. Solution
1.1 Overview
・Ensemble of my own model and a public model (1:1)
・My own model is an image model using timm's efficientnet_b0
・The public model is an image model trained with mixup
・Tried 6 kinds of audio→image conversion and used the best one, melspectrogram
・Applied ONNX and OpenVINO inference acceleration to all models (about 40% less inference time)
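The 1:1 ensemble can be sketched as follows, assuming it means an equal-weight average of the two models' predicted probabilities. The helper name `blend` and the numbers are illustrative only.

```python
def blend(preds_a, preds_b, weight_a=0.5):
    """Weighted average of two prediction tables (rows x classes)."""
    weight_b = 1.0 - weight_a
    return [
        [weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(preds_a, preds_b)
    ]


own_model_preds = [[0.2, 0.8], [0.6, 0.1]]     # made-up values
public_model_preds = [[0.4, 0.6], [0.2, 0.3]]  # made-up values
blended = blend(own_model_preds, public_model_preds)
```

Because ROC-AUC depends only on the ranking within each class, what matters is how the blend reorders the segments, not the absolute probability values.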
1.2 Additional Data
In this competition, a glitch meant the provided data was capped at 500 recordings per species, so I trained with additional xenocanto.org data that another participant had shared.
It is simple, but the amount of data translates directly into model performance.
2. My Own Model
From here on, I describe my own image model.
2.1 Settings
model: "efficientnet_b0.ra_in1k"
batch_size: 32
max_epoch: 9
n_folds: 5
optimizer: optim.AdamW
scheduler: OneCycleLR
lr: 1.0e-03
weight_decay: 1.0e-02
img_size: 224
interpolation: cv2.INTER_AREA
enable_amp: True
CV: StratifiedKFold
The competition metric ignores classes that do not appear in the dataset, but for validation I used StratifiedKFold so that the class distribution is even across folds.
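To show what stratification does, here is a small pure-Python sketch that deals the samples of each class round-robin across folds. It is a simplified stand-in, not scikit-learn's StratifiedKFold (which also supports shuffling), and the labels are made up.

```python
from collections import defaultdict


def stratified_fold_ids(labels, n_folds=5):
    """Assign a fold id to each sample so every class is spread evenly over folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_ids = [0] * len(labels)
    for indices in by_class.values():
        # Deal the samples of one class round-robin across the folds
        for position, idx in enumerate(indices):
            fold_ids[idx] = position % n_folds
    return fold_ids


labels = ["asbfly"] * 10 + ["ashdro1"] * 5  # toy primary labels
folds = stratified_fold_ids(labels)
```

With 10 and 5 samples over 5 folds, every fold receives two samples of the first class and one of the second, so a rare species is not concentrated in a single validation fold.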
2.2 Preprocessing
I converted the original audio into the following 6 kinds of images and trained image models on them.
・Spectrogram
・Mel spectrogram
・Scalogram
・Chromagram
・MFCC
・Spectral contrast
You can see what each image looks like in the post below.
・[Pre-processing Method] Various ways to visualize audio data
Of the ones I tried, the mel spectrogram performed best, so I used it. A mel spectrogram visualizes audio by frequency band, so calls from the same bird are likely to emphasize similar regions of the image.
I also tried ensembling with models trained on the other image types, but performance did not improve, so I left them out.
(Figure: mel spectrogram)
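For readers unfamiliar with the mel scale, the sketch below shows how mel band edges are spaced. It uses the HTK-style formula, which is one common definition. Which variant and which parameters this solution used is not stated, so `n_mels`, `f_min`, and `f_max` here are made-up values.

```python
import math


def hz_to_mel(hz):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Band edges in Hz, evenly spaced on the mel scale (n_mels + 2 points)."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(n_mels=8, f_min=0.0, f_max=16000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]
```

The band widths in `widths` grow with frequency, so low frequencies get finer resolution than high ones.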
2.3 Model
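import timm
from torch import nn


class BirdCLEF2024SpecModel(nn.Module):
    """Thin wrapper around a timm image backbone."""

    def __init__(
        self,
        model_name: str,
        pretrained: bool,
        in_channels: int,
        num_classes: int,
    ):
        super().__init__()
        # timm builds the backbone with the requested input channels and class count
        self.model = timm.create_model(
            model_name=model_name,
            pretrained=pretrained,
            num_classes=num_classes,
            in_chans=in_channels,
        )

    def forward(self, x):
        # Returns the backbone output, one value per class
        h = self.model(x)
        return h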
2.4 Model Output
To stabilize the output, I selected the model with the best CV score within each fold, predicted with each of them, and averaged the results to get the final output.
test_pred = test_preds_arr.mean(axis=0)
2.5 Removing Anomalous Data
I removed entries that are listed in the metadata but do not actually exist.
not_exist_list = [
    'aspfly1/XC775312',
    'comior1/XC881009',
    'hoopoe/XC891005',
    'hoopoe/XC891004',
    'hoopoe/XC798809',
    'hoopoe/XC798808',
    'hoopoe/XC798807',
    'hoopoe/XC798806',
    'hoopoe/XC798805',
    'eaywag1/XC835367',
    'orihob2/XC762524',
]
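One way such a list might be applied is sketched below. The rows, the `filename` column name, the `.ogg` extension, and the ID `XC100001` are hypothetical and only serve the example. The real metadata layout may differ.

```python
from pathlib import PurePosixPath

missing = {"aspfly1/XC775312", "hoopoe/XC891005"}  # subset of not_exist_list above

# Hypothetical metadata rows; the real column name and file extension may differ
rows = [
    {"primary_label": "aspfly1", "filename": "aspfly1/XC775312.ogg"},
    {"primary_label": "aspfly1", "filename": "aspfly1/XC100001.ogg"},
    {"primary_label": "hoopoe", "filename": "hoopoe/XC891005.ogg"},
]


def drop_missing(rows, missing):
    """Keep rows whose 'label/recording' key (extension removed) is not in missing."""
    kept = []
    for row in rows:
        key = str(PurePosixPath(row["filename"]).with_suffix(""))
        if key not in missing:
            kept.append(row)
    return kept


clean_rows = drop_missing(rows, missing)
```

Filtering the metadata up front keeps the data loader from failing on files that are not on disk.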
2.6 Speeding Up Inference
Intel's open-source OpenVINO toolkit can speed up inference through quantization, pruning, distributed execution, and optimization for the target hardware (CPU, GPU, etc.).
Producing the output here requires inference with 5 models (plus inference with the ensemble model). Using OpenVINO cut inference time by about 40%, which let me meet the 120-minute limit.
import openvino as ov
import torch


# Convert each fold's PyTorch model to OpenVINO via ONNX.
# CFG and TRAINED_MODEL are assumed to be defined earlier in the notebook.
def convert_pytorch_to_openvino(device):
    for fold_id in range(CFG.n_folds):
        # load model
        model_path = TRAINED_MODEL / f"best_model_fold{fold_id}.pth"
        model = BirdCLEF2024SpecModel(
            model_name=CFG.model_name, pretrained=False, num_classes=CFG.N_CLASSES, in_channels=1
        )
        model.load_state_dict(torch.load(model_path, map_location=device))
        model.eval()

        # export to onnx
        onnx_path = CFG.OUTPUT_DIR_ONNX / f"fp32_fold{fold_id}.onnx"
        torch.onnx.export(
            model,
            CFG.DUMMY_INPUT_TENSOR,
            str(onnx_path),
            opset_version=15,
            input_names=['input'],
            output_names=['output'],
        )

        # convert model to openvino
        ov_model = ov.convert_model(onnx_path, input=[('input', CFG.INPUT_SHAPE)])

        # save model (kept in fp32)
        ov.save_model(ov_model, CFG.OUTPUT_DIR_OV / f"fp32_fold{fold_id}.xml", compress_to_fp16=False)


convert_pytorch_to_openvino(device=torch.device(CFG.device))
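A back-of-the-envelope calculation shows why the speed-up matters. The sketch below uses only figures stated in this post (1,100 files of 4 minutes, 5-second segments, a 120-minute limit, 5 fold models). It ignores audio loading, preprocessing, and the public model, so the real budget is tighter, and the function name is hypothetical.

```python
def per_forward_budget_sec(n_files, file_sec, segment_sec, limit_min, n_models):
    """Average wall-clock seconds available per model forward pass on one segment."""
    segments = n_files * (file_sec // segment_sec)
    total_forwards = segments * n_models
    return limit_min * 60 / total_forwards, segments


budget, segments = per_forward_budget_sec(
    n_files=1100, file_sec=240, segment_sec=5, limit_min=120, n_models=5
)
print(f"{segments} segments, {budget * 1000:.1f} ms per forward pass")
```

That works out to 52,800 segments and roughly 27 ms per forward pass on CPU, before any other overhead.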
3. What Didn't Work
・Mixup on the images
・Training on images other than mel spectrograms
4. Environment
Throughout the competition I used Kaggle's GPU (P100).
Specs:
・RAM 32GB
・VRAM 16GB
Limits:
・Up to 30 hours per week
・Only one session can run at a time
5. Summary
This solution relied mainly on an image model trained on mel spectrograms and on OpenVINO for faster inference.
That's it for this retrospective. Thank you for reading!