Investigating training speed on CPU

Prerequisites

  • Python 3.10.14
  • Backbone: resnet18
  • Image size: 256×256
  • Image augmentation: none
  • pytorch 2.4.0+cpu
  • pytorch 2.4.0

kaggle kernel (cpu)

Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Byte Order:                           Little Endian
Address sizes:                        46 bits physical, 48 bits virtual
CPU(s):                               4
On-line CPU(s) list:                  0-3
Thread(s) per core:                   2
Core(s) per socket:                   2
Socket(s):                            1
NUMA node(s):                         1
Vendor ID:                            GenuineIntel
CPU family:                           6
Model:                                79
Model name:                           Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping:                             0
CPU MHz:                              2199.998
BogoMIPS:                             4399.99

Training time (min): 39.850389403104785
Evaluation time (min): 4.133974002798398

EPOCH 1: 100%|██████████| 1710/1710 [38:46<00:00,  1.36s/it, postfix=train_loss: 0.2052]
EPOCH 1: 100%|██████████| 3419/3419 [04:01<00:00, 14.14it/s, postfix=valid_loss 0.0632 | ACC: 0.9792]
EPOCH 2: 100%|██████████| 1710/1710 [39:23<00:00,  1.38s/it, postfix=train_loss: 0.0678]
EPOCH 2: 100%|██████████| 3419/3419 [04:06<00:00, 13.87it/s, postfix=valid_loss 0.0566 | ACC: 0.9801]
EPOCH 3: 100%|██████████| 1710/1710 [40:31<00:00,  1.42s/it, postfix=train_loss: 0.0296]
EPOCH 3: 100%|██████████| 3419/3419 [04:10<00:00, 13.63it/s, postfix=valid_loss 0.0557 | ACC: 0.9848]
EPOCH 4: 100%|██████████| 1710/1710 [40:42<00:00,  1.43s/it, postfix=train_loss: 0.0133]
EPOCH 4: 100%|██████████| 3419/3419 [04:12<00:00, 13.52it/s, postfix=valid_loss 0.0322 | ACC: 0.9903]
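
The reported training time appears to be the mean wall-clock time per epoch rather than the total: averaging the four elapsed times shown in the training progress bars gives about 39.84 minutes, which is close to the reported 39.85. The following is a minimal sketch of that check, assuming the `[MM:SS<...]` field is tqdm's elapsed time; the helper name `elapsed_minutes` is hypothetical and not part of the original script.

```python
import re

# Training progress bars copied from the Kaggle CPU log above.
LOG = """\
EPOCH 1: 100%|██████████| 1710/1710 [38:46<00:00,  1.36s/it, postfix=train_loss: 0.2052]
EPOCH 2: 100%|██████████| 1710/1710 [39:23<00:00,  1.38s/it, postfix=train_loss: 0.0678]
EPOCH 3: 100%|██████████| 1710/1710 [40:31<00:00,  1.42s/it, postfix=train_loss: 0.0296]
EPOCH 4: 100%|██████████| 1710/1710 [40:42<00:00,  1.43s/it, postfix=train_loss: 0.0133]
"""


def elapsed_minutes(line):
    """Return the elapsed MM:SS field of a tqdm line in minutes, or None."""
    match = re.search(r"\[(\d+):(\d{2})<", line)
    if match is None:
        return None
    return int(match.group(1)) + int(match.group(2)) / 60


# Keep only lines that actually contain an elapsed-time field.
per_epoch = [m for m in map(elapsed_minutes, LOG.splitlines()) if m is not None]
mean_minutes = sum(per_epoch) / len(per_epoch)
print(f"mean training time per epoch: {mean_minutes:.2f} min")
```

The small gap to the reported value is plausibly time spent outside the progress bar (an assumption, since the timing code is not shown).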

11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz

Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      39 bits physical, 48 bits virtual
CPU(s):                             8
On-line CPU(s) list:                0-7
Thread(s) per core:                 2
Core(s) per socket:                 4
Socket(s):                          1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              140
Model name:                         11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Stepping:                           1
CPU MHz:                            2995.199
BogoMIPS:                           5990.39

Training time (min): 26.396757914622626
Evaluation time (min): 3.530457828442256

fold:1
(13673, 5) (3419, 5)
EPOCH 1: 100%|██████████| 1710/1710 [26:40<00:00,  1.07it/s, postfix=train_loss: 0.1961]
EPOCH 1: 100%|██████████| 3419/3419 [03:51<00:00, 14.78it/s, postfix=valid_loss 0.0755 | ACC: 0.9772]
EPOCH 2: 100%|██████████| 1710/1710 [26:38<00:00,  1.07it/s, postfix=train_loss: 0.0714]
EPOCH 2: 100%|██████████| 3419/3419 [03:55<00:00, 14.49it/s, postfix=valid_loss 0.0468 | ACC: 0.9863]
EPOCH 3: 100%|██████████| 1710/1710 [27:43<00:00,  1.03it/s, postfix=train_loss: 0.0295]
EPOCH 3: 100%|██████████| 3419/3419 [03:16<00:00, 17.40it/s, postfix=valid_loss 0.0381 | ACC: 0.9883]
EPOCH 4: 100%|██████████| 1710/1710 [24:32<00:00,  1.16it/s, postfix=train_loss: 0.0138]
EPOCH 4: 100%|██████████| 3419/3419 [03:03<00:00, 18.63it/s, postfix=valid_loss 0.0551 | ACC: 0.9848]
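
The batch sizes are not stated, but they can be inferred from the log under two assumptions: the printed shapes `(13673, 5) (3419, 5)` are the training and validation tables, and the loaders keep the last partial batch (no `drop_last`). A small sketch of that arithmetic follows; the function name is hypothetical.

```python
import math

n_train, n_valid = 13673, 3419         # rows in the shapes printed for fold 1
train_iters, valid_iters = 1710, 3419  # iteration counts shown by tqdm


def candidate_batch_sizes(n_samples, n_iters, max_batch=1024):
    """Batch sizes b for which ceil(n_samples / b) equals the observed iterations."""
    return [
        b for b in range(1, max_batch + 1)
        if math.ceil(n_samples / b) == n_iters
    ]


train_candidates = candidate_batch_sizes(n_train, train_iters)
valid_candidates = candidate_batch_sizes(n_valid, valid_iters)
print("training batch size candidates:", train_candidates)
print("validation batch size candidates:", valid_candidates)
```

Under those assumptions the only consistent values are a training batch size of 8 and a validation batch size of 1.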

Intel(R) N100

Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Byte Order:                           Little Endian
Address sizes:                        39 bits physical, 48 bits virtual
CPU(s):                               4
On-line CPU(s) list:                  0-3
Thread(s) per core:                   1
Core(s) per socket:                   4
Socket(s):                            1
Vendor ID:                            GenuineIntel
CPU family:                           6
Model:                                190
Model name:                           Intel(R) N100
Stepping:                             0
CPU MHz:                              806.399
BogoMIPS:                             1612.79

Training time (min): 36.46781529784202
Evaluation time (min): 3.6432770530382794

fold:1
(13673, 5) (3419, 5)
EPOCH 1: 100%|██████████| 1710/1710 [36:26<00:00,  1.28s/it, postfix=train_loss: 0.4644]
EPOCH 1: 100%|██████████| 3419/3419 [03:37<00:00, 15.74it/s, postfix=valid_loss 0.2538 | ACC: 0.9146]
EPOCH 2: 100%|██████████| 1710/1710 [36:31<00:00,  1.28s/it, postfix=train_loss: 0.2146]
EPOCH 2: 100%|██████████| 3419/3419 [03:39<00:00, 15.56it/s, postfix=valid_loss 0.2034 | ACC: 0.9342]
EPOCH 3: 100%|██████████| 1710/1710 [36:26<00:00,  1.28s/it, postfix=train_loss: 0.1301]
EPOCH 3: 100%|██████████| 3419/3419 [03:38<00:00, 15.63it/s, postfix=valid_loss 0.0981 | ACC: 0.9678]
EPOCH 4: 100%|██████████| 1710/1710 [36:27<00:00,  1.28s/it, postfix=train_loss: 0.0683]
EPOCH 4: 100%|██████████| 3419/3419 [03:38<00:00, 15.65it/s, postfix=valid_loss 0.0684 | ACC: 0.9769]

Intel(R) Core(TM) i7-14650HX

Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Byte Order:                           Little Endian
Address sizes:                        39 bits physical, 48 bits virtual
CPU(s):                               24
On-line CPU(s) list:                  0-23
Thread(s) per core:                   2
Core(s) per socket:                   12
Socket(s):                            1
Vendor ID:                            GenuineIntel
CPU family:                           6
Model:                                183
Model name:                           Intel(R) Core(TM) i7-14650HX
Stepping:                             1
CPU MHz:                              2419.200
BogoMIPS:                             4838.40

Training time (min): 7.806483887632688
Evaluation time (min): 1.075827302535375

EPOCH 1: 100%|██████████| 1710/1710 [07:39<00:00,  3.72it/s, postfix=train_loss: 0.1977]
EPOCH 1: 100%|██████████| 3419/3419 [01:04<00:00, 52.79it/s, postfix=valid_loss 0.1116 | ACC: 0.9637]
EPOCH 2: 100%|██████████| 1710/1710 [07:53<00:00,  3.61it/s, postfix=train_loss: 0.0679]
EPOCH 2: 100%|██████████| 3419/3419 [01:04<00:00, 53.24it/s, postfix=valid_loss 0.0596 | ACC: 0.9798]
EPOCH 3: 100%|██████████| 1710/1710 [07:49<00:00,  3.64it/s, postfix=train_loss: 0.0330]
EPOCH 3: 100%|██████████| 3419/3419 [01:04<00:00, 53.15it/s, postfix=valid_loss 0.0528 | ACC: 0.9830]
EPOCH 4: 100%|██████████| 1710/1710 [07:50<00:00,  3.63it/s, postfix=train_loss: 0.0141]
EPOCH 4: 100%|██████████| 3419/3419 [01:04<00:00, 52.69it/s, postfix=valid_loss 0.0644 | ACC: 0.9819]
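
The four CPU results so far can be put side by side. The sketch below only reuses the training times reported above and expresses each as a speedup over the Kaggle CPU kernel; the choice of that baseline is an assumption made for illustration.

```python
# Training time per epoch in minutes, as reported for each machine above.
train_minutes = {
    "kaggle kernel (cpu)": 39.850389403104785,
    "i7-11370H": 26.396757914622626,
    "N100": 36.46781529784202,
    "i7-14650HX": 7.806483887632688,
}

baseline = train_minutes["kaggle kernel (cpu)"]

# Fastest machine first; speedup > 1 means faster than the baseline.
rows = []
for name, minutes in sorted(train_minutes.items(), key=lambda kv: kv[1]):
    speedup = baseline / minutes
    rows.append((name, minutes, speedup))
    print(f"{name:<22}{minutes:>7.2f} min  {speedup:>5.2f}x")
```

On these numbers the i7-14650HX is roughly 5x faster than the Kaggle CPU kernel, the i7-11370H about 1.5x, and the N100 about 1.1x. Each figure comes from a single run, so small differences should be read with caution.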

google colab

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   2
  On-line CPU(s) list:    0,1
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) CPU @ 2.20GHz
    CPU family:           6
    Model:                79
    Thread(s) per core:   2
    Core(s) per socket:   1
    Socket(s):            1
    Stepping:             0
    BogoMIPS:             4400.46

Bonus: kaggle kernel (using GPU)

gpu

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P100-PCIE-16GB           Off |   00000000:00:04.0 Off |                    0 |
| N/A   27C    P0             25W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

cpu

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) CPU @ 2.00GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             3
    BogoMIPS:             4000.28

Training time (min): 1.1703060746192933
Evaluation time (min): 0.4584653983513514

kaggle kernel (using multiple GPUs)

gpu

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   48C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:00:05.0 Off |                    0 |
| N/A   49C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

cpu

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) CPU @ 2.00GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             3
    BogoMIPS:             4000.28

Training time (min): 4.723866355419159
Evaluation time (min): 0.776461640993754

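The slower time is likely due to factors such as synchronization overhead between the GPUs and gradient recomputation. Increasing the batch size should make it faster.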
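
Why a larger batch size would help can be illustrated with a toy cost model: if every iteration pays a fixed synchronization cost, fewer and larger iterations amortize it. The sketch below is not a measurement. `OVERHEAD_S` and `PER_SAMPLE_S` are hypothetical values chosen only for illustration, and the training set size is taken from the CPU logs.

```python
import math


def epoch_seconds(n_samples, batch_size, overhead_s, per_sample_s, n_gpus):
    """Toy model: each iteration pays a fixed overhead plus compute split across GPUs."""
    iters = math.ceil(n_samples / batch_size)
    return iters * (overhead_s + batch_size * per_sample_s / n_gpus)


N_TRAIN = 13673        # training rows, from the CPU logs
OVERHEAD_S = 0.15      # hypothetical per-iteration sync/scatter cost (not measured)
PER_SAMPLE_S = 0.004   # hypothetical per-image compute cost (not measured)

times = []
for batch_size in (8, 32, 128):
    t = epoch_seconds(N_TRAIN, batch_size, OVERHEAD_S, PER_SAMPLE_S, n_gpus=2)
    times.append(t)
    print(f"batch_size={batch_size:>3}: {t / 60:.2f} min per epoch (toy model)")
```

In this model the epoch time falls as the batch size grows because the fixed per-iteration cost is paid fewer times. Real behavior also depends on memory limits and on how the learning rate is adjusted for the larger batch.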

11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz, using Intel® Extension for PyTorch

Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      39 bits physical, 48 bits virtual
CPU(s):                             8
On-line CPU(s) list:                0-7
Thread(s) per core:                 2
Core(s) per socket:                 4
Socket(s):                          1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              140
Model name:                         11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Stepping:                           1
CPU MHz:                            2995.199
BogoMIPS:                           5990.39

Training time (min): 22.397127333283425
Evaluation time (min): 2.836812874674797

fold:1
(13673, 5) (3419, 5)
EPOCH 1: 100%|██████████| 1710/1710 [23:42<00:00,  1.20it/s, postfix=train_loss: 0.1979]
EPOCH 1: 100%|██████████| 3419/3419 [03:13<00:00, 17.62it/s, postfix=valid_loss 0.0715 | ACC: 0.9804]
EPOCH 2: 100%|██████████| 1710/1710 [22:49<00:00,  1.25it/s, postfix=train_loss: 0.0673]
EPOCH 2: 100%|██████████| 3419/3419 [02:36<00:00, 21.85it/s, postfix=valid_loss 0.0697 | ACC: 0.9784]
EPOCH 3: 100%|██████████| 1710/1710 [21:12<00:00,  1.34it/s, postfix=train_loss: 0.0307]
EPOCH 3: 100%|██████████| 3419/3419 [02:50<00:00, 20.00it/s, postfix=valid_loss 0.0809 | ACC: 0.9754]
EPOCH 4: 100%|██████████| 1710/1710 [21:51<00:00,  1.30it/s, postfix=train_loss: 0.0139]
EPOCH 4: 100%|██████████| 3419/3419 [02:39<00:00, 21.46it/s, postfix=valid_loss 0.0483 | ACC: 0.9860]
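
To quantify the effect of Intel® Extension for PyTorch on this machine, the two i7-11370H runs can be compared directly. The sketch below uses only the times reported in this document. It is a single-run comparison, so the percentages are indicative rather than precise.

```python
# Minutes per epoch on the i7-11370H, as reported in this document.
plain = {"train": 26.396757914622626, "eval": 3.530457828442256}
with_ipex = {"train": 22.397127333283425, "eval": 2.836812874674797}

# Fraction of time saved by the extension for each phase.
reduction = {}
for phase in plain:
    reduction[phase] = 1 - with_ipex[phase] / plain[phase]
    print(
        f"{phase:<5} {plain[phase]:6.2f} -> {with_ipex[phase]:6.2f} min "
        f"({reduction[phase]:.1%} less time)"
    )
```

On these numbers training takes about 15% less time and evaluation about 20% less with the extension enabled.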