Training speed benchmark on CPU
Prerequisites
- Python 3.10.14
- Backbone: resnet18
- Image size: 256×256
- Data augmentation: none
- PyTorch 2.4.0+cpu
- PyTorch 2.4.0
Kaggle kernel (CPU)
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 0
CPU MHz: 2199.998
BogoMIPS: 4399.99
Training time (min, average per epoch): 39.850389403104785
Evaluation time (min, average per epoch): 4.133974002798398
EPOCH 1: 100%|██████████| 1710/1710 [38:46<00:00, 1.36s/it, postfix=train_loss: 0.2052]
EPOCH 1: 100%|██████████| 3419/3419 [04:01<00:00, 14.14it/s, postfix=valid_loss 0.0632 | ACC: 0.9792]
EPOCH 2: 100%|██████████| 1710/1710 [39:23<00:00, 1.38s/it, postfix=train_loss: 0.0678]
EPOCH 2: 100%|██████████| 3419/3419 [04:06<00:00, 13.87it/s, postfix=valid_loss 0.0566 | ACC: 0.9801]
EPOCH 3: 100%|██████████| 1710/1710 [40:31<00:00, 1.42s/it, postfix=train_loss: 0.0296]
EPOCH 3: 100%|██████████| 3419/3419 [04:10<00:00, 13.63it/s, postfix=valid_loss 0.0557 | ACC: 0.9848]
EPOCH 4: 100%|██████████| 1710/1710 [40:42<00:00, 1.43s/it, postfix=train_loss: 0.0133]
EPOCH 4: 100%|██████████| 3419/3419 [04:12<00:00, 13.52it/s, postfix=valid_loss 0.0322 | ACC: 0.9903]
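The reported training and evaluation times agree with the mean of the four per-epoch tqdm bars, which suggests they are per-epoch averages rather than totals. The small sketch below checks this for the Kaggle CPU log by converting tqdm's elapsed field (`MM:SS` or `H:MM:SS`) to seconds. The helper names are illustrative, and the remaining gap of under a second per epoch is presumably time spent outside the progress bar.

```python
def to_seconds(elapsed: str) -> int:
    """Convert a tqdm elapsed field such as '38:46' or '1:02:03' to seconds."""
    seconds = 0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def mean_minutes(elapsed_fields: list[str]) -> float:
    """Average a list of tqdm elapsed fields and return minutes."""
    total = sum(to_seconds(e) for e in elapsed_fields)
    return total / len(elapsed_fields) / 60


# Elapsed times of the four training and four validation bars in the Kaggle CPU log.
train = ["38:46", "39:23", "40:31", "40:42"]
valid = ["04:01", "04:06", "04:10", "04:12"]

print(f"train: {mean_minutes(train):.2f} min/epoch")  # 39.84 (reported: 39.85)
print(f"valid: {mean_minutes(valid):.2f} min/epoch")  # 4.12 (reported: 4.13)
```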
11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 140
Model name: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Stepping: 1
CPU MHz: 2995.199
BogoMIPS: 5990.39
Training time (min, average per epoch): 26.396757914622626
Evaluation time (min, average per epoch): 3.530457828442256
fold:1
(13673, 5) (3419, 5)
EPOCH 1: 100%|██████████| 1710/1710 [26:40<00:00, 1.07it/s, postfix=train_loss: 0.1961]
EPOCH 1: 100%|██████████| 3419/3419 [03:51<00:00, 14.78it/s, postfix=valid_loss 0.0755 | ACC: 0.9772]
EPOCH 2: 100%|██████████| 1710/1710 [26:38<00:00, 1.07it/s, postfix=train_loss: 0.0714]
EPOCH 2: 100%|██████████| 3419/3419 [03:55<00:00, 14.49it/s, postfix=valid_loss 0.0468 | ACC: 0.9863]
EPOCH 3: 100%|██████████| 1710/1710 [27:43<00:00, 1.03it/s, postfix=train_loss: 0.0295]
EPOCH 3: 100%|██████████| 3419/3419 [03:16<00:00, 17.40it/s, postfix=valid_loss 0.0381 | ACC: 0.9883]
EPOCH 4: 100%|██████████| 1710/1710 [24:32<00:00, 1.16it/s, postfix=train_loss: 0.0138]
EPOCH 4: 100%|██████████| 3419/3419 [03:03<00:00, 18.63it/s, postfix=valid_loss 0.0551 | ACC: 0.9848]
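The shapes printed after `fold:1` look like the (rows, columns) of the training and validation DataFrames. Assuming that, and assuming the DataLoaders keep the last partial batch (`drop_last=False`, the PyTorch default), the iteration counts in the progress bars determine the batch sizes. A minimal sketch; the helper names are hypothetical.

```python
import math


def iterations_per_epoch(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a PyTorch-style DataLoader yields per epoch."""
    # With drop_last=False the final partial batch is kept, so round up.
    return n_samples // batch_size if drop_last else math.ceil(n_samples / batch_size)


def consistent_batch_sizes(n_samples: int, n_iters: int, drop_last: bool = False) -> list[int]:
    """All batch sizes that would produce exactly n_iters batches."""
    return [
        b for b in range(1, n_samples + 1)
        if iterations_per_epoch(n_samples, b, drop_last) == n_iters
    ]


print(consistent_batch_sizes(13673, 1710))  # [8]  training loader
print(consistent_batch_sizes(3419, 3419))   # [1]  validation loader
```

Under these assumptions training uses a batch size of 8 and validation a batch size of 1, so the tqdm rates convert to images per second: 3.72 it/s on the i7-14650HX is about 30 training images/s, while 1.36 s/it on the Kaggle CPU kernel is about 5.9 images/s.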
Intel(R) N100
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 190
Model name: Intel(R) N100
Stepping: 0
CPU MHz: 806.399
BogoMIPS: 1612.79
Training time (min, average per epoch): 36.46781529784202
Evaluation time (min, average per epoch): 3.6432770530382794
fold:1
(13673, 5) (3419, 5)
EPOCH 1: 100%|██████████| 1710/1710 [36:26<00:00, 1.28s/it, postfix=train_loss: 0.4644]
EPOCH 1: 100%|██████████| 3419/3419 [03:37<00:00, 15.74it/s, postfix=valid_loss 0.2538 | ACC: 0.9146]
EPOCH 2: 100%|██████████| 1710/1710 [36:31<00:00, 1.28s/it, postfix=train_loss: 0.2146]
EPOCH 2: 100%|██████████| 3419/3419 [03:39<00:00, 15.56it/s, postfix=valid_loss 0.2034 | ACC: 0.9342]
EPOCH 3: 100%|██████████| 1710/1710 [36:26<00:00, 1.28s/it, postfix=train_loss: 0.1301]
EPOCH 3: 100%|██████████| 3419/3419 [03:38<00:00, 15.63it/s, postfix=valid_loss 0.0981 | ACC: 0.9678]
EPOCH 4: 100%|██████████| 1710/1710 [36:27<00:00, 1.28s/it, postfix=train_loss: 0.0683]
EPOCH 4: 100%|██████████| 3419/3419 [03:38<00:00, 15.65it/s, postfix=valid_loss 0.0684 | ACC: 0.9769]
Intel(R) Core(TM) i7-14650HX
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 183
Model name: Intel(R) Core(TM) i7-14650HX
Stepping: 1
CPU MHz: 2419.200
BogoMIPS: 4838.40
Training time (min, average per epoch): 7.806483887632688
Evaluation time (min, average per epoch): 1.075827302535375
EPOCH 1: 100%|██████████| 1710/1710 [07:39<00:00, 3.72it/s, postfix=train_loss: 0.1977]
EPOCH 1: 100%|██████████| 3419/3419 [01:04<00:00, 52.79it/s, postfix=valid_loss 0.1116 | ACC: 0.9637]
EPOCH 2: 100%|██████████| 1710/1710 [07:53<00:00, 3.61it/s, postfix=train_loss: 0.0679]
EPOCH 2: 100%|██████████| 3419/3419 [01:04<00:00, 53.24it/s, postfix=valid_loss 0.0596 | ACC: 0.9798]
EPOCH 3: 100%|██████████| 1710/1710 [07:49<00:00, 3.64it/s, postfix=train_loss: 0.0330]
EPOCH 3: 100%|██████████| 3419/3419 [01:04<00:00, 53.15it/s, postfix=valid_loss 0.0528 | ACC: 0.9830]
EPOCH 4: 100%|██████████| 1710/1710 [07:50<00:00, 3.63it/s, postfix=train_loss: 0.0141]
EPOCH 4: 100%|██████████| 3419/3419 [01:04<00:00, 52.69it/s, postfix=valid_loss 0.0644 | ACC: 0.9819]
Google Colab
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
Stepping: 0
BogoMIPS: 4400.46
Bonus: Kaggle kernel (with GPU)
GPU
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P100-PCIE-16GB           Off |   00000000:00:04.0 Off |                    0 |
| N/A   27C    P0             25W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
CPU
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.00GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 3
BogoMIPS: 4000.28
Training time (min, average per epoch): 1.1703060746192933
Evaluation time (min, average per epoch): 0.4584653983513514
Kaggle kernel (with multiple GPUs)
GPU
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   48C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:00:05.0 Off |                    0 |
| N/A   49C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
CPU
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.00GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 3
BogoMIPS: 4000.28
Training time (min, average per epoch): 4.723866355419159
Evaluation time (min, average per epoch): 0.776461640993754
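The multi-GPU run is slower than the single-GPU run mainly because of multi-GPU overhead, such as synchronization delays between the GPUs and recomputing gradients. Increasing the batch size makes it faster. Note that the GPU models also differ (one Tesla P100 vs. two Tesla T4), so the two runs are not a like-for-like comparison.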
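Why a larger batch helps can be illustrated with a toy cost model. This is an assumption, not a measurement: each iteration pays a fixed multi-GPU overhead plus compute proportional to the samples the busiest GPU processes, and an epoch has ceil(N / batch_size) iterations. `OVERHEAD_S` and `PER_SAMPLE_S` are arbitrary placeholder values chosen only to show the trend; they are not fitted to the logs.

```python
import math


def epoch_time(n_samples: int, batch_size: int, n_gpus: int,
               overhead_s: float, per_sample_s: float) -> float:
    """Toy model: seconds per epoch = iterations x (fixed overhead + per-GPU compute)."""
    iters = math.ceil(n_samples / batch_size)
    per_gpu = math.ceil(batch_size / n_gpus)  # samples on the busiest GPU per step
    return iters * (overhead_s + per_gpu * per_sample_s)


# Hypothetical parameters for illustration only.
N = 13673            # training rows, as printed in the logs
OVERHEAD_S = 0.05    # assumed fixed cost per iteration (sync, scatter/gather)
PER_SAMPLE_S = 0.002 # assumed compute cost per sample on one GPU

for bs in (8, 32, 128):
    t = epoch_time(N, bs, n_gpus=2, overhead_s=OVERHEAD_S, per_sample_s=PER_SAMPLE_S)
    print(f"batch {bs:>3}: {t / 60:.2f} min/epoch")
```

In this model the fixed per-iteration overhead is paid fewer times as the batch grows, which is the effect the note above relies on; the real gain depends on how much of each step is actually overhead.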
11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz (with Intel® Extension for PyTorch)
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 140
Model name: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
Stepping: 1
CPU MHz: 2995.199
BogoMIPS: 5990.39
Training time (min, average per epoch): 22.397127333283425
Evaluation time (min, average per epoch): 2.836812874674797
fold:1
(13673, 5) (3419, 5)
EPOCH 1: 100%|██████████| 1710/1710 [23:42<00:00, 1.20it/s, postfix=train_loss: 0.1979]
EPOCH 1: 100%|██████████| 3419/3419 [03:13<00:00, 17.62it/s, postfix=valid_loss 0.0715 | ACC: 0.9804]
EPOCH 2: 100%|██████████| 1710/1710 [22:49<00:00, 1.25it/s, postfix=train_loss: 0.0673]
EPOCH 2: 100%|██████████| 3419/3419 [02:36<00:00, 21.85it/s, postfix=valid_loss 0.0697 | ACC: 0.9784]
EPOCH 3: 100%|██████████| 1710/1710 [21:12<00:00, 1.34it/s, postfix=train_loss: 0.0307]
EPOCH 3: 100%|██████████| 3419/3419 [02:50<00:00, 20.00it/s, postfix=valid_loss 0.0809 | ACC: 0.9754]
EPOCH 4: 100%|██████████| 1710/1710 [21:51<00:00, 1.30it/s, postfix=train_loss: 0.0139]
EPOCH 4: 100%|██████████| 3419/3419 [02:39<00:00, 21.46it/s, postfix=valid_loss 0.0483 | ACC: 0.9860]
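To compare the runs at a glance, the reported per-epoch averages can be turned into speedups relative to the Kaggle CPU kernel. The sketch below copies the numbers from the measurements above, rounded to two decimals. The dictionary labels are names chosen here, and Google Colab is left out because no timings were recorded for it.

```python
# Average per-epoch times in minutes, copied (rounded) from the measurements above.
TRAIN_MIN = {
    "Kaggle CPU": 39.85,
    "i7-11370H": 26.40,
    "i7-11370H + IPEX": 22.40,
    "N100": 36.47,
    "i7-14650HX": 7.81,
    "Kaggle P100": 1.17,
    "Kaggle T4 x2": 4.72,
}
EVAL_MIN = {
    "Kaggle CPU": 4.13,
    "i7-11370H": 3.53,
    "i7-11370H + IPEX": 2.84,
    "N100": 3.64,
    "i7-14650HX": 1.08,
    "Kaggle P100": 0.46,
    "Kaggle T4 x2": 0.78,
}
BASELINE = "Kaggle CPU"


def speedups(times: dict[str, float], baseline: str) -> dict[str, float]:
    """Speedup of every entry relative to the baseline (higher is faster)."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


train = speedups(TRAIN_MIN, BASELINE)
evals = speedups(EVAL_MIN, BASELINE)
for name in TRAIN_MIN:
    print(f"{name:<18} train x{train[name]:5.2f}  eval x{evals[name]:5.2f}")

# Effect of Intel Extension for PyTorch on the same i7-11370H machine.
ipex_train = TRAIN_MIN["i7-11370H"] / TRAIN_MIN["i7-11370H + IPEX"]
ipex_eval = EVAL_MIN["i7-11370H"] / EVAL_MIN["i7-11370H + IPEX"]
print(f"IPEX gain: train x{ipex_train:.2f}, eval x{ipex_eval:.2f}")
```

With these numbers the i7-14650HX trains about 5.1× faster than the Kaggle CPU kernel and the single P100 about 34× faster, while IPEX speeds up the i7-11370H by roughly 1.18× for training and 1.24× for evaluation.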