Comparison of training and inference speed of different GPUs with various CNN models in PyTorch
Graphics Card Name | NVIDIA GeForce GTX 1080 Ti | NVIDIA GeForce RTX 2080 Ti | NVIDIA TITAN V |
---|---|---|---|
Process | 16nm | 12nm | 12nm |
Die Size | 471mm² | 754mm² | 815mm² |
Transistors | 11,800 million | 18,600 million | 21,100 million |
CUDA Cores | 3584 Cores | 4352 Cores | 5120 Cores |
Tensor Cores | None | 544 Cores | 640 Cores |
Clock(base) | 1481 MHz | 1350 MHz | 1200 MHz |
FP16 (half) performance | 177.2 GFLOPS | 26,895 GFLOPS | 29,798 GFLOPS |
FP32 (float) performance | 11,340 GFLOPS | 13,448 GFLOPS | 14,899 GFLOPS |
FP64 (double) performance | 354.4 GFLOPS | 420.2 GFLOPS | 7,450 GFLOPS |
Memory | 11 GB GDDR5X | 11 GB GDDR6 | 12 GB HBM2 |
Memory Speed | 11 Gbps | 14 Gbps | 1.7 Gbps |
Memory Interface | 352-bit | 352-bit | 3072-bit |
Memory Bandwidth | 484 GB/s | 616 GB/s | 653 GB/s |
Price | $699 US | $1,199 US | $2,999 US |
Release Date | Mar 10th, 2017 | Sep 20th, 2018 | Dec 7th, 2017 |
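The Memory Bandwidth row can be cross-checked against the Memory Speed and Memory Interface rows, since peak bandwidth is the per-pin data rate times the bus width, divided by 8 bits per byte. A minimal sketch of that arithmetic follows; the names `SPECS`, `bandwidth_gb_per_s`, and `BANDWIDTH` are illustrative and not part of this repository.

```python
# Peak memory bandwidth (GB/s) = per-pin data rate (Gbps) * bus width (bits) / 8.
# Inputs are copied from the "Memory Speed" and "Memory Interface" rows above.
SPECS = {
    "GTX 1080 Ti": (11.0, 352),
    "RTX 2080 Ti": (14.0, 352),
    "TITAN V": (1.7, 3072),
}


def bandwidth_gb_per_s(data_rate_gbps, bus_width_bits):
    """Convert a per-pin data rate and a bus width into GB/s."""
    return data_rate_gbps * bus_width_bits / 8


BANDWIDTH = {name: bandwidth_gb_per_s(*spec) for name, spec in SPECS.items()}

for name, value in BANDWIDTH.items():
    print(f"{name}: {value:.1f} GB/s")
```

Rounded to whole numbers, the results agree with the 484, 616, and 653 GB/s listed in the table.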
Single & multi GPU with batch size 12: compare training and inference speed of **SqueezeNet, VGG-16, VGG-19, ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, DenseNet121, DenseNet169, DenseNet201, DenseNet161, MobileNet, MNASNet, ...**
Experiments are performed with three data types: single precision, double precision, and half precision.
Plots are made with plotly.
Usage: `./test.sh`
Environment | Version |
---|---|
PyTorch | 1.4 |
Number of GPUs | 4 |
CUDA | 10.0 |
cuDNN | 7601 |
Each network is fed 12 images of dimensions 224x224x3. For training, the durations of 20 forward and backward passes are averaged. For inference, the durations of 20 forward passes are averaged. 5 warm-up steps are performed first and do not count toward the final result.
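The measurement procedure described above (5 untimed warm-up steps, then the mean of 20 timed passes) can be sketched as a small timing helper. This is only an illustration under stated assumptions, not the code this repository runs: `average_step_ms`, `fake_step`, and the `sync` parameter are hypothetical names, and the workload is a CPU stand-in so the snippet runs without a GPU.

```python
import time


def average_step_ms(step, warmup=5, repeats=20, sync=None):
    """Return the mean wall-clock time of step() in milliseconds.

    sync is an optional callable that blocks until queued work has finished.
    """
    # Warm-up passes are executed but never timed.
    for _ in range(warmup):
        step()

    durations = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # make sure earlier work is not counted in this pass
        start = time.perf_counter()
        step()
        if sync is not None:
            sync()  # wait for this pass to finish before stopping the clock
        durations.append((time.perf_counter() - start) * 1000.0)
    return sum(durations) / len(durations)


# Stand-in workload (an assumption for illustration, not a real network).
calls = []


def fake_step():
    calls.append(1)
    sum(i * i for i in range(10_000))


mean_ms = average_step_ms(fake_step)
print(f"{mean_ms:.3f} ms per step, {len(calls)} calls in total")
```

For a PyTorch model on a GPU, `step` would wrap the forward pass (plus the backward pass for training) and `sync` would typically be `torch.cuda.synchronize`, because CUDA kernels are launched asynchronously and the clock would otherwise stop before the work is done.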
I conducted the experiment using two RTX 2080 Ti cards.
Mode | gpu | precision | densenet121 | densenet161 | densenet169 | densenet201 | resnet101 | resnet152 | resnet18 | resnet34 | resnet50 | squeezenet1_0 | squeezenet1_1 | vgg16 | vgg16_bn | vgg19 | vgg19_bn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Training | TITAN V | single | 56.17 ms | 120.7 ms | 72.59 ms | 93.35 ms | 84.59 ms | 119.5 ms | 16.69 ms | 28.27 ms | 50.54 ms | 15.30 ms | 9.857 ms | 72.85 ms | 80.95 ms | 85.55 ms | 94.42 ms |
Inference | TITAN V | single | 17.49 ms | 39.33 ms | 23.63 ms | 30.93 ms | 23.96 ms | 34.22 ms | 4.827 ms | 8.428 ms | 14.27 ms | 4.565 ms | 2.765 ms | 22.94 ms | 25.41 ms | 27.55 ms | 30.28 ms |
Training | TITAN V | double | 139.8 ms | 387.4 ms | 175.9 ms | 224.5 ms | 509.9 ms | 720.0 ms | 94.21 ms | 194.6 ms | 271.7 ms | 68.38 ms | 31.18 ms | 1463. ms | 1484. ms | 1993. ms | 2016. ms |
Inference | TITAN V | double | 47.68 ms | 170.5 ms | 60.73 ms | 78.43 ms | 317.7 ms | 448.6 ms | 60.26 ms | 129.9 ms | 159.8 ms | 42.37 ms | 11.95 ms | 1261. ms | 1266. ms | 1745. ms | 1751. ms |
Training | TITAN V | half | 43.79 ms | 75.16 ms | 57.53 ms | 70.88 ms | 47.82 ms | 67.43 ms | 10.48 ms | 17.19 ms | 29.08 ms | 13.15 ms | 9.390 ms | 36.03 ms | 46.84 ms | 41.16 ms | 52.65 ms |
Inference | TITAN V | half | 11.87 ms | 22.88 ms | 16.04 ms | 20.70 ms | 12.80 ms | 18.11 ms | 3.085 ms | 5.116 ms | 7.608 ms | 3.694 ms | 2.329 ms | 10.96 ms | 13.26 ms | 12.72 ms | 15.17 ms |
Training | 1080ti | single | 77.18 ms | 164.0 ms | 99.66 ms | 127.6 ms | 112.8 ms | 158.7 ms | 22.48 ms | 36.80 ms | 68.87 ms | 20.56 ms | 13.29 ms | 101.8 ms | 114.1 ms | 119.9 ms | 133.2 ms |
Inference | 1080ti | single | 23.53 ms | 51.53 ms | 31.82 ms | 41.73 ms | 33.02 ms | 47.02 ms | 6.426 ms | 10.97 ms | 20.17 ms | 7.174 ms | 4.370 ms | 33.73 ms | 37.25 ms | 39.95 ms | 44.12 ms |
Training | 1080ti | double | 779.5 ms | 2522. ms | 940.4 ms | 1196. ms | 2410. ms | 3546. ms | 463.3 ms | 969.9 ms | 1216. ms | 259.9 ms | 131.5 ms | 4227. ms | 4271. ms | 5475. ms | 5522. ms |
Inference | 1080ti | double | 47.68 ms | 275.2 ms | 1157. ms | 328.6 ms | 414.9 ms | 1080. ms | 1589. ms | 181.1 ms | 390.8 ms | 529.6 ms | 110.9 ms | 49.96 ms | 2094. ms | 2103. ms | 2775. ms |
Training | 1080ti | half | 43.79 ms | 70.00 ms | 148.4 ms | 89.43 ms | 113.6 ms | 151.0 ms | 219.5 ms | 21.00 ms | 34.84 ms | 76.24 ms | 19.60 ms | 13.18 ms | 91.60 ms | 105.9 ms | 108.1 ms |
Inference | 1080ti | half | 18.62 ms | 42.26 ms | 25.27 ms | 33.01 ms | 27.49 ms | 38.88 ms | 5.645 ms | 9.765 ms | 16.26 ms | 5.869 ms | 3.576 ms | 30.69 ms | 33.22 ms | 36.71 ms | 39.51 ms |
Mode | gpu | precision | resnet18 | resnet34 | resnet50 | resnet101 | resnet152 | densenet121 | densenet169 | densenet201 | densenet161 | squeezenet1_0 | squeezenet1_1 | vgg16 | vgg16_bn | vgg19_bn | vgg19 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Training | RTX 2080ti(1) | single | 16.36 ms | 28.44 ms | 49.63 ms | 81.40 ms | 115.1 ms | 57.69 ms | 75.18 ms | 91.69 ms | 112.7 ms | 14.49 ms | 9.108 ms | 75.86 ms | 85.42 ms | 98.43 ms | 88.05 ms |
Inference | RTX 2080ti(1) | single | 4.894 ms | 8.624 ms | 14.65 ms | 24.57 ms | 35.15 ms | 16.70 ms | 21.94 ms | 28.89 ms | 34.64 ms | 4.704 ms | 2.765 ms | 23.70 ms | 26.25 ms | 30.82 ms | 28.03 ms |
Training | RTX 2080ti(1) | double | 367.9 ms | 755.4 ms | 939.9 ms | 1844. ms | 2702. ms | 593.5 ms | 724.3 ms | 921.3 ms | 1916. ms | 187.8 ms | 94.99 ms | 3251. ms | 3277. ms | 4265. ms | 4238. ms |
Inference | RTX 2080ti(1) | double | 165.0 ms | 328.5 ms | 436.4 ms | 831.0 ms | 1196. ms | 213.8 ms | 266.0 ms | 339.5 ms | 910.7 ms | 82.71 ms | 35.79 ms | 1702. ms | 1708. ms | 2280. ms | 2274. ms |
Training | RTX 2080ti(1) | half | 13.17 ms | 22.25 ms | 35.46 ms | 57.50 ms | 81.38 ms | 51.11 ms | 66.88 ms | 80.20 ms | 88.37 ms | 17.87 ms | 35.75 ms | 53.16 ms | 63.06 ms | 72.75 ms | 61.95 ms |
Inference | RTX 2080ti(1) | half | 3.423 ms | 5.662 ms | 9.035 ms | 14.51 ms | 20.52 ms | 13.47 ms | 17.54 ms | 22.51 ms | 27.10 ms | 4.280 ms | 2.397 ms | 16.14 ms | 18.14 ms | 19.76 ms | 17.89 ms |
Training | RTX 2080ti(2) | single | 16.92 ms | 29.51 ms | 51.46 ms | 84.90 ms | 120.0 ms | 58.13 ms | 75.96 ms | 92.47 ms | 117.6 ms | 14.95 ms | 9.255 ms | 78.95 ms | 88.71 ms | 102.3 ms | 91.67 ms |
Inference | RTX 2080ti(2) | single | 5.107 ms | 8.976 ms | 15.18 ms | 25.60 ms | 36.60 ms | 17.02 ms | 22.40 ms | 29.46 ms | 36.72 ms | 4.852 ms | 2.786 ms | 24.76 ms | 27.25 ms | 32.05 ms | 29.27 ms |
Training | RTX 2080ti(2) | double | 381.9 ms | 781.5 ms | 971.6 ms | 1900. ms | 2777. ms | 610.6 ms | 744.7 ms | 948.1 ms | 1974. ms | 191.9 ms | 97.27 ms | 3317. ms | 3350. ms | 4357. ms | 4329. ms |
Inference | RTX 2080ti(2) | double | 171.8 ms | 341.7 ms | 449.5 ms | 849.5 ms | 1231. ms | 221.1 ms | 275.2 ms | 352.5 ms | 938.9 ms | 83.66 ms | 36.48 ms | 1715. ms | 1721. ms | 2294. ms | 2289. ms |
Training | RTX 2080ti(2) | half | 13.57 ms | 22.97 ms | 36.55 ms | 59.10 ms | 83.81 ms | 51.74 ms | 68.35 ms | 81.21 ms | 89.46 ms | 15.75 ms | 35.46 ms | 55.28 ms | 65.43 ms | 75.75 ms | 64.62 ms |
Inference | RTX 2080ti(2) | half | 3.520 ms | 5.837 ms | 9.272 ms | 14.93 ms | 21.13 ms | 13.38 ms | 18.71 ms | 22.40 ms | 26.82 ms | 4.446 ms | 2.406 ms | 16.29 ms | 17.91 ms | 20.90 ms | 19.14 ms |
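To read the tables as relative speedups, one can divide the single precision time by the half precision time for the same model and GPU. A small sketch using the resnet50 inference values from the tables above; the names `RESNET50_INFERENCE_MS`, `speedup`, and `HALF_SPEEDUP` are illustrative and not taken from the benchmark code.

```python
# resnet50 inference times in ms, copied from the tables above: (single, half).
RESNET50_INFERENCE_MS = {
    "TITAN V": (14.27, 7.608),
    "1080ti": (20.17, 16.26),
    "RTX 2080ti(1)": (14.65, 9.035),
}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


HALF_SPEEDUP = {
    gpu: speedup(single, half)
    for gpu, (single, half) in RESNET50_INFERENCE_MS.items()
}

for gpu, ratio in HALF_SPEEDUP.items():
    print(f"{gpu}: half is {ratio:.2f}x faster than single")
```

In this one sample, the two cards with Tensor Cores gain noticeably more from half precision than the 1080 Ti does; other models in the tables may behave differently.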
[Figures: per-device plots for TitanV, 1080ti, and rtx2080ti. Each device has a total plot plus inference and training panels for densenet, resnet, vgg, and squeezenet.]
[Figures: device comparison plots for training and for inference. Each has half, single, and double precision panels for VGG, resnet, densenet, and squeezenet.]
Spec | NVIDIA DGX Station |
---|---|
GPUs | 4 x Tesla V100 |
TFLOPS (GPU FP16) | 480 |
GPU Memory | 64 GB total system |
CPU | 20-Core Intel Xeon E5-2698 v4 2.2 GHz |
NVIDIA CUDA Cores | 20,480 |
NVIDIA Tensor Cores | 2,560 |
Maximum Power Requirements | 1,500 W |
System Memory | 256 GB DDR4 LRDIMM |
Storage | 4 (data: 3 and OS: 1) x 1.92 TB SSD RAID 0 |
Network | Dual 10 GbE, 4 IB EDR |
Display | 3X DisplayPort, 4K resolution |
Acoustics | < 35 dB |
Software | Ubuntu Linux Host OS, DGX Recommended GPU Driver, CUDA Toolkit |
System Weight | 88 lbs / 40 kg |
System Dimensions | 518 D x 256 W x 639 H (mm) |
Operating Temperature Range | 10 – 30 °C |
precision | batch size | gpus | time |
---|---|---|---|
half | 16 | 1 | 15.6316900253296 |
half | 16 | 2 | 25.2950036525726 |
half | 16 | 3 | 32.5298488140106 |
half | 16 | 4 | 39.5952260494232 |
half | 32 | 1 | 28.9202857017517 |
half | 32 | 2 | 26.9314527511597 |
half | 32 | 3 | 32.6970362663269 |
half | 32 | 4 | 40.0277709960938 |
half | 64 | 1 | 54.6519541740418 |
half | 64 | 2 | 36.9417870044708 |
half | 64 | 3 | 35.1460886001587 |
half | 64 | 4 | 39.9034130573273 |
half | 128 | 1 | 105.689181089401 |
half | 128 | 2 | 62.5697267055512 |
half | 128 | 3 | 50.5970776081085 |
half | 128 | 4 | 45.686126947403 |
single | 16 | 1 | 15.7001733779907 |
single | 16 | 2 | 25.2602100372314 |
single | 16 | 3 | 32.5334632396698 |
single | 16 | 4 | 39.9562275409698 |
single | 32 | 1 | 29.0114963054657 |
single | 32 | 2 | 26.9594860076904 |
single | 32 | 3 | 32.7185535430908 |
single | 32 | 4 | 39.8312091827393 |
single | 64 | 1 | 54.7226464748383 |
single | 64 | 2 | 38.2881510257721 |
single | 64 | 3 | 35.2633249759674 |
single | 64 | 4 | 40.4890751838684 |
single | 128 | 1 | 105.767976045609 |
single | 128 | 2 | 62.6480567455292 |
single | 128 | 3 | 50.3757321834564 |
single | 128 | 4 | 45.5866599082947 |
double | 16 | 1 | 15.703741312027 |
double | 16 | 2 | 25.3219473361969 |
double | 16 | 3 | 33.0831336975098 |
double | 16 | 4 | 40.441951751709 |
double | 32 | 1 | 29.0125107765198 |
double | 32 | 2 | 27.3240101337433 |
double | 32 | 3 | 33.0090951919556 |
double | 32 | 4 | 40.2768909931183 |
double | 64 | 1 | 54.7836709022522 |
double | 64 | 2 | 36.7958390712738 |
double | 64 | 3 | 35.0011682510376 |
double | 64 | 4 | 39.9146497249603 |
double | 128 | 1 | 105.872387886047 |
double | 128 | 2 | 62.9272031784058 |
double | 128 | 3 | 48.4100317955017 |
double | 128 | 4 | 45.5989670753479 |
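One way to summarize the multi-GPU rows above is as speedup and scaling efficiency relative to a single GPU. The sketch below assumes the last column is an elapsed time for the same amount of work, so lower is better; the unit is not stated here, but the ratios do not depend on it. The names `HALF_BATCH_16`, `HALF_BATCH_128`, and `scaling` are illustrative.

```python
# Values copied from the half precision rows above, keyed by GPU count.
HALF_BATCH_16 = {
    1: 15.6316900253296,
    2: 25.2950036525726,
    3: 32.5298488140106,
    4: 39.5952260494232,
}
HALF_BATCH_128 = {
    1: 105.689181089401,
    2: 62.5697267055512,
    3: 50.5970776081085,
    4: 45.686126947403,
}


def scaling(times_by_gpus):
    """Map GPU count to (speedup over 1 GPU, efficiency = speedup / GPU count)."""
    base = times_by_gpus[1]
    return {n: (base / t, base / t / n) for n, t in times_by_gpus.items()}


SCALING_16 = scaling(HALF_BATCH_16)
SCALING_128 = scaling(HALF_BATCH_128)

for label, result in (("batch 16", SCALING_16), ("batch 128", SCALING_128)):
    for n, (speed, eff) in result.items():
        print(f"{label}, {n} GPU(s): speedup {speed:.2f}x, efficiency {eff:.0%}")
```

Under that assumption, adding GPUs at batch size 16 makes the run slower (about 0.39x with four GPUs), while at batch size 128 four GPUs give roughly a 2.3x speedup, or about 58% efficiency.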