[Bug libfortran/78379] Processor-specific versions for matmul
tkoenig at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Nov 17 17:56:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Here are some measurements with the AVX-enabling patch.
They were done on an AVX machine, namely gcc75 from the compile farm.
This was done with the command line
  gfortran -static-libgfortran -finline-matmul-limit=0 -Ofast -o compare_mavx compare_2.f90
Unconditionally setting -mavx in the Makefile for matmul, with stock trunk:
=========================================================
================ MEASURED GIGAFLOPS =
=========================================================
Size Loops Matmul-fixed-explicit refMatmul Matmul-assumed Matmul-variable-explicit
=========================================================
2 5000 0.067 0.077 0.051 0.069
3 5000 0.193 0.218 0.157 0.194
4 5000 0.429 0.423 0.368 0.435
5 5000 0.609 0.659 0.556 0.630
7 5000 0.948 1.018 0.931 1.009
8 5000 1.608 1.251 1.589 1.715
9 5000 1.755 1.484 1.745 1.856
15 5000 2.710 2.175 2.963 3.105
16 5000 4.289 2.510 4.541 4.784
17 5000 4.411 3.032 4.675 4.888
31 5000 6.165 4.395 6.912 6.902
32 5000 8.800 4.362 8.793 8.809
33 5000 8.156 4.463 8.145 8.193
63 5000 9.727 4.364 9.709 9.716
64 5000 11.828 4.023 11.810 11.798
65 5000 10.726 4.489 10.654 10.725
127 3920 12.144 4.292 12.281 12.268
128 3829 13.829 4.484 13.807 13.841
129 3741 12.986 4.438 12.964 12.985
255 483 14.446 4.571 14.462 14.442
256 477 15.738 4.707 15.744 15.738
257 472 13.981 4.565 13.995 13.990
511 60 14.954 4.674 14.977 14.933
512 59 16.120 4.840 16.137 16.062
513 59 14.488 4.392 14.497 14.490
1023 7 15.011 3.573 15.021 14.995
1024 7 15.938 3.489 15.947 15.938
1025 7 14.670 3.568 14.683 14.627
With library-side switching
(https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01810.html):
=========================================================
================ MEASURED GIGAFLOPS =
=========================================================
Size Loops Matmul-fixed-explicit refMatmul Matmul-assumed Matmul-variable-explicit
=========================================================
2 5000 0.067 0.080 0.053 0.067
3 5000 0.192 0.226 0.159 0.192
4 5000 0.427 0.436 0.364 0.431
5 5000 0.588 0.664 0.543 0.621
7 5000 0.938 0.914 0.926 1.011
8 5000 1.589 1.235 1.558 1.671
9 5000 1.704 1.486 1.694 1.810
15 5000 2.638 2.175 2.854 3.031
16 5000 4.234 2.532 4.533 4.745
17 5000 4.374 3.044 4.677 4.839
31 5000 6.207 4.401 6.891 6.918
32 5000 8.824 4.364 8.614 8.603
33 5000 7.954 4.349 7.945 7.944
63 5000 8.802 4.369 9.728 9.764
64 5000 11.845 4.025 11.783 11.849
65 5000 10.753 4.595 10.719 10.753
127 3920 12.023 4.314 12.285 12.004
128 3829 13.427 4.369 13.722 13.742
129 3741 12.877 4.323 12.668 12.985
255 483 14.398 4.453 14.336 13.496
256 477 15.708 4.680 15.711 15.465
257 472 13.977 4.439 13.965 13.977
511 60 14.920 4.691 14.937 14.939
512 59 15.959 4.787 16.084 16.082
513 59 14.444 4.636 14.464 14.452
1023 7 14.978 3.448 14.979 14.980
1024 7 15.903 3.640 15.900 15.905
1025 7 14.638 3.464 14.626 14.636
With stock trunk:
=========================================================
================ MEASURED GIGAFLOPS =
=========================================================
Size Loops Matmul-fixed-explicit refMatmul Matmul-assumed Matmul-variable-explicit
=========================================================
2 5000 0.072 0.078 0.053 0.072
3 5000 0.199 0.224 0.165 0.200
4 5000 0.458 0.403 0.387 0.462
5 5000 0.629 0.661 0.563 0.651
7 5000 1.073 1.010 1.029 1.131
8 5000 1.671 1.234 1.637 1.760
9 5000 1.732 1.465 1.720 1.829
15 5000 2.895 2.152 3.195 3.349
16 5000 3.870 2.483 4.168 4.318
17 5000 3.976 3.029 4.253 4.424
31 5000 6.210 4.403 6.861 6.868
32 5000 7.551 4.293 7.544 7.509
33 5000 7.119 4.418 7.094 7.090
63 5000 8.742 4.377 8.753 8.728
64 5000 9.415 4.019 9.384 9.260
65 5000 8.882 4.540 8.842 8.856
127 3920 10.073 4.432 9.966 9.988
128 3829 10.556 4.469 10.552 10.405
129 3741 9.923 4.428 9.990 9.930
255 483 10.827 4.569 10.875 10.768
256 477 11.328 4.705 11.281 11.129
257 472 10.402 4.492 10.344 10.360
511 60 10.947 4.674 11.003 10.938
512 59 11.503 4.842 11.504 11.314
513 59 10.654 4.672 10.651 10.619
1023 7 10.941 3.641 10.944 10.863
1024 7 11.370 3.587 11.261 11.193
1025 7 10.734 3.601 10.652 10.704
With inlined matmul, -Ofast without -mavx:
=========================================================
================ MEASURED GIGAFLOPS =
=========================================================
Size Loops Matmul-fixed-explicit refMatmul Matmul-assumed Matmul-variable-explicit
=========================================================
2 5000 8.979 0.078 0.154 0.241
3 5000 14.042 0.224 0.348 0.451
4 5000 1.686 0.435 0.500 0.707
5 5000 1.989 0.617 0.577 0.829
7 5000 2.163 0.846 0.783 1.123
8 5000 3.742 1.224 0.879 1.322
9 5000 2.764 1.420 0.996 1.458
15 5000 3.461 2.108 1.305 2.420
16 5000 4.395 2.589 1.619 2.901
17 5000 5.238 3.291 1.934 3.579
31 5000 7.207 4.434 2.347 4.385
32 5000 7.318 4.306 2.351 4.329
33 5000 7.204 4.466 2.052 4.421
63 5000 4.688 4.365 2.486 4.700
64 5000 4.246 4.022 2.480 4.664
65 5000 4.238 4.355 2.486 4.703
127 3920 4.411 4.427 2.821 4.340
128 3829 4.365 4.481 2.846 4.434
129 3741 4.427 4.441 2.828 4.396
255 483 4.561 4.569 2.972 4.517
256 477 4.666 4.701 2.905 4.685
257 472 4.520 4.573 2.974 4.550
511 60 4.669 4.675 3.075 4.666
512 59 4.823 4.843 3.095 4.835
513 59 4.655 4.672 3.077 4.651
1023 7 3.555 3.563 2.718 3.554
1024 7 3.519 3.529 2.713 3.519
1025 7 3.527 3.543 2.715 3.536
With the inlined version, with -mavx:
=========================================================
================ MEASURED GIGAFLOPS =
=========================================================
Size Loops Matmul-fixed-explicit refMatmul Matmul-assumed Matmul-variable-explicit
=========================================================
2 5000 8.990 0.074 0.155 0.206
3 5000 7.488 0.212 0.304 0.396
4 5000 1.773 0.342 0.501 0.533
5 5000 2.000 0.552 0.615 0.739
7 5000 2.163 0.919 0.807 1.057
8 5000 3.369 1.388 0.905 1.578
9 5000 2.694 1.347 1.020 1.492
15 5000 3.441 2.201 1.325 2.631
16 5000 1.831 3.399 1.677 4.137
17 5000 4.554 3.461 1.976 4.120
31 5000 7.111 5.286 2.372 5.712
32 5000 8.384 5.887 2.040 6.725
33 5000 7.218 5.374 2.057 5.798
63 5000 8.131 6.107 2.477 6.418
64 5000 8.707 6.518 2.313 7.228
65 5000 7.768 6.003 2.427 4.503
127 3920 6.714 5.688 2.761 6.293
128 3829 7.067 6.688 2.777 6.880
129 3741 6.277 6.023 2.765 6.296
255 483 6.036 5.681 2.877 5.765
256 477 6.177 5.869 2.921 5.917
257 472 6.017 5.687 2.880 5.766
511 60 6.156 5.878 2.848 5.920
512 59 6.338 6.107 3.026 6.092
513 59 6.125 5.826 2.954 5.817
1023 7 4.130 4.111 2.623 4.104
1024 7 4.270 4.219 2.667 4.198
1025 7 4.206 4.159 2.616 4.149