This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC, PR 80689] Copy small aggregates element-wise


Hi,

I thought I had sent the following email last Friday, but I have just
found it in my drafts folder, so I am sending it now so that anybody
interested can see what the patch does on Haswell.

I have only skimmed through the new messages in the thread.  I am
looking into something else right now but will get back to this matter
next week at the latest.



On Fri, Nov 03, 2017 at 05:38:30PM +0100, Martin Jambor wrote:
>

...

> 
> Anyway, here are the numbers.  They were taken on two different
> Zen-based machines.  I am also in the process of measuring at least
> something on a Haswell machine but I started later and the machine is
> quite a bit slower so I will not have the numbers until next week (and
> not all equivalents in any way).  I found out I do not have access to
> any more modern .*Lake intel CPU.
> 

OK, I have the numbers now too.  I do not yet know why 465.tonto, in
addition to 416.gamess, failed to compile; I will investigate that
later.

Because the machine is quite a bit slower and everything took forever,
I measured only unpatched trunk three times; for the patched compiler
I then re-ran only those benchmarks whose results were more than 2%
off.
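
For readers who want to reproduce the % column or the 2% selection
rule, here is a minimal sketch.  It is not the script that was actually
used: the function names are hypothetical, and it assumes that "off"
means the relative difference to trunk, in either direction.  The
sample run-times are copied from the SPECINT 2006 -O2 table.

```python
def percent_change(trunk, patched):
    """Relative change of the patched result versus trunk, in percent."""
    return (patched - trunk) / trunk * 100.0


def needs_rerun(trunk, patched, threshold=2.0):
    """True when the patched result is more than `threshold` percent off trunk.

    Using the absolute value (so that speed-ups are re-run as well as
    slow-downs) is an assumption, not something the email states.
    """
    return abs(percent_change(trunk, patched)) > threshold


# (trunk, x1) run-times taken from the SPECINT 2006 -O2 table.
runs = {
    "429.mcf": (547, 517),
    "458.sjeng": (1310, 1300),
    "464.h264ref": (1370, 1390),
}

for name, (trunk, patched) in runs.items():
    flag = "re-run" if needs_rerun(trunk, patched) else "ok"
    print(f"{name:<15} {percent_change(trunk, patched):+.2f} {flag}")
```

With these three rows only 429.mcf crosses the 2% threshold.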

Haswell SPECINT 2006 -O2 generic tuning
=======================================

 Run-time
 --------

| Benchmark      | trunk |   x1 |     % |
|----------------+-------+------+-------|
| 400.perlbench  |   775 |  777 | +0.26 |
| 401.bzip2      |  1200 | 1200 | +0.00 |
| 403.gcc        |   655 |  656 | +0.15 |
| 429.mcf        |   547 |  517 | -5.48 |
| 445.gobmk      |  1140 | 1140 | +0.00 |
| 456.hmmer      |  1130 | 1130 | +0.00 |
| 458.sjeng      |  1310 | 1300 | -0.76 |
| 462.libquantum |   758 |  751 | -0.92 |
| 464.h264ref    |  1370 | 1390 | +1.46 |
| 471.omnetpp    |   475 |  471 | -0.84 |
| 473.astar      |   870 |  867 | -0.34 |
| 483.xalancbmk  |   488 |  486 | -0.41 |

 Text size
 ---------

| Benchmark      |   trunk |      x1 |     % |
|----------------+---------+---------+-------|
| 400.perlbench  |  875874 |  875954 | +0.01 |
| 401.bzip2      |   44754 |   44754 | +0.00 |
| 403.gcc        | 2294466 | 2296098 | +0.07 |
| 429.mcf        |    8226 |    8226 | +0.00 |
| 445.gobmk      |  579778 |  579826 | +0.01 |
| 456.hmmer      |  221058 |  221058 | +0.00 |
| 458.sjeng      |   93362 |   94882 | +1.63 |
| 462.libquantum |   28314 |   28362 | +0.17 |
| 464.h264ref    |  393874 |  393922 | +0.01 |
| 471.omnetpp    |  430306 |  430418 | +0.03 |
| 473.astar      |   29362 |   29538 | +0.60 |
| 483.xalancbmk  | 2361298 | 2361506 | +0.01 |

Haswell SPECINT 2006 -Ofast native tuning
=========================================

 Run-time
 --------

| Benchmark      | trunk |   x1 |     % |
|----------------+-------+------+-------|
| 400.perlbench  |   802 |  803 | +0.12 |
| 401.bzip2      |  1180 | 1170 | -0.85 |
| 403.gcc        |   646 |  647 | +0.15 |
| 429.mcf        |   543 |  508 | -6.45 |
| 445.gobmk      |  1130 | 1130 | +0.00 |
| 456.hmmer      |   529 |  532 | +0.57 |
| 458.sjeng      |  1260 | 1260 | +0.00 |
| 462.libquantum |   764 |  761 | -0.39 |
| 464.h264ref    |  1280 | 1290 | +0.78 |
| 471.omnetpp    |   476 |  464 | -2.52 |
| 473.astar      |   844 |  843 | -0.12 |
| 483.xalancbmk  |   480 |  476 | -0.83 |

 Text size
 ---------

| Benchmark      |   trunk |      x1 |     % |
|----------------+---------+---------+-------|
| 400.perlbench  | 1130994 | 1131058 | +0.01 |
| 401.bzip2      |   77346 |   77346 | +0.00 |
| 403.gcc        | 3099938 | 3101826 | +0.06 |
| 429.mcf        |   10162 |   10162 | +0.00 |
| 445.gobmk      |  766706 |  766786 | +0.01 |
| 456.hmmer      |  346610 |  346610 | +0.00 |
| 458.sjeng      |  143650 |  145522 | +1.30 |
| 462.libquantum |   30986 |   31066 | +0.26 |
| 464.h264ref    |  725218 |  725266 | +0.01 |
| 471.omnetpp    |  546386 |  546642 | +0.05 |
| 473.astar      |   38690 |   38914 | +0.58 |
| 483.xalancbmk  | 3313746 | 3313922 | +0.01 |

Haswell SPECFP 2006 -O2 generic tuning
======================================

 Run-time
 --------

| Benchmark     | trunk |   x1 |     % |
|---------------+-------+------+-------|
| 410.bwaves    |   833 |  831 | -0.24 |
| 416.gamess    |    NR |   NR |       |
| 433.milc      |   820 |  814 | -0.73 |
| 434.zeusmp    |   950 |  949 | -0.11 |
| 435.gromacs   |   945 |  946 | +0.11 |
| 436.cactusADM |  1380 | 1380 | +0.00 |
| 437.leslie3d  |   813 |  812 | -0.12 |
| 444.namd      |   983 |  983 | +0.00 |
| 447.dealII    |   755 |  759 | +0.53 |
| 450.soplex    |   467 |  464 | -0.64 |
| 453.povray    |   402 |  395 | -1.74 |
| 454.calculix  |  1980 | 1980 | +0.00 |
| 459.GemsFDTD  |   765 |  753 | -1.57 |
| 465.tonto     |    NR |   NR |       |
| 470.lbm       |   806 |  806 | +0.00 |
| 481.wrf       |  1330 | 1330 | +0.00 |
| 482.sphinx3   |  1380 | 1380 | +0.00 |

 Text size
 ---------

| Benchmark     |   trunk |      x1 |     % |
|---------------+---------+---------+-------|
| 410.bwaves    |   25954 |   25954 | +0.00 |
| 433.milc      |   87922 |   87922 | +0.00 |
| 434.zeusmp    |  212034 |  212034 | +0.00 |
| 435.gromacs   |  747026 |  747026 | +0.00 |
| 436.cactusADM |  526178 |  526178 | +0.00 |
| 437.leslie3d  |   83234 |   83234 | +0.00 |
| 444.namd      |  297234 |  297266 | +0.01 |
| 447.dealII    | 2165282 | 2172290 | +0.32 |
| 450.soplex    |  347122 |  347122 | +0.00 |
| 453.povray    |  800914 |  801570 | +0.08 |
| 454.calculix  | 1342802 | 1342802 | +0.00 |
| 459.GemsFDTD  |  353410 |  354050 | +0.18 |
| 470.lbm       |    9202 |    9202 | +0.00 |
| 481.wrf       | 3345170 | 3345170 | +0.00 |
| 482.sphinx3   |  125026 |  125026 | +0.00 |

Haswell SPECFP 2006 -Ofast native tuning
========================================

 Run-time
 --------

| Benchmark     | trunk |   x1 |     % |
|---------------+-------+------+-------|
| 410.bwaves    |   551 |  550 | -0.18 |
| 416.gamess    |    NR |   NR |       |
| 433.milc      |   773 |  776 | +0.39 |
| 434.zeusmp    |   660 |  660 | +0.00 |
| 435.gromacs   |   876 |  874 | -0.23 |
| 436.cactusADM |   620 |  619 | -0.16 |
| 437.leslie3d  |   501 |  501 | +0.00 |
| 444.namd      |   974 |  974 | +0.00 |
| 447.dealII    |   722 |  720 | -0.28 |
| 450.soplex    |   459 |  457 | -0.44 |
| 453.povray    |   416 |  410 | -1.44 |
| 454.calculix  |   883 |  882 | -0.11 |
| 459.GemsFDTD  |   625 |  614 | -1.76 |
| 465.tonto     |    NR |   NR |       |
| 470.lbm       |   783 |  781 | -0.26 |
| 481.wrf       |   748 |  746 | -0.27 |
| 482.sphinx3   |  1020 | 1020 | +0.00 |

 Text size
 ---------

| Benchmark     |   trunk |      x1 |     % |
|---------------+---------+---------+-------|
| 410.bwaves    |   30802 |   30802 | +0.00 |
| 433.milc      |  122450 |  122450 | +0.00 |
| 434.zeusmp    |  613458 |  613458 | +0.00 |
| 435.gromacs   |  957922 |  957922 | +0.00 |
| 436.cactusADM |  763794 |  763794 | +0.00 |
| 437.leslie3d  |  154690 |  154690 | +0.00 |
| 444.namd      |  311282 |  311314 | +0.01 |
| 447.dealII    | 2486482 | 2493202 | +0.27 |
| 450.soplex    |  436322 |  436322 | +0.00 |
| 453.povray    | 1088034 | 1088962 | +0.09 |
| 454.calculix  | 1701410 | 1701410 | +0.00 |
| 459.GemsFDTD  |  560642 |  560658 | +0.00 |
| 470.lbm       |    9458 |    9458 | +0.00 |
| 481.wrf       | 5413554 | 5413778 | +0.00 |
| 482.sphinx3   |  190034 |  190034 | +0.00 |

Haswell SPEC INTrate 2017 -O2 generic tuning
============================================

 Run-time
 --------

| Benchmark       | trunk |   x1 |     % |
|-----------------+-------+------+-------|
| 500.perlbench_r |  1201 | 1204 | +0.25 |
| 502.gcc_r       |   798 |  793 | -0.63 |
| 505.mcf_r       |  1038 | 1049 | +1.06 |
| 520.omnetpp_r   |   825 |  824 | -0.12 |
| 523.xalancbmk_r |   985 |  981 | -0.41 |
| 525.x264_r      |  1463 | 1463 | +0.00 |
| 531.deepsjeng_r |   954 |  956 | +0.21 |
| 541.leela_r     |  1570 | 1569 | -0.06 |
| 548.exchange2_r |  1266 | 1267 | +0.08 |
| 557.xz_r        |  1033 | 1029 | -0.39 |

 Text size
 ---------
 
| Benchmark       |   trunk |      x1 |     % |
|-----------------+---------+---------+-------|
| 500.perlbench_r | 1599442 | 1599522 | +0.01 |
| 502.gcc_r       | 6757602 | 6759090 | +0.02 |
| 505.mcf_r       |   16098 |   16098 | +0.00 |
| 520.omnetpp_r   | 1262498 | 1264034 | +0.12 |
| 523.xalancbmk_r | 3989026 | 3989202 | +0.00 |
| 525.x264_r      |  414130 |  414194 | +0.02 |
| 531.deepsjeng_r |   67426 |   67458 | +0.05 |
| 541.leela_r     |  219378 |  219378 | +0.00 |
| 548.exchange2_r |   61234 |   61234 | +0.00 |
| 557.xz_r        |  111490 |  111490 | +0.00 |

Haswell SPEC INTrate 2017 -Ofast native tuning
==============================================

 Run-time
 --------

| Benchmark       | trunk |   x1 |      % |
|-----------------+-------+------+--------|
| 500.perlbench_r |  1169 | 1170 |  +0.09 |
| 502.gcc_r       |   786 |  788 |  +0.25 |
| 505.mcf_r       |  1034 | 1032 |  -0.19 |
| 520.omnetpp_r   |   804 |  794 |  -1.24 |
| 523.xalancbmk_r |   962 |  971 |  +0.94 |
| 525.x264_r      |   886 |  887 |  +0.11 |
| 531.deepsjeng_r |   939 |  944 |  +0.53 |
| 541.leela_r     |  1462 | 1461 |  -0.07 |
| 548.exchange2_r |  1078 | 1082 |  +0.37 |
| 557.xz_r        |   960 |  950 |  -1.04 |

 Text size
 ---------

| Benchmark       |   trunk |      x1 |     % |
|-----------------+---------+---------+-------|
| 500.perlbench_r | 2074450 | 2074498 | +0.00 |
| 502.gcc_r       | 8434514 | 8437250 | +0.03 |
| 505.mcf_r       |   26322 |   26322 | +0.00 |
| 520.omnetpp_r   | 1680082 | 1682130 | +0.12 |
| 523.xalancbmk_r | 4853458 | 4853682 | +0.00 |
| 525.x264_r      |  594210 |  594210 | +0.00 |
| 531.deepsjeng_r |   88050 |   88082 | +0.04 |
| 541.leela_r     |  269298 |  269314 | +0.01 |
| 548.exchange2_r |  114098 |  114098 | +0.00 |
| 557.xz_r        |  152354 |  152354 | +0.00 |

Haswell SPEC FP rate 2017 - generic tuning
==========================================

 Run-time
 --------

| Benchmark       | trunk |   x1 |      % |
|-----------------+-------+------+--------|
| 503.bwaves_r    |  2319 | 2343 |  +1.03 |
| 507.cactuBSSN_r |  1023 |  975 |  -4.69 |
| 508.namd_r      |   934 |  935 |  +0.11 |
| 510.parest_r    |  1391 | 1413 |  +1.58 |
| 511.povray_r    |  1544 | 1570 |  +1.68 |
| 519.lbm_r       |   920 |  920 |  +0.00 |
| 521.wrf_r       |  2955 | 2958 |  +0.10 |
| 526.blender_r   |   976 |  974 |  -0.20 |
| 527.cam4_r      |  1580 | 1586 |  +0.38 |
| 538.imagick_r   |  1758 | 1581 | -10.07 |
| 544.nab_r       |  1357 | 1356 |  -0.07 |
| 549.fotonik3d_r |  1063 | 1077 |  +1.32 |
| 554.roms_r      |  1280 | 1283 |  +0.23 |

 Text size
 ---------
 
| Benchmark       |    trunk |       x1 |     % |
|-----------------+----------+----------+-------|
| 503.bwaves_r    |    32034 |    32034 | +0.00 |
| 507.cactuBSSN_r |  2951634 |  2951634 | +0.00 |
| 508.namd_r      |   837458 |   837490 | +0.00 |
| 510.parest_r    |  6540866 |  6546754 | +0.09 |
| 511.povray_r    |   803618 |   804274 | +0.08 |
| 519.lbm_r       |    12018 |    12018 | +0.00 |
| 521.wrf_r       | 16292962 | 16296978 | +0.02 |
| 526.blender_r   |  7268224 |  7282608 | +0.20 |
| 527.cam4_r      |  5063666 |  5065010 | +0.03 |
| 538.imagick_r   |  1608178 |  1609282 | +0.07 |
| 544.nab_r       |   156242 |   156242 | +0.00 |
| 549.fotonik3d_r |   326738 |   326738 | +0.00 |
| 554.roms_r      |   728546 |   728546 | +0.00 |

Haswell SPEC FP rate 2017 - native tuning
=========================================

 Run-time
 --------

| Benchmark       | trunk |   x1 |     % |
|-----------------+-------+------+-------|
| 503.bwaves_r    |   919 |  919 | +0.00 |
| 507.cactuBSSN_r |   864 |  853 | -1.27 |
| 508.namd_r      |   924 |  924 | +0.00 |
| 510.parest_r    |  1219 | 1220 | +0.08 |
| 511.povray_r    |  1597 | 1624 | +1.69 |
| 519.lbm_r       |   851 |  851 | +0.00 |
| 521.wrf_r       |  1591 | 1594 | +0.19 |
| 526.blender_r   |   912 |  920 | +0.88 |
| 527.cam4_r      |  1296 | 1309 | +1.00 |
| 538.imagick_r   |  1227 | 1207 | -1.63 |
| 544.nab_r       |  1278 | 1278 | +0.00 |
| 549.fotonik3d_r |    VE |   VE |       |
| 554.roms_r      |  1036 | 1037 | +0.10 |

 Text size
 ---------

| Benchmark       |    trunk |       x1 |     % |
|-----------------+----------+----------+-------|
| 503.bwaves_r    |    39426 |    39426 | +0.00 |
| 507.cactuBSSN_r |  3991794 |  3991794 | +0.00 |
| 508.namd_r      |   956450 |   956466 | +0.00 |
| 510.parest_r    |  7341122 |  7345426 | +0.06 |
| 511.povray_r    |  1083010 |  1083938 | +0.09 |
| 519.lbm_r       |    11826 |    11826 | +0.00 |
| 521.wrf_r       | 22028578 | 22032098 | +0.02 |
| 526.blender_r   |  9698768 |  9718544 | +0.20 |
| 527.cam4_r      |  6738562 |  6740050 | +0.02 |
| 538.imagick_r   |  2246674 |  2247122 | +0.02 |
| 544.nab_r       |   211378 |   211378 | +0.00 |
| 549.fotonik3d_r |   582626 |   582626 | +0.00 |
| 554.roms_r      |  1085234 |  1085234 | +0.00 |

Martin

