This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors
- From: "hubicka at ucw dot cz" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 28 Nov 2017 18:06:36 +0000
- Subject: [Bug target/81616] Update -mtune=generic for the current Intel and AMD processors
- Auto-submitted: auto-generated
- References: <bug-81616-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #25 from Jan Hubicka <hubicka at ucw dot cz> ---
Hi,
I agree that the matrix multiplication FMA issue is important, and hopefully it
will be fixed for GCC 8. See
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00437.html
The irregular results across -mtune/-march settings probably come from
enabling/disabling FMA and the 256-bit AVX vector width preference. I get:
jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=native -mno-fma mult.c
jh@d136:~> ./a.out
mult took 193593 clocks
jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=native -mno-fma -mprefer-vector-width=256 mult.c
jh@d136:~> ./a.out
mult took 104745 clocks
jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=haswell -mprefer-vector-width=256 mult.c
jh@d136:~> ./a.out
mult took 160123 clocks
jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=haswell -mprefer-vector-width=256 -mno-fma mult.c
jh@d136:~> ./a.out
mult took 102048 clocks
A difference of roughly 90% on such a common loop is quite noticeable.
Continuing my benchmarking on SPEC CPU2000.
This is -Ofast -march=native -mprefer-vector-width=none compared to
-Ofast -march=native -mtune=haswell -mprefer-vector-width=128.
So neither of those is a win compared to -mtune=native.
             ---------- Base ----------   ---------- Peak ----------
Benchmark    Ref Time Run Time  Ratio     Ref Time Run Time  Ratio
164.gzip         1400     58.2   2407 *       1400     57.9   2419 *
175.vpr          1400     37.5   3731 *       1400     37.8   3704 *
176.gcc          1100     20.0   5494 *       1100     20.0   5497 *
181.mcf          1800     21.6   8324 *       1800     20.8   8660 *
186.crafty       1000     20.9   4790 *       1000     21.2   4722 *
197.parser       1800     51.4   3499 *       1800     51.8   3472 *
252.eon          1300     19.3   6749 *       1300     18.2   7143 *
253.perlbmk                         X                            X
254.gap                             X                            X
255.vortex                          X                            X
256.bzip2        1500     43.1   3483 *       1500     43.5   3444 *
300.twolf        3000     56.6   5302 *       3000     57.0   5267 *
Est. SPECint_base2000 4563
Est. SPECint2000 4591
             ---------- Base ----------   ---------- Peak ----------
Benchmark    Ref Time Run Time  Ratio     Ref Time Run Time  Ratio
168.wupwise      1600     30.9   5179 *       1600     29.7   5387 *
171.swim         3100     27.4  11309 *       3100     26.4  11739 *
172.mgrid        1800     31.0   5814 *       1800     26.1   6895 *
173.applu        2100     25.7   8175 *       2100     25.9   8096 *
177.mesa         1400     23.3   6006 *       1400     23.3   6001 *
178.galgel                          X                            X
179.art          2600     11.0  23702 *       2600     11.0  23718 *
183.equake       1300     13.0  10033 *       1300     13.1   9944 *
187.facerec      1900     24.0   7931 *       1900     17.2  11040 *
188.ammp         2200     34.4   6394 *       2200     35.2   6249 *
189.lucas        2000     20.3   9864 *       2000     20.8   9603 *
191.fma3d        2100     31.4   6686 *       2100     30.0   7011 *
200.sixtrack     1100     41.7   2641 *       1100     38.5   2856 *
301.apsi         2600     34.1   7630 *       2600     34.2   7612 *
Est. SPECfp_base2000 7570
Est. SPECfp2000 7947