This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/57315] LTO and/or vectorizer performance regression on salsa20 core, 4.7->4.8
- From: "zackw at panix dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 28 May 2013 20:29:49 +0000
- Subject: [Bug tree-optimization/57315] LTO and/or vectorizer performance regression on salsa20 core, 4.7->4.8
- Auto-submitted: auto-generated
- References: <bug-57315-4 at http dot gcc dot gnu dot org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57315
--- Comment #2 from Zack Weinberg <zackw at panix dot com> ---
Created attachment 30210
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30210&action=edit
self-contained test case
Here's a self-contained test case.
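The attached salsa20-regr.c is not reproduced in this archive. To show roughly what kind of kernel is being timed, here is a minimal sketch of the Salsa20/20 core, written from the public Salsa20 specification. It is an illustration only, not the attached test case. The names `rotl32`, `quarterround` and `salsa20_core` are hypothetical. The body is nothing but 32-bit adds, rotates and XORs on a 16-word state, which is the kind of code where small scheduling or vectorization decisions show up directly in keys/s.

```c
#include <stdint.h>

/* 32-bit left rotation; b is always 7, 9, 13 or 18 here, so the
   right shift by (32 - b) never hits the undefined shift-by-32 case. */
static inline uint32_t rotl32(uint32_t a, int b)
{
    return (a << b) | (a >> (32 - b));
}

/* Salsa20 quarterround: updates the four words in place, in the
   order given by the specification (y1, then y2, then y3, then y0). */
static void quarterround(uint32_t *y0, uint32_t *y1, uint32_t *y2, uint32_t *y3)
{
    *y1 ^= rotl32(*y0 + *y3, 7);
    *y2 ^= rotl32(*y1 + *y0, 9);
    *y3 ^= rotl32(*y2 + *y1, 13);
    *y0 ^= rotl32(*y3 + *y2, 18);
}

/* Salsa20/20 core: 10 double rounds (a column round followed by a
   row round), then the original input is added back word by word. */
static void salsa20_core(uint32_t out[16], const uint32_t in[16])
{
    uint32_t x[16];
    for (int i = 0; i < 16; i++)
        x[i] = in[i];
    for (int i = 0; i < 10; i++) {
        /* column round */
        quarterround(&x[0],  &x[4],  &x[8],  &x[12]);
        quarterround(&x[5],  &x[9],  &x[13], &x[1]);
        quarterround(&x[10], &x[14], &x[2],  &x[6]);
        quarterround(&x[15], &x[3],  &x[7],  &x[11]);
        /* row round */
        quarterround(&x[0],  &x[1],  &x[2],  &x[3]);
        quarterround(&x[5],  &x[6],  &x[7],  &x[4]);
        quarterround(&x[10], &x[11], &x[8],  &x[9]);
        quarterround(&x[15], &x[12], &x[13], &x[14]);
    }
    for (int i = 0; i < 16; i++)
        out[i] = x[i] + in[i];
}
```

A benchmark in this style would call `salsa20_core` in a tight loop and divide the iteration count by the elapsed time. How the attachment actually structures its loop is not shown here.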
$ gcc-4.7 -std=c99 -O2 -march=native salsa20-regr.c && ./a.out
875.178 keys/s
$ gcc-4.8 -std=c99 -O2 -march=native salsa20-regr.c && ./a.out
808.869 keys/s
$ gcc-4.7 -std=c99 -O3 -march=native salsa20-regr.c && ./a.out
867.879 keys/s
$ gcc-4.8 -std=c99 -O3 -march=native salsa20-regr.c && ./a.out
800.794 keys/s
$ gcc-4.7 -std=c99 -O3 -fwhole-program -march=native salsa20-regr.c && ./a.out
606.605 keys/s
$ gcc-4.8 -std=c99 -O3 -fwhole-program -march=native salsa20-regr.c && ./a.out
571.935 keys/s
These numbers are stable to within about 1 key/s. So there's a regression of
roughly 6-8% (5.7% to 7.7%) from 4.7 to 4.8 at every optimization level.
Separately, with both compilers, -O3 is slightly slower than -O2 for this
program (about 1%), and -O3 -fwhole-program is much slower (about 30%).
(-O2 -fwhole-program is within noise of plain -O2 for both.)
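The percentages quoted above follow directly from the keys/s figures in the transcript. A small sketch of that arithmetic, with hypothetical helper names (`pct_change`, `print_summary`), in case anyone wants to recheck the figures or reuse the calculation with new measurements:

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent.
   A negative value means `after` is slower (fewer keys/s). */
static double pct_change(double before, double after)
{
    return 100.0 * (after - before) / before;
}

/* keys/s figures from the transcript above. */
struct result {
    const char *flags;
    double gcc47;
    double gcc48;
};

static const struct result results[] = {
    { "-O2",                 875.178, 808.869 },
    { "-O3",                 867.879, 800.794 },
    { "-O3 -fwhole-program", 606.605, 571.935 },
};

/* For each flag set, print the 4.7 -> 4.8 change and how that flag set
   compares with plain -O2 (results[0]) under the same compiler. */
static void print_summary(void)
{
    const size_t n = sizeof results / sizeof results[0];
    for (size_t i = 0; i < n; i++) {
        const struct result *r = &results[i];
        printf("%-20s 4.7->4.8: %+6.2f%%  vs -O2: 4.7 %+6.2f%%, 4.8 %+6.2f%%\n",
               r->flags,
               pct_change(r->gcc47, r->gcc48),
               pct_change(results[0].gcc47, r->gcc47),
               pct_change(results[0].gcc48, r->gcc48));
    }
}
```

With these inputs the 4.7->4.8 column comes out at about -7.6%, -7.7% and -5.7%. In the "vs -O2" columns, -O3 trails -O2 by about 0.8% (4.7) and 1.0% (4.8), and -O3 -fwhole-program trails it by roughly 29-31%.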
With 4.8, -march=native on my computer expands to:
-march=corei7-avx -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm
-mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx
-mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c
-mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt
--param l1-cache-size=0 --param l1-cache-line-size=0 --param l2-cache-size=256
-mtune=corei7-avx
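One way to confirm which of these ISA options actually took effect in a given build is to check GCC's predefined target macros (`__SSE4_2__`, `__AVX__`, `__AVX2__`, `__FMA__`, `__AES__`), which GCC defines only when the matching extension is enabled. The sketch below is not part of the original report. `HAVE_*` and `print_isa_summary` are hypothetical names.

```c
#include <stdio.h>

/* GCC predefines these macros only when the corresponding ISA
   extension is enabled (by -march=... or an explicit -m option),
   so the values below reflect the flags this file was compiled with. */
#ifdef __SSE4_2__
#define HAVE_SSE4_2 1
#else
#define HAVE_SSE4_2 0
#endif

#ifdef __AVX__
#define HAVE_AVX 1
#else
#define HAVE_AVX 0
#endif

#ifdef __AVX2__
#define HAVE_AVX2 1
#else
#define HAVE_AVX2 0
#endif

#ifdef __FMA__
#define HAVE_FMA 1
#else
#define HAVE_FMA 0
#endif

#ifdef __AES__
#define HAVE_AES 1
#else
#define HAVE_AES 0
#endif

/* Print one line per extension, e.g. "avx:    yes". */
static void print_isa_summary(void)
{
    printf("sse4.2: %s\n", HAVE_SSE4_2 ? "yes" : "no");
    printf("avx:    %s\n", HAVE_AVX ? "yes" : "no");
    printf("avx2:   %s\n", HAVE_AVX2 ? "yes" : "no");
    printf("fma:    %s\n", HAVE_FMA ? "yes" : "no");
    printf("aes:    %s\n", HAVE_AES ? "yes" : "no");
}
```

Built with gcc-4.8 -march=native on the machine described above, this should report sse4.2, avx and aes as enabled and avx2 and fma as disabled, consistent with -msse4.2 -mavx -maes -mno-avx2 -mno-fma in the expansion.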