This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/58529] GCC -funroll-loops 150% slower with -march=native on x86-64
- From: "burnus at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 26 Sep 2013 07:26:41 +0000
- Subject: [Bug target/58529] GCC -funroll-loops 150% slower with -march=native on x86-64
- Auto-submitted: auto-generated
- References: <bug-58529-4 at http dot gcc dot gnu dot org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58529
Tobias Burnus <burnus at gcc dot gnu.org> changed:
     What |Removed                    |Added
----------------------------------------------------------------------------
Component |middle-end                 |target
  Summary |Loop 30% faster with Intel |GCC -funroll-loops 150%
          |than with GCC              |slower with -march=native
          |                           |on x86-64
--- Comment #9 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Tobias Burnus from comment #8)
> I have to re-check why unrolling made it slower on that Xeon E5-2630
> (comment 0) but faster on the i5.
This seems to be a tuning problem. All timings below were measured on the
Xeon E5-2630, comparing a binary built with the i5's -march=native expansion
against one built with the Xeon E5's own -march=native expansion:
real 1.530s  user 1.528s  sys 0.000s  i5, no unrolling
real 1.483s  user 1.481s  sys 0.000s  Xeon, no unrolling
real 0.937s  user 0.934s  sys 0.002s  i5, -funroll-loops
real 2.480s  user 2.478s  sys 0.000s  Xeon, -funroll-loops
real 0.935s  user 0.934s  sys 0.000s  Xeon, -funroll-loops max-unroll-times=7
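
To put the timings above in relative terms, here is a minimal Python sketch that computes the ratios between the "user" times. The dictionary keys and the helper name `ratio` are only illustrative labels and are not part of the bug report; the numbers are copied from the table.

```python
# "user" times in seconds, all measured on the Xeon E5-2630 (copied from the table).
# The keys are illustrative labels for the five builds.
user = {
    "i5, no unrolling": 1.528,
    "Xeon, no unrolling": 1.481,
    "i5, -funroll-loops": 0.934,
    "Xeon, -funroll-loops": 2.478,
    "Xeon, -funroll-loops max-unroll-times=7": 0.934,
}


def ratio(a, b):
    """Return how many times longer build `a` runs than build `b`."""
    return user[a] / user[b]


# Unrolled Xeon build relative to the unrolled i5 build.
xeon_vs_i5_unrolled = ratio("Xeon, -funroll-loops", "i5, -funroll-loops")
# Effect of -funroll-loops within each tuning.
xeon_unroll_effect = ratio("Xeon, -funroll-loops", "Xeon, no unrolling")
i5_unroll_effect = ratio("i5, -funroll-loops", "i5, no unrolling")

print(f"Xeon unrolled / i5 unrolled:     {xeon_vs_i5_unrolled:.2f}x")
print(f"Xeon unrolled / Xeon not unrolled: {xeon_unroll_effect:.2f}x")
print(f"i5 unrolled / i5 not unrolled:     {i5_unroll_effect:.2f}x")
```

Read this way, unrolling helps with the i5 tuning, hurts with the Xeon tuning, and limiting max-unroll-times to 7 brings the Xeon build back to the i5 build's time.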
The i5's -march=native expands into:
-march=core-avx-i -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a
-mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
-msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed
-mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er
-mno-avx512cd -mno-avx512pf --param l1-cache-size=32 --param
l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=core-avx-i
The Xeon's -march=native expands into:
-march=corei7-avx -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a
-mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
-msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase
-mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f
-mno-avx512er -mno-avx512cd -mno-avx512pf --param l1-cache-size=32 --param
l1-cache-line-size=64 --param l2-cache-size=15360 -mtune=corei7-avx
Namely, the two expansions differ only in:
  i5:   -march=core-avx-i -mrdrnd -mf16c -mfsgsbase
        --param l2-cache-size=6144 -mtune=core-avx-i
  Xeon: -march=corei7-avx -mno-rdrnd -mno-f16c -mno-fsgsbase
        --param l2-cache-size=15360 -mtune=corei7-avx
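
The difference between the two expansions can also be derived mechanically rather than by eye. The following is a small Python sketch under the assumption that the two option lists are available as plain strings (here pasted from this comment); the function name `parse_flags` and the variable names are made up for illustration.

```python
def parse_flags(text):
    """Split a GCC option string into options, keeping '--param name=value' as one item."""
    tokens = text.split()
    flags = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "--param" and i + 1 < len(tokens):
            flags.append("--param " + tokens[i + 1])
            i += 2
        else:
            flags.append(tokens[i])
            i += 1
    return flags


# Option lists as quoted in this comment.
I5 = """
-march=core-avx-i -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a
-mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
-msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed
-mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er
-mno-avx512cd -mno-avx512pf --param l1-cache-size=32 --param
l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=core-avx-i
"""

XEON = """
-march=corei7-avx -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a
-mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
-msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase
-mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f
-mno-avx512er -mno-avx512cd -mno-avx512pf --param l1-cache-size=32 --param
l1-cache-line-size=64 --param l2-cache-size=15360 -mtune=corei7-avx
"""

i5_flags = parse_flags(I5)
xeon_flags = parse_flags(XEON)

# Options that appear in one expansion but not in the other, in original order.
only_i5 = [f for f in i5_flags if f not in xeon_flags]
only_xeon = [f for f in xeon_flags if f not in i5_flags]

print("i5:  ", " ".join(only_i5))
print("Xeon:", " ".join(only_xeon))
```

Such a comparison only shows which options differ; it does not say which of them (for instance -mtune versus the ISA flags) is responsible for the slowdown.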