[Bug target/38306] [4.4 Regression] 15% slowdown of computational kernel
jv244 at cam dot ac dot uk
gcc-bugzilla@gcc.gnu.org
Sun Nov 30 16:18:00 GMT 2008
------- Comment #4 from jv244 at cam dot ac dot uk 2008-11-30 16:17 -------
(In reply to comment #2)
> Due to the high density of branches in the code this is easily a code layout
> and/or padding issue. Different architectures have different constraints on
> their decoders and branch predictors related to branch density. Core
> introduces other branch limitations for loops that engage the loop stream
> detector.
> We do not at all try to properly optimize (or even model) this apart
> from inserting nops. YMMV with -fschedule-insns.
I'm not expert enough to fully understand this, but you have it right:
-fschedule-insns does help. However, it remains a regression relative to 4.3
(on Opteron):
4.4:
-O3 -march=native -funroll-loops -ffast-math ==> 5.064s
-O3 -march=native -funroll-loops -ffast-math -fschedule-insns ==> 4.396s
4.3:
-O3 -march=native -funroll-loops -ffast-math ==> 4.376s
-O3 -march=native -funroll-loops -ffast-math -fschedule-insns ==> 3.372s
-fno-tree-reassoc has no effect.
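For reference, the size of the regression implied by these timings can be worked out with a short script. This is only a sketch: the dictionary layout and the `slowdown_pct` helper are names made up for illustration, and it assumes all four figures are run times in seconds measured on the same machine.

```python
# Run times copied from the figures above (assumed to be seconds).
timings = {
    "4.4": {"base": 5.064, "sched": 4.396},
    "4.3": {"base": 4.376, "sched": 3.372},
}


def slowdown_pct(new, old):
    """Return the percentage by which `new` is slower than `old`."""
    return (new / old - 1.0) * 100.0


# 4.4 compared with 4.3 using the same flags in both compilers.
base_regression = slowdown_pct(timings["4.4"]["base"], timings["4.3"]["base"])
sched_regression = slowdown_pct(timings["4.4"]["sched"], timings["4.3"]["sched"])

print(f"4.4 vs 4.3, default flags:    {base_regression:.1f}% slower")
print(f"4.4 vs 4.3, -fschedule-insns: {sched_regression:.1f}% slower")
```

With these inputs the gap between 4.4 and 4.3 comes out larger with -fschedule-insns than without it, so the flag narrows the absolute run time but not the relative regression.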
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306