This is the mail archive of the
mailing list for the GCC project.
Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X
- From: Andrew Pinski <pinskia at physics dot uc dot edu>
- To: tbptbp at gmail dot com (tbp)
- Cc: stevenb dot gcc at gmail dot com (Steven Bosscher), ernesto at ornl dot gov (Ernest L. Williams Jr.), nbkolchin at gmail dot com (Nickolay Kolchin), richard dot guenther at gmail dot com (Richard Guenther), gcc at gcc dot gnu dot org
- Date: Sun, 12 Mar 2006 19:18:53 -0500 (EST)
- Subject: Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X
> On 3/12/06, Steven Bosscher <firstname.lastname@example.org> wrote:
> > > Yes, why is the benchmark not valid?
> > It is valid. We should understand why this behavior has changed so drastically.
> This benchmark may be useless, but it still exposes a weakness of gcc4. At
> least it's not news to me:
> So that PR has been closed when gcc-devs marked all those intrinsics
> as force_inline. That's also the kludge I use with my code. The real
> problem is once you start marking some functions as force_inline, you
> upset the inlining heuristic even more creating even more silly
> inlining misses, rinse, repeat.
> At the end of the day, everything is marked either force_inline or
> noinline and you'd be better off without a heuristic at all.
Actually the best way of improving the inline heuristics is to get
a real testcase (and not some benchmark) where the inline heuristics
are messed up. Now SSE intrinsics are special in that they should be
always inlined and that fact should be hidden from the user. Maybe
they should be rewritten so that they are just like the altivec
intrinsics in that it is just a plain #define and nothing special to
the user and no worrying about the inlining heuristic. I should
note that always inline was added for altivec intrinsics in the
first place and they have since been rewritten. Also the
kernel uses always inline but I and others feel that is a mistake.
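
To make the two styles discussed above concrete, here is a minimal sketch in C. It is not taken from GCC's actual SSE or altivec headers. The names my_popcount_fn, my_popcount_macro, count_bits_fn and count_bits_macro are hypothetical and exist only for illustration; the only real GCC features used are the always_inline attribute and __builtin_popcount. The first wrapper is an inline function that has to be marked always_inline to stay out of the inliner's hands, while the second is a plain #define that the inlining heuristic never sees.

```c
#include <stddef.h>

/* Style 1: a wrapper function forced inline with GCC's always_inline
 * attribute. The function participates in the inliner's bookkeeping,
 * which is why the attribute is needed to guarantee inlining.
 * (Hypothetical name, for illustration only.) */
static inline __attribute__((always_inline)) int
my_popcount_fn(unsigned int x)
{
    return __builtin_popcount(x);
}

/* Style 2: a plain macro that expands straight to the builtin.
 * There is no function for the inlining heuristic to consider.
 * (Hypothetical name, for illustration only.) */
#define my_popcount_macro(x) __builtin_popcount((unsigned int)(x))

/* Count the set bits in an array using the always_inline wrapper. */
int count_bits_fn(const unsigned int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += my_popcount_fn(v[i]);
    return total;
}

/* Same computation using the macro wrapper. */
int count_bits_macro(const unsigned int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += my_popcount_macro(v[i]);
    return total;
}
```

Both callers should compute the same result. The difference is only in how the wrapper reaches the compiler: as a function that must be forced inline, or as a textual expansion that needs no attribute at all.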