This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug target/29256] [4.9/5/6 regression] loop performance regression

From: "rguenther at suse dot de" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Wed, 12 Aug 2015 07:12:31 +0000
Subject: [Bug target/29256] [4.9/5/6 regression] loop performance regression
Auto-submitted: auto-generated
References: <bug-29256-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256

--- Comment #57 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 11 Aug 2015, wschmidt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
> 
> --- Comment #56 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> (In reply to Bill Schmidt from comment #53)
> > I'm not a fan of a tree-level unroller.  It's impossible to make good
> > decisions about unroll factors that early.  But your second approach sounds
> > quite promising to me.
> 
> I would be willing to soften this statement.  I think that an early unroller
> might well be a profitable approach for most systems with large caches and so
> forth, where if the unrolling heuristics are not completely accurate we are
> still likely to make a reasonably good decision.  However, I would expect to
> see ports with limited caches/memory to want more accurate control over
> unrolling decisions.  So I could see allowing ports to select between a GIMPLE
> unroller and an RTL unroller (I doubt anybody would want both).
> 
> In general it seems like PowerPC could benefit from more aggressive unrolling
> much of the time, provided we can also solve the related IVOPTS problems that
> cause too much register spill.
> 
> I may have an interest in working on a GIMPLE unroller, depending on how
> quickly I can complete or shed some other projects...

I think that a separate unrolling pass on GIMPLE would be a hard sell
due to the lack of a good cost model.  _But_ doing unrolling as part
of another transform, as we are doing now, makes sense.  So does
eventually moving parts of an RTL pass that involves unrolling, such
as modulo scheduling (SMS), to GIMPLE (leaving the scheduling part
to RTL).

Note that the RTL unroller is not enabled by default at any optimization
level, and that unfortunately it shares flags with the GIMPLE-level
complete peeling (where those flags mainly control cost modeling).
Oh, but it's enabled with -fprofile-use.

It's been a long time since I've done SPEC measurements with/without
-funroll-loops (and/or -fpeel-loops).  Note that these flags have
secondary effects as well:

toplev.c:    flag_web = flag_unroll_loops || flag_peel_loops;
toplev.c:    flag_rename_registers = flag_unroll_loops || flag_peel_loops;
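
To make the coupling concrete, here is a rough standalone sketch of
what those two assignments do.  It is not GCC source: the struct and
the function name are hypothetical stand-ins for GCC's global option
variables, and it does not model how an explicit -fweb or
-frename-registers on the command line interacts with these defaults.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GCC's global option variables. */
struct unroll_flags {
  bool unroll_loops;      /* -funroll-loops */
  bool peel_loops;        /* -fpeel-loops */
  bool web;               /* -fweb, derived below */
  bool rename_registers;  /* -frename-registers, derived below */
};

/* Mirror the two toplev.c assignments quoted above: either unrolling
   or peeling switches on both web and register renaming. */
static void derive_secondary_flags(struct unroll_flags *f)
{
  f->web = f->unroll_loops || f->peel_loops;
  f->rename_registers = f->unroll_loops || f->peel_loops;
}
```

So a with/without -funroll-loops comparison also toggles two further
passes, which makes it harder to attribute any difference to the
unroller alone.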
