This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: register allocation vs. scheduling and other stuff

From: tm <tm at mail dot kloo dot net>
To: lucier at math dot purdue dot edu
Cc: gcc at gcc dot gnu dot org
Date: Mon, 6 Jan 2003 15:42:12 -0800 (PST)
Subject: Re: register allocation vs. scheduling and other stuff

Brad Lucier wrote:

>I've been playing around with -fnew-ra, -fno-trapping-math, and various
>schedule options for 3.4 on powerpc-darwin.
>
>For a molecular energy minimization code, where an affine transformation
>is applied consecutively to the location of each atom in the molecule, 
>
>-O1 -fno-trapping-math -fschedule-insns2 -fnew-ra -mcpu=7400
>
>works very well: the code is a bunch of overlapped loads, stores,
>and floating-point operations, but
>
>-O1 -fno-trapping-math -fschedule-insns -fschedule-insns2 -fnew-ra
>-mcpu=7400
>
>which also schedules *before* register allocation is 50% slower, since
>the scheduling pass before hard register allocation loads *all* the x-y-z
>information for all the atoms into pseudo-registers at the top of the
>routine, and requires many moves between the stack and registers when
>these values are actually needed for computations.

Actually, I've seen it much worse than this.

The PowerPC has 32 registers. The SH has only 16, and when the first
instruction scheduling pass (-fschedule-insns) was enabled, the code could
run over 3x slower in extreme cases. I would see assembly listings where
about 80% of the page was taken up by thrashing registers to/from the stack.

>Perhaps, since -fno-trapping-math is a relatively new option, this is
>a recent concern.
>
>I've heard people on these lists talk about making scheduling smarter, so
>it knows something about register pressure.  It seems that the general
>solution would be to do register allocation and scheduling together.

It may be possible to have something simpler. In many cases where I've
seen this problem, it's because the scheduler has hoisted multiple loads
up too high, which starves the register allocator. If the register
lifetimes could be shortened by moving the loads down, much of the problem
would be solved.
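
To make the lifetime argument concrete, here is a rough Python sketch of a
toy model, not anything taken from GCC. It counts the peak number of
simultaneously live values in a straight-line schedule, once with every load
hoisted to the top and once with each atom's loads placed next to their use.
The names max_live, hoisted and interleaved, the instruction encoding, and
the choice of 8 atoms are all assumptions made for illustration.

```python
# Toy model: an instruction is (dest, sources). A load has no sources;
# a store has dest None. This is an illustration, not GCC's scheduler.

def max_live(schedule):
    """Return the peak number of simultaneously live values."""
    last_use = {}
    for i, (_dest, srcs) in enumerate(schedule):
        for s in srcs:
            last_use[s] = i
    live, peak = set(), 0
    for i, (dest, srcs) in enumerate(schedule):
        # A source dies at its last use, so its register can hold dest.
        for s in srcs:
            if last_use[s] == i:
                live.discard(s)
        if dest is not None and dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak

def atom_loads(i):
    return [(f"{c}{i}", []) for c in "xyz"]

def atom_work(i):
    # Combine the three coordinates into one result, then store it.
    return [(f"t{i}", [f"x{i}", f"y{i}", f"z{i}"]), (None, [f"t{i}"])]

def hoisted(n):
    """All loads at the top of the routine, computation afterwards."""
    loads = [ins for i in range(n) for ins in atom_loads(i)]
    work = [ins for i in range(n) for ins in atom_work(i)]
    return loads + work

def interleaved(n):
    """Each atom's loads sit immediately before their use."""
    return [ins for i in range(n) for ins in atom_loads(i) + atom_work(i)]

print(max_live(hoisted(8)))      # 24
print(max_live(interleaved(8)))  # 3
```

In this model the hoisted schedule needs 24 live values for 8 atoms, which
already exceeds the 16 registers of the SH, while the interleaved schedule
never needs more than 3.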

>Since we will have a new register allocator for 3.4 that is based on
>graph coloring, perhaps one could add a new flag
>
>--fuse-all-registers
>
>that would not try to use the *minimum* number of colors to color an
>interference graph, but, if the minimum number is less than the actual
>number of available registers, it could use the actual number of
>registers to color the graph, perhaps guided by liveness information.

This is a kludgy solution IMHO.

1. If you have multiple functions in a source file, then you wind up
applying this option to all the functions in that file, and to have
function-level granularity you would need to split the file.

2. This doesn't permit basic-block level control of the optimization.

3. Most compiler end-users don't understand the concept of machine
registers, much less "high register pressure". In order to be really
effective, it needs to be enabled automagically when needed without user
intervention.
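
For reference, the difference between the two coloring policies in the
quoted proposal can be sketched roughly as follows. This is a toy greedy
colorer in Python, not the -fnew-ra allocator; the function name color, the
spread parameter, the 8-register budget and the chain-shaped interference
graph are all hypothetical and chosen only to show the contrast.

```python
# Toy greedy coloring of an interference graph (illustration only).
# graph maps each value to the set of values it interferes with.

def color(graph, k, spread=False):
    """Assign one of k registers to each node.

    spread=False picks the lowest free register (few registers used).
    spread=True rotates the starting point so all k registers get used.
    """
    assignment = {}
    start = 0
    for node in graph:
        taken = {assignment[n] for n in graph[node] if n in assignment}
        order = [(start + j) % k for j in range(k)] if spread else range(k)
        choice = next((c for c in order if c not in taken), None)
        if choice is None:
            raise ValueError(f"no free register for {node}; it would spill")
        assignment[node] = choice
        if spread:
            start = (choice + 1) % k
    return assignment

# Hypothetical example: six values, each interfering only with the next.
names = "abcdef"
chain = {n: set() for n in names}
for u, v in zip(names, names[1:]):
    chain[u].add(v)
    chain[v].add(u)

compact = color(chain, 8)
wide = color(chain, 8, spread=True)
print(len(set(compact.values())))  # 2
print(len(set(wide.values())))     # 6
```

Both assignments are valid colorings of the same graph. The only difference
is how many of the 8 available registers end up in use.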

Toshi

Follow-Ups:
- Re: register allocation vs. scheduling and other stuff
  - From: Brad Lucier
