This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Re: Simple benchmarks with gcc-3_0-branch

To: dberlin at redhat dot com (Daniel Berlin)
Subject: Re: Simple benchmarks with gcc-3_0-branch
From: Toshi Morita <tm2 at best dot com>
Date: Sun, 18 Feb 2001 13:53:02 -0800 (PST)
Cc: gcc at gcc dot gnu dot org

Daniel Berlin wrote:

> Toshi Morita <tm2@best.com> writes:
> 
> 
> > 
> > The new register allocator may mitigate the symptom of registers thrashing
> > to the stack, but I doubt it solves the root problem of overaggressive
> > optimization by previous passes.
> 
> Okay, let's be clear.
> Besides too much spilling, and extending register lifetimes, which are
> the job solely of the register allocator, *what* exactly is the
> problem you are seeing that will be solved? 

I'm very confused by your statement that "extending register lifetimes is
the job solely of the register allocator"? It seems to me be both instruction
scheduling and CSE/GCSE function by extending select register's lifetimes.

> 
> You only mentioned those two things, and put ",etc".
> I'm not clear on what the "etc" is, so i can't agree or disagree on
> whether the new register allocator will resolve those problems, or
> could resolve them with work.

Here are some optimizations which I suspect will be difficult to undo:

loop unrolling
GIV strength reduction
interblock scheduling
speculative load motion

I'm most worried about the loop-related optimizations, though.

> 
> > 
> > > 
> > > > 
> > > > I'm guesing you have the same problem because disabling GCSE improves
> > > > code efficiency. I have often seen the current implementation of GCSE
> > > > pessimize code because it doesn't consider register pressure when
> > > > optmiziming, and often it would be much faster to recalculate a trivial
> > > > subexpression than to create a new pseudo which thrashes to the stack.
> > > > 
> > > Briggs calls this rematerialization, and implements it in his register
> > > allocator. 
> > > I'll eventually get to it.
> > 
> > It seems to me there are frequently situations where optimization A
> > enables optimization B to occur which allows optimization C to
> > occur.
> Of course, there are optimization passes i'm working on that only help
> enable other optimizations to be more effective.
> 
> > 
> > If, however, you decide the resultant code spills too many pseudos
> > and optmization A was to blame, then you would possibly need to undo
> > optimizations C and B prior to undoing optimization A.
> 
> I think you are attacking/viewing the problem from the wrong end.
> It's not the optimizations fault there is too much spilling. It's the
> register allocator's fault. 

It's not the register allocator's fault there are too many pseudos.
It's the previous optimization's fault.

> > 
> > Currently, it sounds like you would only need a register rematerializer that can
> > rematerialize values across multiple basic blocks with multiple dominators
> > and also possibly undo loop-specific optimizations such as biv-to-giv conversion?
> It already does.
> 
> Look at http://compilers.iecc.com/comparch/article/95-03-009
> Notice he raises the exact same issue you do, and the same response is
> given "We let the register allocator handle it".

The post referenced only discusses invariant loop hoisting, and not
BIV to GIV conversion nor GIV strength reduction.

> 
> > 
> > However, as more optimizations are added prior to register allocation,
> > the register rematerializer would ideally need to be modified
> > to handle those as well.
> Not really. 
> Since the register allocator is doing the spilling, it has the code
> before it spills, and thus, only needs to be able to calculate what
> the best thing to do is, and do it. It doesn't have to know how to
> undo a given optimization, only how to recalculate a given values at a
> certain point. And since it has the unspilled code, it knows how the
> value is calculated.

Consider this loop:

	for (i=0; i<32; i++) {
		c[i] = a[i] * b[i];
		(other code here)
	}

...which naively compiles down to something like this
in SH assembly:

	mov.l	#0,r1
L1:
	mov	r1,r0
	shll2	r0
	mov.l	L2,r2
	mov.l	@(r0,r2),r3
	mov.l	L3,r2
	mov.l	@(r0,r2),r4
	mul.l	r3,r4
	sts	macl,r5
	mov.l	L4,r2
	mov.l	r5,@(r0,r2)
	...
	(other code here)
	...
	mov	#32,r2
	cmp/eq	r1,r2
	bf	L1
	.balign	4
L2:
	.long	a
L3:
	.long	b
L4:
	.long	c

The loop optimizer performs strength reduction and creates:

	a_p = a;
	b_p = b;
	c_p = c;

	for (i=0; i<32; i++) {
		*c_p++ = *a_p++ * *b_p++;
		(other code here)
	}

or in SH assembly:

	mov.l	#32,r1
	mov.l	L2,r2
	mov.l	L3,r3
	mov.l	L4,r4
L1:
	mov.l	@r2+,r5
	mov.l	@r3+,r6
	mul	r5,r6
	sts	macl,r7
	mov.l	r7,@r4
	add	#4,r4
	...
	(other code here)
	...
	add	#1,r1
	mov	#32,r5
	cmp/eq	r1,r5
	bf	L1

As you can see, this code is two instructions shorter BUT 
uses two more registers than the previous version.

The Hitachi SH has 16 general-purpose registers. Imagine
that the code in (other code here) uses more than 9 registers.
In this case, the first form is prefereable to the second form
because it's faster to load a constant from memory than to
save and restore a pseudo from memory.

It would be conceptually to do this optimization by rerunning
the loop optimizer with a flag set to avoid creating new pseudos.
I suspect it would be more difficult to do this via register
rematerialization by strength increasing the GIV.

> > 
> > Therefore it seems more straightforward to provide a generic framework for
> > rerunning optimizations and suppressing problematic transformations on
> > subsequent runs.
> No, it doesn't. 
> You are still looking at this from the wrong end, IMHO.
> The register allocator is choosing what to spill, what not to spill,
> etc. 
> In fact, the only thing that knows whether something is going to be a
> spill is the register allocator.
> If you want it to know that it can and should, recalculate this value
> rather than spill, you teach it how. 

In order to do the previous optimization, you would need a loop pessimizer
which performs strength increasing on GIVs.

(Or you could just rerun the loop optimizer with the original data with
a flag set to deprecate creating new pseudos.)

> It doesn't require undoing optimizations, because if you don't spill,
> then there is no problem anyway, because you got everyting into
> registers, and everyone is happy.
> Your solution is overkill for the problem.

Writing a loop pessimizer seems more overkill to me.

> > > 
> > > > IMHO GCC desperately needs the ability to estimate register pressure
> > > > and a framework for "throttling back" the aggressiveness of optimizations
> > > > which generate new pseudos and/or extend the lifeetimes of existing pseudos
> > > > in regions of high register pressure.
> > > 
> > > No, it doesn't (well, not for the reasons you explain). It needs a
> > > register allocator that is good at knowing  when to spill things, and
> > > what to spill, etc. 
> > 
> > My first comment also applies here.
> As does mine.
> 
> > 
> > Toshi
> 

Toshi
References:
- Re: Simple benchmarks with gcc-3_0-branch
  - From: Daniel Berlin
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]