This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Post-register-allocation opportunistic optimizer?



I'm looking through H8/300H code, and I realize I want a
post-register-allocation opportunistic optimizer.

To explain, I need to backtrack a bit. In the past, I've mentioned that
GCC handles high register pressure badly, and suggested that it rerun
the optimizer over the original RTX with flags set to avoid generating
new pseudos.

I think this situation may be better handled by some sort of
post-register-allocation opportunistic optimizer (PRAOO).

To explain simply, we would initially generate code which uses minimal
scratch registers. PRAOO would then run after register allocation and
opportunistically replace slow code sequences which require no scratch
register with faster code sequences which require a scratch
register, but ONLY IF an unused hard register is available.

For example, GCC generates this code for a 32-bit arithmetic right shift
by 8 (of er0) on the H8/300H:

        mov.w   e0,r2
        mov.b   r0h,r0l
        mov.b   r2l,r0h
        mov.b   r2h,r2l
        exts.w  r2
        mov.w   r2,e0

This is fast code but it uses an extra register (r2). It is undesirable if
the compiled function is complex and the register pressure is already
high. In a high register pressure case, we would probably want:

	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0
	shar.l	er0

which is slower but avoids using a scratch register, and thus avoids
spilling a register.
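
To make the trade-off concrete, here is a rough C model of the two
sequences above. It is only a sketch: the function names are made up for
illustration, and registers are modelled as plain integers (er0 as a
32-bit value, e0/r0 as its upper/lower words, r2 as the scratch word).
Each statement is annotated with the instruction it is meant to mirror.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fast form: byte moves plus a sign extension, needs scratch r2. */
uint32_t ashr8_with_scratch(uint32_t er0)
{
    uint16_t e0 = (uint16_t)(er0 >> 16);   /* upper word of er0 */
    uint16_t r0 = (uint16_t)er0;           /* lower word of er0 */
    uint16_t r2;

    r2 = e0;                                                /* mov.w e0,r2   */
    r0 = (uint16_t)((r0 & 0xff00u) | (r0 >> 8));            /* mov.b r0h,r0l */
    r0 = (uint16_t)((r0 & 0x00ffu) | ((r2 & 0xffu) << 8));  /* mov.b r2l,r0h */
    r2 = (uint16_t)((r2 & 0xff00u) | (r2 >> 8));            /* mov.b r2h,r2l */
    r2 = (uint16_t)((r2 & 0x80u) ? (0xff00u | (r2 & 0xffu)) /* exts.w r2     */
                                 : (r2 & 0xffu));
    e0 = r2;                                                /* mov.w r2,e0   */
    return ((uint32_t)e0 << 16) | r0;
}

/* Slow form: eight one-bit arithmetic shifts, no scratch register. */
uint32_t ashr8_no_scratch(uint32_t er0)
{
    for (int i = 0; i < 8; i++)
        er0 = (er0 >> 1) | (er0 & 0x80000000u);             /* shar.l er0    */
    return er0;
}

/* Cross-check the two models on a handful of sample inputs. */
void check_ashr8_models(void)
{
    static const uint32_t samples[] = {
        0u, 0x12345678u, 0x80000000u, 0xfedcba98u, 0xffffffffu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(ashr8_with_scratch(samples[i]) == ashr8_no_scratch(samples[i]));
}
```

Calling check_ashr8_models() should pass silently if the two forms agree,
which is the property a replacement pass would rely on.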

I know the Hitachi SH has the same problem, because the earlier
implementations (SH1 and SH2) lack a barrel shifter and have only
instructions with fixed shift counts.

My first question is: are there other processors which must choose between
code which is "faster and uses scratch register" vs. "slower and no
scratch register"? 

I'm assuming other processors have the same problem. If so, it sounds
better to implement a generic solution rather than hack a 
MACHINE_DEPENDENT_REORG.

The second question is the appropriate implementation of such a feature.
I can think of a few different implementations:

1. Hack register allocation to handle this. This sounds ugly.

2. Hack combine to understand register pressure and rerun after
   reload. This also sounds ugly.

3. New optimizer pass which runs after global alloc which 
   opportunistically replaces slow sequences with fast sequences if hard
   registers are available.
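
As a rough illustration of what #3 might look like, here is a toy sketch
in C. Everything in it is hypothetical: the types and names (toy_insn,
find_free_hard_reg, praoo_pass) are invented for this example and are not
GCC internals, and liveness is assumed to be already available as a
per-insn bitmask of hard registers.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_HARD_REGS = 8 };   /* assumed size of the hard register file */

/* Toy opcodes: the slow scratch-free form and the fast scratch form. */
enum toy_code { TOY_OTHER, TOY_ASHR8_SLOW, TOY_ASHR8_FAST };

struct toy_insn {
    enum toy_code code;
    unsigned live_regs;   /* bit i set: hard register i is live here */
    int scratch;          /* scratch used by the fast form, or -1    */
};

/* Return the lowest-numbered hard register that is not live, or -1. */
int find_free_hard_reg(unsigned live_regs)
{
    for (int r = 0; r < NUM_HARD_REGS; r++)
        if (!(live_regs & (1u << r)))
            return r;
    return -1;
}

/* Replace slow forms with fast forms wherever a hard register is free.
   Returns the number of replacements made. */
size_t praoo_pass(struct toy_insn *insns, size_t n)
{
    size_t replaced = 0;

    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (insns[i].code != TOY_ASHR8_SLOW)
            continue;
        int r = find_free_hard_reg(insns[i].live_regs);
        if (r < 0)
            continue;     /* no free register: keep the slow form */
        insns[i].code = TOY_ASHR8_FAST;
        insns[i].scratch = r;
        insns[i].live_regs |= 1u << r;   /* scratch is now in use here */
        replaced++;
    }
    return replaced;
}
```

The point of the sketch is that the pass never creates new register
pressure: an insn whose live mask is full is simply left alone. A real
pass would presumably also have to check that the chosen register is not
one that would need saving in the prologue just for this use.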

Is #3 the right solution, or are there better solutions available?

Toshi

