This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
BCT optimization
- To: Jeffrey Law <law at cygnus dot com>
- Subject: BCT optimization
- From: David Edelsohn <dje at watson dot ibm dot com>
- Date: Tue, 29 Sep 1998 15:10:51 -0400
- Cc: Michael Hayes <m dot hayes at elec dot canterbury dot ac dot nz>, Michael Hayes <michaelh at ongaonga dot chch dot cri dot nz>, egcs at cygnus dot com
Michael and I have been discussing how to generalize the BCT
optimization for all of our needs.
Currently the BCT initialization code is quite PowerPC-specific: a
GPR and a COUNT_REGISTER_REGNUM register are generated. The iteration
count is loaded into the GPR and then transferred into the count register,
performing a manual reload because constants cannot be loaded into the
PowerPC CTR directly. This sequence clearly should be a separate pattern
to allow the machine description to load the counter in a
machine-dependent way.
In some cases, the register allocated needs to be communicated
between the initialization pattern and the decrement_and_branch pattern.
I can see three ways to accomplish this and I am not sure which is best:
1) Pass an empty RTX into the initialization pattern. Have the
preparation statements in the pattern call gen_rtx_REG, filling in the
RTX. The loop instrumentation code would then have the filled-in RTX to
pass as one of the parameters to the decrement_and_branch insn.
2) Call a new machine description MACRO definition to initialize
the RTX to pass to both patterns.
3) Have each machine description establish special variables for
this private communication between the two patterns.
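To make option (1) concrete, here is a rough sketch in plain C. It is only
a model of the data flow and does not use GCC's real rtx type or any real
GCC interface: the cookie struct, emit_bct_init, emit_decrement_and_branch,
instrument_loop and HYPOTHETICAL_COUNT_REGNUM are all invented names, and
the register number is an arbitrary placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the RTX passed between the two patterns.
   This is NOT GCC's rtx; it only models an "empty slot" that gets filled. */
typedef struct cookie {
    int is_set;   /* 0 while the slot is still empty */
    int regno;    /* register chosen by the target once filled */
} cookie;

/* Placeholder register number, chosen arbitrarily for this sketch. */
enum { HYPOTHETICAL_COUNT_REGNUM = 66 };

/* Models the initialization pattern: the target picks the counter
   register and fills in the empty slot handed to it by the caller. */
void emit_bct_init(cookie *slot, long iterations)
{
    assert(slot != NULL && !slot->is_set);
    (void) iterations;   /* a real pattern would emit the load of the count */
    slot->regno = HYPOTHETICAL_COUNT_REGNUM;
    slot->is_set = 1;
}

/* Models the decrement_and_branch pattern: it consumes the filled slot
   and returns the register it would operate on. */
int emit_decrement_and_branch(const cookie *slot)
{
    assert(slot != NULL && slot->is_set);
    return slot->regno;
}

/* Models the loop instrumentation code: it owns the slot, passes it
   empty to the first pattern and filled to the second. */
int instrument_loop(long iterations)
{
    cookie slot = { 0, -1 };
    emit_bct_init(&slot, iterations);
    return emit_decrement_and_branch(&slot);
}
```

The point of the sketch is that the instrumentation code never needs to
know how the register was chosen; it only carries the slot from one
pattern to the other.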
This RTX does not always have to be a real RTX; it really is a magic
cookie passed between the two patterns. According to Michael, the ADI
SHARC needs a loop depth parameter which could use this mechanism as well.
Does passing an empty RTX and initializing it within a pattern as
in option (1) cause any problems or violate any GCC rules?
I am a little concerned about option (2) because the initialization
pattern and the macro may need to duplicate a lot of code or do a lot of
backchannel communication if choosing the right register could utilize
some other information known to the pattern.
If a particular target's implementation of the patterns needs more
than one parameter like this, I think it needs to be packed as a PARALLEL
RTX or use some additional out-of-band communication combining (1) or (2)
with (3). I think the 0-1-infinity rule comes into play here: any
fixed number other than 1 will cause problems eventually.
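The packing idea can be sketched in plain C as a single cookie that carries
a length and a vector of values, loosely analogous to a PARALLEL. As before
this is only a model and not GCC code: packed_cookie, sketch_init,
sketch_loop_depth and sketch_release are invented names, and the register
number 7 is an arbitrary placeholder. The example target is assumed to need
two values, a counter register and a loop depth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One cookie holding any number of values, so the interface between
   the two patterns still passes exactly one parameter. */
typedef struct packed_cookie {
    size_t n;     /* number of packed values */
    int *elem;    /* the packed values, owned by the cookie */
} packed_cookie;

/* Models an initialization pattern for a target that needs two values.
   Returns 0 on success, -1 if allocation fails. */
int sketch_init(packed_cookie *out, int loop_depth)
{
    int *storage = malloc(2 * sizeof *storage);
    if (storage == NULL)
        return -1;
    storage[0] = 7;            /* placeholder counter register number */
    storage[1] = loop_depth;   /* second target-private value */
    out->n = 2;
    out->elem = storage;
    return 0;
}

/* Models the decrement_and_branch side: only the target knows the
   layout, so only the target unpacks the cookie. */
int sketch_loop_depth(const packed_cookie *in)
{
    assert(in != NULL && in->n == 2);
    return in->elem[1];
}

/* Releases the storage once both patterns have been emitted. */
void sketch_release(packed_cookie *c)
{
    free(c->elem);
    c->elem = NULL;
    c->n = 0;
}
```

The generic loop code only ever sees one opaque object, so a target that
later needs a third value changes its own two functions and nothing else.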
Thanks, David