BCT optimization

David Edelsohn dje@watson.ibm.com
Tue Sep 29 21:55:00 GMT 1998

	Michael and I have been discussing how to generalize the BCT
optimization for all of our needs.

	Currently the BCT initialization code is quite PowerPC-specific: a
GPR and a COUNT_REGISTER_REGNUM register are generated.  The iteration
count is loaded into the GPR and then transferred into the count register,
in effect performing a manual reload, because constants cannot be loaded
into the PowerPC CTR directly.  This sequence clearly should be a separate pattern
to allow the machine description to load the counter in a
machine-dependent way.

	In some cases, the register allocated needs to be communicated
between the initialization pattern and the decrement_and_branch pattern.
I can see three ways to accomplish this and I am not sure which is best:

	1) Pass an empty RTX into the initialization pattern.  Have the
preparation statements in the pattern call gen_rtx_REG, filling in the
RTX.  The loop instrumentation code would then have the filled-in RTX to
pass as one of the parameters to the decrement_and_branch insn.

	2) Call a new machine description MACRO definition to initialize
the RTX to pass to both patterns.

	3) Have each machine description establish special variables for
this private communication between the two patterns.

This RTX does not always have to be a real RTX; it really is a magic
cookie passed between the two patterns.  According to Michael, the ADI
SHARC needs a loop depth parameter which could use this mechanism as well.
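
	To make the data flow of option (1) concrete, here is a rough
sketch in plain C.  It is not GCC code: struct toy_cookie stands in for
the RTX, and toy_init_pattern, toy_decrement_and_branch, toy_run_loop and
TOY_COUNT_REGNUM are names invented for this illustration.  The decrement
semantics (decrement, then branch if the counter is nonzero) are an
assumption modeled on the PowerPC CTR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the RTX "magic cookie"; not GCC's rtx. */
struct toy_cookie {
  int filled;  /* nonzero once the init pattern has filled it in */
  int regno;   /* whatever the target wants to remember, e.g. a register */
};

/* Invented register number, for illustration only. */
enum { TOY_COUNT_REGNUM = 1 };

/* Plays the role of the initialization pattern's preparation
   statements: the target chooses the register, records it in the
   cookie it was handed, and loads the iteration count.  */
void toy_init_pattern(struct toy_cookie *c, long iterations, long *counter)
{
  assert(c != NULL && !c->filled);
  c->regno = TOY_COUNT_REGNUM;
  c->filled = 1;
  *counter = iterations;
}

/* Plays the role of decrement_and_branch: it receives the same cookie,
   decrements the counter, and returns nonzero to branch back.  */
int toy_decrement_and_branch(const struct toy_cookie *c, long *counter)
{
  assert(c != NULL && c->filled);
  return --*counter != 0;
}

/* Plays the role of the loop instrumentation code: it creates the
   empty cookie and hands it to both patterns without looking inside.
   Assumes iterations >= 1.  */
long toy_run_loop(long iterations)
{
  struct toy_cookie c = { 0, 0 };
  long counter;
  long trips = 0;

  toy_init_pattern(&c, iterations, &counter);
  do
    trips++;
  while (toy_decrement_and_branch(&c, &counter));
  return trips;
}
```

The point of the sketch is only that the generic code never interprets
the cookie; it creates it empty and passes it along.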

	Does passing an empty RTX and initializing it within a pattern as
in option (1) cause any problems or violate any GCC rules?

	I am a little concerned about option (2) because the initialization
pattern and the macro may need to duplicate a lot of code or do a lot of
backchannel communication if choosing the right register could utilize
some other information known to the pattern.

	If a particular target's implementation of the patterns needs more
than one parameter like this, I think it needs to be packed as a PARALLEL
RTX or use some additional out-of-band communication combining (1) or (2)
with (3).  I think the zero-one-infinity rule comes into play here: any
fixed number of parameters other than 1 will cause problems eventually.
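
	As a rough illustration of the packing idea, again in plain C
rather than real RTL: struct toy_parallel below is a made-up analogue of
a PARALLEL, namely a length plus a vector of elements, so the generic
code still passes exactly one object however many values the target puts
inside.  The names toy_parallel, toy_pack and toy_elem, and the
convention that element 1 holds a loop depth, are assumptions made for
this example.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up analogue of a PARALLEL: one object wrapping N elements. */
struct toy_parallel {
  size_t len;        /* number of packed values */
  const long *elem;  /* the values themselves, owned by the caller */
};

/* Target side: wrap any number of values into a single cookie. */
struct toy_parallel toy_pack(const long *vals, size_t n)
{
  struct toy_parallel p;
  p.len = n;
  p.elem = vals;
  return p;
}

/* Target side: fetch element I, checking the bound. */
long toy_elem(const struct toy_parallel *p, size_t i)
{
  assert(p != NULL && i < p->len);
  return p->elem[i];
}
```

A target needing both a register number and a loop depth would pack two
values and read them back by index; a target needing only one would pack
one, and the generic interface would not change.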

Thanks, David
