This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Optimisation pass for dual pack architectures
- To: Michael Hayes <m dot hayes at elec dot canterbury dot ac dot nz>, egcs-patches at cygnus dot com
- Subject: Re: Optimisation pass for dual pack architectures
- From: Richard Henderson <rth at cygnus dot com>
- Date: Sat, 23 Jan 1999 20:29:16 -0800
- References: <vy90evk47g.fsf@ongaonga.elec.canterbury.ac.nz>
- Reply-To: Richard Henderson <rth at cygnus dot com>
On Fri, Jan 22, 1999 at 11:10:59PM +1300, Michael Hayes wrote:
> The most important optimisation involves packing a pair of
> instructions with a data dependency that occur within a loop.
This looks like a generally useful special case of software
pipelining. It has some special combine-like knowledge about
constructing parallels, but is useful beyond two-packs.
> is transformed into:
>
>   (set (reg r0) (mult (mem (post_inc ar0)) (mem (post_inc ar1))))
>   (repeat (N - 1) [
>     (parallel [
>       (set (reg r0) (mult (mem (post_inc ar0)) (mem (post_inc ar1))))
>       (set (reg r1) (plus (reg r1) (reg r0)))])])
>   (set (reg r1) (plus (reg r1) (reg r0)))
  (set (reg r0) (mem (reg a0)))
  (set (reg a0) (plus (reg a0) (const_int 4)))
  (set (reg r1) (mem (reg a1)))
  (set (reg a1) (plus (reg a1) (const_int 4)))
  (set (reg r2) (mult (reg r0) (reg r1)))
  (repeat (N - 1) [
    (set (reg r0) (mem (reg a0)))
    (set (reg a0) (plus (reg a0) (const_int 4)))
    (set (reg r1) (mem (reg a1)))
    (set (reg a1) (plus (reg a1) (const_int 4)))
    (set (reg r3) (plus (reg r3) (reg r2)))
    (set (reg r2) (mult (reg r0) (reg r1)))
  ])
  (set (reg r3) (plus (reg r3) (reg r2)))
... on a traditional RISC machine sans madd insn. It doesn't do much
for memory latency, but does hide most of the 4-cycle fp mult and add
latency that's typical.
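For readers who find C easier to follow than RTL, here is a rough source-level sketch of the same transformation. It is an illustration only, not what the pass emits: the function names `dot_naive` and `dot_pipelined` are made up for this example, the accumulator is assumed to start at zero, and the trip count is assumed to be at least 1, just as the `(repeat (N - 1) ...)` form assumes.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: each iteration loads, multiplies, then adds, so the
   add has to wait for the multiply issued just before it. */
double dot_naive(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-pipelined loop following the shape of the RTL above: the multiply for
   iteration i sits in the same loop body as the add for iteration i - 1. */
double dot_pipelined(const double *a, const double *b, size_t n)
{
    assert(n >= 1);                  /* the prologue always runs one iteration */

    double sum = 0.0;                /* plays the role of r3 */
    double prod = a[0] * b[0];       /* prologue: r2 = first product */

    for (size_t i = 1; i < n; i++) { /* the (repeat (N - 1) ...) body */
        double x = a[i];             /* r0 */
        double y = b[i];             /* r1 */
        sum += prod;                 /* add consumes the previous product */
        prod = x * y;                /* does not depend on the add above */
    }

    return sum + prod;               /* epilogue: the final add */
}
```

Inside the loop body the add and the multiply have no dependency on each other, which is what lets a scheduler overlap their latencies (or, on a two-pack target, issue them as one parallel).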
It looks like good code -- I'd just hesitate to name it as you did,
so strongly suggesting it's only useful for multipack targets.
r~