Optimisation pass for dual pack architectures

Sun Jan 24 16:59:00 GMT 1999

Richard Henderson writes:
 > ... on a traditional RISC machine sans madd insn.  It doesn't do much
 > for memory latency, but does hide most of the 4 cycle fp mult and add
 > latency that's typical.

My chief motivation was to achieve a single-cycle inner loop for a dot
product on the C4x (in conjunction with some loop optimization patches
that I'll submit shortly).
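
As a rough illustration of the kind of loop shape involved, here is a
hand-written C sketch (not the output of the pass, and the function
names are made up for the example).  It assumes a target that can
issue one multiply and one add together; the add consumes the product
from the previous iteration, so the two operations in the loop body do
not depend on each other.

```c
#include <stddef.h>

/* Straightforward form: the add must wait for the multiply
   in the same iteration. */
float dot_naive(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-pipelined form (illustrative only): the add in iteration i
   uses the product computed in iteration i-1, so the multiply and
   the add in the loop body are independent and could be paired. */
float dot_pipelined(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    float prod;

    if (n == 0)
        return 0.0f;

    prod = a[0] * b[0];             /* prologue: first multiply */
    for (size_t i = 1; i < n; i++) {
        sum += prod;                /* add product of iteration i-1 */
        prod = a[i] * b[i];         /* multiply for iteration i */
    }
    sum += prod;                    /* epilogue: last add */
    return sum;
}
```

Both versions add the products in the same order, so they return the
same value; only the dependence structure inside the loop differs.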

 > It looks like good code -- I'd just hesitate to name it as you did,
 > so strongly suggesting it's only useful for multipack targets.

The naming rights are open for tender ;-)  

Initially I started looking at the general loop pipelining problem for
VLIW architectures but decided to concentrate on the simpler dual pack
problem.

Michael.