Optimisation pass for dual pack architectures
Michael Hayes
m.hayes@elec.canterbury.ac.nz
Sun Jan 24 16:59:00 GMT 1999
Richard Henderson writes:
> ... on a traditional RISC machine sans madd insn. It doesn't do much
> for memory latency, but does hide most of the 4 cycle fp mult and add
> latency that's typical.
My chief motivation is that I wanted to achieve a single cycle inner
loop for a dot product on the C4x (in conjunction with some loop
optimization patches that I'll submit shortly).
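As a rough illustration only (this is not the patch, and the function
name dot_pipelined is made up for the example), the loop shape in
question looks something like the sketch below. It assumes the usual
trick of adding the previous iteration's product while computing the
current one, so that the multiply and the add in the loop body do not
depend on each other and could be issued as one parallel pair:

```c
#include <stddef.h>

/* Hypothetical example: dot product written so that the multiply and
   the add inside the loop body are independent of each other. */
float dot_pipelined(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    float prod;
    size_t i;

    if (n == 0)
        return 0.0f;

    /* Prologue: first multiply, nothing to add yet. */
    prod = a[0] * b[0];

    for (i = 1; i < n; i++) {
        sum += prod;           /* add the product from the previous pass */
        prod = a[i] * b[i];    /* multiply for this pass; independent of the add */
    }

    /* Epilogue: fold in the last product. */
    return sum + prod;
}
```

Whether a given compiler actually packs the two operations into one
instruction depends on the target and on the pass discussed here; the
C source only exposes the independence.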
> It looks like good code -- I'd just hesitate to name it as you did,
> so strongly suggesting it's only useful for multipack targets.
The naming rights are open for tender ;-)
I started out looking at the general loop pipelining problem for
VLIW architectures, but decided to concentrate on the simpler dual pack
problem.
Michael.