I'm working on a port that does loads and stores in two phases.
Every load/store is funneled through the intermediate registers "ld" and "st"
standing between memory and the rest of the register file.
Example:
ld=4(rB)
...
...
rC=ld
st=rD
8(rB)=st
rB is a base address register, rC and rD are data regs. The ... represents
load delay cycles.
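
To make the two-phase behaviour concrete, here is a minimal C sketch of the
example above. It is only a model: the Machine struct, the helper names, the
register indices and the memory size are all made up for illustration, and the
load delay cycles are not modelled at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model only: register count, memory size and all names
   below are assumptions, not taken from the real CPU. */
enum { RB = 1, RC = 2, RD = 3, NREGS = 8, MEMSIZE = 64 };

typedef struct {
    uint32_t r[NREGS];      /* ordinary register file (rB, rC, rD, ...) */
    uint32_t ld;            /* staging register for loads */
    uint32_t st;            /* staging register for stores */
    uint8_t  mem[MEMSIZE];  /* tiny byte-addressed memory */
} Machine;

/* ld = off(rBase): phase one of a load, memory -> "ld". */
static void load_to_ld(Machine *m, int base, uint32_t off)
{
    uint32_t addr = m->r[base] + off;
    assert(addr <= MEMSIZE - 4);
    memcpy(&m->ld, &m->mem[addr], 4);
}

/* rDst = ld: phase two of a load, "ld" -> register file. */
static void move_from_ld(Machine *m, int dst)
{
    m->r[dst] = m->ld;
}

/* st = rSrc: phase one of a store, register file -> "st". */
static void move_to_st(Machine *m, int src)
{
    m->st = m->r[src];
}

/* off(rBase) = st: phase two of a store, "st" -> memory. */
static void store_from_st(Machine *m, int base, uint32_t off)
{
    uint32_t addr = m->r[base] + off;
    assert(addr <= MEMSIZE - 4);
    memcpy(&m->mem[addr], &m->st, 4);
}

/* The four-instruction sequence from the example, in order. */
void run_example(Machine *m)
{
    load_to_ld(m, RB, 4);     /* ld = 4(rB)  */
    move_from_ld(m, RC);      /* rC = ld     */
    move_to_st(m, RD);        /* st = rD     */
    store_from_st(m, RB, 8);  /* 8(rB) = st  */
}
```

The point of the model is that memory is only ever touched by load_to_ld and
store_from_st, so rC and rD never appear as a direct memory operand.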
The CPU has only a single instance of "ld", but the machine description
defines five in order to allow overlapping live ranges to pipeline loads.
My mov insn patterns have constraints so that a memory destination pairs only
with an "st" register source, and a memory source pairs only with an "ld"
destination register. The trouble is that register allocation doesn't honor
these constraints, so it loads directly into, and stores directly from,
arbitrary data registers.