This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: paired register loads and stores
- From: Rask Ingemann Lambertsen <rask at sygehus dot dk>
- To: Erich Plondke <eplondke at gmail dot com>
- Cc: gcc <gcc at gcc dot gnu dot org>
- Date: Wed, 25 Apr 2007 10:18:38 +0200
- Subject: Re: paired register loads and stores
- References: <deb8826f0609282227x6b7e96f7s141fc04abaf78372@mail.gmail.com>
On Fri, Sep 29, 2006 at 05:27:10AM +0000, Erich Plondke wrote:
> rs6000 and Sparc ports seem to use a peephole2 to get the ldd or lfq
> instructions (respectively), but it looks like there's no reason for
> the register allocator to allocate registers together. The peephole2
> just picks up loads to adjacent memory locations if the allocator
> happens to choose adjacent registers (is that correct?) or the
> variables are specified as living in hard registers with the help
> of an asm.
>
> Several other architectures have paired loads: some ARM targets have ldrd
> which can be cheaper than a ldm, and ia64 has a pair load.
>
> It seems like GCC does a good job of knowing how to modify register-
> sized subregs of two- or four-register larger modes. So if I could
> tell GCC to turn:
>
> [(set (reg:SI X) (mem:SI (addr)))
> (set (reg:SI Y) (mem:SI (addr+4)))]
>
> (where addr is aligned to DI) into something like:
> [(set (reg:DI T) (mem:DI (addr)))
> (set (reg:SI X) (subreg:SI (reg:DI T) 0))
> (set (reg:SI Y) (subreg:SI (reg:DI T) 4))]
>
> and I could do so early enough, GCC would know to access the subregs
> directly in instruction(s) using the loaded values, and I would end up
> loading the register pair and using the individual elements. But it has to
> be done early on; after register allocation even if I could get a
> DI temporary I'd probably have the two SI moves and that's probably
> not a win.
You may have success using the combine pass to do this. The difficulty is
that combine only tries to combine instructions when the LOG_LINKS field is
set up. I think this only happens for plain SET insns when subregs are
involved, e.g.
(set (subreg:SI (reg:DI T) 0) (mem:SI addr))
(set (subreg:SI (reg:DI T) 4) (mem:SI addr+4))
I don't know how to make this work in other cases, for example with adjacent
structure fields. You could try to extend the optimization that GCC already
does for loading adjacent structure fields smaller than a word; the one
enabled by SLOW_BYTE_ACCESS.
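
To make the adjacent structure field case concrete, here is a minimal C
sketch of the source-level pattern and of the paired form it could be turned
into. The struct and function names are hypothetical, invented only for
illustration, and the second function just mimics by hand the "one DI load,
then two SI subregs" shape from the quoted RTL. It does not show what any
particular GCC port actually emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example type: two adjacent 32-bit fields, with the first
   one aligned to 8 bytes so the pair can be read as one 64-bit value. */
struct pair {
    _Alignas(8) uint32_t x;
    uint32_t y;
};

/* The "before" shape: two separate 32-bit loads from addr and addr+4. */
uint32_t sum_separate(const struct pair *p)
{
    return p->x + p->y;
}

/* The "after" shape, written by hand: one 64-bit load into a temporary,
   then the two 32-bit halves taken from it at byte offsets 0 and 4. */
uint32_t sum_paired(const struct pair *p)
{
    uint64_t t;
    uint32_t half[2];

    memcpy(&t, p, sizeof t);        /* single 8-byte load of both fields */
    memcpy(half, &t, sizeof half);  /* half[0] = bytes 0..3, half[1] = bytes 4..7 */
    return half[0] + half[1];
}
```

Using memcpy for the halves rather than shifts keeps the byte offsets the
same as in memory on both little-endian and big-endian targets, which
matches the byte-offset numbering of the subregs above.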
--
Rask Ingemann Lambertsen