This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: WIP patch, RFC: ARM ldm/stm peephole rewrite


On Thu, 2010-03-25 at 12:58 +0100, Bernd Schmidt wrote:
> On 03/25/2010 12:42 PM, Richard Earnshaw wrote:
> 
> > 1) Please no more ML in the ARM backend.  It's bad enough that the neon
> > patterns are generated that way, but that's a massively bigger chunk of
> > code.  This bit seems more like doing it for its own sake.
> 
> Well, I didn't want to write them by hand, and the other generators were
> in ML so I was going for consistency.  What do you suggest - just delete
> the generator after I'm done with it, or write another one in C?
> 

I don't really see the need to keep the generator around, it's not like
we're going to want to rototill this code on a regular basis.  For the
same reason, I don't really see the point in re-writing it in another
language.  So once this goes into mainline, I think we should just make
the .md file the master.

> Do you think we should have a full set, or only up to four registers?

It's a case of diminishing returns.  We don't see four used all that
often (outside of memcpy operations), and I suspect we'll see longer
ones even less often.  Maybe we could go to 5 if experimentation shows
some usage, but I doubt we'll usefully (to the extent that finding them
is worth the cost of looking for them) be seeing longer than that.  But
we should probably benchmark before deciding.

> 
> > 2) Why are you generating the .md file fragment with the GPL exception
> > clause?  This file is part of core GCC not a library.
> 
> Copied from the wrong neon generator.
> 
> > 3) IIRC LDMIB with write-back trips a bug on some StrongARM chips (sadly
> > not yet obsolete).  So anything up to ARMv4 needs to avoid this.
> 
> Ok.  Is it only ldmib?
> 

IIRC only LDMIB and only with base-register write-back.
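
As a rough illustration of the restriction discussed above, a guard of
this shape could reject the problematic form.  This is only a sketch:
the enum, the function name and the integer architecture-version
encoding are hypothetical and do not correspond to anything in the ARM
back-end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical addressing modes for a load-multiple; these names are
   illustrative and are not taken from the GCC sources.  */
enum ldm_mode { LDM_IA, LDM_IB, LDM_DA, LDM_DB };

/* Return true if a load-multiple in MODE may be emitted for a target
   whose architecture version is ARCH_VERSION (4 for ARMv4, 5 for
   ARMv5, and so on).  The only form rejected is LDMIB with
   base-register write-back on ARMv4 and earlier.  */
bool
ldm_form_allowed (int arch_version, enum ldm_mode mode, bool writeback)
{
  assert (arch_version >= 1);

  if (mode == LDM_IB && writeback && arch_version <= 4)
    return false;

  return true;
}
```

Under these assumptions, LDMIB without write-back and the other
addressing modes with write-back remain available on every
architecture version.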

> > 4) LDM is not always the best thing.  On more recent chips for a small
> > number of regs they can be slower than individual loads, and only once
> > you go above 4 regs might it break even.  On some cores it may never be
> > faster than using LDRD (especially if the address is 64-bit aligned).
> > Don't forget that in Thumb-2 LDRD can take any two independent registers
> > with no ordering constraint.
> 
> The current peephole code doesn't handle this, does it?  Where can I
> find more information about the cases in which it is profitable?  I
> think as a code-size optimization, it should always be a win; and
> there's a Thumb PR requesting this type of transformation.
> 

No, but it's something we do need to take into account.  I think we need
a way of asking what the cost of this is vs the cost of n*ldr or n*ldrd
as appropriate (and remembering that on v5te ldrd requires 64-bit
alignment).  If we put that generic infrastructure in, then we can
select as needed for each core just by implementing the
appropriate answers for each core.  The key is to get the 'questions'
phrased correctly so that the answers are meaningful and accurate.
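
To make the kind of 'question' concrete, here is a minimal sketch of a
per-core cost query that compares one LDM against n*ldr or n*ldrd.  The
struct, its fields and the function names are all hypothetical, and any
numbers fed into it would be made up for illustration; this is not the
costs infrastructure mentioned below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-core cost table.  Field names are illustrative
   assumptions, not GCC's actual tuning structures.  */
struct mem_costs
{
  int ldr;                  /* Cost of one single-word load.  */
  int ldrd;                 /* Cost of one doubleword load.  */
  int ldm_base;             /* Fixed cost of issuing an LDM.  */
  int ldm_per_reg;          /* Extra cost per register loaded by LDM.  */
  bool has_ldrd;            /* Core supports LDRD at all.  */
  bool ldrd_needs_align64;  /* LDRD requires 64-bit alignment (v5te).  */
};

/* Cost of loading NREGS registers with a single LDM.  */
int
cost_of_ldm (const struct mem_costs *c, int nregs)
{
  assert (nregs > 0);
  return c->ldm_base + c->ldm_per_reg * nregs;
}

/* Cost of loading NREGS registers with separate LDR or LDRD
   instructions, whichever is cheaper and permitted.  ALIGNED64 says
   whether the address is known to be 64-bit aligned.  */
int
cost_of_separate_loads (const struct mem_costs *c, int nregs, bool aligned64)
{
  assert (nregs > 0);

  int ldr_only = c->ldr * nregs;
  bool can_ldrd = c->has_ldrd && (aligned64 || !c->ldrd_needs_align64);
  if (!can_ldrd)
    return ldr_only;

  /* Pair registers into LDRDs; an odd register takes a plain LDR.  */
  int with_ldrd = c->ldrd * (nregs / 2) + c->ldr * (nregs % 2);
  return with_ldrd < ldr_only ? with_ldrd : ldr_only;
}

/* The 'question': is one LDM strictly cheaper than the alternatives?  */
bool
ldm_is_profitable (const struct mem_costs *c, int nregs, bool aligned64)
{
  return cost_of_ldm (c, nregs) < cost_of_separate_loads (c, nregs, aligned64);
}
```

With a table like this per core, the peephole would only need to ask
ldm_is_profitable; the alignment rule and the break-even point would
live entirely in the per-core answers.  A code-size variant could use
the same shape with sizes in place of cycle costs.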

I have some groundwork for a change to the costs infrastructure in the
ARM back-end ready to commit; but it's waiting for the branch before I
can push it upstream.

> > Finally, I'm presuming that as a WIP this isn't aimed at 4.5. 
> 
> It isn't.
> 

OK.

R.

