This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses

From: "olegendo at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 03 Oct 2013 10:47:17 +0000
Subject: [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
Auto-submitted: auto-generated
References: <bug-50749-4 at http dot gcc dot gnu dot org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #16 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to bin.cheng from comment #15)
> There must be another scenario for the example, and in this case example:
> 
> int test_0 (char* p, int c)
> {
>   int r = 0;
>   r += *p++;
>   r += *p++;
>   r += *p++;
>   return r;
> }
> 
> should be translated into sth like:
>   //...
>   ldrb [rx]
>   ldrb [rx+1]
>   ldrb [rx+2]
>   add rx, rx, #3
>   //...

As mentioned above, on SH this is the case (displacement addressing mode is
selected).  However, the order of the memory accesses is not the same as in the
original source code (which is OK for non-volatile mems).
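
To illustrate why the reordering is harmless for non-volatile mems, here is a
minimal C sketch.  test_0 is the function from comment #15; test_0_disp and
check_test_0 are hypothetical names that I made up, and the load order in
test_0_disp is arbitrary, it is not the order the SH backend actually emits.

```c
#include <assert.h>

/* Original form from comment #15: three post-incremented byte loads.  */
int test_0 (char* p, int c)
{
  (void) c;   /* unused; kept to match the signature in the report */
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

/* Hypothetical source-level model of the displacement form: the base
   pointer stays fixed and each load uses a constant offset.  The loads
   are deliberately issued in a different order than in the source.  */
int test_0_disp (char* p, int c)
{
  (void) c;
  int r = 0;
  r += p[1];
  r += p[0];
  r += p[2];
  return r;
}

/* Self-check: both forms agree on a small buffer.  */
void check_test_0 (void)
{
  char buf[3] = { 10, 20, 30 };
  assert (test_0 (buf, 0) == 60);
  assert (test_0_disp (buf, 0) == test_0 (buf, 0));
}
```

If p pointed to volatile memory, the access order would be observable and the
second form would no longer be a valid replacement.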

> This way all loads are independent and can be issued on super scalar
> machine.  Actually for targets like arm which supports post-increment
> constant (other than size of memory access), it can be further changed into:
>   //...
>   ldrb [rx], #3
>   ldrb [rx-2]
>   ldrb [rx-1]
>   //...

Whether this transformation is beneficial or not depends on the target
architecture of course.  E.g. SH2A and SH4* are 2-way superscalar, but they
have only one memory load/store unit.  Thus the loads would not be done in
parallel anyway and the latency of the post-incremented address register can be
neglected.

There is a similar case on SH regarding floating point loads.  SH ISAs (other
than SH2A) don't have a displacement addressing mode for floating point
loads/stores.  When loading adjacent memory locations it's best to use post-inc
addressing modes and when storing adjacent memory locations it's best to use
pre-dec stores.  I.e. things like

float* x = ...;
*x++ = a;
*x++ = b;
*x++ = c;

should become:

add      #8,r1
fmov.s   fr0,@r1  // store c
fmov.s   fr1,@-r1 // store b
fmov.s   fr2,@-r1 // store a
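
As a rough C model of the sequence above (only a sketch, assuming 4-byte
floats; store_postinc, store_predec and check_stores are hypothetical names,
not anything in GCC), the pre-dec variant writes the same three locations in
descending order:

```c
#include <assert.h>

/* Source form: ascending stores through a post-incremented pointer.
   Returns the advanced pointer (x + 3).  */
float* store_postinc (float* x, float a, float b, float c)
{
  *x++ = a;
  *x++ = b;
  *x++ = c;
  return x;
}

/* Models the SH sequence: bump the pointer once, then store downwards.  */
float* store_predec (float* x, float a, float b, float c)
{
  float* r1 = x + 2;   /* add    #8,r1   (2 floats, assuming 4 bytes each) */
  *r1 = c;             /* fmov.s fr0,@r1   store c */
  *--r1 = b;           /* fmov.s fr1,@-r1  store b */
  *--r1 = a;           /* fmov.s fr2,@-r1  store a */
  /* r1 is back at the original x here.  The source-level pointer value
     is x + 3, so it is recomputed for the caller.  */
  return r1 + 3;
}

/* Self-check: both variants leave the same memory contents behind.  */
void check_stores (void)
{
  float u[3] = { 0 }, v[3] = { 0 };
  assert (store_postinc (u, 1.0f, 2.0f, 3.0f) == u + 3);
  assert (store_predec (v, 1.0f, 2.0f, 3.0f) == v + 3);
  assert (u[0] == v[0] && u[1] == v[1] && u[2] == v[2]);
}
```

Note that after the three stores r1 holds the original value of x again, so if
the advanced pointer is still needed afterwards, one more add is required.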


> For now auto-increment pass can't do this optimization.  I once have a patch
> for this but benchmark shows the case is not common.

To be honest, I think such optimizations as mentioned above are out of scope of
the auto-increment pass.  Of course we can try to paper over its shortcomings
and get some improvements here and there, but as soon as ivopts or another tree
pass is changed and outputs different sequences, it will break again.
That is why I suggested replacing the auto-inc-dec pass with a more generic
addressing mode selection (AMS) RTL pass (PR 56590).  Unfortunately, I don't
have anything useful yet.  But I could share some notes and thoughts regarding
AMS optimization, if you're interested.
