This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

From: Jim Wilson <wilson at tuliptree dot org>
To: Andrew Pinski <pinskia at gmail dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Date: Thu, 12 Oct 2017 14:48:51 -0700
Subject: Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
Authentication-results: sourceware.org; auth=none
References: <1506095357-3334-1-git-send-email-jim.wilson@linaro.org> <CABXYE2WyDADSL_eDPKDKMj8iJcRQt9K-TbpXc7rVdKeVHp1rrg@mail.gmail.com> <CA+=Sn1=Noh74J+Y0j40GRxT9ZBCr+6UeTr3biftfdgaG96hXmg@mail.gmail.com> <CABXYE2X-FfdteB-usOfeDx8Oub66KQidP55qiX=eKzTE8-5x2w@mail.gmail.com> <CA+=Sn1nUzqv0CPVKvGjDucGVNJy_dismkUW8daqfJB-RXuKrLA@mail.gmail.com>

On Fri, 2017-09-22 at 14:11 -0700, Andrew Pinski wrote:
> On Fri, Sep 22, 2017 at 11:39 AM, Jim Wilson <jim.wilson@linaro.org>
> wrote:
> > 
> > On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski <pinskia@gmail.com>
> > wrote:
> > > 
> > > Two overall comments:
> > > * What about splitting register_offset into two different
> > > elements,
> > > one for non 128bit modes and one for 128bit (and more; OI, etc.)
> > > modes
> > > so you get better address generation right away for the simd load
> > > cases rather than having LRA/reload having to reload the address
> > > into
> > > a register.
> > I'm not sure if changing register_offset cost would make a
> > difference,
> > since costs are usually used during optimization, not during
> > address
> > generation.  This is something that I didn't think to try
> > though.  I
> > can try taking a look at this.
> It is taken into account when fwprop is propagating the addition
> into
> the MEM (the tree level is always a_1 = POINTER_PLUS_EXPR;
> MEM_REF(a_1)).
> IV-OPTS will produce much better code if the address_cost is correct.
> 
> It looks like no other pass (combine, etc.) would take that into
> account except for postreload CSE but maybe they should.

I tried increasing the cost of register_offset.  This got rid of the
reg+reg addressing mode in the middle of the main loop for lmbench
stream copy, but did not eliminate it after the main loop.

The tree optimized dump has 
  _52 = a_15 + _51;
  _53 = c_17 + _51;
  _54 = *_52;
  *_53 = _54;
and the RTL expand dump has
(insn 64 63 65 10 (set (reg:DF 96 [ _54 ])
        (mem:DF (plus:DI (reg/v/f:DI 78 [ a ])
                (reg:DI 93 [ _51 ])) [3 *_52+0 S8 A64])) "stream.c":223 -1
     (nil))
(insn 65 64 66 10 (set (mem:DF (plus:DI (reg/v/f:DI 79 [ c ])
                (reg:DI 93 [ _51 ])) [3 *_53+0 S8 A64])
        (reg:DF 96 [ _54 ])) "stream.c":223 -1
     (nil))
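
For readers without the benchmark at hand, here is a rough C sketch of
the kind of loop that could lead to the dumps above.  It is an
assumption-laden stand-in, not the lmbench source: the function name
stream_copy, the two-element main loop, and the scalar remainder loop
are all hypothetical.  The point is only that the remainder copies one
double at a time, with both arrays indexed by the same offset, which is
what "_52 = a_15 + _51" and "_53 = c_17 + _51" express.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a STREAM-style copy kernel.  */
void stream_copy(double *restrict c, const double *restrict a, size_t n)
{
    assert(n == 0 || (a != NULL && c != NULL));

    size_t j = 0;

    /* Main loop: two doubles (one quad-word) per iteration, roughly
       what a vectorizer might produce for 128-bit vectors.  */
    for (; j + 2 <= n; j += 2) {
        c[j] = a[j];
        c[j + 1] = a[j + 1];
    }

    /* Remainder after the main loop: one scalar double per iteration.
       The load and the store share the same index, so each address
       is naturally expressed as base register + offset register.  */
    for (; j < n; j++)
        c[j] = a[j];
}
```

Calling stream_copy(c, a, 5) runs the main loop twice and the remainder
loop once, so the last element goes through the scalar path.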

That may be fixable, but there is a bigger problem here: increasing the
cost of register_offset affects both loads and stores.  On Falkor, only
quad-word stores are inefficient with a reg+reg address.  Quad-word
loads with a reg+reg address are faster than the equivalent add/ldr
sequence, so disabling reg+reg addressing for quad-word loads would hurt
performance.

Since the address cost model makes no distinction between loads and
stores, I see no way to get the result I need by using address costs.
I can only get the result I need by modifying the md file.
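
To make the limitation concrete, here is a toy model in C.  It is a
simplified sketch and not GCC's real data structures or hooks: the
names toy_addr_costs, toy_address_cost and toy_wanted_cost, the enums,
and the penalty parameter are all made up for illustration.  The first
function sees only the shape of the address, so a load and a store
always get the same cost.  The second shows the extra inputs (access
kind and access size) that a Falkor-specific penalty would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; not GCC's actual cost tables.  */
enum addr_kind { ADDR_IMM_OFFSET, ADDR_REG_OFFSET };
enum access_kind { ACCESS_LOAD, ACCESS_STORE };

struct toy_addr_costs {
    int imm_offset;
    int register_offset;
};

/* Cost lookup keyed on the address shape only.  There is no
   parameter saying whether the access is a load or a store.  */
int toy_address_cost(const struct toy_addr_costs *t, enum addr_kind k)
{
    assert(t != NULL);
    return k == ADDR_REG_OFFSET ? t->register_offset : t->imm_offset;
}

/* What the Falkor case would need: an extra penalty only for
   quad-word stores that use a reg+reg address.  */
int toy_wanted_cost(const struct toy_addr_costs *t, enum addr_kind k,
                    enum access_kind acc, int is_quad_word, int penalty)
{
    int cost = toy_address_cost(t, k);
    if (k == ADDR_REG_OFFSET && acc == ACCESS_STORE && is_quad_word)
        cost += penalty;
    return cost;
}
```

In this model, raising register_offset in the table raises the cost for
quad-word loads as well, which is the side effect described above.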

> > I did try writing a patch to modify predicates to disallow reg
> > offset
> > for 128bit modes, and that got complicated, as I had to split apart
> > a
> > number of patterns in the aarch64-simd.md file that accept both VD
> > and
> > VQ modes.  I ended up with a patch 3-4 times as big as the one I
> > submitted, without any additional performance improvement, so it
> > wasn't worth the trouble.
> > 
> > > 
> > > * Maybe adding a testcase to the testsuite to show this change.
> > Yes, I can add a testcase.
> > 
> > > 
> > > One extra comment:
> > > * should we change the generic tuning to avoid reg+reg for 128bit
> > > modes?
> > Are there other targets with a similar problem?  I only know that
> > it
> > is a problem for Falkor.  It might be a loss for some targets as it
> > is
> > replacing one instruction with two.
> Well that is why I was suggesting the address cost model change.
> Because the cost model change actually might provide better code in
> the first place and still allow for reasonable generic code to be
> produced.

The patch I posted only affects Falkor.  It doesn't change generic
code.  I don't know of any reason why we need to change generic code
here.

The Falkor core has out-of-order execution and multiple function units,
so there isn't any noticeable performance gain from trying to fix this
earlier.  Fixing this with an md file change gives optimal performance
for the testcases I've looked at.

Since I'm no longer at Linaro, I expect that someone else will take
over this patch submission.  I will create a bug report to document the
issue, to make it easier to track it and hand off to someone else.

Jim

Follow-Ups:
- Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
  - From: Kugan Vivekanandarajah
