This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

From: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
To: Christophe Lyon <christophe dot lyon at linaro dot org>
Cc: James Greenhalgh <james dot greenhalgh at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <marcus dot shawcroft at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, nd <nd at arm dot com>
Date: Wed, 15 Nov 2017 15:58:38 +0000
Subject: Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers
Authentication-results: sourceware.org; auth=none
References: <59366A8C.9090302@foss.arm.com> <20170606131711.GD2208@arm.com> <5A034E43.5000309@foss.arm.com> <CAKdteOYdt+PtTm9v530ChYj0RXF_w8G=2VSwPPZKfe+2Nk3U5Q@mail.gmail.com>

Hi Christophe,

On 15/11/17 15:31, Christophe Lyon wrote:

Hi Kyrill,


On 8 November 2017 at 19:34, Kyrill  Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:

On 06/06/17 14:17, James Greenhalgh wrote:

On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote:

Hi all,

On top of the previous vec_merge simplifications [1] we can add this
pattern to perform
a store of a vec_concat of two 64-bit values in distinct registers as an
STP.
This avoids constructing such a vector explicitly in a register and
storing it as
a Q register.
This way for the code in the testcase we can generate:

construct_lane_1:
          ldp     d1, d0, [x0]
          fmov    d3, 1.0e+0
          fmov    d2, 2.0e+0
          fadd    d4, d1, d3
          fadd    d5, d0, d2
          stp     d4, d5, [x1, 32]
          ret

construct_lane_2:
          ldp     x2, x0, [x0]
          add     x3, x2, 1
          add     x4, x0, 2
          stp     x3, x4, [x1, 32]
          ret

instead of the current:
construct_lane_1:
          ldp     d0, d1, [x0]
          fmov    d3, 1.0e+0
          fmov    d2, 2.0e+0
          fadd    d0, d0, d3
          fadd    d1, d1, d2
          dup     v0.2d, v0.d[0]
          ins     v0.d[1], v1.d[0]
          str     q0, [x1, 32]
          ret

construct_lane_2:
          ldp     x2, x3, [x0]
          add     x0, x2, 1
          add     x2, x3, 2
          dup     v0.2d, x0
          ins     v0.d[1], x2
          str     q0, [x1, 32]
          ret

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for GCC 8?

OK.


Thanks, I've committed this and the other patches in this series after
rebasing and rebootstrapping and testing on aarch64-none-linux-gnu.
The only conflict from updating the patch was that I had to use the store_16
attribute rather than
the old store2 for the new define_insn. This is what I've committed with
r254551.

Sorry for the delay in committing.

I've noticed that the new tests fail when testing with -mabi=ilp32:
FAIL:    gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-not ins\t
FAIL:    gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
stp\td[0-9]+, d[0-9]+ 1 (found 0 times)
FAIL:    gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
stp\tx[0-9]+, x[0-9]+ 1 (found 0 times)

Sorry if this has been reported before.


Thank you for reporting this, I was not aware of it.

My patch does indeed fail to generate the optimised sequence for-mabi=ilp32.

During combine it fails to match:
Failed to match this instruction:
(set (mem:V2DF (plus:DI (reg/v/f:DI 79 [ z ])

(const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16A128])

    (vec_concat:V2DF (reg:DF 81 [ y0 ])
        (reg:DF 84 [ y1 ])))


but without the -mabi=ilp32 it does successfully match the equivalent

(set (mem:V2DF (plus:DI (reg:DI 1 x1 [ z ])

(const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16A128])

    (vec_concat:V2DF (reg:DF 81 [ y0 ])
        (reg:DF 84 [ y1 ])))

The only difference is the index register being the hard reg x1.

There's probably some subtlety in aarch64_classify_address that I'llneed to dig into.

In any case, can you please open a bug report for this so we can track it?

To be clear, the failure is just suboptimal codegen for the -mabi=ilp32case, not a wrong-code or ICE

(though it should still be fixed, of course).

Thanks again,
Kyrill

Christophe

Kyrill

Thanks,
James

2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

      * config/aarch64/aarch64-simd.md (store_pair_lanes<mode>):
      New pattern.
      * config/aarch64/constraints.md (Uml): New constraint.
      * config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): New
      predicate.

2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

      * gcc.target/aarch64/store_v2vec_lanes.c: New test.

Follow-Ups:
- Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers
  - From: Christophe Lyon

References:
- Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers
  - From: Kyrill Tkachov
- Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers
  - From: Christophe Lyon

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]