This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification
- From: Jeff Law <law at redhat dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 27 Jun 2017 16:28:22 -0600
- Subject: Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification
- References: <5936693E.2050005@foss.arm.com>
On 06/06/2017 02:35 AM, Kyrill Tkachov wrote:
> Hi all,
>
> Another vec_merge simplification that's missing is transforming:
> (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
> into
> (vec_concat x z) if N == 1 (0b01) or
> (vec_concat y x) if N == 2 (0b10)
>
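To make the rule above concrete, here is a small standalone C model of the three RTL operations on two-lane vectors. It is only a sketch: the vec2 type and the helper functions are hypothetical names made up for illustration, not GCC internals. It assumes the usual vec_merge semantics, where bit i of the mask selects lane i from the first operand and a clear bit selects it from the second.

```c
#include <assert.h>

/* Hypothetical two-lane vector used only for this model; not a GCC type.  */
typedef struct { long long lane[2]; } vec2;

/* Models (vec_duplicate x): both lanes hold x.  */
static vec2 vec_duplicate (long long x)
{
  vec2 v = { { x, x } };
  return v;
}

/* Models (vec_concat a b): lane 0 is a, lane 1 is b.  */
static vec2 vec_concat (long long a, long long b)
{
  vec2 v = { { a, b } };
  return v;
}

/* Models (vec_merge a b mask): lane i comes from a when bit i of mask
   is set, otherwise from b.  */
static vec2 vec_merge (vec2 a, vec2 b, unsigned mask)
{
  vec2 r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = ((mask >> i) & 1) ? a.lane[i] : b.lane[i];
  return r;
}

/* Lane-wise equality, for checking the rule.  */
static int vec_equal (vec2 a, vec2 b)
{
  return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}

/* The simplified form of
   (vec_merge (vec_duplicate x) (vec_concat y z) (const_int n))
   for the two masks the rule covers.  */
static vec2 simplified (long long x, long long y, long long z, unsigned n)
{
  assert (n == 1 || n == 2);
  return n == 1 ? vec_concat (x, z) : vec_concat (y, x);
}
```

With x = 7, y = 3, z = 5, merging with mask 1 keeps x in lane 0 and z in lane 1, and mask 2 keeps y in lane 0 and x in lane 1, which matches the two vec_concat forms.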
> For the testcase in this patch on aarch64, this lets combine try to
> match the pattern:
> (set (reg:V2DI 78 [ x ])
> (vec_concat:V2DI
> (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
> (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
> (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])))
>
> rather than the more complex:
> (set (reg:V2DI 78 [ x ])
> (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
> (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))
> (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]))
> (const_int 2 [0x2])))
>
> We don't actually have an aarch64 pattern for the simplified version
> above, but it's a simple enough form to add, so this patch adds such a
> pattern that performs a concatenated load of two 64-bit vectors in
> adjacent memory locations as a single Q-register LDR. The new aarch64
> pattern is needed to demonstrate the effectiveness of the simplify-rtx
> change, so I've kept them together as one patch.
>
> Now for the testcase in the patch we can generate:
> construct_lanedi:
>         ldr     q0, [x0]
>         ret
>
> construct_lanedf:
>         ldr     q0, [x0]
>         ret
>
> instead of:
> construct_lanedi:
>         ld1r    {v0.2d}, [x0]
>         ldr     x0, [x0, 8]
>         ins     v0.d[1], x0
>         ret
>
> construct_lanedf:
>         ld1r    {v0.2d}, [x0]
>         ldr     d1, [x0, 8]
>         ins     v0.d[1], v1.d[0]
>         ret
>
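The test source itself is not quoted in this message. As a rough sketch of the kind of code that would produce the assembly above, assuming the GNU C vector extensions: the function names are taken from the assembly labels, while the typedef names and function bodies are assumptions and may differ from the real load_v2vec_lanes_1.c.

```c
/* Assumed shape of the testcase; the real file is not quoted in this
   thread, so the typedefs and bodies below are illustrative guesses.  */
typedef long long v2di __attribute__ ((vector_size (16)));
typedef double v2df __attribute__ ((vector_size (16)));

/* Build a two-lane integer vector from two adjacent 64-bit elements.  */
v2di
construct_lanedi (long long *y)
{
  v2di x = { y[0], y[1] };
  return x;
}

/* Same construction for two adjacent doubles.  */
v2df
construct_lanedf (double *y)
{
  v2df x = { y[0], y[1] };
  return x;
}
```

Both lanes come from adjacent memory, which is what lets the two element loads be expressed as a single vec_concat of two MEMs and matched as one Q-register load.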
> The new memory constraint Utq is needed because the pattern must accept
> only the Q-register addressing modes, but the MEM expressions in the RTL
> pattern have 64-bit vector modes. If we don't constrain them, they will
> allow the D-register addressing modes during register allocation/address
> mode selection, which will produce invalid assembly.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2017-06-06 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
> Simplify vec_merge of vec_duplicate and vec_concat.
> * config/aarch64/constraints.md (Utq): New constraint.
> * config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): New
> define_insn.
>
> 2017-06-06 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/load_v2vec_lanes_1.c: New test.
OK for the simplify-rtx bits.
jeff