[PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

Richard Sandiford richard.sandiford@arm.com
Tue Aug 25 11:07:38 GMT 2020

xiezhiheng <xiezhiheng@huawei.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford [mailto:richard.sandiford@arm.com]
>> Sent: Friday, August 21, 2020 5:02 PM
>> To: xiezhiheng <xiezhiheng@huawei.com>
>> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
> Cut...
>> Looks like the saturating intrinsics might need a bit more thought.
>> Would you mind submitting the patch with just the other parts?
>> Those were uncontroversial and it would be a shame to hold them
>> up over this.
> Okay, I reorganized the existing patch and finished the first half of the intrinsics
> except saturating intrinsics and load intrinsics.
> Bootstrapped and tested on aarch64 Linux platform.

I know this'll be frustrating, sorry, but could you post the
2020-08-17 patch without the saturation changes?  It's going to be
easier to track and review if each patch deals with similar intrinsics.
The non-saturating part of the 2020-08-17 patch was good because it was
dealing purely with arithmetic operations.  Loads should really be a
separate change.

BTW, for something like this, it's OK to test and submit several patches
at once, so separating the patches doesn't need to mean longer test cycles.
It's just that for review purposes, it's easier if one patch does one thing.

> For load intrinsics, I have one problem when I set FLAG_READ_MEMORY for them,
> some test cases like
> gcc.target/aarch64/advsimd-intrinsics/vld2_lane_p8_indices_1.c
>   #include <arm_neon.h>
>   /* { dg-do compile } */
>   /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
>   poly8x8x2_t
>   f_vld2_lane_p8 (poly8_t * p, poly8x8x2_t v)
>   {
>     poly8x8x2_t res;
>     /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
>     res = vld2_lane_p8 (p, v, 8);
>     /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */
>     res = vld2_lane_p8 (p, v, -1);
>     return res;
>   }
> would fail in regression.  Because the first statement
>   res = vld2_lane_p8 (p, v, 8);
> would be eliminated as dead code in gimple phase but the error message is
> generated in expand pass.  So I am going to replace the second statement
>   res = vld2_lane_p8 (p, v, -1);
> with
>   res = vld2_lane_p8 (p, res, -1);
> or do you have any other suggestions?

The test is valid as-is, so it would be better not to change it.

I guess this means that we should leave the _lane loads and stores until
we implement the range checks in a different way.  This is somewhat
related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95969 ,
although your example shows that the “dummy const function” approach
might not work.

So to start with, could you just patch the non-lane loads?

> And for test case gcc.target/aarch64/arg-type-diagnostics-1.c, I return the result
> to prevent the statement
>   result = vrsra_n_s32 (arg1, arg2, a);
> from being eliminated by treated as dead code.

Hmm.  Here too I think the test is valid as-is.  I think we need
to ensure that the range check still happens even if the call is
dead code (similar to PR95969 above).

So I guess here too, it might be better to leave the _n forms to
a separate patch.

That doesn't mean we shouldn't fix the _lane and _n cases (or the
previous saturating cases).  It's just that each time we find a group
of functions that's awkward for some reason, it'd be better to deal
with those functions separately.


More information about the Gcc-patches mailing list