[PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

xiezhiheng xiezhiheng@huawei.com
Thu Aug 20 08:24:33 GMT 2020

> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandiford@arm.com]
> Sent: Wednesday, August 19, 2020 6:06 PM
> To: xiezhiheng <xiezhiheng@huawei.com>
> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> xiezhiheng <xiezhiheng@huawei.com> writes:
> > I add FLAGS for part of intrinsics in aarch64-simd-builtins.def first for a try,
> > including all the add/sub arithmetic intrinsics.
> >
> > Something like faddp intrinsic which only handles floating-point operations,
> > both FP and NONE flags are suitable for it because FLAG_FP will be added
> > later if the intrinsic handles floating-point operations.  And I prefer FP
> > since it would be more clear.
> Sounds good to me.
> > But for qadd intrinsics, they would modify FPSR register which is a scenario
> > I missed before.  And I consider to add an additional flag
> > to represent it.
> I don't think we make any attempt to guarantee that the Q flag is
> meaningful after saturating intrinsics.  To do that, we'd need to model
> the modification of the flag in the .md patterns too.
> So my preference would be to leave this out and just use NONE for the
> saturating forms too.

The problem is that the test case in the attachment has different results under -O0 and -O2.

In the GIMPLE phase, the statement
  _9 = __builtin_aarch64_uqaddv2si_uuu (op0_4, op1_6);
would be treated as dead code if we set the NONE flag for saturating intrinsics,
because its result is unused and the call would have no recorded side effect.
Adding FLAG_WRITE_FPSR would help fix this problem.
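
To illustrate why removing such a call is observable, here is a minimal portable
sketch.  It is not the attached test.c and does not use the real intrinsics:
qc_flag, model_uqadd_u32 and saturation_seen are hypothetical names, and
qc_flag only stands in for the sticky saturation (QC) bit in FPSR.  The point
is just that the add's result can be unused while its effect on the flag is
still read afterwards.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sticky saturation (QC) bit in FPSR. */
static int qc_flag = 0;

/* Model of an unsigned saturating add: on overflow it clamps the
   result and sets the sticky flag as a side effect. */
static uint32_t model_uqadd_u32(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;      /* unsigned arithmetic wraps modulo 2^32 */
    if (sum < a) {             /* wrap-around means the add overflowed */
        qc_flag = 1;
        return UINT32_MAX;
    }
    return sum;
}

/* The value of the add is discarded; only the flag is observed.
   If the call were deleted as dead code, this would always return 0. */
static int saturation_seen(uint32_t a, uint32_t b)
{
    qc_flag = 0;
    (void)model_uqadd_u32(a, b);
    return qc_flag;
}
```

In this model, saturation_seen(UINT32_MAX, 1) depends entirely on the side
effect of a call whose return value is ignored, which is the situation the
NONE flag fails to describe.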

Even when we set FLAG_WRITE_FPSR, the uqadd insn:
  (insn 11 10 12 2 (set (reg:V2SI 97)
        (us_plus:V2SI (reg:V2SI 98)
            (reg:V2SI 99))) {aarch64_uqaddv2si}
could still be eliminated in the RTL phase, because it will be treated as a dead insn.
So I think we might also need to modify the saturating instruction patterns to add the side effect of setting the FPSR register.

So if we use the NONE flag for saturating intrinsics, the descriptions of both the function attributes and the patterns are incorrect.
I can propose another patch to fix the patterns, if you agree.

Xie Zhiheng
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.c
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20200820/62be9009/attachment.c>
