[PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

Tue Oct 13 08:07:32 GMT 2020

xiezhiheng <xiezhiheng@huawei.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford [mailto:richard.sandiford@arm.com]
>> Sent: Thursday, August 27, 2020 4:08 PM
>> To: xiezhiheng <xiezhiheng@huawei.com>
>> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng <xiezhiheng@huawei.com> writes:
>> > I made two separate patches for these two groups for review purposes.
>> >
>> > Note: Patch for min/max intrinsics should be applied before the patch for
>> > rounding intrinsics
>> >
>> > Bootstrapped and tested on an aarch64 Linux platform.
>> 
>> Thanks, LGTM.  Pushed to master.
>> 
>> Richard
>
> I made the patch for the multiply and multiply-accumulate intrinsics.
>
> Note that the bfmmlaq intrinsic is special because the instruction ignores the FPCR and does not update the FPSR exception status.
>   https://developer.arm.com/docs/ddi0596/h/simd-and-floating-point-instructions-alphabetic-order/bfmmla-bfloat16-floating-point-matrix-multiply-accumulate-into-2x2-matrix
> So I set its flag to AUTO_FP.
>
> Bootstrapped and tested on an aarch64 Linux platform.

Thanks, LGTM.  Pushed to trunk.

Richard
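
For readers following the thread, here is a minimal sketch of why a flag such as AUTO_FP matters for an intrinsic like bfmmlaq. It is a simplified model, not the code from the patch: every name prefixed with sketch_ or SKETCH_ is hypothetical, and the rule it encodes (floating-point intrinsics are assumed to read the FPCR and raise FP exceptions unless AUTO_FP is set) is an assumption based on the description quoted above.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only; they are not the
   definitions used by GCC.  */
enum {
  SKETCH_FLAG_READ_FPCR           = 1u << 0,
  SKETCH_FLAG_RAISE_FP_EXCEPTIONS = 1u << 1,
  /* Assumed meaning: do not infer FP side effects from the mode.  */
  SKETCH_FLAG_AUTO_FP             = 1u << 2,
  SKETCH_FLAG_FP = SKETCH_FLAG_READ_FPCR | SKETCH_FLAG_RAISE_FP_EXCEPTIONS
};

/* Turn the flags declared for an intrinsic into its effective
   properties.  A floating-point intrinsic is assumed to read the FPCR
   and raise exceptions unless AUTO_FP says otherwise.  Without trapping
   math, raised exceptions are treated as not observable.  */
unsigned int
sketch_call_properties (unsigned int declared, bool is_float_mode,
                        bool trapping_math)
{
  unsigned int flags = declared;

  if (!(flags & SKETCH_FLAG_AUTO_FP) && is_float_mode)
    flags |= SKETCH_FLAG_FP;

  if (!trapping_math)
    flags &= ~(unsigned int) SKETCH_FLAG_RAISE_FP_EXCEPTIONS;

  return flags;
}

/* True if a call with these properties depends on or changes FP state,
   which is what would stop it from being treated as a pure value.  */
bool
sketch_has_fp_side_effects (unsigned int declared, bool is_float_mode,
                            bool trapping_math)
{
  unsigned int flags
    = sketch_call_properties (declared, is_float_mode, trapping_math);
  return (flags & SKETCH_FLAG_FP) != 0;
}
```

In this model an ordinary floating-point multiply declared with no flags ends up with both FP properties, while a bfmmlaq-like intrinsic declared with SKETCH_FLAG_AUTO_FP ends up with neither, matching the observation that the instruction ignores the FPCR and does not update the FPSR.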