[PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

Richard Sandiford richard.sandiford@arm.com
Thu Jul 16 12:41:39 GMT 2020


xiezhiheng <xiezhiheng@huawei.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford [mailto:richard.sandiford@arm.com]
>> Sent: Tuesday, July 7, 2020 10:08 PM
>> To: xiezhiheng <xiezhiheng@huawei.com>
>> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng <xiezhiheng@huawei.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford [mailto:richard.sandiford@arm.com]
>> >> Sent: Monday, July 6, 2020 5:31 PM
>> >> To: xiezhiheng <xiezhiheng@huawei.com>
>> >> Cc: Richard Biener <richard.guenther@gmail.com>;
>> gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> >> emitted at -O3
>> >>
>> >> No, this is unfortunately a known bug.  See:
>> >>
>> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964
>> >>
>> >> (Although the PR is recent, it's been a known bug for longer.)
>> >>
>> >> As you say, the difficulty is that the correct attributes depend on what
>> >> the built-in function does.  Most integer arithmetic is “const”, but
>> things
>> >> get more complicated for floating-point arithmetic.
>> >>
>> >> The SVE intrinsics use a three stage process:
>> >>
>> >> - each function is classified into one of several groups
>> >> - each group has a set of flags that describe what functions in the
>> >>   group can do
>> >> - these flags get converted into attributes based on the current
>> >>   command-line options
>> >>
>> >> I guess we should have something similar for the arm_neon.h built-ins.
>> >>
>> >> If you're willing to help fix this, that'd be great.  I think a first
>> >> step would be to agree a design.
>> >>
>> >> Thanks,
>> >> Richard
>> >
>> > I'd like to have a try.
>> 
>> Great!
>> 
>> > I have checked the steps in SVE intrinsics.
>> > It defines a base class "function_base" and derives different classes
>> > to describe several intrinsics for each.  And each class may
>> > have its own unique flags described in virtual function "call_properties".
>> > The specific attributes will be converted from these flags in
>> > "get_attributes" later.
>> >
>> > I find that there are more than 100 classes in total and if I only
>> > need to classify them into different groups by attributes, maybe
>> > we does not need so many classes?
>> 
>> Yeah, I agree.
>> 
>> Long term, there might be value in defining arm_neon.h in a similar
>> way to arm_sve.h: i.e. have arm_neon.h defer most of the work to
>> a special compiler pragma.  But that's going to be a lot of work.
>> 
>> I think it's possible to make incremental improvements to the current
>> arm_neon.h implementation without that work being thrown away if we
>> ever
>> did switch to a pragma in future.  And the incremental approach seems
>> more practical.
>> 
>> > The difficult thing I think is how to classify neon intrinsics into
>> > different groups.  I'm going to follow up the way in SVE intrinsics
>> > first now.
>> 
>> For now I'd suggest just giving a name to each combination of flags
>> that the intrinsics need, rather than splitting instructions in a
>> more fine-grained way.  (It's not at all obvious from the final state
>> of the SVE code, but even there, the idea was to have as few groups as
>> possible.  I.e. the groups were supposedly only split where necessary.
>> As you say, there still ended up being a lot of groups in the end…)
>> 
>> It'd be easier to review if the work was split up into smaller steps.
>> E.g. maybe one way would be this, with each number being a single
>> patch:
>> 
>> (1) (a) Add a flags field to the built-in function definitions
>>         that for now is always zero.
>>     (b) Pick a name N to describe the most conservative set of flags.
>>     (c) Make every built-in function definition use N.
>> 
>
> I have finished the first part.
>
> (a) I add a new parameter called FLAG to every built-in function macro.
>
> (b) I define some flags in aarch64-builtins.c
> FLAG_NONE for no needed flags
> FLAG_READ_FPCR for functions will read FPCR register
> FLAG_RAISE_FP_EXCEPTIONS for functions will raise fp exceptions
> FLAG_READ_MEMORY for functions will read global memory
> FLAG_PREFETCH_MEMORY for functions will prefetch data to memory
> FLAG_WRITE_MEMORY for functions will write global memory
>
> FLAG_FP is used for floating-point arithmetic
> FLAG_ALL is all flags above
>
> (c) I add a field in struct aarch64_simd_builtin_datum to record flags
> for each built-in function.  But the default flags I set for built-in functions
> are FLAG_ALL because by default the built-in functions might do anything.
>
> And bootstrap and regression are tested ok on aarch64 Linux platform.

This looks great.

The patch is OK for trunk, but could you send a changelog too,
so that I can include it in the commit message?

Thanks,
Richard


More information about the Gcc-patches mailing list