[PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

Richard Sandiford richard.sandiford@arm.com
Mon Jul 6 09:31:15 GMT 2020


xiezhiheng <xiezhiheng@huawei.com> writes:
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: Thursday, July 2, 2020 10:46 PM
>> To: xiezhiheng <xiezhiheng@huawei.com>
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> On Thu, Jul 2, 2020 at 3:22 PM xiezhiheng <xiezhiheng@huawei.com> wrote:
>> >
>> > Hi,
>> >
>> > This is a fix for pr94442.
>> > I modify get_inner_reference to handle the case for MEM[ptr, off].
>> > I extract the "off" and add it to the recorded offset, then I build a
>> > MEM[ptr, 0] and return it later.
>> >
>> > diff --git a/gcc/expr.c b/gcc/expr.c
>> > index 3c68b0d754c..8cc18449a0c 100644
>> > --- a/gcc/expr.c
>> > +++ b/gcc/expr.c
>> > @@ -7362,7 +7362,8 @@ tree
>> >  get_inner_reference (tree exp, poly_int64_pod *pbitsize,
>> >                      poly_int64_pod *pbitpos, tree *poffset,
>> >                      machine_mode *pmode, int *punsignedp,
>> > -                    int *preversep, int *pvolatilep)
>> > +                    int *preversep, int *pvolatilep,
>> > +                    bool include_memref_p)
>> >  {
>> >    tree size_tree = 0;
>> >    machine_mode mode = VOIDmode;
>> > @@ -7509,6 +7510,21 @@ get_inner_reference (tree exp, poly_int64_pod
>> *pbitsize,
>> >                 }
>> >               exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
>> >             }
>> > +         else if (include_memref_p
>> > +                  && TREE_CODE (TREE_OPERAND (exp, 0)) ==
>> SSA_NAME)
>> > +           {
>> > +             tree off = TREE_OPERAND (exp, 1);
>> > +             if (!integer_zerop (off))
>> > +               {
>> > +                 poly_offset_int boff = mem_ref_offset (exp);
>> > +                 boff <<= LOG2_BITS_PER_UNIT;
>> > +                 bit_offset += boff;
>> > +
>> > +                 exp = build2 (MEM_REF, TREE_TYPE (exp),
>> > +                               TREE_OPERAND (exp, 0),
>> > +                               build_int_cst (TREE_TYPE (off), 0));
>> > +               }
>> > +           }
>> >           goto done;
>> >
>> >         default:
>> > @@ -10786,7 +10802,7 @@ expand_expr_real_1 (tree exp, rtx target,
>> machine_mode tmode,
>> >         int reversep, volatilep = 0, must_force_mem;
>> >         tree tem
>> >           = get_inner_reference (exp, &bitsize, &bitpos, &offset,
>> &mode1,
>> > -                                &unsignedp, &reversep, &volatilep);
>> > +                                &unsignedp, &reversep, &volatilep,
>> true);
>> >         rtx orig_op0, memloc;
>> >         bool clear_mem_expr = false;
>> >
>> > diff --git a/gcc/tree.h b/gcc/tree.h
>> > index a74872f5f3e..7df0d15f7f9 100644
>> > --- a/gcc/tree.h
>> > +++ b/gcc/tree.h
>> > @@ -6139,7 +6139,8 @@ extern bool complete_ctor_at_level_p
>> (const_tree, HOST_WIDE_INT, const_tree);
>> >     look for the ultimate containing object, which is returned and specify
>> >     the access position and size.  */
>> >  extern tree get_inner_reference (tree, poly_int64_pod *, poly_int64_pod
>> *,
>> > -                                tree *, machine_mode *, int *, int *,
>> int *);
>> > +                                tree *, machine_mode *, int *, int *,
>> int *,
>> > +                                bool = false);
>> >
>> >  extern tree build_personality_function (const char *);
>> >
>> >
>> > I add an argument "include_memref_p" to control whether to go into
>> MEM_REF,
>> > because without it will cause the test case "Warray-bounds-46.c" to fail in
>> regression.
>> >
>> > It because function set_base_and_offset in gimple-ssa-warn-restrict.c
>> >   base = get_inner_reference (expr, &bitsize, &bitpos, &var_off,
>> >                               &mode, &sign, &reverse, &vol);
>> >   ...
>> >   ...
>> >   if (TREE_CODE (base) == MEM_REF)
>> >     {
>> >       tree memrefoff = fold_convert (ptrdiff_type_node, TREE_OPERAND
>> (base, 1));
>> >       extend_offset_range (memrefoff);
>> >       base = TREE_OPERAND (base, 0);
>> >
>> >       if (refoff != HOST_WIDE_INT_MIN
>> >           && TREE_CODE (expr) == COMPONENT_REF)
>> >         {
>> >           /* Bump up the offset of the referenced subobject to reflect
>> >              the offset to the enclosing object.  For example, so that
>> >              in
>> >                struct S { char a, b[3]; } s[2];
>> >                strcpy (s[1].b, "1234");
>> >              REFOFF is set to s[1].b - (char*)s.  */
>> >           offset_int off = tree_to_shwi (memrefoff);
>> >           refoff += off;
>> >         }
>> >
>> >       if (!integer_zerop (memrefoff))       <=================
>> >         /* A non-zero offset into an array of struct with flexible array
>> >            members implies that the array is empty because there is no
>> >            way to initialize such a member when it belongs to an array.
>> >            This must be some sort of a bug.  */
>> >         refsize = 0;
>> >     }
>> >
>> > needs MEM_REF offset to judge whether refsize should be set to zero.
>> > But I fold the offset into bitpos and the offset will always be zero.
>> >
>> > Suggestion?
>> 
>> The thing you want to fix is not get_inner_reference but the aarch64 backend
>> to not make __builtin_aarch64_sqaddv16qi clobber global memory.  That
>> way
>> CSE can happen on GIMPLE which can handle the difference in the IL just
>> fine.
>> 
>> Richard.
>
> Yes, __builtin_aarch64_sqaddv16qi is not set any attributes to describe that
> it would not clobber global memory.  But I find it strange that when building
> SIMD built-in FUNCTION_DECLs they are not set any attributes in the backend.
>
> void
> aarch64_init_simd_builtins (void)
> {
> ...
>       ftype = build_function_type (return_type, args);
>
>       gcc_assert (ftype != NULL);
>
>       if (print_type_signature_p)
>         snprintf (namebuf, sizeof (namebuf), "__builtin_aarch64_%s_%s",
>                   d->name, type_signature);
>       else
>         snprintf (namebuf, sizeof (namebuf), "__builtin_aarch64_%s",
>                   d->name);
>
>       fndecl = aarch64_general_add_builtin (namebuf, ftype, fcode);
>       aarch64_builtin_decls[fcode] = fndecl;
> ...
> }
> static tree
> aarch64_general_add_builtin (const char *name, tree type, unsigned int code)
> {
>   code = (code << AARCH64_BUILTIN_SHIFT) | AARCH64_BUILTIN_GENERAL;
>   return add_builtin_function (name, type, code, BUILT_IN_MD,
>                                NULL, NULL_TREE);
> }
>
> The loop in aarch64_init_simd_builtins creates FUNCTION_DECL node for each
> build-in function and put the node in array.  But it does not set any attributes.
> And I did not find interface for each build-in function to control the attributes.
>
> Did I miss anything?

No, this is unfortunately a known bug.  See:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964

(Although the PR is recent, it's been a known bug for longer.)

As you say, the difficulty is that the correct attributes depend on what
the built-in function does.  Most integer arithmetic is “const”, but things
get more complicated for floating-point arithmetic.

The SVE intrinsics use a three stage process:

- each function is classified into one of several groups
- each group has a set of flags that describe what functions in the
  group can do
- these flags get converted into attributes based on the current
  command-line options

I guess we should have something similar for the arm_neon.h built-ins.

If you're willing to help fix this, that'd be great.  I think a first
step would be to agree a design.

Thanks,
Richard


More information about the Gcc-patches mailing list