This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)


On Thu, 8 Aug 2019, Bernd Edlinger wrote:

> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
> > On 8/2/19 3:11 PM, Richard Biener wrote:
> >> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> >>
> >>>
> >>> I have no test coverage for the movmisalign optab though, so I
> >>> rely on your code review for that part.
> >>
> >> It looks OK.  I tried to make it trigger on the following on
> >> i?86 with -msse2:
> >>
> >> typedef int v4si __attribute__((vector_size (16)));
> >>
> >> struct S { v4si v; } __attribute__((packed));
> >>
> >> v4si foo (struct S s)
> >> {
> >>   return s.v;
> >> }
> >>
> > 
> > Hmm, the entry_parm need to be a MEM_P and an unaligned one.
> > So the test case could be made to trigger it this way:
> > 
> > typedef int v4si __attribute__((vector_size (16)));
> > 
> > struct S { v4si v; } __attribute__((packed));
> > 
> > int t;
> > v4si foo (struct S a, struct S b, struct S c, struct S d,
> >           struct S e, struct S f, struct S g, struct S h,
> >           int i, int j, int k, int l, int m, int n,
> >           int o, struct S s)
> > {
> >   t = o;
> >   return s.v;
> > }
> > 
> 
> Ah, I realized that there are already a couple of very similar
> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
> which also manage to execute the movmisalign code with the latest patch
> version.  So I thought that it is not necessary to add another one.
> 
> > However the code path is still not reached, since targetm.slow_ualigned_access
> > is always FALSE, which is probably a flaw in my patch.
> > 
> > So I think,
> > 
> > +  else if (MEM_P (data->entry_parm)
> > +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +             > MEM_ALIGN (data->entry_parm)
> > +          && targetm.slow_unaligned_access (promoted_nominal_mode,
> > +                                            MEM_ALIGN (data->entry_parm)))
> > 
> > should probably better be
> > 
> > +  else if (MEM_P (data->entry_parm)
> > +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +             > MEM_ALIGN (data->entry_parm)
> > +        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
> > +             != CODE_FOR_nothing)
> > +            || targetm.slow_unaligned_access (promoted_nominal_mode,
> > +                                              MEM_ALIGN (data->entry_parm))))
> > 
> > Right?
> > 
> > Then the modified test case would use the movmisalign optab.
> > However nothing changes in the end, since the i386 back-end is used to work
> > around the middle end not using movmisalign optab when it should do so.
> > 
> 
> I prefer the second form of the check, as it offers more test coverage,
> and is probably more correct than the former.
> 
> Note there are more variations of this misalign check in expr.c,
> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR:
> 
>             && mode != BLKmode
>             && align < GET_MODE_ALIGNMENT (mode))
>           {
>             if ((icode = optab_handler (movmisalign_optab, mode))
>                 != CODE_FOR_nothing)
>               [...]
>             else if (targetm.slow_unaligned_access (mode, align))
>               temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
>                                         0, TYPE_UNSIGNED (TREE_TYPE (exp)),
>                                         (modifier == EXPAND_STACK_PARM
>                                          ? NULL_RTX : target),
>                                         mode, mode, false, alt_rtl);
> 
> I wonder if they are correct this way, why shouldn't we use the movmisalign
> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESSS ?

Doesn't the code do exactly this?  Prefer movmisalign over 
extrct_bit_field?

> 
> > I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
> > patterns that the memory is properly aligned ?
> >
> 
> Wow, that was a really exciting bug-hunt with those assertions around...

:)

> >> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> >>
> >>        did_conversion = true;
> >>      }
> >> +  else if (MEM_P (data->entry_parm)
> >> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >> +             > MEM_ALIGN (data->entry_parm)
> >>
> >> we arrive here by-passing
> >>
> >>   else if (need_conversion)
> >>     {
> >>       /* We did not have an insn to convert directly, or the sequence
> >>          generated appeared unsafe.  We must first copy the parm to a
> >>          pseudo reg, and save the conversion until after all
> >>          parameters have been moved.  */
> >>
> >>       int save_tree_used;
> >>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> >>
> >>       emit_move_insn (tempreg, validated_mem);
> >>
> >> but this move instruction is invalid in the same way as the case
> >> you fix, no?  So wouldn't it be better to do
> >>
> > 
> > We could do that, but I supposed that there must be a reason why
> > assign_parm_setup_stack gets away with that same:
> > 
> >   if (data->promoted_mode != data->nominal_mode)
> >     {
> >       /* Conversion is required.  */
> >       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> > 
> >       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> > 
> > 
> > So either some back-ends are too permissive with us,
> > or there is a reason why promoted_mode != nominal_mode
> > does not happen together with unaligned entry_parm.
> > In a way that would be a rather unusual ABI.
> > 
> 
> To find out if that ever happens I added a couple of checking
> assertions in the arm mov<mode> expand patterns.
> 
> So far the assertions did (almost) always hold, so it is likely not
> necessary to fiddle with all those naive move instructions here.
>
> So my gut feeling is, leave those places alone until there is a reason
> for changing them.

Works for me - I wonder if we should add those asserts to generic
code (guarded with flag_checking) though.

> However the assertion in movsi triggered a couple times in the
> ada testsuite due to expand_builtin_init_descriptor using a
> BLKmode MEM rtx, which is only 8-bit aligned.  So, I set the
> ptr_mode alignment there explicitly.

Looks good given we emit ptr_mode moves into it.  Thus OK independently.
 
> Several struct-layout-1.dg testcase tripped over misaligned
> complex_cst constants, fixed by varasm.c (align_variable).
> This is likely a wrong code bug, because misaligned complex
> constants, are expanded to misaligned MEM_REF, but the
> expansion cannot handle misaligned constants, only packed
> structure fields.

Hmm.  So your patch overrides user-alignment here.  Woudln't it
be better to do that more conciously by

  if (! DECL_USER_ALIGN (decl)
      || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
          && targetm.slow_unaligned_access (DECL_MODE (decl), align)))

?  And why is the movmisalign optab support missing here?

IMHO whatever code later fails to properly use unaligned loads
should be fixed instead rather than ignoring user requested alignment.

Can you quote a short testcase that explains what exactly goes wrong?
The struct-layout ones are awkward to look at...

> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
> change in expr.c (expand_expr_real_1).  Certainly is it invalid
> to read memory at a function address, but it should not ICE.
> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks
> like A32, so the misaligned code execution is not taken, but it is
> set to A8 below, but then we hit an ICE if the result is used:

So the user accessed it as A32.

>         /* Don't set memory attributes if the base expression is
>            SSA_NAME that got expanded as a MEM.  In that case, we should
>            just honor its original memory attributes.  */
>         if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
>           set_mem_attributes (op0, exp, 0);

Huh, I don't understand this.  'tem' should never be SSA_NAME.
But set_mem_attributes_minus_bitpos uses get_object_alignment_1
and that has special treatment for FUNCTION_DECLs that is not
covered by

      /* When EXP is an actual memory reference then we can use
         TYPE_ALIGN of a pointer indirection to derive alignment.
         Do so only if get_pointer_alignment_1 did not reveal absolute
         alignment knowledge and if using that alignment would
         improve the situation.  */
      unsigned int talign;
      if (!addr_p && !known_alignment
          && (talign = min_align_of_type (TREE_TYPE (exp)) * 
BITS_PER_UNIT)
          && talign > align)
        align = talign;

which could be moved out of the if-cascade.

That said, setting A8 should eventually result into appropriate
unaligned expansion, so it seems odd this triggers the assert...

> 
> Finally gcc.dg/torture/pr48493.c required the change
> in assign_parm_setup_stack.  This is just not using the
> correct MEM_ALIGN attribute value, while the memory is
> actually aligned.

But doesn't

          int align = STACK_SLOT_ALIGNMENT (data->passed_type,
                                            GET_MODE (data->entry_parm),
                                            TYPE_ALIGN 
(data->passed_type));
+         if (align < (int)GET_MODE_ALIGNMENT (GET_MODE 
(data->entry_parm))
+             && targetm.slow_unaligned_access (GET_MODE 
(data->entry_parm),
+                                               align))
+           align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));

hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target?
That is, the target says, for natural alignment 64 the stack slot
alignment can only be guaranteed 32.  You can't then simply up it
but have to use unaligned accesses (or the target/middle-end needs
to do dynamic stack alignment).


>  Note that set_mem_attributes does not
> always preserve the MEM_ALIGN of the ref, since:

set_mem_attributes sets _all_ attributes from an expression or type.

>   /* Default values from pre-existing memory attributes if present.  */
>   refattrs = MEM_ATTRS (ref);
>   if (refattrs)
>     {
>       /* ??? Can this ever happen?  Calling this routine on a MEM that
>          already carries memory attributes should probably be invalid.  */
>       attrs.expr = refattrs->expr;
>       attrs.offset_known_p = refattrs->offset_known_p;
>       attrs.offset = refattrs->offset;
>       attrs.size_known_p = refattrs->size_known_p;
>       attrs.size = refattrs->size;
>       attrs.align = refattrs->align;
>     }
> 
> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT
> the MEM_ATTRS are zero, and a smaller alignment may result.

Not sure what you are saying here.  That

set_mem_align (MEM:SI A32, 32)

produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting
the A32 but eventually computing sth lower?  Yeah, that's probably
an interesting "hole" here.  I'm quite sure that if we'd do

refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)];

we run into issues exactly on strict-align targets ...

> Well with those checks in place it should now be a lot harder to generate
> invalid code on STRICT_ALIGNMENT targets, without running into an ICE.
> 
> Attached is the latest version of my arm alignment patch.
> 
> 
> Boot-strapped and reg-tested on x64_64-pc-linux-gnu and arm-linux-gnueabihf.
> Is it OK for trunk?

@@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all

       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)
+          && (((icode = optab_handler (movmisalign_optab,
+                                       promoted_nominal_mode))
+               != CODE_FOR_nothing)
+              || targetm.slow_unaligned_access (promoted_nominal_mode,
+                                                MEM_ALIGN 
(data->entry_parm))))
+    {
+      if (icode != CODE_FOR_nothing)
+       emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
+      else
+       rtl = parmreg = extract_bit_field (validated_mem,
+                       GET_MODE_BITSIZE (promoted_nominal_mode), 0,
+                       unsignedp, parmreg,
+                       promoted_nominal_mode, VOIDmode, false, NULL);
+    }
   else
     emit_move_insn (parmreg, validated_mem);

This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) /
GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm)
and promoted_nominal_mode.

Not sure if it helps on its own.

I'm nervous about the alignment one since I'm not at all familiar
with this code...

Thanks,
Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]