[PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Fri Aug 2 13:11:00 GMT 2019

On Tue, 30 Jul 2019, Bernd Edlinger wrote:

> Hi Richard,
> 
> it is already a while ago, but I had not found time to continue
> with this patch until now.
> 
> I think I have now a better solution, which properly addresses your
> comments below.
> 
> On 3/25/19 9:41 AM, Richard Biener wrote:
> > On Fri, 22 Mar 2019, Bernd Edlinger wrote:
> > 
> >> On 3/21/19 12:15 PM, Richard Biener wrote:
> >>> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
> >>> Finally...
> >>>
> >>> Index: gcc/function.c
> >>> ===================================================================
> >>> --- gcc/function.c      (revision 269264)
> >>> +++ gcc/function.c      (working copy)
> >>> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
> >>>    if (DECL_MODE (decl) == BLKmode)
> >>>      return false;
> >>>
> >>> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
> >>> +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
> >>> +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
> >>> +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
> >>> +    return false;
> >>> +
> >>>    /* If -ffloat-store specified, don't put explicit float variables
> >>>       into registers.  */
> >>>    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
> >>>
> >>> I wonder if it is necessary to look at DECL_INCOMING_RTL here
> >>> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
> >>> doesn't exist then shouldn't we return false for safety reasons?
> >>>
> 
> You are right, it is not possbile to return different results from
> use_register_for_decl before vs. after incoming RTL is assigned.
> That hits an assertion in set_rtl.
> 
> This hunk is gone now, instead I changed assign_parm_setup_reg
> to use movmisalign optab and/or extract_bit_field if misaligned
> entry_parm is to be assigned in a register.
> 
> I have no test coverage for the movmisalign optab though, so I
> rely on your code review for that part.

It looks OK.  I tried to make it trigger on the following on
i?86 with -msse2:

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

v4si foo (struct S s)
{
  return s.v;
}

but nowadays x86 seems to be happy with regular moves operating on
unaligned memory, using unaligned moves where necessary.

(insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
        (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229 
{movv4si_internal}
     (nil))

and with GCC 4.8 we ended up with the following expansion which is
also correct.

(insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
        (unspec:V16QI [
                (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
            ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
     (nil))

So it seems it has been too long and I don't remember what is
special with arm that it doesn't work...  it possibly simply
trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
I think is OK-ish?

> >>> Similarly the very same issue should exist on x86_64 which is
> >>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> >>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
> >>> a wrong one.
> >>>
> >>
> >> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
> >> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
> >> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
> >> would look at MEM_ALIGN and do the right thing, yes?
> > 
> > No, they need to use the movmisalign optab and end up with UNSPECs
> > for example.
> Ah, thanks, now I see.
> 
> >> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
> >> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
> >> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
> >> right?
> > 
> > Yes, I think we never losened that.  Note that RTL expansion has to
> > fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> > specifies that x86 is strict-align wrt vector modes.
> > 
> 
> Yes I agree, the code would be incorrect for x86 as well when the movmisalign_optab
> is not used.  So I invoke the movmisalign optab if available and if not fall
> back to extract_bit_field.  As in the assign_parm_setup_stack assign_parm_setup_reg
> assumes that data->promoted_mode != data->nominal_mode does not happen with
> misaligned stack slots.
> 
> 
> Attached is the v3 if my patch.
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> 
> Is it OK for trunk?

Few comments.

@@ -2274,8 +2274,6 @@ struct assign_parm_data_one
   int partial;
   BOOL_BITFIELD named_arg : 1;
   BOOL_BITFIELD passed_pointer : 1;
-  BOOL_BITFIELD on_stack : 1;
-  BOOL_BITFIELD loaded_in_reg : 1;
 };

 /* A subroutine of assign_parms.  Initialize ALL.  */

independently OK.

@@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      ultimate type, don't use that slot after entry.  We'll make another
      stack slot, if we need one.  */
   if (stack_parm
-      && ((STRICT_ALIGNMENT
-          && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN 
(stack_parm))
+      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN 
(stack_parm)
+          && targetm.slow_unaligned_access (data->nominal_mode,
+                                            MEM_ALIGN (stack_parm)))
          || (data->nominal_type
              && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
              && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))

looks like something we should have as a separate commit as well.  It
also looks obvious to me.

@@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all

       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)

we arrive here by-passing

  else if (need_conversion)
    {
      /* We did not have an insn to convert directly, or the sequence
         generated appeared unsafe.  We must first copy the parm to a
         pseudo reg, and save the conversion until after all
         parameters have been moved.  */

      int save_tree_used;
      rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));

      emit_move_insn (tempreg, validated_mem);

but this move instruction is invalid in the same way as the case
you fix, no?  So wouldn't it be better to do

  if (moved)
    /* Nothing to do.  */
    ;
  else
    {
       if (unaligned)
         ...
       else
         emit_move_insn (...);

       if (need_conversion)
 ....
    }

?  Hopefully whatever "moved" things in the if (moved) case did
it correctly.

Can you check whehter your patch does anything to the x86 testcase
posted above?

I'm not very familiar with this code so I'm leaving actual approval
to somebody else.  Still hope the comments were helpful.

Thanks,
Richard.