This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
- From: Richard Biener <rguenther at suse dot de>
- To: Bernd Edlinger <bernd dot edlinger at hotmail dot de>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Richard Earnshaw <richard dot earnshaw at arm dot com>, Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>, Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, Eric Botcazou <ebotcazou at adacore dot com>, Jeff Law <law at redhat dot com>, Jakub Jelinek <jakub at redhat dot com>
- Date: Wed, 14 Aug 2019 13:16:30 +0200 (CEST)
- Subject: Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
- References: <AM6PR07MB4037775DF79E0229DCCA425AE44F0@AM6PR07MB4037.eurprd07.prod.outlook.com> <alpine.LSU.2.20.1903211208070.4934@zhemvz.fhfr.qr> <AM6PR07MB403745E0BCCCF005A02B0CBDE4430@AM6PR07MB4037.eurprd07.prod.outlook.com> <alpine.LSU.2.20.1903250937530.4934@zhemvz.fhfr.qr> <AM6PR10MB256664D731C3CC92F2FBEDC5E4DC0@AM6PR10MB2566.EURPRD10.PROD.OUTLOOK.COM> <alpine.LSU.2.20.1908021451300.19626@zhemvz.fhfr.qr> <AM6PR10MB2566A6E51DC500187D9EC6CFE4D90@AM6PR10MB2566.EURPRD10.PROD.OUTLOOK.COM>
On Fri, 2 Aug 2019, Bernd Edlinger wrote:
> On 8/2/19 3:11 PM, Richard Biener wrote:
> > On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> >
> >>
> >> I have no test coverage for the movmisalign optab though, so I
> >> rely on your code review for that part.
> >
> > It looks OK. I tried to make it trigger on the following on
> > i?86 with -msse2:
> >
> > typedef int v4si __attribute__((vector_size (16)));
> >
> > struct S { v4si v; } __attribute__((packed));
> >
> > v4si foo (struct S s)
> > {
> >   return s.v;
> > }
> >
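A small C sketch of why s.v in these test cases can live in an under-aligned slot. It assumes GCC or Clang, since vector_size and packed are GNU extensions. The packed attribute drops the alignment of struct S to one byte, while v4si on its own wants more. The helper load_member is hypothetical. It reads the member through memcpy rather than with a direct vector load.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GNU vector extension, as in the test case.  */
typedef int v4si __attribute__((vector_size (16)));

/* packed lowers the alignment of S (and of its member) to 1 byte.  */
struct S { v4si v; } __attribute__((packed));

/* Hypothetical helper: copy s.v out through memcpy so that no
   alignment beyond that of S (1 byte) is assumed for *p.  */
static v4si
load_member (const struct S *p)
{
  v4si tmp;
  memcpy (&tmp, (const char *) p + offsetof (struct S, v), sizeof tmp);
  return tmp;
}
```

Casting buf + 1 to const struct S * in the check below is valid only because S has an alignment of 1.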
>
> Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
> So the test case could be made to trigger it this way:
>
> typedef int v4si __attribute__((vector_size (16)));
>
> struct S { v4si v; } __attribute__((packed));
>
> int t;
> v4si foo (struct S a, struct S b, struct S c, struct S d,
>           struct S e, struct S f, struct S g, struct S h,
>           int i, int j, int k, int l, int m, int n,
>           int o, struct S s)
> {
>   t = o;
>   return s.v;
> }
>
> However, the code path is still not reached, since targetm.slow_unaligned_access
> is always FALSE, which is probably a flaw in my patch.
>
> So I think,
>
> +      else if (MEM_P (data->entry_parm)
> +               && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +                  > MEM_ALIGN (data->entry_parm)
> +               && targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                                 MEM_ALIGN (data->entry_parm)))
>
> should probably better be
>
> +      else if (MEM_P (data->entry_parm)
> +               && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +                  > MEM_ALIGN (data->entry_parm)
> +               && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
> +                    != CODE_FOR_nothing)
> +                   || targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                                     MEM_ALIGN (data->entry_parm))))
>
> Right?
Ah, yes. So it's really the presence of a movmisalign optab that makes
it a must for unaligned moves, and if it is not present, then
targetm.slow_unaligned_access tells whether we need to use the bitfield
extraction/insertion code.
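Here is the corrected condition written out as a plain C decision function. This is a hypothetical model, not GCC code. The target hook and the optab lookup are reduced to boolean inputs, and alignments are in bits, as with GET_MODE_ALIGNMENT and MEM_ALIGN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes; the names are illustrative only.  */
enum parm_load
{
  LOAD_PLAIN_MOVE,        /* a normal emit_move_insn is fine */
  LOAD_MOVMISALIGN,       /* must use the movmisalign pattern */
  LOAD_EXTRACT_BIT_FIELD  /* fall back to bit-field extraction */
};

static enum parm_load
choose_parm_load (bool entry_is_mem, unsigned mode_align,
                  unsigned mem_align, bool have_movmisalign,
                  bool slow_unaligned)
{
  assert (mode_align > 0 && mem_align > 0);
  /* Registers and sufficiently aligned MEMs need nothing special.  */
  if (!entry_is_mem || mode_align <= mem_align)
    return LOAD_PLAIN_MOVE;
  /* If the pattern exists, using it is mandatory.  */
  if (have_movmisalign)
    return LOAD_MOVMISALIGN;
  /* Otherwise the target hook decides.  */
  if (slow_unaligned)
    return LOAD_EXTRACT_BIT_FIELD;
  return LOAD_PLAIN_MOVE;
}
```

The movmisalign check comes first because the thread treats that pattern as mandatory whenever it exists. Only in its absence does the slow-access hook decide.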
> Then the modified test case would use the movmisalign optab.
> However, nothing changes in the end, since the i386 back-end is used to
> working around the middle end not using the movmisalign optab when it should.
Yeah, in the past it would have failed though. I wonder if movmisalign
is still needed for x86...
> I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
> patterns, checking that the memory is properly aligned?
I suppose gen* could add asserts that there is no movmisalign_optab
that would match when expanding a mov<mode>. Perhaps it's enough
to guard the mov_optab use in emit_move_insn_1 that way? Or even
try movmisalign there...
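The guard suggested for emit_move_insn_1 amounts to an invariant check. Below is a minimal sketch in which a hypothetical struct stands in for the MEM operand and a plain assert stands in for gcc_checking_assert. Alignments are in bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a MEM operand.  */
struct mem_info
{
  unsigned mode_align;  /* like GET_MODE_ALIGNMENT (mode) */
  unsigned mem_align;   /* like MEM_ALIGN (mem) */
};

/* A plain mov<mode> is acceptable when the MEM is aligned enough,
   or when no movmisalign pattern exists that should have been used.  */
static bool
plain_move_allowed (struct mem_info m, bool have_movmisalign)
{
  return m.mem_align >= m.mode_align || !have_movmisalign;
}

/* Sketch of the guard in front of the mov_optab expansion.  */
static void
check_plain_move (struct mem_info m, bool have_movmisalign)
{
  assert (plain_move_allowed (m, have_movmisalign));
}
```

This check only catches moves that bypass an existing movmisalign pattern. A target without the pattern but with slow unaligned access would still need the separate bit-field fallback.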
>
> > but nowadays x86 seems to be happy with regular moves operating on
> > unaligned memory, using unaligned moves where necessary.
> >
> > (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
> > (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229
> > {movv4si_internal}
> > (nil))
> >
> > and with GCC 4.8 we ended up with the following expansion which is
> > also correct.
> >
> > (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
> > (unspec:V16QI [
> > (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
> > ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
> > (nil))
> >
> > So it seems it has been too long and I don't remember what is
> > special about arm that makes it not work... it possibly simply
> > trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN, which
> > I think is OK-ish?
> >
>
> Yes, that is what Richard said as well.
>
> >>>>> Similarly, the very same issue should exist on x86_64, which is
> >>>>> !STRICT_ALIGNMENT; it's just that the ABI seems to provide the appropriate
> >>>>> alignment on the caller side. So the STRICT_ALIGNMENT check is
> >>>>> a wrong one.
> >>>>>
> >>>>
> >>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
> >>>> just use MEM_ALIGN to select the right instructions. MEM_ALIGN
> >>>> is always 32-bit aligned on the DImode memory. The x86_64 vector instructions
> >>>> would look at MEM_ALIGN and do the right thing, yes?
> >>>
> >>> No, they need to use the movmisalign optab and end up with UNSPECs
> >>> for example.
> >> Ah, thanks, now I see.
> >>
> >>>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
> >>>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
> >>>> does not even have to look at MEM_ALIGN except in the movmisalign_optab,
> >>>> right?
> >>>
> >>> Yes, I think we never loosened that. Note that RTL expansion has to
> >>> fix this up for them. Note that strictly speaking SLOW_UNALIGNED_ACCESS
> >>> specifies that x86 is strict-align wrt vector modes.
> >>>
> >>
> >> Yes, I agree, the code would be incorrect for x86 as well when the movmisalign_optab
> >> is not used. So I invoke the movmisalign optab if available and, if not, fall
> >> back to extract_bit_field. As in assign_parm_setup_stack, assign_parm_setup_reg
> >> assumes that data->promoted_mode != data->nominal_mode does not happen with
> >> misaligned stack slots.
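extract_bit_field works on RTL, but the effect of the fallback on a target without movmisalign can be pictured in C. A DImode value in a slot that is only 4-byte aligned gets assembled from two word loads instead of one doubleword access, which is the ldrd/strd case in the subject line. The sketch below assumes little-endian word order, as on arm-linux-gnueabihf. load_u64_from_words is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 64-bit value from two naturally aligned 32-bit words,
   low word first (little-endian word order assumed).  Only the
   alignment of uint32_t is required of p, not that of uint64_t.  */
static uint64_t
load_u64_from_words (const uint32_t *p)
{
  assert ((uintptr_t) p % _Alignof (uint32_t) == 0);
  uint64_t lo = p[0];
  uint64_t hi = p[1];
  return lo | (hi << 32);
}
```

Each access needs only word alignment. That is why this form remains usable where the single doubleword access named in the subject line needs 8-byte alignment.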
> >>
> >>
> >> Attached is the v3 of my patch.
> >>
> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> >>
> >> Is it OK for trunk?
> >
> > A few comments.
> >
> > @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
> > int partial;
> > BOOL_BITFIELD named_arg : 1;
> > BOOL_BITFIELD passed_pointer : 1;
> > - BOOL_BITFIELD on_stack : 1;
> > - BOOL_BITFIELD loaded_in_reg : 1;
> > };
> >
> > /* A subroutine of assign_parms. Initialize ALL. */
> >
> > independently OK.
> >
> > @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
> > ultimate type, don't use that slot after entry. We'll make another
> > stack slot, if we need one. */
> > if (stack_parm
> > - && ((STRICT_ALIGNMENT
> > - && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> > + && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> > + && targetm.slow_unaligned_access (data->nominal_mode,
> > + MEM_ALIGN (stack_parm)))
> > || (data->nominal_type
> > && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
> > && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
> >
> > looks like something we should have as a separate commit as well. It
> > also looks obvious to me.
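The first clause of this condition can be modelled in C before and after the hunk. This is a hypothetical sketch. STRICT_ALIGNMENT and targetm.slow_unaligned_access are reduced to booleans, and alignments are in bits. The interesting case is a !STRICT_ALIGNMENT target that still reports a misaligned vector access as slow, and only the new form catches it.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: only strict-alignment targets dropped an under-aligned
   incoming stack slot.  */
static bool
old_clause (bool strict_alignment, unsigned mode_align, unsigned mem_align)
{
  return strict_alignment && mode_align > mem_align;
}

/* After: ask the target whether an access at this alignment is slow.  */
static bool
new_clause (unsigned mode_align, unsigned mem_align, bool slow_unaligned)
{
  return mode_align > mem_align && slow_unaligned;
}
```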
> >
>
> Okay, committed as two separate commits: r274023 & r274025
>
> > @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> >
> > did_conversion = true;
> > }
> > + else if (MEM_P (data->entry_parm)
> > + && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > + > MEM_ALIGN (data->entry_parm)
> >
> > we arrive here bypassing
> >
> >   else if (need_conversion)
> >     {
> >       /* We did not have an insn to convert directly, or the sequence
> >          generated appeared unsafe. We must first copy the parm to a
> >          pseudo reg, and save the conversion until after all
> >          parameters have been moved. */
> >
> >       int save_tree_used;
> >       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> >
> >       emit_move_insn (tempreg, validated_mem);
> >
> > but this move instruction is invalid in the same way as the case
> > you fix, no? So wouldn't it be better to do
> >
>
> We could do that, but I supposed that there must be a reason why
> assign_parm_setup_stack gets away with the same thing:
>
>   if (data->promoted_mode != data->nominal_mode)
>     {
>       /* Conversion is required. */
>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>
>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
>
>
> So either some back-ends are too permissive with us,
> or there is a reason why promoted_mode != nominal_mode
> does not happen together with unaligned entry_parm.
> In a way, that would be a rather unusual ABI.
>
> >   if (moved)
> >     /* Nothing to do. */
> >     ;
> >   else
> >     {
> >       if (unaligned)
> >         ...
> >       else
> >         emit_move_insn (...);
> >
> >       if (need_conversion)
> >         ....
> >     }
> >
> > ? Hopefully whatever "moved" things in the if (moved) case did
> > it correctly.
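The restructuring proposed here can be traced as control flow. In this hypothetical sketch, each branch records a step flag instead of emitting RTL. Whenever nothing was moved yet, the choice between an unaligned load and a plain move is made once, before any conversion. As a result, the need_conversion path never copies from the under-aligned MEM with a plain move.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical step flags standing in for emitted insns.  */
enum step
{
  STEP_MISALIGNED_LOAD = 1,  /* movmisalign or extract_bit_field */
  STEP_PLAIN_MOVE = 2,       /* emit_move_insn (...) */
  STEP_CONVERT = 4           /* conversion done after the load */
};

static unsigned
setup_reg_steps (bool moved, bool unaligned, bool need_conversion)
{
  unsigned steps = 0;
  if (moved)
    return steps;  /* Nothing to do.  */
  if (unaligned)
    steps |= STEP_MISALIGNED_LOAD;
  else
    steps |= STEP_PLAIN_MOVE;
  if (need_conversion)
    steps |= STEP_CONVERT;
  return steps;
}
```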
> >
>
> It wouldn't. It uses gen_extend_insn; would that be expected to
> work with unaligned memory?
No idea.
> > Can you check whether your patch does anything to the x86 testcase
> > posted above?
> >
>
> Thanks, it might help to have at least a test case where the pattern
> is expanded, even if it does not change anything.
>
> > I'm not very familiar with this code so I'm leaving actual approval
> > to somebody else. Still hope the comments were helpful.
> >
>
> Yes they are, thanks a lot.
Sorry for the slow response(s).
Richard.