This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH][GCC][mid-end] Allow larger copies when not slow_unaligned_access and no padding.


On Tue, 31 Jul 2018, Tamar Christina wrote:

> Hi Richard,
> 
> The 07/31/2018 11:21, Richard Biener wrote:
> > On Tue, 31 Jul 2018, Tamar Christina wrote:
> > 
> > > Ping 😊
> > > 
> > > > -----Original Message-----
> > > > From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org>
> > > > On Behalf Of Tamar Christina
> > > > Sent: Tuesday, July 24, 2018 17:34
> > > > To: Richard Biener <rguenther@suse.de>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; law@redhat.com;
> > > > ian@airs.com; amodra@gmail.com; bergner@vnet.ibm.com
> > > > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when not
> > > > slow_unaligned_access and no padding.
> > > > 
> > > > Hi Richard,
> > > > 
> > > > Thanks for the review!
> > > > 
> > > > The 07/23/2018 18:46, Richard Biener wrote:
> > > > > On July 23, 2018 7:01:23 PM GMT+02:00, Tamar Christina
> > > > <tamar.christina@arm.com> wrote:
> > > > > >Hi All,
> > > > > >
> > > > > >This allows copy_blkmode_to_reg to perform larger copies when it is
> > > > > >safe to do so, by recalculating the bitsize per iteration: each step
> > > > > >does the largest copy allowed that does not read more than the number
> > > > > >of bits left to copy.
> > > > > >
> > > > > >Strictly speaking, this copying is only done if:
> > > > > >
> > > > > >  1. the target supports fast unaligned access
> > > > > >  2. no padding is being used.
> > > > > >
> > > > > >This should avoid the issues of the first patch (PR85123) but still
> > > > > >work for targets where it is safe to do so.
> > > > > >
> > > > > >Original patch
> > > > > >https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01088.html
> > > > > >Previous respin
> > > > > >https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00239.html
> > > > > >
> > > > > >
> > > > > >For the copy of a 3-byte structure this produces:
> > > > > >
> > > > > >fun3:
> > > > > >	adrp	x1, .LANCHOR0
> > > > > >	add	x1, x1, :lo12:.LANCHOR0
> > > > > >	mov	x0, 0
> > > > > >	sub	sp, sp, #16
> > > > > >	ldrh	w2, [x1, 16]
> > > > > >	ldrb	w1, [x1, 18]
> > > > > >	add	sp, sp, 16
> > > > > >	bfi	x0, x2, 0, 16
> > > > > >	bfi	x0, x1, 16, 8
> > > > > >	ret
> > > > > >
> > > > > >whereas before it was producing
> > > > > >
> > > > > >fun3:
> > > > > >	adrp	x0, .LANCHOR0
> > > > > >	add	x2, x0, :lo12:.LANCHOR0
> > > > > >	sub	sp, sp, #16
> > > > > >	ldrh	w1, [x0, #:lo12:.LANCHOR0]
> > > > > >	ldrb	w0, [x2, 2]
> > > > > >	strh	w1, [sp, 8]
> > > > > >	strb	w0, [sp, 10]
> > > > > >	ldr	w0, [sp, 8]
> > > > > >	add	sp, sp, 16
> > > > > >	ret
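The source behind these two listings is not quoted in the thread; a minimal reproducer consistent with the fun3 output, using hypothetical names, could look like this:

    /* Hypothetical testcase: a 3-byte struct with no padding, returned by
       value so that copy_blkmode_to_reg must move it into the return
       register.  */
    struct S3 { unsigned char a, b, c; };
    struct S3 foo3;

    struct S3
    fun3 (void)
    {
      return foo3;
    }

With the patch the two loads (ldrh/ldrb) feed bfi inserts into x0 directly; before, the value took a detour through the stack slot at [sp, 8].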
> > > > > >
> > > > > >Cross compiled and regtested on
> > > > > >  aarch64_be-none-elf
> > > > > >  armeb-none-eabi
> > > > > >and found no issues.
> > > > > >
> > > > > >Bootstrapped and regtested on
> > > > > > aarch64-none-linux-gnu
> > > > > > x86_64-pc-linux-gnu
> > > > > > powerpc64-unknown-linux-gnu
> > > > > > arm-none-linux-gnueabihf
> > > > > >
> > > > > >and found no issues.
> > > > > >
> > > > > >OK for trunk?
> > > > >
> > > > > How does this affect store-to-load forwarding when the source is initialized
> > > > > piecewise? IMHO we should avoid larger loads but generate larger stores
> > > > > when possible.
> > > > >
> > > > > How do non-x86 architectures behave with respect to STLF?
> > > > >
> > > > 
> > > > I should have made it more explicit in my cover letter, but this only covers
> > > > reg-to-reg copies.
> > > > So store-to-load forwarding shouldn't really come into play here, unless I'm
> > > > missing something.
> > > > 
> > > > The example in my patch shows that the loads from mem are mostly
> > > > unaffected.
> > > > 
> > > > For x86 the change is also quite significant, e.g. for a 5-byte struct load
> > > > it used to generate
> > > > 
> > > > fun5:
> > > > 	movl	foo5(%rip), %eax
> > > > 	movl	%eax, %edi
> > > > 	movzbl	%al, %edx
> > > > 	movzbl	%ah, %eax
> > > > 	movb	%al, %dh
> > > > 	movzbl	foo5+2(%rip), %eax
> > > > 	shrl	$24, %edi
> > > > 	salq	$16, %rax
> > > > 	movq	%rax, %rsi
> > > > 	movzbl	%dil, %eax
> > > > 	salq	$24, %rax
> > > > 	movq	%rax, %rcx
> > > > 	movq	%rdx, %rax
> > > > 	movzbl	foo5+4(%rip), %edx
> > > > 	orq	%rsi, %rax
> > > > 	salq	$32, %rdx
> > > > 	orq	%rcx, %rax
> > > > 	orq	%rdx, %rax
> > > > 	ret
> > > > 
> > > > instead of
> > > > 
> > > > fun5:
> > > >         movzbl  foo5+4(%rip), %eax
> > > >         salq    $32, %rax
> > > >         movq    %rax, %rdx
> > > >         movl    foo5(%rip), %eax
> > > >         orq     %rdx, %rax
> > > >         ret
> > > > 
> > > > so the loads themselves are unaffected.
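The 5-byte case would correspond to something along these lines (again hypothetical names, mirroring the foo5/fun5 symbols in the assembly):

    /* Hypothetical testcase: 5 bytes = 40 bits.  With the patch the copy
       loop can use SImode (32 bits) for the first step and QImode (8 bits)
       for the remainder, which matches the movl + movzbl pair in the
       improved x86 output above; the old, more conservative step size is
       what leads to the byte-oriented shuffling in the earlier sequence.  */
    struct S5 { unsigned char b[5]; };
    struct S5 foo5;

    struct S5
    fun5 (void)
    {
      return foo5;
    }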
> > 
> > I see.  Few things:
> > 
> >    dst_words = XALLOCAVEC (rtx, n_regs);
> > +
> > +  slow_unaligned_access
> > +    = targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src)));
> > +
> >    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > 
> > please avoid the extra vertical space.
> > 
> > 
> > +
> > +      /* Find the largest integer mode that can be used to copy all or as
> > +        many bits as possible of the struct
> > 
> > likewise.
> 
> Done.
> 
> > 
> > +      FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> > +       if (padding_correction == 0
> > +           && !slow_unaligned_access
> > 
> > These conditions are invariant so please hoist them out of the
> > loop.
> 
> Done.
> 
> > 
> > +           && GET_MODE_BITSIZE (mode_iter.require ())
> > +                           <= ((bytes * BITS_PER_UNIT) - bitpos)
> > +           && GET_MODE_BITSIZE (mode_iter.require ()) <= BITS_PER_WORD)
> > +         bitsize = GET_MODE_BITSIZE (mode_iter.require ());
> > +       else
> > +         break;
> > 
> > The slow_unaligned_access target hook is passed a mode and an
> > alignment but you are assuming that a narrower mode behaves
> > the same with possibly different alignment.
> >
> 
> Ah, good point. Changing it to STRICT_ALIGNMENT would indeed be
> a better check. Done.
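For reference, the two checks being compared, sketched from the hunks quoted in this thread (not necessarily the exact committed form):

    /* Original approach: query the per-mode hook once, for word_mode only,
       and implicitly assume every narrower mode is just as cheap.  */
    slow_unaligned_access
      = targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src)));

    /* Agreed revision: only widen the per-iteration copies on targets that
       do not require strict alignment at all, so no per-mode assumption
       is made.  */
    if (padding_correction == 0 && !STRICT_ALIGNMENT)
      {
        /* ... choose a wider bitsize here ...  */
      }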
>  
> > I'm also not sure this loop will never compute sth that is
> > smaller than the original conservative bitsize based on
> > TYPE_ALIGN of src, so maybe you could instead use sth like
> > 
> >   min_mode = get_mode_for_size (bitsize);
> > ...
> > 
> >   if (padding_correction == 0
> >       && !STRICT_ALIGNMENT)
> >     FOR_EACH_MODE_FROM (mode_iter, min_mode)
> >       {
> >         unsigned msize = GET_MODE_BITSIZE (mode_iter.require ());
> >         if (msize <= ((bytes * BITS_PER_UNIT) - bitpos)
> >             && msize <= BITS_PER_WORD)
> >           bitsize = msize;
> >         else
> >           break;
> >       }
> > ...
> > 
> > instead?
> 
> I've rewritten the loop as suggested.  It now doesn't pick a smaller mode
> than the initial bitsize.
> 
> New patch attached, no change to changelog.

+  min_mode = int_mode_for_mode (smallest_mode_for_size (bitsize, MODE_INT));

I think you can use smallest_int_mode_for_size (bitsize); given that
bitsize is less than or equal to BITS_PER_WORD, this mode should
exist (the result isn't an opt_mode but a scalar_int_mode).

OK with that change.

Richard.

> OK for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-31  Tamar Christina  <tamar.christina@arm.com>
> 
> 	* expr.c (copy_blkmode_to_reg): Perform larger copies when safe.
> 
> > 
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
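
Folding the suggested loop and the smallest_int_mode_for_size change together, the mode-selection part of copy_blkmode_to_reg would read roughly as below. This is a sketch assembled from the fragments quoted in the thread, not necessarily the hunk that was finally committed; bitsize, bytes, bitpos and padding_correction are the existing locals of the function, and the block sits inside the copy loop so the step can shrink as fewer bits remain.

    /* Widen bitsize to the largest integer mode that still fits in the
       remaining bits and in a word, starting from the conservative
       TYPE_ALIGN-based value.  */
    scalar_int_mode min_mode = smallest_int_mode_for_size (bitsize);
    opt_scalar_int_mode mode_iter;

    if (padding_correction == 0 && !STRICT_ALIGNMENT)
      FOR_EACH_MODE_FROM (mode_iter, min_mode)
        {
          unsigned msize = GET_MODE_BITSIZE (mode_iter.require ());
          if (msize <= ((bytes * BITS_PER_UNIT) - bitpos)
              && msize <= BITS_PER_WORD)
            bitsize = msize;
          else
            break;
        }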
