[PATCH][GCC][mid-end] Allow larger copies when not slow_unaligned_access and no padding.

Tamar Christina <Tamar.Christina@arm.com>
Tue Jul 31 09:47:00 GMT 2018


Ping 😊

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org>
> On Behalf Of Tamar Christina
> Sent: Tuesday, July 24, 2018 17:34
> To: Richard Biener <rguenther@suse.de>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; law@redhat.com;
> ian@airs.com; amodra@gmail.com; bergner@vnet.ibm.com
> Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when not
> slow_unaligned_access and no padding.
> 
> Hi Richard,
> 
> Thanks for the review!
> 
> On 07/23/2018 18:46, Richard Biener wrote:
> > On July 23, 2018 7:01:23 PM GMT+02:00, Tamar Christina
> <tamar.christina@arm.com> wrote:
> > >Hi All,
> > >
> > >This allows copy_blkmode_to_reg to perform larger copies when it is
> > >safe to do so: the bitsize is recalculated on each iteration, so that
> > >each copy is the largest one allowed that does not read more bits
> > >than remain to be copied (a rough sketch of the idea follows the
> > >list below).
> > >
> > >Strictly speaking, this copying is only done if:
> > >
> > >  1. the target supports fast unaligned access
> > >  2. no padding is being used.
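> > >
> > >A minimal C sketch of the idea (an illustration only, not the actual
> > >expr.c code; the function name is made up):
> > >
> > >  /* Pick the widest chunk, in bits, that fits in a word and does
> > >     not read past the bits that remain to be copied.  Assumes
> > >     BITS_LEFT is a non-zero multiple of 8.  */
> > >  static unsigned int
> > >  chunk_bitsize (unsigned int bits_left, unsigned int word_bits)
> > >  {
> > >    unsigned int size = word_bits;
> > >    while (size > bits_left)
> > >      size /= 2;  /* halve until the read fits */
> > >    return size;
> > >  }
> > >
> > >For a 3-byte (24-bit) copy with 64-bit words this picks a 16-bit
> > >chunk and then an 8-bit chunk, matching the ldrh/ldrb pair in the
> > >example below.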
> > >
> > >This should avoid the issues of the first patch (PR85123) but still
> > >provide the benefit for targets where the larger copy is safe.
> > >
> > >Original patch
> > >https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01088.html
> > >Previous respin
> > >https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00239.html
> > >
> > >
> > >For the copy of a 3-byte structure this now produces:
> > >
> > >fun3:
> > >	adrp	x1, .LANCHOR0
> > >	add	x1, x1, :lo12:.LANCHOR0
> > >	mov	x0, 0
> > >	sub	sp, sp, #16
> > >	ldrh	w2, [x1, 16]
> > >	ldrb	w1, [x1, 18]
> > >	add	sp, sp, 16
> > >	bfi	x0, x2, 0, 16
> > >	bfi	x0, x1, 16, 8
> > >	ret
> > >
> > >whereas before it was producing
> > >
> > >fun3:
> > >	adrp	x0, .LANCHOR0
> > >	add	x2, x0, :lo12:.LANCHOR0
> > >	sub	sp, sp, #16
> > >	ldrh	w1, [x0, #:lo12:.LANCHOR0]
> > >	ldrb	w0, [x2, 2]
> > >	strh	w1, [sp, 8]
> > >	strb	w0, [sp, 10]
> > >	ldr	w0, [sp, 8]
> > >	add	sp, sp, 16
> > >	ret
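> > >
> > >For reference, a testcase of roughly this shape exercises this path
> > >(hypothetical source; the actual test is not quoted in this mail):
> > >
> > >  struct S3 { unsigned char a, b, c; };
> > >  struct S3 foo3;                          /* the anchored global */
> > >  struct S3 fun3 (void) { return foo3; }   /* returned in x0 */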
> > >
> > >Cross-compiled and regtested on
> > >  aarch64_be-none-elf
> > >  armeb-none-eabi
> > >with no issues.
> > >
> > >Bootstrapped and regtested on
> > > aarch64-none-linux-gnu
> > > x86_64-pc-linux-gnu
> > > powerpc64-unknown-linux-gnu
> > > arm-none-linux-gnueabihf
> > >
> > >and found no issues.
> > >
> > >OK for trunk?
> >
> > How does this affect store-to-load forwarding when the source is
> > initialized piecewise?  IMHO we should avoid larger loads but
> > generate larger stores when possible.
> >
> > How do non-x86 architectures behave with respect to STLF?
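> >
> > A hypothetical illustration (not code from this thread) of the STLF
> > hazard with a piecewise-initialized source:
> >
> >   struct S { char a, b, c, d; } s;
> >   extern void use (struct S);
> >
> >   void f (void)
> >   {
> >     s.a = 1; s.b = 2; s.c = 3; s.d = 4;  /* narrow piecewise stores */
> >     use (s);  /* a single wide load of s here may stall: STLF
> >                  typically cannot forward from multiple narrow
> >                  stores to one wide load */
> >   }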
> >
> 
> I should have made it more explicit in my cover letter, but this only
> covers reg-to-reg copies, so store-to-load forwarding shouldn't really
> come into play here, unless I'm missing something.
> 
> The example in my patch shows that the loads from mem are mostly
> unaffected.
> 
> For x86 the change is also quite significant; e.g. for a 5-byte struct
> load it used to generate
> 
> fun5:
> 	movl	foo5(%rip), %eax
> 	movl	%eax, %edi
> 	movzbl	%al, %edx
> 	movzbl	%ah, %eax
> 	movb	%al, %dh
> 	movzbl	foo5+2(%rip), %eax
> 	shrl	$24, %edi
> 	salq	$16, %rax
> 	movq	%rax, %rsi
> 	movzbl	%dil, %eax
> 	salq	$24, %rax
> 	movq	%rax, %rcx
> 	movq	%rdx, %rax
> 	movzbl	foo5+4(%rip), %edx
> 	orq	%rsi, %rax
> 	salq	$32, %rdx
> 	orq	%rcx, %rax
> 	orq	%rdx, %rax
> 	ret
> 
> instead of
> 
> fun5:
>         movzbl  foo5+4(%rip), %eax
>         salq    $32, %rax
>         movq    %rax, %rdx
>         movl    foo5(%rip), %eax
>         orq     %rdx, %rax
>         ret
> 
> so the loads themselves are unaffected.
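> 
> For completeness, fun5 above corresponds to a 5-byte struct returned
> by value; a hypothetical testcase shape (the actual source is not
> quoted here):
> 
>   struct S5 { unsigned char b[5]; };
>   struct S5 foo5;
>   struct S5 fun5 (void) { return foo5; }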
> 
> Thanks,
> Tamar
> 
> > Richard.
> >
> > >Thanks,
> > >Tamar
> > >
> > >gcc/
> > >2018-07-23  Tamar Christina  <tamar.christina@arm.com>
> > >
> > >	* expr.c (copy_blkmode_to_reg): Perform larger copies when safe.
> >
> 
> --

