[PATCH][GCC][mid-end] Allow larger copies when not slow_unaligned_access and no padding.
Tamar Christina
Tamar.Christina@arm.com
Tue Jul 24 16:33:00 GMT 2018
Hi Richard,
Thanks for the review!
The 07/23/2018 18:46, Richard Biener wrote:
> On July 23, 2018 7:01:23 PM GMT+02:00, Tamar Christina <tamar.christina@arm.com> wrote:
> >Hi All,
> >
> >This allows copy_blkmode_to_reg to perform larger copies when it is
> >safe to do so, by calculating the bitsize per iteration as the widest
> >copy that does not read more bits than are left to copy.
> >
> >Strictly speaking, this copying is only done if:
> >
> > 1. the target supports fast unaligned access
> > 2. no padding is being used.
> >
> >This should avoid the issues of the first patch (PR85123) but still
> >work for targets where doing larger copies is safe.
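
To make the iteration logic concrete, here is a rough sketch (illustrative
only, not the exact code in the patch): each iteration copies the widest
power-of-two chunk, capped at the word size, that does not read past the
bits still left to copy.

  #include <stdio.h>

  /* Illustrative sketch, not the actual expr.c implementation: pick
     the widest power-of-two chunk, capped at the word size, that does
     not read more bits than remain to be copied.  */
  static unsigned
  chunk_bits (unsigned bits_left, unsigned word_bits)
  {
    unsigned bits = word_bits;
    while (bits > bits_left)
      bits >>= 1;
    return bits;
  }

  int
  main (void)
  {
    unsigned left = 24;                /* a 3-byte struct */
    while (left)
      {
        unsigned b = chunk_bits (left, 64);
        printf ("copy %u bits\n", b);  /* prints 16, then 8 */
        left -= b;
      }
    return 0;
  }

This matches the ldrh/ldrb pair in the AArch64 output below.
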
> >
> >Original patch https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01088.html
> >Previous respin
> >https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00239.html
> >
> >
> >This produces for the copying of a 3 byte structure:
> >
> >fun3:
> > adrp x1, .LANCHOR0
> > add x1, x1, :lo12:.LANCHOR0
> > mov x0, 0
> > sub sp, sp, #16
> > ldrh w2, [x1, 16]
> > ldrb w1, [x1, 18]
> > add sp, sp, 16
> > bfi x0, x2, 0, 16
> > bfi x0, x1, 16, 8
> > ret
> >
> >whereas before it was producing
> >
> >fun3:
> > adrp x0, .LANCHOR0
> > add x2, x0, :lo12:.LANCHOR0
> > sub sp, sp, #16
> > ldrh w1, [x0, #:lo12:.LANCHOR0]
> > ldrb w0, [x2, 2]
> > strh w1, [sp, 8]
> > strb w0, [sp, 10]
> > ldr w0, [sp, 8]
> > add sp, sp, 16
> > ret
> >
> >Cross-compiled and regtested on
> > aarch64_be-none-elf
> > armeb-none-eabi
> >and found no issues.
> >
> >Bootstrapped and regtested on
> > aarch64-none-linux-gnu
> > x86_64-pc-linux-gnu
> > powerpc64-unknown-linux-gnu
> > arm-none-linux-gnueabihf
> >
> >and found no issues.
> >
> >OK for trunk?
>
> How does this affect store-to-load forwarding when the source is initialized piecewise? IMHO we should avoid larger loads but generate larger stores when possible.
>
> How do non-x86 architectures behave with respect to STLF?
>
I should have made it more explicit in my cover letter, but this only covers reg-to-reg copies,
so store-to-load forwarding shouldn't really come into play here, unless I'm missing something.
The example in my patch shows that the loads from mem are mostly unaffected.
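
For reference, the 3-byte case corresponds to a test along these lines
(the struct layout here is reconstructed from the asm, not the exact
testcase):

  struct S3 { char a, b, c; };  /* 3 bytes, no padding */
  struct S3 g;                  /* the .LANCHOR0 data in the asm above */

  struct S3
  fun3 (void)
  {
    return g;                   /* returned in x0, assembled with bfi */
  }
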
For x86 the change is also quite significant; e.g. for a 5-byte struct load it used to generate:
fun5:
movl foo5(%rip), %eax
movl %eax, %edi
movzbl %al, %edx
movzbl %ah, %eax
movb %al, %dh
movzbl foo5+2(%rip), %eax
shrl $24, %edi
salq $16, %rax
movq %rax, %rsi
movzbl %dil, %eax
salq $24, %rax
movq %rax, %rcx
movq %rdx, %rax
movzbl foo5+4(%rip), %edx
orq %rsi, %rax
salq $32, %rdx
orq %rcx, %rax
orq %rdx, %rax
ret
instead of
fun5:
movzbl foo5+4(%rip), %eax
salq $32, %rax
movq %rax, %rdx
movl foo5(%rip), %eax
orq %rdx, %rax
ret
so the loads themselves are unaffected.
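
Similarly, the x86 example corresponds to something like this (again a
reconstruction; the exact testcase may differ):

  struct S5 { char c[5]; };     /* 5 bytes, no padding */
  struct S5 foo5;

  struct S5
  fun5 (void)
  {
    return foo5;
  }
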
Thanks,
Tamar
> Richard.
>
> >Thanks,
> >Tamar
> >
> >gcc/
> >2018-07-23 Tamar Christina <tamar.christina@arm.com>
> >
> > * expr.c (copy_blkmode_to_reg): Perform larger copies when safe.
>