This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: arm memcpy of aligned data
- From: Richard Earnshaw <Richard dot Earnshaw at foss dot arm dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>, Mike Stump <mikestump at comcast dot net>, gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 15 Jun 2015 16:11:55 +0100
- Subject: Re: arm memcpy of aligned data
- Authentication-results: sourceware.org; auth=none
- References: <9FF08D9A-529E-4FDE-9193-4BD81EDD89E3 at comcast dot net> <55682C81 dot 9040709 at arm dot com> <55683C58 dot 20701 at arm dot com> <557EE17C dot 5000008 at arm dot com>
On 15/06/15 15:30, Kyrill Tkachov wrote:
>
> On 29/05/15 11:15, Kyrill Tkachov wrote:
>> On 29/05/15 10:08, Kyrill Tkachov wrote:
>>> Hi Mike,
>>>
>>> On 28/05/15 22:15, Mike Stump wrote:
>>>> So, the arm memcpy code of aligned data isnât as good as it can be.
>>>>
>>>> void *memcpy(void *dest, const void *src, unsigned int n);
>>>>
>>>> void foo(char *dst, int i) {
>>>> memcpy (dst, &i, sizeof (i));
>>>> }
>>>>
>>>> generates horrible code, but, it we are willing to notice the src or
>>>> the destination are aligned, we can do much better:
>>>>
>>>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve
>>>> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>>>> $ cat t.s
>>>> [ â ]
>>>> foo:
>>>> @ args = 0, pretend = 0, frame = 4
>>>> @ frame_needed = 0, uses_anonymous_args = 0
>>>> @ link register save eliminated.
>>>> sub sp, sp, #4
>>>> str r1, [r0] @ unaligned
>>>> add sp, sp, #4
>>> I think there's something to do with cpu tuning here as well.
>> That being said, I do think this is a good idea.
>> I'll give it a test.
>
> The patch passes bootstrap and testing ok and I've seen it
> improve codegen in a few places in SPEC.
> I've added a testcase all marked up.
>
> Mike, I'll commit the attached patch in 24 hours unless somebody objects.
>
> Thanks,
> Kyrill
>
> 2015-06-15 Mike Stump <mikestump@comcast.net>
>
> * config/arm/arm.c (arm_block_move_unaligned_straight):
> Emit normal move instead of unaligned load when source or destination
> are appropriately aligned.
>
> 2015-06-15 Mike Stump <mikestump@comcast.net>
> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/arm/memcpy-aligned-1.c: New test.
>
My only question would be whether this should be pushed down into
gen_unaligned_{load|store}si, so that all callers would benefit?
R.
>>
>> Kyrill
>>
>>> For the code you've given compiled with -O2 -mcpu=cortex-a53 I get:
>>> sub sp, sp, #8
>>> mov r2, r0
>>> add r3, sp, #8
>>> str r1, [r3, #-4]!
>>> ldr r0, [r3] @ unaligned
>>> str r0, [r2] @ unaligned
>>> add sp, sp, #8
>>> @ sp needed
>>> bx lr
>>>
>>> whereas for -O2 -mcpu=cortex-a57 I get the much better:
>>> sub sp, sp, #8
>>> str r1, [r0] @ unaligned
>>> add sp, sp, #8
>>> @ sp needed
>>> bx lr
>>>
>>> Kyrill
>>>
>>>
>>>> Index: gcc/config/arm/arm.c
>>>> ===================================================================
>>>> --- gcc/config/arm/arm.c (revision 223842)
>>>> +++ gcc/config/arm/arm.c (working copy)
>>>> @@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
>>>> srcoffset + j * UNITS_PER_WORD - src_autoinc);
>>>> mem = adjust_automodify_address (srcbase, SImode, addr,
>>>> srcoffset + j * UNITS_PER_WORD);
>>>> - emit_insn (gen_unaligned_loadsi (regs[j], mem));
>>>> + if (src_aligned)
>>>> + emit_move_insn (regs[j], mem);
>>>> + else
>>>> + emit_insn (gen_unaligned_loadsi (regs[j], mem));
>>>> }
>>>> srcoffset += words * UNITS_PER_WORD;
>>>> }
>>>> @@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
>>>> dstoffset + j * UNITS_PER_WORD - dst_autoinc);
>>>> mem = adjust_automodify_address (dstbase, SImode, addr,
>>>> dstoffset + j * UNITS_PER_WORD);
>>>> - emit_insn (gen_unaligned_storesi (mem, regs[j]));
>>>> + if (dst_aligned)
>>>> + emit_move_insn (mem, regs[j]);
>>>> + else
>>>> + emit_insn (gen_unaligned_storesi (mem, regs[j]));
>>>> }
>>>> dstoffset += words * UNITS_PER_WORD;
>>>> }
>>>>
>>>>
>>>> Ok?
>>>>
>>>> Can someone spin this through an arm test suite run for me, I was
>>>> doing this by inspection and cross compile on a system with no arm
>>>> bits. Bonus points if you can check it in with the test case above
>>>> marked up as appropriate.
>>>>
>
>
> arm-memcpy-aligned.patch
>
>
> commit 77191f4224c8729d014a9150bd9364f95ff704b0
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date: Fri May 29 10:44:21 2015 +0100
>
> [ARM] arm memcpy of aligned data
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 638d659..3a33c26 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -14283,7 +14283,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
> srcoffset + j * UNITS_PER_WORD - src_autoinc);
> mem = adjust_automodify_address (srcbase, SImode, addr,
> srcoffset + j * UNITS_PER_WORD);
> - emit_insn (gen_unaligned_loadsi (regs[j], mem));
> + if (src_aligned)
> + emit_move_insn (regs[j], mem);
> + else
> + emit_insn (gen_unaligned_loadsi (regs[j], mem));
> }
> srcoffset += words * UNITS_PER_WORD;
> }
> @@ -14302,7 +14305,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
> dstoffset + j * UNITS_PER_WORD - dst_autoinc);
> mem = adjust_automodify_address (dstbase, SImode, addr,
> dstoffset + j * UNITS_PER_WORD);
> - emit_insn (gen_unaligned_storesi (mem, regs[j]));
> + if (dst_aligned)
> + emit_move_insn (mem, regs[j]);
> + else
> + emit_insn (gen_unaligned_storesi (mem, regs[j]));
> }
> dstoffset += words * UNITS_PER_WORD;
> }
> diff --git a/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c b/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c
> new file mode 100644
> index 0000000..852b391
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -save-temps" } */
> +
> +void *memcpy (void *dest, const void *src, unsigned int n);
> +
> +void foo (char *dst, int i)
> +{
> + memcpy (dst, &i, sizeof (i));
> +}
> +
> +/* { dg-final { scan-assembler-times "str\t" 1 } } */
> +/* { dg-final { scan-assembler-not "ldr\t" } } */
>