This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [AArch64] Implement movmem for the benefit of inline memcpy
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Andrew Pinski <pinskia at gmail dot com>
- Cc: James Greenhalgh <james dot greenhalgh at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <marcus dot shawcroft at arm dot com>
- Date: Fri, 1 Aug 2014 11:04:55 +0200
- Subject: Re: [AArch64] Implement movmem for the benefit of inline memcpy
- Authentication-results: sourceware.org; auth=none
- References: <1402044609-30902-1-git-send-email-james dot greenhalgh at arm dot com> <CA+=Sn1mmiADzajZ2r_0Va0HpLNvoF3maF5H00d+rSmyNeYHNJw at mail dot gmail dot com>
On Fri, Aug 1, 2014 at 8:38 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Fri, Jun 6, 2014 at 1:50 AM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
>>
>> Hi,
>>
>> The move_by_pieces infrastructure performs a copy by repeatedly trying
>> the largest safe copy it can make. So for a 15-byte copy we might see:
>>
>> offset  amount  bytes copied
>>      0       8  0-7
>>      8       4  8-11
>>     12       2  12-13
>>     14       1  14
>>
>> However, we can implement a 15-byte copy like so:
>>
>> offset  amount  bytes copied
>>      0       8  0-7
>>      7       8  7-14
>>
>> This can prove more efficient for both code size and speed.
>>
>> In this patch we set MOVE_RATIO low to avoid using move_by_pieces, and
>> implement the movmem pattern name to expand small block copy cases. Note, this
>> optimization does not apply for -mstrict-align targets, which must continue
>> copying byte-by-byte.
>>
>> Setting MOVE_RATIO low in this way causes a few tests to begin failing;
>> these are documented in the test cases as expected to fail for
>> low-MOVE_RATIO targets, which do not allow certain tree-level optimizations.
>
>
> I think you should reevaluate setting MOVE_RATIO this low. It is used
> for SRA and IPA-SRA, which are both very useful; more useful than what
> memmove optimizations can achieve.
Maybe we should finally decouple SRA and IPA-SRA from MOVE_RATIO
and add a --param a target can adjust separately to control the heuristic
(we can still default to MOVE_RATIO here). OK, maybe we need
two params, one for size and one for speed optimization (though
the size effects are hard to estimate).
Richard.
> In fact this optimization is not even valid for volatile variables.
> Here is a testcase for the volatile issue:
> struct __attribute__((packed)) t15 {
>   long long t8;
>   int t4;
>   short t2;
>   unsigned char t1;
> };
> volatile struct t15 t15;
> int f(struct t15 *a)
> {
>   t15 = *a;
> }
>
> Notice how byte 7 of t15 is written twice in the emitted code.
>
> Thanks,
> Andrew Pinski
>
>>
>> Bootstrapped on aarch64-unknown-linux-gnu with no issues.
>>
>> OK for trunk?
>>
>> Thanks,
>> James
>>
>> ---
>> gcc/
>>
>> 2014-06-06 James Greenhalgh <james.greenhalgh@arm.com>
>>
>> * config/aarch64/aarch64-protos.h (aarch64_expand_movmem): New.
>> * config/aarch64/aarch64.c (aarch64_move_pointer): New.
>> (aarch64_progress_pointer): Likewise.
>> (aarch64_copy_one_part_and_move_pointers): Likewise.
>> (aarch64_expand_movmem): Likewise.
>> * config/aarch64/aarch64.h (MOVE_RATIO): Set low.
>> * config/aarch64/aarch64.md (movmem<mode>): New.
>>
>> gcc/testsuite/
>>
>> 2014-06-06 James Greenhalgh <james.greenhalgh@arm.com>
>>
>> * gcc.dg/tree-ssa/pr42585.c: Skip for AArch64.
>> * gcc.dg/tree-ssa/sra-12.c: Likewise.