This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [AArch64] Implement movmem for the benefit of inline memcpy


On Fri, Aug 1, 2014 at 8:38 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Fri, Jun 6, 2014 at 1:50 AM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
>>
>> Hi,
>>
>> The move_by_pieces infrastructure performs a copy by repeatedly trying
>> the largest safe copy it can make. So for a 15-byte copy we might see:
>>
>> offset   amount  bytes copied
>> 0        8       0-7
>> 8        4       8-11
>> 12       2       12-13
>> 14       1       14
>>
>> However, we can implement a 15-byte copy like so:
>>
>> offset   amount  bytes copied
>> 0        8       0-7
>> 7        8       7-14
>>
>> Which can prove more efficient for both space and speed.
>>
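A minimal C sketch of the two schemes above (an illustration, not code from the patch), assuming unaligned 8-byte accesses are permitted, as they are on AArch64 without -mstrict-align:

#include <string.h>

/* 15-byte copy as move_by_pieces emits it: 8 + 4 + 2 + 1 pieces.  */
void
copy15_by_pieces (char *d, const char *s)
{
  memcpy (d,      s,      8);   /* bytes 0-7   */
  memcpy (d + 8,  s + 8,  4);   /* bytes 8-11  */
  memcpy (d + 12, s + 12, 2);   /* bytes 12-13 */
  memcpy (d + 14, s + 14, 1);   /* byte  14    */
}

/* The same copy as two overlapping 8-byte pieces: bytes 0-7 and 7-14.  */
void
copy15_overlapping (char *d, const char *s)
{
  memcpy (d,     s,     8);     /* bytes 0-7  */
  memcpy (d + 7, s + 7, 8);     /* bytes 7-14 */
}
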
>> In this patch we set MOVE_RATIO low to avoid using move_by_pieces, and
>> implement the movmem standard pattern name to expand small block copies.
>> Note that this optimization does not apply to -mstrict-align targets,
>> which must continue copying byte-by-byte.
>>
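For illustration (not from the patch), the kind of copy this targets: a small constant-size block copy such as a 15-byte structure assignment, which can now be open-coded through the movmem expander instead of being split up by move_by_pieces or left to a memcpy call:

struct blob { unsigned char bytes[15]; };

void
copy_blob (struct blob *dst, const struct blob *src)
{
  /* Small constant-size block copy: with the movmem<mode> pattern in
     place, this is a candidate for inline expansion as two overlapping
     8-byte accesses rather than four shrinking pieces or a library call.  */
  *dst = *src;
}
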
>> Setting MOVE_RATIO low in this way causes two tests to begin failing;
>> both are documented in the test case as expected to fail for low
>> MOVE_RATIO targets, which do not allow certain tree-level optimizations.
>
>
> I think you should reevaluate setting MOVE_RATIO this low.  It is used
> for SRA and IPA-SRA, both of which are very useful; more useful than
> what the memmove optimizations can achieve.

Maybe we should finally decouple SRA and IPA-SRA from MOVE_RATIO
and have a --param to control the heuristic, which a target could then
adjust separately (we can still default it to MOVE_RATIO here).  Ok,
maybe we need two params, one for size and one for speed optimization
(though the size effects are hard to estimate).

Richard.
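
For context, a minimal example (not from the thread) of the SRA transformation at stake; whether the aggregate below is scalarized depends on the size limit derived from MOVE_RATIO:

/* Hypothetical illustration: SRA's total scalarization replaces the
   aggregate temporary with individual scalars, but only when the
   aggregate is small enough relative to the limit derived from
   MOVE_RATIO.  With MOVE_RATIO set very low, copies like this one
   stay as block moves.  */
struct pair { int a, b; };

int
sum (struct pair p)
{
  struct pair q = p;   /* candidate for total scalarization */
  return q.a + q.b;
}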

> In fact this optimization is not even valid for volatile variables.
> Here is a testcase for the volatile issue:
> struct __attribute__((packed)) t15{
>   long long t8;
>   int t4;
>   short t2;
>   unsigned char t1;
> };
> volatile struct t15 t15;
> void f(struct t15 *a)
> {
>   t15 = *a;
> }
>
> Notice how byte 7 of t15 is written twice in the generated code.
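
A copy that is valid for the volatile destination must store each destination byte exactly once; a minimal sketch (reusing the struct from the testcase, not code taken from GCC) is:

/* struct t15 as in the testcase above.  */
struct __attribute__((packed)) t15 {
  long long t8;
  int t4;
  short t2;
  unsigned char t1;
};

/* Valid for a volatile destination: every byte of *dst is written
   exactly once, with no overlapping stores.  Wider non-overlapping
   pieces would also avoid the double store at byte 7.  */
void
copy_volatile_t15 (volatile struct t15 *dst, const struct t15 *src)
{
  volatile unsigned char *d = (volatile unsigned char *) dst;
  const unsigned char *s = (const unsigned char *) src;
  for (unsigned i = 0; i < sizeof *dst; i++)
    d[i] = s[i];
}
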
>
> Thanks,
> Andrew Pinski
>
>>
>> Bootstrapped on aarch64-unknown-linux-gnu with no issues.
>>
>> OK for trunk?
>>
>> Thanks,
>> James
>>
>> ---
>> gcc/
>>
>> 2014-06-06  James Greenhalgh  <james.greenhalgh@arm.com>
>>
>>         * config/aarch64/aarch64-protos.h (aarch64_expand_movmem): New.
>>         * config/aarch64/aarch64.c (aarch64_move_pointer): New.
>>         (aarch64_progress_pointer): Likewise.
>>         (aarch64_copy_one_part_and_move_pointers): Likewise.
>>         (aarch64_expand_movmem): Likewise.
>>         * config/aarch64/aarch64.h (MOVE_RATIO): Set low.
>>         * config/aarch64/aarch64.md (movmem<mode>): New.
>>
>> gcc/testsuite/
>>
>> 2014-06-06  James Greenhalgh  <james.greenhalgh@arm.com>
>>
>>         * gcc.dg/tree-ssa/pr42585.c: Skip for AArch64.
>>         * gcc.dg/tree-ssa/sra-12.c: Likewise.

