This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Bin Cheng <Bin dot Cheng at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Thu, 12 Oct 2017 15:43:11 +0200
- Subject: Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions
- Authentication-results: sourceware.org; auth=none
- References: <DB5PR0801MB2742A4140603EA9F5B90C50AE7700@DB5PR0801MB2742.eurprd08.prod.outlook.com>
On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This patch merges adjacent memset builtin partitions if possible. It is
> a useful special case optimization transforming below code:
>
> #define M (256)
> #define N (512)
>
> struct st
> {
> int a[M][N];
> int c[M];
> int b[M][N];
> };
>
> void
> foo (struct st *p)
> {
> for (unsigned i = 0; i < M; ++i)
> {
> p->c[i] = 0;
> for (unsigned j = N; j > 0; --j)
> {
> p->a[i][j - 1] = 0;
> p->b[i][j - 1] = 0;
> }
> }
>
> into a single memset function call, rather than three calls initializing
> the structure field by field.
>
> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
+ /* Insertion sort is good enough for the small sub-array. */
+ for (k = i + 1; k < j; ++k)
+ {
+ part1 = (*partitions)[k];
+ for (l = k; l > i; --l)
+ {
+ part2 = (*partitions)[l - 1];
+ if (part1->builtin->dst_base_offset
+ >= part2->builtin->dst_base_offset)
+ break;
+
+ (*partitions)[l] = part2;
+ }
+ (*partitions)[l] = part1;
+ }
so we want to sort [i, j[ after dst_base_offset. I realize you don't want
to write a qsort helper for this but I can't wrap my head around the above
in 5 minutes so ... please!
You don't seem to check the sizes of the memsets given that they
obviously cannot overlap(!?)
Also why handle memset and not memcpy/memmove or combinations of
them (for sorting)?
for ()
{
p->a[i] = 0;
p->c[i] = q->c[i];
p->b[i] = 0;
}
with a and b adjacent. I suppose p->c could be computed by a
non-builtin partition as well.
So don't we want to see if dependences allow sorting all builtin
partitions next to each other
as much as possible? (maybe we do that already?)
Thanks,
Richard.
> Thanks,
> bin
> 2017-10-04 Bin Cheng <bin.cheng@arm.com>
>
> * tree-loop-distribution.c (tree-ssa-loop-ivopts.h): New header file.
> (struct builtin_info): New fields.
> (classify_builtin_1): Compute and record base and offset parts for
> memset builtin partition by calling strip_offset.
> (fuse_memset_builtins): New function.
> (finalize_partitions): Fuse adjacent memset partitions by calling
> above function.
> * tree-ssa-loop-ivopts.c (strip_offset): Delete static declaration.
> Expose the interface.
> * tree-ssa-loop-ivopts.h (strip_offset): New declaration.
>
> gcc/testsuite/ChangeLog
> 2017-10-04 Bin Cheng <bin.cheng@arm.com>
>
> * gcc.dg/tree-ssa/ldist-17.c: Adjust test string.
> * gcc.dg/tree-ssa/ldist-32.c: New test.