This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions


On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This patch merges adjacent memset builtin partitions if possible.  It is
> a useful special case optimization transforming below code:
>
> #define M (256)
> #define N (512)
>
> struct st
> {
>   int a[M][N];
>   int c[M];
>   int b[M][N];
> };
>
> void
> foo (struct st *p)
> {
>   for (unsigned i = 0; i < M; ++i)
>     {
>       p->c[i] = 0;
>       for (unsigned j = N; j > 0; --j)
>         {
>           p->a[i][j - 1] = 0;
>           p->b[i][j - 1] = 0;
>         }
>     }
>
> into a single memset function call, rather than three calls initializing
> the structure field by field.
>
> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?

+      /* Insertion sort is good enough for the small sub-array.  */
+      for (k = i + 1; k < j; ++k)
+       {
+         part1 = (*partitions)[k];
+         for (l = k; l > i; --l)
+           {
+             part2 = (*partitions)[l - 1];
+             if (part1->builtin->dst_base_offset
+                   >= part2->builtin->dst_base_offset)
+               break;
+
+             (*partitions)[l] = part2;
+           }
+         (*partitions)[l] = part1;
+       }

so we want to sort [i, j[ after dst_base_offset.  I realize you don't want
to write a qsort helper for this but I can't wrap my head around the above
in 5 minutes so ... please!

You don't seem to check the sizes of the memsets given that they
obviously cannot overlap(!?)

Also why handle memset and not memcpy/memmove or combinations of
them (for sorting)?

  for ()
   {
      p->a[i] = 0;
      p->c[i] = q->c[i];
      p->b[i] = 0;
   }

with a and b adjacent.  I suppose p->c could be computed by a
non-builtin partition as well.
So don't we want to see if dependences allow sorting all builtin
partitions next to each other
as much as possible?  (maybe we do that already?)

Thanks,
Richard.


> Thanks,
> bin
> 2017-10-04  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-loop-distribution.c (tree-ssa-loop-ivopts.h): New header file.
>         (struct builtin_info): New fields.
>         (classify_builtin_1): Compute and record base and offset parts for
>         memset builtin partition by calling strip_offset.
>         (fuse_memset_builtins): New function.
>         (finalize_partitions): Fuse adjacent memset partitions by calling
>         above function.
>         * tree-ssa-loop-ivopts.c (strip_offset): Delete static declaration.
>         Expose the interface.
>         * tree-ssa-loop-ivopts.h (strip_offset): New declaration.
>
> gcc/testsuite/ChangeLog
> 2017-10-04  Bin Cheng  <bin.cheng@arm.com>
>
>         * gcc.dg/tree-ssa/ldist-17.c: Adjust test string.
>         * gcc.dg/tree-ssa/ldist-32.c: New test.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]