This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions


On Mon, Oct 16, 2017 at 2:56 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Oct 12, 2017 at 2:43 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> This patch merges adjacent memset builtin partitions if possible.  It is
>>> a useful special case optimization transforming below code:
>>>
>>> #define M (256)
>>> #define N (512)
>>>
>>> struct st
>>> {
>>>   int a[M][N];
>>>   int c[M];
>>>   int b[M][N];
>>> };
>>>
>>> void
>>> foo (struct st *p)
>>> {
>>>   for (unsigned i = 0; i < M; ++i)
>>>     {
>>>       p->c[i] = 0;
>>>       for (unsigned j = N; j > 0; --j)
>>>         {
>>>           p->a[i][j - 1] = 0;
>>>           p->b[i][j - 1] = 0;
>>>         }
>>>     }
>>>
>>> into a single memset function call, rather than three calls initializing
>>> the structure field by field.
>>>
>>> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
>>
>> +      /* Insertion sort is good enough for the small sub-array.  */
>> +      for (k = i + 1; k < j; ++k)
>> +       {
>> +         part1 = (*partitions)[k];
>> +         for (l = k; l > i; --l)
>> +           {
>> +             part2 = (*partitions)[l - 1];
>> +             if (part1->builtin->dst_base_offset
>> +                   >= part2->builtin->dst_base_offset)
>> +               break;
>> +
>> +             (*partitions)[l] = part2;
>> +           }
>> +         (*partitions)[l] = part1;
>> +       }
>>
>> so we want to sort [i, j[ after dst_base_offset.  I realize you don't want
>> to write a qsort helper for this but I can't wrap my head around the above
>> in 5 minutes so ... please!
> Hmm, I thought twice about this and now I believe stable sorting (thus
> insertion sort)
> is required here.  Please see below for explanation.
>
>>
>> You don't seem to check the sizes of the memsets given that they
>> obviously cannot overlap(!?)
> Yes, given it's quite special case transformation, I did add code
> checking overlap cases.
>>
>> Also why handle memset and not memcpy/memmove or combinations of
>> them (for sorting)?
>>
>>   for ()
>>    {
>>       p->a[i] = 0;
>>       p->c[i] = q->c[i];
>>       p->b[i] = 0;
>>    }
>>
>> with a and b adjacent.  I suppose p->c could be computed by a
>> non-builtin partition as well.
> Yes, the two memset builtin partitions can be merged in this case, but...
>> So don't we want to see if dependences allow sorting all builtin
>> partitions next to each other
>> as much as possible?  (maybe we do that already?)
> The answer for this, above partition merging and use of qsort is no.
> I think all the three are the same question here.  For now we only do
> topological sort for partitions.  To maximize parallelism (either by merging
> normal parallel partitions or merging builtin partitions) requires fine-tuned
> sorting between partitions that doesn't dependence on each other.
> In order to sort all memset/memcpy/memmove, we need check dependence
> between all data references between different partitions.  For example, I
> created new test ldist-36.c illustrating sorting memcpy along with memset
> would generate wrong code because dependence is broken.  It's the same
> for qsort.  In extreme case, if the same array is set twice with different rhs
> value, the order between the two sets needs to be preserved.  Unfortunately,
> qsort is unstable and could reorder different sets.  This would break output
> dependence.
> At the point of this function, dependence graph is destroyed, we can't do
> much in addition to special case handling for memset.  Full solution would
> require a customized topological sorting process.
>
> So, this updated patch keeps insertion sort with additional comment explaining
> why.  Also two test cases added showing when memset partitions should be
> merged (we can't for now) and when memset partitions should not be merged.
Hmm, I can factor out the sorting loop into a function, that might
make it easier
to read.

Thanks,
bin
>
> Bootstrap and test.  Is it OK?
>
> Thanks,
> bin
>
> 2017-10-14  Bin Cheng  <bin.cheng@arm.com>
>
> * tree-loop-distribution.c (tree-ssa-loop-ivopts.h): New header file.
> (struct builtin_info): New fields.
> (classify_builtin_1): Compute and record base and offset parts for
> memset builtin partition by calling strip_offset.
> (fuse_memset_builtins): New function.
> (finalize_partitions): Fuse adjacent memset partitions by calling
> above function.
> * tree-ssa-loop-ivopts.c (strip_offset): Delete static declaration.
> Expose the interface.
> * tree-ssa-loop-ivopts.h (strip_offset): New declaration.
>
> gcc/testsuite/ChangeLog
> 2017-10-14  Bin Cheng  <bin.cheng@arm.com>
>
> * gcc.dg/tree-ssa/ldist-17.c: Adjust test string.
> * gcc.dg/tree-ssa/ldist-32.c: New test.
> * gcc.dg/tree-ssa/ldist-35.c: New test.
> * gcc.dg/tree-ssa/ldist-36.c: New test.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]