[PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.

Richard Biener richard.guenther@gmail.com
Wed Jun 7 08:33:00 GMT 2017


On Wed, Jun 7, 2017 at 10:07 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>> Hi,
>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>> Bootstrapped and tested at O2/O3 on x86_64 and AArch64.  Is it OK?
>>>
>>> Note I don't have a strong opinion here and am fine with it being either accepted or rejected.
>>>
>>> Thanks,
>>> bin
>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>       for -O3 and above levels.
>> I think the question is how does this generally impact the performance
>> of the generated code and to a lesser degree compile-time.
>>
>> Do you have any performance data?
> Hi Jeff,
> At this stage of the patch, only hmmer is impacted, and it improves
> noticeably in my local run of SPEC2006 on x86_64 and AArch64.  In the
> long term, loop distribution is also one prerequisite transformation
> for handling bwaves (at least).  For these two impacted cases, it helps
> to close the gap against ICC.  I didn't check the compilation-time
> slowdown; we can restrict it to cases with a small number of
> partitions if that's a problem.

The source of extra compile-time will be dependence checking, which
is quadratic; there is currently no limit in place on
(# writes * (# reads + # writes)), but one could easily be added.
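
A minimal sketch of what such a limit could look like, assuming a
hypothetical cap MAX_DEP_TESTS and hypothetical helper names; none of
these exist in GCC, and the real pass counts data references in its own
data structures:

```c
#include <stdbool.h>

/* Hypothetical cap on pairwise dependence tests; not a GCC parameter. */
#define MAX_DEP_TESTS 1000UL

/* Number of dependence tests for a loop body with the given number of
   writes and reads: every write is checked against every read and
   every write, i.e. writes * (reads + writes).  */
unsigned long
dep_test_count (unsigned writes, unsigned reads)
{
  return (unsigned long) writes * ((unsigned long) reads + writes);
}

/* Return true if the loop is cheap enough to analyze under the cap.  */
bool
within_dep_test_limit (unsigned writes, unsigned reads)
{
  return dep_test_count (writes, reads) <= MAX_DEP_TESTS;
}
```

With these assumed numbers, a loop with 2 writes and 3 reads needs 10
tests and passes, while one with 100 writes and 100 reads needs 20000
and would be skipped.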

Note that I recently added -fopt-info support for loop distribution, so
it should be possible to get an idea of how many loops in SPEC are
distributed and, if the number is small, double-check them.

The cost model at this point is very conservative, but due to
implementation details distributing a loop can cause quite some
arithmetic to be duplicated, as in

int a[1024], b[1024];

void foo()
{
  for (int i = 0; i < 1024; ++i)
    {
       a[i] = i * i * i ... * i;
       b[i] = a[i];
    }
}

it will distribute into two loops, both computing i * i * i ... * i,
rather than reading from a[i] in the second loop.
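
As a hand-written illustration of the effect (not actual compiler
output), the two forms could look roughly like this.  The product
i * i * i is only a stand-in for the longer elided expression, and the
function names foo_fused and foo_distributed are made up for the
sketch:

```c
#define N 1024

int a[N], b[N];

/* Original form: one loop, b[i] reuses the value just stored to a[i].  */
void
foo_fused (void)
{
  for (int i = 0; i < N; ++i)
    {
      a[i] = i * i * i;   /* stand-in for the longer product */
      b[i] = a[i];
    }
}

/* Distributed form as described above: two loops, each recomputing
   the product instead of the second one loading from a[i].  */
void
foo_distributed (void)
{
  for (int i = 0; i < N; ++i)
    a[i] = i * i * i;
  for (int i = 0; i < N; ++i)
    b[i] = i * i * i;     /* recomputed, not read back from a[i] */
}
```

Both versions store the same values; the second one simply does the
multiplication twice per index.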

Richard.

> Thanks,
> bin
>>
>> jeff
>>


