This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: complete_unrolli / complete_unroll


Richard Guenther wrote:
> 2009/8/19 Albert Cohen <Albert.Cohen@inria.fr>:
>> When debugging graphite, we ran into code bloat issues due to
>> pass_complete_unrolli being called very early in the non-ipa
>> optimization sequence. Much later, the full-blown pass_complete_unroll
>> is scheduled, and this one does not do any harm.
>>
>> Strangely, this early unrolling pass (tuned to only unroll inner loops)
>> is only enabled at -O3, independently of the -funroll-loops flag.
>>
>> Does anyone remember why it is there, for which platform it is useful,
>> and what the perf regressions would be if we removed it?
> 
> The early loop unrolling pass is very important for removing the
> abstraction penalty in C++ programs that rely on the inliner and
> template metaprogramming instead of implementing manual unrolling.
> 
> In tramp3d, for example, you see (very much simplified, intermediate
> state after some inlining):
> 
> foo (int i, int j, int k)
> {
>   double a[][][];
>   int index[3];
>   const int dX[3] = { 1, 0, 0 };
>   ...
>   for (m=0; m<3; ++m)
>     index[m] = 0;
>   index[0] = i;
>   index[1] = j;
>   index[2] = k;
>   ... a[index[0]][index[1]][index[2]];
>   for (m=0; m<3; ++m)
>     index[m] += dX[m];
>   ... a[index[0]][index[1]][index[2]];
> 
> etc. to access a[i][j][k] and a[i+1][j][k].
> 
> There is an absolute need to unroll these simple loops before
> CSE; otherwise loop optimizations have no chance of optimizing
> anything here.
> 
> Another benchmark that degrades considerably without early
> unrolling is 454.calculix (in fact that one was the reason to
> add this pass).
> 
>> My guess is that it may only do harm: disabling or damaging the
>> effectiveness of the (loop-level) vectorizer and increasing compilation
>> time.
> 
> No, it definitely does not.  But it has one small issue: IIRC it
> sometimes also unrolls an outermost loop. That could be fixed.

Thanks a lot for the quick and detailed response.

It is more difficult than I thought, then :-( We'll think about it some
more and maybe come up with yet another pass ordering proposal, but this
tramp3d code definitely deserves to be processed by graphite AFTER
unrolling+CSE has done its specialization trick.
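
To make the specialization concrete, here is a minimal C sketch of the
effect discussed above. It is not tramp3d or GCC code: the function names,
the fixed extent N, and the self_check helper are hypothetical, chosen only
so that the example compiles on its own. The first function keeps the small
constant-trip-count loops from the simplified excerpt; the second is the
form that complete unrolling followed by constant propagation and CSE could
be expected to reach.

```c
#include <assert.h>

enum { N = 4 };  /* hypothetical array extent, for illustration only */

/* Stand-in for the post-inlining code: the index vector is built and
   updated through small loops with a constant trip count of 3. */
double sum_neighbours_looped(double a[N][N][N], int i, int j, int k)
{
    static const int dX[3] = { 1, 0, 0 };
    int index[3];
    int m;
    double first, second;

    for (m = 0; m < 3; ++m)
        index[m] = 0;
    index[0] = i;
    index[1] = j;
    index[2] = k;
    first = a[index[0]][index[1]][index[2]];   /* a[i][j][k] */

    for (m = 0; m < 3; ++m)
        index[m] += dX[m];
    second = a[index[0]][index[1]][index[2]];  /* a[i+1][j][k] */

    return first + second;
}

/* Hand-written equivalent: once the loops are fully unrolled, the index
   array holds only known expressions and can be eliminated entirely. */
double sum_neighbours_unrolled(double a[N][N][N], int i, int j, int k)
{
    return a[i][j][k] + a[i + 1][j][k];
}

/* Checks that both forms agree on a small, exactly representable data set. */
void self_check(void)
{
    double a[N][N][N];
    int x, y, z;

    for (x = 0; x < N; ++x)
        for (y = 0; y < N; ++y)
            for (z = 0; z < N; ++z)
                a[x][y][z] = x * 16 + y * 4 + z;

    assert(sum_neighbours_looped(a, 1, 2, 3) ==
           sum_neighbours_unrolled(a, 1, 2, 3));
}
```

Whether a given compiler version actually reduces the first form to the
second depends on pass ordering and flags; the sketch only shows the two
end points that the discussion refers to.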

Albert

