Bug 19401 - Trivial loop not unrolled
Summary: Trivial loop not unrolled
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.0.0
: P2 enhancement
Target Milestone: 4.1.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, patch
Depends on:
Blocks: 11706
  Show dependency treegraph
 
Reported: 2005-01-12 16:08 UTC by Richard Biener
Modified: 2005-05-07 19:56 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-05-01 04:18:28


Attachments
patch (521 bytes, patch)
2005-01-13 12:21 UTC, Richard Biener
Details | Diff
Enable cunroll with -fpeel-loops, too (343 bytes, patch)
2005-01-20 12:43 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2005-01-12 16:08:00 UTC
We do not unroll the loop in

double foo(double __x)
{
        unsigned int __n = 2;
        double __y = __n % 2 ? __x : 1;

        while (__n >>= 1)
          {
            __x = __x * __x;
            if (__n % 2)
              __y = __y * __x;
          }

        return __y;
}

with -O2 which causes us to emit gratiously worse code
for std::pow(x, 2) than for std::pow(x, 2.0).

We should definitely get this right without -funroll-loops
and all its side-effects.
Comment 1 Andrew Pinski 2005-01-12 16:13:53 UTC
Really -funroll-loops should be added to -O3 by default and maybe a tweewed version to -O2/-Os 
(one which only unroll loops which can be unrolled fully and don't produce more code than the loop 
itself).
Comment 2 Richard Biener 2005-01-12 16:19:27 UTC
In 3.4 one was able to do this by specifying -fpeel-loops and got complete loop
peeling enabled.  In 4.0 this is also the case, but only for the RTL unroller -
the tree unroller is not affected and as such _this_ unrolling does not help
PR11706.  - Just another datapoint.
Comment 3 Andrew Pinski 2005-01-12 16:21:12 UTC
Another trivial loop which is not unrolled but should be:

int f(int x)
{
  int i, x1;
  x1 = x;
  for(i = 0;i<2;i++)
   x1 *= x1;
  return x1;
}
Comment 4 Richard Biener 2005-01-12 16:24:29 UTC
Or stuff often found in C++ libraries:

template <int Dim>
struct Vector
{
  Vector(float init)
  {
    for (int i=0; i<Dim; ++i)
      v[i] = init;
  }
  float v[Dim];
};

for small Dim (<= 4 should make nearly everyone happy) you want all the
dimension-unaware loops to be unrolled.
Comment 5 Gabriel Dos Reis 2005-01-12 17:05:48 UTC
Subject: Re:  Trivial loop not unrolled

"pinskia at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:

| Really -funroll-loops should be added to -O3 by default and maybe a
| tweewed version to -O2/-Os  

I don't think the issue is to have -funroll-loops as default at higher
level. I think it is more about the compiler understanding loops with
constant bounds, just like we handle if(0) or if(1) branches.

-- Gaby
Comment 6 Gabriel Dos Reis 2005-01-12 17:06:34 UTC
Subject: Re:  Trivial loop not unrolled

"rguenth at tat dot physik dot uni-tuebingen dot de" <gcc-bugzilla@gcc.gnu.org> writes:

| Or stuff often found in C++ libraries:
| 
| template <int Dim>
| struct Vector
| {
|   Vector(float init)
|   {
|     for (int i=0; i<Dim; ++i)
|       v[i] = init;
|   }
|   float v[Dim];
| };
| 
| for small Dim (<= 4 should make nearly everyone happy) you want all the
| dimension-unaware loops to be unrolled.

I agree.

-- Gaby
Comment 7 Richard Biener 2005-01-13 12:21:03 UTC
Created attachment 7950 [details]
patch

This patch enables complete unrolling on the tree level by default.
Comment 8 Zdenek Dvorak 2005-01-14 09:04:51 UTC
Other possible patch:

http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00796.html
Comment 9 Richard Biener 2005-01-20 12:43:06 UTC
Created attachment 8006 [details]
Enable cunroll with -fpeel-loops, too

Trivial patch to bring us back to 3.4 usability while at the same time
benefiting from tree level complete unrolling.	First patch is obsoleted by
Zdeneks, not this one.
Comment 10 Richard Biener 2005-01-24 09:43:22 UTC
Another one - matrix multiplication:

/* A [NxM], B [MxP] */
#define DOLOOP(N, M, P) \
void matmul ## N ## M ## P(double *res, const double *A, const double *B) \
{ \
        int i,j,k; \
        for (k=0; k<P; ++k) \
                for (i=0; i<N; ++i) { \
                        double s = 0.0; \
                        for (j=0; j<M; ++j) \
                                s += A[i*M+j] * B[j*P+k]; \
                        res[i*P+k] = s; \
                } \
}

DOLOOP(1, 1, 1)
DOLOOP(2, 1, 2)
DOLOOP(1, 2, 1)
DOLOOP(2, 2, 2)
DOLOOP(1, 3, 1)
DOLOOP(1, 1024, 1)

all up to 2x2 should be profitable to completely unroll.  Be sure to unroll
one-time rolling loops like for the last case.

Zdeneks patch only does not unroll the DOLOOP(2, 2, 2) case at -O2. Good.
Comment 11 CVS Commits 2005-05-06 21:11:34 UTC
Subject: Bug 19401

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rakdver@gcc.gnu.org	2005-05-06 21:11:29

Modified files:
	gcc            : ChangeLog tree-flow.h tree-ssa-loop-ivcanon.c 
	                 tree-ssa-loop.c 

Log message:
	PR tree-optimization/19401
	* tree-flow.h (tree_unroll_loops_completely): Declaration changed.
	* tree-ssa-loop-ivcanon.c (enum unroll_level): New.
	(estimated_unrolled_size): New function.
	(try_unroll_loop_completely, canonicalize_loop_induction_variables,
	tree_unroll_loops_completely): Always unroll loops if the code size
	does not increase.
	* tree-ssa-loop.c (tree_complete_unroll): Indicate whether all
	loops should be unrolled completely.
	(gate_tree_complete_unroll): Run complete unrolling unconditionally.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.8629&r2=2.8630
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.101&r2=2.102
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivcanon.c.diff?cvsroot=gcc&r1=2.11&r2=2.12
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop.c.diff?cvsroot=gcc&r1=2.28&r2=2.29

Comment 12 Andrew Pinski 2005-05-07 19:56:41 UTC
Fixed.