This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [cfg-branch] unroll-new cleanups


> > > Hello.
> > > 
> > > > > To be able to handle overflow, we must at least ensure that the number of
> > > > > unrollings is a power of 2.
> > > > 
> > > > Why?
> > > 
> > > Consider
> > > 
> > > for (i=1;i!=0;i++)
> > >   do_something ();
> > > 
> > > if you unroll this (say) 3 times, in which copy will it exit? It depends on
> > > the size of the range of i's type modulo 3; perhaps it can be handled somehow,
> > > but it is much easier to make the number of unrollings a power of two, because
> > > then that remainder is always 0.
> > 
> > Is this correct?
> >   If I set the number of unrollings to 4 and we unroll the "usual way":
> >
> > for (i=1;i!=0;i+=4)
> > {
> >   do_something ();
> >   do_something ();
> >   do_something ();
> >   do_something ();
> > }
> > 
> > it will never get to zero...
> 
> But we do not unroll it this way, of course. What we currently do
> (after a lot of simplifying) is
> 
> do_something ();
> do_something ();
> do_something ();
> i=4;
> for (;i!=0;i+=4)
>  {
>    do_something ();
>    do_something ();
>    do_something ();
>    do_something ();
>  }
> 
> > I think we should compute that the loop has 0xff iterations (let's count in
> > unsigned chars) and unroll by something that divides 255, let's say 5 times.
> 
> This makes no sense to me, sorry. Could you explain more precisely?

We can avoid peeling in front of the loop and simply do:
for (i=1;i!=0;i+=5)
 {
   do_something ();
   do_something ();
   do_something ();
   do_something ();
   do_something ();
 }

This results in a faster loop with fewer copies overall.  In most cases we can
find a small divisor of niter, so this is possible; in other cases we can do a
little peeling.
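
To make the comparison concrete, here is a small C sketch that counts how often
the body runs in the original loop, in the peeled unroll-by-4 version, and in
the unroll-by-5 version. It assumes i is an 8-bit unsigned type (uint8_t), as
in the "count in unsigned chars" example; do_something here is only a counting
stand-in, and the run_* and check_equivalence names are made up for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the loop body; it only counts how often it is called. */
static unsigned calls;
static void do_something(void) { calls++; }

/* Original loop: i takes the values 1..255 and then wraps to 0. */
static unsigned run_original(void)
{
    uint8_t i;
    calls = 0;
    for (i = 1; i != 0; i++)
        do_something();
    return calls;
}

/* Peel three copies in front, then unroll by 4 (a power of two). */
static unsigned run_peeled_by_4(void)
{
    uint8_t i;
    calls = 0;
    do_something();
    do_something();
    do_something();
    for (i = 4; i != 0; i += 4) {
        do_something();
        do_something();
        do_something();
        do_something();
    }
    return calls;
}

/* Unroll by 5, which divides 255, with no peeling at all. */
static unsigned run_unrolled_by_5(void)
{
    uint8_t i;
    calls = 0;
    for (i = 1; i != 0; i += 5) {
        do_something();
        do_something();
        do_something();
        do_something();
        do_something();
    }
    return calls;
}

/* All three variants must execute the body the same number of times. */
static void check_equivalence(void)
{
    unsigned expected = run_original();
    assert(run_peeled_by_4() == expected);
    assert(run_unrolled_by_5() == expected);
}
```

The unroll-by-5 loop terminates because 1 + 5*51 = 256, which wraps to 0 in
8 bits, so the exit test still sees i == 0 after 51 trips.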

This scheme may, however, be a loss when we have some other requirement on the
number of iterations, let's say we want to consume one cache line at a time, but
at the moment we can't do that at all.

But I still think both schemes can be implemented from the fact that
the loop body will repeat 0xff times.
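
As a rough sketch of what I mean (this is not what unroll_simple_loop does;
struct unroll_plan and plan_unroll are hypothetical names for this mail only):
once niter is known, the number of peeled copies and the number of trips
through the unrolled body follow from one division.

```c
#include <assert.h>

/* Hypothetical description of how a loop with a known iteration count
   would be split into peeled copies plus an unrolled main loop. */
struct unroll_plan {
    unsigned peel;   /* copies of the body placed in front of the loop */
    unsigned trips;  /* iterations of the unrolled loop */
};

/* Split niter body executions into peel + factor * trips. */
static struct unroll_plan plan_unroll(unsigned niter, unsigned factor)
{
    struct unroll_plan plan;
    assert(factor > 0);
    plan.peel = niter % factor;   /* 0 when factor divides niter */
    plan.trips = niter / factor;
    return plan;
}
```

With niter = 0xff, a factor of 4 gives 3 peeled copies and 63 trips, which is
the power-of-two scheme, while a factor of 5 gives no peeling and 51 trips.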

OK, I guess I need to read unroll_simple_loop more closely.

Honza
> 
> Zdenek

