This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: machine learning for loop unrolling
- From: Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>
- To: Stefan Ciobaca <stefan dot ciobaca at gmail dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Sat, 9 Jun 2007 02:01:27 +0200
- Subject: Re: machine learning for loop unrolling
- References: <165ddd80706080630y5579dc89u39867f649b18c72e@mail.gmail.com>
Hello,
> The number of floating point ops. in loop body.
> The number of memory ops. in loop body.
> The number of operands in loop body.
> The number of implicit instructions in loop body.
> The number of unique predicates in loop body.
> The number of indirect references in loop body.
> The number of uses in the loop.h
> The number of defs. in the loop.
you have to scan insns in loop body, and check what they do
for this.
> The number of parallel "computations" in loop.
> The estimated latency of the critical path of loop.
> The estimated cycle length of loop body.
> The max. dependence height of computations.
> The max. height of memory dependencies of computations.
> The max. height of control dependencies of computations.
> The average dependence height of computations.
> The min. memory-to-memory loop-carried dependence.
> The number of memory-to-memory dependencies.
This is a bit more difficult; I guess you could persuade scheduler to
give you this information, but I have no idea how exactly. See
modulo-sched.c, it considers similar kind of information.
> The language (C or Fortran).
You may check name in langhooks.
> The tripcount of the loop (-1 if unknown).
See find_simple_exit and related functions in loop-iv.c.
> Here is how I'm thinking of conducting the experiment:
>
> - for each innermost loop:
> - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
> measure the time the benchmark takes
> - write down the loop features and the best unroll factor
> - apply some machine learning technique to the above data to determine
> the correlations between loop features and best unroll factor
> - integrate the result into gcc and measure the benchmarks again
>
> Do you think it is ok to only consider inner-most loops?
We do not unroll non-innermost loops at the moment, so if you want to
test non-innermost loops, you would probably have to extend some loop
manipulation functions (loop unrolling was written to work for
non-innermost loops as well, but it was not well tested and thus it is
very likely buggy).
> What about
> the unroll factors? Should I consider bigger unroll factors?
I think unroll factors over 32 should not give you much more gain,
but you will see what results you will get.
> Do you think the above setup is ok?
I am somewhat skeptical; I would expect the results to be quite
target-dependent, and also to differ from program to program. Thus,
it may be somewhat hard to derive a useful general heuristics from
them. But I would be very happy to be proven wrong :-)
Zdenek