machine learning for loop unrolling

Kenneth Hoste kenneth.hoste@elis.ugent.be
Fri Jun 8 19:04:00 GMT 2007


On 08 Jun 2007, at 16:31, Stefan Ciobaca wrote:

> Hello everyone,
>
> For my bachelor thesis I'm modifying gcc to use machine learning to
> predict the optimal unroll factor for different loops (inspired from
> this paper: http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS- 
> TR-938.pdf).

Interesting. I'm using evolutionary algorithms for similar purposes  
in my current research...

> <snip>
>
> Of course, not all of these are relevant to gcc. I'm looking at ways
> to compute some of these features, hopefully the most relevant ones.
> If there is already some work done that I could use in order to
> compute some of these features, I'd be glad if you could tell me about
> it. Also, if anyone can think of some useful features, related to the
> target architecture or the loops structure, I'd be glad to hear about
> them.

I'm afraid I can't help here, I'm not familiar at all with GCCs  
internals.

>
> Also, I'm interested in some benchmarks. Many of the research papers
> that describe compiler optimizations use the SPEC* benchmarks, but
> these are not free, so I'm looking for alternatives. Currently I'm
> looking into:
>
> - OpenBench
> - Botan
> - CSiBE
> - Polyhedron
>
> (thanks to richi of #gcc for the last 3)
>
> Do you know any other one that would be better?

But I can help here. Polyhedron is Fortran-only, but are well-suited  
for timing experiments (i.e. they run long enough to have reasonable  
running times, but aren't too long either).
CSiBE is more targetted to code size, I believe the runtimes are  
ridicously small. I'm not familiar with the other two.
Some other possibilities:

* MiDataSets (also fairly small when run only once, but the suite  
allows you to adjust the outer loop iteration count to increase  
runtimes) [http://sourceforge.net/projects/midatasets]
* MediaBench / MediaBench II: multimedia workloads, which typically  
iterate over frames for example [http://euler.slu.edu/~fritts/ 
mediabench/]
* BioMetricsWorkload [http://www.ideal.ece.ufl.edu/main.php?action=bmw]
* BioPerf: gene sequence analysis, ... [http://www.bioperf.org/]
* some other benchmarks commonly used when testing GCC [http:// 
www.suse.de/~gcctest]

I've been using the above with GCC and most work pretty well (on x86).

>
> Here is how I'm thinking of conducting the experiment:
>
> - for each innermost loop:
>   - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
> measure the time the benchmark takes
>   - write down the loop features and the best unroll factor
> - apply some machine learning technique to the above data to determine
> the correlations between loop features and best unroll factor

Any idea which? There's a huge number of different techniques out  
there, choosing an appropiate one is critical to success.

> - integrate the result into gcc and measure the benchmarks again

When using machine learning techniques to build some kind of model, a  
common technique is crossvalidation.
Say you have 20 benchmarks, no matter which ones. You use the larger  
part of those (for example 15) to build the model (i.e. determine the  
correlations between loop features and best unroll factor), and then  
test performance of that on the other ones. The important thing is  
not to use the benchmarks you test with when using the machine  
learning technique. That way, you can (hopefully) show that the stuff  
you've learned should work for other programs too.

>
> Do you think it is ok to only consider inner-most loops? What about
> the unroll factors? Should I consider bigger unroll factors? Do you
> think the above setup is ok?

I'd say: don't be afraid to try too much. Try insane unroll factors  
too, just for testing purposes. You don't want to limit yourself to  
32x when the optimal could be 64x for example.

>
> I welcome any feedback on this.

Who is your advisor on this? Where are you doing your bachelors thesis?

greetings,

Kenneth

-- 

Computer Science is no more about computers than astronomy is about  
telescopes. (E. W. Dijkstra)

Kenneth Hoste
ELIS - Ghent University
email: kenneth.hoste@elis.ugent.be
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste




More information about the Gcc mailing list