This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
- From: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- To: Diego Novillo <dnovillo at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org, sebastian dot pop at cri dot ensmp dot fr
- Date: Tue, 3 Oct 2006 15:44:08 +0200
- Subject: Re: Autoparallelization
- References: <20060927210927.GA30121@atrey.karlin.mff.cuni.cz> <email@example.com>
> > I have committed the following patch to the parloop branch. It implements
> > automatic parallelization for simple loops; at the moment, code
> > generation only creates a static schedule into a fixed number of threads
> > (specified at compile time). This should more or less be the version
> > I would like to get merged into 4.3 (I probably won't have time to
> > implement more features in the following few weeks).
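The static schedule mentioned above can be sketched as follows -- a minimal illustration of splitting a loop's iterations evenly across a fixed thread count, with the function name and remainder-handling policy being my assumptions, not the code the pass actually emits:

```c
#include <assert.h>

/* Compute the half-open iteration range [*start, *end) that thread
   `tid` of `nthreads` executes under a simple static schedule of
   `niters` loop iterations.  The first `niters % nthreads` threads
   get one extra iteration, so every iteration is covered exactly
   once with no overlap.  */
static void
static_schedule (long niters, int nthreads, int tid,
		 long *start, long *end)
{
  long chunk = niters / nthreads;
  long rem = niters % nthreads;

  if (tid < rem)
    {
      *start = tid * (chunk + 1);
      *end = *start + chunk + 1;
    }
  else
    {
      *start = rem * (chunk + 1) + (tid - rem) * chunk;
      *end = *start + chunk;
    }
}
```

Because the schedule is fixed at compile time, no runtime work-sharing bookkeeping is needed; each thread can compute its own bounds independently.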
> I have started going through the code. One question regarding the
> approach: Why didn't you folks generate OMP_PARALLEL/OMP_FOR? The
> intent was for auto-parallelization passes to target the OMP IL, not
I thought about this and considered it a bit easier to start by
targeting libgomp directly, at least for the initial implementation.
Note that this does not necessarily mean much code duplication -- we still
mostly use the same functions as the omp lowering pass (with a few extra
wrappers in tree-parloop.c to ensure that the data structures used
during the optimizations, most importantly loop structures and
dominators, are preserved).
There is some duplicated functionality in the loop schedule generation,
which is something I would like to fix; ideally, we would use
exactly the same functions for the code generation as the
expansion code for OMP_FOR. If this were possible, it would not make
much sense to me to generate the OMP_PARALLEL/OMP_FOR statements just to
lower them immediately, considering both the compile-time overhead --
an extra lower_omp pass would be necessary -- and the complexity of the
changes to the omp lowering code needed to make it work during
optimizations (many of them to the parts that would not be used by the
parallelization