This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: auto vectorization in gcc
- From: Daniel Berlin <dberlin at dberlin dot org>
- To: Dorit Naishlos <DORIT at il dot ibm dot com>
- Cc: Richard Henderson <rth at redhat dot com>, dje at watson dot ibm dot com, dnovillo at redhat dot com, aldyh at redhat dot com, law at redhat dot com, joern dot rennecke at superh dot com, gcc-mail at the-meissners dot org, gcc at gcc dot gnu dot org
- Date: Mon, 21 Jul 2003 12:45:42 -0400 (EDT)
- Subject: Re: auto vectorization in gcc
- References: <OFE94F7A16.A7B6BDCE-ONC2256D6A.003FE79D-C2256D6A.0041459C@telaviv.ibm.com>
On Mon, 21 Jul 2003, Dorit Naishlos wrote:
>
> Thanks very much for the responsiveness!
>
> > The tree level is *more* capable than the rtl level at representing
> > vector types (and thus operations). I think all we need is some
> > small amount of info from the target about vector widths and memory
> > blocking, and then the transformation should happen at the tree level.
>
> I wonder if the target info that you suggest to expose to the tree level
> would suffice. In many cases code sequences that are perfectly
> parallelizable with respect to data dependences will not benefit from
> vectorization. In order to avoid making really poor decisions, you want
> to have at least the following information exposed:
The tree-ssa level should be viewed as a mid-level rather than a high-level
representation, since it is a lowered form of the high-level,
language-specific trees.
Other compilers perform these optimizations on a form much like tree-ssa,
*not* on a low-level, RTL-like form. Thus, if target-specific information
is necessary to perform these optimizations in a "good" manner, that
information should be made available at this level.
The answer is not to pass more information down to a lower level, which
necessarily loses information, but instead to provide the necessary
target-specific information to the higher level, which can use it
*without* any loss of information.
In short, if tree-ssa level optimizations need target information in
order to perform effectively, that information should be provided to them.
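To make the vector-width point concrete, here is a minimal C sketch, assuming a hypothetical target_vector_info structure (not an actual GCC interface) that hands the target's vector register width to a tree-level pass. The pass derives a vectorization factor from it and strip-mines a simple loop into a vector-width body plus a scalar epilogue; the inner lanes loop only stands in for a real vector instruction.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL description of the target's SIMD unit, as a tree-level
   vectorizer might receive it. This is NOT a real GCC interface. */
struct target_vector_info {
    size_t vector_bytes;   /* width of one vector register, e.g. 16 for 128 bits */
};

/* Vectorization factor: how many elements of elem_bytes fit in one
   vector register. Returns 1 (i.e. "don't vectorize") if none fit. */
static size_t vectorization_factor(const struct target_vector_info *info,
                                   size_t elem_bytes)
{
    assert(elem_bytes > 0);
    size_t vf = info->vector_bytes / elem_bytes;
    return vf > 0 ? vf : 1;
}

/* c[i] = a[i] + b[i], strip-mined by the vectorization factor.
   The inner "lane" loop stands in for one vector add; the trailing
   loop is the scalar epilogue for the n % vf leftover elements. */
static void add_arrays(const struct target_vector_info *info,
                       const int *a, const int *b, int *c, size_t n)
{
    size_t vf = vectorization_factor(info, sizeof(int));
    size_t i = 0;
    for (; i + vf <= n; i += vf)               /* vector body */
        for (size_t lane = 0; lane < vf; lane++)
            c[i + lane] = a[i + lane] + b[i + lane];
    for (; i < n; i++)                         /* scalar epilogue */
        c[i] = a[i] + b[i];
}
```

With a 16-byte vector unit and 4-byte ints, for example, the factor is 4, so a 7-element add runs one 4-wide body iteration followed by 3 scalar epilogue iterations. The decision is made where the loop and its element types are still fully visible.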
To give another example, we can't perform cache-blocking optimizations
effectively without knowing something about the target's cache structure.
This does not mean we should either perform these optimizations on RTL
(which is well-nigh certain death), or pass information about which loops
should be cache-blocked down to the RTL level, which can't make good use
of it anyway. The proper solution is to provide information about the
target's cache structure to the higher level.
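As an illustration of the cache-blocking case, the following C sketch assumes a hypothetical target_cache_info structure carrying the L1 data cache size (again, not a real GCC interface) and uses it to choose a tile size for a blocked matrix multiply. The tile heuristic, keeping one tile each of A, B and C resident in L1 at once, is a common textbook choice, not necessarily what any particular compiler does.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL description of the target's data cache, as a tree-level
   loop optimizer might receive it. NOT a real GCC interface. */
struct target_cache_info {
    size_t l1_bytes;   /* size of the level-1 data cache in bytes */
};

/* Largest tile edge t such that three t x t tiles of doubles (one each
   from A, B and C) fit in L1 together. Never returns 0. */
static size_t choose_tile(const struct target_cache_info *cache)
{
    size_t t = 1;
    while (3 * (t + 1) * (t + 1) * sizeof(double) <= cache->l1_bytes)
        t++;
    return t;
}

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* C += A * B for n x n row-major matrices, with all three loops tiled. */
static void matmul_blocked(const struct target_cache_info *cache,
                           const double *A, const double *B, double *C,
                           size_t n)
{
    size_t t = choose_tile(cache);
    assert(t >= 1);   /* guarantees the tile loops always advance */
    for (size_t ii = 0; ii < n; ii += t)
        for (size_t kk = 0; kk < n; kk += t)
            for (size_t jj = 0; jj < n; jj += t)
                /* one tile triple; sized to stay cache-resident */
                for (size_t i = ii; i < min_sz(ii + t, n); i++)
                    for (size_t k = kk; k < min_sz(kk + t, n); k++) {
                        double aik = A[i * n + k];
                        for (size_t j = jj; j < min_sz(jj + t, n); j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```

For a 32 KiB L1 and 8-byte doubles this picks 36 x 36 tiles. The point is that the choice needs the cache description at the level where the loop nest is still visible, not after lowering to RTL.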
Doing auto-vectorization at the RTL level is a bad idea.
No compiler I know of attempts to perform these high-level optimizations
at such a low level by passing information down to it.
It's a recipe for a mess.
--Dan