This is the mail archive of the
mailing list for the GCC project.
Re: auto vectorization in gcc
On Thu, Jul 17, 2003 at 03:45:47PM +0300, Dorit Naishlos wrote:
> We are planning to implement auto-vectorization targeting vector
> extensions such as AltiVec. As far as we understand from past
> discussions in the gcc mailing lists
> (e.g. http://gcc.gnu.org/ml/gcc-patches/2003-05/msg02438.html),
> it looks like:
> (a) the tree-SSA branch is preparing the infrastructure for
> (b) the task of auto-vectorization will probably be divided between
> the RTL level and the tree level
> (c) since the tree-level is machine independent, actual vectorization
> will take place in the RTL level
> (d) the tree-SSA branch will perform data-dependence analysis and
> loop transformations to increase vectorizability of loops,
> and somehow pass on the information to the RTL level.
> How far is this from what has really been envisioned?
> Is it indeed the case that no target specific information will be
> available at the tree level, deferring transformations that rely on
> such information to the RTL level? (for example, detecting idioms
> like "subtract and saturate" if they are supported, and unrolling
> or blocking the iterations according to the vector length, etc)?
While the tree level is probably the right level to add this (and tree-ssa at
that), originally I was thinking of doing it as part of the loop unroller pass.
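
To make the two target-dependent transformations named in the quoted question
concrete, here is a rough sketch in C. It is only an illustration, not GCC
code: VEC_BYTES, sub_sat_u8 and sub_sat_blocked are hypothetical names, and
the 16-byte width is an assumption modeled on an AltiVec register. The scalar
helper is the "subtract and saturate" idiom a vectorizer would have to
recognize, and the outer loop is blocked by the vector length so that each
inner block could map to one vector operation on a target that supports it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width in bytes (hypothetical; 16 matches an AltiVec register). */
#define VEC_BYTES 16

/* Scalar "subtract and saturate" on unsigned bytes:
   the result clamps at 0 instead of wrapping around. */
static inline uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return (a > b) ? (uint8_t)(a - b) : 0;
}

/* The same loop, blocked by the vector length. Each full inner block
   covers VEC_BYTES elements and is the unit a vectorizer could replace
   with a single saturating-subtract instruction, if the target has one.
   The trailing loop is the scalar epilogue for the leftover elements. */
void sub_sat_blocked(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

    for (; i + VEC_BYTES <= n; i += VEC_BYTES)
        for (size_t j = 0; j < VEC_BYTES; j++)
            dst[i + j] = sub_sat_u8(a[i + j], b[i + j]);

    for (; i < n; i++)
        dst[i] = sub_sat_u8(a[i], b[i]);
}
```

Both the idiom and the blocking factor depend on the target, which is why the
question of where target information becomes available matters.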
> We would like to start making progress on auto-vectorization
> right away, and we're trying to figure out what's the best way to do
> that. On one hand, we don't want to duplicate work, and we want to
> take a path that could take advantage of the infrastructure that is
> being developed (tree-SSA branch, rtlopt branch), which will hopefully
> be merged into the main trunk in the future.
> On the other hand, we would *not* like to postpone this work until
> all the infrastructure is ready, and entire branches are merged.
> Therefore, we're hoping it will be possible to break vectorization
> into subtasks, whose development will be as independent as possible.
> We were thinking to start to implement the actual vectorization (or
> "packing") in the RTL level, relying on the fact that in the future
> the tree level will propagate the information on whether a
> loop/statement is parallelizable (along with other information, such
> as alignment properties). Until then, we can rely on user notations
> (similar to the "INLINE" notation for functions), or bypass some of
> the analysis using runtime guards and code versioning.
> Some limited analysis may be required that will be dropped later once
> tree-SSA branch information starts to propagate down, but the
> intention is to minimize such duplication as much as possible, and
> focus on code that is simple with respect to array references and
> access patterns, but may be more involved in other (machine
> dependent?) ways (contains patterns like "subtract and saturate",
If you do it at the tree level, you do have to make sure that it will work
later as you descend into RTL (i.e., do you even have vectors on the particular
machine in question -- what types of restrictions do vectors have, etc.)
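
The "runtime guards and code versioning" fallback mentioned in the quoted text
could look roughly like the sketch below. It is an assumption-laden
illustration rather than anything GCC emits: VEC_BYTES, is_aligned, no_overlap
and add_versioned are hypothetical names, and the 16-byte alignment
requirement is assumed. The idea is to keep two copies of the loop and pick
one at run time, so the compiler does not need to prove alignment or
independence statically.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width / alignment requirement in bytes (hypothetical). */
#define VEC_BYTES 16

/* Runtime guard: is the pointer aligned to the assumed vector width? */
static int is_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_BYTES) == 0;
}

/* Runtime guard: are [p, p+len) and [q, q+len) disjoint?
   Comparing addresses as integers is implementation-defined in C,
   but it is the usual way such a check is written. */
static int no_overlap(const void *p, const void *q, size_t len)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    return a + len <= b || b + len <= a;
}

/* Two versions of the same loop. Returns 1 if the guarded
   (vectorizable) version ran, 0 if the scalar fallback ran. */
int add_versioned(float *dst, const float *x, const float *y, size_t n)
{
    size_t bytes = n * sizeof(float);

    if (is_aligned(dst) && is_aligned(x) && is_aligned(y)
        && no_overlap(dst, x, bytes) && no_overlap(dst, y, bytes)) {
        /* Version 1: the guards hold, so this copy may be packed
           into vector operations without changing the result. */
        for (size_t i = 0; i < n; i++)
            dst[i] = x[i] + y[i];
        return 1;
    }

    /* Version 2: scalar fallback that preserves sequential semantics. */
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
    return 0;
}
```

Once tree-level analysis can prove the guards at compile time, the check and
the fallback copy can simply be dropped.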
> One thing we're not clear about is on which branch should we start
> developing the RTL level part of the vectorization - we'd like to
> use the improved loop support in the rtlopt branch, but we'd also
> like to benefit from tree-SSA branch as soon as some components
> start to become available (array analysis, dependency analysis,
> alignment analysis...).
> What is your opinion on the correct mode of work here?
> After the above questions are cleared up, the remaining major
> design decision is which vectorization scheme to follow -
> straight-line code vectorization (Superword Level Parallelism (SLP))
> or the classic loop based vectorization, and how to partition it
> between the tree-level and the RTL level.
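
As a rough illustration of the two schemes named in the quoted text, the
sketch below shows the kind of source each one targets. The function and type
names (scale_loop, scale_pixel, struct rgba) are hypothetical and chosen only
for this example.

```c
#include <stddef.h>

/* Candidate for classic loop-based vectorization: the same operation
   is applied to consecutive elements across iterations, so several
   iterations can be combined into one vector operation. */
void scale_loop(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}

struct rgba { float r, g, b, a; };

/* Candidate for SLP: there is no loop, only straight-line code.
   The four statements are isomorphic, independent, and touch
   adjacent memory, so they can be packed into one vector multiply. */
void scale_pixel(struct rgba *out, const struct rgba *in, float k)
{
    out->r = k * in->r;
    out->g = k * in->g;
    out->b = k * in->b;
    out->a = k * in->a;
}
```

The loop-based scheme finds parallelism across iterations, while SLP finds it
among the statements of a single basic block.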
> These are preliminary thoughts, intended to continue the
> discussion on this subject. We are open to any feedback!