This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


auto vectorization in gcc


Hi,

We are planning to implement auto-vectorization targeting vector
extensions such as AltiVec. As far as we understand from past
discussions in the gcc mailing lists
(e.g. http://gcc.gnu.org/ml/gcc-patches/2003-05/msg02438.html),
it looks like:
(a) the tree-SSA branch is preparing the infrastructure for
    auto-vectorization
(b) the task of auto-vectorization will probably be divided between
    the RTL level and the tree level
(c) since the tree-level is machine independent, actual vectorization
    will take place in the RTL level
(d) the tree-SSA branch will perform data-dependence analysis and
    loop transformations to increase vectorizability of loops,
    and somehow pass on the information to the RTL level.

How far is this from what has really been envisioned?
Is it indeed the case that no target-specific information will be
available at the tree level, deferring transformations that rely on
such information to the RTL level (for example, detecting idioms
like "subtract and saturate" if the target supports them, and
unrolling or blocking the iterations according to the vector length)?
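To make the idiom question concrete, here is a hypothetical scalar loop of the kind such a transformation would look for (the function name and shapes are illustrative, not from the discussion): an unsigned "subtract and saturate" that a target-aware vectorizer could map to a single AltiVec instruction (vsububs) when the target provides one.

```c
#include <assert.h>

/* Illustrative sketch: a scalar unsigned subtract-and-saturate loop.
   Each element computes max(a[i] - b[i], 0) without wrapping below
   zero -- the idiom a vectorizer would need to recognize before it
   can select a saturating vector subtract on targets that have one. */
void subtract_saturate(unsigned char *dst, const unsigned char *a,
                       const unsigned char *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (a[i] > b[i]) ? (unsigned char)(a[i] - b[i]) : 0;
}
```

The point is that at the source level this is just a compare and a subtract; only with target knowledge does it become a single packed operation.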


We would like to start making progress on auto-vectorization
right away, and we're trying to figure out what's the best way to do
that. On one hand, we don't want to duplicate work, and we want to
take a path that could take advantage of the infrastructure that is
being developed (tree-SSA branch, rtlopt branch), (and hopefully
would be merged into the main trunk in the future).
On the other hand, we would *not* like to postpone this work until
all the infrastructure is ready, and entire branches are merged.
Therefore, we're hoping it will be possible to break vectorization
into subtasks, whose development will be as independent as possible.

We were thinking of starting to implement the actual vectorization
(or "packing") at the RTL level, relying on the fact that in the future
the tree level will propagate the information on whether a
loop/statement is parallelizable (along with other information, such
as alignment properties). Until then, we can rely on user notations
(similar to the "INLINE" notation for functions), or bypass some of
the analysis using runtime guards and code versioning.
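As a sketch of what "runtime guards and code versioning" could mean in practice (names and the specific guard conditions below are our own illustration, not an existing GCC mechanism): keep two copies of the loop, and branch on a cheap runtime test for the properties the analysis would otherwise have to prove, such as non-overlap and the 16-byte alignment AltiVec requires.

```c
#include <stdint.h>

/* Illustrative sketch of loop versioning.  The guard checks at run
   time that all three arrays are 16-byte aligned and do not overlap;
   if so, control goes to a copy of the loop the compiler is free to
   vectorize, otherwise to the original scalar loop.  Both versions
   compute the same result, so the guard only affects performance. */
void add_arrays(float *dst, const float *a, const float *b, int n)
{
    int aligned = (((uintptr_t)dst | (uintptr_t)a | (uintptr_t)b)
                   % 16) == 0;
    int disjoint = (dst + n <= a || a + n <= dst)
                && (dst + n <= b || b + n <= dst);

    if (aligned && disjoint) {
        /* "Vectorizable" version: safe to pack 4 floats per op.  */
        for (int i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    } else {
        /* Fallback: the original scalar loop, always correct.  */
        for (int i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }
}
```

The duplication costs code size, but it lets packing proceed before precise alignment and dependence analysis is available from the tree level.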
Some limited analysis may be required that will be dropped later once
tree-SSA branch information starts to propagate down, but the
intention is to minimize such duplication as much as possible, and
focus on code that is simple with respect to array references and
access patterns, but maybe more involved in other (machine
dependent?) ways (contains patterns like "subtract and saturate",
etc...).

One thing we're not clear about is on which branch we should start
developing the RTL level part of the vectorization - we'd like to
use the improved loop support in the rtlopt branch, but we'd also
like to benefit from tree-SSA branch as soon as some components
start to become available (array analysis, dependency analysis,
alignment analysis...).

What is your opinion on the correct mode of work here?

After the above questions are cleared out, the remaining major
design decision is which vectorization scheme to follow -
straight-line code vectorization (Superword Level Parallelism (SLP),
http://www.acm.org/pubs/citations/proceedings/pldi/349299/p145-larsen/)
or the classic loop-based vectorization, and how to partition it
between the tree level and the RTL level.
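To illustrate the difference between the two schemes (a toy example of our own, not from the paper): loop-based vectorization packs the same operation across loop iterations, while SLP packs isomorphic statements that already sit next to each other in straight-line code, with no loop analysis at all.

```c
#include <assert.h>

/* Candidate for classic loop-based vectorization: iterations
   i = 0..3 perform the same operation on consecutive elements,
   so they can be packed into one 4-wide vector add.  */
void loop_candidate(float *d, const float *a, const float *b)
{
    for (int i = 0; i < 4; i++)
        d[i] = a[i] + b[i];
}

/* Candidate for SLP: four isomorphic statements on adjacent memory.
   An SLP pass can pack these into the same vector add by matching
   the statements themselves, without reasoning about any loop.  */
void slp_candidate(float *d, const float *a, const float *b)
{
    d[0] = a[0] + b[0];
    d[1] = a[1] + b[1];
    d[2] = a[2] + b[2];
    d[3] = a[3] + b[3];
}
```

Both functions compute the same result; the schemes differ in how the parallelism is discovered, which is why the partitioning between the tree level and the RTL level matters.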

These are preliminary thoughts, intended to continue the
discussion on this subject. We are open to any feedback!

thanks,
dorit

