This is the mail archive of the
mailing list for the GCC project.
Re: auto vectorization in gcc
- From: "Dorit Naishlos" <DORIT at il dot ibm dot com>
- To: law at redhat dot com
- Cc: Daniel Berlin <dberlin at dberlin dot org>, David Edelsohn <dje at watson dot ibm dot com>, Diego Novillo <dnovillo at redhat dot com>, gcc at gcc dot gnu dot org, Richard Henderson <rth at redhat dot com>
- Date: Mon, 21 Jul 2003 16:54:56 +0300
- Subject: Re: auto vectorization in gcc
>I suspect if you look hard at trying to do even the simple vectorization
>algorithms such as SLP on RTL you're going to find that it's exceedingly
>difficult. You've got to deal with the alignment issues, memory dependency
>analysis, etc. I looked pretty hard at this a few years back.
That's exactly why I want the tree level to provide me with
all that information. Apply all the array, alignment and
dependence analyses at the tree level, and then all you need
to do at the RTL level is unroll and pack loops/statements
that have been marked as vectorizable.
Based on your experience from looking into this issue, and
assuming that it is feasible to develop a mechanism to
propagate and maintain this information (is it...?),
are there other problems that you foresee with respect to
implementing the remaining transformations (unroll and pack)
at the RTL level?
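
To make "unroll and pack" concrete, here is a minimal sketch in C. It is
an illustration only, not GCC's implementation: the names add_scalar and
add_packed are hypothetical, and the v4sf typedef uses the GCC/Clang
vector_size extension as a stand-in for a V4SF vector type. It assumes an
earlier analysis has already shown the loop to be free of loop-carried
dependences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four packed floats. */
typedef float v4sf __attribute__((vector_size(16)));

/* The scalar loop as the programmer wrote it. */
void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* The same loop after unrolling by 4 and packing the four adds.
   Assumption: dst does not overlap a or b in a way that would
   create a dependence between iterations. */
void add_packed(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf va, vb, vd;
        memcpy(&va, a + i, sizeof va);   /* load that is safe for any alignment */
        memcpy(&vb, b + i, sizeof vb);
        vd = va + vb;                    /* one packed add replaces four scalar adds */
        memcpy(dst + i, &vd, sizeof vd);
    }
    for (; i < n; i++)                   /* scalar epilogue for leftover elements */
        dst[i] = a[i] + b[i];
}
```

The memcpy loads sidestep the alignment question on purpose; whether an
aligned vector load can be used instead is exactly the information the
earlier analysis would have to supply.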
To: David Edelsohn <firstname.lastname@example.org>
cc: Daniel Berlin <email@example.com>, Richard Henderson <firstname.lastname@example.org>, Dorit Naishlos/Haifa/IBM@IBMIL, email@example.com, Diego Novillo
Date: 17/07/2003 21:25
Subject: Re: auto vectorization in gcc
In message <200307171815.OAA28902@makai.watson.ibm.com>, David Edelsohn writes:
>>>>>> Daniel Berlin writes:
>Dan> Especially since it's a matter of just making tree nodes with a
>Dan> type, and performing standard tree ops on them.
>Dan> IE PLUS_EXPR of two trees with types of V4SF_type_node should work
>Dan> fine, and convert to the right vector RTL.
>Dan> If it doesn't, we should make it work.
>I believe part of the concern is alignment issues. If the [code]
>was not originally written with vector types, we cannot guarantee that
>objects will have the appropriate alignment for SIMD instructions.
But if you're working at the tree level, then you have the opportunity
to fix the alignment of some objects. It's one of the many advantages
of working at the tree level.
>We may not know this until the storage layout occurs during RTL [generation].
You've got a lot more control over alignment of objects at the tree
level than you do at the RTL level -- largely because you haven't allocated
stack space for the objects when working at the tree level (say for local
arrays). Thus you get the chance to increase the alignment *before*
you put the object onto the stack.
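
As a source-level analogy for raising a local array's alignment before it
is placed on the stack, here is a small C11 sketch. The function names
are hypothetical, and alignas is only a stand-in for what the compiler
would do internally on the declaration; this is not how GCC implements
it.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Returns nonzero if p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* The alignment request is attached to the declaration, so it is known
   before any stack slot is chosen for buf. */
float sum_local(void)
{
    alignas(16) float buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    assert(is_aligned16(buf));   /* holds because of the alignas above */

    float s = 0.0f;
    for (int i = 0; i < 8; i++)
        s += buf[i];
    return s;
}
```

Without the alignas, a float array is only guaranteed 4-byte alignment on
typical targets, which is the situation the quoted concern describes.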
>[If] we have committed to auto-vectorization in the trees and possibly skipped
>some scalar loop optimizations, how do we roll back once we determine
>the SIMD instructions cannot be generated efficiently?
You can always break a SIMD instruction down to its components. In the
extreme case you could re-roll the loop if the loop unrolling is
> The issue isn't that we *have to* do auto-vectorization in
>but how do we deal with reality intervening at later compilation stages?
No, quite the opposite. It's bloody hard to do in RTL and a hell of a
lot easier to do on trees.
I suspect if you look hard at trying to do even the simple vectorization
algorithms such as SLP on RTL you're going to find that it's exceedingly
difficult. You've got to deal with the alignment issues, memory dependency
analysis, etc. I looked pretty hard at this a few years back.
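
For readers unfamiliar with SLP (superword level parallelism): the idea
is to pack independent, isomorphic statements from straight-line code
into one vector operation. A minimal sketch in C follows; struct rgba and
both function names are hypothetical, and the v4sf typedef uses the
GCC/Clang vector_size extension. It shows the before and after of the
transformation, not the analysis needed to prove it legal.

```c
#include <assert.h>

/* GCC/Clang vector extension: four packed floats. */
typedef float v4sf __attribute__((vector_size(16)));

struct rgba { float r, g, b, a; };

/* Four isomorphic statements with no dependences between them. */
struct rgba scale_scalar(struct rgba c, struct rgba k)
{
    struct rgba out;
    out.r = c.r * k.r;
    out.g = c.g * k.g;
    out.b = c.b * k.b;
    out.a = c.a * k.a;
    return out;
}

/* The same computation after packing the four statements. */
struct rgba scale_packed(struct rgba c, struct rgba k)
{
    /* Lane-count assumption: the vector holds exactly four floats. */
    assert(sizeof(v4sf) == 4 * sizeof(float));

    v4sf vc = { c.r, c.g, c.b, c.a };
    v4sf vk = { k.r, k.g, k.b, k.a };
    v4sf vo = vc * vk;                 /* one packed multiply */

    struct rgba out = { vo[0], vo[1], vo[2], vo[3] };
    return out;
}
```

Here the operands are plain struct fields, so nothing has to be proven
about memory. When they are loads and stores through pointers, the
packing is only legal after the alignment and dependence questions
raised above have been answered.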