This is the mail archive of the
mailing list for the GCC project.
Re: [tree-ssa] vectorizer related issues
- From: law at redhat dot com
- To: Dorit Naishlos <DORIT at il dot ibm dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 06 Jan 2004 00:20:00 -0700
- Subject: Re: [tree-ssa] vectorizer related issues
- Reply-to: law at redhat dot com
In message <OF84E7BD0B.816A0013-ONC2256DFE.002C8227-C2256DFE.0047E4E4@il.ibm.co
m>, Dorit Naishlos writes:
>(1) Array addressing using pointers.
> As illustrated in the example above (loop2.c), the basic approach for
>handling array accesses is to generate a pointer to a vector type ('pb'),
>have it point to the array base ('pb = (v8hi *)b'), and use that to access
>the array ('vb = pb[i]'). I was hoping to get away with generating an
>ARRAY_REF expression for pb[i], but the RTL expander didn't like that. So I
>generated the following sequence of stmts instead, inspired by what the
>compiler currently generates at the tree level for the C stmt 'vb = pb[i]':
>pb = &b
>T.1 = (<unnamed type>)i_1;
>T.2 = T.1 * 16;
>T.3 = pb + T.2;
>vb = *T.3;
>It's too bad that we loose the information that these accesses were simple
>array references. Maybe there's a simple way to convince the expander to
>digest the expression pb[i] (Q1.1: is there?). What's much more worrying is
>that we loose this information even before vectorization takes place. This
>goes back to the issue mentioned in
>http://gcc.gnu.org/ml/gcc/2003-07/msg02013.html. With a lot of effort the
>vectorizer could deal with pointer arithmetic, but I think this is exactly
>the kind of effort we should save, working at the tree level.
>Is anyone planning to fix this situation? i.e., Q1.2: have the front-end
>represent p[i] as an array_ref even when p is declared a pointer, and Q1.3:
>have the RTL expander accept such array references.
>This is probably the most critical infrastructure fix required for the
>vectorizer to be applicable in the short term. If gcc continues to
>represent array references this way, it would be pretty useless to
>vectorize specifically for array accesses. Good handling of pointer
>arithmetic and pointer aliasing (which have well-known limitations) will
>then be vital.
Richard has been working on a lot of this stuff. I wouldn't be terribly
surprised to find out that we're still missing cases.
If you don't build the ARRAY_REF precisely how the rest of the compiler
expects, then all kinds of bad things happen. I ran into this when
trying to stop the compiler from building pointer arithmetic for
I'd suggest focusing on fixing whatever issues you run into when trying
to generate ARRAY_REFs for stuff like bv = pb[i]. Probably the first
step is to stop declaring pb as a pointer type since that's what ultimately
triggers the creation of pointer arithmetic instead of ARRAY_REFs in the
early parts of the compiler.
[ If it's the case that Richard's work was supposed to stop this lowering
from ARRAY_REF to pointer arithmetic, then your job would be to figure
out why Richard's fixes aren't solving your problem. See
>(2) virtual defs/uses.
> This issue goes back to an action item on the tree-ssa todo list:
>"SSA information for arrays
> The existing implementation treats arrays as an opaque object. A
>definition to an array location is treated as a definition for the whole
>Q2.1: Is anyone planning to address this issue? (so that the vdefs/vuses of
>arrays will be connected only to may/must aliased references?)
>It's not a major obstacle at the moment because I'm simply ignoring the
>virtual defs/uses, as they are currently not very informative. What I do
>when I analyze cross-iteration def-use cycles, is examine the def-use cycle
>related to each of the loop phi's. Virtual phi's are ignored, because I
>don't know whether they represent a real def-use cycle or not; I rely on
>other analysis passes to check the data references that are responsible for
>virtual defs-uses. For now, the only form of non-scalar data access I allow
>is array references. So the analysis of the loop scalar phi's together with
>array data dependence tests, suffice. In the future, when other forms of
>non-scalar accesses are allowed (structure, etc) - this workaround virtual
>defs/uses would need to be revisited.
Note it is possible to tell the difference between PHIs for virtual
operands and those for real operands. I think
is_gimple_reg (PHI_RESULT (phi))
will indicate that a particular PHI is for real operands rather than
I don't think anyone is currently working on changing the way virtual
operands work such that a store to an array memory would no longer be
treated as a write to the whole array.
In general, I don't think that's feasible since that could easily lead to
an explosion of memory tags and virtual operands. I'd be curious to know
what other compilers do for this kind of issue (particularly SGI's since
they must deal with similar issues and our memory tags/virtual operand
scheme has a lot of similarities to what SGI does).
>By the way, (Q2.2:) is there an API that provides a mapping between a
>virtual def/use and the operand it corresponds to? I guess in GIMPLE
>finding this mapping is pretty straightforward because the grammar doesn't
>allow a lot of flexibility; For example, I want to verify that any
>vuse/vdef found corresponds to an array_ref (and not structure or pointer
Not that I'm aware of. We haven't needed it so we haven't expanded the
APIs to provide this kind of detail.
I currently do that as follows: if a (GIMPLE) stmt has any vdefs
>(vuses), I check that it has exactly one vdef (vuse), I assume that it
>corresponds to op0 (op1), and I expect that the stmt is of the form: op0 =
>SSA_NAME/CONST (SSA_NAME = op1). Then I verify that op0 (op1) is an
>ARRAY_REF. Is there a better way to do that?
I think this kind of stuff would really need to be gathered during the
get_stmt_operands walk. Otherwise I would expect it to "mostly" work, but
fail in strange and wonderful ways since you're using a heuristic to recover
>Q3.3: I'm assuming that if a stmt has multiple vdefs/vuses then it's either
>a non vectorizable stmt (CALL_EXPR, asm) or there are aliasing problems
>which also prevent vectorization. Are there cases involving stmts with
>multiple vdefs/vuses that are relevant for vectorization?
I'd expect there are -- if for no other reason than we generate far more
virtual operands than we actually need to :(
>(3) loop invariants, dead code, redundant code:
> The general approach taken by the vectorizer when generating vector
>stmts, is to insert the vector code sequence corresponding to a certain
>(scalar) stmt X just before stmt X, and remove stmt X only if it has side
>effects (like changing memory). If stmt X does not have side effects, we
>rely on dead code elimination for removing it.
Sounds quite reasonable to me -- and it's what I (and others) would prefer
as a design goal.
>According to this approach, the vectorizer is currently doing the bare
>minimum, relying on subsequent optimization phases to take care of any loop
>invariants, dead code etc.
>The vectorizer can take care of these inefficiency issues by itself. Q3.1:
>should it? It's simply a design decision. Cleaning up may help subsequent
>(tree-level) optimizations and may also save compilation time and memory.
>In any case, the very first version of the vectorizer will probably
>continue to take this approach. If we decide that the vectorizer should
>cleanup after itself, I'll incorporate these optimizations into the next
>versions of the vectorizer.
I'd prefer to keep the cleanups separate. If we find that we're not cleaning
up something, then that's a good indication some other pass needs to be beefed
up. For example, we know we need a good dead store eliminator
(store meaning a store to memory rather than an assignment to a non-addressable
>(5) stmt_info data structure:
> During the analysis phases the vectorizer records some information
>per stmt. For example, a pointer to the vectorized version of each stmt;
>Say stmt1:'b=x[i]' is vectorized into vec_stmt1:'vb=px[T]'. When
>vectorizing the following stmt2:'a=b', the vectorizer has to find the def
>of 'vb'; it does so by first finding stmt1 (via 'SSA_NAME_DEF_STMT(b)'),
>then finding vec_stmt1 from stmt1 (via 'stmt_info(stmt1)'), and finally
>finding the def 'vb' from vec_stmt1.
>This scheme works for operands that are SSA_NAMEs. Operands that aren't,
>are currently limited to array references appearing in load/store
>operations, and are handled differently.
>I've actually used info-per-SSA_NAME so far, but I'm considering to switch
>to an info-per-stmt scheme since (a) I can't use it for non SSA_NAME
>operands (arrays), (b) I had to allocate an array of size
>"next_ssa_version" which is a bit wasteful considering I'm only looking at
>SSA_NAMEs that are used in certain loops, and (c) I expect to have more
>information to record in the future per stmt. My questions are:
>Q5.1: Is there an available field (in stmt/stmt_ann) for recording such
>information? I didn't find a general field, like a pointer to void, that
>each optimization pass could use for it's own purposes. Q5.2: Is it OK to
>add such a field (assuming the answer to Q5.1 is "no")?
Couldn't you put this in a hash table? Our hash tables automagically
expand as they get collisions. Meaning they tend to be reasonably
memory efficient for this kind of thing. With that kind of scheme you
could store whatever you need for use by the vectorizer, but not pollute
the statement annotation.
I wouldn't object strongly to putting a generic pointer into the
statement annotation as a stop-gap, but long term the whole annotation
concept has major problems and IMHO they need to be completely rethought.
If a pass can keep its data in some local data structure not attached to
the annotation it's going to make fixing the annotation mess a lot easier
(and make those passes much less likely to break as we muck around with
the annotation scheme).
>Q5.3: In GIMPLE, are there stmts that have more than one def, and I mean
>vectorizable stmts, not CALL_EXPRs and such? are they conceivable, even if
>not exist today?
Real defs? Hmmm, I would think the only thing that would create multiple
real defs would be calls and asms.
>(6) modeling the target vector support:
> One of the issues that I haven't addressed yet is the modelling of
>the vector support that is available in the target machine. The approach I
>took for now is to generate only vector operations which can be expressed
>using existing scalar tree codes (PLUS, MULT etc), after verifying that
>they are supported by checking the relevant optab at the relevant
>machine_mode (e.g, add_optab->handlers[(int) V8HImode].insn_code). If the
>value found is CODE_FOR_nothing, then there's no target support, and we
>can't vectorize the stmt. Otherwise - the stmt is transformed. The only
>target specific information I'm currently exposing is the size of the
>vector (in bytes) (through a variable in defaults.h which each target can
>define). In the case of loads/stores, for which there's no optab (Q6.1: is
>there?), I assume that the target supports any loads/store of the vector
>size (i.e., if the vector size is 16 bytes, it's possible to load/store 16
>QIs, 8 HIs, 4 SIs/SFs, etc).
>Q6.2: Sounds reasonable so far?
Yes. Though we could also consider vectorizing those loops and allowing
the expanders to break them down into serial code (or smaller vectors)
if the target doesn't directly support the width of the vectors we found.
>There are a lot of vector operations for which this approach is not
>expressive enough (e.g., a dot-product operation and reduction operations
>in general, operations that perform computations on complex numbers,
>conditional operations, and more). Q6.3: Should we add new TREE_CODES (and
>corresponding optabs?) for the new types of operations we want to express?
>Or instead, Q6.4: should the vectorizer generate calls to target specific
>builtin functions? in this case we'll need to decide (Q6.5: ) how to expose
>them to the tree level, and (Q6.6: ) how to address the problem that the
>compiler doesn't attempt to perform any optimizations around target
>specific builtins, not even to SSA their arguments; Or, Q6.7: should we
>introduce new generic builtins that are totally exposed to the compiler?
We're probably going to need new tree nodes for various things in this
space. This isn't totally unexpected.
Note that as we create new nodes for vectorizing purposes we'll need to
update the definition of the gimple grammar.
>A related issue, is how much information to expose to the tree level. We
>want the vectorizer to have enough information to decide whether it's worth
>while to vectorize; Also, some patterns can be detected and transformed
>much more easily at the tree level. Other should be left to the RTL level.
I think we want to expose as little as possible. Ideally we'd expose nothing
about the target to the tree optimizers -- in reality various things are
bound to need to leak through -- but I'd like it kept to a minimum
(some things like BRANCH_COST already leak through much to my annoyance)