Bug 22184 - tree vectorizer depends on context
Summary: tree vectorizer depends on context
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.1.0
: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
  Show dependency treegraph
 
Reported: 2005-06-25 15:13 UTC by Michael Cieslinski
Modified: 2012-07-13 14:02 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-12-30 22:24:41


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Cieslinski 2005-06-25 15:13:17 UTC
Consider the following short program. If it is compiled MyFunc1 is vectorized 
but MyFunc2 is not. Both fuctions differ only in the empty loop, which is a 
comment in MyMunc2. My compiler is the latest gcc41 snapshot (20050618)

Michael Cieslinski


double MyFunc1 (int size)
{
    int len = size + 1;
    double Data[16] = {0};
//    for (int i=3; i<len; i++) {}  // empty loop

    for (int i=0; i<len; i++)
        Data[i] = Data[i]>=0 ? Data[i] : -Data[i];

    return Data[1];
}

double MyFunc2 (int size)
{
    int len = size + 1;
    double Data[16] = {0};
    for (int i=3; i<len; i++) {}  // empty loop

    for (int i=0; i<len; i++)
        Data[i] = Data[i]>=0 ? Data[i] : -Data[i];

    return Data[1];
}

Output from gcc:
g++41c -O2 -ftree-vectorize -c vec-test.cpp -ftree-vectorizer-verbose=5

vec-test.cpp:7: note: accesses have the same alignment.
vec-test.cpp:7: note: LOOP VECTORIZED.
vec-test.cpp:1: note: vectorized 1 loops in function.

vec-test.cpp:17: note: not vectorized: unsupported data-type
vec-test.cpp:19: note: not vectorized: number of iterations cannot be computed.
vec-test.cpp:13: note: vectorized 0 loops in function.

g++41c -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.1-20050618/configure --prefix=/usr/local/gcc41c --
program-suffix=41c --with-arch=opteron --enable-languages=c,c++ --enable-
checking
Thread model: posix
gcc version 4.1.0 20050618 (experimental)
Comment 1 Andrew Pinski 2005-06-25 18:28:42 UTC
Confirmed, I think this is a bug in scaler evolution though.
Comment 2 Ira Rosen 2005-07-07 07:47:45 UTC
The problem occurs in decision whether the number of loop iterations is greater 
than zero. The (single) predecessor edge is checked for being EDGE_TRUE_VALUE 
or EDGE_FALSE_VALUE, and the corresponding predicate is used to make the 
decision. 

In the first case (single loop)  BB 0 contains predicate 'len>0', its TRUE 
successor is BB 1, and the fallthru successor of BB 1 is BB 2 - the loop. The 
condition to check is 'len <= 0', which is therefore simplified to FALSE. 

In the second case, however, the control flow is more complicated. The loop is 
in BB 6, its predecessor is BB 3, which has 2 predecessors: BB 5 (with 
predicate 'len > 0'), and BB 2 - the first loop. The first loop is also guarded 
by 'len  > 0', but this information is not propagated.
Comment 3 Zdenek Dvorak 2005-07-07 10:52:43 UTC
More precisely, the code when it comes to loop optimizer looks basically as

if (len > 3)
  something;
else if (len > 0)
  something_else;
else
  return;

for (i = 0; i < len; i++)
  whatever;

So indeed, len > 0 on each path that reaches the loop, but it is not trivial to 
deduce.

One solution is to use the results of VRP here. However, VRP uses scev 
analysis, and adding yet another cycle into the current completely 
incomprehensible nest of dependences between # of iterations analysis and SCEV 
seems quite scary to me (and there are other technical difficulties with this 
solution -- running VRP twice is bad from efficiency reasons, but keeping 
results of VRP valid is not entirely trivial and may prevent some 
optimizations).

The other possibility is to extend the current oracle-like approach (# of 
iterations analysis basically asks whether a given condition -- len > 0 in this 
case -- is true, and answer is computed on-demand by traversing SSA and 
dominance tree) to handle also this case.  This would however need to have the 
ASSERT_EXPRs (so we either would need to insert them before loop optimizer, or 
keep them valid since VRP, neither of which seems to be good for performance 
and memory consumption).
Comment 4 Richard Biener 2012-07-13 08:36:08 UTC
Link to vectorizer missed-optimization meta-bug.
Comment 5 Richard Biener 2012-07-13 14:02:17 UTC
This is fixed, even if I disable VRP / jump-threading and substitute the info
from comment #3 in the code:

double MyFunc2 (int size)
{
  int len = size + 1;
  double Data[16] = {0};
if (len > 3)
  ;
else if (len > 0)
  ;
else
  return 0;

  for (int i=0; i<len; i++)
    Data[i] = Data[i]>=0 ? Data[i] : -Data[i];

  return Data[1];
}