Bug 56688 - static/saved variables prevent loop vectorization.
Summary: static/saved variables prevent loop vectorization.
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.9.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: vectorizer
Reported: 2013-03-22 10:48 UTC by Yuri Rumyantsev
Modified: 2016-07-22 10:35 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-03-22 00:00:00


Description Yuri Rumyantsev 2013-03-22 10:48:29 UTC
Analyzing GCC vectorization on 200.sixtrack from the SPEC2000 suite, we found that only 6 loops are vectorized in the hottest routine (97% of run time). The reason is that a save statement is used. This issue can be illustrated by the following simple example:

	subroutine bar
	implicit real*8 (a-h,o-z)
	parameter (n=700)
	common/my_data/ x1(n), y1(n), z1(n), t1(n)
	save
	do i=1,n
	x = x1(i) - y1(i)
	z1(i) = t1(i) * x
	enddo
	end

and the vectorizer issues the following messages:

t1.f:6: note: ==> examining statement: _6 = my_data.x1[_5];

t1.f:6: note: num. args = 4 (not unary/binary/ternary op).
t1.f:6: note: vect_is_simple_use: operand my_data.x1[_5]
t1.f:6: note: not ssa-name.
t1.f:6: note: use not simple.
t1.f:6: note: vect_model_load_cost: aligned.
t1.f:6: note: vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .
t1.f:6: note: vect_is_simple_use: operand my_data.x1
t1.f:6: note: not ssa-name.
t1.f:6: note: use not simple.
t1.f:6: note: not vectorized: live stmt not supported: _6 = my_data.x1[_5];

Note also that if we comment out the save statement, the loop is vectorized.
Comment 1 Richard Biener 2013-03-22 13:29:57 UTC
The issue is that x is kept live by applying store-motion:

  <bb 3>:
  # prephitmp_26 = PHI <1(2), i.3_14(4)>
  # ivtmp_30 = PHI <700(2), ivtmp_15(4)>
  _5 = (integer(kind=8)) prephitmp_26;
  _6 = _5 + -1;
  _7 = my_data.x1[_6];
  _8 = my_data.y1[_6];
  x.1_9 = _7 - _8;
  _11 = my_data.t1[_6];
  _12 = x.1_9 * _11;
  my_data.z1[_6] = _12;
  i.3_14 = prephitmp_26 + 1;
  ivtmp_15 = ivtmp_30 - 1;
  if (ivtmp_15 == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:
  goto <bb 3>;

  <bb 5>:
  # x_lsm.7_25 = PHI <x.1_9(3)>
  x = x_lsm.7_25;
  i = 701;
  return;

because it appears that 'save' makes all variables global ones.  This kind
of "reduction" is not handled by the vectorizer.  It would be handled
by a pass that re-materializes x_lsm.7_25 from memory and operations
after the loop.  Or by handling the "final" value properly by means
of vector extraction or in the epilogue loop, simply using it, or
forcing at least one iteration of the epilogue loop by adjusting the
number of iterations of the vectorized loop.

I like the last option most ;)
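
A hand-written sketch of that last option, using a C loop of the same shape as the Fortran example (a global scalar assigned in every iteration). This is not what the vectorizer emits; the name foo_forced_epilogue, the temporary z_tmp, and the factor VF = 4 are assumptions made for illustration, and the inner j loop merely stands in for one vector operation:

```c
#define N 1024
#define VF 4   /* assumed vectorization factor, for illustration only */

int x[N], y[N];
int z;

void foo_forced_epilogue (void)
{
  unsigned i = 0;
  unsigned j;
  /* The main loop stands in for the vectorized loop.  Its bound is
     computed from N - 1, so at least one iteration is always left for
     the epilogue, and it never has to produce a value for z.  */
  unsigned main_end = ((N - 1) / VF) * VF;
  for (; i < main_end; i += VF)
    for (j = 0; j < VF; ++j)   /* one "vector" of VF lanes */
      x[i + j] = x[i + j] - y[i + j];

  /* Scalar epilogue: runs between 1 and VF iterations and is the only
     place where the final value of z is computed.  */
  int z_tmp = z;
  for (; i < N; ++i)
    {
      z_tmp = x[i] - y[i];
      x[i] = z_tmp;
    }
  z = z_tmp;
}
```

With N = 1024 and VF = 4 the main loop covers indices 0 to 1019 and the epilogue handles the remaining four, so the value stored to z is the one from the last scalar iteration.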
Comment 2 Richard Biener 2013-03-22 13:31:23 UTC
C testcase:

int x[1024], y[1024];
int z;
void foo (void)
{
  unsigned i;
  for (i = 0; i < 1024; ++i)
    {
      z = x[i] - y[i];
      x[i] = z;
    }
}
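
At the source level, the effect of store motion on this testcase can be pictured roughly as follows. This is a hand-written approximation, not compiler output; foo_after_store_motion is an illustrative name and z_lsm is a hypothetical temporary modeled on x_lsm.7_25 in the dump from comment 1:

```c
int x[1024], y[1024];
int z;

/* Approximation of foo () after store motion: the store to the global z
   is sunk out of the loop, so the loop body only writes a scalar
   temporary whose last value is still needed after the loop.  */
void foo_after_store_motion (void)
{
  unsigned i;
  int z_lsm = z;   /* hypothetical temporary, named after x_lsm in the dump */
  for (i = 0; i < 1024; ++i)
    {
      z_lsm = x[i] - y[i];
      x[i] = z_lsm;
    }
  z = z_lsm;       /* single store after the loop: z_lsm is live here */
}
```

The use of z_lsm after the loop is what the vectorizer reports as a live statement.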
Comment 3 Joost VandeVondele 2013-03-22 14:16:47 UTC
(In reply to comment #1)
> because it appears that 'save' makes all variables global ones.  

But maybe this is a frontend issue? The visibility of x is local to this subroutine, but its lifetime extends over the entire run (so it is different from your variable z in the C testcase).
Comment 4 Richard Biener 2013-03-22 14:24:14 UTC
(In reply to comment #3)
> (In reply to comment #1)
> > because it appears that 'save' makes all variables global ones.  
> 
> But this is maybe a frontend issue ? The visibility of x is local to the this
> subroutine, but its lifetime extends over the entire run (so different to your
> variable z in the C testcase).

No, same for

int x[1024], y[1024];
void foo (void)
{
  static int z;
  unsigned i;
  for (i = 0; i < 1024; ++i)
    {
      z = x[i] - y[i];
      x[i] = z;
    }
}

(the missed optimization is that the variable and the store to it are not
removed completely).
Comment 5 Richard Biener 2013-03-22 14:25:15 UTC
Testcase for that:

void foo(int i)
{
  static int x;
  x = i;
}
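
For the loop variant from comment 4, the fully optimized form would presumably have no static variable at all. The sketch below compares the two on identical inputs. It is only about observable behavior, not about what GCC generates: the function names are illustrative, and the arrays are passed as pointers (unlike the testcase) purely so both variants can run on the same data:

```c
#define N 1024

/* Loop from comment 4, parameterized over the arrays.  z is written in
   every iteration and read only within that same iteration.  */
void foo_static (int *x, const int *y)
{
  static int z;
  unsigned i;
  for (i = 0; i < N; ++i)
    {
      z = x[i] - y[i];
      x[i] = z;
    }
}

/* Hand-simplified equivalent: the static variable and the store to it
   are gone, since nothing can observe them.  */
void foo_simplified (int *x, const int *y)
{
  unsigned i;
  for (i = 0; i < N; ++i)
    x[i] = x[i] - y[i];
}
```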
Comment 6 Alan Lawrence 2015-06-12 14:14:50 UTC
(In reply to Richard Biener from comment #4)

The C testcase vectorizes on gcc 6 development (at -O3 on aarch64 or x86_64)....
Comment 7 Yuri Rumyantsev 2016-07-20 15:09:48 UTC
I checked that the GCC 7 compiler still does not vectorize loops in the thin6d function, which is the hottest function in the 200.sixtrack benchmark.
Comment 8 Yuri Rumyantsev 2016-07-22 10:35:00 UTC
I checked that if we comment out the 'save' statement in thin6d.f, all loops are vectorized:
grep -c 'LOOP VECTORIZED' thin6d.f.149t.vect
32