Bug 49616 - REGRESSION vectorization fails in case of runtime dimensioned vector
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2011-07-03 09:59 UTC by vincenzo Innocente
Modified: 2012-05-18 09:39 UTC

Last reconfirmed: 2011-07-03 11:30:06


Description vincenzo Innocente 2011-07-03 09:59:37 UTC
GCC 4.7 fails to vectorize some loops that 4.6.1 vectorizes.
The test case is at the end of this message; I was not able to reduce it further (the call that fails is at the bottom).

gcc version 4.7.0 20110702 (experimental) (GCC) 
c++ -O3 -std=c++0x -ftree-vectorizer-verbose=2 -c vectBug.cc 

vectBug.cc:68: note: LOOP VECTORIZED.
vectBug.cc:68: note: LOOP VECTORIZED.
vectBug.cc:71: note: vectorized 2 loops in function.

vectBug.cc:35: note: not vectorized: data ref analysis failed *bs$__b_46[k_320] = D.2547_242;

vectBug.cc:48: note: not vectorized: data ref analysis failed *bs$__b_46[k_187] = D.2556_271;

vectBug.cc:45: note: not vectorized: data ref analysis failed *bs$__b_46[k_70] = D.2553_262;

vectBug.cc:40: note: not vectorized: data ref analysis failed *bs$__b_46[k_317] = D.2550_251;

vectBug.cc:35: note: LOOP VECTORIZED.
vectBug.cc:48: note: LOOP VECTORIZED.
vectBug.cc:45: note: LOOP VECTORIZED.
vectBug.cc:40: note: LOOP VECTORIZED.
vectBug.cc:16: note: LOOP VECTORIZED.
vectBug.cc:28: note: LOOP VECTORIZED.
vectBug.cc:25: note: LOOP VECTORIZED.
vectBug.cc:20: note: LOOP VECTORIZED.
vectBug.cc:16: note: LOOP VECTORIZED.
vectBug.cc:28: note: LOOP VECTORIZED.
vectBug.cc:25: note: LOOP VECTORIZED.
vectBug.cc:20: note: LOOP VECTORIZED.
vectBug.cc:94: note: vectorized 12 loops in function.

whereas with -DFIXED (b declared with a fixed size) all 16 loops are vectorized:
c++ -O3 -std=c++0x -ftree-vectorizer-verbose=2 -c vectBug.cc -DFIXED

vectBug.cc:68: note: LOOP VECTORIZED.
vectBug.cc:68: note: LOOP VECTORIZED.
vectBug.cc:71: note: vectorized 2 loops in function.

vectBug.cc:35: note: LOOP VECTORIZED.
vectBug.cc:48: note: LOOP VECTORIZED.
vectBug.cc:45: note: LOOP VECTORIZED.
vectBug.cc:40: note: LOOP VECTORIZED.
vectBug.cc:35: note: LOOP VECTORIZED.
vectBug.cc:48: note: LOOP VECTORIZED.
vectBug.cc:45: note: LOOP VECTORIZED.
vectBug.cc:40: note: LOOP VECTORIZED.
vectBug.cc:16: note: LOOP VECTORIZED.
vectBug.cc:28: note: LOOP VECTORIZED.
vectBug.cc:25: note: LOOP VECTORIZED.
vectBug.cc:20: note: LOOP VECTORIZED.
vectBug.cc:16: note: LOOP VECTORIZED.
vectBug.cc:28: note: LOOP VECTORIZED.
vectBug.cc:25: note: LOOP VECTORIZED.
vectBug.cc:20: note: LOOP VECTORIZED.
vectBug.cc:94: note: vectorized 16 loops in function.

and 4.6.1 vectorizes all 16 loops even without -DFIXED:
gcc version 4.6.1 20110520 (prerelease) (GCC) 
c++ -O3 -std=c++0x -ftree-vectorizer-verbose=2 -c vectBug.cc

vectBug.cc:68: note: LOOP VECTORIZED.
vectBug.cc:68: note: LOOP VECTORIZED.
vectBug.cc:71: note: vectorized 2 loops in function.

vectBug.cc:35: note: LOOP VECTORIZED.
vectBug.cc:48: note: LOOP VECTORIZED.
vectBug.cc:45: note: LOOP VECTORIZED.
vectBug.cc:40: note: LOOP VECTORIZED.
vectBug.cc:35: note: LOOP VECTORIZED.
vectBug.cc:48: note: LOOP VECTORIZED.
vectBug.cc:45: note: LOOP VECTORIZED.
vectBug.cc:40: note: LOOP VECTORIZED.
vectBug.cc:16: note: LOOP VECTORIZED.
vectBug.cc:28: note: LOOP VECTORIZED.
vectBug.cc:25: note: LOOP VECTORIZED.
vectBug.cc:20: note: LOOP VECTORIZED.
vectBug.cc:16: note: LOOP VECTORIZED.
vectBug.cc:28: note: LOOP VECTORIZED.
vectBug.cc:25: note: LOOP VECTORIZED.
vectBug.cc:20: note: LOOP VECTORIZED.
vectBug.cc:94: note: vectorized 16 loops in function.

test case

cat vectBug.cc 

const int arraySize=512;

struct Bar {
  
  int __attribute__ ((aligned(16))) c[arraySize];
  int last;

  Bar() : last(0) { refresh();}

  void refresh();

  void loop0(int N, float * f) {
    int k=0;
    int lead = arraySize-last;
    if (N<=lead) {
      for (int i=0; i!=N; ++i) f[k++] = c[last++];
      return;
    }
    
    for (int i=last; i!=arraySize; ++i)  f[k++] = c[i];
    int outLoop = (N-lead)/arraySize;
    last = N -lead -  outLoop*arraySize;
    for (int j=0; j!=outLoop; ++j)  {
      refresh();
      for (int i=0; i!=arraySize; ++i) f[k++] = c[i];
    }
    refresh();
    for (int i=0; i!=last; ++i) f[k++] = c[i];
  }

  template<typename F>
  void loop(int N, F f) {
    int lead = arraySize-last;
    if (N<=lead) {
      for (int i=0; i!=N; ++i) f(c[last+i]);
      last +=N;
      return;
    }
    
    for (int i=last; i!=arraySize; ++i)  f(c[i]);
    int outLoop = (N-lead)/arraySize;
    last = N -lead -  outLoop*arraySize;
    for (int j=0; j!=outLoop; ++j)  {
      refresh();
      for (int i=0; i!=arraySize; ++i) f(c[i]);
    }
    refresh();
    for (int i=0; i!=last; ++i) f(c[i]);
  }

};


float __attribute__ ((aligned(16))) z[4096];
void refresh();
int j=0;


void fun(float const *, float const *, int); 


template<typename F>
inline void loop(int N, F f) {
  if (j+N>4096) {
    j=0;
    refresh();
  }
  for (int i=0; i!=N; ++i) f(z[j++]);
}

void foo(int N) {
  float __attribute__ ((aligned(16))) x[N];
  float __attribute__ ((aligned(16))) y[N];
  int k=0;
  auto xs = [&x, &k](float r) { x[k++]= 1.5f*r;};
  auto ys = [&y, &k](float r) { y[k++]= r+1.f;};


  k=0;
  loop(N,xs);
  // for (int i=0; i!=N; ++i) xs(z[j++]);
    // x[k++] = z[j++];

  k=0;
  loop(N,ys);

  //  for (int i=0; i!=N; ++i) ys(z[j++]);
  //    y[k++] = z[j++];

  fun(x,y,N);
}


void load(int N) {

float __attribute__ ((aligned(16))) a[N];
#ifndef FIXED
float __attribute__ ((aligned(16))) b[N];
#else
float __attribute__ ((aligned(16))) b[1024];
#endif

  static Bar bar;


  bar.loop0(N,a);
  bar.loop0(N,b);
  fun(a,b,N);

  

  int k=0;
  auto as = [&a, &k](float r) { a[k++]= 1.5f*r;};
  auto bs = [&b, &k](float r) { b[k++]= r+1.f;};

  k=0;
  bar.loop(N,as);
  k=0;
  bar.loop(N,bs);   // <=== this fails (all others ok)
  

  fun(a,b,N);

}
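
The failing pattern (a store through a lambda that captures a runtime-sized buffer and a running index by reference) can be sketched in isolation as below. This is only a portable approximation, not a further reduction of the test case: it assumes a `std::vector` sized at run time in place of the GNU variable-length array `float b[N]`, and the names `apply` and `fill_plus_one` are hypothetical. It has not been checked whether the vectorizer treats this sketch the same way as the original.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring Bar::loop: calls f on each of the first N inputs.
template <typename F>
inline void apply(const std::vector<int>& c, int N, F f) {
  for (int i = 0; i != N; ++i) f(c[i]);
}

// Hypothetical stand-in for the `bs` lambda in load().
// Assumption: a std::vector sized at run time replaces the VLA `float b[N]`.
inline std::vector<float> fill_plus_one(const std::vector<int>& c, int N) {
  assert(N >= 0 && static_cast<std::size_t>(N) <= c.size());
  std::vector<float> b(N);      // runtime-dimensioned destination buffer
  float* out = b.data();
  int k = 0;                    // running index, captured by reference as in the report
  auto bs = [out, &k](float r) { out[k++] = r + 1.f; };
  apply(c, N, bs);
  return b;
}
```

Calling `fill_plus_one` with a four-element input and `N = 4` fills the result with each input value plus one; the interesting part for this report is only the store `out[k++] = ...` inside the inlined loop.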
Comment 1 Dominique d'Humieres 2011-07-03 11:30:06 UTC
This seems to be due to revision 172430:

Author:	hubicka
Date:	Thu Apr 14 13:26:44 2011 UTC
Changed paths:	16
Log Message:	
	* cgraph.c (dump_cgraph_node): Do not dump inline summaries.
	* cgraph.h (struct inline_summary): Move to ipa-inline.h
	(cgraph_local_info): Remove inline_summary.
	* ipa-cp.c: Include ipa-inline.h.
	(ipcp_cloning_candidate_p, ipcp_estimate_growth,
	ipcp_estimate_cloning_cost, ipcp_insert_stage): Use inline_summary
	accesor.
	* lto-cgraph.c (lto_output_node): Do not stream inline summary.
	(input_overwrite_node): Do not set inline summary.
	(input_node): Do not stream inline summary.
	* ipa-inline.c (cgraph_decide_inlining): Dump inline summaries.
	(cgraph_decide_inlining_incrementally): Do not try to estimate overall
	growth; we do not have inline parameters computed for that anyway.
	(cgraph_early_inlining): After inlining compute call_stmt_sizes.
	* ipa-inline.h (struct inline_summary): Move here from ipa-inline.h
	(inline_summary_t): New type and VECtor.
	(debug_inline_summary, dump_inline_summaries): Declare.
	(inline_summary): Use VOCtor.
	(estimate_edge_growth): Kill hack computing call stmt size directly.
	* lto-section-in.c (lto_section_name): Add inline section.
	* ipa-inline-analysis.c: Include lto-streamer.h
	(node_removal_hook_holder, node_duplication_hook_holder): New holders
	(inline_node_removal_hook, inline_node_duplication_hook): New functions.
	(inline_summary_vec): Define.
	(inline_summary_alloc, dump_inline_summary, debug_inline_summary,
	dump_inline_summaries): New functions.
	(estimate_function_body_sizes): Properly compute size/time of outgoing calls.
	(compute_inline_parameters): Alloc inline_summary; do not compute size/time
	of incomming calls.
	(estimate_edge_time): Avoid missing time summary hack.
	(inline_read_summary): Read inline summary info.
	(inline_write_summary): Write inline summary info.
	(inline_free_summary): Free all hooks and inline summary vector.
	* lto-streamer.h: Add LTO_section_inline_summary section.
	* Makefile.in (ipa-cp.o, ipa-inline-analysis.o): Update dependencies.
	* ipa.c (cgraph_remove_unreachable_nodes): Fix dump file formating.

	* lto.c: Include ipa-inline.h
	(add_cgraph_node_to_partition, undo_partition): Use inline_summary accessor.
	(ipa_node_duplication_hook): Fix declaration.
	* Make-lang.in (lto.o): Update dependencies.

With revision 172429 I get

....
pr49616.cc:94: note: vectorized 16 loops in function.

but

...
pr49616.cc:94: note: vectorized 12 loops in function.

with revision 172430.
Comment 2 vincenzo Innocente 2012-05-18 09:39:24 UTC
Now OK in
gcc version 4.7.1 20120517 (prerelease) [gcc-4_7-branch revision 187624] (GCC) 
and
gcc version 4.8.0 20120509 (experimental) [trunk revision 187326] (GCC)