User account creation filtered due to spam.

Bug 50698 - pretending to create versioning for alias when not required
Summary: pretending to create versioning for alias when not required
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.7.0
: P3 minor
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks:
 
Reported: 2011-10-12 08:08 UTC by vincenzo Innocente
Modified: 2011-10-17 09:26 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-10-12 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description vincenzo Innocente 2011-10-12 08:08:23 UTC
in the example below the vectorizer reports "versioning for alias" for function sumS (not for sum2).  The compiler has all information to determine that no aliasing can happen and indeed the generated code eventually does not contain any run time check!
One consequence is that in some similar real-life cases I'm now obliged to set --param vect-max-version-for-alias-checks to values as large as 100.

c++ -Ofast -ftree-vectorizer-verbose=1 -c Arena.cpp -mtune=corei7  --param vect-max-version-for-alias-checks=10

Analyzing loop at Arena.cpp:14
Vectorizing loop at Arena.cpp:14
14: LOOP VECTORIZED.
Arena.cpp:18: note: vectorized 1 loops in function.
Analyzing loop at Arena.cpp:14
Vectorizing loop at Arena.cpp:14
14: created 2 versioning for alias checks.
14: LOOP VECTORIZED.
Arena.cpp:22: note: vectorized 1 loops in function.

if I set --param vect-max-version-for-alias-checks=1 it will not vectorize sumS.

At the end I do not see any runtime check in the generated code!
otool -v -t -V -X Arena.o
__Z4sum2v:
	leaq	_mem+0x00001000(%rip),%rax
	leaq	_mem+0x00002000(%rip),%rdx
	nop
	movaps	0x00001000(%rax),%xmm0
	addps	(%rax),%xmm0
	movaps	%xmm0,0xfffff000(%rax)
	addq	$0x10,%rax
	cmpq	%rdx,%rax
	jne	0x00000010
	repz/ret
	nopl	__Z4sum2v(%rax)
__Z4sumSv:
	leaq	_mem+0x00001000(%rip),%rax
	leaq	_mem+0x00002000(%rip),%rdx
	nop
	movaps	0x00001000(%rax),%xmm0
	addps	(%rax),%xmm0
	movaps	%xmm0,0xfffff000(%rax)
	addq	$0x10,%rax
	cmpq	%rdx,%rax
	jne	0x00000040
	repz/ret



float mem[4096];
const int N=1024;

struct XYZ {
  float * mem;
  int n;
  float * x() { return mem;}
  float * y() { return x()+n;}
  float * z() { return y()+n;}
};

inline
void sum(float * x, float * y, float * z, int n) {
  for (int i=0;i!=n; ++i)
    x[i]=y[i]+z[i];
}

void sum2() {
  sum(mem,mem+N,mem+2*N,N);
}

void sumS() {
  XYZ xyz; xyz.mem=mem; xyz.n=N;
  sum(xyz.x(),xyz.y(),xyz.z(),xyz.n);
}
Comment 1 Richard Biener 2011-10-12 11:31:56 UTC
Confirmed:

14: versioning for alias required: can't determine dependence between *D.2220_26 and *D.2221_25
14: mark for run-time aliasing test between *D.2220_26 and *D.2221_25

  D.2222_24 = D.2223_23 * 4;
  D.2221_25 = &mem + D.2222_24;
  D.2220_26 = &MEM[(void *)&mem + 4096B] + D.2222_24;
  ... = *D.2220_26;
  *D.2221_25 = ...;

  (res = {(float *) &MEM[(void *)&mem + 4096B], +, 4}_1))
        base_address: &MEM[(void *)&mem + 4096B]
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *(float *) &MEM[(void *)&mem + 4096B]
        Access function 0: {0B, +, 4}_1

vs.

  (res = {(float *) &mem, +, 4}_1))
        base_address: &mem
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *(float *) &mem
        Access function 0: {0B, +, 4}_1

I will have a look.
Comment 2 Richard Biener 2011-10-12 14:40:55 UTC
Patch:

Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c (revision 179849)
+++ gcc/tree-data-ref.c (working copy)
@@ -589,9 +589,6 @@ split_constant_offset_1 (tree type, tree
        int punsignedp, pvolatilep;
 
        op0 = TREE_OPERAND (op0, 0);
-       if (!handled_component_p (op0))
-         return false;
-
        base = get_inner_reference (op0, &pbitsize, &pbitpos, &poffset,
                                    &pmode, &punsignedp, &pvolatilep, false);
Comment 3 vincenzo Innocente 2011-10-12 17:00:40 UTC
the patch work for me.
Does not seem to break anything else.
thanks
Comment 4 Richard Biener 2011-10-13 09:00:17 UTC
Author: rguenth
Date: Thu Oct 13 09:00:01 2011
New Revision: 179895

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179895
Log:
2011-10-13  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/50698
	* tree-data-ref.c (split_constant_offset_1): Also process
	offsets of &MEM.

	* g++.dg/vect/pr50698.cc: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/vect/pr50698.cc
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-data-ref.c
Comment 5 Richard Biener 2011-10-13 09:00:51 UTC
Fixed.
Comment 6 vincenzo Innocente 2011-10-15 13:40:31 UTC
I now moved to a more realistic case that can be reduced to this: 

void loop(float *  x, int n) {
  for (int i=0;i!=n; ++i)
    x[i]=x[i+n]+x[i+2*n];
}


and it creates aliases even if the memory region are clearly disjoint:
(used gcc version 4.7.0 20111015 (experimental) (GCC) )
keep here or open an other "enhancement request"?

2: versioning for alias required: can't determine dependence between *D.2199_12 and *D.2195_8
2: mark for run-time aliasing test between *D.2199_12 and *D.2195_8
2: versioning for alias required: can't determine dependence between *D.2205_18 and *D.2195_8
2: mark for run-time aliasing test between *D.2205_18 and *D.2195_8
2: Vectorizing an unaligned access.
2: Vectorizing an unaligned access.
2: Vectorizing an unaligned access.
2: vect_model_load_cost: unaligned supported by hardware.
2: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
2: vect_model_load_cost: unaligned supported by hardware.
2: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
2: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
2: vect_model_store_cost: unaligned supported by hardware.
2: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
2: cost model: Adding cost of checks for loop versioning aliasing.

2: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
2: Cost model analysis: 
  Vector inside of loop cost: 7
  Vector outside of loop cost: 19
  Scalar iteration cost: 4
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 2
  Calculated minimum iters for profitability: 7

2:   Profitability threshold = 6


Vectorizing loop at Arena.cpp:2

2: Profitability threshold is 6 loop iterations.
2: create runtime check for data references *D.2199_12 and *D.2195_8
2: create runtime check for data references *D.2205_18 and *D.2195_8
2: created 2 versioning for alias checks.

2: LOOP VECTORIZED.
Arena.cpp:1: note: vectorized 1 loops in function.
Comment 7 rguenther@suse.de 2011-10-17 09:26:05 UTC
On Sat, 15 Oct 2011, vincenzo.innocente at cern dot ch wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50698
> 
> --- Comment #6 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-10-15 13:40:31 UTC ---
> I now moved to a more realistic case that can be reduced to this: 
> 
> void loop(float *  x, int n) {
>   for (int i=0;i!=n; ++i)
>     x[i]=x[i+n]+x[i+2*n];
> }
> 
> 
> and it creates aliases even if the memory region are clearly disjoint:
> (used gcc version 4.7.0 20111015 (experimental) (GCC) )
> keep here or open an other "enhancement request"?

The problem is that x[i] and x[i+n] may alias for n == 0.  So this
is a completely different issue - that we miss to account for the
fact that n is the loop bound for the induction variable i and that
because i is signed, n has to be >= 0.  Still we won't be able to
compute a meaningful distance vector, as it depends on n, thus
we have to version the loop anyway (the distance vector is n and 2 * n).

Thus I'd say open a separate enhancement request stating that we need
to handle non-constant distance vectors in a better way (do not hold
your breath).