This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/50067] [4.7 Regression] Wrong code with -fpredictive-commoning
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 19 Aug 2011 11:16:41 +0000
- Subject: [Bug tree-optimization/50067] [4.7 Regression] Wrong code with -fpredictive-commoning
- Auto-submitted: auto-generated
- References: <bug-50067-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50067
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org
--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-19 11:16:41 UTC ---
I think the bug is that we have
#(Data Ref:
# bb: 3
# stmt: D.2736_10 = MEM[(int *)&a + 16B];
# ref: MEM[(int *)&a + 16B];
# base_object: a
# Access function 0: 16B
thus, an access function that is a plain byte offset. I don't see why this couldn't
have happened with the 4.5 code-base and its indirect-ref handling as well.
Thus, the code in dr_analyze_indices that pushes an access function for
if (nest
&& (INDIRECT_REF_P (aref)
|| TREE_CODE (aref) == MEM_REF))
{
probably shouldn't do so if the base analysis
op = TREE_OPERAND (aref, 0);
access_fn = analyze_scalar_evolution (loop, op);
access_fn = instantiate_scev (before_loop, loop, access_fn);
isn't a POLYNOMIAL_CHREC, because then we have no idea whether the
steps / element sizes are compatible.
For example for
int a[256] = { 0, 0, 0, 0, 7, 9, 0, 0, };
int main()
{
int i;
for (i = 0; i < 256; ++i)
{
a[i] = (*((char(*)[8])&a[4]))[i];
}
}
we create
Creating dr for MEM[(char[8] *)&a + 16B][i_12]
analyze_innermost: success.
base_address: &a
offset from base address: 0
constant offset from base address: 16
step: 1
aligned to: 128
base_object: MEM[(char[8] *)&a]
Creating dr for a[i_12]
analyze_innermost: success.
base_address: &a
offset from base address: 0
constant offset from base address: 0
step: 4
aligned to: 128
base_object: a
(compute_affine_dependence
(stmt_a =
D.2727_4 = MEM[(char[8] *)&a + 16B][i_12];
)
(stmt_b =
a[i_12] = D.2728_5;
)
)
(Data Dep:
#(Data Ref:
# bb: 3
# stmt: D.2727_4 = MEM[(char[8] *)&a + 16B][i_12];
# ref: MEM[(char[8] *)&a + 16B][i_12];
# base_object: MEM[(char[8] *)&a];
# Access function 0: {0, +, 1}_1
# Access function 1: 16B
#)
#(Data Ref:
# bb: 3
# stmt: a[i_12] = D.2728_5;
# ref: a[i_12];
# base_object: a
# Access function 0: {0, +, 1}_1
#)
(don't know)
with a mismatched number of access functions.
On the 4.5 branch we seem to do the same, relying on the mismatch
in the number of access functions to make dependence analysis say
"don't know". Testcase:
extern void abort (void);
int a[6] = { 0, 0, 0, 0, 7, 0 };
int
main ()
{
int i;
int (*p)[1] = &a[4];
for (i = 0; i < 4; ++i)
{
a[i + 1] = a[i + 2] > i;
(*p)[i] &= ~1;
}
if (a[4] != 0)
abort ();
return 0;
}
#(Data Ref:
# stmt: D.2727_9 = (*p_3)[i_19];
# ref: (*p_3)[i_19];
# base_object: (*(int[1] *) &a)[0];
# Access function 0: {0, +, 1}_1
# Access function 1: 16B
so the new "issue" seems to be that, with invariant DRs now allowed,
we get a spurious match in the access functions, because for a
non-indirect ref base we do not add an access function for the base
(which would be simply 0B).
Of course then still
(Data Dep:
#(Data Ref:
# bb: 3
# stmt: D.2727_4 = MEM[(char[8] *)&a + 16B][i_12];
# ref: MEM[(char[8] *)&a + 16B][i_12];
# base_object: a
# Access function 0: {0, +, 1}_1
# Access function 1: 16B
#)
#(Data Ref:
# bb: 3
# stmt: a[i_12] = D.2728_5;
# ref: a[i_12];
# base_object: a
# Access function 0: {0, +, 1}_1
# Access function 1: 0
#)
(no dependence)
because the code assumes that access functions operate on "independent"
index spaces. If you look at
int a[256] = { 0, 0, 0, 0, 7, 9, 0, 0, };
int main()
{
int i;
for (i = 0; i < 256; ++i)
{
a[i] = (*((char(*)[8])&a[4]))[i];
}
}
it seems that rather than adding a separate access function for
the base pointer object (which could depend on i as well!)
we should not add one at all.
Sebastian - do you remember anything about this code?
I'd simply do
Index: tree-data-ref.c
===================================================================
--- tree-data-ref.c (revision 177891)
+++ tree-data-ref.c (working copy)
@@ -862,34 +862,6 @@ dr_analyze_indices (struct data_referenc
aref = TREE_OPERAND (aref, 0);
}
- if (nest
- && (INDIRECT_REF_P (aref)
- || TREE_CODE (aref) == MEM_REF))
- {
- op = TREE_OPERAND (aref, 0);
- access_fn = analyze_scalar_evolution (loop, op);
- access_fn = instantiate_scev (before_loop, loop, access_fn);
- base = initial_condition (access_fn);
- split_constant_offset (base, &base, &off);
- if (TREE_CODE (aref) == MEM_REF)
- off = size_binop (PLUS_EXPR, off,
- fold_convert (ssizetype, TREE_OPERAND (aref, 1)));
- access_fn = chrec_replace_initial_condition (access_fn,
- fold_convert (TREE_TYPE (base), off));
-
- TREE_OPERAND (aref, 0) = base;
- VEC_safe_push (tree, heap, access_fns, access_fn);
- }
-
- if (TREE_CODE (aref) == MEM_REF)
- TREE_OPERAND (aref, 1)
- = build_int_cst (TREE_TYPE (TREE_OPERAND (aref, 1)), 0);
as that seems to be what is conservatively correct.
Now that will lose any index analysis on pointers ... but with
some obfuscation we can arbitrarily mix array- and pointer-based
code, so ...
Relying on SCEV or access function count mismatches looks incredibly
fragile ...
For different array views of the same array we also get bogus dependence results:
int a[256] = { 0, 0, 0, 0, 7, 9, 0, 0, };
int main()
{
int i;
for (i = 0; i < 128; ++i)
{
a[i] = (*((char(*)[128])&a[0]))[i+128];
}
}
(Data Dep:
#(Data Ref:
# bb: 3
# stmt: D.2728_5 = MEM[(char[128] *)&a][D.2727_4];
# ref: MEM[(char[128] *)&a][D.2727_4];
# base_object: a
# Access function 0: {128, +, 1}_1
#)
#(Data Ref:
# bb: 3
# stmt: a[i_13] = D.2729_6;
# ref: a[i_13];
# base_object: a
# Access function 0: {0, +, 1}_1
#)
(no dependence)
So what the access_fns machinery assumes is that whenever two base objects
are the same, the access functions, in order of appearance in the
vector, are compatible (provided there is the same number of them).