I tried to compile Python 3.2 (r87949) with gcc (version 4.6.0 20100908) on AMD64: Python does fail with an assertion error or another strange crash. The problem comes from a loop in Python/peephole.c. Compiled with -O1, it works fine. Compiled with -O1 -ftree-vectorize, the functions generates strange (invalid) outputs. gcc-4.6 -O1: ------------------------------------- 0x0000000000480991 <+5041>: mov %eax,%edx 0x0000000000480993 <+5043>: sub %esi,%edx 0x0000000000480995 <+5045>: mov %edx,(%r12,%rax,4) 0x0000000000480999 <+5049>: movzbl 0x0(%rbp,%rax,1),%edx 0x000000000048099e <+5054>: cmp $0x9,%dl 0x00000000004809a1 <+5057>: jne 0x4809a8 <PyCode_Optimize+5064> 0x00000000004809a3 <+5059>: add $0x1,%esi 0x00000000004809a6 <+5062>: jmp 0x4809b2 <PyCode_Optimize+5074> 0x00000000004809a8 <+5064>: mov $0x3,%ecx 0x00000000004809ad <+5069>: cmp $0x59,%dl 0x00000000004809b0 <+5072>: ja 0x4809b7 <PyCode_Optimize+5079> 0x00000000004809b2 <+5074>: mov $0x1,%ecx 0x00000000004809b7 <+5079>: add %rcx,%rax 0x00000000004809ba <+5082>: cmp %rax,%rdi 0x00000000004809bd <+5085>: jg 0x480991 <PyCode_Optimize+5041> ------------------------------------- gcc-4.6 -O1 -ftree-vectorize ------------------------------------- 0x0000000000480991 <+5041>: mov %eax,%ecx 0x0000000000480993 <+5043>: sub %edx,%ecx 0x0000000000480995 <+5045>: mov %ecx,(%r12,%rax,4) 0x0000000000480999 <+5049>: movzbl 0x0(%rbp,%rax,1),%ecx 0x000000000048099e <+5054>: lea 0x1(%rdx),%esi 0x00000000004809a1 <+5057>: cmp $0x9,%cl 0x00000000004809a4 <+5060>: cmovne %edx,%esi 0x00000000004809a7 <+5063>: cmove %esi,%edx 0x00000000004809aa <+5066>: setne %cl 0x00000000004809ad <+5069>: movzbl %cl,%ecx 0x00000000004809b0 <+5072>: lea 0x1(%rax,%rcx,2),%rax 0x00000000004809b5 <+5077>: cmp %rax,%rdi 0x00000000004809b8 <+5080>: jg 0x480991 <PyCode_Optimize+5041> ------------------------------------- Extract of the correct output (-O1): ---- addrmap[0]=0 addrmap[3]=3 addrmap[4]=4 addrmap[7]=7 addrmap[10]=10 addrmap[13]=13 addrmap[16]=16 addrmap[19]=19 addrmap[22]=22 addrmap[23]=22 ---- With -O1 -ftree-vectorize, only addrmap[0] and addrmap[3] are correct: ---- addrmap[0]=0 addrmap[3]=3 addrmap[4]=0 addrmap[7]=32767 addrmap[10]=16777216 addrmap[13]=0 addrmap[16]=469314288 addrmap[19]=32767 addrmap[22]=469315151 addrmap[23]=32767 ---- See also: http://bugs.python.org/issue9880 My setup: * Intel(R) Pentium(R) 4 CPU 3.00GHz * Debian Sid * gcc (Debian 20110106-1) 4.6.0 20110106 (experimental) [trunk revision 168538] * Python 3.2 (r87949)
Created attachment 22951 [details] Example to reproduce the bug
It is caused by revision 160030: http://gcc.gnu.org/ml/gcc-cvs/2010-05/msg01089.html
Confirmed.
Mine
In this loop: for (i=0, nops=0 ; i<codelen ; i += ((codestr[i] >= 90) ? 3 : 1)) { addrmap[i] = i - nops; if (codestr[i] == 9) nops++; } it looks like this part: i += ((codestr[i] >= 90) ? 3 : 1) is miscompiled into: # i_35 = PHI <i_19(8), 0(2)> iftmp.0_4 = [cond_expr] D.2702_12 != 9 ? 3 : 1; i_19 = iftmp.0_4 + i_35; the predicate for the increment is wrongly computed.
This bug is similar to: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23115 PR23115 never has been fixed correctly... Sebastian
Not sure if it actually was so good idea to remove the edge conditions. If the bb_predicate (first_edge->src) and bb_predicate (second_edge->src) are negations of each other, then it is easy, similarly if one of the first_edge->src and second_edge->src bbs dominates the other one. But if neither of those two bbs dominates the other one, then either we have the edge conditions and can use condition bb_predicate (bb) together with one (or the other) edge conditions, or we'd need to reconstruct both conditions (we have them gimplified) and xor them together to see the actual difference in between those and see in which one of the two conditions is the delta contained. How much memory savings did http://gcc.gnu.org/ml/gcc-patches/2010-05/msg02294.html actually bring in? Perhaps we wouldn't have to store the whole conditions into edge aux fields, but instead just a boolean that would help find_phi_replacement_condition choose which of the two conditions to use (or it could be stored as two bits in the target bb's bb_predicate* structure). All I wanted to say is that especially with even far more complicated loops then this one reconstructing which condition and bb to use in find_phi_replacement_condition is going to be more difficult than finding it out earlier (during predicate_bbs most likely).
The translation in predicate_all_scalar_phis assumes that the incoming edges OR up to true: it translates a phi node x = phi (y, z) into x = a ? y : z; In the testcase of this PR, the conditions of the two edges is not ORing up to true: we have something like this: x = a ? y : x; x = (!a and b) ? z : x; this is because we have a CFG containing diamond shape, and the write into x does not happen at every execution of the loop: (a and !b) should not modify the value of x. A possible solution is to filter out these cases in predicate_bbs.
Sorry, (!a and !b) should not modify the value of x.
The BB of the phi node to be rewritten should post-dominate the BBs in which the arguments of the phi node are defined. The following gcc_assert will ICE when the translation done in predicate_scalar_phi is unsafe: diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index cc7ad8a..9a3e2b5 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -992,6 +992,7 @@ if_convertible_loop_p_1 (struct loop *loop, return false; calculate_dominance_info (CDI_DOMINATORS); + calculate_dominance_info (CDI_POST_DOMINATORS); /* Allow statements that can be handled during if-conversion. */ ifc_bbs = get_loop_body_in_if_conv_order (loop); @@ -1244,6 +1245,15 @@ predicate_scalar_phi (gimple phi, tree cond, arg_1 = gimple_phi_arg_def (phi, 1); } + gcc_assert ((TREE_CODE (arg_0) != SSA_NAME + || dominated_by_p (CDI_POST_DOMINATORS, + gimple_bb (SSA_NAME_DEF_STMT (arg_0)), + bb)) + && (TREE_CODE (arg_1) != SSA_NAME + || dominated_by_p (CDI_POST_DOMINATORS, + gimple_bb (SSA_NAME_DEF_STMT (arg_1)), + bb))); + /* Build new RHS using selected condition and arguments. */ rhs = build3 (COND_EXPR, TREE_TYPE (res), unshare_expr (cond), arg_0, arg_1);
The # iftmp.0_6 = PHI <3(5), 1(6)> PHI node certainly wouldn't trigger that assert, as it has non-SSA_NAMEs in arguments, yet there is the same problem with building that condition.
Created attachment 23106 [details] proposed fix Thanks Jakub for thinking out a counter example, what about this fix?
Can't loop->header have more than 2 predecessors without causing any problems? Other than that it might work fine.
Right, thanks for catching that: I handled it only in the first place and not in the second: gcc_assert (bb == bb->loop_father->header || bb_postdominates_preds (bb)); I am seeing that this check is redundant, as just before calling predicate_scalar_phi, we have: if (bb == loop->header) continue; As this is in a gcc_assert I don't mind the extra check, just to make it more clear when somebody reads this code again. If so, I will send the amended patch to gcc-patches.
One more nit, perhaps the bb_postdominates_preds is too expensive for release checking? It might be just gcc_checking_assert...
Author: spop Date: Tue Jan 25 14:51:23 2011 New Revision: 169233 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169233 Log: Fix PR47271: only if-convert full writes. 2011-01-25 Sebastian Pop <sebastian.pop@amd.com> Jakub Jelinek <jakub@redhat.com> PR tree-optimization/47271 * tree-if-conv.c (bb_postdominates_preds): New. (if_convertible_bb_p): Call bb_postdominates_preds. (if_convertible_loop_p_1): Compute CDI_POST_DOMINATORS. (predicate_scalar_phi): Call bb_postdominates_preds. * gcc.dg/tree-ssa/ifc-pr47271.c: New. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr47271.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-if-conv.c
Fixed.
Author: dnovillo Date: Wed Feb 2 17:46:33 2011 New Revision: 169581 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169581 Log: Fix PR47271: only if-convert full writes. 2011-01-25 Sebastian Pop <sebastian.pop@amd.com> Jakub Jelinek <jakub@redhat.com> PR tree-optimization/47271 * tree-if-conv.c (bb_postdominates_preds): New. (if_convertible_bb_p): Call bb_postdominates_preds. (if_convertible_loop_p_1): Compute CDI_POST_DOMINATORS. (predicate_scalar_phi): Call bb_postdominates_preds. * gcc.dg/tree-ssa/ifc-pr47271.c: New. Added: branches/google/integration/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr47271.c Modified: branches/google/integration/gcc/ChangeLog branches/google/integration/gcc/testsuite/ChangeLog branches/google/integration/gcc/tree-if-conv.c