Bug 47271 - [4.6 Regression] if-conversion removes a test (if), the function generates invalid outputs
Summary: [4.6 Regression] if-conversion removes a test (if), the function generates in...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P1 critical
Target Milestone: 4.6.0
Assignee: Sebastian Pop
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2011-01-12 18:23 UTC by Victor Stinner
Modified: 2011-02-02 17:46 UTC (History)
9 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-01-12 21:34:17


Attachments
Example to reproduce the bug (449 bytes, text/plain)
2011-01-12 18:54 UTC, Victor Stinner
Details
proposed fix (1.91 KB, patch)
2011-01-24 19:57 UTC, Sebastian Pop
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Victor Stinner 2011-01-12 18:23:51 UTC
I tried to compile Python 3.2 (r87949) with gcc (version 4.6.0 20100908) on AMD64: Python does fail with an assertion error or another strange crash. The problem comes from a loop in Python/peephole.c. Compiled with -O1, it works fine. Compiled with -O1 -ftree-vectorize, the functions generates strange (invalid) outputs.

gcc-4.6 -O1:
-------------------------------------
0x0000000000480991 <+5041>:  mov    %eax,%edx
0x0000000000480993 <+5043>:  sub    %esi,%edx
0x0000000000480995 <+5045>:  mov    %edx,(%r12,%rax,4)
0x0000000000480999 <+5049>:  movzbl 0x0(%rbp,%rax,1),%edx
0x000000000048099e <+5054>:  cmp    $0x9,%dl
0x00000000004809a1 <+5057>:  jne    0x4809a8 <PyCode_Optimize+5064>
0x00000000004809a3 <+5059>:  add    $0x1,%esi
0x00000000004809a6 <+5062>:  jmp    0x4809b2 <PyCode_Optimize+5074>
0x00000000004809a8 <+5064>:  mov    $0x3,%ecx
0x00000000004809ad <+5069>:  cmp    $0x59,%dl
0x00000000004809b0 <+5072>:  ja     0x4809b7 <PyCode_Optimize+5079>
0x00000000004809b2 <+5074>:  mov    $0x1,%ecx
0x00000000004809b7 <+5079>:  add    %rcx,%rax
0x00000000004809ba <+5082>:  cmp    %rax,%rdi
0x00000000004809bd <+5085>:  jg     0x480991 <PyCode_Optimize+5041>
-------------------------------------

gcc-4.6 -O1 -ftree-vectorize
-------------------------------------
0x0000000000480991 <+5041>:  mov    %eax,%ecx
0x0000000000480993 <+5043>:  sub    %edx,%ecx
0x0000000000480995 <+5045>:  mov    %ecx,(%r12,%rax,4)
0x0000000000480999 <+5049>:  movzbl 0x0(%rbp,%rax,1),%ecx
0x000000000048099e <+5054>:  lea    0x1(%rdx),%esi
0x00000000004809a1 <+5057>:  cmp    $0x9,%cl
0x00000000004809a4 <+5060>:  cmovne %edx,%esi
0x00000000004809a7 <+5063>:  cmove  %esi,%edx
0x00000000004809aa <+5066>:  setne  %cl
0x00000000004809ad <+5069>:  movzbl %cl,%ecx
0x00000000004809b0 <+5072>:  lea    0x1(%rax,%rcx,2),%rax
0x00000000004809b5 <+5077>:  cmp    %rax,%rdi
0x00000000004809b8 <+5080>:  jg     0x480991 <PyCode_Optimize+5041>
-------------------------------------

Extract of the correct output (-O1):
----
addrmap[0]=0
addrmap[3]=3
addrmap[4]=4
addrmap[7]=7
addrmap[10]=10
addrmap[13]=13
addrmap[16]=16
addrmap[19]=19
addrmap[22]=22
addrmap[23]=22
----

With -O1 -ftree-vectorize, only addrmap[0] and addrmap[3] are correct:
----
addrmap[0]=0
addrmap[3]=3
addrmap[4]=0
addrmap[7]=32767
addrmap[10]=16777216
addrmap[13]=0
addrmap[16]=469314288
addrmap[19]=32767
addrmap[22]=469315151
addrmap[23]=32767
----

See also:
http://bugs.python.org/issue9880

My setup:
 * Intel(R) Pentium(R) 4 CPU 3.00GHz
 * Debian Sid
 * gcc (Debian 20110106-1) 4.6.0 20110106 (experimental) [trunk revision
168538] 
 * Python 3.2 (r87949)
Comment 1 Victor Stinner 2011-01-12 18:54:32 UTC
Created attachment 22951 [details]
Example to reproduce the bug
Comment 2 H.J. Lu 2011-01-12 19:42:38 UTC
It is caused by revision 160030:

http://gcc.gnu.org/ml/gcc-cvs/2010-05/msg01089.html
Comment 3 Richard Biener 2011-01-12 21:34:17 UTC
Confirmed.
Comment 4 Sebastian Pop 2011-01-12 22:40:37 UTC
Mine
Comment 5 Sebastian Pop 2011-01-18 18:45:27 UTC
In this loop:

  for (i=0, nops=0 ; i<codelen ; i += ((codestr[i] >= 90) ? 3 : 1))
    {
      addrmap[i] = i - nops;
      if (codestr[i] == 9)
	nops++;
    }

it looks like this part: i += ((codestr[i] >= 90) ? 3 : 1)
is miscompiled into:

  # i_35 = PHI <i_19(8), 0(2)>
  iftmp.0_4 = [cond_expr] D.2702_12 != 9 ? 3 : 1;
  i_19 = iftmp.0_4 + i_35;

the predicate for the increment is wrongly computed.
Comment 6 Sebastian Pop 2011-01-18 21:18:04 UTC
This bug is similar to:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23115

PR23115 never has been fixed correctly...

Sebastian
Comment 7 Jakub Jelinek 2011-01-24 18:02:45 UTC
Not sure if it actually was so good idea to remove the edge conditions.
If the bb_predicate (first_edge->src) and bb_predicate (second_edge->src) are
negations of each other, then it is easy, similarly if one of the first_edge->src
and second_edge->src bbs dominates the other one.
But if neither of those two bbs dominates the other one, then either we have the edge conditions and can use condition bb_predicate (bb) together with one (or the other) edge conditions, or we'd need to reconstruct both conditions (we have them gimplified) and xor them together to see the actual difference in between those and see in which one of the two conditions is the delta contained.
How much memory savings did
http://gcc.gnu.org/ml/gcc-patches/2010-05/msg02294.html
actually bring in?
Perhaps we wouldn't have to store the whole conditions into edge aux fields, but instead just a boolean that would help find_phi_replacement_condition choose which of the two conditions to use (or it could be stored as two bits in the target bb's bb_predicate* structure).  All I wanted to say is that especially with even far more complicated loops then this one reconstructing which condition and bb to use in find_phi_replacement_condition is going to be more difficult than finding it out earlier (during predicate_bbs most likely).
Comment 8 Sebastian Pop 2011-01-24 18:34:38 UTC
The translation in predicate_all_scalar_phis assumes that the incoming
edges OR up to true: it translates a phi node

x = phi (y, z)

into

x = a ? y : z;

In the testcase of this PR, the conditions of the two edges is not
ORing up to true: we have something like this:

x = a ? y : x;
x = (!a and b) ? z : x;

this is because we have a CFG containing diamond shape, and the write
into x does not happen at every execution of the loop: (a and !b)
should not modify the value of x.

A possible solution is to filter out these cases in predicate_bbs.
Comment 9 Sebastian Pop 2011-01-24 18:38:13 UTC
Sorry, (!a and !b) should not modify the value of x.
Comment 10 Sebastian Pop 2011-01-24 18:55:48 UTC
The BB of the phi node to be rewritten should post-dominate the BBs in which the arguments of the phi node are defined.  The following gcc_assert will ICE when the translation done in predicate_scalar_phi is unsafe:

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index cc7ad8a..9a3e2b5 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -992,6 +992,7 @@ if_convertible_loop_p_1 (struct loop *loop,
     return false;
 
   calculate_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_POST_DOMINATORS);
 
   /* Allow statements that can be handled during if-conversion.  */
   ifc_bbs = get_loop_body_in_if_conv_order (loop);
@@ -1244,6 +1245,15 @@ predicate_scalar_phi (gimple phi, tree cond,
 	  arg_1 = gimple_phi_arg_def (phi, 1);
 	}
 
+      gcc_assert ((TREE_CODE (arg_0) != SSA_NAME
+		   || dominated_by_p (CDI_POST_DOMINATORS,
+				      gimple_bb (SSA_NAME_DEF_STMT (arg_0)),
+				      bb))
+		  && (TREE_CODE (arg_1) != SSA_NAME
+		      || dominated_by_p (CDI_POST_DOMINATORS,
+					 gimple_bb (SSA_NAME_DEF_STMT (arg_1)),
+					 bb)));
+
       /* Build new RHS using selected condition and arguments.  */
       rhs = build3 (COND_EXPR, TREE_TYPE (res),
 		    unshare_expr (cond), arg_0, arg_1);
Comment 11 Jakub Jelinek 2011-01-24 19:24:57 UTC
The
# iftmp.0_6 = PHI <3(5), 1(6)>
PHI node certainly wouldn't trigger that assert, as it has non-SSA_NAMEs in arguments, yet there is the same problem with building that condition.
Comment 12 Sebastian Pop 2011-01-24 19:57:40 UTC
Created attachment 23106 [details]
proposed fix

Thanks Jakub for thinking out a counter example, what about this fix?
Comment 13 Jakub Jelinek 2011-01-24 20:04:08 UTC
Can't loop->header have more than 2 predecessors without causing any problems?
Other than that it might work fine.
Comment 14 Sebastian Pop 2011-01-24 20:14:55 UTC
Right, thanks for catching that:
I handled it only in the first place and not in the second:

      gcc_assert (bb == bb->loop_father->header
		  || bb_postdominates_preds (bb));

I am seeing that this check is redundant, as just before calling
predicate_scalar_phi, we have:

      if (bb == loop->header)
	continue;

As this is in a gcc_assert I don't mind the extra check, just to
make it more clear when somebody reads this code again.
If so, I will send the amended patch to gcc-patches.
Comment 15 Jakub Jelinek 2011-01-24 20:20:40 UTC
One more nit, perhaps the bb_postdominates_preds is too expensive for release checking?  It might be just gcc_checking_assert...
Comment 16 Sebastian Pop 2011-01-25 14:51:28 UTC
Author: spop
Date: Tue Jan 25 14:51:23 2011
New Revision: 169233

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169233
Log:
Fix PR47271: only if-convert full writes.

2011-01-25  Sebastian Pop  <sebastian.pop@amd.com>
	    Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/47271
	* tree-if-conv.c (bb_postdominates_preds): New.
	(if_convertible_bb_p): Call bb_postdominates_preds.
	(if_convertible_loop_p_1): Compute CDI_POST_DOMINATORS.
	(predicate_scalar_phi): Call bb_postdominates_preds.

	* gcc.dg/tree-ssa/ifc-pr47271.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr47271.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-if-conv.c
Comment 17 Sebastian Pop 2011-01-25 14:54:52 UTC
Fixed.
Comment 18 Diego Novillo 2011-02-02 17:46:36 UTC
Author: dnovillo
Date: Wed Feb  2 17:46:33 2011
New Revision: 169581

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169581
Log:
Fix PR47271: only if-convert full writes.

2011-01-25  Sebastian Pop  <sebastian.pop@amd.com>
	    Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/47271
	* tree-if-conv.c (bb_postdominates_preds): New.
	(if_convertible_bb_p): Call bb_postdominates_preds.
	(if_convertible_loop_p_1): Compute CDI_POST_DOMINATORS.
	(predicate_scalar_phi): Call bb_postdominates_preds.

	* gcc.dg/tree-ssa/ifc-pr47271.c: New.

Added:
    branches/google/integration/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr47271.c
Modified:
    branches/google/integration/gcc/ChangeLog
    branches/google/integration/gcc/testsuite/ChangeLog
    branches/google/integration/gcc/tree-if-conv.c