Bug 31966 - [4.1/4.2 Regression] Miscompiles valid code with -ftree-vectorize
Summary: [4.1/4.2 Regression] Miscompiles valid code with -ftree-vectorize
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.2.0
: P1 normal
Target Milestone: 4.1.3
Assignee: Uroš Bizjak
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: wrong-code
Depends on: 23115
Blocks: 32533
  Show dependency treegraph
 
Reported: 2007-05-17 07:35 UTC by Jack Lloyd
Modified: 2007-07-04 05:54 UTC (History)
2 users (show)

See Also:
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2007-07-02 14:28:40


Attachments
A short testcase for bug 31966 (382 bytes, text/plain)
2007-05-17 07:36 UTC, Jack Lloyd
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jack Lloyd 2007-05-17 07:35:38 UTC
$ gcc-4.2.0 -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.2.0/configure --enable-languages=c,c++,objc,obj-c++ --program-suffix=-4.2.0
Thread model: posix
gcc version 4.2.0
$ gcc-4.2.0 divop.c -o divop
$ ./divop
          3CBA83
$ gcc-4.2.0 -ftree-vectorize -O2  divop.c -o divop
$ ./divop
          3CBA83
$ gcc-4.2.0 -ftree-vectorize -O2 -march=opteron divop.c -o divop
$ ./divop 
          3CBA83
$ gcc-4.2.0 -ftree-vectorize -O2 -march=nocona divop.c -o divop
$ ./divop
          2B672F

This also affects GCC 4.1.1, I have not checked 4.1.2 or the 4.3 snapshots. In addition to -march=nocona it requires -O1/-O2/-O3 (-Os does not trigger it under 4.2.0, though it _does_ in 4.1.1).
Comment 1 Jack Lloyd 2007-05-17 07:36:57 UTC
Created attachment 13567 [details]
A short testcase for bug 31966
Comment 2 Jack Lloyd 2007-06-30 22:11:06 UTC
The behavior still exists in the 4.3 20070622 snapshot. It does not occur using -march=core2 (the actual CPU in question). The bad value results when using -ftree-vectorize and -march or -mtune =nocona. -O, -O2, or -O3 is also required; a binary compiled with -Os produces the correct value.
Comment 3 dorit 2007-07-01 08:47:58 UTC
I was able to reproduce the behavior on a Pentinum4 with a recent 4.3 snapshot. It doesn't look like it's the vectorizer's fault - nothing gets vectorized. However, if I disable the tree-level if-conversion (which is automatically enabled by -ftree-vectorize) - the result of '-ftree-vectorize -O2 -march=nocona divop.c -o divop' becomes '3CBA83'. 
Comment 4 dorit 2007-07-01 08:52:43 UTC
This may be related to PR32533 (also wrong-code due to tree-if-conversion).
Comment 5 Uroš Bizjak 2007-07-01 09:33:30 UTC
Confirmed. This is the same bug as PR32533, but this one comes with the c testcase. The problem is in ifcvt pass.

In -march=nocona case (-march=nocona -O2 -ftree-vectorize), we have following code before ifcvt pass:

  if (high_top_bit_11 != 0)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  if (d_7(D) <= high_19)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  high_21 = high_19 - d_7(D);
  quotient_22 = quotient_20 | 1;

<bb 6>:
  # quotient_3 = PHI <quotient_20(4), quotient_22(5)>
  # high_1 = PHI <high_19(4), high_21(5)>
  j_23 = j_32 + 1;

This code is converted by ifcvt pass to:

  quotient_20 = quotient_31 << 1;          [+]
  D.2068_5 = high_top_bit_11 == 0;
  D.2069_4 = d_7(D) <= high_19;
  _ifc_.29_2 = D.2068_5 && D.2069_4;
  D.2071_29 = high_top_bit_11 == 0;
  D.2072_33 = d_7(D) > high_19;
  _ifc_.30_34 = D.2071_29 && D.2072_33;
  high_21 = high_19 - d_7(D);
  quotient_22 = quotient_20 | 1;           [++]
  quotient_3 = high_top_bit_11 == 0 ? quotient_20 : quotient_22;   <<< here!
  high_1 = high_top_bit_11 == 0 ? high_19 : high_21;               <<< here!
  j_23 = j_32 + 1;

The condition for quotient_3 [and high_1], produced by ifcvt pass is wrong, and should be:

  quotient_3 = _ifc_.3034 ? quotient_20 : quotient_22;

This is evident from the inner loop of the testcase:

--cut here--
      {
      word high_top_bit = (high & MP_WORD_TOP_BIT);

      high <<= 1;
      high |= (n0 >> (MP_WORD_BITS-1-j)) & 1;
      quotient <<= 1;                                [+]

      if(high_top_bit || high >= d)                  <<<< _the_condition_
         {
         high -= d;
         quotient |= 1;                              [++]
         }
      }
--cut here--

Due to slighlty different gimple generation for -march=core2 (please look into _.004t.gimple) where only if branch is created, ifcvt is able to create correct code:

  quotient_20 = quotient_34 << 1;                           [+]
  D.2065_21 = high_top_bit_11 != 0;
  D.2066_22 = high_19 >= d_7(D);
  D.2067_23 = D.2065_21 || D.2066_22;
  high_24 = high_19 - d_7(D);
  quotient_25 = quotient_20 | 1;                            [++]
  quotient_3 = D.2067_23 ? quotient_25 : quotient_20;       <<<<  here
Comment 6 Uroš Bizjak 2007-07-01 14:20:03 UTC
In the tree-if-conv.c, there is a comment with the reference to PR23115, where similar problems were addressed:

     4) If  pred B is dominated by pred A then use pred B's condition.
        See PR23115.  */

For attached testcase, we have following CFG:

                 bb3
                /   \
               /     \
              /       \
             V         V
            bb5 <---- bb4
              \       /
               \     /
                \   /
                 V V
                 bb6

with the conditions:

bb3:  C3 = (high_top_bit_11 != 0) 
bb4:  C4 = (d_7(D) <= high_19)

predicates for blocks are then:

bb4: !C3
bb5: C3 || !C3 & C4       [ = C3 || C4 ]

In bb6, we have:

  # quotient_3 = PHI <quotient_20(4), quotient_22(5)>

For some reason domiated_by_p() failed to detect that bb5 is dominated by bb4 and if-conversion simply assigns bb4 predicate (!C3) as the phi node condition.
Comment 7 Uroš Bizjak 2007-07-01 15:46:41 UTC
Well, the problem is, that we "forgot" to take into account false branch from bb4 to bb6. In notation form the Comment #6, this branch has the condition !C4, so the total condition for branch from bb4 leading to bb6 would be: !C3 && !C4.

The conditions for both branches leading to phi node are then:

left: (C3 || C4) && 1  ----->    C3 || C4
right: !C3 && !C4  --------->  !(C3 || C4)

With above correction, phi

  # quotient_3 = PHI <quotient_20(4), quotient_22(5)>

would be substituted with:

  quotient_3 = !(C3 || C4) ? quotient_20 : quotient_22;

  or:

  quotient_3 = (C3 || C4) ? quotient_22 : quotient_20;

which just happens to be the correct condition.

QED.
Comment 8 Uroš Bizjak 2007-07-02 14:02:26 UTC
Patch at http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00104.html
Comment 9 uros 2007-07-02 14:26:23 UTC
Subject: Bug 31966

Author: uros
Date: Mon Jul  2 14:26:11 2007
New Revision: 126206

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126206
Log:
	PR tree-optimization/31966
	PR tree-optimization/32533
	* tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not
	"basic_block" description as its third argument.  Update function
	calls to get destination bb from "edge" argument.  Save "cond" into
	aux field of the edge.  Update prototype for changed arguments.
	(find_phi_replacement_condition): Operate on incoming edges, not
	on predecessor blocks.  If there is a condition saved in the
	incoming edge aux field, AND it with incoming bb predicate.
	Return source bb of the first edge.
	(clean_predicate_lists): Clean aux field of outgoing node edges.
	(tree_if_conversion): Do not initialize cond variable. Move
	variable declaration into the loop.
	(replace_phi_with_cond_gimple_modify_stmt): Remove unneded
	initializations of new_stmt, arg0 and arg1 variables.

testsuite/ChangeLog:

	PR tree-optimization/31966
	PR tree-optimization/32533
	* gcc.dg/tree-ssa/pr31966.c: New runtime test.
	* gfortran.dg/pr32533.f90: Ditto.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c
    trunk/gcc/testsuite/gfortran.dg/pr32533.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-if-conv.c

Comment 10 uros 2007-07-04 05:41:15 UTC
Subject: Bug 31966

Author: uros
Date: Wed Jul  4 05:40:58 2007
New Revision: 126301

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126301
Log:
	PR tree-optimization/31966
	PR tree-optimization/32533
	* tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not
	"basic_block" description as its third argument.  Update function
	calls to get destination bb from "edge" argument.  Save "cond" into
	aux field of the edge.  Update prototype for changed arguments.
	(if_convertible_loop_p): Clear aux field of incoming edges if bb
	contains phi node.
	(find_phi_replacement_condition): Operate on incoming edges, not
	on predecessor blocks.  If there is a condition saved in the
	incoming edge aux field, AND it with incoming bb predicate.
	Return source bb of the first edge.
	(clean_predicate_lists): Clean aux field of outgoing node edges.
	(tree_if_conversion): Do not initialize cond variable. Move
	variable declaration into the loop.
	(replace_phi_with_cond_gimple_modify_stmt): Remove unneded
	initializations of new_stmt, arg0 and arg1 variables.

testsuite/ChangeLog:

	PR tree-optimization/31966
	PR tree-optimization/32533
	* gcc.dg/tree-ssa/pr31966.c: New runtime test.
	* gfortran.dg/pr32533.f90: Ditto.


Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c
    branches/gcc-4_2-branch/gcc/testsuite/gfortran.dg/pr32533.f90
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_2-branch/gcc/tree-if-conv.c

Comment 11 uros 2007-07-04 05:49:43 UTC
Subject: Bug 31966

Author: uros
Date: Wed Jul  4 05:49:31 2007
New Revision: 126302

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126302
Log:
	PR tree-optimization/31966
	PR tree-optimization/32533
	* tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not
	"basic_block" description as its third argument.  Update function
	calls to get destination bb from "edge" argument.  Save "cond" into
	aux field of the edge.  Update prototype for changed arguments.
	(if_convertible_loop_p): Clear aux field of incoming edges if bb
	contains phi node.
	(find_phi_replacement_condition): Operate on incoming edges, not
	on predecessor blocks.  If there is a condition saved in the
	incoming edge aux field, AND it with incoming bb predicate.
	Return source bb of the first edge.
	(clean_predicate_lists): Clean aux field of outgoing node edges.
	(tree_if_conversion): Do not initialize cond variable. Move
	variable declaration into the loop.
	(replace_phi_with_cond_gimple_modify_stmt): Remove unneded
	initializations of new_stmt, arg0 and arg1 variables.

testsuite/ChangeLog:

	PR tree-optimization/31966
	PR tree-optimization/32533
	* gcc.dg/tree-ssa/pr31966.c: New runtime test.
	* gfortran.dg/pr32533.f90: Ditto.


Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c
    branches/gcc-4_1-branch/gcc/testsuite/gfortran.dg/pr32533.f90
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_1-branch/gcc/tree-if-conv.c

Comment 12 Uroš Bizjak 2007-07-04 05:54:11 UTC
Fixed on mainline and branches.