$ gcc-4.2.0 -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: ../gcc-4.2.0/configure --enable-languages=c,c++,objc,obj-c++ --program-suffix=-4.2.0 Thread model: posix gcc version 4.2.0 $ gcc-4.2.0 divop.c -o divop $ ./divop 3CBA83 $ gcc-4.2.0 -ftree-vectorize -O2 divop.c -o divop $ ./divop 3CBA83 $ gcc-4.2.0 -ftree-vectorize -O2 -march=opteron divop.c -o divop $ ./divop 3CBA83 $ gcc-4.2.0 -ftree-vectorize -O2 -march=nocona divop.c -o divop $ ./divop 2B672F This also affects GCC 4.1.1, I have not checked 4.1.2 or the 4.3 snapshots. In addition to -march=nocona it requires -O1/-O2/-O3 (-Os does not trigger it under 4.2.0, though it _does_ in 4.1.1).
Created attachment 13567 [details] A short testcase for bug 31966
The behavior still exists in the 4.3 20070622 snapshot. It does not occur using -march=core2 (the actual CPU in question). The bad value results when using -ftree-vectorize and -march or -mtune =nocona. -O, -O2, or -O3 is also required; a binary compiled with -Os produces the correct value.
I was able to reproduce the behavior on a Pentinum4 with a recent 4.3 snapshot. It doesn't look like it's the vectorizer's fault - nothing gets vectorized. However, if I disable the tree-level if-conversion (which is automatically enabled by -ftree-vectorize) - the result of '-ftree-vectorize -O2 -march=nocona divop.c -o divop' becomes '3CBA83'.
This may be related to PR32533 (also wrong-code due to tree-if-conversion).
Confirmed. This is the same bug as PR32533, but this one comes with the c testcase. The problem is in ifcvt pass. In -march=nocona case (-march=nocona -O2 -ftree-vectorize), we have following code before ifcvt pass: if (high_top_bit_11 != 0) goto <bb 5>; else goto <bb 4>; <bb 4>: if (d_7(D) <= high_19) goto <bb 5>; else goto <bb 6>; <bb 5>: high_21 = high_19 - d_7(D); quotient_22 = quotient_20 | 1; <bb 6>: # quotient_3 = PHI <quotient_20(4), quotient_22(5)> # high_1 = PHI <high_19(4), high_21(5)> j_23 = j_32 + 1; This code is converted by ifcvt pass to: quotient_20 = quotient_31 << 1; [+] D.2068_5 = high_top_bit_11 == 0; D.2069_4 = d_7(D) <= high_19; _ifc_.29_2 = D.2068_5 && D.2069_4; D.2071_29 = high_top_bit_11 == 0; D.2072_33 = d_7(D) > high_19; _ifc_.30_34 = D.2071_29 && D.2072_33; high_21 = high_19 - d_7(D); quotient_22 = quotient_20 | 1; [++] quotient_3 = high_top_bit_11 == 0 ? quotient_20 : quotient_22; <<< here! high_1 = high_top_bit_11 == 0 ? high_19 : high_21; <<< here! j_23 = j_32 + 1; The condition for quotient_3 [and high_1], produced by ifcvt pass is wrong, and should be: quotient_3 = _ifc_.3034 ? quotient_20 : quotient_22; This is evident from the inner loop of the testcase: --cut here-- { word high_top_bit = (high & MP_WORD_TOP_BIT); high <<= 1; high |= (n0 >> (MP_WORD_BITS-1-j)) & 1; quotient <<= 1; [+] if(high_top_bit || high >= d) <<<< _the_condition_ { high -= d; quotient |= 1; [++] } } --cut here-- Due to slighlty different gimple generation for -march=core2 (please look into _.004t.gimple) where only if branch is created, ifcvt is able to create correct code: quotient_20 = quotient_34 << 1; [+] D.2065_21 = high_top_bit_11 != 0; D.2066_22 = high_19 >= d_7(D); D.2067_23 = D.2065_21 || D.2066_22; high_24 = high_19 - d_7(D); quotient_25 = quotient_20 | 1; [++] quotient_3 = D.2067_23 ? quotient_25 : quotient_20; <<<< here
In the tree-if-conv.c, there is a comment with the reference to PR23115, where similar problems were addressed: 4) If pred B is dominated by pred A then use pred B's condition. See PR23115. */ For attached testcase, we have following CFG: bb3 / \ / \ / \ V V bb5 <---- bb4 \ / \ / \ / V V bb6 with the conditions: bb3: C3 = (high_top_bit_11 != 0) bb4: C4 = (d_7(D) <= high_19) predicates for blocks are then: bb4: !C3 bb5: C3 || !C3 & C4 [ = C3 || C4 ] In bb6, we have: # quotient_3 = PHI <quotient_20(4), quotient_22(5)> For some reason domiated_by_p() failed to detect that bb5 is dominated by bb4 and if-conversion simply assigns bb4 predicate (!C3) as the phi node condition.
Well, the problem is, that we "forgot" to take into account false branch from bb4 to bb6. In notation form the Comment #6, this branch has the condition !C4, so the total condition for branch from bb4 leading to bb6 would be: !C3 && !C4. The conditions for both branches leading to phi node are then: left: (C3 || C4) && 1 -----> C3 || C4 right: !C3 && !C4 ---------> !(C3 || C4) With above correction, phi # quotient_3 = PHI <quotient_20(4), quotient_22(5)> would be substituted with: quotient_3 = !(C3 || C4) ? quotient_20 : quotient_22; or: quotient_3 = (C3 || C4) ? quotient_22 : quotient_20; which just happens to be the correct condition. QED.
Patch at http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00104.html
Subject: Bug 31966 Author: uros Date: Mon Jul 2 14:26:11 2007 New Revision: 126206 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126206 Log: PR tree-optimization/31966 PR tree-optimization/32533 * tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not "basic_block" description as its third argument. Update function calls to get destination bb from "edge" argument. Save "cond" into aux field of the edge. Update prototype for changed arguments. (find_phi_replacement_condition): Operate on incoming edges, not on predecessor blocks. If there is a condition saved in the incoming edge aux field, AND it with incoming bb predicate. Return source bb of the first edge. (clean_predicate_lists): Clean aux field of outgoing node edges. (tree_if_conversion): Do not initialize cond variable. Move variable declaration into the loop. (replace_phi_with_cond_gimple_modify_stmt): Remove unneded initializations of new_stmt, arg0 and arg1 variables. testsuite/ChangeLog: PR tree-optimization/31966 PR tree-optimization/32533 * gcc.dg/tree-ssa/pr31966.c: New runtime test. * gfortran.dg/pr32533.f90: Ditto. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c trunk/gcc/testsuite/gfortran.dg/pr32533.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-if-conv.c
Subject: Bug 31966 Author: uros Date: Wed Jul 4 05:40:58 2007 New Revision: 126301 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126301 Log: PR tree-optimization/31966 PR tree-optimization/32533 * tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not "basic_block" description as its third argument. Update function calls to get destination bb from "edge" argument. Save "cond" into aux field of the edge. Update prototype for changed arguments. (if_convertible_loop_p): Clear aux field of incoming edges if bb contains phi node. (find_phi_replacement_condition): Operate on incoming edges, not on predecessor blocks. If there is a condition saved in the incoming edge aux field, AND it with incoming bb predicate. Return source bb of the first edge. (clean_predicate_lists): Clean aux field of outgoing node edges. (tree_if_conversion): Do not initialize cond variable. Move variable declaration into the loop. (replace_phi_with_cond_gimple_modify_stmt): Remove unneded initializations of new_stmt, arg0 and arg1 variables. testsuite/ChangeLog: PR tree-optimization/31966 PR tree-optimization/32533 * gcc.dg/tree-ssa/pr31966.c: New runtime test. * gfortran.dg/pr32533.f90: Ditto. Added: branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c branches/gcc-4_2-branch/gcc/testsuite/gfortran.dg/pr32533.f90 Modified: branches/gcc-4_2-branch/gcc/ChangeLog branches/gcc-4_2-branch/gcc/testsuite/ChangeLog branches/gcc-4_2-branch/gcc/tree-if-conv.c
Subject: Bug 31966 Author: uros Date: Wed Jul 4 05:49:31 2007 New Revision: 126302 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126302 Log: PR tree-optimization/31966 PR tree-optimization/32533 * tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not "basic_block" description as its third argument. Update function calls to get destination bb from "edge" argument. Save "cond" into aux field of the edge. Update prototype for changed arguments. (if_convertible_loop_p): Clear aux field of incoming edges if bb contains phi node. (find_phi_replacement_condition): Operate on incoming edges, not on predecessor blocks. If there is a condition saved in the incoming edge aux field, AND it with incoming bb predicate. Return source bb of the first edge. (clean_predicate_lists): Clean aux field of outgoing node edges. (tree_if_conversion): Do not initialize cond variable. Move variable declaration into the loop. (replace_phi_with_cond_gimple_modify_stmt): Remove unneded initializations of new_stmt, arg0 and arg1 variables. testsuite/ChangeLog: PR tree-optimization/31966 PR tree-optimization/32533 * gcc.dg/tree-ssa/pr31966.c: New runtime test. * gfortran.dg/pr32533.f90: Ditto. Added: branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c branches/gcc-4_1-branch/gcc/testsuite/gfortran.dg/pr32533.f90 Modified: branches/gcc-4_1-branch/gcc/ChangeLog branches/gcc-4_1-branch/gcc/testsuite/ChangeLog branches/gcc-4_1-branch/gcc/tree-if-conv.c
Fixed on mainline and branches.