On Linux/ia32, revision 146817 miscompiled DAXPY in BLAS with -m32 O3 -msse2 -mfpmath=sse -ffast-math -fno-inline-functions -funroll-loops -ffixed-form
*** Bug 39972 has been marked as a duplicate of this bug. ***
*** Bug 39973 has been marked as a duplicate of this bug. ***
Created attachment 17797 [details] A testcase [hjl@gnu-33 pr40021]$ make /export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O3 -msse2 -mfpmath=sse -ffast-math -fno-inline-functions -funroll-loops -ffixed-form test.f /export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O3 -msse2 -mfpmath=sse -ffast-math -fno-inline-functions -funroll-loops -ffixed-form daxpy.f /export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -Wl,-rpath,/export/gnu/import/rrs/146817/usr/lib -o daxpy test.o daxpy.o ./daxpy 2 -2.20932289958000183E-002 != -2.22017297318430201E-002 4.88703976462625395E-003 [hjl@gnu-33 pr40021]$
The vectorizer seems broken: /export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O2 -ftree-vectorize -msse2 -mfpmath=sse -ffast-math -ffixed-form test.f /export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O2 -ftree-vectorize -msse2 -mfpmath=sse -ffast-math -ffixed-form daxpy.f /export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -Wl,-rpath,/export/gnu/import/rrs/146817/usr/lib -o daxpy test.o daxpy.o ./daxpy 2 -2.20932289958000183E-002 != -2.22017297318430201E-002 4.88703976462625395E-003 [hjl@gnu-33 pr40021]$
Created attachment 17800 [details] A testcase [hjl@gnu-33 pr40021]$ make /export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O2 -ftree-vectorize -msse2 -mfpmath=sse -ffast-math -ffixed-form test.f /export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -Wl,-rpath,/export/gnu/import/rrs/146817/usr/lib -o daxpy test.o ./daxpy 2 -1.0000000000000000 != -2.0000000000000000 [hjl@gnu-33 pr40021]$
Reproduced on x86_64-suse-linux. Seems that, somehow, the vectorized version of loop in line 29 is performed, even though the number of scalar iterations is 1.
The problem is the expansion of this code: <bb 4>: ... D.1650_109 = D.1648_107 | 1; if (D.1650_109 != 0) goto <bb 6>; else goto <bb 5>; plus the fact that we need to insert something on edge 4->6. Note how the condition here is always true (should have been cleaned up by some tree optimization, but isn't, and we shouldn't have to rely on this anyway). Now expand comes and wants to generate a jump sequence implementing this if (D.1648_107 | 1) jump true_label. do_jump has code for an IOR expression jump: case TRUTH_ORIF_EXPR: ... do_jump (TREE_OPERAND (exp, 0), NULL_RTX, if_true_label); do_jump (TREE_OPERAND (exp, 1), if_false_label, if_true_label); bb 5 comes after bb 6 so the false edge is fallthrough, hence if_false_label here is NULL. Now we first emit a conditional jump to true_label for the left side: (jump_insn 32 31 0 pr40021.f:28 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0x0])) (label_ref 0) (pc))) -1 (nil)) and then for the right side. As this is a trivially true condition do_jump then decides to emit an unconditional jump plus barrier: (jump_insn 33 32 34 pr40021.f:28 (set (pc) (label_ref 0)) -1 (nil)) (barrier 34 33 0) Note how the first jump is useless now, it conditionally jumps to a label which when not taken will be jumped to anyway unconditionally. This all is fine so far, except that we need to remove fallthrough edges if expand_gimple_cond, if do_jump decided to emit a barrier. Unfortunately this removes one of the two edges, leaving only one. Now emitting on this edge doesn't need splitting anymore, the instruction is inserted at the end of the source block. But that insertion only expects one jump here and inserts before that, leaving the conditional jump before: (jump_insn 32 31 111 5 pr40021.f:28 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0x0])) (label_ref 48) (pc))) -1 (expr_list:REG_BR_PROB (const_int 5000 [0x1388]) (nil))) (insn 13 111 33 6 pr40021.f:20 (set (reg/v:SI 122 [ i ]) (const_int 1 [0x1])) -1 (nil)) (jump_insn 33 13 34 6 pr40021.f:28 (set (pc) (label_ref 48)) -1 (nil)) (barrier 34 33 35) (insn 13 is the inserted one). That's semantically not equivalent, if the conditional jump is taken the assignment i = 1 is left out, although it was supposed to be on exactly that edge. I'm going to fix this in expand_gimple_cond if this situation can happen by cleaning up the generated jump sequence.
Created attachment 17803 [details] fix for the problem I'm regstrapping this patch, it fixes the problem in the testcase.
(In reply to comment #8) > Created an attachment (id=17803) [edit] > fix for the problem > > I'm regstrapping this patch, it fixes the problem in the testcase. > Shouldn't + { + rtx insn; + remove_edge (false_edge); + /* Now, we have a single successor block, if we have insns to + insert on the remaining edge we potentially will insert + it at the end of this block (if the dest block isn't feasible) + in order to avoid splitting the edge. This insertion will take + place in front of the last jump. But we might have emitted + multiple jumps (conditional and one unconditional) to the + same destination. Inserting in front of the last one then + is a problem. See PR 40021. We fix this by deleting all + jumps except the last unconditional one. */ + insn = PREV_INSN (get_last_insn ()); + /* Make sure we have an unconditional jump. Otherwise we're + confused. */ + gcc_assert (JUMP_P (insn) && !any_condjump_p (insn)); + for (insn = PREV_INSN (insn); insn != BB_HEAD (bb);) + { + insn = PREV_INSN (insn); + if (JUMP_P (NEXT_INSN (insn))) + delete_insn (NEXT_INSN (insn)); + } + } in a standalone function since it is used twice?
Yes, I'll clean this up before submission. Did it help?
(In reply to comment #10) > Yes, I'll clean this up before submission. Did it help? > I tried it on revision 147173 and it works on test input for 416.gamess and 465.tonto. I am running reference input now. Thanks.
(In reply to comment #11) > (In reply to comment #10) > > Yes, I'll clean this up before submission. Did it help? > > > > I tried it on revision 147173 and it works on test input for > 416.gamess and 465.tonto. I am running reference input now. Thanks. > It fixed 416.gamess and 465.tonto with reference input.
Subject: Bug 40021 Author: matz Date: Wed May 6 16:49:13 2009 New Revision: 147186 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147186 Log: PR middle-end/40021 * cfgexpand.c (maybe_cleanup_end_of_block): New static function. (expand_gimple_cond): Use it to cleanup CFG and superfluous jumps. * gfortran.dg/pr40021.f: New test. Added: trunk/gcc/testsuite/gfortran.dg/pr40021.f Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/testsuite/ChangeLog
Hence, fixed.
Subject: Bug 40021 Author: hjl Date: Wed May 6 17:45:40 2009 New Revision: 147195 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147195 Log: 2009-05-06 H.J. Lu <hongjiu.lu@intel.com> Backport from mainline: 2009-05-06 H.J. Lu <hongjiu.lu@intel.com> PR middle-end/40021 * gfortran.fortran-torture/execute/pr40021.f: New. 2009-05-05 Richard Guenther <rguenther@suse.de> PR middle-end/40023 * gcc.c-torture/compile/pr40023.c: New testcase. 2009-05-03 Richard Guenther <rguenther@suse.de> PR c/39983 * gcc.c-torture/compile/pr39983.c: New testcase. Added: branches/gcc-4_4-branch/gcc/testsuite/gcc.c-torture/compile/pr39983.c - copied unchanged from r147193, trunk/gcc/testsuite/gcc.c-torture/compile/pr39983.c branches/gcc-4_4-branch/gcc/testsuite/gcc.c-torture/compile/pr40023.c - copied unchanged from r147194, trunk/gcc/testsuite/gcc.c-torture/compile/pr40023.c branches/gcc-4_4-branch/gcc/testsuite/gfortran.fortran-torture/execute/pr40021.f - copied unchanged from r147193, trunk/gcc/testsuite/gfortran.fortran-torture/execute/pr40021.f Modified: branches/gcc-4_4-branch/gcc/testsuite/ChangeLog