Bug 40021 - [4.5 Regression] Revision 146817 miscompiled DAXPY in BLAS
Summary: [4.5 Regression] Revision 146817 miscompiled DAXPY in BLAS
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.5.0
: P1 normal
Target Milestone: 4.5.0
Assignee: Michael Matz
URL:
Keywords: wrong-code
: 39972 39973 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-05-04 21:52 UTC by H.J. Lu
Modified: 2023-12-31 14:46 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-05-05 15:32:53


Attachments
A testcase (1.56 KB, application/octet-stream)
2009-05-04 23:07 UTC, H.J. Lu
Details
A testcase (401 bytes, text/plain)
2009-05-05 00:45 UTC, H.J. Lu
Details
fix for the problem (805 bytes, patch)
2009-05-05 16:11 UTC, Michael Matz
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2009-05-04 21:52:28 UTC
On Linux/ia32, revision 146817 miscompiled DAXPY in BLAS with

-m32 O3 -msse2 -mfpmath=sse -ffast-math -fno-inline-functions -funroll-loops -ffixed-form
Comment 1 H.J. Lu 2009-05-04 21:52:51 UTC
*** Bug 39972 has been marked as a duplicate of this bug. ***
Comment 2 H.J. Lu 2009-05-04 21:53:07 UTC
*** Bug 39973 has been marked as a duplicate of this bug. ***
Comment 3 H.J. Lu 2009-05-04 23:07:25 UTC
Created attachment 17797 [details]
A testcase

[hjl@gnu-33 pr40021]$ make
/export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O3 -msse2 -mfpmath=sse -ffast-math -fno-inline-functions -funroll-loops -ffixed-form  test.f
/export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O3 -msse2 -mfpmath=sse -ffast-math -fno-inline-functions -funroll-loops -ffixed-form  daxpy.f
/export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -Wl,-rpath,/export/gnu/import/rrs/146817/usr/lib -o daxpy test.o daxpy.o
./daxpy
           2 -2.20932289958000183E-002 != -2.22017297318430201E-002  4.88703976462625395E-003
[hjl@gnu-33 pr40021]$
Comment 4 H.J. Lu 2009-05-04 23:09:53 UTC
The vectorizer seems broken:

/export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O2 -ftree-vectorize -msse2 -mfpmath=sse -ffast-math -ffixed-form  test.f
/export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O2 -ftree-vectorize -msse2 -mfpmath=sse -ffast-math -ffixed-form  daxpy.f
/export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -Wl,-rpath,/export/gnu/import/rrs/146817/usr/lib -o daxpy test.o daxpy.o
./daxpy
           2 -2.20932289958000183E-002 != -2.22017297318430201E-002  4.88703976462625395E-003
[hjl@gnu-33 pr40021]$ 
Comment 5 H.J. Lu 2009-05-05 00:45:42 UTC
Created attachment 17800 [details]
A testcase

[hjl@gnu-33 pr40021]$ make
/export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -c -O2 -ftree-vectorize -msse2 -mfpmath=sse -ffast-math -ffixed-form  test.f
/export/gnu/import/rrs/146817/usr/bin/gfortran -m32 -Wl,-rpath,/export/gnu/import/rrs/146817/usr/lib -o daxpy test.o
./daxpy
           2  -1.0000000000000000      !=  -2.0000000000000000     
[hjl@gnu-33 pr40021]$
Comment 6 Ira Rosen 2009-05-05 12:41:14 UTC
Reproduced on x86_64-suse-linux.

Seems that, somehow, the vectorized version of loop in line 29 is performed, even though the number of scalar iterations is 1.
Comment 7 Michael Matz 2009-05-05 15:32:52 UTC
The problem is the expansion of this code:
<bb 4>:
  ...
  D.1650_109 = D.1648_107 | 1;
  if (D.1650_109 != 0)
    goto <bb 6>;
  else
    goto <bb 5>;

plus the fact that we need to insert something on edge 4->6.  Note how the
condition here is always true (should have been cleaned up by some tree
optimization, but isn't, and we shouldn't have to rely on this anyway).  Now
expand comes and wants to generate a jump sequence implementing this
  if (D.1648_107 | 1) jump true_label.

do_jump has code for an IOR expression jump:

    case TRUTH_ORIF_EXPR:
        ...
          do_jump (TREE_OPERAND (exp, 0), NULL_RTX, if_true_label);
          do_jump (TREE_OPERAND (exp, 1), if_false_label, if_true_label);

bb 5 comes after bb 6 so the false edge is fallthrough, hence if_false_label
here is NULL.  Now we first emit a conditional jump to true_label for the
left side:

(jump_insn 32 31 0 pr40021.f:28 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref 0)
            (pc))) -1 (nil))

and then for the right side.  As this is a trivially true condition do_jump
then decides to emit an unconditional jump plus barrier:

(jump_insn 33 32 34 pr40021.f:28 (set (pc)
        (label_ref 0)) -1 (nil))
(barrier 34 33 0)

Note how the first jump is useless now, it conditionally jumps to a label
which when not taken will be jumped to anyway unconditionally.

This all is fine so far, except that we need to remove fallthrough edges
if expand_gimple_cond, if do_jump decided to emit a barrier.

Unfortunately this removes one of the two edges, leaving only one.  Now
emitting on this edge doesn't need splitting anymore, the instruction is
inserted at the end of the source block.  But that insertion only expects
one jump here and inserts before that, leaving the conditional jump before:

(jump_insn 32 31 111 5 pr40021.f:28 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref 48)
            (pc))) -1 (expr_list:REG_BR_PROB (const_int 5000 [0x1388])
        (nil)))
(insn 13 111 33 6 pr40021.f:20 (set (reg/v:SI 122 [ i ])
        (const_int 1 [0x1])) -1 (nil))
(jump_insn 33 13 34 6 pr40021.f:28 (set (pc)
        (label_ref 48)) -1 (nil))
(barrier 34 33 35)

(insn 13 is the inserted one).  That's semantically not equivalent, if the
conditional jump is taken the assignment i = 1 is left out, although it was
supposed to be on exactly that edge.

I'm going to fix this in expand_gimple_cond if this situation can happen
by cleaning up the generated jump sequence.
Comment 8 Michael Matz 2009-05-05 16:11:13 UTC
Created attachment 17803 [details]
fix for the problem

I'm regstrapping this patch, it fixes the problem in the testcase.
Comment 9 H.J. Lu 2009-05-05 16:28:56 UTC
(In reply to comment #8)
> Created an attachment (id=17803) [edit]
> fix for the problem
> 
> I'm regstrapping this patch, it fixes the problem in the testcase.
> 

Shouldn't

+	{
+	  rtx insn;
+	  remove_edge (false_edge);
+	  /* Now, we have a single successor block, if we have insns to
+	     insert on the remaining edge we potentially will insert
+	     it at the end of this block (if the dest block isn't feasible)
+	     in order to avoid splitting the edge.  This insertion will take
+	     place in front of the last jump.  But we might have emitted
+	     multiple jumps (conditional and one unconditional) to the
+	     same destination.  Inserting in front of the last one then
+	     is a problem.  See PR 40021.  We fix this by deleting all
+	     jumps except the last unconditional one.  */
+	  insn = PREV_INSN (get_last_insn ());
+	  /* Make sure we have an unconditional jump.  Otherwise we're
+	     confused.  */
+	  gcc_assert (JUMP_P (insn) && !any_condjump_p (insn));
+	  for (insn = PREV_INSN (insn); insn != BB_HEAD (bb);)
+	    {
+	      insn = PREV_INSN (insn);
+	      if (JUMP_P (NEXT_INSN (insn)))
+		delete_insn (NEXT_INSN (insn));
+	    }
+	}

in a standalone function since it is used twice?
Comment 10 Michael Matz 2009-05-06 09:01:53 UTC
Yes, I'll clean this up before submission.  Did it help?
Comment 11 H.J. Lu 2009-05-06 14:27:28 UTC
(In reply to comment #10)
> Yes, I'll clean this up before submission.  Did it help?
> 

I tried it on revision 147173 and it works on test input for
416.gamess and 465.tonto. I am running reference input now. Thanks.
Comment 12 H.J. Lu 2009-05-06 15:48:19 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Yes, I'll clean this up before submission.  Did it help?
> > 
> 
> I tried it on revision 147173 and it works on test input for
> 416.gamess and 465.tonto. I am running reference input now. Thanks.
> 

It fixed 416.gamess and 465.tonto with reference input.
Comment 13 Michael Matz 2009-05-06 16:50:22 UTC
Subject: Bug 40021

Author: matz
Date: Wed May  6 16:49:13 2009
New Revision: 147186

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147186
Log:
        PR middle-end/40021
        * cfgexpand.c (maybe_cleanup_end_of_block): New static function.
        (expand_gimple_cond): Use it to cleanup CFG and superfluous jumps.

        * gfortran.dg/pr40021.f: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/pr40021.f
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/testsuite/ChangeLog

Comment 14 Michael Matz 2009-05-06 16:54:37 UTC
Hence, fixed.
Comment 15 hjl@gcc.gnu.org 2009-05-06 17:47:06 UTC
Subject: Bug 40021

Author: hjl
Date: Wed May  6 17:45:40 2009
New Revision: 147195

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147195
Log:
2009-05-06  H.J. Lu  <hongjiu.lu@intel.com>

	Backport from mainline:
	2009-05-06  H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/40021
	* gfortran.fortran-torture/execute/pr40021.f: New.

	2009-05-05  Richard Guenther  <rguenther@suse.de>

	PR middle-end/40023
	* gcc.c-torture/compile/pr40023.c: New testcase.

	2009-05-03  Richard Guenther  <rguenther@suse.de>

	PR c/39983
	* gcc.c-torture/compile/pr39983.c: New testcase.

Added:
    branches/gcc-4_4-branch/gcc/testsuite/gcc.c-torture/compile/pr39983.c
      - copied unchanged from r147193, trunk/gcc/testsuite/gcc.c-torture/compile/pr39983.c
    branches/gcc-4_4-branch/gcc/testsuite/gcc.c-torture/compile/pr40023.c
      - copied unchanged from r147194, trunk/gcc/testsuite/gcc.c-torture/compile/pr40023.c
    branches/gcc-4_4-branch/gcc/testsuite/gfortran.fortran-torture/execute/pr40021.f
      - copied unchanged from r147193, trunk/gcc/testsuite/gfortran.fortran-torture/execute/pr40021.f
Modified:
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog