Bug 32533 - [4.1/4.2 regression] miscompilation at -O3 -ffast-math -ftree-vectorize -march=native
Summary: [4.1/4.2 regression] miscompilation at -O3 -ffast-math -ftree-vectorize -marc...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.2.0
: P1 normal
Target Milestone: 4.2.1
Assignee: Uroš Bizjak
URL:
Keywords: wrong-code
Depends on: 31966
Blocks: 29975
  Show dependency treegraph
 
Reported: 2007-06-28 06:07 UTC by Joost VandeVondele
Modified: 2007-07-06 13:19 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.3.0 4.0.4
Known to fail: 4.2.0 4.1.3
Last reconfirmed: 2007-06-28 08:13:14


Attachments
The patch to fix tree-if-conv.c (2.27 KB, patch)
2007-07-02 09:40 UTC, Uroš Bizjak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joost VandeVondele 2007-06-28 06:07:06 UTC
derived from CP2K and discussed in PR 29975, the following is miscompiled with the current 4.2 branch :

vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> cat test.f90
      SUBROUTINE T(nsubcell,sab_max,subcells)
       INTEGER, PARAMETER :: dp=KIND(0.0D0)
       REAL(dp) :: sab_max(3), subcells,nsubcell(3)

        nsubcell(:) = MIN(MAX(1,NINT(0.5_dp*subcells/sab_max(:))),20)
        write(6,*) nsubcell
      END SUBROUTINE

       INTEGER, PARAMETER :: dp=KIND(0.0D0)
       REAL(dp) :: sab_max(3), subcells,nsubcell(3)
       subcells=2.00000000000000
       sab_max=0.590060749244805_dp
       write(6,*) NINT(0.5_dp*subcells/sab_max(1))
       CALL T(nsubcell,sab_max,subcells)
     END


vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> gfortran -v -O3 -ffast-math -ftree-vectorize -march=native test.f90
Driving: gfortran -v -O3 -ffast-math -ftree-vectorize -march=native test.f90 -lgfortranbegin -lgfortran -lm -shared-libgcc
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /data03/vondele/gcc_4_2_branch/gcc/configure --prefix=/data03/vondele/gcc_4_2_branch/build --with-gmp=/data03/vondele/ --with-mpfr=/data03/vondele/ --enable-languages=c,fortran
Thread model: posix
gcc version 4.2.1 20070627 (prerelease)
 /data03/vondele/gcc_4_2_branch/build/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/f951 test.f90 -march=k8 -mtune=k8 -quiet -dumpbase test.f90 -auxbase test -O3 -version -ffast-math -ftree-vectorize -I /data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/finclude -o /tmp/ccTYHMGd.s
GNU F95 version 4.2.1 20070627 (prerelease) (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.2.1 20070627 (prerelease).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
 as -V -Qy -o /tmp/ccs8OyYj.o /tmp/ccTYHMGd.s
GNU assembler version 2.16.91.0.5 (x86_64-suse-linux) using BFD version 2.16.91.0.5 20051219 (SUSE Linux)
 /data03/vondele/gcc_4_2_branch/build/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o /data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/crtbegin.o -L/data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1 -L/data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/../../.. /tmp/ccs8OyYj.o -lgfortranbegin -lgfortran -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/crtfastmath.o /data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/crtend.o /usr/lib/../lib64/crtn.o
vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> ./a.out
           2
   20.0000000000000        20.0000000000000        20.0000000000000
vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> gfortran -O0 test.f90
vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> ./a.out
           2
   2.00000000000000        2.00000000000000        2.00000000000000

the result is incorrectly 20.0 instead of 2.0 with optimization
Comment 1 Uroš Bizjak 2007-06-28 08:13:14 UTC
Looks like problems in tree ifcvt pass. Before ifcvt, we have:

  M.2_16 = (int4) D.1257_15;
  if (M.2_16 > 1) goto <L7>; else goto <L9>;

<L7>:;
  if (M.2_16 > 20) goto <L10>; else goto <L9>;

  # M.2_64 = PHI <M.2_16(6), 1(5)>;
<L9>:;
  pretmp.118_78 = (real8) M.2_64;

  # prephitmp.119_79 = PHI <2.0e+1(6), pretmp.118_78(7)>;
  # M.2_4 = PHI <20(6), M.2_64(7)>;
<L10>:;

But ifcvt creates:

  D.1446_84 = M.2_16 > 1;
  D.1447_85 = M.2_16 > 20;
  _ifc_.127_86 = D.1446_84 && D.1447_85;
  D.1449_87 = M.2_16 > 1;
  D.1450_88 = M.2_16 <= 20;
  _ifc_.128_89 = D.1449_87 && D.1450_88;
  M.2_64 = M.2_16 > 1 ? M.2_16 : 1;
  pretmp.118_78 = (real8) M.2_64;
  prephitmp.119_79 = M.2_16 > 1 ? 2.0e+1 : pretmp.118_78;
  M.2_4 = M.2_16 > 1 ? 20 : M.2_64;

Note the last two lines, where we compare "M.2_16 > 1" instead of "M.2_16 > 20".

This bug could be hidden in 4.3.0 as we use MIN_EXPR and MAX_EXPR here.
Comment 2 Uroš Bizjak 2007-06-28 08:14:22 UTC
(In reply to comment #1)

> This bug could be hidden in 4.3.0 as we use MIN_EXPR and MAX_EXPR here.

To clear the typo - we use MIN_EXPR and MAX_EXPR in 4.3.0.
Comment 3 Joost VandeVondele 2007-06-29 07:13:15 UTC
I'm wondering if it is still required to turn fortran testcases in the equivalent C testcase in order to mark this kind of bugs as P1. This does seem a bit of a waste of time... and just think about the engineers using Fortran to design nuclear power plants... ;-)
Comment 4 Joost VandeVondele 2007-06-29 07:50:50 UTC
also fails with 4.1.3

gfortran  -march=k8 -mtune=k8 -ftree-vectorize -ffast-math -O3 test.f90
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /data03/vondele/gcc_4_1_branch/gcc/configure --prefix=/data03/vondele/gcc_4_1_branch/build --with-gmp=/data03/vondele/ --with-mpfr=/data03/vondele/ --enable-languages=c,fortran
Thread model: posix
gcc version 4.1.3 20070629 (prerelease)
Comment 5 Joost VandeVondele 2007-06-29 08:06:12 UTC
works correctly with 4.0.4, so it is a regression:
gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /data03/vondele/gcc_4_0_branch/gcc/configure --prefix=/data03/vondele/gcc_4_0_branch/build --with-gmp=/data03/vondele/ --with-mpfr=/data03/vondele/ --enable-languages=c,fortran
Thread model: posix
gcc version 4.0.4
Comment 6 Uroš Bizjak 2007-06-30 15:39:53 UTC
Hm, in the dump (gcc-4.1.3), preceeding ifcvt, we have:

<L4>:;
  D.985_28 = iftmp.5_4 + D.964_27;
  M.2_29 = (int4) D.985_28;
  if (M.2_29 > 1) goto <L7>; else goto <L9>;

<L7>:;
  if (M.2_29 > 20) goto <L10>; else goto <L9>;

  # M.2_61 = PHI <M.2_29(4), 1(3)>;                  <<<<<<---- here
<L9>:;
  pretmp.98_1 = (real8) M.2_61;

  # prephitmp.99_39 = PHI <2.0e+1(4), pretmp.98_1(5)>;
  # M.2_3 = PHI <20(4), M.2_61(5)>;
<L10>:;
 
Isn't marked statement unreachable?

Comparing _.ssa dumps between 4.1 (wrong result) and 4.3 (correct result), we have:

(4.1)
--cut here--

<L5>:;
  M.1_36 = D.966_29;
  goto <bb 8> (<L7>);

<L6>:;
  M.1_35 = 1;

  # M.1_2 = PHI <M.1_36(6), M.1_35(7)>;              <<<< here (4.1)
<L7>:;
  if (M.1_2 > 20) goto <L8>; else goto <L9>;

<L8>:;
  M.2_34 = 20;
  goto <bb 11> (<L10>);
--cut here--

versus

(4.3)
--cut here--
<bb 8>:
  M.1_17 = D.1365_16;
  goto <bb 10>;

<bb 9>:
  M.1_18 = 1;

<bb 10>:
  # M.1_1 = PHI <M.1_17(8), M.1_18(9)>               <<<< here (4.3)
  if (20 < M.1_1)
    goto <bb 11>;
  else
    goto <bb 12>;

<bb 11>:
  M.2_19 = 20;
  goto <bb 13>;
---cut here---
Comment 7 dorit 2007-07-01 08:51:08 UTC
PR31966 is also wrong code which seems to be a result of the tree-level if-conversion (nothing gets vectorized these, and if tree-if-conversion is disabled everything works ok).
Comment 8 Uroš Bizjak 2007-07-01 09:40:08 UTC
(In reply to comment #6)
> Hm, in the dump (gcc-4.1.3), preceeding ifcvt, we have:

> Isn't marked statement unreachable?

The problem is in ifcvt pass, as confirmed by a c testcase in PR31966. This problem is also present in 4.3 branch, but (as explained in Comment #1), not triggered by the fortran testcase in this PR.
Comment 9 Uroš Bizjak 2007-07-02 09:40:58 UTC
Created attachment 13822 [details]
The patch to fix tree-if-conv.c

Joost, could you please test this 4.2 patch with CP2K? It fixes the testcase you provided (and PR31966, FWIW).

BTW: Could you create the testcase that would be suitable for gfortran testsuite?
Comment 10 Joost VandeVondele 2007-07-02 10:21:02 UTC
> Joost, could you please test this 4.2 patch with CP2K? It fixes the testcase
> you provided (and PR31966, FWIW).
> 

the testcase passes, the CP2K tests that were previously failing now seem to pass as well .... many thanks for fixing this issue

> BTW: Could you create the testcase that would be suitable for gfortran
> testsuite?

this aborts before and passes after the patch:

      SUBROUTINE T(nsubcell,sab_max,subcells)
       INTEGER, PARAMETER :: dp=KIND(0.0D0)
       REAL(dp) :: sab_max(3), subcells,nsubcell(3)
       nsubcell(:) = MIN(MAX(1,NINT(0.5_dp*subcells/sab_max(:))),20)
      END SUBROUTINE

      INTEGER, PARAMETER :: dp=KIND(0.0D0)
      REAL(dp) :: sab_max(3), subcells,nsubcell(3)
      subcells=2.0_dp
      sab_max=0.590060749244805_dp
      CALL T(nsubcell,sab_max,subcells)
      IF (ANY(nsubcell.NE.2.0_dp)) CALL ABORT()
     END
Comment 11 uros 2007-07-02 14:26:23 UTC
Subject: Bug 32533

Author: uros
Date: Mon Jul  2 14:26:11 2007
New Revision: 126206

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126206
Log:
	PR tree-optimization/31966
	PR tree-optimization/32533
	* tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not
	"basic_block" description as its third argument.  Update function
	calls to get destination bb from "edge" argument.  Save "cond" into
	aux field of the edge.  Update prototype for changed arguments.
	(find_phi_replacement_condition): Operate on incoming edges, not
	on predecessor blocks.  If there is a condition saved in the
	incoming edge aux field, AND it with incoming bb predicate.
	Return source bb of the first edge.
	(clean_predicate_lists): Clean aux field of outgoing node edges.
	(tree_if_conversion): Do not initialize cond variable. Move
	variable declaration into the loop.
	(replace_phi_with_cond_gimple_modify_stmt): Remove unneded
	initializations of new_stmt, arg0 and arg1 variables.

testsuite/ChangeLog:

	PR tree-optimization/31966
	PR tree-optimization/32533
	* gcc.dg/tree-ssa/pr31966.c: New runtime test.
	* gfortran.dg/pr32533.f90: Ditto.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c
    trunk/gcc/testsuite/gfortran.dg/pr32533.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-if-conv.c

Comment 12 Mark Mitchell 2007-07-04 03:31:11 UTC
Uros, is it feasible to backport this to 4.2?
Comment 13 uros 2007-07-04 05:41:15 UTC
Subject: Bug 32533

Author: uros
Date: Wed Jul  4 05:40:58 2007
New Revision: 126301

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126301
Log:
	PR tree-optimization/31966
	PR tree-optimization/32533
	* tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not
	"basic_block" description as its third argument.  Update function
	calls to get destination bb from "edge" argument.  Save "cond" into
	aux field of the edge.  Update prototype for changed arguments.
	(if_convertible_loop_p): Clear aux field of incoming edges if bb
	contains phi node.
	(find_phi_replacement_condition): Operate on incoming edges, not
	on predecessor blocks.  If there is a condition saved in the
	incoming edge aux field, AND it with incoming bb predicate.
	Return source bb of the first edge.
	(clean_predicate_lists): Clean aux field of outgoing node edges.
	(tree_if_conversion): Do not initialize cond variable. Move
	variable declaration into the loop.
	(replace_phi_with_cond_gimple_modify_stmt): Remove unneded
	initializations of new_stmt, arg0 and arg1 variables.

testsuite/ChangeLog:

	PR tree-optimization/31966
	PR tree-optimization/32533
	* gcc.dg/tree-ssa/pr31966.c: New runtime test.
	* gfortran.dg/pr32533.f90: Ditto.


Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c
    branches/gcc-4_2-branch/gcc/testsuite/gfortran.dg/pr32533.f90
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_2-branch/gcc/tree-if-conv.c

Comment 14 uros 2007-07-04 05:49:42 UTC
Subject: Bug 32533

Author: uros
Date: Wed Jul  4 05:49:31 2007
New Revision: 126302

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126302
Log:
	PR tree-optimization/31966
	PR tree-optimization/32533
	* tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not
	"basic_block" description as its third argument.  Update function
	calls to get destination bb from "edge" argument.  Save "cond" into
	aux field of the edge.  Update prototype for changed arguments.
	(if_convertible_loop_p): Clear aux field of incoming edges if bb
	contains phi node.
	(find_phi_replacement_condition): Operate on incoming edges, not
	on predecessor blocks.  If there is a condition saved in the
	incoming edge aux field, AND it with incoming bb predicate.
	Return source bb of the first edge.
	(clean_predicate_lists): Clean aux field of outgoing node edges.
	(tree_if_conversion): Do not initialize cond variable. Move
	variable declaration into the loop.
	(replace_phi_with_cond_gimple_modify_stmt): Remove unneded
	initializations of new_stmt, arg0 and arg1 variables.

testsuite/ChangeLog:

	PR tree-optimization/31966
	PR tree-optimization/32533
	* gcc.dg/tree-ssa/pr31966.c: New runtime test.
	* gfortran.dg/pr32533.f90: Ditto.


Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c
    branches/gcc-4_1-branch/gcc/testsuite/gfortran.dg/pr32533.f90
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_1-branch/gcc/tree-if-conv.c

Comment 15 Uroš Bizjak 2007-07-04 05:56:26 UTC
(In reply to comment #12)
> Uros, is it feasible to backport this to 4.2?

Sure, the patch was just waiting a couple of days on mainline for possible problems.  This bug is now fixed on mainline and branches.