derived from CP2K and discussed in PR 29975, the following is miscompiled with the current 4.2 branch : vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> cat test.f90 SUBROUTINE T(nsubcell,sab_max,subcells) INTEGER, PARAMETER :: dp=KIND(0.0D0) REAL(dp) :: sab_max(3), subcells,nsubcell(3) nsubcell(:) = MIN(MAX(1,NINT(0.5_dp*subcells/sab_max(:))),20) write(6,*) nsubcell END SUBROUTINE INTEGER, PARAMETER :: dp=KIND(0.0D0) REAL(dp) :: sab_max(3), subcells,nsubcell(3) subcells=2.00000000000000 sab_max=0.590060749244805_dp write(6,*) NINT(0.5_dp*subcells/sab_max(1)) CALL T(nsubcell,sab_max,subcells) END vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> gfortran -v -O3 -ffast-math -ftree-vectorize -march=native test.f90 Driving: gfortran -v -O3 -ffast-math -ftree-vectorize -march=native test.f90 -lgfortranbegin -lgfortran -lm -shared-libgcc Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /data03/vondele/gcc_4_2_branch/gcc/configure --prefix=/data03/vondele/gcc_4_2_branch/build --with-gmp=/data03/vondele/ --with-mpfr=/data03/vondele/ --enable-languages=c,fortran Thread model: posix gcc version 4.2.1 20070627 (prerelease) /data03/vondele/gcc_4_2_branch/build/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/f951 test.f90 -march=k8 -mtune=k8 -quiet -dumpbase test.f90 -auxbase test -O3 -version -ffast-math -ftree-vectorize -I /data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/finclude -o /tmp/ccTYHMGd.s GNU F95 version 4.2.1 20070627 (prerelease) (x86_64-unknown-linux-gnu) compiled by GNU C version 4.2.1 20070627 (prerelease). GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 as -V -Qy -o /tmp/ccs8OyYj.o /tmp/ccTYHMGd.s GNU assembler version 2.16.91.0.5 (x86_64-suse-linux) using BFD version 2.16.91.0.5 20051219 (SUSE Linux) /data03/vondele/gcc_4_2_branch/build/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o /data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/crtbegin.o -L/data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1 -L/data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/../../.. /tmp/ccs8OyYj.o -lgfortranbegin -lgfortran -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/crtfastmath.o /data03/vondele/gcc_4_2_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.2.1/crtend.o /usr/lib/../lib64/crtn.o vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> ./a.out 2 20.0000000000000 20.0000000000000 20.0000000000000 vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> gfortran -O0 test.f90 vondele@pcihopt3:/scratch/vondele/clean/cp2k/src> ./a.out 2 2.00000000000000 2.00000000000000 2.00000000000000 the result is incorrectly 20.0 instead of 2.0 with optimization
Looks like problems in tree ifcvt pass. Before ifcvt, we have: M.2_16 = (int4) D.1257_15; if (M.2_16 > 1) goto <L7>; else goto <L9>; <L7>:; if (M.2_16 > 20) goto <L10>; else goto <L9>; # M.2_64 = PHI <M.2_16(6), 1(5)>; <L9>:; pretmp.118_78 = (real8) M.2_64; # prephitmp.119_79 = PHI <2.0e+1(6), pretmp.118_78(7)>; # M.2_4 = PHI <20(6), M.2_64(7)>; <L10>:; But ifcvt creates: D.1446_84 = M.2_16 > 1; D.1447_85 = M.2_16 > 20; _ifc_.127_86 = D.1446_84 && D.1447_85; D.1449_87 = M.2_16 > 1; D.1450_88 = M.2_16 <= 20; _ifc_.128_89 = D.1449_87 && D.1450_88; M.2_64 = M.2_16 > 1 ? M.2_16 : 1; pretmp.118_78 = (real8) M.2_64; prephitmp.119_79 = M.2_16 > 1 ? 2.0e+1 : pretmp.118_78; M.2_4 = M.2_16 > 1 ? 20 : M.2_64; Note the last two lines, where we compare "M.2_16 > 1" instead of "M.2_16 > 20". This bug could be hidden in 4.3.0 as we use MIN_EXPR and MAX_EXPR here.
(In reply to comment #1) > This bug could be hidden in 4.3.0 as we use MIN_EXPR and MAX_EXPR here. To clear the typo - we use MIN_EXPR and MAX_EXPR in 4.3.0.
I'm wondering if it is still required to turn fortran testcases in the equivalent C testcase in order to mark this kind of bugs as P1. This does seem a bit of a waste of time... and just think about the engineers using Fortran to design nuclear power plants... ;-)
also fails with 4.1.3 gfortran -march=k8 -mtune=k8 -ftree-vectorize -ffast-math -O3 test.f90 Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /data03/vondele/gcc_4_1_branch/gcc/configure --prefix=/data03/vondele/gcc_4_1_branch/build --with-gmp=/data03/vondele/ --with-mpfr=/data03/vondele/ --enable-languages=c,fortran Thread model: posix gcc version 4.1.3 20070629 (prerelease)
works correctly with 4.0.4, so it is a regression: gfortran -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /data03/vondele/gcc_4_0_branch/gcc/configure --prefix=/data03/vondele/gcc_4_0_branch/build --with-gmp=/data03/vondele/ --with-mpfr=/data03/vondele/ --enable-languages=c,fortran Thread model: posix gcc version 4.0.4
Hm, in the dump (gcc-4.1.3), preceeding ifcvt, we have: <L4>:; D.985_28 = iftmp.5_4 + D.964_27; M.2_29 = (int4) D.985_28; if (M.2_29 > 1) goto <L7>; else goto <L9>; <L7>:; if (M.2_29 > 20) goto <L10>; else goto <L9>; # M.2_61 = PHI <M.2_29(4), 1(3)>; <<<<<<---- here <L9>:; pretmp.98_1 = (real8) M.2_61; # prephitmp.99_39 = PHI <2.0e+1(4), pretmp.98_1(5)>; # M.2_3 = PHI <20(4), M.2_61(5)>; <L10>:; Isn't marked statement unreachable? Comparing _.ssa dumps between 4.1 (wrong result) and 4.3 (correct result), we have: (4.1) --cut here-- <L5>:; M.1_36 = D.966_29; goto <bb 8> (<L7>); <L6>:; M.1_35 = 1; # M.1_2 = PHI <M.1_36(6), M.1_35(7)>; <<<< here (4.1) <L7>:; if (M.1_2 > 20) goto <L8>; else goto <L9>; <L8>:; M.2_34 = 20; goto <bb 11> (<L10>); --cut here-- versus (4.3) --cut here-- <bb 8>: M.1_17 = D.1365_16; goto <bb 10>; <bb 9>: M.1_18 = 1; <bb 10>: # M.1_1 = PHI <M.1_17(8), M.1_18(9)> <<<< here (4.3) if (20 < M.1_1) goto <bb 11>; else goto <bb 12>; <bb 11>: M.2_19 = 20; goto <bb 13>; ---cut here---
PR31966 is also wrong code which seems to be a result of the tree-level if-conversion (nothing gets vectorized these, and if tree-if-conversion is disabled everything works ok).
(In reply to comment #6) > Hm, in the dump (gcc-4.1.3), preceeding ifcvt, we have: > Isn't marked statement unreachable? The problem is in ifcvt pass, as confirmed by a c testcase in PR31966. This problem is also present in 4.3 branch, but (as explained in Comment #1), not triggered by the fortran testcase in this PR.
Created attachment 13822 [details] The patch to fix tree-if-conv.c Joost, could you please test this 4.2 patch with CP2K? It fixes the testcase you provided (and PR31966, FWIW). BTW: Could you create the testcase that would be suitable for gfortran testsuite?
> Joost, could you please test this 4.2 patch with CP2K? It fixes the testcase > you provided (and PR31966, FWIW). > the testcase passes, the CP2K tests that were previously failing now seem to pass as well .... many thanks for fixing this issue > BTW: Could you create the testcase that would be suitable for gfortran > testsuite? this aborts before and passes after the patch: SUBROUTINE T(nsubcell,sab_max,subcells) INTEGER, PARAMETER :: dp=KIND(0.0D0) REAL(dp) :: sab_max(3), subcells,nsubcell(3) nsubcell(:) = MIN(MAX(1,NINT(0.5_dp*subcells/sab_max(:))),20) END SUBROUTINE INTEGER, PARAMETER :: dp=KIND(0.0D0) REAL(dp) :: sab_max(3), subcells,nsubcell(3) subcells=2.0_dp sab_max=0.590060749244805_dp CALL T(nsubcell,sab_max,subcells) IF (ANY(nsubcell.NE.2.0_dp)) CALL ABORT() END
Subject: Bug 32533 Author: uros Date: Mon Jul 2 14:26:11 2007 New Revision: 126206 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126206 Log: PR tree-optimization/31966 PR tree-optimization/32533 * tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not "basic_block" description as its third argument. Update function calls to get destination bb from "edge" argument. Save "cond" into aux field of the edge. Update prototype for changed arguments. (find_phi_replacement_condition): Operate on incoming edges, not on predecessor blocks. If there is a condition saved in the incoming edge aux field, AND it with incoming bb predicate. Return source bb of the first edge. (clean_predicate_lists): Clean aux field of outgoing node edges. (tree_if_conversion): Do not initialize cond variable. Move variable declaration into the loop. (replace_phi_with_cond_gimple_modify_stmt): Remove unneded initializations of new_stmt, arg0 and arg1 variables. testsuite/ChangeLog: PR tree-optimization/31966 PR tree-optimization/32533 * gcc.dg/tree-ssa/pr31966.c: New runtime test. * gfortran.dg/pr32533.f90: Ditto. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c trunk/gcc/testsuite/gfortran.dg/pr32533.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-if-conv.c
Uros, is it feasible to backport this to 4.2?
Subject: Bug 32533 Author: uros Date: Wed Jul 4 05:40:58 2007 New Revision: 126301 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126301 Log: PR tree-optimization/31966 PR tree-optimization/32533 * tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not "basic_block" description as its third argument. Update function calls to get destination bb from "edge" argument. Save "cond" into aux field of the edge. Update prototype for changed arguments. (if_convertible_loop_p): Clear aux field of incoming edges if bb contains phi node. (find_phi_replacement_condition): Operate on incoming edges, not on predecessor blocks. If there is a condition saved in the incoming edge aux field, AND it with incoming bb predicate. Return source bb of the first edge. (clean_predicate_lists): Clean aux field of outgoing node edges. (tree_if_conversion): Do not initialize cond variable. Move variable declaration into the loop. (replace_phi_with_cond_gimple_modify_stmt): Remove unneded initializations of new_stmt, arg0 and arg1 variables. testsuite/ChangeLog: PR tree-optimization/31966 PR tree-optimization/32533 * gcc.dg/tree-ssa/pr31966.c: New runtime test. * gfortran.dg/pr32533.f90: Ditto. Added: branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c branches/gcc-4_2-branch/gcc/testsuite/gfortran.dg/pr32533.f90 Modified: branches/gcc-4_2-branch/gcc/ChangeLog branches/gcc-4_2-branch/gcc/testsuite/ChangeLog branches/gcc-4_2-branch/gcc/tree-if-conv.c
Subject: Bug 32533 Author: uros Date: Wed Jul 4 05:49:31 2007 New Revision: 126302 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126302 Log: PR tree-optimization/31966 PR tree-optimization/32533 * tree-if-conv.c (add_to_dst_predicate_list): Use "edge", not "basic_block" description as its third argument. Update function calls to get destination bb from "edge" argument. Save "cond" into aux field of the edge. Update prototype for changed arguments. (if_convertible_loop_p): Clear aux field of incoming edges if bb contains phi node. (find_phi_replacement_condition): Operate on incoming edges, not on predecessor blocks. If there is a condition saved in the incoming edge aux field, AND it with incoming bb predicate. Return source bb of the first edge. (clean_predicate_lists): Clean aux field of outgoing node edges. (tree_if_conversion): Do not initialize cond variable. Move variable declaration into the loop. (replace_phi_with_cond_gimple_modify_stmt): Remove unneded initializations of new_stmt, arg0 and arg1 variables. testsuite/ChangeLog: PR tree-optimization/31966 PR tree-optimization/32533 * gcc.dg/tree-ssa/pr31966.c: New runtime test. * gfortran.dg/pr32533.f90: Ditto. Added: branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/tree-ssa/pr31966.c branches/gcc-4_1-branch/gcc/testsuite/gfortran.dg/pr32533.f90 Modified: branches/gcc-4_1-branch/gcc/ChangeLog branches/gcc-4_1-branch/gcc/testsuite/ChangeLog branches/gcc-4_1-branch/gcc/tree-if-conv.c
(In reply to comment #12) > Uros, is it feasible to backport this to 4.2? Sure, the patch was just waiting a couple of days on mainline for possible problems. This bug is now fixed on mainline and branches.