The following gets miscompiled with gcc 4.3, works with 4.2.0 & 4.2.1 at least. PROGRAM test REAL(8) :: f,dist(2) dist = [1.0_8, 0.5_8] PRINT*,f(1.0_8, dist), ' should be: ', MINVAL(dist) END PROGRAM test FUNCTION f( x, dist ) RESULT(s) REAL(8) :: dist(2), x, s s = MINVAL(dist) IF( x < 0 ) s = -s END FUNCTION f gfortran -v -O2 -ffast-math -ftree-vectorize -o test test.f90 Driving: gfortran -v -O2 -ffast-math -ftree-vectorize -o test test.f90 -lgfortranbegin -lgfortra n -lm -shared-libgcc Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with: /projects/tob/gcc/configure --enable-languages=c,fortran --prefix=/projects/tob /gcc-trunk Thread model: posix gcc version 4.3.0 20070904 (experimental) [trunk revision 128067] (GCC) COLLECT_GCC_OPTIONS='-v' '-O2' '-ffast-math' '-ftree-vectorize' '-o' 'test' '-shared-libgcc' '-m tune=generic' /wrk/jpr/gcc-trunk/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/f951 test.f90 -quiet -dump base test.f90 -mtune=generic -auxbase test -O2 -version -ffast-math -ftree-vectorize -fintrinsic -modules-path /wrk/jpr/gcc-trunk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.0/finclude -o /tmp/ jpr/ccEvFIMX.s GNU F95 (GCC) version 4.3.0 20070904 (experimental) [trunk revision 128067] (x86_64-unknown-linu x-gnu) compiled by GNU C version 4.3.0 20070904 (experimental) [trunk revision 128067], GMP ver sion 4.2.1, MPFR version 2.2.1. GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 COLLECT_GCC_OPTIONS='-v' '-O2' '-ffast-math' '-ftree-vectorize' '-o' 'test' '-shared-libgcc' '-m tune=generic' as -V -Qy -o /tmp/jpr/ccJEvzPa.o /tmp/jpr/ccEvFIMX.s GNU assembler version 2.15.92.0.2 (x86_64-redhat-linux) using BFD version 2.15.92.0.2 20040927 COMPILER_PATH=/wrk/jpr/gcc-trunk/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/:/wrk/jpr/gcc -trunk/bin/../libexec/gcc/ LIBRARY_PATH=/wrk/jpr/gcc-trunk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.0/:/wrk/jpr/gcc-trun k/bin/../lib/gcc/:/v/linux26_x86_64/opt/papi/3.5.0/lib64/../lib64/:/wrk/jpr/gcc-trunk/bin/../lib /gcc/x86_64-unknown-linux-gnu/4.3.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/v/linu x26_x86_64/opt/papi/3.5.0/lib64/:/wrk/jpr/gcc-trunk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3. 0/../../../:/lib/:/usr/lib/ COLLECT_GCC_OPTIONS='-v' '-O2' '-ffast-math' '-ftree-vectorize' '-o' 'test' '-shared-libgcc' '-m tune=generic' /wrk/jpr/gcc-trunk/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test /usr/lib/../lib64/crt1.o /usr/li b/../lib64/crti.o /wrk/jpr/gcc-trunk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.0/crtbegin.o -L /wrk/jpr/gcc-trunk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.0 -L/wrk/jpr/gcc-trunk/bin/../lib /gcc -L/v/linux26_x86_64/opt/papi/3.5.0/lib64/../lib64 -L/wrk/jpr/gcc-trunk/bin/../lib/gcc/x86_6 4-unknown-linux-gnu/4.3.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/v/linux26_x86 _64/opt/papi/3.5.0/lib64 -L/wrk/jpr/gcc-trunk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.0/../. ./.. /tmp/jpr/ccJEvzPa.o -lgfortranbegin -lgfortran -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /wrk/jpr /gcc-trunk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.0/crtfastmath.o /wrk/jpr/gcc-trunk/bin/.. /lib/gcc/x86_64-unknown-linux-gnu/4.3.0/crtend.o /usr/lib/../lib64/crtn.o ./test 1.00000000000000 should be: 0.500000000000000 Regrads, Juha
Confirmed. It looks like the vectorizer forgets to update the PHI node for stmp_var: <bb 5>: # vect_var_.32_31 = PHI <vect_var_.32_29(3)> # s_4 = PHI <s_2(3)> # s_10 = PHI <s_2(3)> stmp_var_.34_32 = BIT_FIELD_REF <vect_var_.32_31, 64, 0>; stmp_var_.34_33 = BIT_FIELD_REF <vect_var_.32_31, 64, 64>; stmp_var_.34_34 = MIN_EXPR <stmp_var_.34_33, stmp_var_.34_32>; D.1396_15 = *x_14(D); if (D.1396_15 < 0.0) goto <bb 6>; else goto <bb 7>; <bb 6>: s_16 = -stmp_var_.34_34; <bb 7>: # s_1 = PHI <s_4(5), s_16(6)> return s_1; s_4 should be stmp_var_.34_34 instead.
(In reply to comment #1) > Confirmed. It looks like the vectorizer forgets to update the PHI node for > stmp_var: yes. I suspect I didn't expect at the time that there would be two loop-closed-ssa-form phi-nodes at the loop exit for s_3, so I probably update just one of them (s_10) and not the other (s_4). This is how it looks before vectorization: <bb 7>: # s_4 = PHI <s_3(3)> # s_10 = PHI <s_3(3)> D.1368_15 = *x_14(D); if (D.1368_15 < 0.0) goto <bb 8>; else goto <bb 9>; <bb 8>: s_16 = -s_10; <bb 9>: # s_1 = PHI <s_4(7), s_16(8)> return s_1; I'll prepare a fix.
I'm testing this patch: Index: tree-vect-transform.c =================================================================== *** tree-vect-transform.c (revision 128037) --- tree-vect-transform.c (working copy) *************** vect_create_epilog_for_reduction (tree v *** 1964,1969 **** --- 1964,1971 ---- tree operation = GIMPLE_STMT_OPERAND (stmt, 1); bool nested_in_vect_loop = false; int op_type; + VEC(tree,heap) *phis = NULL; + int i; if (nested_in_vect_loop_p (loop, stmt)) { *************** vect_finalize_reduction: *** 2260,2270 **** epilog_stmt = build_gimple_modify_stmt (new_dest, expr); new_temp = make_ssa_name (new_dest, epilog_stmt); GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp; - #if 0 - bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT); - #else bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT); - #endif } --- 2262,2268 ---- *************** vect_finalize_reduction: *** 2274,2318 **** Find the loop-closed-use at the loop exit of the original scalar result. (The reduction result is expected to have two immediate uses - one at the latch block, and one at the loop exit). */ ! exit_phi = NULL; FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest) { if (!flow_bb_inside_loop_p (loop, bb_for_stmt (USE_STMT (use_p)))) { exit_phi = USE_STMT (use_p); ! break; } } /* We expect to have found an exit_phi because of loop-closed-ssa form. */ ! gcc_assert (exit_phi); ! if (nested_in_vect_loop) { ! stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi); ! /* FORNOW. Currently not supporting the case that an inner-loop reduction ! is not used in the outer-loop (but only outside the outer-loop). */ ! gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo) ! && !STMT_VINFO_LIVE_P (stmt_vinfo)); ! ! epilog_stmt = adjustment_def ? epilog_stmt : new_phi; ! STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt; ! set_stmt_info (get_stmt_ann (epilog_stmt), ! new_stmt_vec_info (epilog_stmt, loop_vinfo)); ! if (vect_print_dump_info (REPORT_DETAILS)) ! { ! fprintf (vect_dump, "vector of partial results after inner-loop:"); ! print_generic_expr (vect_dump, epilog_stmt, TDF_SLIM); ! } ! return; } - - /* Replace the uses: */ - orig_name = PHI_RESULT (exit_phi); - FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name) - FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) - SET_USE (use_p, new_temp); } --- 2272,2313 ---- Find the loop-closed-use at the loop exit of the original scalar result. (The reduction result is expected to have two immediate uses - one at the latch block, and one at the loop exit). */ ! phis = VEC_alloc (tree, heap, 10); FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest) { if (!flow_bb_inside_loop_p (loop, bb_for_stmt (USE_STMT (use_p)))) { exit_phi = USE_STMT (use_p); ! VEC_quick_push (tree, phis, exit_phi); } } /* We expect to have found an exit_phi because of loop-closed-ssa form. */ ! gcc_assert (!VEC_empty (tree, phis)); ! for (i = 0; VEC_iterate (tree, phis, i, exit_phi); i++) { ! if (nested_in_vect_loop) ! { ! stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi); ! /* FORNOW. Currently not supporting the case that an inner-loop reduction ! is not used in the outer-loop (but only outside the outer-loop). */ ! gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo) ! && !STMT_VINFO_LIVE_P (stmt_vinfo)); ! ! epilog_stmt = adjustment_def ? epilog_stmt : new_phi; ! STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt; ! set_stmt_info (get_stmt_ann (epilog_stmt), ! new_stmt_vec_info (epilog_stmt, loop_vinfo)); ! continue; ! } ! /* Replace the uses: */ ! orig_name = PHI_RESULT (exit_phi); ! FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name) ! FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) ! SET_USE (use_p, new_temp); } }
(by the way, fast-math should not be required here, but that's a different bug... will fix that soonish)
Subject: Bug 33299 Author: dorit Date: Fri Sep 7 15:00:11 2007 New Revision: 128242 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128242 Log: PR tree-optimization/33299 * tree-vect-transform.c (vect_create_epilog_for_reduction): Update uses for all relevant loop-exit phis, not just the first. Added: trunk/gcc/testsuite/gfortran.dg/vect/fast-math-pr33299.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gfortran.dg/vect/vect.exp trunk/gcc/tree-vect-transform.c
fix committed