I see the following miscompare with test size:

Error with '/home/marxin/Programming/cpu2017/bin/specinvoke -d /home/marxin/Programming/cpu2017/benchspec/CPU/554.roms_r/run/run_peak_test_gcc7-m64.0000 -f compare.cmd -E -e compare.err -o compare.stdout'; no non-empty output files exist
  Command returned exit code 1
*** Miscompare of ocean_benchmark0.log; for details see
    /home/marxin/Programming/cpu2017/benchspec/CPU/554.roms_r/run/run_peak_test_gcc7-m64.0000/ocean_benchmark0.log.mis
Error: 1x554.roms_r
Producing Raw Reports
 label: gcc7-m64
  workload: test
   metric: SPECrate2017_fp_peak
    format: raw -> /home/marxin/Programming/cpu2017/result/CPU2017.023.fprate.test.rsf

I'm going to isolate this further.
So -flto is not needed; the problematic file is set_weights.fppized.f90.
The difference shows up with -fdbg-cnt=vect_slp:0,vect_loop:2, whereas -fdbg-cnt=vect_slp:0,vect_loop:1 is fine.

The problematic loop comes from:

set_weights.fppized.f90:1339:0: note: LOOP VECTORIZED
Will look.
(In reply to Richard Biener from comment #2)
> Will look.

Thanks. Here is a code snippet that is compiled differently before and after the revision. Hopefully, it can be used to create a reproducer:

$ cat set.f90
      USE mod_scalars
      real(r16) wsum, cff
      DO i=1,nfast0
        wsum=wsum+weight(1,i,ng)
        cff=cff+weight(2,i,ng)
      END DO
      DO i=1,nfast0
        weight=wsum*cff
      END DO
      END
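For readers less familiar with Fortran, the first loop in the snippet above can be sketched in C roughly as follows. This is only an illustration of the loop shape (two sums over interleaved elements), not a confirmed reproducer: the names `sum_weights` and `check_sum_weights` are hypothetical, the flat array layout assumes that weight(1,i,ng) and weight(2,i,ng) are adjacent in memory (Fortran column-major order), and whether a given compiler vectorizes it the same way depends on version and flags.

```c
#include <stddef.h>

/* Hypothetical C analogue of the first Fortran loop.
   weight[2*i] plays the role of weight(1,i,ng) and
   weight[2*i + 1] the role of weight(2,i,ng). */
static void sum_weights(const double *weight, size_t nfast,
                        double *wsum, double *cff)
{
    double s0 = 0.0;
    double s1 = 0.0;
    for (size_t i = 0; i < nfast; i++) {
        s0 += weight[2 * i];      /* wsum = wsum + weight(1,i,ng) */
        s1 += weight[2 * i + 1];  /* cff  = cff  + weight(2,i,ng) */
    }
    *wsum = s0;
    *cff = s1;
}

/* Self-check with values that are exactly representable, so the
   comparison does not depend on summation order.
   Returns 0 when both sums match the expected values. */
static int check_sum_weights(void)
{
    enum { N = 8 };
    double weight[2 * N];
    for (size_t i = 0; i < N; i++) {
        weight[2 * i] = 1.0;
        weight[2 * i + 1] = 2.0;
    }
    double wsum, cff;
    sum_weights(weight, N, &wsum, &cff);
    return !(wsum == 8.0 && cff == 16.0);
}
```

Calling `check_sum_weights()` from a test driver and aborting on a nonzero result would give a runtime check in the usual style of vectorizer tests.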
On Mon, 18 Nov 2019, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92558
>
> --- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #2)
> > Will look.
>
> Thanks, there's a code snippet which is different before and after the
> revision. Hopefully, it can be used for a reproducer creation:
>
> $ cat set.f90
>       USE mod_scalars
>       real(r16) wsum, cff
>       DO i=1,nfast0
>         wsum=wsum+weight(1,i,ng)
>         cff=cff+weight(2,i,ng)
>       END DO
>       DO i=1,nfast0
>         weight=wsum*cff
>       END DO
>       END

So this seems equivalent to

      integer, parameter :: r16 = selected_real_kind(12,300)  !64-bit
      integer, parameter :: r8 = selected_real_kind(12,300)   !64-bit
      integer, parameter :: Ngrids = 1
      real(r8), dimension(2,0:256,Ngrids) :: weight
      real(r16) wsum, cff
      DO i=1,nfast0
        wsum=wsum+weight(1,i,ng)
        cff=cff+weight(2,i,ng)
      END DO
      DO i=1,nfast0
        weight=wsum*cff
      END DO
      END

which is not vectorized. With larger Ngrids it is (I checked my old
compiled mod_param.fppized.f90 for Ngrids), and the difference is that
we now do

  # vect_wsum_18.25_129 = PHI <vect_wsum_18.25_128(4)>
  _130 = BIT_FIELD_REF <vect_wsum_18.25_129, 128, 0>;
  _131 = BIT_FIELD_REF <vect_wsum_18.25_129, 128, 128>;
  _132 = _130 + _131;
  _133 = BIT_FIELD_REF <vect_wsum_18.25_129, 64, 0>;
  _134 = BIT_FIELD_REF <vect_wsum_18.25_129, 64, 64>;

so we first reduce { wsum, cff, wsum, cff } to two lanes in vector form
(_132), but we somehow end up not using that result for the scalar
extracts: _133 and _134 still read from the unreduced vector _129.

Will fix.
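The effect of the dump above can be modelled in plain C. The sketch below is a hand-written simulation, not compiler output and not the vectorizer's code: the types `vec4` and `vec2` and the functions `accumulate`, `reduce_width`, `epilogue_fixed` and `epilogue_stale` are hypothetical names, and the model assumes a 4-lane double accumulator holding { wsum, cff, wsum, cff } and an even iteration count.

```c
#include <stddef.h>

/* Hypothetical 4-lane and 2-lane vectors of doubles. */
typedef struct { double lane[4]; } vec4;
typedef struct { double lane[2]; } vec2;

/* Model of the vectorized loop body: each step handles two iterations,
   so lanes 0/2 accumulate wsum and lanes 1/3 accumulate cff.
   w holds interleaved pairs; n (number of pairs) is assumed even. */
static vec4 accumulate(const double *w, size_t n)
{
    vec4 acc = { { 0.0, 0.0, 0.0, 0.0 } };
    for (size_t i = 0; i + 1 < n; i += 2) {
        acc.lane[0] += w[2 * i];
        acc.lane[1] += w[2 * i + 1];
        acc.lane[2] += w[2 * i + 2];
        acc.lane[3] += w[2 * i + 3];
    }
    return acc;
}

/* Width reduction: add the upper half to the lower half
   (the "_132 = _130 + _131" step in the dump). */
static vec2 reduce_width(vec4 acc)
{
    vec2 r;
    r.lane[0] = acc.lane[0] + acc.lane[2];
    r.lane[1] = acc.lane[1] + acc.lane[3];
    return r;
}

/* Intended epilogue: extract the scalars from the reduced vector. */
static void epilogue_fixed(vec4 acc, double *wsum, double *cff)
{
    vec2 r = reduce_width(acc);
    *wsum = r.lane[0];
    *cff = r.lane[1];
}

/* Epilogue as in the dump: the scalars are extracted from the
   unreduced vector, so the contributions in lanes 2 and 3 are lost. */
static void epilogue_stale(vec4 acc, double *wsum, double *cff)
{
    *wsum = acc.lane[0];
    *cff = acc.lane[1];
}
```

With the interleaved input { 1, 10, 2, 20, 3, 30, 4, 40 } and n = 4, the accumulator ends up as { 4, 40, 6, 60 }; the fixed epilogue yields wsum = 10 and cff = 100, while the stale one yields only 4 and 40, which would be consistent with a miscompare in the benchmark output.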
Author: rguenth
Date: Mon Nov 18 12:41:11 2019
New Revision: 278400

URL: https://gcc.gnu.org/viewcvs?rev=278400&root=gcc&view=rev
Log:
2019-11-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/92558
	* tree-vect-loop.c (vect_create_epilog_for_reduction): When
	reducing the width of a reduction vector def update new_phis.

	* gcc.dg/vect/pr92558.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr92558.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop.c
Fixed.