92558 – [10 Regression] Miscompare of 554.roms_r with -Ofast -march=znver2 -flto since r278289

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	10.0

Importance:	P3 normal
Target Milestone:	10.0
Assignee:	Richard Biener

URL:
Keywords:	wrong-code

Depends on:
Blocks:	spec

Reported:	2019-11-18 09:20 UTC by Martin Liška
Modified:	2019-11-18 13:16 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Known to work:	9.2.0
Known to fail:	10.0
Last reconfirmed:	2019-11-18 00:00:00

Description Martin Liška 2019-11-18 09:20:25 UTC

I see the following miscompare with test size:

Error with '/home/marxin/Programming/cpu2017/bin/specinvoke -d /home/marxin/Programming/cpu2017/benchspec/CPU/554.roms_r/run/run_peak_test_gcc7-m64.0000 -f compare.cmd -E -e compare.err -o compare.stdout'; no non-empty output files exist
  Command returned exit code 1
*** Miscompare of ocean_benchmark0.log; for details see
    /home/marxin/Programming/cpu2017/benchspec/CPU/554.roms_r/run/run_peak_test_gcc7-m64.0000/ocean_benchmark0.log.mis
Error: 1x554.roms_r
Producing Raw Reports
 label: gcc7-m64
  workload: test
   metric: SPECrate2017_fp_peak
    format: raw -> /home/marxin/Programming/cpu2017/result/CPU2017.023.fprate.test.rsf

I'm going to isolate this further.

Comment 1 Martin Liška 2019-11-18 09:58:19 UTC

So -flto is not needed; the problematic file is set_weights.fppized.f90.
The miscompare appears with -fdbg-cnt=vect_slp:0,vect_loop:2, while
-fdbg-cnt=vect_slp:0,vect_loop:1 is fine.
The problematic loop comes from:
set_weights.fppized.f90:1339:0: note:  LOOP VECTORIZED

Comment 2 Richard Biener 2019-11-18 10:04:03 UTC

Will look.

Comment 3 Martin Liška 2019-11-18 10:11:09 UTC

(In reply to Richard Biener from comment #2)
> Will look.

Thanks. Here is a code snippet that is compiled differently before and after the revision. Hopefully it can be used to create a reproducer:

$ cat set.f90
      USE mod_scalars
      real(r16) wsum, cff
      DO i=1,nfast0
        wsum=wsum+weight(1,i,ng)
        cff=cff+weight(2,i,ng)
      END DO
      DO i=1,nfast0
        weight=wsum*cff
      END DO
        END
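
A rough C analogue of the first loop above may help when building a standalone reproducer. This is only a sketch under assumptions: the names `sum_weights`, `check_sum_weights` and `NFAST` are hypothetical, the array bound of 256 is illustrative, and this is not the testcase that was later committed. Nothing here guarantees that a given compiler revision vectorizes the loop the same way as the Fortran source.

```c
#include <assert.h>

#define NFAST 256   /* illustrative bound, an assumption */

/* weight[i][0] and weight[i][1] are adjacent in memory, like
   weight(1,i,ng) and weight(2,i,ng) in the Fortran snippet.  */
double weight[NFAST + 1][2];

/* Two reductions over interleaved data.  A loop vectorizer may keep
   both in one accumulator laid out as { wsum, cff, wsum, cff }.  */
void sum_weights (int n, double *wsum_out, double *cff_out)
{
  double wsum = 0.0, cff = 0.0;
  for (int i = 1; i <= n; i++)
    {
      wsum += weight[i][0];
      cff += weight[i][1];
    }
  *wsum_out = wsum;
  *cff_out = cff;
}

/* Fill the array with small integers, which double represents exactly,
   so the sums must match no matter how the additions are reordered.  */
void check_sum_weights (void)
{
  for (int i = 0; i <= NFAST; i++)
    {
      weight[i][0] = (double) i;
      weight[i][1] = 2.0 * i;
    }
  double wsum, cff;
  sum_weights (NFAST, &wsum, &cff);
  double expect = NFAST * (NFAST + 1) / 2.0;
  assert (wsum == expect);
  assert (cff == 2.0 * expect);
}
```

Calling `check_sum_weights ()` from `main` and building once with and once without vectorization gives a simple pass/fail comparison.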

Comment 4 rguenther@suse.de 2019-11-18 10:23:16 UTC

(In reply to Martin Liška from comment #3)
> Thanks, there's a code snippet which is different before and after the
> revision. Hopefully, it can be used for a reproducer creation:
> [set.f90 snippet from comment #3]

So this seems equivalent to

      integer, parameter :: r16 = selected_real_kind(12,300) !64-bit
      integer, parameter :: r8 = selected_real_kind(12,300) !64-bit
      integer, parameter :: Ngrids = 1
      real(r8), dimension(2,0:256,Ngrids) :: weight
      real(r16) wsum, cff
      DO i=1,nfast0
        wsum=wsum+weight(1,i,ng)
        cff=cff+weight(2,i,ng)
      END DO
      DO i=1,nfast0
        weight=wsum*cff
      END DO
        END

which is not vectorized.  With larger Ngrids it is (I checked my
old compiled mod_param.fppized.f90 for Ngrids), and the difference
is that we now do

  # vect_wsum_18.25_129 = PHI <vect_wsum_18.25_128(4)>
  _130 = BIT_FIELD_REF <vect_wsum_18.25_129, 128, 0>;
  _131 = BIT_FIELD_REF <vect_wsum_18.25_129, 128, 128>;
  _132 = _130 + _131;
  _133 = BIT_FIELD_REF <vect_wsum_18.25_129, 64, 0>;
  _134 = BIT_FIELD_REF <vect_wsum_18.25_129, 64, 64>;

so we first reduce { wsum, cff, wsum, cff } in vector form (_132),
but the scalar extracts (_133, _134) still read the unreduced vector
instead of that result.

Will fix.
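
The epilogue shown in the dump can be modeled in plain C to make the mistake concrete. This is a sketch, not GCC's implementation: the types `acc4` and `acc2` and the function names are hypothetical, and the vector is modeled as an array of lanes. The first extract function takes the scalars from the narrowed vector, as intended. The second mirrors the dump, where the narrowed value is computed but the extracts still read lanes 0 and 1 of the original vector.

```c
#include <assert.h>

/* Four-lane accumulator leaving the vectorized loop, laid out as
   { wsum, cff, wsum, cff }: two partial sums per reduction.  */
typedef struct { double lane[4]; } acc4;
typedef struct { double lane[2]; } acc2;

/* Width reduction: add the upper 128 bits to the lower 128 bits,
   giving { wsum_total, cff_total } (the role of _132 in the dump).  */
acc2 reduce_width (acc4 v)
{
  acc2 r;
  r.lane[0] = v.lane[0] + v.lane[2];
  r.lane[1] = v.lane[1] + v.lane[3];
  return r;
}

/* Intended epilogue: scalars are extracted from the narrowed vector.  */
void extract_from_reduced (acc4 v, double *wsum, double *cff)
{
  acc2 r = reduce_width (v);
  *wsum = r.lane[0];
  *cff = r.lane[1];
}

/* Epilogue as in the dump: the narrowed vector is computed but unused,
   and the extracts read the original lanes, losing half the sums.  */
void extract_from_original (acc4 v, double *wsum, double *cff)
{
  acc2 r = reduce_width (v);
  (void) r;
  *wsum = v.lane[0];
  *cff = v.lane[1];
}
```

With an accumulator of { 1, 10, 2, 20 }, the intended variant yields wsum = 3 and cff = 30, while the variant modeled on the dump yields 1 and 10.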

Comment 5 Richard Biener 2019-11-18 12:41:42 UTC

Author: rguenth
Date: Mon Nov 18 12:41:11 2019
New Revision: 278400

URL: https://gcc.gnu.org/viewcvs?rev=278400&root=gcc&view=rev
Log:
2019-11-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/92558
	* tree-vect-loop.c (vect_create_epilog_for_reduction): When
	reducing the width of a reduction vector def update new_phis.

	* gcc.dg/vect/pr92558.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr92558.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop.c

Comment 6 Richard Biener 2019-11-18 13:16:21 UTC

Fixed.