Bug 107254 - [11 Regression] Wrong vectorizer code (Fortran) since r11-1501-gda2b7c7f0a136b4d
Summary: [11 Regression] Wrong vectorizer code (Fortran) since r11-1501-gda2b7c7f0a136b4d
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 11.3.1
: P2 normal
Target Milestone: 11.4
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2022-10-13 16:28 UTC by bartoldeman
Modified: 2024-02-13 00:15 UTC (History)
2 users (show)

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work: 10.4.0, 11.3.1, 12.2.1, 13.0
Known to fail: 11.1.0, 11.3.0, 12.2.0
Last reconfirmed: 2022-10-13 00:00:00


Attachments
Test case (400 bytes, text/plain)
2022-10-13 16:28 UTC, bartoldeman
Details

Note You need to log in before you can comment on or make changes to this bug.
Description bartoldeman 2022-10-13 16:28:49 UTC
Created attachment 53703 [details]
Test case

The following code gives the wrong result (-1.0000000000000000 instead of 0.0000000000000000) with gfortran 11.3 (also tested with the 11.3.1 20221007 prerelease) when given the options
`-O2 -ftree-vectorize -march=core-avx`
for x86_64.

There's no issue with GCC 9,10, and 12. It could be related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212 except that bug also affects GCC 12.

This issue came up from testing the reference LAPACK with -ftree-vectorize enabled, where many more tests failed with recent GCC (11/12), see
https://github.com/easybuilders/easybuild-easyconfigs/issues/16380

$ gfortran -O2 -ftree-vectorize -march=core-avx2 dhgeqz2.f90; ./a.out 
  -1.0000000000000000     
$ gfortran -Wall -O2 dhgeqz2.f90; ./a.out 
   0.0000000000000000


subroutine dlartg( f, g, s, r )
  implicit none
  double precision :: f, g, r, s
  double precision :: d, p

  d = sqrt( f*f + g*g )
  p = 1.d0 / d
  if( abs( f ) > 1 ) then
     s = g*sign( p, f )
     r = sign( d, f )
  else
     s = g*sign( p, f )
     r = sign( d, f )
  end if
end subroutine

subroutine dhgeqz( n, h, t )
  implicit none
  integer            n
  double precision   h( n, * ), t( n, * )
  integer            jc
  double precision   c, s, temp, temp2, tempr
  temp2 = 10d0
  call dlartg( 10d0, temp2, s, tempr )
  c = 0.9d0
  s = 1.d0
  do jc = 1, n
     temp = c*h( 1, jc ) + s*h( 2, jc )
     h( 2, jc ) = -s*h( 1, jc ) + c*h( 2, jc )
     h( 1, jc ) = temp
     temp2 = c*t( 1, jc ) + s*t( 2, jc )
     ! t(2,2)=-s*t(1,2)+c*t(2,2)=-0.9*0+1*0=0
     t( 2, jc ) = -s*t( 1, jc ) + c*t( 2, jc )
     t( 1, jc ) = temp2
  enddo
end subroutine dhgeqz

program test
  implicit none
  double precision h(2,2), t(2,2)  
  h = 0
  t(1,1) = 1
  t(2,1) = 0
  t(1,2) = 0
  t(2,2) = 0
  call dhgeqz( 2, h, t )
  print *,t(2,2)
end program test
Comment 1 Martin Liška 2022-10-13 18:56:30 UTC
Fixed with r12-2051-g7d810646d421f697 and started with r11-2453-gc89366b12ff4f362.

@Richi: Can you please take a look?
Comment 2 Drea Pinski 2022-10-13 19:02:35 UTC
(In reply to Martin Liška from comment #1)
> Fixed with r12-2051-g7d810646d421f697 and started with
> r11-2453-gc89366b12ff4f362.

If so there might be a latent bug still ....
Comment 3 Richard Biener 2022-10-14 06:34:41 UTC
(In reply to Martin Liška from comment #1)
> Fixed with r12-2051-g7d810646d421f697 and started with
> r11-2453-gc89366b12ff4f362.
> 
> @Richi: Can you please take a look?

Can you bisect with -fno-vect-cost-model?  The r11-2453-gc89366b12ff4f362 just
affects cost modeling.

And yes, the "fix" eventually just hides a SLP pattern bug.
Comment 4 Richard Biener 2022-10-14 07:50:15 UTC
Btw, the testcase still fails with -O2 -ftree-vectorize -mavx2 -mno-fma -fno-vect-cost-model on the GCC 12 branch and trunk but works with these flags on the GCC 10 branch.

Interestingly it doesn't fail with -mprefer-vector-width=128 (which doesn't
require versioning for aliasing).
Comment 5 Martin Liška 2022-10-14 08:31:15 UTC
(In reply to Richard Biener from comment #3)
> (In reply to Martin Liška from comment #1)
> > Fixed with r12-2051-g7d810646d421f697 and started with
> > r11-2453-gc89366b12ff4f362.
> > 
> > @Richi: Can you please take a look?
> 
> Can you bisect with -fno-vect-cost-model?  The r11-2453-gc89366b12ff4f362
> just
> affects cost modeling.

Sure, it started with r11-1501-gda2b7c7f0a136b4d.
Comment 6 Richard Biener 2022-10-14 08:52:44 UTC
So I think the issue is that we have a live operation:

  <bb 3> [local count: 955630225]:
  # jc_42 = PHI <jc_36(7), 1(6)>
  _2 = (integer(kind=8)) jc_42;
  _3 = _2 * stride.1_23;
  _4 = _3 + offset.2_24;
  _5 = _4 + 1;
  _6 = (*h_28(D))[_5];
  _7 = _6 * 9.0000000000000002220446049250313080847263336181640625e-1;
  _8 = _4 + 2;
  _9 = (*h_28(D))[_8];
  temp_29 = _7 + _9;
  _10 = _9 * 9.0000000000000002220446049250313080847263336181640625e-1;
  _11 = _10 - _6;
  (*h_28(D))[_8] = _11;
  (*h_28(D))[_5] = temp_29;
  _12 = (*t_32(D))[_5];
  _13 = _12 * 9.0000000000000002220446049250313080847263336181640625e-1;
  _14 = (*t_32(D))[_8];
  _15 = _13 + _14;
  _16 = _14 * 9.0000000000000002220446049250313080847263336181640625e-1;
  _17 = _16 - _12;
  (*t_32(D))[_8] = _17;
  (*t_32(D))[_5] = _15;
  jc_36 = jc_42 + 1;
  if (_1 < jc_36)
    goto <bb 4>; [11.00%]
  else
    goto <bb 7>; [89.00%]

  <bb 7> [local count: 850510901]:
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 105119324]:
  # _72 = PHI <_15(3)>
  # _71 = PHI <_17(3)>
^^^

that we fail to vectorize which keeps the _12 and _14 loads live but
those are from the wrong iteration.  The out-of loop stores are
caused by invariant motion of 'temp2' (it's memory because it's passed
to dlartg) and which we apperantly cannot disambiguate against the
stores to t so we re-materialize them on the loop exit.

The vectorizer sees

t.f90:27:9: note:   init: stmt relevant? _15 = _13 + _14;
t.f90:27:9: note:   vec_stmt_relevant_p: used out of loop.
t.f90:27:9: note:   vect_is_simple_use: operand _12 * 9.0000000000000002220446049250313080847263336181640625e-1, type of def: internal
t.f90:27:9: note:   vec_stmt_relevant_p: stmt live but not relevant.
t.f90:27:9: note:   mark relevant 1, live 1: _15 = _13 + _14;

and later

t.f90:27:9: note:   op: VEC_PERM_EXPR
t.f90:27:9: note:       stmt 0 _15 = _13 + _14;
t.f90:27:9: note:       stmt 1 _17 = _16 - _12;
t.f90:27:9: note:       lane permutation { 0[0] 1[1] }
t.f90:27:9: note:       children 0x3358e90 0x3358f18
t.f90:27:9: note:   node 0x3358e90 (max_nunits=1, refcnt=1) vector(4) real(kind=8)
t.f90:27:9: note:   op template: _15 = _13 + _14;
t.f90:27:9: note:       { }
t.f90:27:9: note:       children 0x3358c70 0x3358e08
...
t.f90:27:9: note:   node 0x3358f18 (max_nunits=1, refcnt=1) vector(4) real(kind=8)
t.f90:27:9: note:   op template: _17 = _16 - _12;
t.f90:27:9: note:       { }
t.f90:27:9: note:       children 0x3358c70 0x3358e08

investigating further.
Comment 7 Richard Biener 2022-10-14 09:19:17 UTC
I have a patch in testing.
Comment 8 GCC Commits 2022-10-14 10:00:45 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:9ed4a849afb5b18b462bea311e7eee454c2c9f68

commit r13-3297-g9ed4a849afb5b18b462bea311e7eee454c2c9f68
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Oct 14 11:14:59 2022 +0200

    tree-optimization/107254 - check and support live lanes from permutes
    
    The following fixes an omission from adding SLP permute nodes which
    is live lanes originating from those.  We have to check that we
    can extract the lane and have to actually code generate them.
    
            PR tree-optimization/107254
            * tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
            For permutes also analyze live lanes.
            (vect_schedule_slp_node): For permutes also code generate
            live lane extracts.
    
            * gfortran.dg/vect/pr107254.f90: New testcase.
Comment 9 Richard Biener 2022-10-14 10:01:12 UTC
Fixed on trunk sofar.
Comment 10 bartoldeman 2022-10-17 12:23:39 UTC
Thanks for the fix! I can confirm that, when applied to 11.3 (with files renamed from .cc to .c), it fixes the issue, and with it, thousands of test failures in the reference LAPACK test suite.

My findings for LAPACK are in this issue here:
https://github.com/Reference-LAPACK/lapack/issues/732
Comment 11 GCC Commits 2022-10-17 13:11:01 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:fe7d74313736b8e1c30812bc49419f419bdf1c53

commit r12-8841-gfe7d74313736b8e1c30812bc49419f419bdf1c53
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Oct 14 11:14:59 2022 +0200

    tree-optimization/107254 - check and support live lanes from permutes
    
    The following fixes an omission from adding SLP permute nodes which
    is live lanes originating from those.  We have to check that we
    can extract the lane and have to actually code generate them.
    
            PR tree-optimization/107254
            * tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
            For permutes also analyze live lanes.
            (vect_schedule_slp_node): For permutes also code generate
            live lane extracts.
    
            * gfortran.dg/vect/pr107254.f90: New testcase.
    
    (cherry picked from commit 9ed4a849afb5b18b462bea311e7eee454c2c9f68)
Comment 12 GCC Commits 2023-01-24 15:22:46 UTC
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:fb335c938bbd54e83af2d281f0ccf79df1c342b3

commit r11-10481-gfb335c938bbd54e83af2d281f0ccf79df1c342b3
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Oct 14 11:14:59 2022 +0200

    tree-optimization/107254 - check and support live lanes from permutes
    
    The following fixes an omission from adding SLP permute nodes which
    is live lanes originating from those.  We have to check that we
    can extract the lane and have to actually code generate them.
    
            PR tree-optimization/107254
            * tree-vect-slp.c (vect_slp_analyze_node_operations_1):
            For permutes also analyze live lanes.
            (vect_schedule_slp_node): For permutes also code generate
            live lane extracts.
    
            * gfortran.dg/vect/pr107254.f90: New testcase.
    
    (cherry picked from commit 9ed4a849afb5b18b462bea311e7eee454c2c9f68)
Comment 13 Richard Biener 2023-01-24 15:23:46 UTC
Fixed.