Bug 71270 - [7 Regression] fortran regression after fix SLP PR58135
Summary: [7 Regression] fortran regression after fix SLP PR58135
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 7.0
: P1 normal
Target Milestone: 7.0
Assignee: ktkachov
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2016-05-25 07:50 UTC by Christophe Lyon
Modified: 2017-01-20 14:41 UTC (History)
3 users (show)

See Also:
Host:
Target: armeb
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-05-27 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Christophe Lyon 2016-05-25 07:50:36 UTC
As noted in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01916.html,
the fix for PR 58135 (r236582) causes fortran tests to fail at execution time on armeb (tested using qemu):

   gfortran.dg/c_f_pointer_logical.f03   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
  gfortran.dg/c_f_pointer_logical.f03   -O3 -g  execution test
  gfortran.dg/intrinsic_pack_1.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
  gfortran.dg/intrinsic_pack_1.f90   -O3 -g  execution test

GCC was configured with:
--target=armeb-none-linux-gnueabihf --enable-languages=c,c++,fortran --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp16
Comment 1 Richard Biener 2016-05-25 08:33:39 UTC
Might be some arm BE vector support target bug as well.
Comment 2 vekumar 2016-05-26 09:53:27 UTC
Looked at x86_64 gimple code for intrinsic_pack_1.f90.

After the SLP split we now vectorize at the place where we pass constant arguments via a parameterstructure to _gfortran_pack call.  

Before
 parm.20D.3555.dtypeD.3497 = 297;
  # .MEM_242 = VDEF <.MEM_241>
  parm.20D.3555.dimD.3502[0].lboundD.3499 = 1;
  # .MEM_243 = VDEF <.MEM_242>
  parm.20D.3555.dimD.3502[0].uboundD.3500 = 9;
  # .MEM_244 = VDEF <.MEM_243>
  parm.20D.3555.dimD.3502[0].strideD.3498 = 1;
  # .MEM_245 = VDEF <.MEM_244>
  parm.20D.3555.dataD.3495 = &d_ri4D.3433[0];
  # .MEM_246 = VDEF <.MEM_245>
  parm.20D.3555.offsetD.3496 = -1;
  
After 
# .MEM_243 = VDEF <.MEM_1566>
parm.20D.3555.dimD.3502[0].uboundD.3500 = 9;
# .MEM_245 = VDEF <.MEM_243>
parm.20D.3555.dataD.3495 = &d_ri4D.3433[0];
# .MEM_992 = VDEF <.MEM_245>
MEM[(integer(kind=8)D.9 *)&parm.20D.3555 + 8B] = vect_cst__993;
# PT = anything
# ALIGN = 16, MISALIGN = 8
_984 = &parm.20D.3555.offsetD.3496 + 16;
# .MEM_983 = VDEF <.MEM_992>
MEM[(integer(kind=8)D.9 *)_984] = vect_cst__999;

vect_cst__993={-1,297}
vect_cst__999={1,1}

Other places looks similar. This looks like correct gimple. I am verifying the gimple generated for arm big endian target.
Comment 3 vekumar 2016-05-26 13:55:59 UTC
Built armeb-none-linux-gnueabihf -with-cpu=cortex-a9 --with-fpu=neon-fp16 --with-float=hard

And compared gimple output from intrinsic_pack_1.f90.151t.slp1 before and after my patch.

The difference is shown below and is similar to x86_64 dump. The gimple dump after SLP looks correct to me. I think something in backend is causing the issues.

Any thoughts?

Gimple SLP dumps.

Before

 # .MEM_1450 = VDEF <.MEM_1492>
  d_i1D.3585[0].vD.3582 = 1;
  # .MEM_1454 = VDEF <.MEM_1450>
  d_i1D.3585[1].vD.3582 = -1;
  # .MEM_1458 = VDEF <.MEM_1454>
  d_i1D.3585[2].vD.3582 = 2;
  # .MEM_1468 = VDEF <.MEM_1458>
  d_i1D.3585[3].vD.3582 = -2;
  # .MEM_1472 = VDEF <.MEM_1468>
  d_i1D.3585[4].vD.3582 = 3;
  # .MEM_1476 = VDEF <.MEM_1472>
  d_i1D.3585[5].vD.3582 = -3;
  # .MEM_1486 = VDEF <.MEM_1476>
  d_i1D.3585[6].vD.3582 = 4;
  # .MEM_1490 = VDEF <.MEM_1486>
  d_i1D.3585[7].vD.3582 = -4;
  # .MEM_1494 = VDEF <.MEM_1490>
  d_i1D.3585[8].vD.3582 = 5;


After 

  vect_cst__817 = { 1, 0, 1, 0 };
  vect_cst__873 = { 1, 0, 1, 0 };
  vect_cst__1413 = { 1, -1, 2, -2 };
  vect_cst__1461 = { 3, -3, 4, -4 };

  # .MEM_910 = VDEF <.MEM_1492>
  MEM[(integer(kind=1)D.3 *)&d_i1D.3585] = vect_cst__1413;
  # PT = anything
  # ALIGN = 4, MISALIGN = 0
  _918 = &d_i1D.3585[0].vD.3582 + 4;
  # .MEM_865 = VDEF <.MEM_910>
  MEM[(integer(kind=1)D.3 *)_918] = vect_cst__1461;
  # .MEM_1494 = VDEF <.MEM_865>
  d_i1D.3585[8].vD.3582 = 5;

Before 

 # .MEM_1388 = VDEF <.MEM_217>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][0] = 1;
  # .MEM_1393 = VDEF <.MEM_1388>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][1] = 0;
  # .MEM_1398 = VDEF <.MEM_1393>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][2] = 1;
  # .MEM_1409 = VDEF <.MEM_1398>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][3] = 0;
  # .MEM_1414 = VDEF <.MEM_1409>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][4] = 1;
  # .MEM_1419 = VDEF <.MEM_1414>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][5] = 0;
  # .MEM_1430 = VDEF <.MEM_1419>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][6] = 1;
  # .MEM_1435 = VDEF <.MEM_1430>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][7] = 0;
  # .MEM_1440 = VDEF <.MEM_1435>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][8] = 1;

After 

  # .MEM_825 = VDEF <.MEM_217>
  MEM[(logical(kind=1)D.7 *)&A.8D.3679] = vect_cst__817;
  # PT = anything
  # ALIGN = 4, MISALIGN = 0
  _769 = &MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][0] + 4;
  # .MEM_777 = VDEF <.MEM_825>
  MEM[(logical(kind=1)D.7 *)_769] = vect_cst__873;
  # .MEM_1440 = VDEF <.MEM_777>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][8] = 1;

Before 

  # .MEM_1271 = VDEF <.MEM_264>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][0] = 1;
  # .MEM_1276 = VDEF <.MEM_1271>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][1] = 0;
  # .MEM_1281 = VDEF <.MEM_1276>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][2] = 1;
  # .MEM_1292 = VDEF <.MEM_1281>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][3] = 0;
  # .MEM_1297 = VDEF <.MEM_1292>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][4] = 1;
  # .MEM_1302 = VDEF <.MEM_1297>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][5] = 0;
  # .MEM_1313 = VDEF <.MEM_1302>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][6] = 1;
  # .MEM_1318 = VDEF <.MEM_1313>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][7] = 0;
  # .MEM_1323 = VDEF <.MEM_1318>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][8] = 1;

After 

 vect_cst__729 = { 1, 0, 1, 0 };
  vect_cst__721 = { 1, 0, 1, 0 };

  # .MEM_673 = VDEF <.MEM_264>
  MEM[(logical(kind=1)D.7 *)&A.23D.3720] = vect_cst__729;
  # PT = anything
  # ALIGN = 4, MISALIGN = 0
  _681 = &MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][0] + 4;
  # .MEM_942 = VDEF <.MEM_673>
  MEM[(logical(kind=1)D.7 *)_681] = vect_cst__721;
  # .MEM_1323 = VDEF <.MEM_942>
  MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][8] = 1;
Comment 4 Richard Biener 2016-05-27 09:23:42 UTC
  MEM[(logical(kind=1)D.7 *)_769] = vect_cst__873;
  # .MEM_1440 = VDEF <.MEM_777>
  MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][8] = 1;

there's different alias-ptr types.  Can you double-check that RTL expansion
produces MEMs with the same MEM_ALIAS_SET?
Comment 5 vekumar 2016-05-27 09:41:08 UTC
The expand dump after SLP split 

---snip--
;; MEM[(logical(kind=1) *)&A.8] = { 1, 0, 1, 0 };

(insn 71 70 72 (set (reg:SI 308)
        (const_int 16777472 [0x1000100])) intrinsic_pack_1.f90:49 -1
     (nil))

(insn 72 71 0 (set (mem/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars)
                (const_int -576 [0xfffffffffffffdc0])) [8 MEM[(logical(kind=1)D.7 *)&A.8D.3679]+0 S4 A64])
        (reg:SI 308)) intrinsic_pack_1.f90:49 -1
     (nil))

;; MEM[(logical(kind=1) *)&A.8 + 4B] = { 1, 0, 1, 0 };

(insn 73 72 74 (set (reg:SI 309)
        (const_int 16777472 [0x1000100])) intrinsic_pack_1.f90:49 -1
     (nil))

(insn 74 73 0 (set (mem/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars)
                (const_int -572 [0xfffffffffffffdc4])) [8 MEM[(logical(kind=1)D.7 *)&A.8D.3679 + 4B]+0 S4 A32])
        (reg:SI 309)) intrinsic_pack_1.f90:49 -1
     (nil))

;; MEM[(logical(kind=1)[9] *)&A.8][8] = 1;

(insn 75 74 76 (set (reg:SI 310)
        (const_int 1 [0x1])) intrinsic_pack_1.f90:49 -1
     (nil))

(insn 76 75 77 (set (reg:QI 311)
        (subreg:QI (reg:SI 310) 3)) intrinsic_pack_1.f90:49 -1
     (nil))

(insn 77 76 0 (set (mem/c:QI (plus:SI (reg/f:SI 105 virtual-stack-vars)
                (const_int -568 [0xfffffffffffffdc8])) [8 A.8D.3679+8 S1 A64])
        (reg:QI 311)) intrinsic_pack_1.f90:49 -1
     (nil))
--snip---
Comment 6 Richard Biener 2016-05-27 09:43:09 UTC
Looks good - so leaving to target folks to investigate.
Comment 7 Jakub Jelinek 2016-12-12 09:53:26 UTC
Any progress on this from the ARM backend side?
Comment 8 ktkachov 2016-12-14 11:22:40 UTC
After playing around with this I believe the problem is in the
*neon_mov<mode> pattern in the arm backend. The way it constructs the const_vector of {1, 0, 1, ... } is probably wrong for big-endian.
If I disable alternative 2 of that pattern (forcing a constant pool load instead) the testcase runs correctly. I'll see if I can come up with a fix
Comment 9 ktkachov 2016-12-14 11:26:32 UTC
(In reply to ktkachov from comment #8)
> After playing around with this I believe the problem is in the
> *neon_mov<mode> pattern in the arm backend. The way it constructs the
> const_vector of {1, 0, 1, ... } is probably wrong for big-endian.

I should say the problem is not the RTL const_vector, but rather the vmov immediate that neon_immediate_valid_for_move generates
Comment 10 Jakub Jelinek 2017-01-10 14:11:34 UTC
Patch posted, not reviewed yet:
http://gcc.gnu.org/ml/gcc-patches/2017-01/msg00381.html
Comment 11 ktkachov 2017-01-20 14:37:28 UTC
Author: ktkachov
Date: Fri Jan 20 14:36:57 2017
New Revision: 244716

URL: https://gcc.gnu.org/viewcvs?rev=244716&root=gcc&view=rev
Log:
[ARM] PR target/71270 fix neon_valid_immediate for big-endian

	PR target/71270
	* config/arm/arm.c (neon_valid_immediate): Reject vector constants
	in big-endian mode when they are not a single duplicated value.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
Comment 12 ktkachov 2017-01-20 14:41:02 UTC
Fixed.
Opened PR 79166 for tracking codegen improvements for big-endian that can be done as a result of https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00381.html