As noted in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01916.html, the fix for PR 58135 (r236582) causes fortran tests to fail at execution time on armeb (tested using qemu): gfortran.dg/c_f_pointer_logical.f03 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test gfortran.dg/c_f_pointer_logical.f03 -O3 -g execution test gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test GCC was configured with: --target=armeb-none-linux-gnueabihf --enable-languages=c,c++,fortran --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp16
Might be some arm BE vector support target bug as well.
Looked at x86_64 gimple code for intrinsic_pack_1.f90. After the SLP split we now vectorize at the place where we pass constant arguments via a parameterstructure to _gfortran_pack call. Before parm.20D.3555.dtypeD.3497 = 297; # .MEM_242 = VDEF <.MEM_241> parm.20D.3555.dimD.3502[0].lboundD.3499 = 1; # .MEM_243 = VDEF <.MEM_242> parm.20D.3555.dimD.3502[0].uboundD.3500 = 9; # .MEM_244 = VDEF <.MEM_243> parm.20D.3555.dimD.3502[0].strideD.3498 = 1; # .MEM_245 = VDEF <.MEM_244> parm.20D.3555.dataD.3495 = &d_ri4D.3433[0]; # .MEM_246 = VDEF <.MEM_245> parm.20D.3555.offsetD.3496 = -1; After # .MEM_243 = VDEF <.MEM_1566> parm.20D.3555.dimD.3502[0].uboundD.3500 = 9; # .MEM_245 = VDEF <.MEM_243> parm.20D.3555.dataD.3495 = &d_ri4D.3433[0]; # .MEM_992 = VDEF <.MEM_245> MEM[(integer(kind=8)D.9 *)&parm.20D.3555 + 8B] = vect_cst__993; # PT = anything # ALIGN = 16, MISALIGN = 8 _984 = &parm.20D.3555.offsetD.3496 + 16; # .MEM_983 = VDEF <.MEM_992> MEM[(integer(kind=8)D.9 *)_984] = vect_cst__999; vect_cst__993={-1,297} vect_cst__999={1,1} Other places looks similar. This looks like correct gimple. I am verifying the gimple generated for arm big endian target.
Built armeb-none-linux-gnueabihf -with-cpu=cortex-a9 --with-fpu=neon-fp16 --with-float=hard And compared gimple output from intrinsic_pack_1.f90.151t.slp1 before and after my patch. The difference is shown below and is similar to x86_64 dump. The gimple dump after SLP looks correct to me. I think something in backend is causing the issues. Any thoughts? Gimple SLP dumps. Before # .MEM_1450 = VDEF <.MEM_1492> d_i1D.3585[0].vD.3582 = 1; # .MEM_1454 = VDEF <.MEM_1450> d_i1D.3585[1].vD.3582 = -1; # .MEM_1458 = VDEF <.MEM_1454> d_i1D.3585[2].vD.3582 = 2; # .MEM_1468 = VDEF <.MEM_1458> d_i1D.3585[3].vD.3582 = -2; # .MEM_1472 = VDEF <.MEM_1468> d_i1D.3585[4].vD.3582 = 3; # .MEM_1476 = VDEF <.MEM_1472> d_i1D.3585[5].vD.3582 = -3; # .MEM_1486 = VDEF <.MEM_1476> d_i1D.3585[6].vD.3582 = 4; # .MEM_1490 = VDEF <.MEM_1486> d_i1D.3585[7].vD.3582 = -4; # .MEM_1494 = VDEF <.MEM_1490> d_i1D.3585[8].vD.3582 = 5; After vect_cst__817 = { 1, 0, 1, 0 }; vect_cst__873 = { 1, 0, 1, 0 }; vect_cst__1413 = { 1, -1, 2, -2 }; vect_cst__1461 = { 3, -3, 4, -4 }; # .MEM_910 = VDEF <.MEM_1492> MEM[(integer(kind=1)D.3 *)&d_i1D.3585] = vect_cst__1413; # PT = anything # ALIGN = 4, MISALIGN = 0 _918 = &d_i1D.3585[0].vD.3582 + 4; # .MEM_865 = VDEF <.MEM_910> MEM[(integer(kind=1)D.3 *)_918] = vect_cst__1461; # .MEM_1494 = VDEF <.MEM_865> d_i1D.3585[8].vD.3582 = 5; Before # .MEM_1388 = VDEF <.MEM_217> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][0] = 1; # .MEM_1393 = VDEF <.MEM_1388> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][1] = 0; # .MEM_1398 = VDEF <.MEM_1393> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][2] = 1; # .MEM_1409 = VDEF <.MEM_1398> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][3] = 0; # .MEM_1414 = VDEF <.MEM_1409> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][4] = 1; # .MEM_1419 = VDEF <.MEM_1414> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][5] = 0; # .MEM_1430 = VDEF <.MEM_1419> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][6] = 1; # .MEM_1435 = VDEF <.MEM_1430> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][7] = 0; # .MEM_1440 = VDEF <.MEM_1435> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][8] = 1; After # .MEM_825 = VDEF <.MEM_217> MEM[(logical(kind=1)D.7 *)&A.8D.3679] = vect_cst__817; # PT = anything # ALIGN = 4, MISALIGN = 0 _769 = &MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][0] + 4; # .MEM_777 = VDEF <.MEM_825> MEM[(logical(kind=1)D.7 *)_769] = vect_cst__873; # .MEM_1440 = VDEF <.MEM_777> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][8] = 1; Before # .MEM_1271 = VDEF <.MEM_264> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][0] = 1; # .MEM_1276 = VDEF <.MEM_1271> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][1] = 0; # .MEM_1281 = VDEF <.MEM_1276> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][2] = 1; # .MEM_1292 = VDEF <.MEM_1281> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][3] = 0; # .MEM_1297 = VDEF <.MEM_1292> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][4] = 1; # .MEM_1302 = VDEF <.MEM_1297> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][5] = 0; # .MEM_1313 = VDEF <.MEM_1302> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][6] = 1; # .MEM_1318 = VDEF <.MEM_1313> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][7] = 0; # .MEM_1323 = VDEF <.MEM_1318> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][8] = 1; After vect_cst__729 = { 1, 0, 1, 0 }; vect_cst__721 = { 1, 0, 1, 0 }; # .MEM_673 = VDEF <.MEM_264> MEM[(logical(kind=1)D.7 *)&A.23D.3720] = vect_cst__729; # PT = anything # ALIGN = 4, MISALIGN = 0 _681 = &MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][0] + 4; # .MEM_942 = VDEF <.MEM_673> MEM[(logical(kind=1)D.7 *)_681] = vect_cst__721; # .MEM_1323 = VDEF <.MEM_942> MEM[(logical(kind=1)D.7[9] *)&A.23D.3720][8] = 1;
MEM[(logical(kind=1)D.7 *)_769] = vect_cst__873; # .MEM_1440 = VDEF <.MEM_777> MEM[(logical(kind=1)D.7[9] *)&A.8D.3679][8] = 1; there's different alias-ptr types. Can you double-check that RTL expansion produces MEMs with the same MEM_ALIAS_SET?
The expand dump after SLP split ---snip-- ;; MEM[(logical(kind=1) *)&A.8] = { 1, 0, 1, 0 }; (insn 71 70 72 (set (reg:SI 308) (const_int 16777472 [0x1000100])) intrinsic_pack_1.f90:49 -1 (nil)) (insn 72 71 0 (set (mem/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars) (const_int -576 [0xfffffffffffffdc0])) [8 MEM[(logical(kind=1)D.7 *)&A.8D.3679]+0 S4 A64]) (reg:SI 308)) intrinsic_pack_1.f90:49 -1 (nil)) ;; MEM[(logical(kind=1) *)&A.8 + 4B] = { 1, 0, 1, 0 }; (insn 73 72 74 (set (reg:SI 309) (const_int 16777472 [0x1000100])) intrinsic_pack_1.f90:49 -1 (nil)) (insn 74 73 0 (set (mem/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars) (const_int -572 [0xfffffffffffffdc4])) [8 MEM[(logical(kind=1)D.7 *)&A.8D.3679 + 4B]+0 S4 A32]) (reg:SI 309)) intrinsic_pack_1.f90:49 -1 (nil)) ;; MEM[(logical(kind=1)[9] *)&A.8][8] = 1; (insn 75 74 76 (set (reg:SI 310) (const_int 1 [0x1])) intrinsic_pack_1.f90:49 -1 (nil)) (insn 76 75 77 (set (reg:QI 311) (subreg:QI (reg:SI 310) 3)) intrinsic_pack_1.f90:49 -1 (nil)) (insn 77 76 0 (set (mem/c:QI (plus:SI (reg/f:SI 105 virtual-stack-vars) (const_int -568 [0xfffffffffffffdc8])) [8 A.8D.3679+8 S1 A64]) (reg:QI 311)) intrinsic_pack_1.f90:49 -1 (nil)) --snip---
Looks good - so leaving to target folks to investigate.
Any progress on this from the ARM backend side?
After playing around with this I believe the problem is in the *neon_mov<mode> pattern in the arm backend. The way it constructs the const_vector of {1, 0, 1, ... } is probably wrong for big-endian. If I disable alternative 2 of that pattern (forcing a constant pool load instead) the testcase runs correctly. I'll see if I can come up with a fix
(In reply to ktkachov from comment #8) > After playing around with this I believe the problem is in the > *neon_mov<mode> pattern in the arm backend. The way it constructs the > const_vector of {1, 0, 1, ... } is probably wrong for big-endian. I should say the problem is not the RTL const_vector, but rather the vmov immediate that neon_immediate_valid_for_move generates
Patch posted, not reviewed yet: http://gcc.gnu.org/ml/gcc-patches/2017-01/msg00381.html
Author: ktkachov Date: Fri Jan 20 14:36:57 2017 New Revision: 244716 URL: https://gcc.gnu.org/viewcvs?rev=244716&root=gcc&view=rev Log: [ARM] PR target/71270 fix neon_valid_immediate for big-endian PR target/71270 * config/arm/arm.c (neon_valid_immediate): Reject vector constants in big-endian mode when they are not a single duplicated value. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c
Fixed. Opened PR 79166 for tracking codegen improvements for big-endian that can be done as a result of https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00381.html