g:052204fac580b21c967e57e6285d99a9828b8fac, r11-3230

FAIL: gcc.target/powerpc/p9-vec-length-epil-7.c scan-assembler-times \\mstxvl\\M 10
FAIL: gcc.target/powerpc/p9-vec-length-full-6.c scan-assembler-not \\mlxvx\\M
FAIL: gcc.target/powerpc/p9-vec-length-full-6.c scan-assembler-not \\mstxvx\\M
FAIL: gcc.target/powerpc/p9-vec-length-full-6.c scan-assembler-times \\mlxvl\\M 16
FAIL: gcc.target/powerpc/p9-vec-length-full-6.c scan-assembler-times \\mstxvl\\M 16

For p9-vec-length-epil-7.c, here is a diff of the assembler:

seurer@makalu-lp1:~/gcc/git/build/gcc-test$ diff p9-vec-length-epil-7.s.r11-3229 p9-vec-length-epil-7.s.r11-3230
322,323c322,323
< li 6,28
< addis 7,2,.LC8@toc@ha
---
> li 7,28
> addis 8,2,.LC8@toc@ha
325d324
< addis 10,2,.LANCHOR0@toc@ha
327,330c326,328
< li 8,0
< mtctr 6
< addi 7,7,.LC8@toc@l
< addi 10,10,.LANCHOR0@toc@l
---
> li 10,0
> mtctr 7
> addi 8,8,.LC8@toc@l
333,335c331,333
< lxv 32,0(7)
< addis 7,2,.LANCHOR0+904@toc@ha
< std 8,.LANCHOR0+904@toc@l(7)
---
> lxv 32,0(8)
> addis 8,2,.LANCHOR0+904@toc@ha
> std 10,.LANCHOR0+904@toc@l(8)
343,349c341,343
< addis 7,2,.LC9@toc@ha
< li 8,8
< addi 9,10,1360
< addi 7,7,.LC9@toc@l
< sldi 10,8,56
< lxv 0,0(7)
< stxvl 0,9,10
---
> li 9,57
> addis 10,2,.LANCHOR0+1360@toc@ha
> std 9,.LANCHOR0+1360@toc@l(10)
368,369c362,363
< li 6,28
< addis 7,2,.LC8@toc@ha
---
> li 7,28
> addis 8,2,.LC8@toc@ha
371d364
< addis 10,2,.LANCHOR0@toc@ha
373,376c366,368
< li 8,0
< mtctr 6
< addi 7,7,.LC8@toc@l
< addi 10,10,.LANCHOR0@toc@l
---
> li 10,0
> mtctr 7
> addi 8,8,.LC8@toc@l
379,381c371,373
< lxv 32,0(7)
< addis 7,2,.LANCHOR0+1416@toc@ha
< std 8,.LANCHOR0+1416@toc@l(7)
---
> lxv 32,0(8)
> addis 8,2,.LANCHOR0+1416@toc@ha
> std 10,.LANCHOR0+1416@toc@l(8)
389,395c381,383
< addis 7,2,.LC9@toc@ha
< li 8,8
< addi 9,10,1872
< addi 7,7,.LC9@toc@l
< sldi 10,8,56
< lxv 0,0(7)
< stxvl 0,9,10
---
> li 9,57
> addis 10,2,.LANCHOR0+1872@toc@ha
> std 9,.LANCHOR0+1872@toc@l(10)
414,415c402,403
< addis 6,2,.LC10@toc@ha
< addis 7,2,.LC11@toc@ha
---
> addis 6,2,.LC9@toc@ha
> addis 7,2,.LC10@toc@ha
421,422c409,410
< addi 6,6,.LC10@toc@l
< addi 7,7,.LC11@toc@l
---
> addi 6,6,.LC9@toc@l
> addi 7,7,.LC10@toc@l
441c429
< addis 7,2,.LC12@toc@ha
---
> addis 7,2,.LC11@toc@ha
444c432
< addi 7,7,.LC12@toc@l
---
> addi 7,7,.LC11@toc@l
466,469c454,456
< addis 7,2,.LC13@toc@ha
< li 6,28
< addis 8,2,.LC14@toc@ha
< addis 10,2,.LANCHOR0@toc@ha
---
> addis 8,2,.LC12@toc@ha
> li 7,28
> addis 10,2,.LC13@toc@ha
472,475c459,461
< addi 7,7,.LC13@toc@l
< mtctr 6
< addi 8,8,.LC14@toc@l
< addi 10,10,.LANCHOR0@toc@l
---
> addi 8,8,.LC12@toc@l
> mtctr 7
> addi 10,10,.LC13@toc@l
477,480c463,466
< lxv 0,0(7)
< lxv 11,0(8)
< addis 8,2,.LANCHOR0+2184@toc@ha
< stfd 12,.LANCHOR0+2184@toc@l(8)
---
> lxv 0,0(8)
> lxv 11,0(10)
> addis 10,2,.LANCHOR0+2184@toc@ha
> stfd 12,.LANCHOR0+2184@toc@l(10)
488,494c474,477
< addis 7,2,.LC15@toc@ha
< li 8,8
< addi 9,10,2640
< addi 7,7,.LC15@toc@l
< sldi 10,8,56
< lxv 0,0(7)
< stxvl 0,9,10
---
> addis 9,2,.LC14@toc@ha
> lfd 0,.LC14@toc@l(9)
> addis 9,2,.LANCHOR0+2640@toc@ha
> stfd 0,.LANCHOR0+2640@toc@l(9)
I'll take a look at this.
Thanks Kewen, unfortunately I've no Power setup. Sorry for the inconvenience.
(In reply to akrl from comment #2)
> Thanks Kewen, unfortunately I've no Power setup. Sorry for the
> inconvenience.

My pleasure! If you are interested in running on Power machines, you can apply for access to some Power8/Power9 machines in the CFarm machine pool: https://cfarm.tetaneutral.net/machines/list/
> gcc.target/powerpc/p9-vec-length-full-6.c

This is a test case issue: with Andrea's improvement, the 64bit/32bit pairs now use full vectors instead of partial vectors.

> gcc.target/powerpc/p9-vec-length-epil-7.c

This exposed one problem: vect_need_peeling_or_partial_vectors_p is called from vect_analyze_loop_2 during the analysis stage. If the loop is an epilogue loop, its loop_vinfo has not been fixed up yet (for example LOOP_VINFO_INT_NITERS), so the function can give the wrong answer. For some of the 64bit type functions in this failing case, it returns false for the epilogue loops even though the remaining iterations cannot fill a full vector.

One simple fix is to exclude epilogue loops from this check.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ab627fbf029..7273e998a99 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2278,7 +2278,8 @@ start_over:
     {
       /* Don't use partial vectors if we don't need to peel the loop.  */
       if (param_vect_partial_vector_usage == 0
-          || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
+          || (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+              && !vect_need_peeling_or_partial_vectors_p (loop_vinfo)))
         LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
       else if (vect_verify_full_masking (loop_vinfo)
                || vect_verify_loop_lens (loop_vinfo))

Testing is ongoing.
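To illustrate the reasoning behind the proposed check, here is a minimal standalone sketch in C. It is not GCC code: the struct and the toy_* functions are hypothetical stand-ins, the "need peeling" query is reduced to a plain remainder test, and it assumes that param_vect_partial_vector_usage is nonzero and that the target's masking/length verification would succeed.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for GCC's loop_vec_info.
   None of these names exist in GCC.  */
struct toy_loop_info
{
  unsigned long niters; /* Iteration count currently recorded.  */
  unsigned vf;          /* Vectorization factor (elements per vector).  */
  bool epilogue_p;      /* True if this describes an epilogue loop.  */
};

/* Toy version of the "need peeling or partial vectors" query: a tail is
   left over whenever the recorded count is not a multiple of VF.  */
bool
toy_need_peeling_or_partial_vectors_p (const struct toy_loop_info *info)
{
  return info->niters % info->vf != 0;
}

/* Decision before the proposed change: trust the query for every loop.  */
bool
toy_may_use_partial_vectors_old (const struct toy_loop_info *info)
{
  return toy_need_peeling_or_partial_vectors_p (info);
}

/* Decision with the proposed change: skip the query for epilogue loops,
   whose recorded count may still be stale during analysis.  */
bool
toy_may_use_partial_vectors_new (const struct toy_loop_info *info)
{
  return info->epilogue_p || toy_need_peeling_or_partial_vectors_p (info);
}
```

With made-up numbers, an epilogue loop whose recorded count is still a stale 16 with VF 2 is rejected by toy_may_use_partial_vectors_old, although the real remainder might be a single iteration. toy_may_use_partial_vectors_new lets it proceed to the later verification step, while a non-epilogue loop with the same values is still rejected.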
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:5427bd4d57c0376e51fc7b256e76aa46c43aa8cf

commit r11-3422-g5427bd4d57c0376e51fc7b256e76aa46c43aa8cf
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Thu Sep 24 00:40:47 2020 -0500

    test: Adjust case p9-vec-length-full-6.c [PR97075]

    The commit r11-3230 brings a nice improvement to use full
    vectors instead of partial vectors when available.  This patch
    is to fix the test failures on p9-vec-length-full-6.c, where
    64bit/32bit pairs are able to use full vector instead.

    Bootstrapped/regtested on powerpc64le-linux-gnu P9.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/97075
            * gcc.target/powerpc/p9-vec-length-full-6.c: Adjust.
Richard's rework r11-3393 has taken care of the failure on gcc.target/powerpc/p9-vec-length-epil-7.c. All failures should be gone on the latest trunk.