The recent changes to the vect peeling cost model have introduced numerous failures in the powerpc vect testsuite.
New.
*** Bug 80927 has been marked as a duplicate of this bug. ***
Strange, my tests didn't show new failures on Power7. I'll have a look, perhaps the build settings were wrong.
Starting with r248678 these are the tests that fail on powerpc64 LE and BE (power6, 7, and 8):

FAIL: g++.dg/vect/slp-pr56812.cc -std=c++11 scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL: g++.dg/vect/slp-pr56812.cc -std=c++14 scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL: g++.dg/vect/slp-pr56812.cc -std=c++98 scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL: gcc.dg/vect/costmodel/ppc/costmodel-pr37194.c scan-tree-dump-times vect "vectorization not profitable" 1
FAIL: gcc.dg/vect/costmodel/ppc/costmodel-pr37194.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-section-anchors-vect-69.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
FAIL: gcc.dg/vect/no-section-anchors-vect-69.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/section-anchors-vect-69.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
FAIL: gcc.dg/vect/section-anchors-vect-69.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-28.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-28.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-28.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-28.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-33-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-33-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-33-big-array.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-33-big-array.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-70.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-70.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-70.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-70.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-87.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-87.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-87.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-87.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-88.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-88.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-88.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gcc.dg/vect/vect-88.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-91.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 3
FAIL: gcc.dg/vect/vect-91.c scan-tree-dump-times vect "Alignment of access forced using peeling" 3
FAIL: gcc.dg/vect/vect-93.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 1
FAIL: gcc.dg/vect/vect-93.c scan-tree-dump-times vect "Vectorizing an unaligned access" 1
FAIL: gfortran.dg/vect/vect-3.f90 -O scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gfortran.dg/vect/vect-3.f90 -O scan-tree-dump-times vect "Vectorizing an unaligned access" 1
FAIL: gfortran.dg/vect/vect-4.f90 -O scan-tree-dump-times vect "Alignment of access forced using peeling" 1
FAIL: gfortran.dg/vect/vect-4.f90 -O scan-tree-dump-times vect "Vectorizing an unaligned access" 1
I quickly built trunk without bootstrap on power7 BE (--enable-languages=c,c++,fortran --disable-multilib --disable-bootstrap) and still get no new fails. Do I need other build parameters? Meanwhile I got access to a power8 LE machine and will check there.

r248678 definitely introduced fails but r248680 should actually get rid of them. I can't confirm this for every one of them, but [6/6] was made specifically to address fails on power. Are all of them still present after r248680?
I see them still for r248738. My configure is pretty simple: --enable-languages=c,fortran,c++ --with-cpu=power8 --disable-bootstrap and it's the same on both BE and LE. I am using binutils 2.27 though I am not sure that would matter.
I could reproduce the fails on a power8 machine now.

Looking at the vect-28.c FAIL now, the loop to be vectorized is:

  for (i = 0; i < N; i++)
    {
      ia[i+off] = 5;
    }

It still gets vectorized but is not peeled anymore because the costs for no peeling equal the costs for peeling (for unknown alignment). Costs for an unaligned store are the same (1) as for a regular store, so this is to be expected. At first sight, the situation is similar for vect-87.c, vect-88.c and maybe most of the fails with '"Vectorizing an unaligned access" 0'.

How should we deal with this? If the cost function is correct as it is and unaligned stores are not slower at all, I don't think we should be peeling. What is expected for real workloads and unaligned loads/stores?
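A minimal sketch of the tie described above, assuming a toy cost model rather than the vectorizer's actual code: peeling is chosen only when it makes the loop body strictly cheaper, so equal aligned and unaligned store costs mean no peeling. The struct, the function names and the penalty value 3 are hypothetical; the per-store cost of 1 is the figure quoted in this comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are illustrative and do not exist in GCC.  */
struct store_costs {
  unsigned aligned;    /* cost of one aligned vector store */
  unsigned unaligned;  /* cost of one unaligned vector store */
};

/* Body cost when the loop is not peeled: alignment is unknown, so
   every store is costed as an unaligned store.  */
static unsigned cost_without_peeling(struct store_costs c, unsigned n_stores)
{
  return n_stores * c.unaligned;
}

/* Body cost when a prologue is peeled so that the stores become
   aligned.  Prologue and epilogue overhead is deliberately ignored.  */
static unsigned cost_with_peeling(struct store_costs c, unsigned n_stores)
{
  return n_stores * c.aligned;
}

/* Peel only if it is strictly cheaper; a tie keeps the unpeeled loop.  */
static bool should_peel(struct store_costs c, unsigned n_stores)
{
  assert(n_stores > 0);
  return cost_with_peeling(c, n_stores) < cost_without_peeling(c, n_stores);
}
```

With costs `{1, 1}` (the situation reported here) `should_peel` returns false, which matches the missing "Alignment of access forced using peeling" message; with an assumed penalty such as `{1, 3}` it returns true.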
The cost modeling doesn't explain failures on P6 and P7, I guess. For P8 we consider unaligned loads to be the same cost as aligned loads (this is a small lie because of boundary-crossing costs, but these are much smaller than before and amortized across a sequence of loads). Prior to P8 there is a heavy penalty in the cost model for using unaligned loads, so the P6/P7 failures are still unexpected.
I built --with-cpu=power7 and still see TARGET_EFFICIENT_UNALIGNED_VSX == true in the backend which causes unaligned stores to have costs of 1. On my power7 system, TARGET_EFFICIENT_UNALIGNED_VSX is never true.

I stepped through rs6000_override_internal and TARGET_EFFICIENT_UNALIGNED_VSX is properly unset via the options defined by --with-cpu=power7 (or power6). Afterwards, however, it is overwritten again by TARGET_P8_VECTOR which, in turn, is set automatically by the vector test suite depending on the current CPU:

  [...]
  } elseif [check_p8vector_hw_available] {
      lappend DEFAULT_VECTCFLAGS "-mpower8-vector"

Therefore, whenever the vector tests are run on a power8 CPU, TARGET_EFFICIENT_UNALIGNED_VSX = 1, no matter the --with-cpu. This would also explain why I didn't see the fails on my machine, all vect tests are only called with -maltivec which doesn't override TARGET_EFFICIENT_UNALIGNED_VSX.

So, the way the vect test suite is currently set up, this is kind of expected.
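The ordering problem described in this comment can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the rs6000 backend's code: the enum, the struct and `resolve_flags` are hypothetical names, and the three steps only mirror the sequence reported above (CPU defaults first, then the testsuite's -mpower8-vector, then the derived setting).

```c
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names are GCC's.  */
enum cpu { CPU_POWER6, CPU_POWER7, CPU_POWER8 };

struct target_flags {
  bool p8_vector;                /* stands in for TARGET_P8_VECTOR */
  bool efficient_unaligned_vsx;  /* stands in for TARGET_EFFICIENT_UNALIGNED_VSX */
};

static struct target_flags resolve_flags(enum cpu with_cpu,
                                         bool mpower8_vector_given)
{
  struct target_flags f;

  /* Step 1: defaults implied by --with-cpu.  */
  f.p8_vector = (with_cpu == CPU_POWER8);
  f.efficient_unaligned_vsx = (with_cpu == CPU_POWER8);

  /* Step 2: an explicit -mpower8-vector, as added by the vect
     testsuite when it detects power8 hardware.  */
  if (mpower8_vector_given)
    f.p8_vector = true;

  /* Step 3: as reported in this comment, the P8 vector setting
     turns efficient unaligned VSX back on afterwards.  */
  if (f.p8_vector)
    f.efficient_unaligned_vsx = true;

  return f;
}
```

In this model, `resolve_flags(CPU_POWER7, false)` leaves unaligned accesses expensive, while `resolve_flags(CPU_POWER7, true)` makes them cheap even though the compiler was configured for power7.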
(In reply to rdapp from comment #9)
>
> Therefore, whenever the vector tests are run on a power8 CPU,
> TARGET_EFFICIENT_UNALIGNED_VSX = 1, no matter the --with-cpu. This would
> also explain why I didn't see the fails on my machine, all vect tests are
> only called with -maltivec which doesn't override
> TARGET_EFFICIENT_UNALIGNED_VSX.
>
> So, the way the vect test suite is currently set up, this is kind of
> expected.

Ick, that is horrible. We'll have to review whether that can be fixed, as that is going to cause us a lot of trouble with cost modeling issues...

For this test case we'll need to figure out a more appropriate way to disable it when the unaligned load cost is cheap.
Well, I should be more careful -- the behavior you see is probably reasonable for these runtime tests, since the testing infrastructure isn't aware that you built for an older architecture on the POWER8 it will be running on. And I think we still have an undiagnosed problem. We are seeing these tests fail natively on POWER6 and POWER7 systems where we have buildbots sending results to gcc-testresults. In that scenario, the override you discovered presumably wouldn't kick in. Bill Seurer, can you run one of these tests by hand on the regtester machines, and save the output from -fdump-tree-vect-details? Hopefully that will give me some insight into why the tests still fail there when they should use the expensive cost modeling.
Hmmm, they don't all fail on power6/7 (costmodel-pr37194.c for instance). I attached a dump from -fdump-tree-vect-details for one that does (power6).
Created attachment 41475 [details] Dump from -fdump-tree-vect-details for test case gcc.dg/vect/vect-33-big-array.c
spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc -B/home/seurer/gcc/build/gcc-test/gcc/ /home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c -fno-diagnostics-show-caret -fdiagnostics-color=never -maltivec -mpower8-vector -ftree-vectorize -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o vect-33-big-array.s
PASS: gcc.dg/vect/vect-33-big-array.c (test for excess errors)
PASS: gcc.dg/vect/vect-33-big-array.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-33-big-array.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
FAIL: gcc.dg/vect/vect-33-big-array.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
Executing on host: /home/seurer/gcc/build/gcc-test/gcc/xgcc -B/home/seurer/gcc/build/gcc-test/gcc/ /home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -ffat-lto-objects -maltivec -mpower8-vector -ftree-vectorize -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o vect-33-big-array.s (timeout = 300)
These started to fail on aarch64-*-* at the same time as powerpc.
(In reply to seurer from comment #14)
> spawn -ignore SIGHUP /home/seurer/gcc/build/gcc-test/gcc/xgcc
> -B/home/seurer/gcc/build/gcc-test/gcc/
> /home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
> -fno-diagnostics-show-caret -fdiagnostics-color=never -maltivec
> -mpower8-vector -ftree-vectorize -fno-vect-cost-model -fno-common -O2
> -fdump-tree-vect-details -S -o vect-33-big-array.s
> PASS: gcc.dg/vect/vect-33-big-array.c (test for excess errors)
> PASS: gcc.dg/vect/vect-33-big-array.c scan-tree-dump-times vect "vectorized
> 1 loops" 1
> FAIL: gcc.dg/vect/vect-33-big-array.c scan-tree-dump-times vect "Vectorizing
> an unaligned access" 0
> FAIL: gcc.dg/vect/vect-33-big-array.c scan-tree-dump-times vect "Alignment
> of access forced using peeling" 1
> Executing on host: /home/seurer/gcc/build/gcc-test/gcc/xgcc
> -B/home/seurer/gcc/build/gcc-test/gcc/
> /home/seurer/gcc/gcc-test/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
> -fno-diagnostics-show-caret -fdiagnostics-color=never -flto
> -ffat-lto-objects -maltivec -mpower8-vector -ftree-vectorize
> -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o
> vect-33-big-array.s (timeout = 300)

There is still -mpower8-vector in the compile options, making unaligned stores inexpensive. Is this really run on a power6 CPU?

In order to bulk-disable tests that rely on peeling, would something like a global check (e.g. target_vect_unaligned, akin to target_vect_int etc.) make sense? This could be used to flag and disable specific tests depending on the CPU the vect suite is run on.
That is the usual approach, and there are already some predicates involving alignment. It's a matter of going through and figuring out which ones will do what's needed. I spent some tiresome weeks working through this when we first made the unaligned loads cheaper.
See https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01862.html for a proposed patch to update the tests.
Author: sje
Date: Mon Jul 31 21:44:34 2017
New Revision: 250752

URL: https://gcc.gnu.org/viewcvs?rev=250752&root=gcc&view=rev
Log:
2017-07-31 Steve Ellcey <sellcey@cavium.com>

    PR tree-optimization/80925
    * gcc.dg/vect/no-section-anchors-vect-69.c: Add
    --param vect-max-peeling-for-alignment=0 option.
    Remove unaligned access and peeling checks.
    * gcc.dg/vect/section-anchors-vect-69.c: Ditto.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
    trunk/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
Author: sje
Date: Tue Aug 1 15:37:22 2017
New Revision: 250783

URL: https://gcc.gnu.org/viewcvs?rev=250783&root=gcc&view=rev
Log:
2017-08-01 Steve Ellcey <sellcey@cavium.com>

    PR tree-optimization/80925
    * gcc.dg/vect/vect-28.c: Add
    --param vect-max-peeling-for-alignment=0 option.
    Remove unaligned access and peeling checks.
    * gcc.dg/vect/vect-33-big-array.c: Ditto.
    * gcc.dg/vect/vect-70.c: Ditto.
    * gcc.dg/vect/vect-87.c: Ditto.
    * gcc.dg/vect/vect-88.c: Ditto.
    * gcc.dg/vect/vect-91.c: Ditto.
    * gcc.dg/vect/vect-93.c: Ditto.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-28.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-70.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-87.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-88.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-91.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-93.c
(In reply to Steve Ellcey from comment #19)
> Author: sje
> Date: Mon Jul 31 21:44:34 2017
> New Revision: 250752
>
> URL: https://gcc.gnu.org/viewcvs?rev=250752&root=gcc&view=rev
> Log:
> 2017-07-31 Steve Ellcey <sellcey@cavium.com>
>
> PR tree-optimization/80925
> * gcc.dg/vect/no-section-anchors-vect-69.c: Add
> --param vect-max-peeling-for-alignment=0 option.
> Remove unaligned access and peeling checks.
> * gcc.dg/vect/section-anchors-vect-69.c: Ditto.
>
> Modified:
> trunk/gcc/testsuite/ChangeLog
> trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
> trunk/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c

I think this change caused regressions on armeb-none-linux-gnueabihf --with-cpu=cortex-a9 --with-fpu=neon-fp16 (works OK --with-fpu=vfpv3-d16-fp16)
(In reply to Christophe Lyon from comment #21)
> I think this change caused regressions on armeb-none-linux-gnueabihf
> --with-cpu=cortex-a9 --with-fpu=neon-fp16 (works OK
> --with-fpu=vfpv3-d16-fp16)

Rainer Orth reported a failure on SPARC64 as well; here was my reply to him. I don't know if your problem is the same without seeing the specific failure.

--

Looking at the checks at the end, I also see that SPARC does include the 'Alignment' message and Aarch64 does not and that is handled by a conditional check.

I think the fix is to check for 'vectorized 4 loops' when we support unaligned vector instructions (vect_hw_misalign is true) and check for 'vectorized 3 loops' otherwise. Does that sound reasonable to you?

I think the reason this worked before is that the loop got vectorized due to being peeled, and my change turned off the peeling, so it could no longer be vectorized on machines that do not support unaligned vectorization.
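The conditional check proposed above could look roughly like the following directives in a vect test source file. This is a sketch, not the committed patch: the test body is left out, the counts 4 and 3 are the ones quoted in this comment, and the use of `dg-additional-options` to pass the --param is an assumption about how the option is added.

```c
/* Sketch only: directives for a vect test; the test body is omitted.  */

/* Assumed way of disabling peeling for alignment in this test.  */
/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */

/* Targets with hardware support for misaligned vector accesses can
   vectorize every loop without peeling.  */
/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target vect_hw_misalign } } } */

/* Other targets lose the loop that previously relied on peeling.  */
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { ! vect_hw_misalign } } } } */
```

The two `dg-final` selectors are mutually exclusive, so exactly one of the scans applies on any given target.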
Author: aldyh
Date: Wed Sep 13 16:09:53 2017
New Revision: 252203

URL: https://gcc.gnu.org/viewcvs?rev=252203&root=gcc&view=rev
Log:
2017-07-31 Steve Ellcey <sellcey@cavium.com>

    PR tree-optimization/80925
    * gcc.dg/vect/no-section-anchors-vect-69.c: Add
    --param vect-max-peeling-for-alignment=0 option.
    Remove unaligned access and peeling checks.
    * gcc.dg/vect/section-anchors-vect-69.c: Ditto.

Modified:
    branches/range-gen2/gcc/testsuite/ChangeLog
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
Author: aldyh
Date: Wed Sep 13 16:14:53 2017
New Revision: 252228

URL: https://gcc.gnu.org/viewcvs?rev=252228&root=gcc&view=rev
Log:
2017-08-01 Steve Ellcey <sellcey@cavium.com>

    PR tree-optimization/80925
    * gcc.dg/vect/vect-28.c: Add
    --param vect-max-peeling-for-alignment=0 option.
    Remove unaligned access and peeling checks.
    * gcc.dg/vect/vect-33-big-array.c: Ditto.
    * gcc.dg/vect/vect-70.c: Ditto.
    * gcc.dg/vect/vect-87.c: Ditto.
    * gcc.dg/vect/vect-88.c: Ditto.
    * gcc.dg/vect/vect-91.c: Ditto.
    * gcc.dg/vect/vect-93.c: Ditto.

Modified:
    branches/range-gen2/gcc/testsuite/ChangeLog
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/vect-28.c
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/vect-70.c
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/vect-87.c
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/vect-88.c
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/vect-91.c
    branches/range-gen2/gcc/testsuite/gcc.dg/vect/vect-93.c
Author: sje
Date: Wed Sep 13 18:06:36 2017
New Revision: 252723

URL: https://gcc.gnu.org/viewcvs?rev=252723&root=gcc&view=rev
Log:
2017-09-13 Steve Ellcey <sellcey@cavium.com>

    PR tree-optimization/80925
    * gfortran.dg/vect/vect-2.f90: Add
    --param vect-max-peeling-for-alignment=0 option.
    Remove unaligned access and peeling checks.
    * gfortran.dg/vect/vect-3.f90: Ditto.
    * gfortran.dg/vect/vect-4.f90: Ditto.
    * gfortran.dg/vect/vect-5.f90: Ditto.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gfortran.dg/vect/vect-2.f90
    trunk/gcc/testsuite/gfortran.dg/vect/vect-3.f90
    trunk/gcc/testsuite/gfortran.dg/vect/vect-4.f90
    trunk/gcc/testsuite/gfortran.dg/vect/vect-5.f90
Fixed?
(In reply to Richard Biener from comment #26)
> Fixed?

I still see these vect failures on aarch64:

FAIL: gcc.dg/vect/pr65947-14.c execution test
FAIL: gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
FAIL: g++.dg/vect/slp-pr56812.cc -std=c++11 scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)
FAIL: g++.dg/vect/slp-pr56812.cc -std=c++14 scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)
FAIL: g++.dg/vect/slp-pr56812.cc -std=c++98 scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)

I don't think the pr65947-14.c failure is related to this change, but the pr56812.cc failure is one of the failures listed in the original report.
FWIW I am still seeing these fail:

FAIL: g++.dg/vect/slp-pr56812.cc -std=c++11 scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)
FAIL: g++.dg/vect/slp-pr56812.cc -std=c++14 scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)
FAIL: g++.dg/vect/slp-pr56812.cc -std=c++98 scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)
This was fixed except for the pr56812 failures which are being tracked via pr81038.
(In reply to Steve Ellcey from comment #22)

Finally coming back to this...

> Rainer Orth reported a failure on SPARC64 as well, here was my reply
> to him. I don't know if your problem is the same without seeing the
> specific failure.
>
> --
>
> Looking at the checks at the end, I also see that SPARC does include
> the 'Alignment' message and Aarch64 does not and that is handled by a
> conditional check.
>
> I think the fix is to check for 'vectorized 4 loops' when we support
> unaligned vector instructions (vect_hw_misalign is true) and check for
> 'vectorized 3 loops' otherwise. Does that sound reasonable to you?

I just successfully tested a patch along these lines on sparc-sun-solaris2.11 and i386-pc-solaris2.11: works fine. I'll also test on the gcc-8 branch, which is likewise affected, and then post to gcc-patches.

Thanks for the suggestion.

Rainer
Created attachment 44498 [details] Proposed patch for gcc.dg/vect/no-section-anchors-vect-69.c failure
Author: ro
Date: Tue Aug 7 08:51:29 2018
New Revision: 263352

URL: https://gcc.gnu.org/viewcvs?rev=263352&root=gcc&view=rev
Log:
Fix gcc.dg/vect/no-section-anchors-vect-69.c on SPARC etc. (PR tree-optimization/80925)

2018-08-07 Steve Ellcey <sellcey@cavium.com>
           Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

    PR tree-optimization/80925
    * gcc.dg/vect/no-section-anchors-vect-69.c: Expect 3 loops
    vectorized on !vect_hw_misalign targets.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
Author: ro
Date: Tue Aug 7 08:58:20 2018
New Revision: 263353

URL: https://gcc.gnu.org/viewcvs?rev=263353&root=gcc&view=rev
Log:
Fix gcc.dg/vect/no-section-anchors-vect-69.c on SPARC etc. (PR tree-optimization/80925)

2018-08-07 Steve Ellcey <sellcey@cavium.com>
           Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

    PR tree-optimization/80925
    * gcc.dg/vect/no-section-anchors-vect-69.c: Expect 3 loops
    vectorized on !vect_hw_misalign targets.

Modified:
    branches/gcc-8-branch/gcc/testsuite/ChangeLog
    branches/gcc-8-branch/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c