[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3
cvs-commit at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Jun 9 12:41:50 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:ce670e4faafb296d1f1a7828d20f8c8ba4686797
commit r12-1329-gce670e4faafb296d1f1a7828d20f8c8ba4686797
Author: Richard Biener <rguenther@suse.de>
Date: Wed Nov 18 14:17:34 2020 +0100
tree-optimization/97832 - handle associatable chains in SLP discovery
This makes SLP discovery handle associatable (including mixed
plus/minus) chains better by swapping operands across the whole
chain. To work this adds caching of the 'matches' lanes for
failed SLP discovery attempts, thereby fixing a failed SLP
discovery for the slp-pr98855.cc testcase which results in
building an operand from scalars as expected. Unfortunately
this makes us trip over the cost threshold so I'm XFAILing the
testcase for now.
For BB vectorization all this doesn't work because we have no way
to distinguish good from bad associations as we eventually build
operands from scalars and thus not fail in the classical sense.
2021-05-31 Richard Biener <rguenther@suse.de>
PR tree-optimization/97832
* tree-vectorizer.h (_slp_tree::failed): New.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize
failed member.
(_slp_tree::~_slp_tree): Free failed.
(vect_build_slp_tree): Retain failed nodes and record
matches in them, copying that back out when running
into a cached fail. Dump start and end of discovery.
(dt_sort_cmp): New.
(vect_build_slp_tree_2): Handle associatable chains
together doing more aggressive operand swapping.
* gcc.dg/vect/pr97832-1.c: New testcase.
* gcc.dg/vect/pr97832-2.c: Likewise.
* gcc.dg/vect/pr97832-3.c: Likewise.
* g++.dg/vect/slp-pr98855.cc: XFAIL.
More information about the Gcc-bugs
mailing list