[OpenACC] Fix an ICE where a loop with GT condition is collapsed.
We have seen an ICE on both the trunk and devel/omp/gcc-10 branches which can
be reproduced with this simple testcase. It occurs if an OpenACC loop has
a collapse clause and any of the loops being collapsed uses a GT or GE
condition. This issue is specific to OpenACC.
int main (void)
{
  int ix, iy;
  int dim_x = 16, dim_y = 16;
  {
    for (iy = dim_y - 1; iy > 0; --iy)
      for (ix = dim_x - 1; ix > 0; --ix)
        ;
  }
}
The problem is caused by a failing assertion in expand_oacc_collapse_init.
It checks that the cond_code for fd->loop is the same as the cond_code for all
the loops that are being collapsed. As the cond_code for fd->loop is
LT_EXPR with a collapse clause (set at the end of omp_extract_for_data),
this assertion forces all the loops in the collapse clause to use the
< operator.
There does not seem to be anything in the code which demands this
condition, as a loop with a > condition works fine otherwise. I dug through
the old mailing list a bit but could not find any discussion of this change.
Looking at the code, expand_oacc_for checks that fd->loop->cond_code is
either LT_EXPR or GT_EXPR. I guess the original intention was to have
similar checks on the loops which are being collapsed. But the way the check
was written does not achieve that.
I have fixed it by modifying the check in the assertion to be the same as
the check on fd->loop->cond_code.
I tested goacc and libgomp (with nvptx offloading) and did not see any
regressions. I have added new tests to check collapse with GT/GE conditions.
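For illustration, a minimal sketch of a collapsed loop nest with GT conditions. The directive spelling is an assumption, since the listing above shows only the loop nest, and the function name is hypothetical. Without -fopenacc the pragma is ignored and the loops simply run serially.

```cpp
// Sketch only: the OpenACC directive below is an assumed spelling, not
// copied from the testcase. Both loops count downwards and use '>'.
int count_collapsed_iterations(int dim_x, int dim_y)
{
  int count = 0;
#pragma acc parallel loop collapse(2) reduction(+:count)
  for (int iy = dim_y - 1; iy > 0; --iy)
    for (int ix = dim_x - 1; ix > 0; --ix)
      ++count;
  return count;
}
```

Each loop runs dim - 1 times, so the collapsed iteration space has (dim_y - 1) * (dim_x - 1) points.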
PR middle-end/98088
gcc/
* omp-expand.c (expand_oacc_collapse_init): Update condition in
a gcc_assert.
Jason Merrill [Sat, 10 Apr 2021 18:00:15 +0000 (14:00 -0400)]
c++: ICE with anonymous union [PR97974]
Here lookup got confused by finding a conversion operator from
lookup_anon_field. Let's avoid this by pruning functions from
CLASSTYPE_MEMBER_VEC as well as TYPE_FIELDS.
gcc/cp/ChangeLog:
PR c++/97974
* decl.c (fixup_anonymous_aggr): Prune all functions from
CLASSTYPE_MEMBER_VEC.
gcc/testsuite/ChangeLog:
PR c++/97974
* g++.dg/lookup/pr84962.C: Adjust diagnostic.
* g++.dg/other/anon-union5.C: New test.
Jason Merrill [Sat, 10 Apr 2021 14:55:58 +0000 (10:55 -0400)]
c++: ICE with invalid use of 'this' with static memfn [PR98800]
Here instantiation of the fake 'this' parameter we used when parsing the
trailing return type of func() was failing because there is no actual 'this'
parameter in the instantiation. For PR97399 I told Patrick to do the 'this'
injection even for statics, but now I think I was wrong; the out-of-class
definition case I was concerned about does not break with this patch. And
we don't set current_class_ptr in the body of a static member function.
And the OMP code should continue to parse 'this' and complain about it
rather than give a syntax error.
gcc/cp/ChangeLog:
PR c++/98800
PR c++/97399
* parser.c (cp_parser_direct_declarator): Don't
inject_this_parameter if static_p.
(cp_parser_omp_var_list_no_open): Parse 'this' even if
current_class_ptr isn't set for a better diagnostic.
gcc/testsuite/ChangeLog:
PR c++/98800
* g++.dg/gomp/this-1.C: Adjust diagnostic.
* g++.dg/cpp0x/constexpr-this1.C: New test.
David Malcolm [Sat, 10 Apr 2021 20:23:23 +0000 (16:23 -0400)]
analyzer: fix ICE on assignment from STRING_CST when building path [PR100011]
gcc/analyzer/ChangeLog:
PR analyzer/100011
* region-model.cc (region_model::on_assignment): Avoid NULL
dereference if ctxt is NULL when assigning from a STRING_CST.
gcc/testsuite/ChangeLog:
PR analyzer/100011
* gcc.dg/analyzer/pr100011.c: New test.
The following testcase ICEs during error recovery, because finish_decl
overwrites TREE_TYPE (error_mark_node), which should always remain
error_mark_node.
2021-04-10 Jakub Jelinek <jakub@redhat.com>
PR c/99990
* c-decl.c (finish_decl): Don't overwrite TREE_TYPE of
error_mark_node.
libphobos: Build runtime library with -ffunction-sections -fdata-sections
Tests for `-ffunction-sections -fdata-sections' and sets SECTION_FLAGS
accordingly. If there is no warning when using it, take advantage of
the smaller executables that can be had with `--gc-sections'.
libphobos: Explicitly use -static-libphobos in druntime and phobos tests
Linking to libphobos statically is the default in the driver, however
this may change in future. Be explicit that the static libphobos is
what's being tested.
libphobos/ChangeLog:
* testsuite/libphobos.druntime/druntime.exp: Compile all tests with
-static-libphobos.
* testsuite/libphobos.phobos/phobos.exp: Likewise.
Jakub Jelinek [Sat, 10 Apr 2021 10:49:01 +0000 (12:49 +0200)]
expand: Fix up LTO ICE with COMPOUND_LITERAL_EXPR [PR99849]
The gimplifier optimizes away COMPOUND_LITERAL_EXPRs, but they can remain
in the form of ADDR_EXPR of COMPOUND_LITERAL_EXPRs in static initializers.
By the TREE_STATIC check I meant to check that the underlying decl of
the compound literal is a global rather than automatic variable which
obviously can't be referenced in static initializers, but unfortunately
with LTO it might end up in another partition and thus be DECL_EXTERNAL
instead.
2021-04-10 Jakub Jelinek <jakub@redhat.com>
PR lto/99849
* expr.c (expand_expr_addr_expr_1): Test is_global_var rather than
just TREE_STATIC on COMPOUND_LITERAL_EXPR_DECLs.
This PR is about a -W*uninitialized warning on riscv64.
alloca_type_and_limit is documented to have limit member only defined
when type is ALLOCA_BOUND_MAYBE_LARGE or ALLOCA_BOUND_DEFINITELY_LARGE
and otherwise just default constructs limit, which for wide_int means
no initialization at all. IMHO it is fine not to use the limit
member otherwise, but leaving it uninitialized when it can be e.g.
copied around, which then invokes UB, doesn't look like a good idea.
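A rough sketch of the idea, using hypothetical stand-in types (the real class stores a wide_int, which is replaced here by a plain integer):

```cpp
// Hypothetical stand-ins, not the GCC declarations.
enum alloca_kind_sketch
{
  KIND_OK,
  KIND_BOUND_MAYBE_LARGE,
  KIND_BOUND_DEFINITELY_LARGE
};

struct type_and_limit_sketch
{
  alloca_kind_sketch type;
  unsigned long long limit;

  // The limit is only meaningful for the two *_LARGE kinds, but it is
  // set to 0 for every kind so that copying the object never reads an
  // indeterminate value.
  explicit type_and_limit_sketch(alloca_kind_sketch t,
                                 unsigned long long l = 0)
    : type(t), limit(l) {}
};
```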
2021-04-10 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99989
* gimple-ssa-warn-alloca.c
(alloca_type_and_limit::alloca_type_and_limit): Initialize limit to
0 with integer precision unconditionally.
Jakub Jelinek [Sat, 10 Apr 2021 10:46:09 +0000 (12:46 +0200)]
rtlanal: Another fix for VOIDmode MEMs [PR98601]
This is a sequel to the PR85022 changes, inline-asm can (unfortunately)
introduce VOIDmode MEMs and in PR85022 they have been changed so that
we don't pretend we know their size (as opposed to assuming they have
zero size).
This time we ICE in rtx_addr_can_trap_p_1 because it assumes that
all memory but BLKmode has known size. The patch just treats VOIDmode
MEMs like BLKmode in that regard. And, the STRICT_ALIGNMENT change
is needed because VOIDmode has GET_MODE_SIZE of 0 and we don't want to
check if something is a multiple of 0.
2021-04-10 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/98601
* rtlanal.c (rtx_addr_can_trap_p_1): Allow in assert unknown size
not just for BLKmode, but also for VOIDmode. For STRICT_ALIGNMENT
unaligned_mems handle VOIDmode like BLKmode.
Jason Merrill [Fri, 9 Apr 2021 22:02:38 +0000 (18:02 -0400)]
c++: deduction guide using alias [PR99180]
alias_ctad_tweaks was expecting that all deduction guides for the class
would be suitable for deduction from the alias definition; in this case, the
deduction guide uses 'true' and the alias B uses 'false', so deduction
fails. But that's OK, we just don't use that deduction guide. I also
noticed that we were giving up on deduction entirely if substitution failed
for some guide; we should only give up on that particular deduction guide.
We ought to give a better diagnostic about this case when deduction fails,
but that can wait.
Jason Merrill [Fri, 9 Apr 2021 20:43:50 +0000 (16:43 -0400)]
c++: pack in base-specifier in lambda [PR100006]
Normally cp_parser_base_clause prevents unexpanded packs, but in a lambda
check_for_bare_parameter_packs allows it. Then we weren't finding the
pack when scanning the lambda body.
François Dumont [Sun, 7 Mar 2021 18:11:02 +0000 (19:11 +0100)]
libstdc++: [_GLIBCXX_DEBUG] Fix management of __dp_sign_max_size [PR 99402]
__dp_sign precision indicates that we found out which iterator comes first or
last in the range. __dp_sign_max_size is the same, plus it gives the
max size of the range, that is to say the max_size value such that
distance(lhs, rhs) < max_size.
Thanks to this additional information we are able to tell when a copy of n elements
to that range will fail even if we do not know exactly how large it is.
This patch makes sure that we are properly using this information.
If a toolchain is configured with --with-cpu=X and gcc is
then run with an explicit -march=Y option, we ignore the
X cpu setting and tune for generic Y code:
In the above scenario, ptr->x_explicit_tune_core is aarch64_none,
so we fall back on the default configure-time CPU. This means
that before the push_options we tuned for generic Y but after
the pop_options we tuned for X.
This was picked up by an assertion failure in cl_optimization_compare.
The ICE itself is a GCC 11 regression, but the problem that it shows
up is much older.
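The intended selection order can be sketched as follows. All names here are hypothetical; the real code operates on GCC's saved option structures.

```cpp
// Hypothetical model of the tuning choice, not the aarch64.c code.
enum core_sketch { CORE_NONE, CORE_X, CORE_GENERIC_FOR_Y };

// explicit_tune:      core named by an explicit tuning option, if any
// explicit_arch:      whether an explicit -march was given
// generic_for_arch:   generic tuning for that architecture
// configured_default: the --with-cpu choice
inline core_sketch
select_tune(core_sketch explicit_tune, bool explicit_arch,
            core_sketch generic_for_arch, core_sketch configured_default)
{
  if (explicit_tune != CORE_NONE)
    return explicit_tune;
  if (explicit_arch)
    return generic_for_arch;   // do not fall back to the configured CPU
  return configured_default;
}
```

With this order, saving and restoring the options gives the same answer as the initial option processing.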
gcc/
* config/aarch64/aarch64.c (aarch64_option_restore): If the
architecture was specified explicitly and the tuning wasn't,
tune for the architecture rather than the configured default CPU.
Marek Polacek [Thu, 8 Apr 2021 18:39:28 +0000 (14:39 -0400)]
c++: Fix two issues with auto function parameter [PR99806]
When we have a member function with auto parameter like this:
  struct S {
    void f(auto);
  };
cp_parser_member_declaration -> grokfield produces a FUNCTION_DECL
"void S::f(auto:1)", and then finish_fully_implicit_template turns
that FUNCTION_DECL into a TEMPLATE_DECL. The bug here is that we only
call cp_parser_save_default_args for a FUNCTION_DECL. As a consequence,
abbrev10.C is rejected because we complain that the default argument has
not been defined, and abbrev11.C ICEs, because we don't re-parse the
delayed noexcept, so the DEFERRED_PARSE tree leaks into tsubst* where we
crash. This patch fixes both issues.
gcc/cp/ChangeLog:
PR c++/99806
* parser.c (cp_parser_member_declaration): Call
cp_parser_save_default_args even for function templates. Use
STRIP_TEMPLATE on the declaration we're passing.
gcc/testsuite/ChangeLog:
PR c++/99806
* g++.dg/concepts/abbrev10.C: New test.
* g++.dg/concepts/abbrev11.C: New test.
These tests are passing on all my runs, and it looks like
they are for Christophe's runs too. We can reapply with a
tighter target selector if this is still a problem for some
configurations.
This patch adds XFAILs for some tests that fail with variable-length
vectors.
For pr96573.c I'd wondered about instead extending the regexp.
The code we generate isn't very good though, so it doesn't seem
worth matching. (Fixing the bad code is on the todo list.)
Which one is better is an interesting question. However, it was really
only a fluke that we generated the original code. The pseudo that
becomes s1 in the new code above has a REG_EQUIV note:
Before the PR, IRA didn't allocate a register to r111 and so LRA
rematerialised the REG_EQUIV note inside insn 18, leading to the
reload. Now IRA allocates a register instead.
So I think this is working as expected, in the sense that IRA is now
doing what the backend asked it to do. If the backend prefers the first
version (and it might not), it needs to do more than it's currently
doing to promote the use of lane loads. E.g. it should probably have a
combine define_split that splits the combination of insn 17 and insn 18
into an ADD + an LD1.
I think for now the best thing is to use a different approach to
triggering the original bug. The asm in the new test ICEs with the
r11-2903 LRA patch reverted and passes with it applied.
gcc/testsuite/
* gcc.target/aarch64/mem-shift-canonical.c: Use an asm instead
of relying on vectorisation.
testsuite: Skip gfortran.dg/ieee/ieee_[68].f90 for Arm targets [PR78314]
For the reasons discussed in PR78314, ieee_support_halting
doesn't work correctly for arm* and aarch64*. I think the
easiest thing is to skip these tests until the PR is fixed.
This doesn't mean that the PR is unimportant. It just doesn't
seem useful to have the unpredictable failures described in the
PR trail given that the problem is known and has been analysed.
gcc/testsuite/
PR libfortran/78314
* gfortran.dg/ieee/ieee_6.f90: Skip for arm* and aarch64*.
* gfortran.dg/ieee/ieee_8.f90: Likewise.
aarch64: Use x30 as temporary in SVE TLSDESC patterns
gcc.dg/torture/tls/tls-reload-1.c started ICEing for SVE some time
during the GCC 11 cycle (not sure when). The problem is that we
had an output reload on a call_insn, which isn't a supported
combination.
This patch uses LR_REGNUM instead. The resulting "blr x30"
might not perform as well on some CPUs, but in this context
the difference shouldn't be noticeable.
gcc/
* config/aarch64/aarch64.md (tlsdesc_small_sve_<mode>): Use X30
as the temporary register.
Jonathan Wakely [Fri, 9 Apr 2021 11:05:39 +0000 (12:05 +0100)]
libstdc++: Fix invalid constexpr function in C++11 mode [PR 99985]
I keep forgetting that a constexpr function in C++11 has to be a single
return statement.
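As a reminder of what the C++11 rule permits, here is a hypothetical helper (not the libstdc++ function) whose body is a single return statement. The conditions it checks are an assumption chosen for illustration.

```cpp
#include <type_traits>

// C++11 allows only one return statement in a constexpr function body,
// so the conditions are folded into one boolean expression instead of
// being spread over several if statements.
template <typename Hash, typename Equal, bool NoRealloc = true>
constexpr bool nothrow_move_sketch()
{
  return NoRealloc
         && std::is_nothrow_copy_constructible<Hash>::value
         && std::is_nothrow_copy_constructible<Equal>::value;
}

// A type whose copy constructor is not noexcept.
struct throwing_copy_sketch
{
  throwing_copy_sketch() {}
  throwing_copy_sketch(const throwing_copy_sketch&) {}
};
```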
libstdc++-v3/ChangeLog:
PR libstdc++/99985
* include/bits/hashtable.h (_Hashtable::_S_nothrow_move()): Fix
to be a valid constexpr function in C++11.
* testsuite/23_containers/unordered_set/cons/99985.cc: New test.
The second argument of pthread_setspecific is const void *, so that one can
call it even with pointers to const, but the function only stores the
pointer and does nothing else, so the new assumption of -Wmaybe-uninitialized
that functions taking such pointers will read from what those pointers
point to is wrong. Maybe it would be useful to have some whitelist
of functions that surely don't do that.
Anyway, in this case it is easy to work around the warning by moving the
pthread_setspecific call after the initialization without slowing anything
down.
2021-04-09 Jakub Jelinek <jakub@redhat.com>
PR libgomp/99984
* team.c (gomp_thread_start): Call pthread_setspecific for
!(defined HAVE_TLS || defined USE_EMUTLS) only after local_thr
has been initialized to avoid false positive warning.
David Edelsohn [Thu, 8 Apr 2021 01:34:02 +0000 (21:34 -0400)]
aix: revert TLS common change
GCC uses TLS common for both public common / BSS and local common / BSS.
This patch reverts to using the .comm directive to allocate TLS
common / BSS. This also changes the priority of section selection
to use BSS before the data section.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_xcoff_select_section): Select
TLS BSS before TLS data.
* config/rs6000/xcoff.h (ASM_OUTPUT_TLS_COMMON): Use .comm.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/tls-5.C: Expect tbss failure on AIX.
Patrick Palka [Thu, 8 Apr 2021 20:45:25 +0000 (16:45 -0400)]
libstdc++: Simplify copy-pasted algorithms in <ranges>
The <ranges> header currently copies some simple algorithms from
<bits/ranges_algo.h>, which was originally done in order to avoid a
circular dependency with the header. This is no longer an issue since
the latter header now includes <bits/ranges_util.h> instead of all of
<ranges>.
This means we could now just include <bits/ranges_algo.h> and remove the
copied algorithms, but that'd increase the size of <ranges> by ~10%.
And we can't use the corresponding STL-style algorithms here because
they assume input iterators are copyable. So this patch instead
simplifies these copied algorithms, removing their constraints and
unused parameters, and keeps them around. In a subsequent patch we're
going to copy (a simplified version of) ranges::find into <ranges> as
well.
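A simplified, unconstrained find_if in the spirit of this description might look like the sketch below. It is not the code from <ranges>, and the name is hypothetical.

```cpp
// The iterator is taken by value and only moved afterwards, never
// copied, so a move-only input iterator works as well as a pointer.
template <typename Iter, typename Sent, typename Pred>
Iter find_if_sketch(Iter first, Sent last, Pred pred)
{
  while (first != last && !pred(*first))
    ++first;
  return first;  // a by-value parameter is moved on return
}
```

Called on a plain array, for example, it returns a pointer to the first element for which the predicate holds, or the end pointer if there is none.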
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::find_if): Simplify.
(__detail::find_if_not): Likewise.
(__detail::min): Remove.
(__detail::mismatch): Simplify.
(take_view::size): Use std::min instead of __detail::min.
Patrick Palka [Thu, 8 Apr 2021 20:45:22 +0000 (16:45 -0400)]
libstdc++: Fix elements_view::operator* and operator[] [LWG 3502]
While we're modifying elements_view, this also implements the one-line
resolution of LWG 3492.
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::__returnable_element): New
concept.
(elements_view): Use this concept in its constraints. Add
missing private access specifier.
(elements_view::_S_get_element): Define as per LWG 3502.
(elements_view::operator*, elements_view::operator[]): Use
_S_get_element.
(elements_view::operator++): Remove unnecessary constraint
as per LWG 3492.
* testsuite/std/ranges/adaptors/elements.cc (test05): New test.
Jonathan Wakely [Thu, 8 Apr 2021 17:37:59 +0000 (18:37 +0100)]
libstdc++: Improve error reporting if PDF generation fails
If pdflatex runs out of memory the build fails with no hint about what's
wrong. This adds another grep command to the makefile so that an
out-of-memory error will result in more information being shown.
As suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1841056,
lualatex can be used as a workaround.
libstdc++-v3/ChangeLog:
* doc/Makefile.am (stamp-pdf-doxygen): Also grep for
out-of-memory error in log file.
* doc/Makefile.in: Regenerate.
Patrick Palka [Thu, 8 Apr 2021 17:07:43 +0000 (13:07 -0400)]
c++: Don't substitute into constraints on lambdas [PR99874]
We currently substitute through a lambda's constraints whenever we
regenerate it via tsubst_lambda_expr. This is the wrong approach
because it can lead to hard errors due to constraints being evaluated
out of order (as in the testcase concepts-lambda17.C below), and because
it doesn't mesh well with the recently added REQUIRES_EXPR_EXTRA_ARGS
mechanism for delaying substitution into requires-expressions, which is
the cause of this PR.
But in order to avoid substituting through a lambda's constraints during
regeneration, we need to be able to get at all in-scope template
parameters and corresponding template arguments during constraint
checking of a lambda's op(). And this information is not easily
available when we need it, it seems.
To that end, the approach that this patch takes is to add two new fields
to LAMBDA_EXPR (and remove one): LAMBDA_EXPR_REGENERATED_FROM
(replacing LAMBDA_EXPR_INSTANTIATED), and LAMBDA_EXPR_REGENERATING_TARGS.
The former allows us to obtain the complete set of template parameters
that are in-scope for a lambda's op(), and the latter gives us all outer
template arguments that were used to regenerate the lambda (analogous to
the TI_TEMPLATE and TI_ARGS of a TEMPLATE_INFO, respectively).
LAMBDA_EXPR_REGENERATING_TARGS is not strictly necessary -- in an
earlier prototype, I walked LAMBDA_EXPR_EXTRA_SCOPE to build up this set
of outer template arguments on demand, but it seems cleaner to do it this
way. (We'd need to walk LAMBDA_EXPR_EXTRA_SCOPE and not DECL/TYPE_CONTEXT
because the latter skips over variable template scopes.)
This patch also renames the predicate instantiated_lambda_fn_p to
regenerated_lambda_fn_p, for sake of consistency with the rest of the
patch which uses "regenerated" instead of "instantiated".
gcc/cp/ChangeLog:
PR c++/99874
* constraint.cc (get_normalized_constraints_from_decl): Handle
regenerated lambdas.
(satisfy_declaration_constraints): Likewise. Check for
dependent args later.
* cp-tree.h (LAMBDA_EXPR_INSTANTIATED): Replace with ...
(LAMBDA_EXPR_REGENERATED_FROM): ... this.
(LAMBDA_EXPR_REGENERATING_TARGS): New.
(tree_lambda_expr::regenerated_from): New data member.
(tree_lambda_expr::regenerating_targs): New data member.
(add_to_template_args): Declare.
(regenerated_lambda_fn_p): Likewise.
(most_general_lambda): Likewise.
* lambda.c (build_lambda_expr): Set LAMBDA_EXPR_REGENERATED_FROM
and LAMBDA_EXPR_REGENERATING_TARGS.
* pt.c (add_to_template_args): No longer static.
(tsubst_function_decl): Unconditionally propagate constraints on
the substituted function decl.
(instantiated_lambda_fn_p): Rename to ...
(regenerated_lambda_fn_p): ... this. Check
LAMBDA_EXPR_REGENERATED_FROM instead of
LAMBDA_EXPR_INSTANTIATED.
(most_general_lambda): Define.
(enclosing_instantiation_of): Adjust after renaming
instantiated_lambda_fn_p.
(tsubst_lambda_expr): Don't set LAMBDA_EXPR_INSTANTIATED. Set
LAMBDA_EXPR_REGENERATED_FROM and LAMBDA_EXPR_REGENERATING_TARGS.
Don't substitute or set constraints on the regenerated lambda.
gcc/testsuite/ChangeLog:
PR c++/99874
* g++.dg/cpp2a/concepts-lambda16.C: New test.
* g++.dg/cpp2a/concepts-lambda17.C: New test.
Patrick Palka [Thu, 8 Apr 2021 17:07:37 +0000 (13:07 -0400)]
c++: constrained CTAD for nested class template [PR97679]
In the testcase below, we're crashing during constraint checking of the
implicitly generated deduction guides for the nested class template A::B
because we never substitute the outer template arguments (for A) into
the constraint, neither ahead of time nor as part of satisfaction.
Ideally we'd like to avoid substituting into a constraint ahead of
time, but the "flattening" vector 'tsubst_args' is constructed under the
assumption that all outer template arguments are already substituted in,
and eliminating this assumption to yield a flattening vector that
includes outer (generic) template arguments suitable for substituting
into the constraint would be tricky and error-prone. So this patch
takes the approximate approach of substituting the outer arguments into
the constraint ahead of time, so that the subsequent substitution of
'tsubst_args' is coherent and so later satisfaction just works.
gcc/cp/ChangeLog:
PR c++/97679
* pt.c (build_deduction_guide): Document OUTER_ARGS. Substitute
them into the propagated constraints.
gcc/testsuite/ChangeLog:
PR c++/97679
* g++.dg/cpp2a/concepts-ctad3.C: New test.
Jonathan Wakely [Thu, 8 Apr 2021 15:29:11 +0000 (16:29 +0100)]
libstdc++: Simplify noexcept-specifiers for move constructors
This puts the logic for the noexcept-specifier in one place, and then
reuses it elsewhere. This means checking whether the move constructor
can throw doesn't need to do overload resolution and then check whether
some other constructor can throw; we just get the answer directly.
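The pattern can be sketched with a hypothetical container-like class (this is not _Hashtable, and the condition used is an assumption):

```cpp
#include <type_traits>

// Hypothetical class with one stored functor.
template <typename Hash>
class table_sketch
{
  Hash hash_{};

  // The single place that decides whether moving can throw.
  static constexpr bool nothrow_move()
  { return std::is_nothrow_copy_constructible<Hash>::value; }

public:
  table_sketch() = default;

  // In this sketch the functor is copied rather than moved, so the
  // specifier depends on its copy constructor.
  table_sketch(table_sketch&& other) noexcept(nothrow_move())
    : hash_(other.hash_) {}
};

// A functor whose copy constructor is not noexcept.
struct throwing_hash_sketch
{
  throwing_hash_sketch() {}
  throwing_hash_sketch(const throwing_hash_sketch&) {}
};
```

The noexcept-specifier is then a direct constant expression, and std::is_nothrow_move_constructible reports it without examining any other constructor.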
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (_Hashtable::_S_nothrow_move()):
New function to determine noexcept-specifier for move
constructors.
(_Hashtable): Use _S_nothrow_move() on move constructors.
* testsuite/23_containers/unordered_map/cons/noexcept_move_construct.cc:
Correct static assertion message.
* testsuite/23_containers/unordered_multimap/cons/noexcept_move_construct.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/cons/noexcept_move_construct.cc:
Likewise.
* testsuite/23_containers/unordered_set/cons/noexcept_move_construct.cc:
Likewise.
Here we were complaining about binding the lvalue reference to the rvalue
result of converting from float to int, but didn't mention that conversion.
Talk about the type of the initializer instead.
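The testcase is not shown here, so the following is only a guess at the kind of code involved. A non-const lvalue reference to int cannot bind to a float, because the conversion yields a temporary; a const reference can, and it then refers to that temporary rather than to the float.

```cpp
// Hypothetical illustration, not the testcase from the PR.
inline int read_through_const_ref(float f)
{
  // int &bad = f;     // ill-formed: the converted value is an rvalue
  const int &r = f;    // OK: binds to a temporary int holding (int) f
  f = 9.5f;            // does not affect the temporary
  return r;
}
```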
VAX: Fix comment for `*bit<mode>' pattern's peephole
The comment for a peephole provided for the `*bit<mode>' pattern to be
produced in comparison elimination from a sequence involving a bitwise
complement operation of one input operand followed by a bitwise AND
operation between a bitwise complement of said intermediate result and
the other input operand (which corresponds to a sequence of MCOM and BIC
machine instructions) incorrectly refers to the first operation as MNEG
(which is the machine instruction for arithmetic negation) rather than
MCOM as it is supposed to. Fix it.
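The difference matters for the arithmetic, as a small C++ model of the three operations shows. The function names are just the mnemonics in lower case, and 32-bit operands are assumed.

```cpp
#include <cstdint>

// Bitwise complement.
inline std::uint32_t mcom(std::uint32_t x) { return ~x; }

// Arithmetic negation (two's complement).
inline std::uint32_t mneg(std::uint32_t x) { return 0u - x; }

// Bit clear: AND the operand with the complement of the mask.
inline std::uint32_t bic(std::uint32_t mask, std::uint32_t x)
{ return x & ~mask; }

// MCOM followed by BIC computes a plain AND of the two inputs, which
// is the value a bit test examines.
inline std::uint32_t mcom_then_bic(std::uint32_t a, std::uint32_t b)
{ return bic(mcom(a), b); }
```

Replacing mcom with mneg in the sequence does not in general produce the AND of the inputs.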
gcc/
* config/vax/vax.md: Fix comment for `*bit<mode>' pattern's
peephole.
Jakub Jelinek [Thu, 8 Apr 2021 15:15:39 +0000 (17:15 +0200)]
c++: Don't cache constexpr functions which are passed pointers to heap or static vars being constructed [PR99859]
When cxx_bind_parameters_in_call is called e.g. on a method on an automatic
variable, we evaluate the argument and because ADDR_EXPR of an automatic
decl is not TREE_CONSTANT, we set *non_constant_args and don't cache it.
But when it is called on an object located on the heap (allocated using
C++20 constexpr new) where we represent it as TREE_STATIC artificial
var, or when it is called on a static var that is currently being
constructed, such ADDR_EXPRs are TREE_CONSTANT and we happily cache
such calls, but they can in those cases have side-effects in the heap
or static var objects and so caching them means such side-effects will
happen only once and not as many times as that method or function is called.
Furthermore, as Patrick mentioned in the PR, the argument doesn't need to be
just ADDR_EXPR of the heap or static var or its components, but it could be
a CONSTRUCTOR that has the ADDR_EXPR embedded anywhere.
And the incorrectly cached function doesn't need to modify the pointed vars
or their components, but some caller could be changing them in between the
call that was cached and the call that used the cached result.
The following patch fixes it by setting *non_constant_args also when
the argument contains somewhere such an ADDR_EXPR, either of a heap
artificial var or component thereof, or of a static var currently being
constructed (where for that it uses the same check as
cxx_eval_store_expression, ctx->global->values.get (...); addresses of
other static variables would be rejected by cxx_eval_store_expression
and therefore it is ok to cache such calls).
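A sketch of the kind of code affected, for the "static var currently being constructed" case. It is valid C++14 but is not one of the new tests, and the names are hypothetical.

```cpp
// inc() is called twice on the object while it is being
// constant-initialized.
struct counter_sketch
{
  int n;
  constexpr counter_sketch() : n(0) { inc(); inc(); }
  constexpr void inc() { ++n; }
};

// Both calls receive the same constant 'this' address and return
// nothing, so a cache keyed on the arguments alone would skip the
// second increment.
constexpr counter_sketch global_counter;
```

On a conforming compiler global_counter.n is 2, since each call has to be evaluated for its side effect.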
2021-04-08 Jakub Jelinek <jakub@redhat.com>
PR c++/99859
* constexpr.c (addr_of_non_const_var): New function.
(cxx_bind_parameters_in_call): Set *non_constant_args to true
even if cp_walk_tree on arg with addr_of_non_const_var callback
returns true.
* g++.dg/cpp1y/constexpr-99859-1.C: New test.
* g++.dg/cpp1y/constexpr-99859-2.C: New test.
* g++.dg/cpp2a/constexpr-new18.C: New test.
* g++.dg/cpp2a/constexpr-new19.C: New test.
This works around the remaining reported execution FAILs of this test on
AIX, Solaris and Darwin. Eventually we should rewrite this test to be
less fragile, but there's not enough time to do that for GCC 11.
libstdc++-v3/ChangeLog:
PR libstdc++/98384
* testsuite/20_util/to_chars/long_double.cc: Don't run the test
on targets without a large long double. XFAIL the execution on
targets with a non-conforming printf.
Patrick Palka [Thu, 8 Apr 2021 14:40:19 +0000 (10:40 -0400)]
libstdc++: Reimplement range adaptors [PR99433]
This rewrites our range adaptor implementation for more comprehensible
error messages, improved SFINAE behavior and conformance to P2281.
The diagnostic improvements mostly come from using appropriately named
functors instead of lambdas in the generic implementation of partial
application and composition of range adaptors, and in the definition of
each of the standard range adaptors. This makes their pretty printed
types much shorter and more self-descriptive.
The improved SFINAE behavior comes from constraining the range adaptors'
member functions appropriately. This improvement fixes PR99433, and is
also necessary in order to implement the wording changes of P2281.
Finally, P2281 clarified that partial application and composition of
range adaptors behave like a perfect forwarding call wrapper. This
patch implements this, except that we don't bother adding overloads for
forwarding captured state entities as non-const lvalues, since it seems
sufficient to handle the const lvalue and non-const rvalue cases for now,
given the current set of standard range adaptors. But such overloads
can be easily added if they turn out to be needed.
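To make the design concrete, here is a hypothetical miniature of it: a non-template tag base for closures, a named class template for composition, and named functors instead of lambdas. These are eager toy adaptors on std::vector<int>, not the lazy views or the actual classes in <ranges>, and every name is invented for the sketch.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Every adaptor closure derives from this one non-template tag type.
struct closure_tag {};

template <typename T>
using is_closure
  = std::is_base_of<closure_tag, typename std::decay<T>::type>;

// Composition of two closures: applies lhs first, then rhs.
template <typename Lhs, typename Rhs>
struct pipe_sketch : closure_tag
{
  Lhs lhs;
  Rhs rhs;
  pipe_sketch(Lhs l, Rhs r) : lhs(std::move(l)), rhs(std::move(r)) {}

  template <typename Range>
  auto operator()(Range&& r) const
  { return rhs(lhs(std::forward<Range>(r))); }
};

// range | closure: apply the closure to the range.
template <typename Range, typename Closure,
          typename std::enable_if<!is_closure<Range>::value
                                  && is_closure<Closure>::value,
                                  int>::type = 0>
auto operator|(Range&& r, const Closure& c)
{ return c(std::forward<Range>(r)); }

// closure | closure: build a new closure.
template <typename Lhs, typename Rhs,
          typename std::enable_if<is_closure<Lhs>::value
                                  && is_closure<Rhs>::value,
                                  int>::type = 0>
pipe_sketch<Lhs, Rhs> operator|(Lhs l, Rhs r)
{ return pipe_sketch<Lhs, Rhs>(std::move(l), std::move(r)); }

// Two named functors standing in for standard adaptors.
struct add_n : closure_tag
{
  int amount;
  explicit add_n(int a) : amount(a) {}
  std::vector<int> operator()(std::vector<int> v) const
  {
    for (int& x : v)
      x += amount;
    return v;
  }
};

struct keep_even : closure_tag
{
  std::vector<int> operator()(const std::vector<int>& v) const
  {
    std::vector<int> out;
    for (int x : v)
      if (x % 2 == 0)
        out.push_back(x);
    return out;
  }
};
```

Because the composed object has the type pipe_sketch<add_n, keep_even> rather than a closure type generated from a lambda, a diagnostic that mentions it stays short, and the enable_if conditions reject unsuitable operands during overload resolution instead of failing inside a function body.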
libstdc++-v3/ChangeLog:
PR libstdc++/99433
* include/std/ranges (__adaptor::__maybe_refwrap): Remove.
(__adaptor::__adaptor_invocable): New concept.
(__adaptor::__adaptor_partial_app_viable): New concept.
(__adaptor::_RangeAdaptorClosure): Rewrite, turning it into a
non-template base class.
(__adaptor::_RangeAdaptor): Rewrite, turning it into a CRTP base
class template.
(__adaptor::_Partial): New class template that represents
partial application of a range adaptor non-closure.
(__adaptor::__pipe_invocable): New concept.
(__adaptor::_Pipe): New class template.
(__detail::__can_ref_view): New concept.
(__detail::__can_subrange): New concept.
(all): Replace the lambda here with ...
(_All): ... this functor. Add appropriate constraints.
(__detail::__can_filter_view): New concept.
(filter, _Filter): As in all/_All.
(__detail::__can_transform): New concept.
(transform, _Transform): As in all/_All.
(__detail::__can_take_view): New concept.
(take, _Take): As in all/_All.
(__detail::__can_take_while_view): New concept.
(take_while, _TakeWhile): As in all/_All.
(__detail::__can_drop_view): New concept.
(drop, _Drop): As in all/_All.
(__detail::__can_drop_while_view): New concept.
(drop_while, _DropWhile): As in all/_All.
(__detail::__can_join_view): New concept.
(join, _Join): As in all/_All.
(__detail::__can_split_view): New concept.
(split, _Split): As in all/_All. Rename template parameter
_Fp to _Pattern.
(__detail::__already_common): New concept.
(__detail::__can_common_view): New concept.
(common, _Common): As in all/_All.
(__detail::__can_reverse_view): New concept.
(reverse, _Reverse): As in all/_All.
(__detail::__can_elements_view): New concept.
(elements, _Elements): As in all/_All.
(keys, values): Adjust.
* testsuite/std/ranges/adaptors/99433.cc: New test.
* testsuite/std/ranges/adaptors/all.cc: No longer expect that
adding empty range adaptor closure objects to a pipeline doesn't
increase the size of the pipeline.
(test05): New test.
* testsuite/std/ranges/adaptors/common.cc (test03): New test.
* testsuite/std/ranges/adaptors/drop.cc (test09): New test.
* testsuite/std/ranges/adaptors/drop_while.cc (test04): New test.
* testsuite/std/ranges/adaptors/elements.cc (test04): New test.
* testsuite/std/ranges/adaptors/filter.cc (test06): New test.
* testsuite/std/ranges/adaptors/join.cc (test09): New test.
* testsuite/std/ranges/adaptors/p2281.cc: New test.
* testsuite/std/ranges/adaptors/reverse.cc (test07): New test.
* testsuite/std/ranges/adaptors/split.cc (test01, test04):
Adjust.
(test09): New test.
* testsuite/std/ranges/adaptors/split_neg.cc (test01): Adjust
expected error message.
(test02): Likewise. Extend test.
* testsuite/std/ranges/adaptors/take.cc (test06): New test.
* testsuite/std/ranges/adaptors/take_while.cc (test05): New test.
* testsuite/std/ranges/adaptors/transform.cc (test07, test08):
New test.
testsuite: Update error messages in sve/acle/general-c
The “previous definition of 'x'” notes now include the type
of the original definition before “was here”. There's not really
any need to hard-code that much of the message in the ACLE tests,
so this patch just removes the “was here” from the match string.
Some sve/mul_2.c tests were failing because we'd (reasonably)
decided to use shifts and adds instead of MULs for some simple
negative constants. We'd already needed to avoid that when
picking positive constants, so this patch does the same thing
for the negative ones.
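As an example of such a conversion, a multiplication by -7 can be rewritten as a shift and a subtraction. Which constants the compiler actually converts is a cost-model decision; -7 is only an illustrative choice, and unsigned arithmetic is used so that wraparound is well defined.

```cpp
#include <cstdint>

// Multiplication by -7, modulo 2^32.
inline std::uint32_t times_minus7_mul(std::uint32_t x)
{ return x * static_cast<std::uint32_t>(-7); }

// The same value without a multiply: x - 8x = -7x.
inline std::uint32_t times_minus7_shift(std::uint32_t x)
{ return x - (x << 3); }
```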
gcc/testsuite/
* gcc.target/aarch64/sve/mul_2.c: Adjust negative constants to avoid
conversion to shifts and adds.
David Malcolm [Thu, 8 Apr 2021 13:46:03 +0000 (09:46 -0400)]
analyzer: fix leak false +ves due to maybe-clobbered regions [PR99042,PR99774]
Prior to this patch, program_state::detect_leaks worked by finding all
live svalues in the old state and in the new state, and calling
on_svalue_leak for each svalue that has changed from being live to
not being live.
PR analyzer/99042 and PR analyzer/99774 both describe false leak
diagnostics from -fanalyzer (a false FILE * leak in git, and a false
malloc leak in qemu, respectively).
In both cases the root cause of the false leak diagnostic relates to
svalues no longer being explicitly bound in the store due to regions
being conservatively clobbered, due to an unknown function being
called, or due to a write through a pointer that could alias the
region, respectively.
We have a transition from an svalue being explicitly live to not
being explicitly live - but only because the store is being
conservative, clobbering the binding. The leak detection is looking
for transitions from "definitely live" to "not definitely live",
when it should be looking for transitions from "definitely live"
to "definitely not live".
This patch introduces a new class to temporarily capture information
about svalues that were explicitly live, but for which a region bound
to them got clobbered for conservative reasons. This new
"uncertainty_t" class is passed around to capture the data long enough
for use in program_state::detect_leaks, where it is used to only
complain about svalues that were definitely live and are now neither
definitely live nor possibly live, i.e. definitely not live.
The class also captures for which svalues we can't meaningfully track
sm-state anymore, and resets the svalues back to the "start" state.
Together, these changes fix the false leak reports.
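The three-way distinction above can be sketched as a set computation. This is only an illustration, not the analyzer's code: value_id, value_set and find_leaks are hypothetical names, and plain strings stand in for svalues.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins: the real analyzer tracks svalue pointers.
using value_id = std::string;
using value_set = std::set<value_id>;

// Report only values that were definitely live in the old state and are
// neither definitely live nor possibly live in the new state.
// MAYBE_LIVE plays the role of the data captured by uncertainty_t.
value_set find_leaks(const value_set &old_live,
                     const value_set &new_live,
                     const value_set &maybe_live)
{
  value_set leaked;
  for (const value_id &v : old_live)
    if (new_live.count(v) == 0 && maybe_live.count(v) == 0)
      leaked.insert(v);
  return leaked;
}
```

With old_live = {fp, buf}, an empty new_live and maybe_live = {fp}, only buf is reported: fp lost its binding for conservative reasons, so it is not treated as leaked.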
gcc/analyzer/ChangeLog:
PR analyzer/99042
PR analyzer/99774
* engine.cc
(impl_region_model_context::impl_region_model_context): Add
uncertainty param and use it to initialize m_uncertainty.
(impl_region_model_context::get_uncertainty): New.
(impl_sm_context::get_fndecl_for_call): Add NULL for new
uncertainty param when constructing impl_region_model_context.
(impl_sm_context::get_state): Likewise.
(impl_sm_context::set_next_state): Likewise.
(impl_sm_context::warn): Likewise.
(exploded_node::on_stmt): Add uncertainty param
and use it when constructing impl_region_model_context.
(exploded_node::on_edge): Add uncertainty param and pass
to on_edge call.
(exploded_node::detect_leaks): Create uncertainty_t and pass to
impl_region_model_context.
(exploded_graph::get_or_create_node): Create uncertainty_t and
pass to prune_for_point.
(maybe_process_run_of_before_supernode_enodes): Create
uncertainty_t and pass to impl_region_model_context.
(exploded_graph::process_node): Create uncertainty_t instances and
pass around as needed.
* exploded-graph.h
(impl_region_model_context::impl_region_model_context): Add
uncertainty param.
(impl_region_model_context::get_uncertainty): New decl.
(impl_region_model_context::m_uncertainty): New field.
(exploded_node::on_stmt): Add uncertainty param.
(exploded_node::on_edge): Likewise.
* program-state.cc (sm_state_map::on_liveness_change): Get
uncertainty from context and use it to unset sm-state from
svalues as appropriate.
(program_state::on_edge): Add uncertainty param and use it when
constructing impl_region_model_context. Fix indentation.
(program_state::prune_for_point): Add uncertainty param and use it
when constructing impl_region_model_context.
(program_state::detect_leaks): Get any uncertainty from ctxt and
use it to get maybe-live svalues for dest_state, rather than
definitely-live ones; use this when determining which svalues
have leaked.
(selftest::test_program_state_merging): Create uncertainty_t and
pass to impl_region_model_context.
* program-state.h (program_state::on_edge): Add uncertainty param.
(program_state::prune_for_point): Likewise.
* region-model-impl-calls.cc (call_details::get_uncertainty): New.
(region_model::impl_call_memcpy): Pass uncertainty to
mark_region_as_unknown call.
(region_model::impl_call_memset): Likewise.
(region_model::impl_call_strcpy): Likewise.
* region-model-reachability.cc (reachable_regions::handle_sval):
Also add sval to m_mutable_svals.
* region-model.cc (region_model::on_assignment): Pass any
uncertainty from ctxt to the store::set_value call.
(region_model::handle_unrecognized_call): Get any uncertainty from
ctxt and use it to record mutable svalues at the unknown call.
(region_model::get_reachable_svalues): Add uncertainty param and
use it to mark any maybe-bound svalues as being reachable.
(region_model::set_value): Pass any uncertainty from ctxt to the
store::set_value call.
(region_model::mark_region_as_unknown): Add uncertainty param and
pass it on to the store::mark_region_as_unknown call.
(region_model::update_for_call_summary): Add uncertainty param and
pass it on to the region_model::mark_region_as_unknown call.
* region-model.h (call_details::get_uncertainty): New decl.
(region_model::get_reachable_svalues): Add uncertainty param.
(region_model::mark_region_as_unknown): Add uncertainty param.
(region_model_context::get_uncertainty): New vfunc.
(noop_region_model_context::get_uncertainty): New vfunc
implementation.
* store.cc (dump_svalue_set): New.
(uncertainty_t::dump_to_pp): New.
(uncertainty_t::dump): New.
(binding_cluster::clobber_region): Pass NULL for uncertainty to
remove_overlapping_bindings.
(binding_cluster::mark_region_as_unknown): Add uncertainty param
and pass it to remove_overlapping_bindings.
(binding_cluster::remove_overlapping_bindings): Add uncertainty param.
Use it to record any svalues that were in clobbered bindings.
(store::set_value): Add uncertainty param. Pass it to
binding_cluster::mark_region_as_unknown when handling symbolic
regions.
(store::mark_region_as_unknown): Add uncertainty param and pass it
to binding_cluster::mark_region_as_unknown.
(store::remove_overlapping_bindings): Add uncertainty param and
pass it to binding_cluster::remove_overlapping_bindings.
* store.h (binding_cluster::mark_region_as_unknown): Add
uncertainty param.
(binding_cluster::remove_overlapping_bindings): Likewise.
(store::set_value): Likewise.
(store::mark_region_as_unknown): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/99042
PR analyzer/99774
* gcc.dg/analyzer/pr99042.c: New test.
* gcc.dg/analyzer/pr99774-1.c: New test.
* gcc.dg/analyzer/pr99774-2.c: New test.
d: Update language attribute support, and implement gcc.attributes
D attribute support has been updated to have a baseline parity with the
LLVM D compiler's own `ldc.attributes'.
The handler that extracts GCC attributes from a list of UDAs has been
improved to take care of some mistakes that could have been warnings.
UDAs attached to field variables are also now processed for any GCC
attributes attached to them.
The following new attributes have been added to the D front-end:
The old gcc.attribute module has been deprecated, along with the removal
of the following attribute handlers:
- @attribute("alias"): Has been superseded by `pragma(mangle)'.
- @attribute("forceinline"): Renamed to always_inline.
gcc/d/ChangeLog:
* d-attribs.cc: Include fold-const.h and opts.h.
(attr_noreturn_exclusions): Add alloc_size.
(attr_const_pure_exclusions): Likewise.
(attr_inline_exclusions): Add target_clones.
(attr_noinline_exclusions): Rename forceinline to always_inline.
(attr_target_exclusions): New array.
(attr_target_clones_exclusions): New array.
(attr_alloc_exclusions): New array.
(attr_cold_hot_exclusions): New array.
(d_langhook_common_attribute_table): Add new D attribute handlers.
(build_attributes): Update to look for gcc.attributes. Issue warning
if not given a struct literal. Handle void initialized arguments.
(handle_always_inline_attribute): Remove function.
(d_handle_noinline_attribute): Don't extract TYPE_LANG_FRONTEND.
(d_handle_forceinline_attribute): Rename to...
(d_handle_always_inline_attribute): ...this. Remove special handling.
(d_handle_flatten_attribute): Don't extract TYPE_LANG_FRONTEND.
(d_handle_target_attribute): Likewise. Warn about empty arguments.
(d_handle_target_clones_attribute): New function.
(optimize_args): New static variable.
(parse_optimize_options): New function.
(d_handle_optimize_attribute): New function.
(d_handle_noclone_attribute): Don't extract TYPE_LANG_FRONTEND.
(d_handle_alias_attribute): Remove function.
(d_handle_noicf_attribute): New function.
(d_handle_noipa_attribute): New function.
(d_handle_section_attribute): Call the handle_generic_attribute target
hook after performing target independent processing.
(d_handle_symver_attribute): New function.
(d_handle_noplt_attribute): New function.
(positional_argument): New function.
(d_handle_alloc_size_attribute): New function.
(d_handle_cold_attribute): New function.
(d_handle_restrict_attribute): New function.
(d_handle_used_attribute): New function.
* decl.cc (gcc_attribute_p): Update to look for gcc.attributes.
(get_symbol_decl): Update decl source location of old prototypes to
the new declaration being merged.
* types.cc (layout_aggregate_members): Apply user defined attributes
on fields.
* gdc.dg/gdc108.d: Update test.
* gdc.dg/gdc142.d: Likewise.
* gdc.dg/pr90136a.d: Likewise.
* gdc.dg/pr90136b.d: Likewise.
* gdc.dg/pr90136c.d: Likewise.
* gdc.dg/pr95173.d: Likewise.
* gdc.dg/attr_allocsize1.d: New test.
* gdc.dg/attr_allocsize2.d: New test.
* gdc.dg/attr_alwaysinline1.d: New test.
* gdc.dg/attr_cold1.d: New test.
* gdc.dg/attr_exclusions1.d: New test.
* gdc.dg/attr_exclusions2.d: New test.
* gdc.dg/attr_flatten1.d: New test.
* gdc.dg/attr_module.d: New test.
* gdc.dg/attr_noclone1.d: New test.
* gdc.dg/attr_noicf1.d: New test.
* gdc.dg/attr_noinline1.d: New test.
* gdc.dg/attr_noipa1.d: New test.
* gdc.dg/attr_noplt1.d: New test.
* gdc.dg/attr_optimize1.d: New test.
* gdc.dg/attr_optimize2.d: New test.
* gdc.dg/attr_optimize3.d: New test.
* gdc.dg/attr_optimize4.d: New test.
* gdc.dg/attr_restrict1.d: New test.
* gdc.dg/attr_section1.d: New test.
* gdc.dg/attr_symver1.d: New test.
* gdc.dg/attr_target1.d: New test.
* gdc.dg/attr_targetclones1.d: New test.
* gdc.dg/attr_used1.d: New test.
* gdc.dg/attr_used2.d: New test.
* gdc.dg/attr_weak1.d: New test.
* gdc.dg/imports/attributes.d: New test.
We were telling users they needed more template<> to specialize a member
template in a testcase with no member templates. Only produce that message
if we actually see a member template, and also always print the candidates.
Marek Polacek [Wed, 7 Apr 2021 20:44:24 +0000 (16:44 -0400)]
c++: Fix ICE with unexpanded parameter pack [PR99844]
In explicit17.C, we weren't detecting an unexpanded parameter pack in
explicit(bool), so we crashed on a TEMPLATE_PARM_INDEX in constexpr.
I noticed the same is true for noexcept(), but only since my patch to
implement delayed parsing of noexcept. Previously, we would detect the
unexpanded pack in push_template_decl but now the noexcept expression
has not yet been parsed, so we need to do it a bit later.
Jonathan Wakely [Thu, 8 Apr 2021 09:50:57 +0000 (10:50 +0100)]
libstdc++: Make std::is_scoped_enum work with incomplete types
Tim Song pointed out that using __underlying_type is ill-formed for
incomplete enumeration types, and is_scoped_enum doesn't require a
complete type. This changes the trait to check for conversion to int
instead of to the underlying type.
In order to give the correct result when the trait is used in the
enumerator-list of an incomplete type the partial specialization for
enums has an additional check that fails for incomplete types. This
assumes that an incomplete enumeration type must be an unscoped
enumeration, and so the primary template (with a std::false_type base
characteristic) can be used. This isn't necessarily true, but it is not
currently possible to refer to a scoped enumeration type before its type
is complete (PR c++/89025).
It should be possible to use requires(remove_cv_t<_Tp> __t) in the
partial specialization's assignability check, but that currently gives
an ICE (PR c++/99968) so there is an extra partial specialization of
is_scoped_enum<const _Tp> to handle const types.
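The "conversion to int" idea can be sketched in a few lines. This is a simplified illustration and not the libstdc++ implementation: is_scoped_enum_sketch is a hypothetical name, it uses std::is_convertible rather than a requires-expression, and it makes no attempt to handle incomplete enumeration types.

```cpp
#include <type_traits>

// Primary template: anything that is not an enumeration is not a scoped enum.
template<typename T, bool = std::is_enum<T>::value>
struct is_scoped_enum_sketch : std::false_type { };

// Enumerations: unscoped enums convert implicitly to int, scoped ones do not.
template<typename T>
struct is_scoped_enum_sketch<T, true>
  : std::integral_constant<bool, !std::is_convertible<T, int>::value> { };

enum sketch_unscoped { sketch_u0 };
enum class sketch_scoped { s0 };

static_assert(!is_scoped_enum_sketch<sketch_unscoped>::value, "unscoped enum");
static_assert(is_scoped_enum_sketch<sketch_scoped>::value, "scoped enum");
static_assert(!is_scoped_enum_sketch<int>::value, "not an enum at all");
```

Testing conversion to int avoids naming the underlying type, which is the part that is ill-formed for incomplete enumerations.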
libstdc++-v3/ChangeLog:
* include/std/type_traits (is_scoped_enum<T>): Constrain partial
specialization to not match incomplete enum types. Use a
requires-expression instead of instantiating is_convertible.
(is_scoped_enum<const T>): Add as workaround for PR c++/99968.
* testsuite/20_util/is_scoped_enum/value.cc: Check with
incomplete types and opaque-enum-declarations.
Alex Coplan [Thu, 8 Apr 2021 08:36:57 +0000 (09:36 +0100)]
arm: Various MVE vec_duplicate fixes [PR99647]
This patch fixes various issues with vec_duplicate in the MVE patterns.
Currently there are two patterns named *mve_mov<mode>. The second of
these is really a vector duplicate rather than a move, so I've renamed
it accordingly.
As it stands, there are several issues with this pattern:
1. The MVE_types iterator has an entry for TImode, but
vec_duplicate:TI is invalid.
2. The mode of the operand to vec_duplicate is SImode, but it should
vary according to the vector mode iterator.
3. The second alternative of this pattern is bogus: it allows matching
symbol_refs (the cause of the PR) and const_ints (which means that it
matches (vec_duplicate (const_int ...)) which is non-canonical: such
rtxes should be const_vectors instead and handled by the main vector
move pattern).
This patch fixes all of these issues, and removes the redundant
*mve_vec_duplicate<mode> pattern.
gcc/ChangeLog:
PR target/99647
* config/arm/iterators.md (MVE_vecs): New.
(V_elem): Also handle V2DF.
* config/arm/mve.md (*mve_mov<mode>): Rename to ...
(*mve_vdup<mode>): ... this. Remove second alternative since
vec_duplicate of const_int is not canonical RTL, and we don't
want to match symbol_refs.
(*mve_vec_duplicate<mode>): Delete (pattern is redundant).
gcc/testsuite/ChangeLog:
PR target/99647
* gcc.c-torture/compile/pr99647.c: New test.
Xionghu Luo [Wed, 7 Apr 2021 05:29:32 +0000 (00:29 -0500)]
Improve rtx insn vec output
print_rtl dumps every rtx_insn from the current one until LAST, but
print_rtx_insn_vec only needs to show the particular insn it was called
for. Call print_rtl_single instead to display just that insn in the gcse
and store-motion pass dumps.
Jason Merrill [Wed, 7 Apr 2021 20:42:44 +0000 (16:42 -0400)]
c++: friend with redundant qualification [PR41723]
Different code paths were correctly choosing to look up D directly, since C
is the current instantiation, but here we decided to try to make it a
typename type, leading to confusion. Fixed by using dependent_scope_p as we
do elsewhere.
Jason Merrill [Wed, 7 Apr 2021 18:55:48 +0000 (14:55 -0400)]
c++: using overloaded with local decl [PR92918]
The problem here was that the lookup for 'impl' when parsing the template
only found the using-declaration, not the member function declaration.
This happened because when trying to add the member function declaration,
push_class_level_binding_1 saw that the current binding was a USING_DECL and
the new value is an overload, and decided to just return success.
That 'return true' dates back to r69921. In
https://gcc.gnu.org/pipermail/gcc-patches/2003-July/110632.html Nathan
mentions that we only push dependent USING_DECLs, which is no longer the
case; now that we retain more USING_DECLs, handling this case like the other
overloaded function cases seems like the obvious solution.
gcc/cp/ChangeLog:
PR c++/92918
* name-lookup.c (push_class_level_binding_1): Do overload a new
function with a previous using-declaration.
It turns out that, on targets that use testglue, many gcc.dg/vect
scan-dump tests became UNRESOLVED after the change to the dump
file naming scheme.
The problem is that, when creating an executable, we normally name
the dump file after both the executable and the source file name.
However, as an exception, we name it after only the source file
name if:
(a) there is only one source file name and
(b) the source file and the executable have the same basename
Both (a) and (b) are normally true when building executables from
gcc.dg/vect. But (a) is not true when linking against testglue.
The harness was therefore looking for a dump file based only on the
source file name while the compiler was producing a dump file that
contained both names.
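The naming exception described above can be modelled roughly as follows. This is a sketch of conditions (a) and (b) only, not the driver's code: stem and dump_named_after_source_only are hypothetical helpers, and treating "basename" as the file name without directory or final extension is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: file name without directory and final extension.
std::string stem(const std::string &path)
{
  std::size_t slash = path.find_last_of('/');
  std::string base
    = slash == std::string::npos ? path : path.substr(slash + 1);
  std::size_t dot = base.find_last_of('.');
  if (dot == std::string::npos || dot == 0)
    return base;
  return base.substr(0, dot);
}

// True when the dump file is named after the source file alone:
// (a) exactly one input file and (b) it shares its stem with the executable.
bool dump_named_after_source_only(const std::vector<std::string> &inputs,
                                  const std::string &exe)
{
  return inputs.size() == 1 && stem(inputs[0]) == stem(exe);
}
```

Adding a second input, as happens when linking against testglue, makes (a) false, so the compiler falls back to the combined name while the harness still expects the source-only one.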
We get around this for dg-additional-sources using:
# This option restores naming of aux and dump output files
# after input files when multiple input files are named,
# instead of getting them combined with the output name.
lappend options "additional_flags=-dumpbase \"\""
This patch does the same thing for executables that are linked
against testglue. This removes over 2400 UNRESOLVEDs from an
armeb-eabi test run, but in so doing introduces FAILs for some
tests that were previously skipped.
gcc/testsuite/
* lib/gcc.exp (gcc_target_compile): Add -dumpbase ""
when building an executable with testglue.
Jonathan Wakely [Wed, 7 Apr 2021 15:05:42 +0000 (16:05 +0100)]
libstdc++: Fix filesystem::path construction from COW string [PR 99805]
Calling the non-const data() member on a COW string makes it "leaked",
possibly resulting in reallocating the string to ensure a unique owner.
The path::_M_split_cmpts() member parses its _M_pathname string using
string_view objects and then calls _M_pathname.data() to find the offset
of each string_view from the start of the string. However because
_M_pathname is non-const that will cause a COW string to reallocate if
it happens to be shared with another string object. This results in the
offsets calculated for each component being wrong (i.e. undefined)
because the string views no longer refer to substrings of the
_M_pathname member. The fix is to use the parse.offset(c) member which
gets the offset safely.
The bug only happens for the path(string_type&&) constructor and only
for COW strings. When constructed from an lvalue string the string's
contents are copied rather than just incrementing the refcount, so
there's no reallocation when calling the non-const data() member. The
testsuite changes check the lvalue case anyway, because we should
probably change the deep copying to just be a refcount increment (by
adding a path(const string_type&) constructor or an overload for
__effective_range(const string_type&), for COW strings only).
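As an illustration of the underlying hazard, the sketch below computes a component's offset while touching the owning string only through a const reference, so only the const overload of data() is ever called. It is not the actual fix (which uses parse.offset(c)); offset_in is a hypothetical helper, and std::string in current libstdc++ defaults to a non-COW implementation, so this only shows the access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Offset of VIEW within OWNER.  VIEW must refer to a substring of OWNER.
// Taking OWNER by const reference guarantees the const data() overload,
// which never needs to un-share a copy-on-write buffer.
std::size_t offset_in(const std::string &owner, std::string_view view)
{
  return static_cast<std::size_t>(view.data() - owner.data());
}
```

For a pathname "/usr/lib" and a view of its last component "lib", offset_in returns 5.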
libstdc++-v3/ChangeLog:
PR libstdc++/99805
* src/c++17/fs_path.cc (path::_M_split_cmpts): Do not call
non-const member on _M_pathname, to avoid copy-on-write.
* testsuite/27_io/filesystem/path/decompose/parent_path.cc:
Check construction from strings that might be shared.
Many of the gcc.target/sve/slp-perm*.c tests started failing
after the introduction of separate SLP permute nodes.
This patch adds variable-length support using a similar
technique to vect_transform_slp_perm_load.
As there, the idea is to detect when every permute mask vector
is the same and can be generated using a regular stepped sequence.
We can easily handle those cases for variable-length, but still
need to restrict the general case to constant-length.
Again copying vect_transform_slp_perm_load, the idea is to distinguish
the two cases regardless of whether the length is variable or not,
partly to increase testing coverage and partly because it avoids
generating redundant trees.
Doing this means that we can also use SLP for the two-vector
permute in pr88834.c, which we couldn't before VEC_PERM_EXPR
nodes were introduced. The patch therefore makes pr88834.c
check that we don't regress back to not using SLP and adds
pr88834_ld3.c to check for the original problem in the PR.
gcc/
PR tree-optimization/97513
* tree-vect-slp.c (vect_add_slp_permutation): New function,
split out from...
(vectorizable_slp_permutation): ...here. Detect cases in which
all VEC_PERM_EXPRs are guaranteed to have the same stepped
permute vector and only generate one permute vector for that case.
Extend that case to handle variable-length vectors.
gcc/testsuite/
* gcc.target/aarch64/sve/pr88834.c: Expect the vectorizer to use SLP.
* gcc.target/aarch64/sve/pr88834_ld3.c: New test.
vect: Don't split store groups if we have IFN_STORE_LANES [PR99873]
As noted in the PR, we were no longer using ST3 for the testcase and
instead stored each lane individually. This is because we'd split
the store group during SLP and couldn't recover when SLP failed.
However, we can also get better code with ST3 and ST4 even if SLP would
have succeeded, such as for vect-complex-5.c. I'm not sure exactly
where the cut-off point is, but it seems reasonable to allow the split
if either of the new groups would operate on full vectors *within*
rather than across scalar loop iterations.
E.g. on a Cortex-A57, pr99873_3.c performs better using ST4 while
pr99873_2.c performs better with SLP.
Another factor is that SLP can handle smaller iteration counts than
IFN_STORE_LANES can, but we don't have the infrastructure to choose
reliably based on that.
gcc/
PR tree-optimization/99873
* tree-vect-slp.c (vect_slp_prefer_store_lanes_p): New function.
(vect_build_slp_instance): Don't split store groups that could
use IFN_STORE_LANES.
gcc/testsuite/
* gcc.dg/vect/slp-21.c: Only expect 2 of the loops to use SLP
if IFN_STORE_LANES is available.
* gcc.dg/vect/vect-complex-5.c: Expect no loops to use SLP if
IFN_STORE_LANES is available.
* gcc.target/aarch64/pr99873_1.c: New test.
* gcc.target/aarch64/pr99873_2.c: Likewise.
* gcc.target/aarch64/pr99873_3.c: Likewise.
* gcc.target/aarch64/sve/pr99873_1.c: Likewise.
* gcc.target/aarch64/sve/pr99873_2.c: Likewise.
* gcc.target/aarch64/sve/pr99873_3.c: Likewise.
Jakub Jelinek [Wed, 7 Apr 2021 13:51:15 +0000 (15:51 +0200)]
varasm: Fix up constpool alias handling [PR99872]
Last year, in r11-2944-g0106300f6c3f7bae5eb1c46dbd45aa07c94e1b15 (aka the
PR54201 fix), I added code to find bitwise duplicates in the constant pool
and output them as aliases instead of duplicating the data.
Unfortunately this broke mingw32 -m32.
On most targets, ASM_GENERATE_INTERNAL_LABEL with "LC" emits something like
*.LC123 and the targets don't add user label prefixes, so the aliases
that we print should be something like
.set .LC5, .LC6
or
.set .LC5, .LC6 + 8
and I wasn't sure whether ASM_OUTPUT_DEF could handle the *, so I
stripped it.
But, on mingw32 -m32, ASM_GENERATE_INTERNAL_LABEL with "LC" emits
*LC123 and the target has user label prefixes, which means what I wrote
results in
LC6:
...
.set _LC5, _LC6
which results in unresolved symbols. I went through the ASM_OUTPUT_DEF
definitions of all targets and all of them use assemble_name twice under
the hood (with various differences on what they print before, in between or
after those names). And assemble_name handles the name encoding properly,
so if we pass it ASM_OUTPUT_DEF (..., "*.LC123", "*.LC456+16") it will
emit .LC123 and .LC456+16 and if we pass it "*LC789", it will emit
LC789.
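The name handling described above can be modelled in a few lines. This is a simplified model for illustration, not varasm.c itself: emitted_name is a hypothetical function and the user label prefix is passed in explicitly.

```cpp
#include <cassert>
#include <string>

// Simplified model of how a symbol name reaches the assembly output:
// a leading '*' means "emit the rest verbatim"; any other name gets the
// target's user label prefix prepended.
std::string emitted_name(const std::string &name,
                         const std::string &user_label_prefix)
{
  if (!name.empty() && name[0] == '*')
    return name.substr(1);
  return user_label_prefix + name;
}
```

With a "_" prefix, "*LC789" is emitted as LC789, whereas the already-stripped "LC5" becomes _LC5, which no longer matches the label LC5 that was actually defined. That mismatch is the unresolved symbol seen on mingw32 -m32.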
2021-04-07 Jakub Jelinek <jakub@redhat.com>
PR target/99872
* varasm.c (output_constant_pool_contents): Don't strip name encoding
from XSTR (desc->sym, 0) or from label before passing those to
ASM_OUTPUT_DEF.
Richard Biener [Wed, 7 Apr 2021 11:17:05 +0000 (13:17 +0200)]
tree-optimization/99954 - fix loop distribution memcpy classification
This fixes bogus classification of a copy as memcpy. We cannot use
plain dependence analysis to decide between memcpy and memmove when
it computes no dependence. Instead we have to try harder later, which
the patch does for the gcc.dg/tree-ssa/ldist-24.c testcase by resorting
to tree-affine to compute the difference between src and dest and
compare it against the copy size.
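The distance-versus-size test can be sketched at run time as follows. The patch does this at compile time on tree-affine expressions; provably_disjoint and copy_within are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Two SIZE-byte regions cannot overlap when their start addresses are at
// least SIZE bytes apart in either direction.
bool provably_disjoint(std::ptrdiff_t dest_minus_src, std::size_t size)
{
  std::uintmax_t dist = dest_minus_src < 0
    ? 0 - static_cast<std::uintmax_t>(dest_minus_src)
    : static_cast<std::uintmax_t>(dest_minus_src);
  return dist >= size;
}

// Copy SIZE bytes inside one buffer, using memcpy only when non-overlap
// is proven and falling back to memmove otherwise.
void copy_within(unsigned char *base, std::size_t dest_off,
                 std::size_t src_off, std::size_t size)
{
  std::ptrdiff_t diff = static_cast<std::ptrdiff_t>(dest_off)
                        - static_cast<std::ptrdiff_t>(src_off);
  if (provably_disjoint(diff, size))
    std::memcpy(base + dest_off, base + src_off, size);
  else
    std::memmove(base + dest_off, base + src_off, size);
}
```

Defaulting to memmove and upgrading to memcpy only on proof mirrors the patch's choice to always classify as PKIND_MEMMOVE first.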
2021-04-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/99954
* tree-loop-distribution.c: Include tree-affine.h.
(generate_memcpy_builtin): Try using tree-affine to prove
non-overlap.
(loop_distribution::classify_builtin_ldst): Always classify
as PKIND_MEMMOVE.
This avoids (again) the C++ pitfall of pushing a reference to
something that is being reallocated.
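The pre-allocation approach can be illustrated with std::vector as an analogy. GCC's own vec is a different container, and build_steps is a hypothetical function rather than the code in vectorizable_induction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build {init, init + step, ...} with N elements, deriving each element
// from the previous one.
std::vector<int> build_steps(int init, int step, std::size_t n)
{
  std::vector<int> steps;
  steps.reserve(n);   // pre-allocate so the pushes below never reallocate
  if (n == 0)
    return steps;
  steps.push_back(init);
  for (std::size_t i = 1; i < n; ++i)
    {
      // PREV refers to an element of the vector being pushed to.  It stays
      // valid during the push only because capacity was reserved up front.
      const int &prev = steps.back();
      steps.push_back(prev);
      steps.back() += step;
    }
  return steps;
}
```

Copying the element into a local before pushing would work as well; reserving is shown here because it matches the "pre-allocate" wording of the ChangeLog entry.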
2021-04-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/99947
* tree-vect-loop.c (vectorizable_induction): Pre-allocate
steps vector to avoid pushing elements from the reallocated
vector.