Kefu Chai [Mon, 1 May 2023 20:24:26 +0000 (21:24 +0100)]
libstdc++: Set _M_string_length before calling _M_dispose() [PR109703]
This always sets _M_string_length in the constructor for ranges of input
iterators, such as stream iterators.
We copy from the source range to the local buffer, and then repeatedly
reallocate a larger one if necessary. When disposing the old buffer,
_M_is_local() is used to tell if the buffer is the local one or not (and
so must be deallocated). In addition to comparing the buffer address
with the local buffer, _M_is_local() has an optimization hint so that
the compiler knows that for a string using the local buffer, there is an
invariant that _M_string_length <= _S_local_capacity (added for PR109299
via r13-6915-gbf78b43873b0b7). But we failed to set _M_string_length in
the constructor taking a pair of iterators, so the invariant might not
hold, and __builtin_unreachable() is reached. This causes UBsan errors,
and potentially misoptimization.
To ensure the invariant holds, _M_string_length is initialized to zero
before doing anything else, so that _M_is_local() doesn't see an
uninitialized value.
This issue only surfaces when constructing a string with a range of
input iterator, and the uninitialized _M_string_length happens to be
greater than _S_local_capacity, i.e., 15 for the std::string
specialization.
Jakub Jelinek [Wed, 3 May 2023 08:38:04 +0000 (10:38 +0200)]
c++: Fix up VEC_INIT_EXPR gimplification after r12-7069
During patch backporting, I've noticed that while most cp_walk_tree calls
with cp_fold_r callback callers were changed from &pset to cp_fold_data
&data, the VEC_INIT_EXPR gimplifications has not, so it still passes just
address of a hash_set<tree> and so if during the folding we ever touch
data->genericize, we use uninitialized data there.
The following patch changes it to do the same thing as cp_fold_function
because the VEC_INIT_EXPR gimplifications will happen on function bodies
only.
2023-05-03 Jakub Jelinek <jakub@redhat.com>
* cp-gimplify.cc (cp_fold_data): Move definition earlier.
(cp_gimplify_expr): Pass address of genericize = true
constructed data rather than &pset to cp_walk_tree with cp_fold_r.
c++, coroutines: Fix block nests when the function has no top-level bind.
When the function contains no local vars and also no nested scopes, there
is no top-level bind expression. Because the rewritten coroutine body will
require both local vars and contain nested scopes, we add a bind expression
to such functions. When this was done the necessary scope blocks were
omitted which leads to disconnected function content.
Fixed by adding a new block to the added bind expression.
Iain Sandoe [Thu, 30 Mar 2023 07:44:23 +0000 (13:14 +0530)]
c++,coroutines: Stabilize names of promoted slot vars [PR101118].
When we need to 'promote' a value (i.e. store it in the coroutine frame) it
is given a frame entry name. This was based on the DECL_UID for slot vars.
However, when LTO is used, the names from multiple TUs become visible at the
same time, and the DECL_UIDs usually differ between units. This leads to a
"ODR mismatch" warning for the frame type.
The fix here is to use the current promoted temporaries count to produce
the name, this is stable between TUs and computed per coroutine.
Andrew Pinski [Thu, 8 Dec 2022 22:34:16 +0000 (22:34 +0000)]
coroutines: Build pointer initializers with nullptr_node [PR107768]
The PR reports that using integer_zero_node triggers a warning for
-Wzero-as-null-pointer-constant which comes from compiler-generated code so
makes no sense to the end user.
Patrick Palka [Wed, 12 Apr 2023 17:08:21 +0000 (13:08 -0400)]
libstdc++: Implement LWG 3904 change to lazy_split_view's iterator
libstdc++-v3/ChangeLog:
* include/std/ranges (lazy_split_view::_OuterIter::_OuterIter):
Propagate _M_trailing_empty in the const-converting constructor
as per LWG 3904.
* testsuite/std/ranges/adaptors/lazy_split.cc (test12): New test.
Patrick Palka [Tue, 14 Mar 2023 20:44:32 +0000 (16:44 -0400)]
libstdc++: Implement P2520R0 changes to move_iterator's iterator_concept
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (move_iterator::_S_iter_concept):
Define.
(__cpp_lib_move_iterator_concept): Define for C++20.
(move_iterator::iterator_concept): Strengthen as per P2520R0.
* include/std/version (__cpp_lib_move_iterator_concept): Define
for C++20.
* testsuite/24_iterators/move_iterator/p2520r0.cc: New test.
Patrick Palka [Thu, 9 Mar 2023 18:35:04 +0000 (13:35 -0500)]
libstdc++: Make views::single/iota/istream SFINAE-friendly [PR108362]
PR libstdc++/108362
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::__can_single_view): New concept.
(_Single::operator()): Constrain it. Move [[nodiscard]] to the
end of the function declarator.
(__detail::__can_iota_view): New concept.
(_Iota::operator()): Constrain it. Move [[nodiscard]] to the
end of the function declarator.
(__detail::__can_istream_view): New concept.
(_Istream::operator()): Constrain it. Move [[nodiscard]] to the
end of the function declarator.
* testsuite/std/ranges/iota/lwg3292_neg.cc: Prune "in
requirements" diagnostic.
* testsuite/std/ranges/iota/iota_view.cc (test07): New test.
* testsuite/std/ranges/istream_view.cc (test08): New test.
* testsuite/std/ranges/single_view.cc (test07): New test.
Patrick Palka [Mon, 24 Apr 2023 17:39:54 +0000 (13:39 -0400)]
libstdc++: Fix __max_diff_type::operator>>= for negative values
This patch fixes sign bit propagation when right-shifting a negative
__max_diff_type value by more than one, a bug that our existing test
coverage didn't expose until r14-159-g03cebd304955a6 fixed the front
end's 'signed typedef-name' handling that the test relies on (which is
a non-standard extension to the language grammar).
libstdc++-v3/ChangeLog:
* include/bits/max_size_type.h (__max_diff_type::operator>>=):
Fix propagation of sign bit.
* testsuite/std/ranges/iota/max_size_type.cc: Avoid using the
non-standard 'signed typedef-name'. Add some compile-time tests
for right-shifting a negative __max_diff_type value by more than
one.
Patrick Palka [Fri, 24 Mar 2023 18:51:24 +0000 (14:51 -0400)]
c++: outer 'this' leaking into local class [PR106969]
Here when resolving the implicit object for '&wrapped' within the
local class Foo, we expect to obtain a dummy object of type Foo& since
there's no 'this' available in this context. And yet at this point
current_class_ref still corresponds to the outer class Context (and is
const), which confuses maybe_dummy_object into propagating the cv-quals
of current_class_ref and returning an object of type const Foo&. Thus
decltype(&wrapped) wrongly yields const int* instead of int*.
The problem ultimately seems to be that the 'this' from the enclosing
class appears available for use when parsing the local class, but 'this'
shouldn't persist across classes like that. This patch fixes this by
clearing current_class_ptr/ref before parsing a class definition.
After this change, for the test name-clash11.C in C++98 mode we would
now complain about an invalid use of 'this' in e.g.
ASSERT (sizeof (this->A) == 16);
due to the way the test defines the ASSERT macro via a local class.
This patch redefines the macro using a local typedef instead.
PR c++/106969
gcc/cp/ChangeLog:
* parser.cc (cp_parser_class_specifier): Clear current_class_ptr
and current_class_ref sooner, before parsing a class definition.
gcc/testsuite/ChangeLog:
* g++.dg/lookup/name-clash11.C: Fix ASSERT macro definition in
C++98 mode.
* g++.dg/lookup/this2.C: New test.
Here we're mishandling the unevaluated array new-expressions due to a
supposed non-constant array size ever since r12-5253-g4df7f8c79835d569
made us no longer perform constant evaluation of non-manifestly-constant
expressions within unevaluated contexts. This shouldn't make a difference
here since the array sizes are constant literals, except they're expressed
as NON_LVALUE_EXPR location wrappers around INTEGER_CST, wrappers which
used to get stripped as part of constant evaluation and now no longer do.
Moreover it means build_vec_init can't constant fold the MINUS_EXPR
'maxindex' passed from build_new_1 when in an unevaluated context (since
it tries reducing it via maybe_constant_value called with mce_unknown).
This patch fixes these issues by making maybe_constant_value (and
fold_non_dependent_expr) try folding an unevaluated non-manifestly-constant
operand via fold(), as long as it simplifies to a simple constant, rather
than doing no simplification at all. This covers e.g. simple arithmetic
and casts including stripping of location wrappers around INTEGER_CST.
In passing, this patch also fixes maybe_constant_value to avoid constant
evaluating an unevaluated operand when called with mce_false, by adjusting
the early exit test appropriately.
Co-authored-by: Jason Merrill <jason@redhat.com>
PR c++/108219
PR c++/108218
gcc/cp/ChangeLog:
* constexpr.cc (fold_to_constant): Define.
(maybe_constant_value): Move up early exit test for unevaluated
operands. Try reducing an unevaluated operand to a constant via
fold_to_constant.
(fold_non_dependent_expr_template): Add early exit test for
CONSTANT_CLASS_P nodes. Try reducing an unevaluated operand
to a constant via fold_to_constant.
* cp-tree.h (fold_to_constant): Declare.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/new6.C: New test.
* g++.dg/cpp2a/concepts-new1.C: New test.
Jonathan Wakely [Thu, 24 Nov 2022 21:09:03 +0000 (21:09 +0000)]
libstdc++: Call predicate with non-const values in std::erase_if [PR107850]
As specified in the standard, the predicate for std::erase_if has to be
invocable as non-const with a non-const lvalues argument. Restore
support for predicates that only accept non-const arguments.
It's not strictly nevessary to change it for the set and unordered_set
overloads, because they only give const access to the elements anyway.
I've done it for them too just to keep them all consistent.
Jonathan Wakely [Fri, 31 Mar 2023 12:38:14 +0000 (13:38 +0100)]
libstdc++: Avoid -Wmaybe-uninitialized warning in std::stop_source [PR109339]
We pass a const-reference to *this before it's constructed, and GCC
assumes that all const-references are accessed. Add the access attribute
to say it's not accessed.
libstdc++-v3/ChangeLog:
PR libstdc++/109339
* include/std/stop_token (_Stop_state_ptr(const stop_source&)):
Add attribute access with access-mode 'none'.
* testsuite/30_threads/stop_token/stop_source/109339.cc: New test.
Jonathan Wakely [Mon, 28 Nov 2022 09:44:52 +0000 (09:44 +0000)]
libstdc++: Make 16-bit std::subtract_with_carry_engine work [PR107466]
This implements the proposed resolution of LWG 3809, so that
std::subtract_with_carry_engine can be used with a 16-bit result_type.
Currently this produces a narrowing error when instantiating the
std::linear_congruential_engine to create the initial state. It also
truncates the default_seed constant when passing it as a result_type
argument.
Change the type of the constant to uint_least32_t and pass 0u when the
default_seed should be used.
libstdc++-v3/ChangeLog:
PR libstdc++/107466
* include/bits/random.h (subtract_with_carry_engine): Use 32-bit
type for default seed. Use 0u as default argument for
subtract_with_carry_engine(result_type) constructor and
seed(result_type) member function.
* include/bits/random.tcc (subtract_with_carry_engine): Use
32-bit type for default seed and engine used for initial state.
* testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc:
New test.
Jonathan Wakely [Wed, 26 Apr 2023 11:27:59 +0000 (12:27 +0100)]
libstdc++: Reduce Doxygen output for PDF
Including the header source code in the doxygen-generated PDF file makes
it too large, and causes pdflatex to run out of memory. If we only set
SOURCE_BROWSER=YES for the HTML docs then we won't include the sources
in the PDF file.
There are several macros defined for std::valarray that are only used to
generate repetitive code and then #undef'd. Those aren't useful in the
doxygen docs, especially the ones that reuse the same name in different
files. Omitting them avoids warnings about duplicate labels in the
refman.tex file.
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (SOURCE_BROWSER): Only set to YES for
HTML docs.
* include/bits/gslice_array.h (_DEFINE_VALARRAY_OPERATOR): Omit
from doxygen docs.
* include/bits/indirect_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/mask_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/slice_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/std/valarray (_DEFINE_VALARRAY_UNARY_OPERATOR)
(_DEFINE_VALARRAY_AUGMENTED_ASSIGNMENT)
(_DEFINE_VALARRAY_EXPR_AUGMENTED_ASSIGNMENT)
(_DEFINE_BINARY_OPERATOR): Likewise.
Roger Sayle [Tue, 10 Jan 2023 14:05:46 +0000 (14:05 +0000)]
PR rtl-optimization/106421: ICE in bypass_block from non-local goto.
This patch fixes PR rtl-optimization/106421, an ICE-on-valid (but
undefined) regression. The fix, as proposed by Richard Biener, is to
defend against BLOCK_FOR_INSN returning NULL in cprop's bypass_block.
2023-01-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/106421
* cprop.cc (bypass_block): Check that DEST is local to this
function (non-NULL) before calling find_edge.
gcc/testsuite/ChangeLog
PR rtl-optimization/106421
* gcc.dg/pr106421.c: New test case.
H.J. Lu [Mon, 16 Jan 2023 18:45:41 +0000 (10:45 -0800)]
x86: Disable -mforce-indirect-call for PIC in 32-bit mode
-mforce-indirect-call generates invalid instruction in 32-bit MI thunk
since there are no available scratch registers in 32-bit PIC mode.
Disable -mforce-indirect-call for PIC in 32-bit mode when generating
MI thunk.
gcc/
PR target/105980
* config/i386/i386.cc (x86_output_mi_thunk): Disable
-mforce-indirect-call for PIC in 32-bit mode.
gcc/testsuite/
PR target/105980
* g++.target/i386/pr105980.C: New test.
Jan Hubicka [Fri, 12 Aug 2022 14:25:28 +0000 (16:25 +0200)]
Fix invalid devirtualization when combining final keyword and anonymous types
this patch fixes a wrong code issue where we incorrectly devirtualize to
__builtin_unreachable. The problem occurs in combination of anonymous
namespaces and final keyword used on methods. We do two optimizations here
1) when reacing final method we cut the search for possible new targets
2) if the type is anonymous we detect whether it is ever instatiated by
looking if its vtable is referred to.
Now this goes wrong when thre is an anonymous type with final method that
is not instantiated while its derived type is. So if 1 triggers we need
to make 2 to look for vtables of all derived types as done by this patch.
Bootstrpaped/regtested x86_64-linux
Honza
gcc/ChangeLog:
2022-08-10 Jan Hubicka <hubicka@ucw.cz>
PR middle-end/106057
* ipa-devirt.cc (type_or_derived_type_possibly_instantiated_p): New
function.
(possible_polymorphic_call_targets): Use it.
gcc/testsuite/ChangeLog:
2022-08-10 Jan Hubicka <hubicka@ucw.cz>
PR middle-end/106057
* g++.dg/tree-ssa/pr101839.C: New test.
Martin Jambor [Wed, 26 Apr 2023 16:38:39 +0000 (18:38 +0200)]
ipa: Fix double reference-count decrements for the same edge (PR 107769, PR 109318)
It turns out that since addition of the code that can identify globals
which are only read from, the code that keeps track of the references
can decrement their count for the same calls, once during IPA-CP and
then again during inlining. Fixed by adding a special flag to the
pass-through variant and simply wiping out the reference to the
refdesc structure from the constant ones.
Moreover, during debugging of the issue I have discovered that the
code removing references could remove a reference associated with the
same statement but of a wrong type. In all cases it wanted to remove
an IPA_REF_ADDR reference so removing a lesser one instead should do
no harm in practice, but we should try to be consistent and so this
patch extends symtab_node::find_reference so that it searches for a
reference of a given type only.
gcc/ChangeLog:
2023-04-14 Martin Jambor <mjambor@suse.cz>
PR ipa/107769
PR ipa/109318
* cgraph.h (symtab_node::find_reference): Add parameter use_type.
* ipa-prop.h (ipa_pass_through_data): New flag refdesc_decremented.
(ipa_zap_jf_refdesc): New function.
(ipa_get_jf_pass_through_refdesc_decremented): Likewise.
(ipa_set_jf_pass_through_refdesc_decremented): Likewise.
* ipa-cp.cc (ipcp_discover_new_direct_edges): Provide a value for
the new parameter of find_reference.
(adjust_references_in_caller): Likewise. Make sure the constant jump
function is not used to decrement a refdec counter again. Only
decrement refdesc counters when the pass_through jump function allows
it. Added a detailed dump when decrementing refdesc counters.
* ipa-prop.cc (ipa_print_node_jump_functions_for_edge): Dump new flag.
(ipa_set_jf_simple_pass_through): Initialize the new flag.
(ipa_set_jf_unary_pass_through): Likewise.
(ipa_set_jf_arith_pass_through): Likewise.
(remove_described_reference): Provide a value for the new parameter of
find_reference.
(update_jump_functions_after_inlining): Zap refdesc of new jfunc if
the previous pass_through had a flag mandating that we do so.
(propagate_controlled_uses): Likewise. Only decrement refdesc
counters when the pass_through jump function allows it.
(ipa_edge_args_sum_t::duplicate): Provide a value for the new
parameter of find_reference.
(ipa_write_jump_function): Assert the new flag does not have to be
streamed.
* symtab.cc (symtab_node::find_reference): Add parameter use_type, use
it in searching.
Jakub Jelinek [Tue, 25 Apr 2023 12:20:51 +0000 (14:20 +0200)]
powerpc: Fix up *branch_anddi3_dot for -m32 -mpowerpc64 [PR109566]
The following testcase reduced from newlib ICEs on powerpc-linux,
with -O2 -m32 -mpowerpc64 since r12-6433 PR102239 optimization was
added and on the original testcase since some ranger improvements in
GCC 13 made it no longer latent on newlib.
The problem is that the *branch_anddi3_dot define_insn_and_split
relies on the *rotldi3_mask_dot define_insn_and_split being recognized
during splitting. The rs6000_is_valid_rotate_dot_mask function checks whether
the mask is a CONST_INT which is a valid mask, but *rotl<mode>3_mask_dot in
addition to checking that it is a valid mask also has
(<MODE>mode == Pmode || UINTVAL (operands[3]) <= 0x7fffffff)
test in the condition. For TARGET_64BIT that doesn't add any further
requirements, but for !TARGET_64BIT && TARGET_POWERPC64 if the AND
second operand is larger than INT_MAX it will not be recognized.
The rs6000_is_valid_rotate_dot_mask function is used solely in one spot,
condition of *branch_anddi3_dot, so the following patch adjusts it
to check for that as well.
2023-04-25 Jakub Jelinek <jakub@redhat.com>
PR target/109566
* config/rs6000/rs6000.cc (rs6000_is_valid_rotate_dot_mask): For
!TARGET_64BIT, don't return true if UINTVAL (mask) << (63 - nb)
is larger than signed int maximum.
Richard Biener [Tue, 25 Apr 2023 12:56:44 +0000 (14:56 +0200)]
tree-optimization/109609 - correctly interpret arg size in fnspec
By majority vote and a hint from the API name which is
arg_max_access_size_given_by_arg_p this interprets a memory access
size specified as given as other argument such as for strncpy
in the testcase which has "1cO313" as specifying the _maximum_
size read/written rather than the exact size. There are two
uses interpreting it that way already and one differing. The
following adjusts the differing and clarifies the documentation.
PR tree-optimization/109609
* attr-fnspec.h (arg_max_access_size_given_by_arg_p):
Clarify semantics.
* tree-ssa-alias.cc (check_fnspec): Correctly interpret
the size given by arg_max_access_size_given_by_arg_p as
maximum, not exact, size.
Richard Biener [Mon, 24 Apr 2023 11:31:07 +0000 (13:31 +0200)]
rtl-optimization/109585 - alias analysis typo
When r10-514-gc6b84edb6110dd2b4fb improved access path analysis
it introduced a typo that triggers when there's an access to a
trailing array in the first access path leading to false
disambiguation.
Richard Biener [Fri, 21 Apr 2023 10:57:17 +0000 (12:57 +0200)]
tree-optimization/109573 - avoid ICEing on unexpected live def
The following relaxes the assert in vectorizable_live_operation
where we catch currently unhandled cases to also allow an
intermediate copy as it happens here but also relax the assert
to checking only.
PR tree-optimization/109573
* tree-vect-loop.cc (vectorizable_live_operation): Allow
unhandled SSA copy as well. Demote assert to checking only.
Eric Botcazou [Tue, 25 Apr 2023 08:46:16 +0000 (10:46 +0200)]
Remove obsolete configure code in gnattools
It was recently pointed out that we generate symbolic links to ghost files
when building the GNAT tools, as the mlib-tgt-specific-*.adb files are gone.
Harald Anlauf [Tue, 11 Apr 2023 14:44:32 +0000 (16:44 +0200)]
Fortran: resolve correct generic with TYPE(C_PTR) arguments [PR61615,PR99982]
gcc/fortran/ChangeLog:
PR fortran/61615
PR fortran/99982
* interface.cc (compare_parameter): Enable type and rank checks for
arguments of derived type from the intrinsic module ISO_C_BINDING.
gcc/testsuite/ChangeLog:
PR fortran/61615
PR fortran/99982
* gfortran.dg/interface_49.f90: New test.
Jonathan Wakely [Thu, 20 Apr 2023 20:02:22 +0000 (21:02 +0100)]
libstdc++: Optimize std::try_facet and std::use_facet [PR103755]
The std::try_facet and std::use_facet functions were optimized in r13-3888-gb3ac43a3c05744 to avoid redundant checking for all facets that
are required to always be present in every locale.
This performs a simpler version of the optimization that only applies to
std::ctype<char>, std::num_get<char>, std::num_put<char>, and the
wchar_t specializations of those facets. Those are the facets that are
cached by std::basic_ios, which means they're used on construction for
every iostream object. This smaller change is suitable for the gcc-12
branch, and mitigates the performance loss for powerpc64le-linux on the
gcc-12 branch caused by r12-9454-g24cf9f4c6f45f7 for PR 103387. It also
greatly improves the performance of constructing iostreams objects, for
all targets.
libstdc++-v3/ChangeLog:
PR libstdc++/103755
* include/bits/locale_classes.tcc (try_facet, use_facet): Do not
check array index or dynamic type when accessing required
specializations of std::ctype, std::num_get, or std::num_put.
* testsuite/22_locale/ctype/is/string/89728_neg.cc: Adjust
expected errors.
rs6000: correct vector sign extend builtins on Big Endian
gcc/
PR target/108812
* config/rs6000/vsx.md (vsx_sign_extend_qi_<mode>): Rename to...
(vsx_sign_extend_v16qi_<mode>): ... this.
(vsx_sign_extend_hi_<mode>): Rename to...
(vsx_sign_extend_v8hi_<mode>): ... this.
(vsx_sign_extend_si_v2di): Rename to...
(vsx_sign_extend_v4si_v2di): ... this.
(vsignextend_qi_<mode>): Remove.
(vsignextend_hi_<mode>): Remove.
(vsignextend_si_v2di): Remove.
(vsignextend_v2di_v1ti): Remove.
(*xxspltib_<mode>_split): Replace gen_vsx_sign_extend_qi_v2di with
gen_vsx_sign_extend_v16qi_v2di and gen_vsx_sign_extend_qi_v4si
with gen_vsx_sign_extend_v16qi_v4si.
* config/rs6000/rs6000.md (split for DI constant generation):
Replace gen_vsx_sign_extend_qi_si with gen_vsx_sign_extend_v16qi_si.
(split for HSDI constant generation): Replace gen_vsx_sign_extend_qi_di
with gen_vsx_sign_extend_v16qi_di and gen_vsx_sign_extend_qi_si
with gen_vsx_sign_extend_v16qi_si.
* config/rs6000/rs6000-builtins.def (__builtin_altivec_vsignextsb2d):
Set bif-pattern to vsx_sign_extend_v16qi_v2di.
(__builtin_altivec_vsignextsb2w): Set bif-pattern to
vsx_sign_extend_v16qi_v4si.
(__builtin_altivec_visgnextsh2d): Set bif-pattern to
vsx_sign_extend_v8hi_v2di.
(__builtin_altivec_vsignextsh2w): Set bif-pattern to
vsx_sign_extend_v8hi_v4si.
(__builtin_altivec_vsignextsw2d): Set bif-pattern to
vsx_sign_extend_si_v2di.
(__builtin_altivec_vsignext): Set bif-pattern to
vsx_sign_extend_v2di_v1ti.
* config/rs6000/rs6000-builtin.cc (lxvrse_expand_builtin): Replace
gen_vsx_sign_extend_qi_v2di with gen_vsx_sign_extend_v16qi_v2di,
gen_vsx_sign_extend_hi_v2di with gen_vsx_sign_extend_v8hi_v2di and
gen_vsx_sign_extend_si_v2di with gen_vsx_sign_extend_v4si_v2di.
gcc/testsuite/
PR target/108812
* gcc.target/powerpc/p9-sign_extend-runnable.c: Set corresponding
expected vectors for Big Endian.
* gcc.target/powerpc/int_128bit-runnable.c: Likewise.
Jonathan Wakely [Tue, 29 Nov 2022 15:50:06 +0000 (15:50 +0000)]
libstdc++: Avoid bogus warning in std::vector::insert [PR107852]
GCC assumes that any global variable might be modified by operator new,
and so in the testcase for this PR all data members get reloaded after
allocating new storage. By making local copies of the _M_start and
_M_finish members we avoid that, and then the compiler has enough info
to remove the dead branches that trigger bogus -Warray-bounds warnings.
libstdc++-v3/ChangeLog:
PR libstdc++/107852
PR libstdc++/106199
PR libstdc++/100366
* include/bits/vector.tcc (vector::_M_fill_insert): Copy
_M_start and _M_finish members before allocating.
(vector::_M_default_append): Likewise.
(vector::_M_range_insert): Likewise.
Jonathan Wakely [Tue, 4 Apr 2023 11:04:14 +0000 (12:04 +0100)]
libstdc++: Fix outdated docs about demangling exception messages
The string returned by std::bad_exception::what() hasn't been a mangled
name since PR libstdc++/14493 was fixed for GCC 4.2.0, so remove the
docs showing how to demangle it.
libstdc++-v3/ChangeLog:
* doc/xml/manual/extensions.xml: Remove std::bad_exception from
example program.
* doc/html/manual/ext_demangling.html: Regenerate.
Jonathan Wakely [Tue, 28 Mar 2023 20:07:21 +0000 (21:07 +0100)]
libstdc++: Do not use facets cached in ios for ALT128 build [PR103387]
For the powerpc64le build with two different long double
representations, we cannot use the ios_base::_M_num_put and
ios_base::_M_num_get pointers, because they might have been initialized
in a translation unit using the other long double type. With the changes
to add __try_use_facet to GCC 13 the cache isn't really needed now, we
can just access the right facet in the locale directly, without needing
RTTI checks.
libstdc++-v3/ChangeLog:
PR libstdc++/103387
* include/bits/istream.tcc (istream::_M_extract(ValueT&)): Use
std::use_facet instead of cached _M_num_get facet.
(istream::operator>>(short&)): Likewise.
(istream::operator>>(int&)): Likewise.
* include/bits/ostream.tcc (ostream::_M_insert(ValueT)): Use
std::use_facet instead of cached _M_num_put facet.
Jonathan Wakely [Wed, 22 Mar 2023 21:54:24 +0000 (21:54 +0000)]
libstdc++: Fix assigning nullptr to std::atomic<shared_ptr<T>> (LWG 3893)
LWG voted this to Tentatively Ready recently.
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_atomic.h (atomic::operator=(nullptr_t)):
Add overload, as per LWG 3893.
* testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc:
Check assignment from nullptr.
Jonathan Wakely [Wed, 29 Mar 2023 21:43:16 +0000 (22:43 +0100)]
libstdc++: Apply small fix from LWG 3843 to std::expected
LWG 3843 adds some type requirements to std::expected::value to ensure
that it can correctly copy the error value if it needs to throw an
exception. We don't need to do anything to enforce that, because it will
already be ill-formed if the type can't be copied. The issue also makes
a small drive-by fix to ensure that a const E& is copied from the
non-const value()& overload, which this change implements.
libstdc++-v3/ChangeLog:
* include/std/expected (expected::value() &): Use const lvalue
for unex member passed to bad_expected_access constructor, as
per LWG 3843.
My earlier patch for 108099 made us accept this non-standard pattern but
messed up the semantics, so that e.g. unsigned __int128_t was not a 128-bit
type.
PR c++/108099
gcc/cp/ChangeLog:
* decl.cc (grokdeclarator): Keep typedef_decl for __int128_t.
Jason Merrill [Sat, 15 Apr 2023 02:40:43 +0000 (22:40 -0400)]
c++: constexpr aggregate destruction [PR109357]
We were assuming that the result of evaluation of TARGET_EXPR_INITIAL would
always be the new value of the temporary, but that's not necessarily true
when the initializer is complex (i.e. target_expr_needs_replace). In that
case evaluating the initializer initializes the temporary as a side-effect.
PR c++/109357
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression) [TARGET_EXPR]:
Check for complex initializer.
Jason Merrill [Thu, 23 Mar 2023 19:57:39 +0000 (15:57 -0400)]
c-family: -Wsequence-point and COMPONENT_REF [PR107163]
The patch for PR91415 fixed -Wsequence-point to treat shifts and ARRAY_REF
as sequenced in C++17, and COMPONENT_REF as well. But this is unnecessary
for COMPONENT_REF, since the RHS is just a FIELD_DECL with no actual
evaluation, and in this testcase handling COMPONENT_REF as sequenced blows
up fast in a deep inheritance tree. Instead, look through it.
PR c++/107163
gcc/c-family/ChangeLog:
* c-common.cc (verify_tree): Don't use sequenced handling
for COMPONENT_REF.
The default argument code in type_unification_real was assuming that all
targs we've deduced by that point are non-dependent, but that's not the case
for partial ordering.
PR c++/105481
gcc/cp/ChangeLog:
* pt.cc (type_unification_real): Adjust for partial ordering.
Jason Merrill [Thu, 23 Mar 2023 20:50:09 +0000 (16:50 -0400)]
c++: constexpr PMF conversion [PR105996]
Here, we were calling build_reinterpret_cast regardless of whether there was
actually a cast, and that now sets REINTERPRET_CAST_P. But that
optimization seems dodgy anyway, as it involves NOP_EXPR from one
RECORD_TYPE to another and we try to reserve NOP_EXPR for fundamental types.
And the generated code seems the same, so let's drop it. And also strip
location wrappers.
PR c++/105996
gcc/cp/ChangeLog:
* typeck.cc (build_ptrmemfunc): Drop 0-offset optimization
and location wrappers.
Jason Merrill [Sat, 18 Mar 2023 12:27:26 +0000 (08:27 -0400)]
c++: DMI in template with virtual base [PR106890]
When parsing a default member init we just build a CONVERT_EXPR for
converting to a virtual base, and then expand that into the more complex
form when we actually use the DMI in a constructor. But that wasn't working
for the template case where we are considering the conversion at the point
that the constructor needs the DMI instantiation, so it seemed like we were
in a constructor already. And then when the other constructor tries to
reuse the instantiation, it sees uses of the first constructor's parameters,
and dies. So ensure that we get the CONVERT_EXPR in this case, too.
PR c++/106890
gcc/cp/ChangeLog:
* init.cc (maybe_instantiate_nsdmi_init): Don't leave
current_function_decl set to a constructor.
Jason Merrill [Fri, 17 Mar 2023 21:26:40 +0000 (17:26 -0400)]
c++: constant, array, lambda, template [PR108975]
When a lambda refers to a constant local variable in the enclosing scope, we
tentatively capture it, but if we end up pulling out its constant value, we
go back at the end of the lambda and prune any unneeded captures. Here
while parsing the template we decided that the dim capture was unneeded,
because we folded it away, but then we brought back the use in the template
trees that try to preserve the source representation with added type info.
So then when we tried to instantiate that use, we couldn't find what it was
trying to use, and crashed.
Fixed by not trying to prune when parsing a template; we'll prune at
instantiation time.
PR c++/108975
gcc/cp/ChangeLog:
* lambda.cc (prune_lambda_captures): Don't bother in a template.
Jason Merrill [Fri, 17 Mar 2023 13:43:48 +0000 (09:43 -0400)]
c++: namespace-scoped friend in local class [PR69410]
do_friend was only considering class-qualified identifiers for the
qualified-id case, but we also need to skip local scope when there's an
explicit namespace scope.
PR c++/69410
gcc/cp/ChangeLog:
* friend.cc (do_friend): Handle namespace as scope argument.
* decl.cc (grokdeclarator): Pass down in_namespace.
Jason Merrill [Thu, 16 Mar 2023 19:11:25 +0000 (15:11 -0400)]
c++: generic lambda, local class, __func__ [PR108242]
Here we are trying to do name lookup in a deferred instantiation of t() and
failing to find __func__. tsubst_expr already tries to instantiate members
of local classes, but was failing with the partial instantiation of generic
lambdas.
Jason Merrill [Wed, 15 Mar 2023 21:02:15 +0000 (17:02 -0400)]
c++: co_await and move-only type [PR105406]
Here we were building a temporary MoveOnlyAwaitable to hold the result of
evaluating 'o', but since 'o' is an lvalue we should build a reference
instead.
Jason Merrill [Wed, 15 Mar 2023 20:33:37 +0000 (16:33 -0400)]
c++: co_await and initializer_list [PR103871]
When flatten_await_stmt processes the backing array for an initializer_list,
we call cp_build_modify_expr to initialize the promoted variable from the
TARGET_EXPR; that needs to be accepted.
PR c++/103871
PR c++/98056
gcc/cp/ChangeLog:
* typeck.cc (cp_build_modify_expr): Allow array initialization of
DECL_ARTIFICIAL variable.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/co-await-initlist1.C: New test.
Generally we expect TPARMS_PRIMARY_TEMPLATE to be set, but sometimes it
isn't for partial instantiations. This ought to be improved, but it's
trivial to work around it in this case.
PR c++/108468
gcc/cp/ChangeLog:
* pt.cc (unify_pack_expansion): Check that TPARMS_PRIMARY_TEMPLATE
is non-null.
Jason Merrill [Tue, 14 Mar 2023 16:20:51 +0000 (12:20 -0400)]
c++: -Wreturn-type with if (true) throw [PR107310]
I removed this folding in GCC 12 because it was interfering with an
experiment of richi's, but that never went in and the change causes
regressions, so let's put it back.
Jason Merrill [Fri, 10 Mar 2023 04:33:43 +0000 (23:33 -0500)]
c++: class NTTP and nested anon union [PR108566]
We were failing to come up with the name for the anonymous union. It seems
like unfortunate redundancy, but the ABI does say that the name of an
anonymous union is its first named member.
PR c++/108566
gcc/cp/ChangeLog:
* mangle.cc (anon_aggr_naming_decl): New.
(write_unqualified_name): Use it.
Jason Merrill [Tue, 4 Oct 2022 21:06:04 +0000 (17:06 -0400)]
c++: fix debug info for array temporary [PR107154]
In the testcase the elaboration of the array init that happens at genericize
time was getting the location info for the end of the function; fixed by
doing the expansion at the location of the original expression.
PR c++/107154
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_genericize_init_expr): Use iloc_sentinel.
(cp_genericize_target_expr): Likewise.
Jakub Jelinek [Wed, 12 Apr 2023 14:55:15 +0000 (16:55 +0200)]
reassoc: Fix up another ICE with returns_twice call [PR109410]
The following testcase ICEs in reassoc, unlike the last case I've fixed
there here SSA_NAME_USED_IN_ABNORMAL_PHI is not the case anywhere.
build_and_add_sum places new statements after the later appearing definition
of an operand but if both operands are default defs or constants, we place
statement at the start of the function.
If the very first statement of a function is a call to returns_twice
function, this doesn't work though, because that call has to be the first
thing in its basic block, so the following patch splits the entry successor
edge such that the new statements are added into a different block from the
returns_twice call.
I think we should in stage1 reconsider such placements, I think it
unnecessarily enlarges the lifetime of the new lhs if its operand(s)
are used more than once in the function. Unless something sinks those
again. Would be nice to place it closer to the actual uses (or where
they will be placed).
2023-04-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109410
* tree-ssa-reassoc.cc (build_and_add_sum): Split edge from entry
block if first statement of the function is a call to returns_twice
function.
Jakub Jelinek [Wed, 12 Apr 2023 14:22:28 +0000 (16:22 +0200)]
c++: Fix Solaris bootstraps across midnight
When working on the PR109040 fix, I wanted to test it on some
WORD_REGISTER_OPERATIONS target and tried sparc-solaris on GCC Farm.
My bootstrap failed in comparison failure on cp/module.o, because
Solaris date doesn't support the -r option and one stage's cp/module.o
was built before midnight and next stage's cp/module.o after midnight,
so they had different -DMODULE_VERSION= value.
Now, I think the advice (don't bootstrap at midnight) is something
we shouldn't have, so the following patch stores the module version
(still generated through the same way, date -r cp/module.cc
if it works, otherwise just date) into a temporary file, makes sure
that temporary file is updated when cp/module.cc source is updated
and when date -r doesn't work copies file from previous stage
if it is newer than cp/module.cc.
2023-04-12 Jakub Jelinek <jakub@redhat.com>
* Make-lang.in (s-cp-module-version): New target.
(cp/module.o): Depend on it.
(MODULE_VERSION): Remove variable.
(CFLAGS-cp/module.o): For -DMODULE_VERSION= argument just
cat s-cp-module-version.
Jakub Jelinek [Sun, 2 Apr 2023 18:05:31 +0000 (20:05 +0200)]
libiberty: Make strstr.c in libiberty ANSI compliant
On Fri, Nov 13, 2020 at 11:53:43AM -0700, Jeff Law via Gcc-patches wrote:
>
> On 5/1/20 6:06 PM, Seija Kijin via Gcc-patches wrote:
> > The original code in libiberty says "FIXME" and then says it has not been
> > validated to be ANSI compliant. However, this patch changes the function to
> > match implementations that ARE compliant, and such code is in the public
> > domain.
> >
> > I ran the test results, and there are no test failures.
>
> Thanks. This seems to be the standard "simple" strstr implementation.
> There's significantly faster implementations available, but I doubt it's
> worth the effort as the version in this file only gets used if there is
> no system strstr.c.
Except that PR109306 says the new version is non-compliant and
is certainly slower than what we used to have. The only problem I see
on the old version (sure, it is not very fast version) is that for
strstr ("abcd", "") it returned "abcd"+4 rather than "abcd" because
strchr in that case changed p to point to the last character and then
strncmp returned 0.
The question reported in PR109306 is whether memcmp is required not to
access characters beyond the first difference or not.
For all of memcmp/strcmp/strncmp, C17 says:
"The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp
is determined by the sign of the difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being compared."
but then in memcmp description says:
"The memcmp function compares the first n characters of the object pointed to by s1 to the first n
characters of the object pointed to by s2."
rather than something similar to strncmp wording:
"The strncmp function compares not more than n characters (characters that follow a null character
are not compared) from the array pointed to by s1 to the array pointed to by
s2."
So, while for strncmp it seems clearly well defined when there is zero
terminator before reaching the n, for memcmp it is unclear if say
int
memcmp (const void *s1, const void *s2, size_t n)
{
int ret = 0;
size_t i;
const unsigned char *p1 = (const unsigned char *) s1;
const unsigned char *p2 = (const unsigned char *) s2;
for (i = n; i; i--)
if (p1[i - 1] != p2[i - 1])
ret = p1[i - 1] < p2[i - 1] ? -1 : 1;
return ret;
}
wouldn't be valid implementation (one which always compares all characters
and just returns non-zero from the first one that differs).
So, shouldn't we just revert and handle the len == 0 case correctly?
I think almost nothing really uses it, but still, the old version
at least worked nicer with a fast strchr.
Could as well strncmp (p + 1, s2 + 1, len - 1) if that is preferred
because strchr already compared the first character.
2023-04-02 Jakub Jelinek <jakub@redhat.com>
PR other/109306
* strstr.c: Revert the 2020-11-13 changes.
(strstr): Return s1 if len is 0.
Jakub Jelinek [Thu, 30 Mar 2023 21:08:25 +0000 (23:08 +0200)]
c++: Fix up ICE in build_min_non_dep_op_overload [PR109319]
The following testcase ICEs, because grok_array_decl during
processing_template_decl handling of a non-dependent subscript
emits a -Wcomma-subscript pedwarn, we decide to pass to the
single index argument the index expressions as if it was wrapped
with () around it, but then when preparing it for later instantiation
we don't actually take that into account and ICE on a mismatch of
number of index arguments (the overload expects a single index,
testcase has two index expressions in this case).
For non-dependent subscript which are builtin subscripts we also
emit the same pedwarn and don't ICE, but emit the same pedwarn
again whenever we instantiate it, which is also IMHO undesirable,
it is enough to warn once during parsing the template.
The following patch fixes it by turning even the original index expressions
(those which didn't go through make_args_non_dependent) into a single
index using comma expression(s).
2023-03-30 Jakub Jelinek <jakub@redhat.com>
PR c++/109319
* decl2.cc (grok_array_decl): After emitting a pedwarn for
-Wcomma-subscript, if processing_template_decl set orig_index_exp
to compound expr from orig_index_exp_list.
Jakub Jelinek [Tue, 28 Mar 2023 08:56:44 +0000 (10:56 +0200)]
sanopt: Return TODO_cleanup_cfg if any .{UB,HWA,A}SAN_* calls were lowered [PR106190]
The following testcase ICEs, because without optimization eh lowering
decides not to duplicate finally block of try/finally and so we end up
with variable guarded cleanup. The sanopt pass creates a cfg that ought
to be cleaned up (some IFN_UBSAN_* functions are lowered in this case with
constant conditions in gcond and when not allowing recovery some bbs which
end with noreturn calls actually have successor edges), but the cfg cleanup
is actually (it is -O0) done only during the optimized pass. We notice
there that the d[1][a] = 0; statement which has an EH edge is unreachable
(because ubsan would always abort on the out of bounds d[1] access), remove
the EH landing pad and block, but because that block just sets a variable
and jumps to another one which tests that variable and that one is reachable
from normal control flow, the __builtin_eh_pointer (1) later in there is
kept in the IL and we ICE during expansion of that statement because the
EH region has been removed.
The following patch fixes it by doing the cfg cleanup already during
sanopt pass if we create something that might need it, while the EH
landing pad is then removed already during sanopt pass, there is ehcleanup
later and we don't ICE anymore.
2023-03-28 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106190
* sanopt.cc (pass_sanopt::execute): Return TODO_cleanup_cfg if any
of the IFN_{UB,HWA,A}SAN_* internal fns are lowered.
Jakub Jelinek [Tue, 28 Mar 2023 08:43:08 +0000 (10:43 +0200)]
i386: Require just 32-bit alignment for SLOT_FLOATxFDI_387 -m32 -mpreferred-stack-boundary=2 DImode temporaries [PR109276]
The following testcase ICEs since r11-2259 because assign_386_stack_local
-> assign_stack_local -> ix86_local_alignment now uses 64-bit alignment
for DImode temporaries rather than 32-bit as before.
Most of the spots in the backend which ask for DImode temporaries are during
expansion and those apparently are handled fine with -m32
-mpreferred-stack-boundary=2, we dynamically realign the stack in that case
(most of the spots actually don't need that alignment but at least one
does), then 2 spots are in STV which I assume also work correctly.
But during splitting we can create a DImode slot which doesn't need to be
64-bit alignment (it is nicer for performance though), when we apparently
aren't able to detect it for dynamic stack realignment purposes.
The following patch just makes the slot 32-bit aligned in that rare case.
2023-03-28 Jakub Jelinek <jakub@redhat.com>
PR target/109276
* config/i386/i386.cc (assign_386_stack_local): For DImode
with SLOT_FLOATxFDI_387 and -m32 -mpreferred-stack-boundary=2 pass
align 32 rather than 0 to assign_stack_local.
Jakub Jelinek [Sun, 26 Mar 2023 18:15:05 +0000 (20:15 +0200)]
predict: Don't emit -Wsuggest-attribute=cold warning for functions which already have that attribute [PR105685]
In the following testcase, we predict baz to have cold
entry regardless of the user supplied attribute (as it call
unconditionally a cold function), but still issue
a -Wsuggest-attribute=cold warning despite it having that attribute
already.
The following patch avoids that.
2023-03-26 Jakub Jelinek <jakub@redhat.com>
PR ipa/105685
* predict.cc (compute_function_frequency): Don't call
warn_function_cold if function already has cold attribute.
Jakub Jelinek [Thu, 23 Mar 2023 09:02:25 +0000 (10:02 +0100)]
tree-vect-generic: Fix up expand_vector_condition [PR109176]
The following testcase ICEs on aarch64-linux, because
expand_vector_condition attempts to piecewise lower SVE
d_3 = a_1(D) < b_2(D);
_5 = VEC_COND_EXPR <d_3, c_4(D), d_3>;
which isn't possible - nunits_for_known_piecewise_op ICEs but
the rest of the code assumes constant number of elements too.
expand_vector_condition attempts to find if a (rhs1) is a SSA_NAME
for comparison and calls expand_vec_cond_expr_p (type, TREE_TYPE (a1), code)
where a1 is one of the operands of the comparison and code is the comparison
code. That one indeed isn't supported here, but what aarch64 SVE supports
are the individual statements, comparison (expand_vec_cmp_expr_p) and
expand_vec_cond_expr_p (type, TREE_TYPE (a), SSA_NAME), the latter because
that function starts with
if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
&& get_vcond_mask_icode (TYPE_MODE (value_type),
TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
return true;
In an earlier version of the patch (in the PR), we did this
if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (a))
&& expand_vec_cond_expr_p (type, TREE_TYPE (a), ERROR_MARK))
return true;
before the code == SSA_NAME handling plus some further tweaks later.
While that fixed the ICE, it broke quite a few tests on x86 and some on
aarch64 too. The problem is that expand_vector_comparison doesn't lower
comparisons which aren't supported and only feed VEC_COND_EXPR first operand
and expand_vector_condition succeeds for those, so with the above mentioned
change we'd verify the VEC_COND_EXPR is implementable using optab alone,
but nothing would verify the tcc_comparison which relied on
expand_vector_condition to verify.
So, the following patch instead queries whether optabs can handle the
comparison and VEC_COND_EXPR together (if a (rhs1) is a comparison;
otherwise as before it checks only the VEC_COND_EXPR) and if that fails,
also checks whether the two operations could be supported individually
and only if even that fails does the piecewise lowering.
2023-03-23 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109176
* tree-vect-generic.cc (expand_vector_condition): If a has
vector boolean type and is a comparison, also check if both
the comparison and VEC_COND_EXPR could be successfully expanded
individually.
Jakub Jelinek [Mon, 20 Mar 2023 19:29:47 +0000 (20:29 +0100)]
c++: Drop TREE_READONLY on vars (possibly) initialized by tls wrapper [PR109164]
The following two testcases are miscompiled, because we keep TREE_READONLY
on the vars even when they are (possibly) dynamically initialized by a TLS
wrapper function. Normally cp_finish_decl drops TREE_READONLY from vars
which need dynamic initialization, but for TLS we do this kind of
initialization upon every access to those variables. Keeping them
TREE_READONLY means e.g. PRE can hoist loads from those before loops
which contain the TLS wrapper calls, so we can access the TLS variables
before they are initialized.
2023-03-20 Jakub Jelinek <jakub@redhat.com>
PR c++/109164
* cp-tree.h (var_needs_tls_wrapper): Declare.
* decl2.cc (var_needs_tls_wrapper): No longer static.
* decl.cc (cp_finish_decl): Clear TREE_READONLY on TLS variables
for which a TLS wrapper will be needed.
* g++.dg/tls/thread_local13.C: New test.
* g++.dg/tls/thread_local13-aux.cc: New file.
* g++.dg/tls/thread_local14.C: New test.
* g++.dg/tls/thread_local14-aux.cc: New file.
Philipp Tomsich [Mon, 30 Jan 2023 22:40:26 +0000 (23:40 +0100)]
PR target/108589 - Check REG_P for AARCH64_FUSE_ADDSUB_2REG_CONST1
This adds a check for REG_P on SET_DEST for the new idiom recognizer
for AARCH64_FUSE_ADDSUB_2REG_CONST1. The reported ICE is only
observable with checking=rtl.
Bootstrapped/regtested aarch64-linux, committed.
PR target/108589
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Check
REG_P on SET_DEST.
Philipp Tomsich [Thu, 23 Mar 2023 18:47:57 +0000 (19:47 +0100)]
aarch64: disable LDP via tuning structure for -mcpu=ampere1
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).
Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.
This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this
These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
503.bwaves_r. -0.88%
507.cactuBSSN_r 0.35%
508.namd_r 3.09%
510.parest_r -2.99%
511.povray_r 5.54%
519.lbm_r 15.83%
521.wrf_r 0.56%
526.blender_r 2.47%
527.cam4_r 0.70%
538.imagick_r 0.00%
544.nab_r -0.33%
549.fotonik3d_r. -0.42%
554.roms_r 0.00%
-------------------------
= total 1.79%
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.