Lulu Cheng [Mon, 23 Oct 2023 01:07:32 +0000 (09:07 +0800)]
LoongArch: Define macro CLEAR_INSN_CACHE.
LoongArch's microarchitecture keeps the caches coherent in hardware.
Due to out-of-order execution, an "ibar" is still required to ensure that stores
(which invalidate the icache) executed by this CPU are visible to instruction
fetch before any instruction following the "ibar".
"ibar" does not itself invalidate the icache, so the start and end parameters do
not affect "ibar" performance.
gcc/ChangeLog:
* config/loongarch/loongarch.h (CLEAR_INSN_CACHE): New definition.
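As an illustration, a minimal sketch of the kind of definition this adds (illustrative only; the exact macro body in loongarch.h may differ):
```c
/* Sketch, not necessarily the exact GCC definition: since the hardware
   keeps the caches coherent, clearing the instruction cache reduces to a
   single instruction barrier, and the BEG/END range is unused.  */
#define CLEAR_INSN_CACHE(BEG, END) \
  __asm__ __volatile__ ("ibar 0" : : : "memory")
```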
Haochen Gui [Mon, 23 Oct 2023 01:14:13 +0000 (09:14 +0800)]
Expand: Enable vector mode for by pieces compares
Vector mode compare instructions are efficient for equality comparisons on
rs6000. This patch refactors the code of the by-pieces operations to enable
vector modes for compares.
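For illustration, the kind of source that benefits is a small fixed-size equality comparison that GCC expands "by pieces" instead of calling memcmp; with this change those pieces can use byte-vector modes on rs6000 (a sketch, not taken from the patch):
```c
#include <string.h>

/* With a known small length, GCC expands the memcmp "by pieces"; this
   change lets the pieces use vector modes when only equality matters.  */
int
same16 (const void *a, const void *b)
{
  return memcmp (a, b, 16) == 0;
}
```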
gcc/
PR target/111449
* expr.cc (can_use_qi_vectors): New function to return true if
we know how to implement OP using vectors of bytes.
(qi_vector_mode_supported_p): New function to check if optabs
exists for the mode and certain by pieces operations.
(widest_fixed_size_mode_for_size): Replace the second argument
with the type of by pieces operations. Call can_use_qi_vectors
and qi_vector_mode_supported_p to do the check. Call
scalar_mode_supported_p to check if the scalar mode is supported.
(by_pieces_ninsns): Pass the type of by pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Remove m_qi_vector_mode. Add m_op to
record the type of by pieces operations.
(op_by_pieces_d::op_by_pieces_d): Change last argument to the
type of by pieces operations, initialize m_op with it. Pass
m_op to function widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Pass m_op to function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
can_use_qi_vectors and qi_vector_mode_supported_p to do the
check.
(op_by_pieces_d::run): Pass m_op to function
widest_fixed_size_mode_for_size.
(move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
(can_store_by_pieces): Pass the type of by pieces operations to
widest_fixed_size_mode_for_size.
(clear_by_pieces): Initialize class store_by_pieces_d with
CLEAR_BY_PIECES.
(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
COMPARE_BY_PIECES.
liuhongt [Wed, 18 Oct 2023 02:08:24 +0000 (10:08 +0800)]
Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big.
There is a loop in vect_peel_nonlinear_iv_init to compute init_expr *
pow (step_expr, skip_niters). When skip_niters is too big, compile time
explodes. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to
init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is a
power of 2; otherwise give up on vectorization when skip_niters >=
TYPE_PRECISION (TREE_TYPE (init_expr)).
Also give up on vectorization when niters_skip is negative, which is
used for fully masked loops.
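A small user-level illustration of the rewrite (a sketch of the arithmetic, not the GCC internals):
```c
/* When step == 1 << log2_step, init * pow (step, skip_niters) equals
   init << (log2_step * skip_niters) modulo wraparound, so no chain of
   skip_niters multiplications needs to be built.  The patch only uses
   this when the shift amount stays below the type precision.  */
unsigned int
peeled_init (unsigned int init, unsigned int log2_step,
             unsigned int skip_niters)
{
  return init << (log2_step * skip_niters);
}
```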
gcc/ChangeLog:
PR tree-optimization/111820
PR tree-optimization/111833
* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give
up vectorization for nonlinear iv vect_step_op_mul when
step_expr is not exact_log2 and niters is greater than
TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize
for negative niters_skip, which will be used by fully masked
loop.
(vect_can_advance_ivs_p): Pass whole phi_info to
vect_can_peel_nonlinear_iv_p.
* tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize
init_expr * pow (step_expr, skipn) to init_expr
<< (log2 (step_expr) * skipn) when step_expr is exact_log2.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr111820-1.c: New test.
* gcc.target/i386/pr111820-2.c: New test.
* gcc.target/i386/pr111820-3.c: New test.
* gcc.target/i386/pr103144-mul-1.c: Adjust testcase.
* gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
Andrew Pinski [Wed, 18 Oct 2023 23:39:12 +0000 (16:39 -0700)]
aarch64: Emit csinv again for `a ? ~b : b` [PR110986]
After r14-3110-g7fb65f10285, the canonical form for
`a ? ~b : b` changed to `-(a) ^ b`, which means that for aarch64 we
need to add a few new insn patterns to catch this and convert it back
to the form that is canonical for the aarch64 backend.
A secondary pattern was needed to support a zero_extended
form too; this adds a testcase for all 3 cases.
Bootstrapped and tested on aarch64-linux-gnu with no regressions.
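For reference, the shape of the source being matched (similar in spirit to the new testcases, not copied from them):
```c
/* The middle end now canonicalizes this to an XOR with the negated
   condition; the new insn patterns let aarch64 emit csinv for it again.  */
unsigned long long
sel_not (unsigned int a, unsigned long long b)
{
  return a ? ~b : b;
}
```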
PR target/110986
gcc/ChangeLog:
* config/aarch64/aarch64.md (*cmov<mode>_insn_insv): New pattern.
(*cmov_uxtw_insn_insv): Likewise.
Fix bootstrap failure with i686-darwin9.
```
Undefined symbols for architecture i386:
"gendocfile", referenced from:
__ZL20d_generate_ddoc_fileP6ModuleR9OutBuffer in d-lang.o
ld: symbol(s) not found for architecture i386
```
gcc/d/ChangeLog:
Patrick Palka [Sun, 22 Oct 2023 20:13:33 +0000 (16:13 -0400)]
objc++: type/expr tsubst conflation [PR111920]
After r14-4796-g3e3d73ed5e85e7, tsubst_copy_and_build (now named
tsubst_expr) no longer dispatches to tsubst for type trees, and
callers have to do it themselves if appropriate. This patch makes
some overlooked adjustments to Objective-C++-specific code paths.
PR objc++/111920
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) <case AT_ENCODE_EXPR>: Use tsubst instead
of tsubst_expr.
gcc/objcp/ChangeLog:
* objcp-lang.cc (objcp_tsubst_expr) <case CLASS_REFERENCE_EXPR>:
Use tsubst instead of tsubst_expr for type operands.
DYLD_LIBRARY_PATH is now removed from the environment for all system
tools, including the shell. Adapt the testsuite and pass the right
options to allow testing, even when the compiler and libraries have not
been installed.
gcc/ChangeLog:
* Makefile.in: Set ENABLE_DARWIN_AT_RPATH in site.tmp.
Iain Sandoe [Sat, 28 May 2022 09:16:27 +0000 (10:16 +0100)]
Darwin, rpaths: Add --with-darwin-extra-rpath.
This is provided to allow distributions to add a single additional
runpath, to cover cases where the installed GCC library directories
are then symlinked to a common directory outside of any of the GCC
installations.
So that libraries are found via that runpath, a distribution would then
add --with-darwin-extra-rpath=/opt/distro/lib to the configure line.
This patch makes the configuration a little more forgiving of using
--disable-darwin-at-rpath (although for platform versions >= 10.11 this will
result in misconfigured target libraries).
Iain Sandoe [Sun, 28 Mar 2021 13:48:17 +0000 (14:48 +0100)]
Config,Darwin: Allow for configuring Darwin to use embedded runpath.
Recent Darwin versions place constraints on the use of run paths
specified in environment variables. This breaks some assumptions
in the GCC build.
This change allows the user to configure a Darwin build to use
'@rpath/libraryname.dylib' in library names and then to add an
embedded runpath to executables (and libraries with dependents).
The embedded runpath is added by default unless the user adds
'-nodefaultrpaths' to the link line.
For an installed compiler, it means that any executable built with
that compiler will reference the runtimes installed with the
compiler (equivalent to hard-coding the library path into the name
of the library).
During build-time configuration any "-B" entries will be added to
the runpath, so that the newly-built libraries are found by the
executables built during the build.
Since the install name is set in libtool, that decision needs to be
available here (but might also cause dependent ones in Makefiles,
so we need to export a conditional).
This facility is not available for Darwin 8 or earlier, however the
existing environment variable runpath does work there.
We default this on for systems where the external DYLD_LIBRARY_PATH
does not work and off for Darwin 8 or earlier. For systems that can
use either method, if the value is unset, we use the default (which
is currently DYLD_LIBRARY_PATH).
ChangeLog:
* configure: Regenerate.
* configure.ac: Do not add default runpaths to GCC exes
when we are building -static-libstdc++/-static-libgcc (the
default).
* libtool.m4: Add 'enable-darwin-at-runpath'. Act on the
enable flag to alter Darwin libraries to use @rpath names.
gcc/ChangeLog:
* aclocal.m4: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* config/darwin.h: Handle Darwin rpaths.
* config/darwin.opt: Handle Darwin rpaths.
* Makefile.in: Handle Darwin rpaths.
gcc/ada/ChangeLog:
* gcc-interface/Makefile.in: Handle Darwin rpaths.
gcc/jit/ChangeLog:
* Make-lang.in: Handle Darwin rpaths.
libatomic/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libbacktrace/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
Iain Sandoe [Thu, 20 Dec 2018 09:00:38 +0000 (09:00 +0000)]
Driver: Provide a spec to insert rpaths for compiler lib dirs.
This provides a spec to insert "-rpath DDD" for each DDD corresponding
to a compiler startfile directory. This allows a target to use @rpath
as the install path for libraries, and have the compiler provide the
necessary rpath to handle this.
Embed real paths, not relative ones.
We embed a runpath for every path in which libraries might be found. This
change ensures that we embed the actual real path and not a relative one from
the compiler's version-specific directory.
This ensures that if we install, for example, 11.4.0 (and delete the 11.3.0
installation), executables built by 11.3 will continue to function (provided,
of course, that 11.4 does not bump any SO names).
gcc/ChangeLog:
* gcc.cc (RUNPATH_OPTION): New.
(do_spec_1): Provide '%P' as a spec to insert rpaths for
each compiler startfile path.
Andrew Burgess [Sat, 5 Aug 2023 12:31:06 +0000 (14:31 +0200)]
libgcc: support heap-based trampolines
Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted functions for these targets.
Andrew Burgess [Sat, 5 Aug 2023 12:54:11 +0000 (14:54 +0200)]
core: Support heap-based trampolines
Generate heap-based nested function trampolines
Add support for allocating nested function trampolines on an
executable heap rather than on the stack. This is motivated by targets
such as AArch64 Darwin, which globally prohibit executing code on the
stack.
The target-specific routines for allocating and writing trampolines are
to be provided in libgcc.
The gcc flag -ftrampoline-impl controls whether to generate code
that instantiates trampolines on the stack, or to emit calls to
__builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted. Note that this flag is completely
independent of libgcc: If libgcc is for any reason missing those
symbols, you will get a link failure.
This implementation imposes some implicit restrictions as compared to
stack trampolines. longjmp'ing back to a state before a trampoline was
created will cause us to skip over the corresponding
__builtin_nested_func_ptr_deleted, which will leak trampolines
starting from the beginning of the linked list of allocated
trampolines. There may be scope for instrumenting longjmp/setjmp to
trigger cleanups of trampolines.
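As an illustration (not taken from the patch), this is the kind of GNU C nested-function code that needs a trampoline; with the heap-based implementation selected via -ftrampoline-impl, the compiler emits calls to the two builtins above instead of writing executable code onto the stack:
```c
/* Taking the address of a nested function that refers to its enclosing
   frame forces a trampoline.  With heap-based trampolines it is obtained
   through __builtin_nested_func_ptr_created and later released through
   __builtin_nested_func_ptr_deleted.  */
static int
apply (int (*fn) (int), int x)
{
  return fn (x);
}

int
outer (int bias)
{
  int add_bias (int v) { return v + bias; }   /* GNU C nested function.  */
  return apply (add_bias, 2);
}
```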
Jonathan Wakely [Fri, 29 Sep 2023 11:12:22 +0000 (12:12 +0100)]
libstdc++: Split std::basic_string::_M_use_local_data into two functions
This splits out the activate-the-union-member-for-constexpr logic from
_M_use_local_data, so that it can be used separately in cases that don't
need to use std::pointer_traits<pointer>::pointer_to to obtain the
return value.
This leaves only three uses of _M_use_local_data() which are all of the
same form:
Benjamin Brock [Fri, 20 Oct 2023 17:07:50 +0000 (18:07 +0100)]
libstdc++: Workaround for LLVM-61763 in <ranges>
This patch adds a small workaround that avoids declaring constrained
friends when compiling with Clang, instead making some members public.
MSVC's standard library has implemented a similar workaround.
Signed-off-by: Benjamin Brock <brock@cs.berkeley.edu>
libstdc++-v3/ChangeLog:
* include/std/ranges (zip_view, adjacent_view): Implement
workaround for LLVM-61763.
Dimitrij Mijoski [Wed, 18 Oct 2023 10:52:20 +0000 (12:52 +0200)]
libstdc++: testsuite: Enhance codecvt_unicode with tests for length()
We can test codecvt::length() with the same data that we test
codecvt::in(). For each call of in() we add another call to length().
Some additional small cosmetic changes are applied.
libstdc++-v3/ChangeLog:
* testsuite/22_locale/codecvt/codecvt_unicode.h: Test length()
The reason is that the minimal machine mode for QI is RVVMF8QI, which
is 1024 / 8 = 128 bits, i.e. the size of VNx16QI. When we set zvl2048b,
the bit size of RVVMF8QI becomes 2048 / 8 = 256, which does not match the
bit size of VNx16QI (128 bits).
Thus, this patch enables the VLS mode for such a case, i.e. the VNx16QI
VLS mode for zvl2048b.
Before this patch:
test:
srli a4,a1,40
andi a4,a4,0xff
srli a3,a1,32
srli a5,a1,48
slli a0,a4,8
andi a3,a3,0xff
andi a5,a5,0xff
slli a2,a5,16
or a0,a3,a0
srli a1,a1,56
or a0,a0,a2
slli a2,a1,24
slli a3,a3,32
or a0,a0,a2
slli a4,a4,40
or a0,a0,a3
slli a5,a5,48
or a0,a0,a4
or a0,a0,a5
slli a1,a1,56
or a0,a0,a1
mv a1,a0
ret
After this patch:
test:
vsetivli zero,16,e8,mf8,ta,ma
vle8.v v2,0(a1)
vsetivli zero,4,e32,mf2,ta,ma
vrgather.vi v1,v2,3
vsetivli zero,16,e8,mf8,ta,ma
vse8.v v1,0(a0)
ret
PR target/111857
gcc/ChangeLog:
* config/riscv/riscv-opts.h (TARGET_VECTOR_VLS): Remove.
* config/riscv/riscv-protos.h (vls_mode_valid_p): New func decl.
* config/riscv/riscv-v.cc (autovectorize_vector_modes): Replace
macro reference to func.
(vls_mode_valid_p): New func impl for vls mode valid or not.
* config/riscv/riscv-vector-switch.def (VLS_ENTRY): Replace
macro reference to func.
* config/riscv/vector-iterators.md: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/vls/def.h: Add help define.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-6.c: New test.
Roger Sayle [Fri, 20 Oct 2023 23:06:02 +0000 (00:06 +0100)]
PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md
This patch is the backend piece of a solution to PRs 101955 and 106245,
that adds a define_insn_and_split to the i386 backend, to perform sign
extension of a single (least significant) bit using and $1 then neg.
Not only is this smaller in size, but microbenchmarking confirms
that it's a performance win on both Intel and AMD; Intel sees only a
2% improvement (perhaps just a size effect), but AMD sees a 7% win.
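The transformation applies to code of this shape (a sketch; the new tests cover more variants):
```c
/* Sign-extend the least significant bit of a 32-bit int: the new
   define_insn_and_split emits "and $1" followed by "neg" instead of the
   shift pair.  */
int
sext_bit0 (int x)
{
  return (x << 31) >> 31;
}
```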
2023-10-21 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.
gcc/testsuite/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* gcc.target/i386/pr106245-2.c: New test case.
* gcc.target/i386/pr106245-3.c: New 32-bit test case.
* gcc.target/i386/pr106245-4.c: New 64-bit test case.
* gcc.target/i386/pr106245-5.c: Likewise.
Jason Merrill [Fri, 20 Oct 2023 20:23:43 +0000 (16:23 -0400)]
c++: abstract class and overload resolution
In my implementation of P0929 I treated a conversion to an rvalue of
abstract class type as a bad conversion, but that's still too soon to check
it; we need to wait until we're done with overload resolution.
gcc/cp/ChangeLog:
* call.cc (implicit_conversion_1): Rename...
(implicit_conversion): ...to this. Remove the old wrapper.
Jason Merrill [Thu, 5 Oct 2023 14:45:00 +0000 (10:45 -0400)]
c++: fix tourney logic
In r13-3766 I changed the logic at the end of tourney to avoid redundant
comparisons, but the change also meant skipping any less-good matches
between the champ_compared_to_predecessor candidate and champ itself.
This should not be a correctness issue, since we believe that joust is a
partial order. But it can lead to missed warnings, as in this testcase.
gcc/cp/ChangeLog:
* call.cc (tourney): Only skip champ_compared_to_predecessor.
Nathan Sidwell [Fri, 20 Oct 2023 16:20:37 +0000 (12:20 -0400)]
c++: Constructor streaming [PR105322]
An expression node's type is streamed after the expression's operands,
because the type can come from some aspect of an operand (for instance
decltype and noexcept). There's a comment in the code explaining that.
But that doesn't work for constructors, which can directly reference
components of their type (eg FIELD_DECLS). If this is a
type-introducing CONSTRUCTOR, we need to ensure the type has been
streamed first. So move CONSTRUCTOR stream to after the type streaming.
The reason things like COMPONENT_REF work is that they stream their
first operand first, and that introduces the type that their second
operand looks up a field in.
Florian Weimer [Fri, 20 Oct 2023 19:27:52 +0000 (21:27 +0200)]
c: -Wincompatible-pointer-types should cover mismatches in ?:
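A sketch of the kind of code now diagnosed under -Wincompatible-pointer-types (illustrative, not the PR testcase):
```c
/* The two arms of the conditional have incompatible pointer types; the
   warning is now controlled by -Wincompatible-pointer-types and carries
   the location of the offending operand.  */
int *
pick (int which, int *p, long *q)
{
  return which ? p : q;
}
```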
gcc/c/
PR c/109826
PR other/44209
* c-typeck.cc (build_conditional_expr): Use
OPT_Wincompatible_pointer_types for pointer mismatches.
Emit location information for the operand.
Marek Polacek [Thu, 19 Oct 2023 20:32:10 +0000 (16:32 -0400)]
c-family: char8_t and aliasing in C vs C++ [PR111884]
In the PR, Joseph says that in C char8_t is not a distinct type. So
we should behave as if it can alias anything, like ordinary char.
In C, unsigned_char_type_node == char8_type_node, so with this patch
we return 0 instead of -1. And the following comment says:
/* The C standard guarantees that any object may be accessed via an
lvalue that has narrow character type (except char8_t). */
if (t == char_type_node
|| t == signed_char_type_node
|| t == unsigned_char_type_node)
return 0;
Which appears to be wrong, so I'm adjusting that as well.
PR c/111884
gcc/c-family/ChangeLog:
* c-common.cc (c_common_get_alias_set): Return -1 for char8_t only
in C++.
Patrick Palka [Fri, 20 Oct 2023 17:36:11 +0000 (13:36 -0400)]
rust: build failure after NON_DEPENDENT_EXPR removal [PR111899]
This patch removes stray NON_DEPENDENT_EXPR checks following the removal
of this tree code from the C++ FE. (Since this restores the build, I
suppose it means the Rust FE never creates NON_DEPENDENT_EXPR trees in
the first place, so no further analysis is needed.)
Andre Vieira [Fri, 20 Oct 2023 16:02:32 +0000 (17:02 +0100)]
ifcvt: Don't lower bitfields with non-constant offsets [PR 111882]
This patch stops ifcvt from lowering bitfields that have non-constant
offsets, as we are unlikely to be able to do anything useful with those during
vectorization. That also fixes the issue reported in PR 111882, which was
caused by lowering an offset with a side effect; since constant offsets have
no side effects, we will no longer run into that problem.
gcc/ChangeLog:
PR tree-optimization/111882
* tree-if-conv.cc (get_bitfield_rep): Return NULL_TREE for bitfields
with non-constant offsets.
Patrick Palka [Fri, 20 Oct 2023 15:22:11 +0000 (11:22 -0400)]
c++: rename tsubst_copy_and_build and tsubst_expr
After the previous patch, we now only have two tsubst entry points for
expression trees: tsubst_copy_and_build and tsubst_expr. The former
despite its unwieldy name is the main entry point, and the latter is
just a superset of the former that also handles statement trees. We
could merge them so that we just have tsubst_expr, but it seems natural
to distinguish statement trees from expression trees and to maintain a
separate entry point for them.
To that end, this patch renames tsubst_copy_and_build to
tsubst_expr, and renames the current tsubst_expr to tsubst_stmt, which
continues to be a superset of the former (which is convenient since
sometimes expression trees appear in statement contexts, e.g. a branch
of an IF_STMT could be NOP_EXPR). (Making tsubst_stmt disjoint from
tsubst_expr is left as future work if deemed desirable.)
This patch in turn renames suitable existing uses of tsubst_expr (that
expect to take statement trees) to use tsubst_stmt. Thus untouched
tsubst_expr calls are implicitly strengthened to expect only expression
trees after this patch. For the tsubst_omp_* routines I opted to rename
all existing uses to ensure no unintended functional change. This patch
also moves the handling of CO_YIELD_EXPR and CO_AWAIT_EXPR from tsubst_stmt
to tsubst_expr since they're indeed expression trees.
gcc/cp/ChangeLog:
* cp-lang.cc (objcp_tsubst_copy_and_build): Rename to ...
(objcp_tsubst_expr): ... this.
* cp-objcp-common.h (objcp_tsubst_copy_and_build): Rename to ...
(objcp_tsubst_expr): ... this.
* cp-tree.h (tsubst_copy_and_build): Remove declaration.
* init.cc (maybe_instantiate_nsdmi_init): Use tsubst_expr
instead of tsubst_copy_and_build.
* pt.cc (expand_integer_pack): Likewise.
(instantiate_non_dependent_expr_internal): Likewise.
(instantiate_class_template): Use tsubst_stmt instead of
tsubst_expr for STATIC_ASSERT.
(tsubst_function_decl): Adjust tsubst_copy_and_build uses.
(tsubst_arg_types): Likewise.
(tsubst_exception_specification): Likewise.
(tsubst_tree_list): Likewise.
(tsubst): Likewise.
(tsubst_name): Likewise.
(tsubst_omp_clause_decl): Use tsubst_stmt instead of tsubst_expr.
(tsubst_omp_clauses): Likewise.
(tsubst_copy_asm_operands): Adjust tsubst_copy_and_build use.
(tsubst_omp_for_iterator): Use tsubst_stmt instead of tsubst_expr.
(tsubst_expr): Rename to ...
(tsubst_stmt): ... this.
<case CO_YIELD_EXPR, CO_AWAIT_EXPR>: Move to tsubst_expr.
(tsubst_omp_udr): Use tsubst_stmt instead of tsubst_expr.
(tsubst_non_call_postfix_expression): Adjust tsubst_copy_and_build
use.
(tsubst_lambda_expr): Likewise. Use tsubst_stmt instead of
tsubst_expr for the body of a lambda.
(tsubst_copy_and_build_call_args): Rename to ...
(tsubst_call_args): ... this. Adjust tsubst_copy_and_build use.
(tsubst_copy_and_build): Rename to tsubst_expr. Adjust
tsubst_copy_and_build and tsubst_copy_and_build_call_args use.
<case TRANSACTION_EXPR>: Use tsubst_stmt instead of tsubst_expr.
(maybe_instantiate_noexcept): Adjust tsubst_copy_and_build use.
(instantiate_body): Use tsubst_stmt instead of tsubst_expr for
substituting the function body.
(tsubst_initializer_list): Adjust tsubst_copy_and_build use.
Patrick Palka [Fri, 20 Oct 2023 15:21:54 +0000 (11:21 -0400)]
c++: merge tsubst_copy into tsubst_copy_and_build
The relationship between tsubst_copy_and_build and tsubst_copy (two of
the main template argument substitution routines for expression trees)
is rather hazy. The former is mostly a superset of the latter, with
some differences.
The main apparent difference is their handling of various tree codes,
but much of the tree code handling in tsubst_copy appears to be dead
code. This is because tsubst_copy mostly gets (directly) called on
id-expressions rather than on arbitrary expressions. The interesting
tree codes are PARM_DECL, VAR_DECL, BIT_NOT_EXPR, SCOPE_REF,
TEMPLATE_ID_EXPR and IDENTIFIER_NODE:
* for PARM_DECL and VAR_DECL, tsubst_copy_and_build calls tsubst_copy
followed by doing some extra handling of its own
* for BIT_NOT_EXPR tsubst_copy implicitly handles unresolved destructor
calls (i.e. the first operand is an identifier or a type)
* for SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE tsubst_copy
refrains from doing name lookup of the terminal name
Other more minor differences are that tsubst_copy exits early when
'args' is null, and it calls maybe_dependent_member_ref, and finally
it dispatches to tsubst for type trees.[1]
Thus tsubst_copy is similar enough to tsubst_copy_and_build that it
makes sense to merge the two functions, with the main difference we
want to preserve is tsubst_copy's lack of name lookup for id-expressions.
This patch achieves this via a new tsubst flag tf_no_name_lookup which
controls name lookup and resolution of a (top-level) id-expression.
[1]: Exiting early for null 'args' doesn't seem right since it means we
return templated trees even when !processing_template_decl. And
dispatching to tsubst for type trees muddles the distinction between
type and expressions which makes things less clear at the call site.
So these properties of tsubst_copy don't seem worth preserving.
N.B. the diff for this patch looks much cleaner when generated using
the "patience diff" algorithm via Git's --patience flag.
gcc/cp/ChangeLog:
* cp-tree.h (enum tsubst_flags): Add tf_no_name_lookup.
* pt.cc (tsubst_pack_expansion): Use tsubst for substituting
BASES_TYPE.
(tsubst_decl) <case USING_DECL>: Use tsubst_name instead of
tsubst_copy.
(tsubst) <case TEMPLATE_TYPE_PARM>: Use tsubst_copy_and_build
instead of tsubst_copy for substituting
CLASS_PLACEHOLDER_TEMPLATE.
<case TYPENAME_TYPE>: Use tsubst_name instead of tsubst_copy for
substituting TYPENAME_TYPE_FULLNAME.
(tsubst_name): Define.
(tsubst_qualified_id): Use tsubst_name instead of tsubst_copy
for substituting the component name of a SCOPE_REF.
(tsubst_copy): Remove.
(tsubst_copy_and_build): Clear tf_no_name_lookup at the start,
and remember if it was set. Call maybe_dependent_member_ref if
tf_no_name_lookup was not set.
<case IDENTIFIER_NODE>: Don't do name lookup if tf_no_name_lookup
was set.
<case TEMPLATE_ID_EXPR>: If tf_no_name_lookup was set, use
tsubst_name instead of tsubst_copy_and_build to substitute the
template and don't finish the template-id.
<case BIT_NOT_EXPR>: Handle identifier and type operand (if
tf_no_name_lookup was set).
<case SCOPE_REF>: Avoid trying to resolve a SCOPE_REF if
tf_no_name_lookup was set by calling build_qualified_name directly
instead of tsubst_qualified_id.
<case SIZEOF_EXPR>: Handling of sizeof... copied from tsubst_copy.
<case CALL_EXPR>: Use tsubst_name instead of tsubst_copy to
substitute a TEMPLATE_ID_EXPR callee naming an unresolved template.
<case COMPONENT_REF>: Likewise to substitute the member.
<case FUNCTION_DECL>: Copied from tsubst_copy and merged with ...
<case VAR_DECL, PARM_DECL>: ... these. Initial handling copied
from tsubst_copy. Optimize local variable substitution by
trying retrieve_local_specialization before checking
uses_template_parms.
<case CONST_DECL>: Copied from tsubst_copy.
<case FIELD_DECL>: Likewise.
<case NAMESPACE_DECL>: Likewise.
<case OVERLOAD>: Likewise.
<case TEMPLATE_DECL>: Likewise.
<case TEMPLATE_PARM_INDEX>: Likewise.
<case TYPE_DECL>: Likewise.
<case CLEANUP_POINT_EXPR>: Likewise.
<case OFFSET_REF>: Likewise.
<case EXPR_PACK_EXPANSION>: Likewise.
<case NONTYPE_ARGUMENT_PACK>: Likewise.
<case *_CST>: Likewise.
<case *_*_FOLD_EXPR>: Likewise.
<case DEBUG_BEGIN_STMT>: Likewise.
<case CO_AWAIT_EXPR>: Likewise.
<case TRAIT_EXPR>: Use tsubst and tsubst_copy_and_build instead
of tsubst_copy.
<default>: Copied from tsubst_copy.
(tsubst_initializer_list): Use tsubst and tsubst_copy_and_build
instead of tsubst_copy.
In cp_parser_postfix_expression, and in the CALL_EXPR case of
tsubst_copy_and_build, we essentially repeat the type-dependent and
COMPONENT_REF callee cases of finish_call_expr. This patch deduplicates
this logic by making both spots consistently go through finish_call_expr.
This allows us to easily fix PR106086 -- which is about us neglecting to
capture 'this' when we resolve a use of a non-static member function of
the current instantiation only at lambda regeneration time -- by moving
the call to maybe_generic_this_capture from the parser to finish_call_expr
so that we consider capturing 'this' at regeneration time as well.
PR c++/106086
gcc/cp/ChangeLog:
* parser.cc (cp_parser_postfix_expression): Consolidate three
calls to finish_call_expr, one to build_new_method_call and
one to build_min_nt_call_vec into one call to finish_call_expr.
Don't call maybe_generic_this_capture here.
* pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Remove
COMPONENT_REF callee handling.
(type_dependent_expression_p): Use t_d_object_e_p instead of
t_d_e_p for COMPONENT_REF and OFFSET_REF.
* semantics.cc (finish_call_expr): In the type-dependent case,
call maybe_generic_this_capture here instead.
gcc/testsuite/ChangeLog:
* g++.dg/template/crash127.C: Expect additional error due to
being able to check the member access expression ahead of time.
Strengthen the test by not instantiating the class template.
* g++.dg/cpp1y/lambda-generic-this5.C: New test.
Patrick Palka [Fri, 20 Oct 2023 14:45:00 +0000 (10:45 -0400)]
c++: remove NON_DEPENDENT_EXPR, part 1
This tree code dates all the way back to r69130[1] which implemented
typing of non-dependent expressions. Its motivation was never clear (to
me at least) since its documentation in e.g. cp-tree.def doesn't seem
accurate anymore. build_non_dependent_expr has since gained a bunch of
edge cases about whether or how to wrap certain templated trees, making
it hard to reason about in general.
So this patch removes this tree code, and temporarily turns
build_non_dependent_expr into the identity function. The subsequent
patch will remove build_non_dependent_expr and adjust its callers
appropriately.
We now need to more thoroughly handle templated (sub)trees in a couple
of places which previously didn't need to since they didn't look through
NON_DEPENDENT_EXPR.
Tamar Christina [Fri, 20 Oct 2023 13:58:39 +0000 (14:58 +0100)]
middle-end: don't pass loop_vinfo to vect_set_loop_condition during prolog peeling
During the refactoring I had passed loop_vinfo on to vect_set_loop_condition
during prolog peeling. This parameter is unused in most cases, except in
vect_set_loop_condition_partial_vectors, where its behaviour depends on whether
loop_vinfo is NULL or not. Apparently this code expects it to be NULL and
reads the structures from a different location.
This fixes the failing testcase which was not using the lens values determined
earlier in vectorizable_store because it was looking it up in the given
loop_vinfo instead.
gcc/ChangeLog:
PR tree-optimization/111866
* tree-vect-loop-manip.cc (vect_do_peeling): Pass null as vinfo to
vect_set_loop_condition during prolog peeling.
The following fixes a missed check in the simple_iv attempt
to simplify (signed T)((unsigned T) base + step) where it
allows a truncating inner conversion leading to wrong code.
PR tree-optimization/111445
* tree-scalar-evolution.cc (simple_iv_with_niters):
Add missing check for a sign-conversion.
The following addresses IVOPTs rewriting expressions in its
strip_offset without caring for definedness of overflow. Rather
than the earlier attempt of just using the proper
split_constant_offset from data-ref analysis, the following adjusts
the IVOPTs helper, trying to minimize the changes from this fix and
possibly easing backports.
PR tree-optimization/110243
PR tree-optimization/111336
* tree-ssa-loop-ivopts.cc (strip_offset_1): Rewrite
operations with undefined behavior on overflow to
unsigned arithmetic.
* gcc.dg/torture/pr110243.c: New testcase.
* gcc.dg/torture/pr111336.c: Likewise.
Richard Biener [Fri, 20 Oct 2023 10:22:52 +0000 (12:22 +0200)]
tree-optimization/111891 - fix assert in vectorizable_simd_clone_call
The following fixes the assert in vectorizable_simd_clone_call to
assert we have a vector type during transform. Whether we have
one during analysis depends on whether another SLP user decided
on the type of a constant/external already. When the desired types
end up mismatched, the updating will fail and make vectorization
fail.
Andrew Stubbs [Tue, 26 Sep 2023 11:22:36 +0000 (12:22 +0100)]
amdgcn: add -march=gfx1030 EXPERIMENTAL
Accept the architecture configure option and resolve build failures. This is
enough to build binaries, but I've not got a device to test it on, so there
are probably runtime issues to fix. The cache control instructions might be
unsafe (or too conservative), and the kernel metadata might be off. Vector
reductions will need to be reworked for RDNA2. In principle, it would be
better to use wavefrontsize32 for this architecture, but that would mean
switching everything to allow SImode masks, so wavefrontsize64 it is.
The multilib is not included in the default configuration so either configure
--with-arch=gfx1030 or include it in --with-multilib-list=gfx1030,....
The majority of this patch has no effect on other devices, but changing from
using scalar writes for the exit value to vector writes means we don't need
the scalar cache write-back instruction anywhere (which doesn't exist in RDNA2).
gcc/ChangeLog:
* config.gcc: Allow --with-arch=gfx1030.
* config/gcn/gcn-hsa.h (NO_XNACK): gfx1030 does not support xnack.
(ASM_SPEC): gfx1030 needs -mattr=+wavefrontsize64 set.
* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1030.
(TARGET_GFX1030): New.
(TARGET_RDNA2): New.
* config/gcn/gcn-valu.md (@dpp_move<mode>): Disable for RDNA2.
(addc<mode>3<exec_vcc>): Add RDNA2 syntax variant.
(subc<mode>3<exec_vcc>): Likewise.
(<convop><mode><vndi>2_exec): Add RDNA2 alternatives.
(vec_cmp<mode>di): Likewise.
(vec_cmp<u><mode>di): Likewise.
(vec_cmp<mode>di_exec): Likewise.
(vec_cmp<u><mode>di_exec): Likewise.
(vec_cmp<mode>di_dup): Likewise.
(vec_cmp<mode>di_dup_exec): Likewise.
(reduc_<reduc_op>_scal_<mode>): Disable for RDNA2.
(*<reduc_op>_dpp_shr_<mode>): Likewise.
(*plus_carry_dpp_shr_<mode>): Likewise.
(*plus_carry_in_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Recognise gfx1030.
(gcn_global_address_p): RDNA2 only allows smaller offsets.
(gcn_addr_space_legitimate_address_p): Likewise.
(gcn_omp_device_kind_arch_isa): Recognise gfx1030.
(gcn_expand_epilogue): Use VGPRs instead of SGPRs.
(output_file_start): Configure gfx1030.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __RDNA2__;
(ASSEMBLER_DIALECT): New.
* config/gcn/gcn.md (rdna): New define_attr.
(enabled): Use "rdna" attribute.
(gcn_return): Remove s_dcache_wb.
(addcsi3_scalar): Add RDNA2 syntax variant.
(addcsi3_scalar_zero): Likewise.
(addptrdi3): Likewise.
(mulsi3): v_mul_lo_i32 should be v_mul_lo_u32 on all ISA.
(*memory_barrier): Add RDNA2 syntax variant.
(atomic_load<mode>): Add RDNA2 cache control variants, and disable
scalar atomics for RDNA2.
(atomic_store<mode>): Likewise.
(atomic_exchange<mode>): Likewise.
* config/gcn/gcn.opt (gpu_type): Add gfx1030.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1030): New.
(main): Recognise -march=gfx1030.
* config/gcn/t-omp-device: Add gfx1030 isa.
libgcc/ChangeLog:
* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Set false for __RDNA2__.
Richard Biener [Fri, 20 Oct 2023 09:54:07 +0000 (11:54 +0200)]
tree-optimization/111000 - restrict invariant motion of shifts
The following restricts moving variable shifts to cases where they are
always executed in the loop, as we currently do not have an efficient
way to rewrite them to something that is unconditionally
well-defined, and value range analysis would otherwise compute
invalid ranges for the shift operand.
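For illustration, a loop of roughly this shape is affected (a sketch, not the PR testcase):
```c
/* The variable shift is loop-invariant but only conditionally executed;
   hoisting it out of the loop could evaluate it with an out-of-range
   count and let range analysis derive invalid ranges, so it is no
   longer moved unless it executes on every iteration.  */
unsigned int
masked_sum (const unsigned int *a, int n, int s, int flag)
{
  unsigned int acc = 0;
  for (int i = 0; i < n; i++)
    if (flag)
      acc += a[i] & (1u << s);
  return acc;
}
```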
Alexandre Oliva [Fri, 20 Oct 2023 10:50:33 +0000 (07:50 -0300)]
Control flow redundancy hardening
This patch introduces an optional hardening pass to catch unexpected
execution flows. Functions are transformed so that basic blocks set a
bit in an automatic array, and (non-exceptional) function exit edges
check that the bits in the array represent an expected execution path
in the CFG.
Functions with multiple exit edges, or with too many blocks, call an
out-of-line checker builtin implemented in libgcc. For simpler
functions, the verification is performed in-line.
-fharden-control-flow-redundancy enables the pass for eligible
functions, --param hardcfr-max-blocks sets a block count limit for
functions to be eligible, and --param hardcfr-max-inline-blocks
tunes the "too many blocks" limit for in-line verification.
-fhardcfr-skip-leaf makes leaf functions non-eligible.
Additional -fhardcfr-check-* options are added to enable checking at
exception escape points, before potential sibcalls, hereby dubbed
returning calls, and before noreturn calls and exception raises. A
notable case is the distinction between noreturn calls expected to
throw and those expected to terminate or loop forever: the default
setting for -fhardcfr-check-noreturn-calls, no-xthrow, performs
checking before the latter, but the former only gets checking in the
exception handler. GCC can only tell between them by explicit marking
noreturn functions expected to raise with the newly-introduced
expected_throw attribute, and corresponding ECF_XTHROW flag.
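A minimal usage sketch (option names are taken from the description above; the instrumentation described in the comment is conceptual):
```c
/* Build with something like:
     gcc -O2 -fharden-control-flow-redundancy \
         --param hardcfr-max-inline-blocks=16 example.c
   Conceptually, each basic block of classify() sets a bit in a local
   array and the exit checks that the recorded bits form a valid path
   through the function's CFG before returning.  */
int
classify (int x)
{
  if (x < 0)
    return -1;
  if (x == 0)
    return 0;
  return 1;
}
```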
for gcc/ChangeLog
* tree-core.h (ECF_XTHROW): New macro.
* tree.cc (set_call_expr): Add expected_throw attribute when
ECF_XTHROW is set.
(build_common_builtin_node): Add ECF_XTHROW to
__cxa_end_cleanup and _Unwind_Resume or _Unwind_SjLj_Resume.
* calls.cc (flags_from_decl_or_type): Check for expected_throw
attribute to set ECF_XTHROW.
* gimple.cc (gimple_build_call_from_tree): Propagate
ECF_XTHROW from decl flags to gimple call...
(gimple_call_flags): ... and back.
* gimple.h (GF_CALL_XTHROW): New gf_mask flag.
(gimple_call_set_expected_throw): New.
(gimple_call_expected_throw_p): New.
* Makefile.in (OBJS): Add gimple-harden-control-flow.o.
* builtins.def (BUILT_IN___HARDCFR_CHECK): New.
* common.opt (fharden-control-flow-redundancy): New.
(-fhardcfr-check-returning-calls): New.
(-fhardcfr-check-exceptions): New.
(-fhardcfr-check-noreturn-calls=*): New.
(Enum hardcfr_check_noreturn_calls): New.
(fhardcfr-skip-leaf): New.
* doc/invoke.texi: Document them.
(hardcfr-max-blocks, hardcfr-max-inline-blocks): New params.
* flag-types.h (enum hardcfr_noret): New.
* gimple-harden-control-flow.cc: New.
* params.opt (-param=hardcfr-max-blocks=): New.
(-param=hardcfr-max-inline-blocks=): New.
* passes.def (pass_harden_control_flow_redundancy): Add.
* tree-pass.h (make_pass_harden_control_flow_redundancy):
Declare.
* doc/extend.texi: Document expected_throw attribute.
for gcc/ada/ChangeLog
* gcc-interface/trans.cc (gigi): Mark __gnat_reraise_zcx with
ECF_XTHROW.
(build_raise_check): Likewise for all rcheck subprograms.
Alex Coplan [Fri, 20 Oct 2023 10:46:27 +0000 (11:46 +0100)]
rtl-ssa: Don't leave NOTE_INSN_DELETED around
This patch tweaks change_insns to also call ::remove_insn to ensure the
underlying RTL insn gets removed from the insn chain in the case of a
deletion.
This avoids leaving NOTE_INSN_DELETED around after deleting insns.
For movement, the RTL insn chain is updated earlier in change_insns with
the call to move_insn. For deletion, it seems reasonable to do it here.
gcc/ChangeLog:
* rtl-ssa/changes.cc (function_info::change_insns): Ensure we call
::remove_insn on deleted insns.
Richard Biener [Fri, 20 Oct 2023 08:25:31 +0000 (10:25 +0200)]
Rewrite more refs for epilogue vectorization
The following makes sure to rewrite all gather/scatter detected by
dataref analysis plus stmts classified as VMAT_GATHER_SCATTER. Maybe
we need to rewrite all refs; the following covers the cases I've
run into so far.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Rewrite
both STMT_VINFO_GATHER_SCATTER_P and VMAT_GATHER_SCATTER
stmt refs.
Richard Biener [Fri, 20 Oct 2023 07:30:45 +0000 (09:30 +0200)]
Fixup vect_get_and_check_slp_defs for gathers and .MASK_LOAD
I went a little bit too simple with implementing SLP gather support
for emulated and builtin based gathers. The following fixes the
conflict that appears when running into .MASK_LOAD where we rely
on vect_get_operand_map and the bolted-on STMT_VINFO_GATHER_SCATTER_P
checking wrecks that. The following properly integrates this with
vect_get_operand_map, adding another special index referring to
the vect_check_gather_scatter analyzed offset.
This unbreaks aarch64 (and hopefully riscv), I'll followup with
more fixes and testsuite coverage for x86 where I think I got
masked gather SLP support wrong.
* tree-vect-slp.cc (off_map, off_op0_map, off_arg2_map,
off_arg3_arg2_map): New.
(vect_get_operand_map): Get flag whether the stmt was
recognized as gather or scatter and use the above
accordingly.
(vect_get_and_check_slp_defs): Adjust.
(vect_build_slp_tree_2): Likewise.
Tobias Burnus [Fri, 20 Oct 2023 08:56:39 +0000 (10:56 +0200)]
omp_lib.f90.in: Deprecate omp_lock_hint_* for OpenMP 5.0
The omp_lock_hint_* parameters were deprecated in favor of
omp_sync_hint_*. While omp.h contained deprecation markers for those,
the omp_lib module only contained them for omp_{g,s}et_nested.
Note: The -Wdeprecated-declarations warning will only become active once
openmp_version / _OPENMP is bumped from 201511 (4.5) to 201811 (5.0).
libgomp/ChangeLog:
* omp_lib.f90.in: Tag omp_lock_hint_* as being deprecated when
_OPENMP >= 201811.
Tamar Christina [Fri, 20 Oct 2023 07:09:45 +0000 (08:09 +0100)]
ifcvt: Support bitfield lowering of multiple-exit loops
With the patch enabling the vectorization of early-breaks, we'd like to allow
bitfield lowering in such loops, which requires the relaxation of allowing
multiple exits when doing so. In order to avoid a similar issue to PR107275,
the code that rejects loops with certain types of gimple_stmts was hoisted from
'if_convertible_loop_p_1' to 'get_loop_body_in_if_conv_order', to avoid trying
to lower bitfields in loops we are not going to vectorize anyway.
This also ensures 'ifcvt_local_dce' doesn't accidentally remove statements it
shouldn't as it will never come across them. I made sure to add a comment to
make clear that there is a direct connection between the two and if we were to
enable vectorization of any other gimple statement we should make sure both
handle it.
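A sketch of the kind of multiple-exit (early-break) loop this is aiming at (illustrative; the real coverage is in the new vect-bitfield-read-* tests):
```c
/* Two exits: the early break when the key matches and the normal latch
   exit.  Allowing ifcvt to lower the bit-field read in such loops is a
   prerequisite for early-break vectorization.  */
struct rec
{
  unsigned int key : 12;
  unsigned int rest : 20;
};

int
find_key (const struct rec *a, int n, unsigned int k)
{
  for (int i = 0; i < n; i++)
    if (a[i].key == k)
      return i;
  return -1;
}
```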
gcc/ChangeLog:
* tree-if-conv.cc (if_convertible_loop_p_1): Move check from here ...
(get_loop_body_in_if_conv_order): ... to here.
(if_convertible_loop_p): Remove single_exit check.
(tree_if_conversion): Move single_exit check to if-conversion part and
support multiple exits.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-bitfield-read-1-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-2-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-8.c: New test.
* gcc.dg/vect/vect-bitfield-read-9.c: New test.
Co-Authored-By: Andre Vieira <andre.simoesdiasvieira@arm.com>
Tamar Christina [Fri, 20 Oct 2023 07:08:54 +0000 (08:08 +0100)]
middle-end: Enable bit-field vectorization to work correctly when we're vectorizing inside conds
The bitfield vectorization support does not currently recognize bitfields inside
gconds. This means they can't be used as conditions for early break
vectorization which is a functionality we require.
This adds support for them by explicitly matching and handling gcond as a
source.
Testcases are added in the testsuite update patch as the only way to get there
is with the early break vectorization. See tests:
- vect-early-break_20.c
- vect-early-break_21.c
gcc/ChangeLog:
* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
from original statement.
(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.
Co-Authored-By: Andre Vieira <andre.simoesdiasvieira@arm.com>
Hu, Lin1 [Wed, 11 Oct 2023 08:03:17 +0000 (16:03 +0800)]
Fix testcases affected by -mevex512 support
This patch aims to fix some scan-asm failures in pr89229-{5,6,7}b.c, since we
emit scalar vmov{s,d} here when trying to use x/ymm 16+ without avx512vl but
with avx512f+evex512.
If there is no objection to this change of behavior, we address these failures
by modifying the testcases.
Note that this patch triggers multiple FAILs:
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test
They all fail because of bugs in the VSETVL pass:
10dd4: 0c707057 vsetvli zero,zero,e8,mf2,ta,ma
10dd8: 5e06b8d7 vmv.v.i v17,13
10ddc: 9ed030d7 vmv1r.v v1,v13
10de0: b21040d7 vncvt.x.x.w v1,v1 ----> raise illegal instruction since we don't have SEW = 8 -> SEW = 4 narrowing.
10de4: 5e0785d7 vmv.v.v v11,v15
I confirmed that the recent VSETVL refactor patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633231.html) fixes all of them.
So this patch should be committed after the VSETVL refactor patch.
PR target/111848
gcc/ChangeLog:
* config/riscv/riscv-selftests.cc (run_const_vector_selftests): Adapt selftest.
* config/riscv/riscv-v.cc (expand_const_vector): Change it into vec_duplicate splitter.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: New test.
Lehua Ding [Fri, 20 Oct 2023 02:22:43 +0000 (10:22 +0800)]
RISC-V: Refactor and cleanup vsetvl pass
This patch refactors and cleans up the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:
1. Introducing a virtual CFG for vsetvl infos and Phase 1, 2 and 3 only maintain
and modify this virtual CFG. Phase 4 performs insertion, modification and
deletion of vsetvl insns based on the virtual CFG. The basic block in the
virtual CFG is called vsetvl_block_info and the vsetvl information inside
is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unified the demand system,
this phase only fuse local vsetvl info in forward direction.
3. Refactor Phase 3, changing the logic for determining whether to lift vsetvl
info up to a predecessor basic block to a more unified method: check whether
there is a compatible vsetvl info among the reaching vsetvl definitions.
4. Place all modification operations to the RTL in Phase 4 and Phase 5.
Phase 4 is responsible for inserting, modifying and deleting vsetvl
instructions based on fully optimized vsetvl infos. Phase 5 removes the avl
operand from the RVV instruction and removes the unused dest operand
register from the vsetvl insns.
These modifications resulted in some testcases needing to be updated. The reasons
for updating are summarized below:
1. more optimized
vlmax_back_prop-{25,26}.c
vlmax_conflict-{3,12}.c/vsetvl-{13,23}.c/vsetvl-23.c/
avl_single-{23,84,95}.c/pr109773-1.c
2. less unnecessary fusion
avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
scalar_move-1.c
4. add some bugfix testcases.
pr111037-{3,4}.c/pr111037-4.c
avl_single-{89,104,105,106,107,108,109}.c
Alexandre Oliva [Fri, 20 Oct 2023 03:35:17 +0000 (00:35 -0300)]
return edge in make_eh_edges
The need to initialize edge probabilities has made make_eh_edges
undesirably hard to use. I suppose we don't want make_eh_edges to
initialize the probability of the newly-added edge itself, so that the
caller takes care of it, but identifying the added edge in need of
adjustments is inefficient and cumbersome. Change make_eh_edges so
that it returns the added edge.
for gcc/ChangeLog
* tree-eh.cc (make_eh_edges): Return the new edge.
* tree-eh.h (make_eh_edges): Likewise.
Nathaniel Shead [Thu, 12 Oct 2023 08:53:55 +0000 (19:53 +1100)]
c++: indirect change of active union member in constexpr [PR101631,PR102286]
This patch adds checks for attempting to change the active member of a
union by methods other than a member access expression.
To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
patch redoes the solution for c++/59950 to avoid extranneous *&; it
seems that the only case that needed the workaround was when copying
empty classes.
This patch also ensures that constructors for a union field mark that
field as the active member before entering the call itself; this ensures
that modifications of the field within the constructor's body don't
cause false positives (as these will not appear to be member access
expressions). This means that we no longer need to start the lifetime of
empty union members after the constructor body completes.
As a drive-by fix, this patch also ensures that value-initialised unions
are considered to have activated their initial member for the purpose of
checking stores and accesses, which catches some additional mistakes
pre-C++20.
PR c++/101631
PR c++/102286
gcc/cp/ChangeLog:
* call.cc (build_over_call): Fold more indirect refs for trivial
assignment op.
* class.cc (type_has_non_deleted_trivial_default_ctor): Create.
* constexpr.cc (cxx_eval_call_expression): Start lifetime of
union member before entering constructor.
(cxx_eval_component_reference): Check against first member of
value-initialised union.
(cxx_eval_store_expression): Activate member for
value-initialised union. Check for accessing inactive union
member indirectly.
* cp-tree.h (type_has_non_deleted_trivial_default_ctor):
Forward declare.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
* g++.dg/cpp1y/constexpr-union6.C: New test.
* g++.dg/cpp1y/constexpr-union7.C: New test.
* g++.dg/cpp2a/constexpr-union2.C: New test.
* g++.dg/cpp2a/constexpr-union3.C: New test.
* g++.dg/cpp2a/constexpr-union4.C: New test.
* g++.dg/cpp2a/constexpr-union5.C: New test.
* g++.dg/cpp2a/constexpr-union6.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Tue, 10 Oct 2023 23:57:06 +0000 (10:57 +1100)]
c++: Improve diagnostics for constexpr cast from void*
This patch improves the errors given when casting from void* in C++26 to
include the expected type if the types of the pointed-to objects were
not similar. It also ensures (for all standard modes) that void* casts
are checked even for DECL_ARTIFICIAL declarations, such as
lifetime-extended temporaries, and is only ignored for cases where we
know it's OK (e.g. source_location::current) or have no other choice
(heap-allocated data).
gcc/cp/ChangeLog:
* constexpr.cc (is_std_source_location_current): New.
(cxx_eval_constant_expression): Only ignore cast from void* for
specific cases and improve other diagnostics.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-cast4.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Marek Polacek [Thu, 19 Oct 2023 13:57:53 +0000 (09:57 -0400)]
c++: small tweak for cp_fold_r
This patch is an optimization tweak for cp_fold_r. If we cp_fold_r the
COND_EXPR's op0 first, we may be able to evaluate it to a constant if -O.
cp_fold has:
3143 if (callee && DECL_DECLARED_CONSTEXPR_P (callee)
3144 && !flag_no_inline)
...
3151 r = maybe_constant_value (x, /*decl=*/NULL_TREE,
flag_no_inline is 1 for -O0:
1124 if (opts->x_optimize == 0)
1125 {
1126 /* Inlining does not work if not optimizing,
1127 so force it not to be done. */
1128 opts->x_warn_inline = 0;
1129 opts->x_flag_no_inline = 1;
1130 }
but otherwise it's 0 and cp_fold will call maybe_constant_value on calls to
constexpr functions. And if it doesn't, then folding the COND_EXPR
will keep both arms, and we can avoid calling maybe_constant_value.
Andre Vieira [Thu, 19 Oct 2023 17:28:28 +0000 (18:28 +0100)]
vect: Use inbranch simdclones in masked loops
This patch enables the compiler to use inbranch simdclones when generating
masked loops in autovectorization.
gcc/ChangeLog:
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
compatible with mask parameters in clone.
* tree-vect-stmts.cc (vect_build_all_ones_mask): Allow vector boolean
typed masks.
(vectorizable_simd_clone_call): Enable the use of masked clones in
fully masked loops.
When analyzing a loop and choosing a simdclone to use it is possible to choose
a simdclone that cannot be used 'inbranch' for a loop that can use partial
vectors. This may lead to the vectorizer deciding to use partial vectors which
are not supported for notinbranch simd clones. This patch fixes that by
disabling the use of partial vectors once a notinbranch simd clone has been
selected.
gcc/ChangeLog:
PR tree-optimization/110485
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
vectors usage if a notinbranch simdclone has been selected.
Andre Vieira [Thu, 19 Oct 2023 17:30:15 +0000 (18:30 +0100)]
vect: Fix vect_get_smallest_scalar_type for simd clones
The vect_get_smallest_scalar_type helper function was using any argument to a
simd clone call when trying to determine the smallest scalar type that would be
vectorized. This included the function pointer type in a MASK_CALL for
instance, and would result in the wrong type being selected. Instead this
patch special-cases simd clone calls and uses only those scalar types of the
original function that get transformed into vector types.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special case
simd clone calls and only use types that are mapped to vectors.
(simd_clone_call_p): New helper function.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
between targets with different pointer sizes.
* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
Andre Vieira [Thu, 19 Oct 2023 17:26:45 +0000 (18:26 +0100)]
parloops: Copy target and optimizations when creating a function clone
SVE simd clones need to be compiled with an SVE target enabled or the argument
types will not be created properly. To achieve this we need to copy
DECL_FUNCTION_SPECIFIC_TARGET from the original function declaration to the
clones. I decided it was probably also a good idea to copy
DECL_FUNCTION_SPECIFIC_OPTIMIZATION in case the original function is meant to
be compiled with specific optimization options.
gcc/ChangeLog:
* tree-parloops.cc (create_loop_fn): Copy specific target and
optimization options to clone.
Andrew Pinski [Thu, 19 Oct 2023 05:42:02 +0000 (05:42 +0000)]
c: Fix ICE when an argument was an error mark [PR100532]
In the case of convert_argument, we would return the same expression
back rather than error_mark_node after the error message about
trying to convert to an incomplete type. This causes issues in
the gimplifier trying to see if another conversion is needed.
The code here dates back to before the revision history too so
it might be the case it never noticed we should return an error_mark_node.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR c/100532
gcc/c/ChangeLog:
* c-typeck.cc (convert_argument): After erroring out
about an incomplete type return error_mark_node.
Andrew Pinski [Thu, 19 Oct 2023 03:49:05 +0000 (20:49 -0700)]
c: Don't warn about converting NULL to different sso endian [PR104822]
Just as we don't warn about converting a NULL pointer constant to a
different named address space, we should not warn about converting it to a
pointer with a different scalar storage order (sso endianness) either.
This adds the simple check.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR c/104822
gcc/c/ChangeLog:
* c-typeck.cc (convert_for_assignment): Check for null pointer
before warning about an incompatible scalar storage order.
gcc/testsuite/ChangeLog:
* gcc.dg/sso-18.c: New test.
* gcc.dg/sso-19.c: New test.
Jason Merrill [Thu, 19 Oct 2023 15:23:03 +0000 (11:23 -0400)]
diagnostic: rename new permerror overloads
While checking another change, I noticed that the new permerror overloads
break gettext with "permerror used incompatibly as both
--keyword=permerror:2 --flag=permerror:2:gcc-internal-format and
--keyword=permerror:3 --flag=permerror:3:gcc-internal-format". So let's
change the name.
gcc/ChangeLog:
* diagnostic-core.h (permerror): Rename new overloads...
(permerror_opt): To this.
* diagnostic.cc: Likewise.
Jason Merrill [Wed, 18 Oct 2023 18:10:39 +0000 (14:10 -0400)]
c++: use G_ instead of _
Since these strings are passed to error_at, they should be marked for
translation with G_, like other diagnostic messages, rather than _, which
forces immediate (redundant) translation. The use of N_ is less
problematic, but also imprecise.
SPARK RM 6.1.11 introduces a new aspect Side_Effects to denote
those functions which may have output parameters, write global
variables, raise exceptions and not terminate. This adds support
for this aspect and the corresponding pragma in the frontend.
Handling of this aspect in the frontend is very similar to
the handling of aspect Extensions_Visible: both are Boolean
aspects whose expression should be static, they can be specified
on the same entities, with the same rule of inheritance from
overridden to overriding primitives for tagged types.
There is no impact on code generation.
gcc/ada/
* aspects.ads: Add aspect Side_Effects.
* contracts.adb (Add_Pre_Post_Condition)
(Inherit_Subprogram_Contract): Add support for new contract.
* contracts.ads: Update comments.
* einfo-utils.adb (Get_Pragma): Add support.
* einfo-utils.ads (Prag): Update comment.
* errout.ads: Add explain codes.
* par-prag.adb (Prag): Add support.
* sem_ch13.adb (Analyze_Aspect_Specifications)
(Check_Aspect_At_Freeze_Point): Add support.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper)
(Analyze_Subprogram_Declaration): Call new analysis procedure to
check SPARK legality rules.
(Analyze_SPARK_Subprogram_Specification): New procedure to check
SPARK legality rules. Use an explain code for the error.
(Analyze_Subprogram_Specification): Move checks to new subprogram.
This code was effectively dead, as the kind for parameters was set
to E_Void at this point to detect early references.
* sem_ch6.ads (Analyze_Subprogram_Specification): Add new
procedure.
* sem_prag.adb (Analyze_Depends_In_Decl_Part)
(Analyze_Global_In_Decl_Part): Adapt legality check to apply only
to functions without side-effects.
(Analyze_If_Present): Extract functionality in new procedure
Analyze_If_Present_Internal.
(Analyze_If_Present_Internal): New procedure to analyze given
pragma kind.
(Analyze_Pragmas_If_Present): New procedure to analyze given
pragma kind associated with a declaration.
(Analyze_Pragma): Adapt support for Always_Terminates and
Exceptional_Cases. Add support for Side_Effects. Make sure to call
Analyze_If_Present to ensure pragma Side_Effects is analyzed prior
to analyzing pragmas Global and Depends. Use explain codes for the
errors.
* sem_prag.ads (Analyze_Pragmas_If_Present): Add new procedure.
* sem_util.adb (Is_Function_With_Side_Effects): New query function
to determine if a function is a function with side-effects.
* sem_util.ads (Is_Function_With_Side_Effects): Same.
* snames.ads-tmpl: Declare new names for pragma and aspect.
* doc/gnat_rm/implementation_defined_aspects.rst: Document new aspect.
* doc/gnat_rm/implementation_defined_pragmas.rst: Document new pragma.
* gnat_rm.texi: Regenerate.
Sheri Bernstein [Wed, 9 Aug 2023 16:04:31 +0000 (16:04 +0000)]
ada: Refactor code to remove GNATcheck violation
Rewrite a for loop containing an exit (which violates the GNATcheck
rule Exits_From_Conditional_Loops) as a while loop that carries the
exit criterion in its condition.
Also, move the special case for the first time through the loop so that
it comes before the loop.
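A language-neutral sketch of the refactoring pattern, written here in C since
the actual change is in Ada (the helper names are invented for illustration):
    extern int should_stop (int);   /* hypothetical exit criterion */
    extern void process (int);      /* hypothetical loop body */
    void before (int first, int last)
    {
      for (int i = first; i <= last; i++)
        {
          if (should_stop (i))      /* exit from inside the loop body */
            break;
          process (i);
        }
    }
    void after (int first, int last)
    {
      int i = first;
      while (i <= last && !should_stop (i))  /* exit criterion in the condition */
        {
          process (i);
          i++;
        }
    }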
Patrick Bernardi [Fri, 29 Sep 2023 21:01:56 +0000 (17:01 -0400)]
ada: Document gnatbind -Q switch
Add documentation for the -Q gnatbind switch in the GNAT User's Guide and
improve gnatbind's help output for the switch to emphasize that it adds the
requested number of stacks to the secondary stack pool generated by the
binder.
gcc/ada/
* bindusg.adb (Display): Make it clear -Q adds to the number of
secondary stacks generated by the binder.
* doc/gnat_ugn/building_executable_programs_with_gnat.rst:
Document the -Q gnatbind switch and fix references to old
runtimes.
* gnat-style.texi: Regenerate.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
Lewis Hyatt [Wed, 18 Oct 2023 16:37:08 +0000 (12:37 -0400)]
c++: Make -Wunknown-pragmas controllable by #pragma GCC diagnostic [PR89038]
As noted on the PR, commit r13-1544, the fix for PR53431, did not handle
the specific case of -Wunknown-pragmas, because that warning is issued
during preprocessing, but not by libcpp directly (it comes from the
cb_def_pragma callback). Address that by handling this pragma, in
addition to the libcpp pragmas, in the early pragma handler.
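For illustration (the committed test is c-c++-common/cpp/Wunknown-pragmas-1.c,
which may differ), a diagnostic pragma can now suppress the warning that an
unknown pragma would otherwise produce under -Wunknown-pragmas:
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wunknown-pragmas"
    #pragma not_a_real_pragma        /* no -Wunknown-pragmas warning now */
    #pragma GCC diagnostic pop
    #pragma another_unknown_pragma   /* still warns */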
gcc/c-family/ChangeLog:
PR c++/89038
* c-pragma.cc (handle_pragma_diagnostic_impl): Handle
-Wunknown-pragmas during early processing.
gcc/testsuite/ChangeLog:
PR c++/89038
* c-c++-common/cpp/Wunknown-pragmas-1.c: New test.
Tamar Christina [Thu, 19 Oct 2023 12:44:01 +0000 (13:44 +0100)]
middle-end: don't create LC-SSA PHI variables for PHI nodes that dominate the loop
As the testcase shows, when a PHI node dominates the loop there is no new
definition inside the loop. As such there would be no PHI nodes to update.
When we maintain LCSSA form we create an intermediate node in between the two
loops to thread along the value. However, later on, when we update the second
loop we don't have any PHI nodes to update and so adjust_phi_and_debug_stmts
does nothing. This leaves us with an incorrect phi node. Normally this does
nothing and just gets ignored. But in the case of the vUSE chain we end up
corrupting the chain.
As such whenever a PHI node's argument dominates the loop, we should remove
the newly created PHI node after edge redirection.
The one exception to this is when the loop has been versioned. In such cases
the versioned loop may not use the value but the second loop can.
When this happens and we add the loop guard, the original value cannot be
found for use inside the guard block unless the join block has the PHI.
The next refactoring in the series moves the formation of the guard block
inside peeling itself. Here we have all the information and wouldn't
need to re-create it later.
Richard Biener [Thu, 19 Oct 2023 08:33:01 +0000 (10:33 +0200)]
tree-optimization/111131 - SLP for non-IFN gathers
The following implements SLP vectorization support for gathers
without relying on IFNs being pattern detected (and supported by
the target). That includes support for emulated gathers but also
the legacy x86 builtin path.
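A hedged illustration of the kind of loop this covers (not one of the
adjusted testcases below; the function is invented): two interleaved indexed
loads that can now be SLP-vectorized as a gather, either emulated or via the
legacy x86 builtins, without an IFN being pattern-detected.
    void f (double * restrict out, const double * restrict data,
            const int * restrict idx, int n)
    {
      for (int i = 0; i < n; i += 2)
        {
          out[i]     = data[idx[i]];      /* gathered load, lane 0 */
          out[i + 1] = data[idx[i + 1]];  /* gathered load, lane 1 */
        }
    }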
PR tree-optimization/111131
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
sure to update all gather/scatter stmt DRs, not only those
that eventually got VMAT_GATHER_SCATTER set.
* tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
(vect_get_and_check_slp_defs): Handle gathers/scatters,
adding the offset as SLP operand and comparing base and scale.
(vect_build_slp_tree_1): Handle gathers.
(vect_build_slp_tree_2): Likewise.
* gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
everywhere.
* gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
Massage the scale case to more reliably produce a different
one. Scan for the specific messages.
* gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
for AVX2, but not emulated.
* gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
Massage to more properly ensure this.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
everywhere.
Richard Biener [Wed, 18 Oct 2023 12:39:21 +0000 (14:39 +0200)]
Refactor x86 vectorized gather path
The following moves the builtin decl gather vectorization path alongside
the internal function and emulated gather vectorization paths,
simplifying the existing function down to generating the call and
required conversions to the actual argument types. This thereby
exposes that path's unique support for twice as many offset or data
vector lanes. It also makes the code path handle SLP
in principle (but SLP build needs adjustments for this, patch coming).
* tree-vect-stmts.cc (vect_build_gather_load_calls): Rename
to ...
(vect_build_one_gather_load_call): ... this. Refactor,
inline widening/narrowing support ...
(vectorizable_load): ... here, do gather vectorization
with builtin decls along other gather vectorization.
This patch generalises the TFmode load/store pair patterns to TImode and
TDmode. This brings them in line with the DXmode patterns, and uses the
same technique with separate mode iterators (TX and TX2) to allow for
distinct modes in each arm of the load/store pair.
For example, in combination with the post-RA load/store pair fusion pass
in the following patch, this improves the codegen for the following
varargs testcase involving TImode stores:
Note that this patch isn't needed if we only use the mode
canonicalization approach in the new ldp fusion pass (since we
canonicalize T{I,F,D}mode to V16QImode), but we seem to get slightly
better performance with mode canonicalization disabled (see
--param=aarch64-ldp-canonicalize-modes in the following patch).
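Purely for illustration (the varargs testcase referred to above is not
reproduced here; this function is invented): adjacent 128-bit accesses of
the kind the generalized TImode patterns can express as a load/store pair.
    void copy_two (__int128 *dst, const __int128 *src)
    {
      dst[0] = src[0];   /* adjacent TImode accesses: load/store pair candidates */
      dst[1] = src[1];
    }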
gcc/ChangeLog:
* config/aarch64/aarch64.md (load_pair_dw_tftf): Rename to ...
(load_pair_dw_<TX:mode><TX2:mode>): ... this.
(store_pair_dw_tftf): Rename to ...
(store_pair_dw_<TX:mode><TX2:mode>): ... this.
* config/aarch64/iterators.md (TX2): New.
Alex Coplan [Wed, 11 Oct 2023 15:57:32 +0000 (15:57 +0000)]
aarch64, testsuite: Fix up pr71727.c
The test is trying to check that we don't use q-register stores with
-mstrict-align, so actually check specifically for that.
This is a prerequisite to avoid regressing:
scan-assembler-not "add\tx0, x0, :"
with the upcoming ldp fusion pass, as we change where the ldps are
formed such that a register is used rather than a symbolic (lo_sum)
address for the first load.
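One plausible shape for the new check, shown as a hedged sketch (the regex
actually committed may differ):
    /* { dg-final { scan-assembler-not {str\tq[0-9]+} } } */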
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr71727.c: Adjust scan-assembler-not to
make sure we don't have q-register stores with -mstrict-align.
I.e. we now form an stp that we were previously missing. This patch
adjusts the scan-assembler so that it should pass whether or not
we form the stp.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pcs/args_9.c: Adjust scan-assemblers to
allow for stp.
Alex Coplan [Wed, 4 Oct 2023 12:32:36 +0000 (13:32 +0100)]
aarch64, testsuite: Prevent stp in lr_free_1.c
The test is looking for individual stores that can be merged
into stp instructions. The test currently passes -fno-schedule-fusion
-fno-peephole2, presumably to prevent these stores from being turned
into stps, but this is no longer sufficient with the new ldp/stp fusion
pass.
As such, we add --param=aarch64-stp-policy=never to prevent stps being
formed.
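For illustration, the resulting option line would look something like the
following (the test's exact dg-options, including any other flags it already
passes, may differ):
    /* { dg-options "-fno-schedule-fusion -fno-peephole2 --param=aarch64-stp-policy=never" } */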
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/lr_free_1.c: Add
--param=aarch64-stp-policy=never to dg-options.