Jonathan Wakely [Wed, 15 Nov 2023 23:02:34 +0000 (23:02 +0000)]
libstdc++: Implement std::out_ptr and std::inout_ptr for C++23 [PR111667]
This implements the changes from P1132R8, including optimized paths for
std::shared_ptr and std::unique_ptr.
For std::shared_ptr we pre-allocate a new control block in the
std::out_ptr_t constructor so that the destructor is non-throwing. This
requires some care because unlike the shared_ptr(Y*, D, A) constructor,
we don't want to invoke the deleter if allocating the control block
throws, because we don't own any pointer yet. In order to avoid the
unwanted deleter invocation, we create the control block manually. We
also want to avoid invoking the deleter on a null pointer on
destruction, so we destroy the control block manually if there is no
pointer to take ownership of.
For std::unique_ptr and for raw pointers, the out_ptr_t object hands out
direct access to the pointer, so that we don't have anything to do
(except possibly assign a new deleter) in the ~out_ptr_t destructor.
These optimizations avoid requiring additional temporary storage for the
pointer (and optional arguments), and avoid additional instructions to
copy that pointer into the smart pointer at the end.
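For illustration, a minimal sketch of the calling pattern that std::out_ptr
is meant to simplify. The C-style factory make_value and the wrapper acquire
are hypothetical names invented for this sketch, not part of the patch. The
std::out_ptr path is only compiled when the library defines
__cpp_lib_out_ptr; otherwise the sketch falls back to the manual pattern
with a temporary raw pointer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style API (an assumption for illustration): it hands back a
// newly allocated int through an output parameter and returns 0 on success.
int make_value(int** out, int v)
{
  assert(out != nullptr);
  *out = new int(v);
  return 0;
}

// Wraps the C-style call and returns the result in a unique_ptr.
std::unique_ptr<int> acquire(int v)
{
  std::unique_ptr<int> p;
#if defined(__cpp_lib_out_ptr)
  // C++23: the adaptor converts to int** for the duration of the call and
  // leaves the returned pointer owned by p.
  make_value(std::out_ptr(p), v);
#else
  // Before C++23: a temporary raw pointer plus an explicit reset.
  int* raw = nullptr;
  make_value(&raw, v);
  p.reset(raw);
#endif
  return p;
}
```

Both branches leave p owning the int produced by make_value, so callers see
the same result either way.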
libstdc++-v3/ChangeLog:
PR libstdc++/111667
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/out_ptr.h: New file.
* include/bits/shared_ptr.h (__is_shared_ptr): Move definition
to here ...
* include/bits/shared_ptr_atomic.h (__is_shared_ptr): ... from
here.
* include/bits/shared_ptr_base.h (__shared_count): Declare
out_ptr_t as a friend.
(_Sp_counted_deleter, __shared_ptr): Likewise.
* include/bits/unique_ptr.h (unique_ptr, unique_ptr<T[], D>):
Declare out_ptr_t and inout_ptr_t as friends.
(__is_unique_ptr): Define new variable template.
* include/bits/version.def (out_ptr): Define.
* include/bits/version.h: Regenerate.
* include/std/memory: Include new header.
* testsuite/20_util/smartptr.adapt/inout_ptr/1.cc: New test.
* testsuite/20_util/smartptr.adapt/inout_ptr/2.cc: New test.
* testsuite/20_util/smartptr.adapt/inout_ptr/shared_ptr_neg.cc:
New test.
* testsuite/20_util/smartptr.adapt/inout_ptr/void_ptr.cc: New
test.
* testsuite/20_util/smartptr.adapt/out_ptr/1.cc: New test.
* testsuite/20_util/smartptr.adapt/out_ptr/2.cc: New test.
* testsuite/20_util/smartptr.adapt/out_ptr/shared_ptr_neg.cc:
New test.
* testsuite/20_util/smartptr.adapt/out_ptr/void_ptr.cc: New
test.
Jonathan Wakely [Tue, 19 Sep 2023 16:46:32 +0000 (17:46 +0100)]
libstdc++: Only declare feature test macros in standard headers
This change moves the definitions of feature test macros (or strictly
speaking, the requests for <bits/version.h> to define them) so that only
standard headers define them. For example, <bits/shared_ptr.h> will no
longer define macros related to std::shared_ptr, only <memory> and
<version> will define them. This means that __cpp_lib_shared_ptr_arrays
will not be defined by <future> or by other headers that include
<bits/shared_ptr.h>. It will only be defined when <memory> has been
included. This will discourage users from relying on transitive
includes.
As a result, internal headers that need to query the macros should use
the internal macros like __glibcxx_shared_ptr_arrays instead of
__cpp_lib_shared_ptr_arrays, as those internal macros are defined by the
internal headers after including <bits/version.h>. There are some
exceptions to this rule, because __cpp_lib_is_constant_evaluated is
defined by bits/c++config.h and so is available everywhere, and
__cpp_lib_three_way_comparison is defined by <compare> which several
headers are explicitly specified to include, so its macro is guaranteed
to be usable too.
N.B. not many internal headers actually need an explicit include of
<bits/version.h>, because most of them include <type_traits> and so get
all the __glibcxx_foo internal macros from there.
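From the user's side, the practical consequence is that code testing a
library feature test macro should include the standard header that is
specified to define it. A minimal sketch, assuming a hypothetical helper
named third_element; it includes <memory> itself before testing
__cpp_lib_shared_ptr_arrays, and falls back to unique_ptr<int[]> when the
macro is not defined.

```cpp
#include <memory>  // the standard header that defines the macro tested below

// Returns the last element of a small heap-allocated array, using
// shared_ptr<int[]> only when the library advertises support for it.
int third_element()
{
#if defined(__cpp_lib_shared_ptr_arrays)
  std::shared_ptr<int[]> a(new int[3]{1, 2, 3});
  return a[2];
#else
  std::unique_ptr<int[]> a(new int[3]{1, 2, 3});
  return a[2];
#endif
}
```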
Jonathan Wakely [Tue, 19 Sep 2023 16:46:32 +0000 (17:46 +0100)]
libstdc++: Test for feature test macros more accurately
Tests which check for feature test macros should use the no_pch option,
so that we're really testing for the definition being in the intended
header, and not just testing that it's present in <bits/stdc++.h> (which
includes all the standard headers and so defines all the macros).
Jonathan Wakely [Tue, 14 Nov 2023 19:22:47 +0000 (19:22 +0000)]
libstdc++: Use 202100L as feature test check for C++23
I noticed that our C++23 features were not being defined when using
Clang 16 with -std=c++2b, because it only defines __cplusplus=202101L
but <bits/version.h> uses 202302L since my r14-3252-g0c316669b092fb
change.
This changes <bits/version.h> to use 202100 instead of the final 202302
value so that we support Clang 16's -std=c++2b mode.
libstdc++-v3/ChangeLog:
* include/bits/version.def (stds): Use >= 202100 for C++23
condition.
* include/bits/version.h: Regenerate.
* include/std/thread: Use > C++20 instead of >= C++23 for
__cplusplus condition.
Jonathan Wakely [Tue, 14 Nov 2023 15:48:03 +0000 (15:48 +0000)]
libstdc++: Adjust feature test in <istream> and <ostream>
We don't need any library concepts to define the constraints for rvalue
stream overloads, only compiler support. So change the test from using
__cpp_lib_concepts to __cpp_concepts >= 201907L.
libstdc++-v3/ChangeLog:
* include/std/istream (__rvalue_stream_extraction_t): Test
__cpp_concepts instead of __cpp_lib_concepts.
* include/std/ostream (__derived_from_ios_base): Likewise.
(__rvalue_stream_insertion_t): Likewise.
The following testcase is miscompiled on x86_64 since PR110551 r14-4968
commit. That commit added 2 peephole2s, one for
mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY
which I believe is ok, and another one for
mov imm,%rXX; mov %rYY,%rdx; mulx %rXX, %rZZ, %rWW -> mov imm,%rdx; mulx %rYY, %rZZ, %rWW
which is wrong. Both peephole2s verify that %rXX above is dead at
the end of the pattern, by checking if %rXX is either one of the
registers overwritten in the multiplication (%rdx:%rax in the first
case, the 2 destination registers of mulx in the latter case), because
we no longer set %rXX to that immediate (we set %rax resp. %rdx to it
instead) when the peephole2 replaces it. But, we also need to ensure
that the other register previously set to the value of %rYY and newly
to imm isn't used after the multiplication, and neither of the peephole2s
does that. Now, for the first one (at least assuming in the % pattern
the matching operand (i.e. hardcoded %rax resp. %rdx) after RA will always go
first) I think it is always the case, because operands[2] if it must be %rax
register will be overwritten by mulq writing to %rdx:%rax. But in the
second case, there is no reason why %rdx couldn't be used after the pattern,
and if it is (like in the testcase), we can't make those changes.
So, the patch checks similarly to operands[0] that operands[2] (which ought
to be %rdx if RA puts the % match_dup operand first and nothing swaps it
afterwards) is either the same register as one of the destination registers
of mulx or dies at the end of the multiplication.
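To make the reasoning concrete, here is a toy model of the two mulx
sequences. It is only a sketch: the Regs struct and both function names are
invented for illustration, only the low half of the product is modelled,
and none of this is code from i386.md.

```cpp
#include <cstdint>

// Toy register file; only the registers named in the message are modelled.
struct Regs
{
  std::uint64_t rdx = 0;  // implicit source operand of mulx
  std::uint64_t rxx = 0;  // scratch register that held the immediate
  std::uint64_t ryy = 0;  // register holding the other multiplicand
  std::uint64_t lo = 0;   // low-half destination of mulx (high half omitted)
};

// mov imm,%rXX; mov %rYY,%rdx; mulx %rXX,...
Regs before_peephole(std::uint64_t imm, std::uint64_t y)
{
  Regs r;
  r.ryy = y;
  r.rxx = imm;
  r.rdx = r.ryy;
  r.lo = r.rdx * r.rxx;
  return r;
}

// mov imm,%rdx; mulx %rYY,...
Regs after_peephole(std::uint64_t imm, std::uint64_t y)
{
  Regs r;
  r.ryy = y;
  r.rdx = imm;
  r.lo = r.rdx * r.ryy;
  return r;
}
```

In this model both sequences compute the same product, but %rdx holds the
value of %rYY after the first and imm after the second. The rewrite is
therefore only safe when %rdx is overwritten by mulx or is dead afterwards.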
2023-11-16 Jakub Jelinek <jakub@redhat.com>
PR target/112526
* config/i386/i386.md
(mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi):
Verify in define_peephole2 that operands[2] dies or is overwritten
at the end of multiplication.
Jakub Jelinek [Thu, 16 Nov 2023 07:32:24 +0000 (08:32 +0100)]
slp: Fix handling of IFN_CLZ/CTZ [PR112536]
We ICE on the following testcase now that IFN_C[LT]Z calls can have one or
two arguments (where two means it is well defined at zero).
The following patch makes us create child node only for the first argument
and compatible_calls_p ensures the other argument is the same, which
at least according to the testcase seems sufficient because of vect
patterns.
2023-11-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112536
* tree-vect-slp.cc (arg0_map): New variable.
(vect_get_operand_map): For IFN_CLZ or IFN_CTZ, return arg0_map.
Juzhe-Zhong [Thu, 16 Nov 2023 02:58:16 +0000 (10:58 +0800)]
VECT: Clear LOOP_VINFO_USING_SELECT_VL_P when loop is not partial vectorized
This patch fixes ICE:
https://godbolt.org/z/z8T6o6qov
<source>: In function 'b':
<source>:2:6: error: missing definition
2 | void b() {
| ^
for SSA_NAME: loop_len_8 in statement:
_1 = -loop_len_8;
during GIMPLE pass: vect
<source>:2:6: internal compiler error: verify_ssa failed
0x7f1b56331082 __libc_start_main
???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1
The root cause is we generate such IR in vectorization:
The IR _18 = vect_vec_iv_.6_14 + vect_cst__11; is generated because we are adding the induction variable with
the result of SELECT_VL instead of VF.
The code is:
else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
{
/* When we're using loop_len produced by SELEC_VL, the non-final
iterations are not always processing VF elements. So vectorize
induction variable instead of
LOOP_VINFO_USING_SELECT_VL_P is set before loop vectorization analysis, so at that point we don't know whether the
loop will be partially vectorized, yet the induction variable handling depends on SELECT_VL_P being true.
So set SELECT_VL_P to false when the loop is not partially vectorized.
PR middle-end/112554
gcc/ChangeLog:
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Clear SELECT_VL_P for non-partial vectorization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112554.c: New test.
Tom Tromey [Wed, 15 Nov 2023 05:27:52 +0000 (22:27 -0700)]
Fix crash in libcc1
The gdb tests of the libcc1 plugin have been failing lately. I
tracked this down to a crash trying to access an enum's underlying
type. This patch fixes the crash by setting this type.
* libcc1plugin.cc (plugin_build_enum_type): Set
ENUM_UNDERLYING_TYPE.
Marek Polacek [Thu, 9 Nov 2023 17:25:25 +0000 (12:25 -0500)]
c++: fix parsing with auto(x) [PR112410]
Here we are wrongly parsing
int y(auto(42));
which uses the C++23 cast-to-prvalue feature, and initializes y to 42.
However, we were treating the auto as an implicit template parameter.
Fixing the auto{42} case is easy, but when auto is followed by a (,
I found the fix to be much more involved. For instance, we cannot
use cp_parser_expression, because that can give hard errors. It's
also necessary to disambiguate 'auto(i)' as 'auto i', not a cast.
auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc.
are all function declarations.
This patch rectifies that by undoing the implicit function template
modification. In the test above, we should notice that the parameter
list is ill-formed, and since we've synthesized an implicit template
parameter, we undo it by calling abort_fully_implicit_template. Then,
we'll parse the "(auto(42))" as an initializer.
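A minimal sketch of the declaration in question. The function name
decay_copy_demo is hypothetical. The C++23 form is guarded by the
__cpp_auto_cast feature test macro so that the sketch still builds in
earlier language modes; a compiler that claims support but lacks this fix
would be expected to reject the guarded line.

```cpp
// Returns 42 in both configurations.
int decay_copy_demo()
{
#if defined(__cpp_auto_cast)
  // C++23: auto(42) is a cast to a prvalue, so this declares an int variable
  // initialized to 42, not a function with an implicit template parameter.
  int y(auto(42));
#else
  // Earlier modes: the equivalent plain initialization.
  int y(42);
#endif
  return y;
}
```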
PR c++/112410
gcc/cp/ChangeLog:
* parser.cc (cp_parser_direct_declarator): Maybe call
abort_fully_implicit_template if it turned out the parameter list was
ill-formed.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-fncast13.C: New test.
* g++.dg/cpp23/auto-fncast14.C: New test.
Hongyu Wang [Thu, 9 Nov 2023 05:11:41 +0000 (13:11 +0800)]
[i386] APX: Fix EGPR usage in several patterns.
vextract/insert{if}128 cannot use EGPR in their memory operand, so all
related patterns should be adjusted to disable EGPR usage for them.
Also fix a wrong gpr16 attr for insertps.
gcc/ChangeLog:
* config/i386/sse.md (vec_extract_hi_<mode>): Add noavx512vl
alternative with attr addr gpr16 and "jm" constraint.
(vec_extract_hi_<mode>): Likewise for SF vector modes.
(@vec_extract_hi_<mode>): Likewise.
(*vec_extractv2ti): Likewise.
(vec_set_hi_<mode><mask_name>): Likewise.
* config/i386/mmx.md (@sse4_1_insertps_<mode>): Correct gpr16 attr for
each alternative.
The patch introduces strict_low_part QImode insn patterns with both of
their input arguments extracted from a high register. This invalid
insn is split after reload into a lowpart insert from the high register
and a <insn>qi_ext<mode>_1_slp instruction.
PR target/78904
gcc/ChangeLog:
* config/i386/i386.md (*movstrictqi_ext<mode>_1): New insn pattern.
(*addqi_ext<mode>_2_slp): New define_insn_and_split pattern.
(*subqi_ext<mode>_2_slp): Ditto.
(*<any_logic:code>qi_ext<mode>_2_slp): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr78904-8.c: New test.
* gcc.target/i386/pr78904-8a.c: New test.
* gcc.target/i386/pr78904-8b.c: New test.
* gcc.target/i386/pr78904-9.c: New test.
* gcc.target/i386/pr78904-9a.c: New test.
* gcc.target/i386/pr78904-9b.c: New test.
Mark Wielaard [Wed, 15 Nov 2023 19:27:08 +0000 (20:27 +0100)]
Regenerate libiberty/aclocal.m4 with aclocal 1.15.1
There is a new buildbot check that all autotool files are generated
with the correct versions (automake 1.15.1 and autoconf 2.69).
https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen
Correct one file that was generated with the wrong version.
Patrick O'Neill [Tue, 14 Nov 2023 23:08:31 +0000 (15:08 -0800)]
RISC-V: Fix ICE in non-canonical march parsing
Passing in a base extension in non-canonical order (i, e, g) causes GCC
to ICE:
xgcc: error: '-march=rv64ge': ISA string is not in canonical order. 'e'
xgcc: internal compiler error: in add, at common/config/riscv/riscv-common.cc:671
...
This is fixed by skipping to the next extension when a non-canonical
order is detected.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_std_ext): Emit an error and skip to
the next extension when a non-canonical ordering is detected.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-27.c: New test.
* gcc.target/riscv/arch-28.c: New test.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
r14-985-gca2007a9bb3074 used the collapsed macro definition
CAN_HAVE_LOCATION_P in gcc-rich-location.cc and r14-977-g8861c80733da5c
in c++'s build_cplus_array_type ().
However, although otherwise correct, the usage of CAN_HAVE_LOCATION_P
in these two spots is misleading, so this patch reverts the aforementioned
two hunks.
gcc/cp/ChangeLog:
* tree.cc (build_cplus_array_type): Revert using the macro
CAN_HAVE_LOCATION_P.
gcc/ChangeLog:
* gcc-rich-location.cc (maybe_range_label_for_tree_type_mismatch::get_text):
Revert using the macro CAN_HAVE_LOCATION_P.
We should insert the earliest fusion first, since it is vsetvl information
that was already there and was seen by the later LCM; we just delayed the
insertion. So it should come before the LCM-suggested insertion.
PR target/112447
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Insert
local vsetvl info before LCM suggested one.
Vineet Gupta [Wed, 25 Oct 2023 03:38:49 +0000 (20:38 -0700)]
RISC-V: elide unnecessary sign extend when expanding cmp_and_jump
RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming function args
which ABI/ISA guarantee to be sign-extended already (this is true for
SI, HI, QI operands).
Subsequently, REE fails to eliminate them, reporting
"missing definition(s)" or "multiple definition(s)",
since function args don't have an explicit definition.
So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
potentially can also help there.
Note there are currently patches floating around to improve REE and also a
new pass to eliminate unnecessary extensions, but it is still beneficial
to not generate those extra extensions in the first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.
Way too many existing tests used to observe this issue.
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
It eliminates the SEXT.W
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
(riscv_extend_comparands): Call new function on operands.
Patrick Palka [Wed, 15 Nov 2023 17:24:38 +0000 (12:24 -0500)]
c++: direct enum init from type-dep elt [PR112515]
The NON_DEPENDENT_EXPR removal exposed that is_direct_enum_init can be
called in a template context on a CONSTRUCTOR that isn't type-dependent
but whose element is.
PR c++/112515
gcc/cp/ChangeLog:
* decl.cc (is_direct_enum_init): Check type-dependence of the
single element.
Patrick Palka [Wed, 15 Nov 2023 17:17:55 +0000 (12:17 -0500)]
c++: partially inst requires-expr in noexcept-spec [PR101043]
Here we're ICEing from strip_typedefs for the partially instantiated
requires-expression when walking its REQUIRES_EXPR_EXTRA_ARGS which
in this case is a TREE_LIST with non-empty TREE_PURPOSE (to hold the
captured local specialization 't' as per build_extra_args) which
strip_typedefs doesn't expect.
We can probably skip walking REQUIRES_EXPR_EXTRA_ARGS at all since it
shouldn't contain any typedefs in the first place, but it seems safer
and more generally useful to just teach strip_typedefs to handle non-empty
TREE_PURPOSE the obvious way. (The code has asserted that TREE_PURPOSE is
empty ever since its inception, i.e. r189298.)
Patrick Palka [Wed, 15 Nov 2023 17:10:16 +0000 (12:10 -0500)]
c++: non-dependent .* operand folding [PR112427]
Here when building up the non-dependent .* expression, we crash from
fold_convert on 'b.a' due to this (templated) COMPONENT_REF having an
IDENTIFIER_NODE instead of FIELD_DECL operand that middle-end routines
expect. Like in r14-4899-gd80a26cca02587, this patch fixes this by
replacing the problematic piecemeal folding with a single call to
cp_fully_fold. Also, don't bother building the POINTER_PLUS_EXPR in a
template context. This means the returned non-dependent tree might not
have TREE_SIDE_EFFECTS set when it used to, so we need to compensate
by making build_min_non_dep propagate TREE_SIDE_EFFECTS from the original
arguments like buildN and build_min do.
PR c++/112427
gcc/cp/ChangeLog:
* tree.cc (build_min_non_dep): Propagate TREE_SIDE_EFFECTS from
the original arguments.
(build_min_non_dep_call_vec): Likewise.
* typeck2.cc (build_m_component_ref): Use cp_convert, build2 and
cp_fully_fold instead of fold_build_pointer_plus and fold_convert.
Don't build the POINTER_PLUS_EXPR in a template context.
Patrick Palka [Wed, 15 Nov 2023 17:03:16 +0000 (12:03 -0500)]
c++: constantness of local var in constexpr fn [PR111703, PR112269]
potential_constant_expression was incorrectly treating most local
variables from a constexpr function as constant because it wasn't
considering the 'now' parameter. This patch fixes this by relaxing
its var_in_maybe_constexpr_fn checks accordingly, which turns out to
partially fix two recently reported regressions:
PR111703 is a regression caused by r11-550-gf65a3299a521a4 for restricting
constexpr evaluation during warning-dependent folding. The mechanism is
intended to restrict only constant evaluation of the instantiated
non-dependent expression, but it also ends up restricting constant
evaluation occurring during instantiation of the expression, in particular
when instantiating the converted argument 'x' (a VIEW_CONVERT_EXPR) into
a copy constructor call. This seems like a flaw in the mechanism, though
I don't know if we want to fix the mechanism or get rid of it completely
since the original testcases which motivated the mechanism are fixed more
simply by r13-1225-gb00b95198e6720. In any case, this patch partially
fixes this by making us correctly treat 'x' as non-constant which prevents
the problematic warning-dependent folding from occurring at all.
PR112269 is caused by r14-4796-g3e3d73ed5e85e7 for merging tsubst_copy
into tsubst_copy_and_build. tsubst_copy used to exit early when 'args'
was empty, behavior which that commit deliberately didn't preserve.
This early exit masked the fact that COMPLEX_EXPR wasn't handled by
tsubst at all, and is a tree code that apparently we could see during
warning-dependent folding on some targets. A complete fix is to add
handling for this tree code in tsubst_expr, but this patch should fix
the reported testsuite failures since the COMPLEX_EXPRs that crop up
in <complex> are considered non-constant expressions after this patch.
PR c++/111703
PR c++/112269
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case VAR_DECL>:
Only consider var_in_maybe_constexpr_fn if 'now' is false.
<case INDIRECT_REF>: Likewise.
Roger Sayle [Wed, 7 Jun 2023 23:09:00 +0000 (00:09 +0100)]
Update nvptx's bitrev<mode>2 pattern to use BITREVERSE rtx.
This minor tweak to the nvptx backend switches the representation
of the brev instruction from an UNSPEC to instead use the new BITREVERSE
rtx. This allows various RTL optimizations including evaluation (constant
folding) of integer constant arguments at compile-time.
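For reference, a portable sketch of a 32-bit bit reversal, which is the
operation brev performs. The name bitreverse32 is an assumption of this
sketch and does not exist in the backend. The static_assert stands in for
the compile-time evaluation mentioned above, although here it is the C++
front end that does the folding rather than the RTL optimizers. It needs
C++14 or later for the loop in a constexpr function.

```cpp
#include <cstdint>

// Reverses the order of the 32 bits in x (bit 0 becomes bit 31, and so on).
constexpr std::uint32_t bitreverse32(std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      r = (r << 1) | (x & 1u);  // append the current lowest bit of x to r
      x >>= 1;
    }
  return r;
}

// With a constant argument the result is known at compile time.
static_assert(bitreverse32(0x00000001u) == 0x80000000u,
              "bit 0 moves to bit 31");
```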
Thomas Schwinge [Mon, 4 Sep 2023 21:06:27 +0000 (23:06 +0200)]
nvptx: Extend 'brev' test cases
In order to observe effects of a later patch, extend the 'brev' test cases
added in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".
gcc/testsuite/
* gcc.target/nvptx/brev-1.c: Extend.
* gcc.target/nvptx/brev-2.c: Rename to...
* gcc.target/nvptx/brev-2-O2.c: ... this, and extend. Copy to...
* gcc.target/nvptx/brev-2-O0.c: ... this, and adapt for '-O0'.
* gcc.target/nvptx/brevll-1.c: Extend.
* gcc.target/nvptx/brevll-2.c: Rename to...
* gcc.target/nvptx/brevll-2-O2.c: ... this, and extend. Copy to...
* gcc.target/nvptx/brevll-2-O0.c: ... this, and adapt for '-O0'.
Andrew Stubbs [Tue, 3 Oct 2023 13:03:49 +0000 (14:03 +0100)]
amdgcn: Add Accelerator VGPR registers
Add the new CDNA register file. We don't support any of the specialized
instructions that use these registers, but they're useful to relieve
register pressure without spilling to stack.
Co-authored-by: Andrew Jenner <andrew@codesourcery.com>
gcc/ChangeLog:
* gcc.target/gcn/avgpr-mem-double.c: New test.
* gcc.target/gcn/avgpr-mem-int.c: New test.
* gcc.target/gcn/avgpr-mem-long.c: New test.
* gcc.target/gcn/avgpr-mem-short.c: New test.
* gcc.target/gcn/avgpr-spill-double.c: New test.
* gcc.target/gcn/avgpr-spill-int.c: New test.
* gcc.target/gcn/avgpr-spill-long.c: New test.
* gcc.target/gcn/avgpr-spill-short.c: New test.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (max_isa_vgprs): New.
(run_kernel): CDNA2 devices have more VGPRs.
Andrew Stubbs [Fri, 6 Oct 2023 10:14:05 +0000 (11:14 +0100)]
amdgcn: simplify secondary reload patterns
Remove some unnecessary complexity; no functional change is intended,
although LRA appears to use the constraints from the reload_in/out
patterns, so it's probably an improvement for it to see the real sgprbase
constraints.
Richard Biener [Wed, 15 Nov 2023 11:24:46 +0000 (12:24 +0100)]
tree-optimization/112282 - wrong-code with ifcvt hoisting
The following avoids hoisting of invariants from conditionally
executed parts of an if-converted loop. That now makes a difference
since we perform bitfield lowering even when we do not actually
if-convert the loop. if-conversion deals with resetting flow-sensitive
info when necessary already.
PR tree-optimization/112282
* tree-if-conv.cc (ifcvt_hoist_invariants): Only hoist from
the loop header.
So that we don't have to bump libubsan.so.1 SONAME, the following patch
reverts part of the changes which removed two handlers. While we don't
actually use them from GCC, we shouldn't remove supported entrypoints
unless SONAME is changed (removal of __interceptor_* or ___interceptor_*
is fine). This is the only removal, other libraries just added some
symbols.
2023-11-15 Jakub Jelinek <jakub@redhat.com>
* ubsan/ubsan_handlers_cxx.h (FunctionTypeMismatchData): Forward
declare.
(__ubsan_handle_function_type_mismatch_v1,
__ubsan_handle_function_type_mismatch_v1_abort): Declare.
* ubsan/ubsan_handlers_cxx.cpp (handleFunctionTypeMismatch,
__ubsan_handle_function_type_mismatch_v1,
__ubsan_handle_function_type_mismatch_v1_abort): New functions readded
for backwards compatibility from older ubsan.
* ubsan/ubsan_interface.inc (__ubsan_handle_function_type_mismatch_v1,
__ubsan_handle_function_type_mismatch_v1_abort): Readd.
Jakub Jelinek [Wed, 15 Nov 2023 11:48:20 +0000 (12:48 +0100)]
libsanitizer: Adjust the asan/sanity-check-pure-c-1.c test
The updated libasan doesn't print __interceptor_free (or __interceptor_malloc)
but free (or malloc), the following patch adjusts the testcase so that it
accepts it.
2023-11-15 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/asan/sanity-check-pure-c-1.c: Adjust for interceptor_
or wrap_ substrings possibly not being emitted in newer libasan.
The following patch is the result of libsanitizer/merge.sh
from c425db2eb558c263 (yesterday evening).
Bootstrapped/regtested on x86_64-linux and i686-linux (together with
the follow-up 3 patches I'm about to post).
BTW, seems upstream has added riscv64 support for I think lsan/tsan,
so if anyone is willing to try it there, it would be a matter of
copying e.g. the s390*-*-linux* libsanitizer/configure.tgt entry
to riscv64-*-linux* with the obvious s/s390x/riscv64/ change in it.
But the compare and swap operation is allowed to fail, and if it fails
the SC instruction is not executed, thus the guarantee of acquiring
semantics cannot be ensured. Therefore, an acquire barrier needs to be
generated when failure_memorder includes an acquire operation.
On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an
acquire barrier; on CPUs implementing LoongArch v1.00, it is a full
barrier. So it's always enough for acquire semantics. OTOH if
acquire semantics are not needed, we still need the "dbar 0x700" as the
load-load barrier like all LL-SC loops.
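At the source level, the case described here corresponds to a
compare-and-swap whose failure order includes acquire. A minimal sketch
using std::atomic; the function name try_claim is hypothetical and the
choice of acq_rel for the success order is only for illustration.

```cpp
#include <atomic>

// Tries to change slot from expected to desired.  When the comparison fails
// no store happens, but the load must still have acquire semantics because
// the failure order passed below is memory_order_acquire.
bool try_claim(std::atomic<int>& slot, int expected, int desired)
{
  return slot.compare_exchange_strong(expected, desired,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire);
}
```

The second order argument is what the message calls failure_memorder.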
* config/loongarch/loongarch.cc
(loongarch_memmodel_needs_release_fence): Remove.
(loongarch_cas_failure_memorder_needs_acquire): New static
function.
(loongarch_print_operand): Redefine 'G' for the barrier on CAS
failure.
* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
Remove the redundant barrier before the LL instruction, and
emit an acquire barrier on failure if needed by
failure_memorder.
(atomic_cas_value_cmp_and_7_<mode>): Likewise.
(atomic_cas_value_add_7_<mode>): Remove the unnecessary barrier
before the LL instruction.
(atomic_cas_value_sub_7_<mode>): Likewise.
(atomic_cas_value_and_7_<mode>): Likewise.
(atomic_cas_value_xor_7_<mode>): Likewise.
(atomic_cas_value_or_7_<mode>): Likewise.
(atomic_cas_value_nand_7_<mode>): Likewise.
(atomic_cas_value_exchange_7_<mode>): Likewise.
The Xmethod for std::deque::operator[] has the same bug that I recently
fixed for the std::deque::size() Xmethod. The first node might have
unused capacity at the start, which needs to be accounted for when
indexing into the deque.
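The index arithmetic can be sketched as follows. This is a C++ model for
illustration only, not the Python code in xmethods.py; locate, first_offset
and node_size are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Maps a logical deque index to (node number, slot within that node).
// first_offset is the number of unused slots at the start of the first node.
std::pair<std::size_t, std::size_t>
locate(std::size_t index, std::size_t first_offset, std::size_t node_size)
{
  assert(node_size != 0 && first_offset < node_size);
  const std::size_t pos = index + first_offset;  // skip the unused slots
  return { pos / node_size, pos % node_size };
}
```

With 8 slots per node and 3 unused slots at the front, element 0 lives in
slot 3 of node 0 and element 5 lives in slot 0 of node 1. Ignoring
first_offset would give slot 5 of node 0 for the latter.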
libstdc++-v3/ChangeLog:
PR libstdc++/112491
* python/libstdcxx/v6/xmethods.py (DequeWorkerBase.index):
Correctly handle unused capacity at the start of the first node.
* testsuite/libstdc++-xmethods/deque.cc: Check index operator
when elements have been removed from the front.
Jonathan Wakely [Wed, 15 Nov 2023 09:17:49 +0000 (09:17 +0000)]
libstdc++: std::stacktrace tweaks
Fix a typo in a string literal and make the new hash.cc test gracefully
handle missing stacktrace data (see PR 112541).
libstdc++-v3/ChangeLog:
* include/std/stacktrace (basic_stacktrace::at): Fix class name
in exception message.
* testsuite/19_diagnostics/stacktrace/hash.cc: Do not fail if
current() returns a non-empty stacktrace.
Richard Earnshaw [Wed, 15 Nov 2023 10:30:15 +0000 (10:30 +0000)]
arm: testsuite: fix test for armv6t2 hardware
My previous patch series added a new function to check for armv6t2
compatible hardware. But the test was not correctly implemented and
also did not follow the standard naming convention for Arm hw
compatibility tests. Fix both of these issues.
* config/riscv/riscv-v.cc (expand_vector_init_trailing_same_elem): New function.
(expand_vec_init): Add trailing optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: Add trailing tests.
* gcc.target/riscv/rvv/autovec/vls-vlmax/trailing-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/trailing-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/trailing_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/trailing_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/trailing-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/trailing-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/trailing-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/trailing-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/trailing-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/trailing-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/trailing-7.c: New test.
Pan Li [Wed, 15 Nov 2023 03:30:51 +0000 (11:30 +0800)]
RISC-V: Refine the mask generation for vec_init case 2
Update in v2:
1. Add more test cases for fixed-vlmax.
2. Add test cases for vls mode.
Original log:
We use the vec_init element int mode when generating the mask for
case 2. But we don't actually need as many bits as the element has.
The unnecessarily large mode may introduce some unnecessary insns.
For example, given the code below:
void __attribute__ ((noinline, noclone))
foo (int64_t *out, int64_t x, int64_t y)
{
v16di v = {y, x, y, x, y, x, y, x, y, x, y, x, y, x, y, x};
*(v16di *) out = v;
}
We will use VDImode when generating the 0b0101010101010101 mask, but
VHImode is actually good enough here. This patch refines
the mask generation to avoid:
1. Unnecessary scalar insns to generate a big constant mask.
2. An unnecessary vector insn to move the mask to v0.
Before this patch:
foo:
li a5,-1431654400
li a4,-1431654400 <== unnecessary insn
addi a5,a5,-1365 <== unnecessary insn
addi a4,a4,-1366
slli a5,a5,32 <== unnecessary insn
add a5,a5,a4 <== unnecessary insn
vsetivli zero,16,e64,m8,ta,ma
vmv.v.x v8,a2
vmv.s.x v16,a5
vmv1r.v v0,v16 <== unnecessary insn
vmerge.vxm v8,v8,a1,v0
vse64.v v8,0(a0)
ret
After this patch:
foo:
li a5,-20480
addiw a5,a5,-1366
vsetivli zero,16,e64,m8,ta,ma
vmv.s.x v0,a5
vmv.v.x v8,a2
vmerge.vxm v8,v8,a1,v0
vs8r.v v8,0(a0)
ret
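The arithmetic behind the smaller constant can be sketched as follows. This
is not the code in riscv-v.cc: alternating_mask and mask_int_bits are
hypothetical helpers, and the convention that bit i is set when element i
takes the merged scalar is an assumption chosen to match the constants in
the listings (-20480 + -1366 = -21846, which is 0xAAAA as a 16-bit value).

```cpp
#include <cassert>
#include <cstdint>

// Builds the merge mask for an n-element vector whose elements alternate
// between two scalars; bit i is set for the odd-numbered elements.
std::uint64_t alternating_mask(unsigned n)
{
  assert(n <= 64);
  std::uint64_t mask = 0;
  for (unsigned i = 1; i < n; i += 2)
    mask |= std::uint64_t(1) << i;
  return mask;
}

// Smallest integer width (8, 16, 32 or 64 bits) that can hold n mask bits.
unsigned mask_int_bits(unsigned n)
{
  assert(n >= 1 && n <= 64);
  unsigned bits = 8;
  while (bits < n)
    bits *= 2;
  return bits;
}
```

For the 16-element example a 16-bit integer holds the whole mask, so there
is no need to materialize the same bit pattern repeated across 64 bits.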
gcc/ChangeLog:
* config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask):
Add inner_mode mask arg for mask int mode.
(get_repeating_sequence_dup_machine_mode): Add mask_bit_mode arg
to get the good enough vector int mode on precision.
(expand_vector_init_merge_repeating_sequence): Pass required args
to above func.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-15.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-9.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-repeat-sequence-8.c: New test.
Jakub Jelinek [Wed, 15 Nov 2023 07:27:07 +0000 (08:27 +0100)]
c++: Implement C++26 P2864R2 - Remove Deprecated Arithmetic Conversion on Enumerations From C++26
The following patch implements C++26 P2864R2 by emitting pedwarn enabled by
the same options as the C++20 and later warnings (i.e. -Wenum-compare,
-Wdeprecated-enum-enum-conversion and -Wdeprecated-enum-float-conversion
which are all enabled by default). I think we still want to give users
an option to work around this, so I am not emitting a hard error directly.
Additionally, for cxx_dialect >= cxx26 && (complain & tf_warning_or_error) == 0
the patch silently returns error_mark_node for these newly ill-formed constructs.
2023-11-15 Jakub Jelinek <jakub@redhat.com>
gcc/cp/
* typeck.cc: Implement C++26 P2864R2 - Remove Deprecated Arithmetic
Conversion on Enumerations From C++26.
(do_warn_enum_conversions): Return bool rather than void, add COMPLAIN
argument. Use pedwarn rather than warning_at for C++26 and remove
" is deprecated" part of the diagnostics in that case. For SFINAE
in C++26 return true on newly erroneous cases.
(cp_build_binary_op): For C++26 call do_warn_enum_conversions
unconditionally, pass complain argument to it and if it returns true,
return error_mark_node.
* call.cc (build_conditional_expr): Use pedwarn rather than warning_at
for C++26 and remove " is deprecated" part of the diagnostics in that
case and check for complain & tf_warning_or_error. Use emit_diagnostic
with cxx_dialect >= cxx26 ? DK_PEDWARN : DK_WARNING. For SFINAE in
C++26 return error_mark_node on newly erroneous cases.
(build_new_op): Use emit_diagnostic with cxx_dialect >= cxx26
? DK_PEDWARN : DK_WARNING and complain & tf_warning_or_error check
for C++26. For SFINAE in C++26 return error_mark_node on newly
erroneous cases.
gcc/testsuite/
* g++.dg/cpp26/enum-conv1.C: New test.
* g++.dg/cpp2a/enum-conv1.C: Adjust expected diagnostics in C++26.
* g++.dg/diagnostic/enum3.C: Likewise.
* g++.dg/parse/attr3.C: Likewise.
* g++.dg/cpp0x/linkage2.C: Likewise.
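As an illustration of the constructs P2864R2 affects, here is a minimal sketch. The enumerations Color and Width and the helper functions are hypothetical and do not come from the patch or its tests. The affected forms are shown only in comments so that the block compiles in every dialect; the functions show one portable rewrite using explicit casts.

```cpp
#include <cassert>

// Hypothetical enumerations, for illustration only.
enum Color { red = 1, green = 2 };
enum Width { narrow = 10, wide = 20 };

// Forms deprecated since C++20 and ill-formed in C++26 (kept as comments):
//   int    a = red + narrow;    // arithmetic on two different enumeration types
//   double b = green * 1.5;     // arithmetic on an enumeration and a floating-point type
//   bool   c = red == narrow;   // comparison of two different enumeration types

// Portable rewrite: convert each enumerator explicitly before the arithmetic.
constexpr int sum(Color c, Width w)
{
  return static_cast<int>(c) + static_cast<int>(w);
}

constexpr double scale(Color c, double factor)
{
  return static_cast<int>(c) * factor;
}

// Small self-check of the rewritten forms.
inline void check_enum_rewrite()
{
  assert(sum(red, narrow) == 11);
  assert(scale(green, 1.5) == 3.0);
}
```

With the explicit casts the operands are plain int and double, so none of the three options listed above should apply.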
Alexandre Oliva [Wed, 15 Nov 2023 01:16:29 +0000 (22:16 -0300)]
testsuite: tsan: add fallback overload for pthread_cond_clockwait
LTS GNU/Linux distros from 2018, still in use, don't have
pthread_cond_clockwait. There's no trivial way to detect it so as to
make the test conditional, but there's an easy enough way to silence
the fail due to lack of the function in libc, and that has nothing to
do with the false positive that this is testing against.
gcc.target/i386/pr95126-m32-[34].c expect push instructions that are
only present with -mno-accumulate-outgoing-args, so make that option
explicit rather than dependent on tuning.
Alexandre Oliva [Wed, 15 Nov 2023 01:15:29 +0000 (22:15 -0300)]
libstdc++: bvector: undef always_inline macro
It's customary to undefine temporary internal macros at the end of the
header that defines them, even such widely-usable ones as
_GLIBCXX_ALWAYS_INLINE, so do so in the header where the define was
recently introduced.
Lewis Hyatt [Fri, 10 Nov 2023 16:10:18 +0000 (11:10 -0500)]
c-family: Let libcpp know when the compilation is for a PCH [PR9471]
libcpp will generate diagnostics when it encounters things in the main file
that only belong in a header file, such as `#pragma once' or `#pragma GCC
system_header'. But sometimes the main file is a header file that is just
being compiled separately, e.g. to produce a C++ module or a PCH, in which
case such diagnostics should be suppressed. libcpp already has an interface
to request that, so make use of it in the C frontends to prevent libcpp from
issuing unwanted diagnostics when compiling a PCH.
gcc/c-family/ChangeLog:
PR pch/9471
PR pch/47857
* c-opts.cc (c_common_post_options): Set cpp_opts->main_search
so libcpp knows it is compiling a header file separately.
gcc/testsuite/ChangeLog:
PR pch/9471
PR pch/47857
* g++.dg/pch/main-file-warnings.C: New test.
* g++.dg/pch/main-file-warnings.Hs: New test.
* gcc.dg/pch/main-file-warnings.c: New test.
* gcc.dg/pch/main-file-warnings.hs: New test.
The current implementation calls __detail::__modulo which is relatively
expensive.
A better implementation is possible if we assume that both x.ok() and y.ok()
are true, so that n = x.c_encoding() - y.c_encoding() is in [-6, 6]. In this
case, it suffices to return n >= 0 ? n : n + 7.
The above is allowed by [time.cal.wd.nonmembers]/5: the returned value is
unspecified unless both x.ok() and y.ok() are true.
The assembly emitted for x86-64 and ARM can be seen in:
https://godbolt.org/z/nMdc5vv9n.
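The reasoning above can be checked with a small sketch. The function names are hypothetical and the encodings are passed as plain ints rather than std::chrono::weekday objects; the precondition 0 <= x, y <= 6 stands in for x.ok() && y.ok().

```cpp
#include <cassert>

// Branch-only version: valid only when both encodings are in [0, 6].
constexpr int weekday_diff(int x_enc, int y_enc)
{
  int n = x_enc - y_enc;       // lies in [-6, 6] under the precondition
  return n >= 0 ? n : n + 7;   // fold negatives into [0, 6] without a modulo
}

// Reference version using a general floor-modulo, valid for any input.
constexpr int weekday_diff_ref(int x_enc, int y_enc)
{
  int n = (x_enc - y_enc) % 7;
  return n < 0 ? n + 7 : n;
}

// Exhaustive comparison over all valid pairs of encodings.
inline void check_weekday_diff()
{
  for (int x = 0; x <= 6; ++x)
    for (int y = 0; y <= 6; ++y)
      assert(weekday_diff(x, y) == weekday_diff_ref(x, y));
}
```

For example, weekday_diff(0, 6) (Sunday minus Saturday in the C encoding) gives n = -6 and therefore returns 1.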
Cassio Neri [Sat, 11 Nov 2023 22:59:50 +0000 (22:59 +0000)]
libstdc++: Simplify year::is_leap()
The current implementation returns
(_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
where __is_multiple_of_100 is calculated using an obfuscated algorithm which
saves one ror instruction when compared to _M_y % 100 == 0 [1].
In the leap-year calculation, it is correct to replace the divisibility check
by 100 with one by 25. It turns out that _M_y % 25 == 0 also saves the ror
instruction [2]. Therefore, the obfuscation is not required.
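A minimal sketch of why the check by 25 is sufficient, using hypothetical free functions on int instead of the year class: a year divisible by 25 is a leap year only if it is also divisible by 16 (that is, by 400), and any other year is a leap year exactly when it is divisible by 4.

```cpp
#include <cassert>

// Simplified form described above: choose the mask from divisibility by 25.
constexpr bool is_leap_simplified(int y)
{
  return (y & (y % 25 == 0 ? 15 : 3)) == 0;
}

// Textbook Gregorian rule, used as the reference.
constexpr bool is_leap_textbook(int y)
{
  return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

// Compare both forms over the range of years representable by std::chrono::year.
inline void check_is_leap()
{
  for (int y = -32767; y <= 32767; ++y)
    assert(is_leap_simplified(y) == is_leap_textbook(y));
}
```

The mask test on negative years relies on two's complement representation, which is what GCC targets use.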
Cassio Neri [Sat, 11 Nov 2023 16:44:58 +0000 (16:44 +0000)]
libstdc++: Remove unnecessary "& 1" from year_month_day_last::day()
When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.
Jonathan Wakely [Tue, 14 Nov 2023 10:56:57 +0000 (10:56 +0000)]
libstdc++: Fix <charconv> uses of signed types with <bit> functions
In <charconv> we pass the int __base parameter to our internal versions
of <bit> functions, __bit_width and __countr_zero. Those functions are
only defined for unsigned types, so we need to convert the base to
unsigned. The base must be in the range [2,36] so we can mask off the
low bits and then convert that to unsigned, so that we don't need to
care about negative values becoming large unsigned values.
libstdc++-v3/ChangeLog:
* include/std/charconv (__from_chars_pow2_base): Convert base to
unsigned for call to __countr_zero.
(__from_chars_alnum): Likewise for call to __bit_width.
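The masking idea can be sketched as follows. The mask value 0x3f and both function names are assumptions made for this illustration, not taken from the patch; bit_width_u is a portable stand-in for an internal function that is only defined for unsigned types.

```cpp
#include <cassert>

// Portable stand-in for a bit-width function defined only for unsigned types.
constexpr int bit_width_u(unsigned x)
{
  int w = 0;
  while (x != 0) { ++w; x >>= 1; }
  return w;
}

// A valid base is in [2, 36] and so fits in six bits. Masking with 0x3f
// (assumed value) leaves valid bases unchanged and bounds any invalid,
// possibly negative, value to [0, 63] before the conversion to unsigned.
constexpr unsigned base_to_unsigned(int base)
{
  return static_cast<unsigned>(base & 0x3f);
}

inline void check_base_mask()
{
  for (int base = 2; base <= 36; ++base)
    assert(base_to_unsigned(base) == static_cast<unsigned>(base));
  assert(base_to_unsigned(-1) == 63u);  // small, not a huge unsigned value
}
```

Without the mask, converting -1 directly would give the largest unsigned value, which is the case the commit message wants to avoid caring about.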
David Malcolm [Tue, 14 Nov 2023 20:51:52 +0000 (15:51 -0500)]
analyzer: enable taint state machine by default [PR103533]
gcc/analyzer/ChangeLog:
PR analyzer/103533
* sm-taint.cc: Remove "experimental" from comment.
* sm.cc (make_checkers): Always add taint state machine.
gcc/ChangeLog:
PR analyzer/103533
* doc/invoke.texi (Static Analyzer Options): Add the six
-Wanalyzer-tainted-* warnings. Update documentation of each
warning to reflect removed requirement to use
-fanalyzer-checker=taint. Remove discussion of
-fanalyzer-checker=taint.
gcc/testsuite/ChangeLog:
PR analyzer/103533
* c-c++-common/analyzer/attr-tainted_args-1.c: Remove use of
-fanalyzer-checker=taint.
* c-c++-common/analyzer/fread-1.c: Likewise.
* c-c++-common/analyzer/pr104029.c: Likewise.
* gcc.dg/analyzer/pr93032-mztools-signed-char.c: Add params to
work around state explosion.
* gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: Likewise.
* gcc.dg/analyzer/pr93382.c: Remove use of
-fanalyzer-checker=taint.
* gcc.dg/analyzer/switch-enum-taint-1.c: Likewise.
* gcc.dg/analyzer/taint-CVE-2011-2210-1.c: Likewise.
* gcc.dg/analyzer/taint-CVE-2020-13143-1.c: Likewise.
* gcc.dg/analyzer/taint-CVE-2020-13143-2.c: Likewise.
* gcc.dg/analyzer/taint-CVE-2020-13143.h: Likewise.
* gcc.dg/analyzer/taint-alloc-1.c: Likewise.
* gcc.dg/analyzer/taint-alloc-2.c: Likewise.
* gcc.dg/analyzer/taint-alloc-3.c: Likewise.
* gcc.dg/analyzer/taint-alloc-4.c: Likewise.
* gcc.dg/analyzer/taint-alloc-5.c: Likewise.
* gcc.dg/analyzer/taint-assert-BUG_ON.c: Likewise.
* gcc.dg/analyzer/taint-assert-macro-expansion.c: Likewise.
* gcc.dg/analyzer/taint-assert-system-header.c: Likewise.
* gcc.dg/analyzer/taint-assert.c: Likewise.
* gcc.dg/analyzer/taint-divisor-1.c: Likewise.
* gcc.dg/analyzer/taint-divisor-2.c: Likewise.
* gcc.dg/analyzer/taint-merger.c: Likewise.
* gcc.dg/analyzer/taint-ops.c: Delete this test: it was a
duplicate of material in operations.c and data-model-1.c, with
-fanalyzer-checker=taint added.
* gcc.dg/analyzer/taint-read-index-1.c: Remove use of
-fanalyzer-checker=taint.
* gcc.dg/analyzer/taint-read-offset-1.c: Likewise.
* gcc.dg/analyzer/taint-realloc.c: Likewise. Add missing
dg-warning for leak now that the malloc state machine is also
active.
* gcc.dg/analyzer/taint-size-1.c: Remove use of
-fanalyzer-checker=taint.
* gcc.dg/analyzer/taint-size-access-attr-1.c: Likewise.
* gcc.dg/analyzer/taint-write-index-1.c: Likewise.
* gcc.dg/analyzer/taint-write-offset-1.c: Likewise.
* gcc.dg/analyzer/torture/taint-read-index-2.c: Likewise.
* gcc.dg/analyzer/torture/taint-read-index-3.c: Likewise.
* gcc.dg/plugin/taint-CVE-2011-0521-1-fixed.c: Likewise. Add
-Wno-pedantic.
* gcc.dg/plugin/taint-CVE-2011-0521-1.c: Likewise.
* gcc.dg/plugin/taint-CVE-2011-0521-2-fixed.c: Likewise.
* gcc.dg/plugin/taint-CVE-2011-0521-2.c: Likewise.
* gcc.dg/plugin/taint-CVE-2011-0521-3-fixed.c: Likewise.
* gcc.dg/plugin/taint-CVE-2011-0521-3.c: Likewise. Fix C++-style
comment.
* gcc.dg/plugin/taint-CVE-2011-0521-4.c: Remove use of
-fanalyzer-checker=taint and add -Wno-pedantic. Remove xfail and
add missing dg-warning.
* gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c: Remove use of
-fanalyzer-checker=taint and add -Wno-pedantic.
* gcc.dg/plugin/taint-CVE-2011-0521-5.c: Likewise.
* gcc.dg/plugin/taint-CVE-2011-0521-6.c: Likewise.
* gcc.dg/plugin/taint-antipatterns-1.c: Remove use of
-fanalyzer-checker=taint.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 14 Nov 2023 19:02:10 +0000 (14:02 -0500)]
diagnostics: make option-handling callbacks private
No functional change intended.
gcc/c-family/ChangeLog:
* c-warn.cc (conversion_warning): Update call to
global_dc->m_option_enabled to use option_enabled_p.
gcc/cp/ChangeLog:
* decl.cc (finish_function): Update call to
global_dc->m_option_enabled to use option_enabled_p.
gcc/ChangeLog:
* diagnostic-format-json.cc
(json_output_format::on_end_diagnostic): Update calls to m_context
callbacks to use member functions; tighten up scopes.
* diagnostic-format-sarif.cc (sarif_builder::make_result_object):
Likewise.
(sarif_builder::make_reporting_descriptor_object_for_warning):
Likewise.
* diagnostic.cc (diagnostic_context::initialize): Update for
callbacks being moved into m_option_callbacks and being renamed.
(diagnostic_context::set_option_hooks): New.
(diagnostic_option_classifier::classify_diagnostic): Update call
to global_dc->m_option_enabled to use option_enabled_p.
(diagnostic_context::print_option_information): Update calls to
m_context callbacks to use member functions; tighten up scopes.
(diagnostic_context::diagnostic_enabled): Likewise.
* diagnostic.h (diagnostic_option_enabled_cb): New typedef.
(diagnostic_make_option_name_cb): New typedef.
(diagnostic_make_option_url_cb): New typedef.
(diagnostic_context::option_enabled_p): New.
(diagnostic_context::make_option_name): New.
(diagnostic_context::make_option_url): New.
(diagnostic_context::set_option_hooks): New decl.
(diagnostic_context::m_option_enabled): Rename to
m_option_enabled_cb and move within m_option_callbacks, using
typedef.
(diagnostic_context::m_option_state): Move within
m_option_callbacks.
(diagnostic_context::m_option_name): Rename to
m_make_option_name_cb and move within m_option_callbacks, using
typedef.
(diagnostic_context::m_get_option_url): Likewise, renaming to
m_make_option_url_cb.
* lto-wrapper.cc (print_lto_docs_link): Update call to m_context
callback to use member function.
(main): Use diagnostic_context::set_option_hooks.
* opts-diagnostic.h (option_name): Make context param const.
(get_option_url): Likewise.
* opts.cc (option_name): Likewise.
(get_option_url): Likewise.
* toplev.cc (general_init): Use
diagnostic_context::set_option_hooks.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 14 Nov 2023 19:01:55 +0000 (14:01 -0500)]
diagnostics: make m_text_callbacks private
No functional change intended.
gcc/ChangeLog:
* diagnostic-show-locus.cc (diagnostic_context::show_locus):
Update for renaming of text callbacks fields.
* diagnostic.cc (diagnostic_context::initialize): Likewise.
* diagnostic.h (class diagnostic_context): Add "friend" for
accessors to m_text_callbacks.
(diagnostic_context::m_text_callbacks): Make private, and add an
"m_" prefix to field names.
(diagnostic_starter): Convert from macro to inline function.
(diagnostic_start_span): New.
(diagnostic_finalizer): Convert from macro to inline function.
gcc/fortran/ChangeLog:
* error.cc (gfc_diagnostics_init): Use diagnostic_start_span.
Jakub Jelinek [Tue, 14 Nov 2023 17:32:37 +0000 (18:32 +0100)]
libcpp, contrib: Update to Unicode 15.1
The following patch (in plaintext just a pseudo-patch where I've replaced
the too-big parts of the wget-downloaded or regenerated files with
..., full patch attached compressed) updates to Unicode 15.1 from the 15.0
we had last year. Apparently Unicode forgot to add a new range to the 4-8
Table we are using, but from the other files it is clear what should have
been added; I've filed a bug report against Unicode.
2023-11-14 Jakub Jelinek <jakub@redhat.com>
contrib/
* unicode/README: Adjust glibc git commit hash, number of Unicode
data files to be updated and latest Unicode version.
* unicode/from_glibc/utf8_gen.py: Update from glibc.
* unicode/UnicodeData.txt: Update from Unicode 15.1.
* unicode/EastAsianWidth.txt: Likewise.
* unicode/DerivedNormalizationProps.txt: Likewise.
* unicode/NameAliases.txt: Likewise.
* unicode/DerivedCoreProperties.txt: Likewise.
* unicode/PropList.txt: Likewise.
libcpp/
* makeucnid.cc (write_copyright): Update copyright year.
* makeuname2c.cc (write_copyright): Likewise.
(struct generated): Update latest Unicode version.
(generated_ranges): Add 2ebf0-2ee5d CJK UNIFIED IDEOGRAPH
range which was forgotten to be added to 4-8 table, but
clearly is expected to be there from the 15.1 additions.
* ucnid.h: Regenerated.
* uname2c.h: Regenerated.
* generated_cpp_wcwidth.h: Regenerated.
This paper, voted in as a DR, makes some multi-character literals ill-formed.
'abcd' stays valid, but e.g. 'á' is newly invalid in UTF-8 exec charset
while valid e.g. in ISO-8859-1, because it is a single character which needs
2 bytes to be encoded.
The following patch implements that as follows (only pedantically, especially
because it is a DR): when we would emit a -Wmultichar warning because the
character constant has more than one byte in it, we check whether the number
of source characters is equal to the number of bytes in the multichar string.
If it is, it is a normal multi-character literal constant
and is diagnosed normally with -Wmultichar; otherwise at least one of the
c-chars in the sequence was encoded as 2+ bytes and we pedwarn on it.
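The characters-versus-bytes comparison can be sketched like this. Note the assumption: this sketch counts UTF-8 lead bytes directly, which is a swapped-in technique for illustration and not what libcpp does internally; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points in a UTF-8 byte string: every byte that is not a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
inline std::size_t count_code_points(const std::string &utf8)
{
  std::size_t n = 0;
  for (unsigned char c : utf8)
    if ((c & 0xC0) != 0x80)
      ++n;
  return n;
}

// True when every character occupies exactly one byte, i.e. the text is an
// ordinary multi-character literal body rather than a non-encodable one.
inline bool every_char_is_one_byte(const std::string &utf8)
{
  return count_code_points(utf8) == utf8.size();
}

inline void check_char_counts()
{
  assert(every_char_is_one_byte("abcd"));       // 4 characters, 4 bytes
  assert(!every_char_is_one_byte("\xC3\xA1"));  // U+00E1: 1 character, 2 bytes
}
```

In this model "abcd" corresponds to the -Wmultichar case, while the two-byte encoding of U+00E1 corresponds to the newly diagnosed case.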
2023-11-14 Jakub Jelinek <jakub@redhat.com>
PR c++/110341
libcpp/
* charset.cc: Implement C++26 P1854R4 - Making non-encodable string
literals ill-formed.
(one_count_chars, convert_count_chars, count_source_chars): New
functions.
(narrow_str_to_charconst): Change last arg type from cpp_ttype to
const cpp_token *. For C++ if pedantic and i > 1 in CPP_CHAR
interpret token also as CPP_STRING32 and if number of characters
in the CPP_STRING32 is larger than number of bytes in CPP_CHAR,
pedwarn on it. Make the diagnostics more detailed.
(wide_str_to_charconst): Change last arg type from cpp_ttype to
const cpp_token *. Make the diagnostics more detailed.
(cpp_interpret_charconst): Adjust narrow_str_to_charconst and
wide_str_to_charconst callers.
gcc/testsuite/
* g++.dg/cpp26/literals1.C: New test.
* g++.dg/cpp26/literals2.C: New test.
* g++.dg/cpp23/wchar-multi1.C: Adjust expected diagnostic wordings.
* g++.dg/cpp23/wchar-multi2.C: Likewise.
* gcc.dg/c23-utf8char-3.c: Likewise.
* gcc.dg/cpp/charconst-4.c: Likewise.
* gcc.dg/cpp/charconst.c: Likewise.
* gcc.dg/cpp/if-2.c: Likewise.
* gcc.dg/utf16-4.c: Likewise.
* gcc.dg/utf32-4.c: Likewise.
* g++.dg/cpp1z/utf8-neg.C: Likewise.
* g++.dg/cpp2a/ucn2.C: Likewise.
* g++.dg/ext/utf16-4.C: Likewise.
* g++.dg/ext/utf32-4.C: Likewise.
in favor of explicitly using a specific file_cache throughout, and only
using global_dc's file_cache in gcc-specific code.
Rather than creating global_dc's file_cache the first time it is needed,
this patch simply creates one when a diagnostic_context is initialized,
and eliminates diagnostic_file_cache_init.
No functional change intended.
gcc/c-family/ChangeLog:
* c-common.cc (c_get_substring_location): Use global_dc's
file_cache.
* c-format.cc (get_corrected_substring): Likewise.
* c-indentation.cc (get_visual_column): Add file_cache param.
(get_first_nws_vis_column): Likewise.
(detect_intervening_unindent): Likewise.
(should_warn_for_misleading_indentation): Use global_dc's
file_cache.
(assert_get_visual_column_succeeds): Add file_cache param.
(ASSERT_GET_VISUAL_COLUMN_SUCCEEDS): Likewise.
(assert_get_visual_column_fails): Likewise.
(define ASSERT_GET_VISUAL_COLUMN_FAILS): Likewise.
(selftest::test_get_visual_column): Create and use a temporary
file_cache.
gcc/cp/ChangeLog:
* contracts.cc (build_comment): Use global_dc's file_cache.
gcc/ChangeLog:
* diagnostic-format-sarif.cc (sarif_builder::get_sarif_column):
Use m_context's file_cache.
(sarif_builder::maybe_make_artifact_content_object): Likewise.
(sarif_builder::get_source_lines): Likewise.
* diagnostic-show-locus.cc
(exploc_with_display_col::exploc_with_display_col): Add file_cache
param.
(layout::m_file_cache): New field.
(make_range): Add file_cache param.
(selftest::test_layout_range_for_single_point): Create and use a
temporary file_cache.
(selftest::test_layout_range_for_single_line): Likewise.
(selftest::test_layout_range_for_multiple_lines): Likewise.
(layout::layout): Initialize m_file_cache from the context and use it.
(layout::maybe_add_location_range): Use m_file_cache.
(layout::calculate_x_offset_display): Likewise.
(get_affected_range): Add file_cache param.
(get_printed_columns): Likewise.
(line_corrections::line_corrections): Likewise.
(line_corrections::m_file_cache): New field.
(source_line::source_line): Add file_cache param.
(line_corrections::add_hint): Use m_file_cache.
(layout::print_trailing_fixits): Likewise.
(layout::print_line): Likewise.
(selftest::test_layout_x_offset_display_utf8): Create and use a
temporary file_cache.
(selftest::test_layout_x_offset_display_tab): Likewise.
(selftest::test_diagnostic_show_locus_one_liner_utf8): Likewise.
(selftest::test_add_location_if_nearby): Pass global_dc's
file_cache to temp_source_file ctor.
(selftest::test_overlapped_fixit_printing): Create and use a
temporary file_cache.
(selftest::test_overlapped_fixit_printing_utf8): Likewise.
(selftest::test_overlapped_fixit_printing_2): Use dc's file_cache.
* diagnostic.cc (diagnostic_context::initialize): Always create a
file_cache.
(diagnostic_context::initialize_input_context): Assume
m_file_cache has already been created.
(diagnostic_context::create_edit_context): Pass m_file_cache to
edit_context.
(convert_column_unit): Add file_cache param.
(diagnostic_context::converted_column): Use context's file_cache.
(print_parseable_fixits): Add file_cache param.
(diagnostic_context::report_diagnostic): Use context's file_cache.
(selftest::test_print_parseable_fixits_none): Create and use a
temporary file_cache.
(selftest::test_print_parseable_fixits_insert): Likewise.
(selftest::test_print_parseable_fixits_remove): Likewise.
(selftest::test_print_parseable_fixits_replace): Likewise.
(selftest::test_print_parseable_fixits_bytes_vs_display_columns):
Likewise.
* diagnostic.h (diagnostic_context::file_cache_init): Delete.
(diagnostic_context::get_file_cache): Convert return type from
pointer to reference.
* edit-context.cc (edited_file::get_file_cache): New.
(edited_file::m_edit_context): New.
(edit_context::edit_context): Add file_cache param.
(edit_context::get_or_insert_file): Pass this to edited_file's
ctor.
(edited_file::edited_file): Add edit_context param.
(edited_file::print_content): Use get_file_cache.
(edited_file::print_diff_hunk): Likewise.
(edited_file::print_run_of_changed_lines): Likewise.
(edited_file::get_or_insert_line): Likewise.
(edited_file::get_num_lines): Likewise.
(edited_line::edited_line): Pass in file_cache and use it.
(selftest::test_get_content): Create and use a
temporary file_cache.
(selftest::test_applying_fixits_insert_before): Likewise.
(selftest::test_applying_fixits_insert_after): Likewise.
(selftest::test_applying_fixits_insert_after_at_line_end):
Likewise.
(selftest::test_applying_fixits_insert_after_failure): Likewise.
(selftest::test_applying_fixits_insert_containing_newline):
Likewise.
(selftest::test_applying_fixits_growing_replace): Likewise.
(selftest::test_applying_fixits_shrinking_replace): Likewise.
(selftest::test_applying_fixits_replace_containing_newline):
Likewise.
(selftest::test_applying_fixits_remove): Likewise.
(selftest::test_applying_fixits_multiple): Likewise.
(selftest::test_applying_fixits_multiple_lines): Likewise.
(selftest::test_applying_fixits_modernize_named_init): Likewise.
(selftest::test_applying_fixits_modernize_named_init): Likewise.
(selftest::test_applying_fixits_unreadable_file): Likewise.
(selftest::test_applying_fixits_line_out_of_range): Likewise.
(selftest::test_applying_fixits_column_validation): Likewise.
(selftest::test_applying_fixits_column_validation): Likewise.
(selftest::test_applying_fixits_column_validation): Likewise.
(selftest::test_applying_fixits_column_validation): Likewise.
* edit-context.h (edit_context::edit_context): Add file_cache
param.
(edit_context::get_file_cache): New.
(edit_context::m_file_cache): New.
* final.cc: Include "diagnostic.h".
(asm_show_source): Use global_dc's file_cache.
* gcc-rich-location.cc (blank_line_before_p): Add file_cache
param.
(use_new_line): Likewise.
(gcc_rich_location::add_fixit_insert_formatted): Use global dc's
file_cache.
* input.cc (diagnostic_file_cache_init): Delete.
(diagnostic_context::file_cache_init): Delete.
(diagnostics_file_cache_forcibly_evict_file): Delete.
(file_cache::missing_trailing_newline_p): New.
(file_cache::evicted_cache_tab_entry): Don't call
diagnostic_file_cache_init.
(location_get_source_line): Delete.
(get_source_text_between): Add file_cache param.
(get_source_file_content): Delete.
(location_missing_trailing_newline): Delete.
(location_compute_display_column): Add file_cache param.
(dump_location_info): Create and use temporary file_cache.
(get_substring_ranges_for_loc): Add file_cache param.
(get_location_within_string): Likewise.
(get_source_range_for_char): Likewise.
(get_num_source_ranges_for_substring): Likewise.
(selftest::test_reading_source_line): Create and use temporary
file_cache.
(selftest::lexer_test::m_file_cache): New field.
(selftest::assert_char_at_range): Use test.m_file_cache.
(selftest::assert_num_substring_ranges): Likewise.
(selftest::assert_has_no_substring_ranges): Likewise.
(selftest::test_lexer_string_locations_concatenation_2): Likewise.
* input.h (class file_cache): New forward decl.
(location_compute_display_column): Add file_cache param.
(location_get_source_line): Delete.
(get_source_text_between): Add file_cache param.
(get_source_file_content): Delete.
(location_missing_trailing_newline): Delete.
(file_cache::missing_trailing_newline_p): New decl.
(diagnostics_file_cache_forcibly_evict_file): Delete.
* selftest.cc (named_temp_file::named_temp_file): Add file_cache
param.
(named_temp_file::~named_temp_file): Optionally evict the file
from the given file_cache.
(temp_source_file::temp_source_file): Add file_cache param.
* selftest.h (class file_cache): New forward decl.
(named_temp_file::named_temp_file): Add file_cache param.
(named_temp_file::m_file_cache): New field.
(temp_source_file::temp_source_file): Add file_cache param.
* substring-locations.h (get_location_within_string): Add
file_cache param.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c: Use
global_dc's file cache.
* gcc.dg/plugin/expensive_selftests_plugin.c: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 14 Nov 2023 16:01:39 +0000 (11:01 -0500)]
json: reduce use of naked new in json-building code
No functional change intended.
gcc/ChangeLog:
* diagnostic-format-json.cc: Use type-specific "set_*" functions
of json::object to avoid naked new of json value subclasses.
* diagnostic-format-sarif.cc: Likewise.
* gcov.cc: Likewise.
* json.cc (object::set_string): New.
(object::set_integer): New.
(object::set_float): New.
(object::set_bool): New.
(selftest::test_writing_objects): Use object::set_string.
* json.h (object::set_string): New decl.
(object::set_integer): New decl.
(object::set_float): New decl.
(object::set_bool): New decl.
* optinfo-emit-json.cc: Use type-specific "set_*" functions of
json::object to avoid naked new of json value subclasses.
* timevar.cc: Likewise.
* tree-diagnostic-path.cc: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
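The pattern behind the type-specific setters can be sketched with a tiny stand-alone model. Everything in namespace sketch is hypothetical and greatly simplified; it is not GCC's json.h, and only the setter names set_string and set_integer are borrowed from the ChangeLog above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

struct value {
  virtual ~value() = default;
};

struct string_value : value {
  explicit string_value(std::string s) : text(std::move(s)) {}
  std::string text;
};

struct integer_value : value {
  explicit integer_value(long i) : number(i) {}
  long number;
};

class object : public value {
public:
  // Generic setter: takes ownership of V, so callers need a naked new.
  void set(const std::string &key, value *v) { m_map[key].reset(v); }

  // Type-specific setters: the only places that spell out "new".
  void set_string(const std::string &key, const std::string &s)
  { set(key, new string_value(s)); }

  void set_integer(const std::string &key, long i)
  { set(key, new integer_value(i)); }

  const value *get(const std::string &key) const
  {
    auto it = m_map.find(key);
    return it == m_map.end() ? nullptr : it->second.get();
  }

private:
  std::map<std::string, std::unique_ptr<value>> m_map;
};

} // namespace sketch

inline void check_json_sketch()
{
  sketch::object obj;
  obj.set_string("kind", "error");  // instead of set("kind", new string_value(...))
  obj.set_integer("line", 42);
  auto *s = dynamic_cast<const sketch::string_value *>(obj.get("kind"));
  auto *i = dynamic_cast<const sketch::integer_value *>(obj.get("line"));
  assert(s && s->text == "error");
  assert(i && i->number == 42);
}
```

Call sites then name only the key and the payload, which keeps ownership transfer inside the object class.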
The Xmethod for std::deque::size() assumed that the first element would
be at the start of the first node. That's only true if elements are only
added at the back. If an element is inserted at the front, or removed
from the front (or anywhere before the middle) then the first node will
not be completely populated, and the Xmethod will give the wrong result.
libstdc++-v3/ChangeLog:
PR libstdc++/112491
* python/libstdcxx/v6/xmethods.py (DequeWorkerBase.size): Fix
calculation to use _M_start._M_cur.
* testsuite/libstdc++-xmethods/deque.cc: Check failing cases.
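The effect of the wrong assumption can be sketched with a simplified model. DequePos and both functions are hypothetical: a position is reduced to a node index plus an offset within that node, whereas the real Xmethod works on the iterator members in Python.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified deque position: node index and offset in the node.
struct DequePos {
  std::size_t node;
  std::size_t offset;
};

// Model of the flawed assumption: the first element sits at offset 0.
constexpr std::size_t size_assuming_full_first_node(DequePos start,
                                                    DequePos finish,
                                                    std::size_t buf_size)
{
  return (finish.node - start.node) * buf_size + finish.offset;
}

// Model of the fix: also account for where the first element really is.
constexpr std::size_t size_using_start_offset(DequePos start, DequePos finish,
                                              std::size_t buf_size)
{
  return (finish.node - start.node) * buf_size + finish.offset - start.offset;
}

inline void check_deque_size()
{
  // Nodes of 8 elements; first element at offset 3 of node 0, end at offset 5
  // of node 2: 5 + 8 + 5 = 18 elements.
  DequePos start{0, 3}, finish{2, 5};
  assert(size_using_start_offset(start, finish, 8) == 18);
  assert(size_assuming_full_first_node(start, finish, 8) == 21);  // overcounts
}
```

When the first element is at offset 0 the two functions agree, which matches the observation that the old result was right only if elements were added solely at the back.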
s390: Fix vec_scatter_element for vectors of floats
The offset for vec_scatter_element of floats should be a vector of type
UV4SI instead of V4SF. Note that this is an incompatible change.
gcc/ChangeLog:
* config/s390/s390-builtin-types.def: Add/remove types.
* config/s390/s390-builtins.def (s390_vec_scatter_element_flt):
The type for the offset should be UV4SI instead of V4SF.
The change in r14-2852-gf5fb9ff2396fd4 failed to update patch_loop_exit
to compensate for rewriting of a NE/EQ_EXPR to a new code. Fixed
with the following.
PR tree-optimization/111233
PR tree-optimization/111652
PR tree-optimization/111727
PR tree-optimization/111838
PR tree-optimization/112113
* tree-ssa-loop-split.cc (patch_loop_exit): Get the new
guard code instead of the old guard stmt.
(split_loop): Adjust.
Richard Biener [Tue, 14 Nov 2023 11:53:18 +0000 (12:53 +0100)]
Loop distribution fix for SCC detection
The following adjusts data_dep_in_cycle_p to properly consider the
whole loop nest when looking for data dep cycles and exempting
zero-distance DDRs instead of just the outermost loop.
* tree-loop-distribution.cc (loop_distribution::data_dep_in_cycle_p):
Consider all loops in the nest when looking for
lambda_vector_zerop.
Richard Biener [Tue, 14 Nov 2023 10:37:13 +0000 (11:37 +0100)]
tree-optimization/112281 - loop distribution and zero dependence distances
We currently distribute
for (c = 2; c; c--)
for (e = 0; e < 2; e++) {
d[c] = b = d[c + 1];
d[c + 1].a = 0;
}
in a wrong way where the inner loop zero dependence distance should
make us preserve stmt execution order. We fail to do so since we
only look for a fully zero distance vector rather than looking at
the innermost loop distance. This is somewhat similar to PR87022
where we instead looked at the outermost loop distance and changed
this to what we do now. The following switches us to look at the
innermost loop distance.
PR tree-optimization/112281
* tree-loop-distribution.cc (pg_add_dependence_edges):
Preserve stmt order when the innermost loop has exact
overlap.
Jakub Jelinek [Tue, 14 Nov 2023 12:19:48 +0000 (13:19 +0100)]
i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523]
On Sun, Nov 12, 2023 at 09:03:42PM -0000, Roger Sayle wrote:
> This patch improves register pressure during reload, inspired by PR 97756.
> Normally, a double-word right-shift by a constant produces a double-word
> result, the highpart of which is dead when followed by a truncation.
> The dead code calculating the high part gets cleaned up post-reload, so
> the issue isn't normally visible, except for the increased register
> pressure during reload, sometimes leading to odd register assignments.
> Providing a post-reload splitter, which clobbers a single wordmode
> result register instead of a doubleword result register, helps (a bit).
Unfortunately this broke bootstrap on i686-linux, broke all ACATS tests
on x86_64-linux as well as miscompiled e.g. __floattisf in libgcc there
as well.
The bug is that shrd{l,q} instruction expects the low part of the input
to be the same register as the output, rather than the high part as the
patch implemented.
split_double_mode (<DWI>mode, &operands[1], 1, &operands[1], &operands[3]);
sets operands[1] to the lo_half and operands[3] to the hi_half, so if
operands[0] is not the same register as operands[1] (rather than [3]) after
RA, we should during splitting move operands[1] into operands[0].
Your testcase:
> #define MASK60 ((1ul << 60) - 1)
> unsigned long foo (__uint128_t n)
> {
> unsigned long a = n & MASK60;
> unsigned long b = (n >> 60);
> b = b & MASK60;
> unsigned long c = (n >> 120);
> return a+b+c;
> }
still has the same number of instructions.
Bootstrapped/regtested on x86_64-linux (where it e.g. turns
=== acats Summary ===
-# of unexpected failures 2328
+# of expected passes 2328
+# of unexpected failures 0
and fixes gcc.dg/torture/fp-int-convert-*timode.c FAILs as well)
and i686-linux (where it previously didn't bootstrap, but compared to
Friday evening's bootstrap the testresults are ok).
2023-11-14 Jakub Jelinek <jakub@redhat.com>
PR target/112523
PR ada/112514
* config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): Move
operands[1] aka low part of input rather than operands[3] aka high
part of input to output if not the same register.
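To make the operand roles concrete, here is a minimal model of the low word of a double-word right shift, assuming 64-bit words and a constant count 0 < n < 64. The function name is hypothetical and this is plain C++, not the machine description; it only shows that the result starts from the low half and receives its upper bits from the high half.

```cpp
#include <cassert>
#include <cstdint>

// Low word of ((hi:lo) >> n) for 0 < n < 64, in the shape a shrd-style
// instruction computes it: the destination starts out holding the LOW half,
// and the bits shifted in at the top come from the HIGH half.
constexpr std::uint64_t shift_right_lowpart(std::uint64_t lo, std::uint64_t hi,
                                            unsigned n)
{
  return (lo >> n) | (hi << (64 - n));
}

inline void check_shift_right_lowpart()
{
  // (hi:lo) = 0x1'F000000000000000 shifted right by 60 is 0x1F.
  assert(shift_right_lowpart(0xF000000000000000ull, 0x1ull, 60) == 0x1Full);
  // With a zero high half the result is just lo >> n.
  assert(shift_right_lowpart(0x100ull, 0x0ull, 4) == 0x10ull);
}
```

This is why the value moved into the output register before the shift has to be the low part of the input, as the ChangeLog entry states.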
The r14-5312-g040e5b0edbca861196d9e2ea2af5e805769c8d5d commit log contains
a line from git revert with the correct hash, but it was unfortunately
hand-amended with an explanation, so it got through the pre-commit hook but
failed during update_version_git generation. Please don't do this.
Georg-Johann Lay [Tue, 14 Nov 2023 11:05:19 +0000 (12:05 +0100)]
LibF7: sinh: Fix loss of precision due to cancellation for small values.
libgcc/config/avr/libf7/
* libf7-const.def [F7MOD_sinh_]: Add MiniMax polynomial.
* libf7.c (f7_sinh): Use it instead of (exp(x) - exp(-x)) / 2
when |x| < 0.5 to avoid loss of precision due to cancellation.
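The cancellation issue and the switch-over at |x| < 0.5 can be sketched in double precision. Two assumptions apply: the small-argument branch below uses a truncated Taylor polynomial as a stand-in for the MiniMax polynomial of the patch (so its coefficients and accuracy differ), and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Textbook formula: for small |x| the two exponentials are nearly equal,
// so their difference loses significant digits.
inline double sinh_naive(double x)
{
  return (std::exp(x) - std::exp(-x)) / 2;
}

// Stand-in for small |x|: x + x^3/3! + x^5/5! + x^7/7! + x^9/9! in Horner
// form. No subtraction of nearly equal values takes place.
inline double sinh_small(double x)
{
  double x2 = x * x;
  return x * (1 + x2 / 6 * (1 + x2 / 20 * (1 + x2 / 42 * (1 + x2 / 72))));
}

// Choose the branch by magnitude, using the 0.5 threshold named above.
inline double sinh_sketch(double x)
{
  return std::fabs(x) < 0.5 ? sinh_small(x) : sinh_naive(x);
}

inline void check_sinh_sketch()
{
  // For tiny x, sinh(x) equals x to within rounding.
  assert(std::fabs(sinh_sketch(1e-9) - 1e-9) < 1e-24);
  assert(std::fabs(sinh_sketch(0.25) - std::sinh(0.25)) < 1e-12);
  assert(std::fabs(sinh_sketch(2.0) - std::sinh(2.0)) < 1e-12);
}
```

The truncated series is only accurate to roughly 1e-11 relative error near 0.5, which is one reason a fitted polynomial is the better choice in a real implementation.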
Lehua Ding [Tue, 14 Nov 2023 08:42:19 +0000 (16:42 +0800)]
x86: Make testcase apx-spill_to_egprs-1.c more robust
Hi,
This little patch adjusts the assertion in the apx-spill_to_egprs-1.c testcase.
The -mapxf compilation option allows more registers to be used, which in
turn eliminates the need for local variables to be stored in stack memory.
Therefore, the assertion is changed to check that no memory is loaded through
the %rsp register.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-spill_to_egprs-1.c: Make sure that no local
variables are stored on the stack.
Andreas Krebbel [Tue, 14 Nov 2023 10:33:45 +0000 (11:33 +0100)]
IBM Z: Add GTY marker to builtin data structures
This adds GTY markers to s390_builtin_types, s390_builtin_fn_types,
and s390_builtin_decls. These were missing causing problems in
particular when using builtins after including a precompiled header.
Unfortunately the declaration of these data structures use enum values
from s390-builtins.h. This file however is not included everywhere
and is rather large. In order to include it only for the purpose of
gtype-desc.cc we place a preprocessed copy of it in the build
directory and include only this.
Andreas Krebbel [Tue, 14 Nov 2023 10:33:44 +0000 (11:33 +0100)]
IBM Z: Fix ICE with overloading and checking enabled
s390_resolve_overloaded_builtin, when called on NON_DEPENDENT_EXPR,
ICEs when using the type from it which ends up as error_mark_node.
This particular instance of the problem does not occur anymore since
NON_DEPENDENT_EXPR has been removed. Nevertheless that case needs to
be handled here.
gcc/ChangeLog:
* config/s390/s390-c.cc (s390_fn_types_compatible): Add a check
for error_mark_node.
Jonathan Wakely [Mon, 13 Nov 2023 12:03:31 +0000 (12:03 +0000)]
c++: Link extended FP conversion pedwarns to -Wnarrowing [PR111842]
Several users have been confused by the status of these warnings,
which can be misunderstood as "this might not be what you want",
rather than diagnostics required by the C++ standard. Add the text "ISO
C++ does not allow" to make this clear.
Also link them to -Wnarrowing so that they can be disabled or promoted
to errors independently of other pedwarns.
PR c++/111842
PR c++/112498
gcc/cp/ChangeLog:
* call.cc (convert_like_internal): Use OPT_Wnarrowing for
pedwarns about ill-formed conversions involving extended
floating-point types. Clarify that ISO C++ requires these
diagnostics.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/ext-floating16.C: New test.
* g++.dg/cpp23/ext-floating17.C: New test.
The following patch adds 6 new type-generic builtins,
__builtin_clzg
__builtin_ctzg
__builtin_clrsbg
__builtin_ffsg
__builtin_parityg
__builtin_popcountg
The g at the end stands for generic because the unsuffixed variants
of the builtins already have unsigned int or int arguments.
The main reason to add these is to support arbitrary unsigned (for
clrsb/ffs signed) bit-precise integer types and also __int128 which
wasn't supported by the existing builtins, so that e.g. <stdbit.h>
type-generic functions could then support not just bit-precise unsigned
integer type whose width matches a standard or extended integer type,
but others too.
None of these new builtins promote their first argument, so the argument
can be e.g. unsigned char or unsigned short or unsigned __int20 etc.
The first 2 support either 1 or 2 arguments. If only 1 argument is supplied,
the behavior is undefined for argument 0, as for the other __builtin_c[lt]z*
builtins; if 2 arguments are supplied, the second argument should be an int
that will be returned if the first argument is 0. All other builtins have
just one argument. For __builtin_clrsbg and __builtin_ffsg the argument
shall be any signed standard/extended or bit-precise integer, for the others
any unsigned standard/extended or bit-precise integer (bool not allowed).
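A minimal sketch of the two-argument form and of the lack of promotion, using unsigned char. The wrapper clz_uchar, the portable stand-in ref_clz_uchar and the HAVE_CLZG macro are hypothetical names for illustration only; the builtin is used only where __has_builtin reports it.

```c
#include <assert.h>
#include <limits.h>

#ifdef __has_builtin
# if __has_builtin(__builtin_clzg)
#  define HAVE_CLZG 1
# endif
#endif

/* Portable stand-in: leading zero bits of X counted within the
   CHAR_BIT-bit width of unsigned char; AT_ZERO is returned for 0.  */
static int ref_clz_uchar (unsigned char x, int at_zero)
{
  if (x == 0)
    return at_zero;
  int n = 0;
  for (unsigned int mask = 1u << (CHAR_BIT - 1); (x & mask) == 0; mask >>= 1)
    n++;
  return n;
}

/* Uses __builtin_clzg when the compiler provides it.  The argument is
   not promoted, so the count is relative to unsigned char, not int.  */
static int clz_uchar (unsigned char x, int at_zero)
{
#ifdef HAVE_CLZG
  int r = __builtin_clzg (x, at_zero);
  assert (r == ref_clz_uchar (x, at_zero));
  return r;
#else
  return ref_clz_uchar (x, at_zero);
#endif
}
```

With an 8-bit unsigned char, clz_uchar (1, -1) counts 7 leading zeros rather than the 31 that a promoted 32-bit int would give, and clz_uchar (0, -1) returns the caller-supplied -1 instead of invoking undefined behavior.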
One possibility would be to also allow signed integer types for
the clz/ctz/parity/popcount ones (and just cast the argument to
unsigned_type_for during folding) and similarly unsigned integer types
for the clrsb/ffs ones, dunno what is better; for stdbit.h the current
version is sufficient and diagnoses use of the inappropriate sign,
though on the other side I wonder if users won't be confused by
__builtin_clzg (1) being an error and having to write __builtin_clzg (1U).
The new builtins are lowered to the corresponding builtins with other suffixes
or to internal calls (plus casts and adjustments where needed) during FE
folding or during gimplification at the latest: the non-suffixed builtins
handle precisions up to the precision of int, l up to the precision of long,
ll up to the precision of long long, precisions up to __int128 are lowered to
double-word expansion early, and the rest (which must be _BitInt) are lowered
to internal fn calls, which are then lowered during the bitint lowering pass.
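As a rough user-level analogy of this lowering (not the folding code in builtins.cc), the selection of a suffixed builtin by precision, plus the adjustment for narrower types, might be sketched as follows. CLZ_LOWERED and the clz_* helpers are hypothetical names; only the standard and extended integer cases up to long long are covered.

```c
#include <assert.h>
#include <limits.h>

/* Narrower than int: use the unsuffixed builtin on the promoted value
   and subtract the extra leading bits added by the promotion.  */
static int clz_ushort (unsigned short x)
{
  assert (x != 0);
  return __builtin_clz (x)
	 - (int) ((sizeof (unsigned int) - sizeof (unsigned short)) * CHAR_BIT);
}

/* Precision of int: unsuffixed builtin.  */
static int clz_uint (unsigned int x)
{
  assert (x != 0);
  return __builtin_clz (x);
}

/* Precision of long: l suffix.  */
static int clz_ulong (unsigned long x)
{
  assert (x != 0);
  return __builtin_clzl (x);
}

/* Precision of long long: ll suffix.  */
static int clz_ullong (unsigned long long x)
{
  assert (x != 0);
  return __builtin_clzll (x);
}

/* Pick the helper from the unpromoted type of the argument.  */
#define CLZ_LOWERED(x)					\
  _Generic ((x),					\
	    unsigned short: clz_ushort,			\
	    unsigned int: clz_uint,			\
	    unsigned long: clz_ulong,			\
	    unsigned long long: clz_ullong) (x)
```

The __int128 and _BitInt cases described above have no counterpart in this sketch, since they rely on double-word expansion and the bitint lowering pass inside the compiler.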
The patch also changes representation of IFN_CLZ and IFN_CTZ calls,
previously they were in the IL only if they are directly supported optab
and depending on C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 they had or didn't
have defined behavior at 0, now they are in the IL either if directly
supported optab, or for the large/huge BITINT_TYPEs and they have either
1 or 2 arguments. If one, the behavior is undefined at zero, if 2, the
second argument is an int constant that should be returned for 0.
As there is no extra support during expansion, for directly supported optab
the second argument if present should still match the
C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 value, but for BITINT_TYPE arguments
it can be arbitrary int INTEGER_CST.
The intended uses in stdbit.h are e.g.
#ifdef __has_builtin
#if __has_builtin(__builtin_clzg) && __has_builtin(__builtin_ctzg) && __has_builtin(__builtin_popcountg)
#define stdc_leading_zeros(value) \
((unsigned int) __builtin_clzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#define stdc_leading_ones(value) \
((unsigned int) __builtin_clzg ((__typeof (value)) ~(value), __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#define stdc_first_trailing_one(value) \
((unsigned int) (__builtin_ctzg (value, -1) + 1))
#define stdc_trailing_zeros(value) \
((unsigned int) __builtin_ctzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#endif
#endif
where __builtin_popcountg ((__typeof (x)) -1) computes the bit precision
of x's type (kind of _Bitwidthof (x) alternative).
They also allow casting of arbitrary unsigned _BitInt other than
unsigned _BitInt(1) to corresponding signed _BitInt by using
signed _BitInt(__builtin_popcountg ((__typeof (a)) -1))
and of arbitrary signed _BitInt to corresponding unsigned _BitInt
using unsigned _BitInt(__builtin_clrsbg ((__typeof (a)) -1) + 1).
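The bit-precision idiom used in the macros above can be tried in isolation with a small sketch. BITWIDTH_OF and uchar_width are hypothetical names, and the sizeof-based fallback is an assumption that only holds for unsigned types without padding bits (it is not suitable for _BitInt).

```c
#include <assert.h>
#include <limits.h>

#ifdef __has_builtin
# if __has_builtin(__builtin_popcountg)
/* All-ones value of the (unsigned) type of X; its population count is
   the precision of that type.  */
#  define BITWIDTH_OF(x) \
  ((unsigned int) __builtin_popcountg ((__typeof (x)) ~(__typeof (x)) 0))
# endif
#endif

#ifndef BITWIDTH_OF
/* Fallback for compilers without the builtin: storage size in bits.  */
# define BITWIDTH_OF(x) ((unsigned int) (sizeof (x) * CHAR_BIT))
#endif

/* Example use: precision of unsigned char, without integer promotion.  */
static unsigned int uchar_width (void)
{
  unsigned char c = 0;
  return BITWIDTH_OF (c);
}
```

Because the argument is not promoted, the unsigned char case yields CHAR_BIT rather than the width of int.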
2023-11-14 Jakub Jelinek <jakub@redhat.com>
PR c/111309
gcc/
* builtins.def (BUILT_IN_CLZG, BUILT_IN_CTZG, BUILT_IN_CLRSBG,
BUILT_IN_FFSG, BUILT_IN_PARITYG, BUILT_IN_POPCOUNTG): New
builtins.
* builtins.cc (fold_builtin_bit_query): New function.
(fold_builtin_1): Use it for
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(fold_builtin_2): Use it for BUILT_IN_{CLZ,CTZ}G.
* fold-const-call.cc: Fix comment typo on tm.h inclusion.
(fold_const_call_ss): Handle
CFN_BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(fold_const_call_sss): New function.
(fold_const_call_1): Call it for 2 argument functions returning
scalar when passed 2 INTEGER_CSTs.
* genmatch.cc (cmp_operand): For function calls also compare
number of arguments.
(fns_cmp): New function.
(dt_node::gen_kids): Sort fns and generic_fns.
(dt_node::gen_kids_1): Handle fns with the same id but different
number of arguments.
* match.pd (CLZ simplifications): Drop checks for defined behavior
at zero. Add variant of simplifications for IFN_CLZ with 2 arguments.
(CTZ simplifications): Drop checks for defined behavior at zero,
don't optimize precisions above MAX_FIXED_MODE_SIZE. Add variant of
simplifications for IFN_CTZ with 2 arguments.
(a != 0 ? CLZ(a) : CST -> .CLZ(a)): Use TREE_TYPE (@3) instead of
type, add BITINT_TYPE handling, create 2 argument IFN_CLZ rather than
one argument. Add variant for matching CLZ with 2 arguments.
(a != 0 ? CTZ(a) : CST -> .CTZ(a)): Similarly.
* gimple-lower-bitint.cc (bitint_large_huge::lower_bit_query): New
method.
(bitint_large_huge::lower_call): Use it for IFN_{CLZ,CTZ,CLRSB,FFS}
and IFN_{PARITY,POPCOUNT} calls.
* gimple-range-op.cc (cfn_clz::fold_range): Don't check
CLZ_DEFINED_VALUE_AT_ZERO for m_gimple_call_internal_p, instead
assume defined value at zero if the call has 2 arguments and use
second argument value for that case.
(cfn_ctz::fold_range): Similarly.
(gimple_range_op_handler::maybe_builtin_call): Use op_cfn_clz_internal
or op_cfn_ctz_internal only if internal fn call has 2 arguments and
set m_op2 in that case.
* tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern,
vect_recog_popcount_clz_ctz_ffs_pattern): For value defined at zero
use second argument of calls if present, otherwise assume UB at zero,
create 2 argument .CLZ/.CTZ calls if needed.
* tree-vect-stmts.cc (vectorizable_call): Handle 2 argument .CLZ/.CTZ
calls.
* tree-ssa-loop-niter.cc (build_cltz_expr): Create 2 argument
.CLZ/.CTZ calls if needed.
* tree-ssa-forwprop.cc (simplify_count_trailing_zeroes): Create 2
argument .CTZ calls if needed.
* tree-ssa-phiopt.cc (cond_removal_in_builtin_zero_pattern): Handle
2 argument .CLZ/.CTZ calls, handle BITINT_TYPE, create 2 argument
.CLZ/.CTZ calls.
* doc/extend.texi (__builtin_clzg, __builtin_ctzg, __builtin_clrsbg,
__builtin_ffsg, __builtin_parityg, __builtin_popcountg): Document.
gcc/c-family/
* c-common.cc (check_builtin_function_arguments): Handle
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
* c-gimplify.cc (c_gimplify_expr): If __builtin_c[lt]zg second
argument hasn't been folded into constant yet, transform it to one
argument call inside of a COND_EXPR which for first argument 0
returns the second argument.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote first argument
of BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
gcc/cp/
* call.cc (magic_varargs_p): Return 4 for
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(build_over_call): Don't promote first argument of
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
* cp-gimplify.cc (cp_gimplify_expr): For BUILT_IN_C{L,T}ZG use
c_gimplify_expr.
gcc/testsuite/
* c-c++-common/pr111309-1.c: New test.
* c-c++-common/pr111309-2.c: New test.
* gcc.dg/torture/bitint-43.c: New test.
* gcc.dg/torture/bitint-44.c: New test.
Xi Ruoyao [Fri, 3 Nov 2023 13:19:59 +0000 (21:19 +0800)]
LoongArch: Disable relaxation if the assembler doesn't support conditional branch relaxation [PR112330]
As the commit message of r14-4674 has indicated, if the assembler does
not support conditional branch relaxation, a relocation overflow may
happen on conditional branches when relaxation is enabled because the
number of NOP instructions inserted by the assembler will be more than
the number estimated by GCC.
To work around this issue, disable relaxation by default if the
assembler is detected at GCC build time to be incapable of performing
conditional branch relaxation. We also need to pass -mno-relax to the
assembler to really disable relaxation. But if the assembler does not
support the -mrelax option at all, we should not pass -mno-relax to the
assembler or it will immediately error out. Also handle this with the
build-time assembler capability probing, and add a pair of options
-m[no-]pass-mrelax-to-as to allow using a different assembler from the
build-time one.
With this change, if GCC is built with GAS 2.41, relaxation will be
disabled by default. So the default value of -mexplicit-relocs= is also
changed to 'always' if -mno-relax is specified or implied by the
build-time default, because using assembler macros for symbol addresses
produces no benefit when relaxation is disabled.
gcc/ChangeLog:
PR target/112330
* config/loongarch/genopts/loongarch.opt.in: Add
-m[no-]pass-mrelax-to-as. Change the default of -m[no-]relax to
account for conditional branch relaxation support status.
* config/loongarch/loongarch.opt: Regenerate.
* configure.ac (gcc_cv_as_loongarch_cond_branch_relax): Check if
the assembler supports conditional branch relaxation.
* configure: Regenerate.
* config.in: Regenerate. Note that there are some unrelated
changes introduced by r14-5424 (which does not contain a
config.in regeneration).
* config/loongarch/loongarch-opts.h
(HAVE_AS_COND_BRANCH_RELAXATION): Define to 0 if not defined.
* config/loongarch/loongarch-driver.h (ASM_MRELAX_DEFAULT):
Define.
(ASM_MRELAX_SPEC): Define.
(ASM_SPEC): Use ASM_MRELAX_SPEC instead of "%{mno-relax}".
* config/loongarch/loongarch.cc: Take the setting of
-m[no-]relax into account when determining the default of
-mexplicit-relocs=.
* doc/invoke.texi: Document -m[no-]relax and
-m[no-]pass-mrelax-to-as for LoongArch. Update the default
value of -mexplicit-relocs=.
Xi Ruoyao [Mon, 13 Nov 2023 21:32:38 +0000 (05:32 +0800)]
LoongArch: Use finer-grained DBAR hints
LA664 defines DBAR hints 0x1 - 0x1f (except 0xf and 0x1f) as follows [1-2]:
- Bit 4: kind of constraint (0: completion, 1: ordering)
- Bit 3: barrier for previous read (0: true, 1: false)
- Bit 2: barrier for previous write (0: true, 1: false)
- Bit 1: barrier for succeeding read (0: true, 1: false)
- Bit 0: barrier for succeeding write (0: true, 1: false)
LLVM has already utilized them for different memory orders [3]:
- Bit 4 is always set to one because it's only intended to be zero for
things like MMIO devices, which are out of the scope of memory orders.
- An acquire barrier is used to implement acquire loads like
ld.d $a1, $t0, 0
dbar acquire_hint
where the load operation (ld.d) should not be reordered with any load
or store operation after the acquire load. To accomplish this
constraint, we need to prevent the load operation from being reordered
after the barrier, and also prevent any following load/store operation
from being reordered before the barrier. Thus bits 0, 1, and 3 must
be zero, and bit 2 can be one, so acquire_hint should be 0b10100.
- A release barrier is used to implement release stores like
dbar release_hint
st.d $a1, $t0, 0
where the store operation (st.d) should not be reordered with any load
or store operation before the release store. So we need to prevent
the store operation from being reordered before the barrier, and also
prevent any preceding load/store operation from being reordered after
the barrier. So bits 0, 2, 3 must be zero, and bit 1 can be one. So
release_hint should be 0b10010.
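The bit layout and the two derived hints can be double-checked with a small sketch. The function names (dbar_hint, dbar_acquire_hint, dbar_release_hint) are hypothetical and this is not the code added to the LoongArch backend; it only encodes the table above, where a zero in bits 0 to 3 means the barrier applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Build a DBAR hint.  ORDERING selects bit 4 (1: ordering constraint).
   For the other four parameters, true means the barrier applies to that
   class of access, which is encoded as a ZERO bit.  */
static unsigned int dbar_hint (bool ordering, bool prev_read, bool prev_write,
			       bool succ_read, bool succ_write)
{
  unsigned int hint = 0;
  if (ordering)
    hint |= 1u << 4;
  if (!prev_read)
    hint |= 1u << 3;
  if (!prev_write)
    hint |= 1u << 2;
  if (!succ_read)
    hint |= 1u << 1;
  if (!succ_write)
    hint |= 1u << 0;
  /* Only five hint bits are defined.  */
  assert (hint <= 0x1f);
  return hint;
}

/* Acquire: order the previous read against all succeeding accesses;
   previous writes need not be ordered (bit 2 set).  */
static unsigned int dbar_acquire_hint (void)
{
  return dbar_hint (true, true, false, true, true);
}

/* Release: order all previous accesses against the succeeding write;
   succeeding reads need not be ordered (bit 1 set).  */
static unsigned int dbar_release_hint (void)
{
  return dbar_hint (true, true, true, false, true);
}
```

The two helpers evaluate to 0x14 (0b10100) and 0x12 (0b10010), matching the acquire_hint and release_hint values derived in the text.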
A similar mapping has been utilized for RISC-V GCC [4], LoongArch Linux
kernel [1], and LoongArch LLVM [3]. So the mapping should be correct.
And I've also bootstrapped & regtested GCC on a LA664 with this patch.
The LoongArch CPUs should treat "unknown" hints as dbar 0, so we can
unconditionally emit the new hints without a compiler switch.
Jakub Jelinek [Tue, 14 Nov 2023 08:24:34 +0000 (09:24 +0100)]
tree: Handle BITINT_TYPE in type_contains_placeholder_1 [PR112511]
The following testcase ICEs because BITINT_TYPE isn't handled in
type_contains_placeholder_1. Given that Ada doesn't emit it, it doesn't
matter that much where exactly we handle it as right now it should never
contain a placeholder; I've picked the same spot as INTEGER_TYPE, but if
you prefer e.g. the one with OFFSET_TYPE above, I can move it there too.
2023-11-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112511
* tree.cc (type_contains_placeholder_1): Handle BITINT_TYPE like
INTEGER_TYPE.
Jakub Jelinek [Tue, 14 Nov 2023 07:11:44 +0000 (08:11 +0100)]
i386: Don't optimize vshuf{i,f}{32x4,64x2} and vperm{i,f}128 to vblendps for %ymm16+ [PR112435]
The vblendps instruction is only VEX encoded, not EVEX, so it can't be used
if there are %ymm16+ or EGPR registers involved.
2023-11-14 Jakub Jelinek <jakub@redhat.com>
Hu, Lin1 <lin1.hu@intel.com>
PR target/112435
* config/i386/sse.md (avx512vl_shuf_<shuffletype>32x4_1<mask_name>,
<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>): Add
alternative with just x instead of v constraints and xjm instead of
vm and use vblendps as optimization only with that alternative.
* gcc.target/i386/avx512vl-pr112435-1.c: New test.
* gcc.target/i386/avx512vl-pr112435-2.c: New test.
* gcc.target/i386/avx512vl-pr112435-3.c: New test.