Jakub Jelinek [Fri, 19 Apr 2024 22:12:36 +0000 (00:12 +0200)]
c-family: Allow arguments with NULLPTR_TYPE as sentinels [PR114780]
While in C++ the ellipsis argument conversions include
"An argument that has type cv std::nullptr_t is converted to type void*"
in C23 a nullptr_t argument is not promoted in any way, but va_arg
description says:
"the type of the next argument is nullptr_t and type is a pointer type that has the same
representation and alignment requirements as a pointer to a character type."
So, while in C++ check_function_sentinel will never see NULLPTR_TYPE, for
C23 it can see that and currently we incorrectly warn about those.
The only question is whether we should warn on any argument with
nullptr_t type or just about nullptr (nullptr_t argument with integer_zerop
value). Through undefined behavior, I guess one could pass a non-NULL pointer
that way, say by union { void *p; nullptr_t q; } u; u.p = &whatever;
and pass u.q to ..., but valid code should always pass something that will
read as (char *) 0 when read using va_arg (ap, char *), so I think it is
better not to warn than to warn in those cases.
Note, clang seems to pass (void *)0 rather than an expression of nullptr_t
type to ellipsis in C23 mode, as if it did the C++ ellipsis argument
conversions; in that case, I guess, not warning about that would be even safer.
But I think what GCC does follows the spec more closely, even though in a
valid program one shouldn't be able to observe the difference.
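A minimal C sketch of the affected pattern, assuming a sentinel-terminated variadic function that reads its arguments with va_arg (ap, char *); count_strings and END_OF_ARGS are hypothetical names. In C23 mode the terminator is nullptr, which per the va_arg wording quoted above reads back as a null char *; other language modes fall back to (char *) 0.

```c
#include <stdarg.h>
#include <stddef.h>

/* C23 has nullptr; earlier modes use a null char * as the terminator.  */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define END_OF_ARGS nullptr
#else
#define END_OF_ARGS ((char *) 0)
#endif

/* Hypothetical sentinel-terminated function: counts its string arguments,
   reading each one with va_arg (ap, char *) until the null sentinel.  */
__attribute__((sentinel)) int count_strings(const char *first, ...)
{
  int n = 0;
  va_list ap;
  va_start(ap, first);
  for (const char *s = first; s != NULL; s = va_arg(ap, char *))
    n++;
  va_end(ap);
  return n;
}
```

For instance, count_strings ("a", "b", END_OF_ARGS) counts 2 arguments. The nullptr spelling is the one that, per this commit, check_function_sentinel should now accept as a sentinel in C23 mode.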
2024-04-20 Jakub Jelinek <jakub@redhat.com>
PR c/114780
* c-common.cc (check_function_sentinel): Allow as sentinel any
argument of NULLPTR_TYPE.
Jakub Jelinek [Fri, 19 Apr 2024 22:05:21 +0000 (00:05 +0200)]
c: Fix ICE with -g and -std=c23 related to incomplete types [PR114361]
We did not update TYPE_CANONICAL for incomplete variants when
completing a structure. We now set for flag_isoc23 TYPE_STRUCTURAL_EQUALITY_P
for incomplete structure and union types and then update TYPE_CANONICAL
later, though we only update it for the variants and derived pointer types
which can be easily discovered. Other derived types created while
the type was still incomplete will remain TYPE_STRUCTURAL_EQUALITY_P.
See PR114574 for discussion.
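The kind of C source this is about can be sketched as follows: a qualified variant and a derived pointer type are created while the structure is still incomplete, and both are used after the structure is completed. This is an illustrative sketch with hypothetical names, not the PR114361 or PR114574 testcase, and it is not claimed to reproduce the ICE.

```c
#include <stddef.h>

/* struct node is incomplete at this point.  */
struct node;

/* A qualified variant and a derived pointer type, both created while
   struct node is still incomplete.  */
typedef const struct node const_node;
typedef struct node *node_ptr;

/* Completing the type.  Per the commit message, for C23 GCC now marks the
   incomplete type TYPE_STRUCTURAL_EQUALITY_P and, on completion, updates
   TYPE_CANONICAL of easily discovered variants and pointer types.  */
struct node
{
  int value;
  node_ptr next;
};

/* Uses both early-created types after completion.  */
int sum_list(const_node *head)
{
  int sum = 0;
  for (const_node *p = head; p != NULL; p = p->next)
    sum += p->value;
  return sum;
}
```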
2024-04-20 Martin Uecker <uecker@tugraz.at>
Jakub Jelinek <jakub@redhat.com>
PR lto/114574
PR c/114361
gcc/c/
* c-decl.cc (shadow_tag_warned): For flag_isoc23 and code not
ENUMERAL_TYPE use SET_TYPE_STRUCTURAL_EQUALITY.
(parser_xref_tag): Likewise.
(start_struct): For flag_isoc23 use SET_TYPE_STRUCTURAL_EQUALITY.
(c_update_type_canonical): New function.
(finish_struct): Put NULL as second == operand rather than first.
Assert TYPE_STRUCTURAL_EQUALITY_P. Call c_update_type_canonical.
* c-typeck.cc (composite_type_internal): Use
SET_TYPE_STRUCTURAL_EQUALITY. Formatting fix.
gcc/testsuite/
* gcc.dg/pr114574-1.c: New test.
* gcc.dg/pr114574-2.c: New test.
* gcc.dg/pr114361.c: New test.
* gcc.dg/c23-tag-incomplete-1.c: New test.
* gcc.dg/c23-tag-incomplete-2.c: New test.
Jonathan Wakely [Fri, 19 Apr 2024 16:42:04 +0000 (17:42 +0100)]
libstdc++: Simplify constraints on <=> for std::reference_wrapper
Instead of constraining these overloads in terms of synth-three-way we
can just check that the value_type is less-than-comparable, which is
what synth-three-way's constraints check.
The reason that I implemented these with constraints has now been filed
as LWG 4071, so add a comment about that too.
Jonathan Wakely [Thu, 18 Apr 2024 11:14:41 +0000 (12:14 +0100)]
libstdc++: Support link chains in std::chrono::tzdb::locate_zone [PR114770]
Since 2022 the TZif format defined in the zic(8) man page has said that
links can refer to other links, rather than only referring to a zone.
This isn't supported by the C++20 spec, which assumes that the target()
for a chrono::time_zone_link always names a chrono::time_zone, not
another chrono::time_zone_link.
This hasn't been a problem until now, because there are no entries in
the tzdata file that chain links together. However, Debian Sid has
changed the target of the Asia/Chungking link from the Asia/Shanghai
zone to the Asia/Chongqing link, creating a link chain. The libstdc++
code is unable to handle this, so chrono::locate_zone("Asia/Chungking")
will fail with the tzdata.zi file from Debian Sid.
It seems likely that the C++ spec will need a change to allow link
chains, so that the original structure of the IANA database can be fully
represented by chrono::tzdb. The alternative would be for chrono::tzdb
to flatten all chains when loading the data, so that a link's target is
always a zone, but this means throwing away information present in the
tzdata.zi input file.
In anticipation of a change to the spec, this commit adds support for
chained links to libstdc++. When a name is found to be a link, we try to
find its target in the list of zones as before, but now if the target
isn't the name of a zone we don't fail. Instead we look for another link
with that name, and keep doing that until we reach the end of the chain
of links, and then look up the last target as a zone.
This new logic would get stuck in a loop if the tzdata.zi file is buggy
and defines a link chain that contains a cycle, e.g. two links that
refer to each other. To deal with that unlikely case, we use the
tortoise and hare algorithm to detect cycles in link chains, and throw
an exception if we detect a cycle. Cycles in links should never happen,
and it is expected that link chains will be short (if they occur at all)
and so the code is optimized for short chains without cycles. Longer
chains (four or more links) and cycles will do more work, but won't fail
to resolve a chain or get stuck in a loop.
The new test file checks various forms of broken links and cycles.
Also add a new check in the testsuite that every element in the
get_tzdb().zones and get_tzdb().links sequences can be successfully
found using locate_zone.
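A minimal C sketch of the chain-following logic with tortoise-and-hare cycle detection, using a simplified, hypothetical data model (zones as plain names, links as name/target pairs) rather than libstdc++'s tzdb types. Unlike libstdc++, which throws an exception on a cycle, the sketch just returns NULL. The Asia/Chungking chain mirrors the Debian Sid data described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model: a link maps a name to a target name.  */
struct tz_link
{
  const char *name;
  const char *target;
};

static int is_zone(const char *const *zones, size_t nzones, const char *name)
{
  for (size_t i = 0; i < nzones; i++)
    if (strcmp(zones[i], name) == 0)
      return 1;
  return 0;
}

/* Target of the link called NAME, or NULL if there is no such link.  */
static const char *link_target(const struct tz_link *links, size_t nlinks,
                               const char *name)
{
  for (size_t i = 0; i < nlinks; i++)
    if (strcmp(links[i].name, name) == 0)
      return links[i].target;
  return NULL;
}

/* Follow links from NAME until a zone is reached.  The hare takes two steps
   per iteration and the tortoise one; if they meet, the chain has a cycle.
   Returns the zone name, or NULL for a broken chain or a cycle.  */
const char *resolve_zone(const char *const *zones, size_t nzones,
                         const struct tz_link *links, size_t nlinks,
                         const char *name)
{
  const char *tortoise = name;
  const char *hare = name;
  for (;;)
    {
      for (int step = 0; step < 2; step++)
        {
          if (is_zone(zones, nzones, hare))
            return hare;
          hare = link_target(links, nlinks, hare);
          if (hare == NULL)
            return NULL; /* neither a zone nor a link */
        }
      /* The hare already visited the tortoise's position and found a link
         with a target there, so this lookup cannot fail.  */
      tortoise = link_target(links, nlinks, tortoise);
      if (strcmp(tortoise, hare) == 0)
        return NULL; /* cycle detected */
    }
}
```

With a single zone Asia/Shanghai and the links Asia/Chongqing -> Asia/Shanghai and Asia/Chungking -> Asia/Chongqing, resolving Asia/Chungking yields Asia/Shanghai, while two links naming each other yield NULL.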
libstdc++-v3/ChangeLog:
PR libstdc++/114770
* src/c++20/tzdb.cc (do_locate_zone): Support links that have
another link as their target.
* testsuite/std/time/tzdb/1.cc: Check that all zones and links
can be found by locate_zone.
* testsuite/std/time/tzdb/links.cc: New test.
Tamar Christina [Fri, 19 Apr 2024 14:22:13 +0000 (15:22 +0100)]
middle-end: refactor vect_recog_absolute_difference to simplify flow [PR114769]
As the reporter in PR114769 points out, the control flow for the abd detection
is hard to follow. This is because vect_recog_absolute_difference has two
different ways it can return true.
1. It can return true when the widening operation is matched, in which case
unprom is set, half_type is not NULL and diff_stmt is not set.
2. It can return true when the widening operation is not matched, but the stmt
being checked is a minus. In this case unprom is not set, half_type is set
to NULL and diff_stmt is set. This is because to get to diff_stmt you have to
dig through the abs statement and any possible promotions.
This, however, leads to complicated uses of the function at the call sites, as
the exact semantics need to be known to use it safely.
vect_recog_absolute_difference has two callers:
1. vect_recog_sad_pattern where if you return true with unprom not set, then
*half_type will be NULL. The call to vect_supportable_direct_optab_p will
always reject it since there's no vector mode for NULL. Note that, looking
at the dump files, the convention has always been that we first indicate
that a pattern could possibly be recognized and then check that it's
supported.
This change somewhat incorrectly makes the diagnostic message get printed for
"invalid" patterns.
2. vect_recog_abd_pattern, where if half_type is NULL, it then uses diff_stmt to
set them.
This refactors the code so that it now has only one success condition, and
diff_stmt is always set to the minus statement in the abs if there is one.
The function now only returns success if the widening minus is found, in which
case unprom and half_type are set.
This then leaves it up to the caller to decide if they want to do anything with
diff_stmt.
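For orientation, here are C loops of the two shapes the callers are after: a sum of absolute differences over widened inputs (the SAD case) and an element-wise absolute difference (the ABD case). This is a hedged sketch with hypothetical function names; whether either loop is actually vectorized with these patterns depends on the target and options.

```c
#include <stdint.h>
#include <stdlib.h>

/* SAD shape: widen both inputs, subtract, take abs, accumulate.  */
int sad_u8(const uint8_t *a, const uint8_t *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs((int) a[i] - (int) b[i]);
  return sum;
}

/* ABD shape: element-wise absolute difference, no accumulation.  */
void abd_u8(uint8_t *out, const uint8_t *a, const uint8_t *b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (uint8_t) abs((int) a[i] - (int) b[i]);
}
```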
gcc/ChangeLog:
PR tree-optimization/114769
* tree-vect-patterns.cc (vect_recog_absolute_difference): Have only one
success condition.
(vect_recog_abd_pattern): Handle further checks if
vect_recog_absolute_difference fails.
Thomas Schwinge [Fri, 19 Apr 2024 10:32:03 +0000 (12:32 +0200)]
Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768]
Follow-up to commit 9f295847a9c32081bdd0fe908ffba58e830a24fb
"rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]": nvptx
behaves in exactly the same way as expected; see the 'diff' of before vs. after
the 'gcc/rtlanal.cc' code changes:
PASS: gcc.dg/pr114768.c (test for excess errors)
[-FAIL:-]{+PASS:+} gcc.dg/pr114768.c scan-rtl-dump final "\\(mem/v:"
bpf: remove huge memory waste with string allocation.
The BPF backend was allocating an unnecessarily large string when
constructing CO-RE relocations for enum types.
This patch also verifies that those enumerators are valid for CO-RE,
returning an error otherwise.
gcc/ChangeLog:
* config/bpf/core-builtins.cc (get_index_for_enum_value): Create
function.
(pack_enum_value): Check for enumerator and error out.
(process_enum_value): Correct string allocation.
bpf: support more instructions to match CO-RE relocations
BPF supports multiple instructions to be CO-RE relocatable regardless of
the position of the immediate field in the encoding.
In particular, not only does the MOV instruction allow a CO-RE
relocation of its immediate operand, but the LD and ST instructions can
also have a CO-RE relocation applied to their offset immediate operand,
even though those operands are encoded in different bits.
This patch moves from the more traditional matching of the
UNSPEC_CORE_RELOC pattern within a dedicated define_insn to matching it
within the constraints of both the immediate and address operands of the
more generic mov define_insn rule.
gcc/ChangeLog:
* config/bpf/bpf-protos.h (bpf_add_core_reloc): Renamed function
to bpf_output_move.
* config/bpf/bpf.cc (bpf_legitimate_address_p): Allow
UNSPEC_CORE_RELOC to match an address.
(bpf_insn_cost): Make UNSPEC_CORE_RELOC immediate moves
expensive to prioritize loads and stores.
(TARGET_INSN_COST): Add hook.
(bpf_output_move): Wrapper to call bpf_output_core_reloc.
(bpf_print_operand): Add support to print immediate operands
specified with the UNSPEC_CORE_RELOC.
(bpf_print_operand_address): Likewise, but to support
UNSPEC_CORE_RELOC in addresses.
(bpf_init_builtins): Flag BPF_BUILTIN_CORE_RELOC as NOTHROW.
* config/bpf/bpf.md: Wrap patterns for MOV, LD and ST
instructions with bpf_output_move call.
(mov_reloc_core<MM:mode>): Remove now spurious define_insn.
* config/bpf/constraints.md: Added "c" and "C" constraints to
match immediates represented with UNSPEC_CORE_RELOC.
* config/bpf/core-builtins.cc (bpf_add_core_reloc): Remove.
(bpf_output_core_reloc): Add function to create the CO-RE
relocations based on new matching rules.
* config/bpf/core-builtins.h (bpf_output_core_reloc): Add
prototype.
* config/bpf/predicates.md (core_imm_operand): Add predicate.
(mov_src_operand): Add match for core_imm_operand.
d: Fix ICE in build_deref, at d/d-codegen.cc:1650 [PR111650]
PR d/111650
gcc/d/ChangeLog:
* decl.cc (get_fndecl_arguments): Move generation of frame type to ...
(DeclVisitor::visit (FuncDeclaration *)): ... here, after the call to
build_closure.
Jakub Jelinek [Fri, 19 Apr 2024 06:47:53 +0000 (08:47 +0200)]
rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]
On the following testcase, combine propagates the mem/v load into mem store
with the same address and then removes it, because noop_move_p says it is a
no-op move. If it was the other way around, i.e. mem/v store and mem load,
or both would be mem/v, it would be kept.
The problem is that rtx_equal_p never checks any kind of flags on the rtxes
(and I think it would be quite dangerous to change it at this point), and
set_noop_p checks side_effects_p on just one of the operands, not both.
For the MEM <- MEM set it only checks it on the destination, and for a
store to ZERO_EXTRACT it only checks it on the source.
The following patch adds the missing side_effects_p checks.
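A hypothetical C function of the kind the description covers: a volatile load (mem/v in RTL) whose value is stored back, through a non-volatile access, to the same address. The store leaves memory unchanged, but the volatile read is an observable side effect, so the pair must not be treated as a no-op move. This is a sketch, not the PR114768 testcase; one way to inspect the result is to look for (mem/v: in an -fdump-rtl-final dump, which is what gcc.dg/pr114768.c scans for.

```c
/* The load is volatile, the store to the same address is not.  The value
   written back is the value read, yet the volatile read must still happen,
   so this load/store pair is not a no-op move.  */
void reload_volatile(int *p)
{
  *p = *(volatile int *) p;
}
```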
2024-04-19 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114768
* rtlanal.cc (set_noop_p): Don't return true for MEM <- MEM
sets if src has side-effects or for stores into ZERO_EXTRACT
if ZERO_EXTRACT operand has side-effects.
Jakub Jelinek [Fri, 19 Apr 2024 06:44:54 +0000 (08:44 +0200)]
libgcc: Another __divmodbitint4 bug fix [PR114762]
The following testcase is miscompiled because the code to decrement
vn on negative value with all ones in most significant limb (even partial)
and 0 in most significant bit of the second most significant limb doesn't
take into account the case where all bits below the most significant limb
are zero. This was a problem in the version before yesterday's commit, where
the decrement was done only if un was one shorter than vn before it, and is
now a problem even more often, since the decrement is done earlier.
When we decrement vn in such a case and negate it, we end up with all 0s in
the v2 value, so we have both the UB problem with __builtin_clz* and a broken
expectation of the algorithm that the divisor has its most significant bit set
after shifting; moreover, when the decremented vn is 1, it can SIGFPE on
division by zero even though the actual divisor is not zero. Other values
shouldn't get 0 in the new most significant limb after negation: the
bitint_reduce_prec canonicalization should reduce prec if the second most
significant limb is all ones, and if that limb is all zeros, carry-in will
make it non-zero as long as at least one limb below it is non-zero.
The following patch fixes it by checking whether at least one bit below the
most significant limb is non-zero; only in that case does it decrement,
otherwise it does nothing (for the un < vn case, e.g., that also means the
divisor is large enough that the result should be q 0 r u).
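The arithmetic can be checked with a two-limb model, assuming 64-bit limbs (libgcc's limb size is target-dependent; the type and function names are hypothetical). For v with low limb 0x7fffffffffffffff and an all-ones high limb, negation gives a zero top limb and a low limb with its top bit set, so decrementing vn is safe. For v with low limb 0 and an all-ones high limb, i.e. -2^64, the magnitude 2^64 does not fit in one limb: dropping the top limb first and negating the remaining all-zero limb leaves 0, which is the failure described above.

```c
#include <stdint.h>

/* A 128-bit two's complement value as two 64-bit limbs,
   least significant limb first.  */
typedef struct
{
  uint64_t limb[2];
} limbs2;

/* Two's complement negation: invert every bit, then add one with carry.  */
limbs2 negate2(limbs2 v)
{
  v.limb[0] = ~v.limb[0];
  v.limb[1] = ~v.limb[1];
  if (++v.limb[0] == 0) /* carry out of the low limb */
    ++v.limb[1];
  return v;
}

/* Simplified two-limb form of the condition after this fix: the all-ones
   top limb of a negative divisor whose second limb has its top bit clear
   may be dropped (vn decremented) only if some bit below it is set.  */
int can_drop_top_limb(limbs2 v)
{
  return v.limb[1] == UINT64_MAX
         && (v.limb[0] >> 63) == 0
         && v.limb[0] != 0;
}
```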
2024-04-18 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114762
* libgcc2.c (__divmodbitint4): Only decrement vn for negative v with
most significant limb all ones and the second most significant limb
with most significant bit clear if at least one bit below the most
significant limb is non-zero.
This patch marks the nios2*-*-* targets obsolete in GCC 14. Intel has
EOL'ed this architecture and the maintainers no longer have access to
hardware for testing. While the port is still in reasonably good
shape at this time, no further testing or updates are planned.
gcc/
* config.gcc: Add nios2*-*-* to the list of obsoleted targets.
contrib/
* config-list.mk (LIST): --enable-obsolete for nios2*-*-*.
Paul Thomas [Thu, 18 Apr 2024 17:07:25 +0000 (18:07 +0100)]
Fortran: Fix ICE and clear incorrect error messages [PR114739]
2024-04-18 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/114739
* primary.cc (gfc_match_varspec): Check for default type before
checking for derived types with the right component name.
gcc/testsuite/
PR fortran/114739
* gfortran.dg/pr114739.f90: New test.
* gfortran.dg/derived_comp_array_ref_8.f90: Add 'implicit none'
for consistency with expected error message.
* gfortran.dg/nullify_4.f90: ditto
* gfortran.dg/pointer_init_6.f90: ditto
* gfortran.dg/pr107397.f90: ditto
* gfortran.dg/pr88138.f90: ditto
[testsuite] [i386] work around fails with --enable-frame-pointer
A few x86 tests get unexpected insn counts if the toolchain is
configured with --enable-frame-pointer. Add explicit
-fomit-frame-pointer so that the expected insn sequences are output.
[c++] [testsuite] adjust contracts9.C for negative addresses
The test expected the address of a literal string, converted to long
long, to yield a positive value. That expectation doesn't necessarily
hold, and the test fails where it doesn't.
Adjust the test to use a pointer that will compare as expected.
for gcc/testsuite/ChangeLog
* g++.dg/contracts/contracts9.C: Don't assume string literals
have non-negative addresses.
[testsuite] xfail pr103798-2 in C++ on vxworks too [PR113706]
pr103798-2.c fails in C++ on targets that provide an ISO C++-compliant
declaration of memchr, because it mismatches the C-compatible builtin,
as per PR113706. Expect the C++ test to fail on vxworks as well.
for gcc/testsuite/ChangeLog
PR testsuite/113706
* c-c++-common/pr103798-2.c: XFAIL in C++ on vxworks too.
A number of tests that call strndup fail on vxworks, where there's no
strndup. Some of them already had workarounds to skip the strndup
parts of the tests on platforms that don't offer it. I've changed
them to rely on a strndup effective target instead, and extended the
logic to other tests that were otherwise skipped entirely.
[libstdc++] [testsuite] disable SRA for compare_exchange_padding
On arm-vx7r2, the uses of as.load() as initializer get SRAed, so the
padding bits in the tests are not what we might expect from full-word
struct copies.
I tried adding a function to perform bitwise copying, but even taking
the as.load() argument by const&, we'd still construct a temporary
with SRAed field-wise copying. Unable to find another way to ensure
we wouldn't get a temporary, I went for disabling SRA.
[libstdc++] [testsuite] xfail double-prec from_chars for float128_t
Tests 20_util/from_chars/4.cc and 20_util/to_chars/long_double.cc were
adjusted about a year ago to skip long double on some targets, because
the fastfloat library was limited to 64-bit doubles.
The same problem comes up in similar float128_t tests on
aarch64-vxworks. This patch adjusts them similarly.
Unlike the earlier tests, which got similar treatment for
x86_64-vxworks, these haven't failed there.
for libstdc++-v3/ChangeLog
* testsuite/20_util/from_chars/8.cc: Skip float128_t testing
on aarch64-vxworks.
* testsuite/20_util/to_chars/float128_c++23.cc: Xfail run on
aarch64-vxworks.
[libstdc++] define zoneinfo_dir_override on vxworks
VxWorks fails to load kernel-mode modules with weak undefined symbols.
In RTP mode, modules undergo final linking, so weak undefined
symbols are not a problem there.
This patch adds kernel-mode VxWorks multilibs to the set of targets
that don't support weak undefined symbols without special flags, in
which tzdb's zoneinfo_dir_override is given a weak definition.
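The difference between the two linkage strategies can be sketched in C with a hypothetical hook (not libstdc++'s actual declaration). With a weak undefined symbol, code declares extern const char *hook (void) __attribute__((weak)); and tests the function's address before calling it; with a weak definition, as this commit uses for kernel-mode VxWorks, the library itself provides a default body that a strong definition elsewhere replaces at link time.

```c
#include <stddef.h>

/* Hypothetical hook: a weak *definition* supplies the default, so no weak
   undefined symbol is left for the module loader to resolve.  A strong
   definition of the same symbol elsewhere would replace it at link time.  */
__attribute__((weak)) const char *tz_dir_override(void)
{
  return NULL; /* default: no override */
}

/* Caller: use the override if one was provided, otherwise FALLBACK.  */
const char *tz_dir(const char *fallback)
{
  const char *dir = tz_dir_override();
  return dir != NULL ? dir : fallback;
}
```

Without any overriding definition, tz_dir (fallback) simply returns fallback.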
for libstdc++-v3/ChangeLog
* src/c++20/tzdb.cc (__gnu_cxx::zoneinfo_dir_override): Define
on VxWorks non-RTP.
Tamar Christina [Thu, 18 Apr 2024 10:47:42 +0000 (11:47 +0100)]
AArch64: remove reliance on register allocator for simd/gpreg costing. [PR114741]
In PR114741 we see that we have a regression in codegen when SVE is enabled, where
the simple testcase:
void foo(unsigned v, unsigned *p)
{
    *p = v & 1;
}
generates
foo:
        fmov    s31, w0
        and     z31.s, z31.s, #1
        str     s31, [x1]
        ret
instead of:
foo:
        and     w0, w0, 1
        str     w0, [x1]
        ret
This impacts not just code size but also performance. This is caused
by the use of the ^ constraint modifier in the pattern <optab><mode>3.
The documentation states that this modifier should only have an effect on the
alternative costing, in that a particular alternative is to be preferred unless
a non-pseudo reload is needed.
The pattern was trying to convey that whenever both r and w are required, it
should prefer r unless a reload is needed. This is because if a reload is
needed then we can construct the constants more flexibly on the SIMD side.
We were using this to simplify the implementation and to get generic cases such
as:
#include <string.h>
double negabs (double x)
{
    unsigned long long y;
    memcpy (&y, &x, sizeof(double));
    y = y | (1ULL << 63);  /* set the sign bit */
    memcpy (&x, &y, sizeof(double));
    return x;
}
which don't go through an expander.
However the implementation of ^ in the register allocator is not according to
the documentation in that it also has an effect during coloring. During initial
register class selection it applies a penalty to a class, similar to how ? does.
In this example the penalty makes the use of GP regs expensive enough that it no
longer considers them:
r106: preferred FP_REGS, alternative NO_REGS, allocno FP_REGS
;; 3--> b 0: i 9 r106=r105&0x1
:cortex_a53_slot_any:GENERAL_REGS+0(-1)FP_REGS+1(1)PR_LO_REGS+0(0)
PR_HI_REGS+0(0):model 4
which is not the expected behavior. For GCC 14 this is a conservative fix.
1. We remove the ^ modifier from the logical optabs.
2. In order not to regress copysign we then move the copysign expansion to
directly use the SIMD variant. Since copysign only supports floating point
modes this is fine and no longer relies on the register allocator to select
the right alternative.
It once again regresses the general case, but this case wasn't optimized in
earlier GCCs either, so it's not a regression in GCC 14. This change gives
strictly better codegen than earlier GCCs and still optimizes the important cases.
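The copysign semantics that the expansion has to implement can be sketched in portable C on the IEEE-754 binary64 bit pattern: clear the sign bit of the magnitude and OR in the sign bit of the other operand. This assumes double is binary64 and illustrates the operation only; it is not the aarch64 expansion itself, and copysign_bits is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* copysign on the binary64 encoding: keep MAG's magnitude bits and take
   the sign bit from SGN (assumes double is IEEE-754 binary64).  */
double copysign_bits(double mag, double sgn)
{
  const uint64_t sign_mask = UINT64_C(1) << 63;
  uint64_t m, s;
  memcpy(&m, &mag, sizeof m);
  memcpy(&s, &sgn, sizeof s);
  m = (m & ~sign_mask) | (s & sign_mask);
  memcpy(&mag, &m, sizeof mag);
  return mag;
}
```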
gcc/ChangeLog:
PR target/114741
* config/aarch64/aarch64.md (<optab><mode>3): Remove ^ from alt 2.
(copysign<GPF:mode>3): Use SIMD version of IOR directly.
gcc/testsuite/ChangeLog:
PR target/114741
* gcc.target/aarch64/fneg-abs_2.c: Update codegen.
* gcc.target/aarch64/fneg-abs_4.c: xfail for now.
* gcc.target/aarch64/pr114741.c: New test.
Jakub Jelinek [Thu, 18 Apr 2024 07:49:02 +0000 (09:49 +0200)]
libgcc: Fix up __divmodbitint4 [PR114755]
The following testcase aborts on aarch64-linux but does not on x86_64-linux.
In both cases there is UB in the __divmodbitint4 implementation.
When the divisor is negative, has at least 2 limbs, has its most significant
limb (even when partial) all ones, and has the most significant bit of the
second most significant limb clear, then negating it yields 0 in the most
significant limb.
Already in the PR114397 r14-9592 fix I was dealing with such divisors, but
thought the problem only arose when, because of that, un < vn doesn't imply
that the quotient is 0 and the remainder u.
But as this testcase shows, such divisors are a problem in all cases.
What happens is that we use __builtin_clz* on the most significant limb,
and assume it will not be 0 because that is UB for the builtins.
Normally the most significant limb of the divisor shouldn't be 0, as
guaranteed by the bitint_reduce_prec e.g. for the positive numbers, unless
the divisor is just 0 (but for vn == 1 we have special cases).
The following patch moves the handling of this corner case a few lines
earlier before the un < vn check, because adjusting the vn later is harder.
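To see why a zero top limb matters: the normalization step of long division shifts the divisor left by the count of leading zeros of its top limb, and __builtin_clzll (0) is undefined. The sketch below (hypothetical name, 64-bit limbs and 64-bit unsigned long long assumed) makes that precondition explicit by returning -1 for zero; the libgcc fix does not add such a check but instead ensures the limb handed to the builtin is non-zero.

```c
#include <stdint.h>

/* Left shift that moves the highest set bit of a divisor's top limb into
   bit 63, as normalized long division expects.  __builtin_clzll is
   undefined for 0, so a zero top limb is reported as -1 instead.  */
int normalization_shift(uint64_t top_limb)
{
  if (top_limb == 0)
    return -1;
  return __builtin_clzll(top_limb);
}
```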
2024-04-18 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114755
* libgcc2.c (__divmodbitint4): Perform the decrement on negative
v with most significant limb all ones and the second most
significant limb with most significant bit clear always, regardless of
un < vn.
Jakub Jelinek [Thu, 18 Apr 2024 07:45:14 +0000 (09:45 +0200)]
internal-fn: Temporarily disable flag_trapv during .{ADD,SUB,MUL}_OVERFLOW etc. expansion [PR114753]
__builtin_{add,sub,mul}_overflow{,_p} builtins are well defined
for all inputs even for -ftrapv, and the -fsanitize=signed-integer-overflow
ifns shouldn't abort in libgcc but emit the desired ubsan diagnostics
or abort depending on -fsanitize* setting regardless of -ftrapv.
The expansion of these internal functions uses expand_expr* in various
places (e.g. MULT_EXPR at least in 2 spots), so temporarily disabling
flag_trapv in all those spots would be hard.
The following patch disables it around the bodies of 3 functions
which can do the expand_expr calls.
If it were in the C++ FE, I'd use some RAII sentinel, but I don't think
we have one in the middle-end.
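The behavior that has to survive -ftrapv can be shown with a small C wrapper (checked_add is a hypothetical name): __builtin_add_overflow stores the result wrapped modulo 2^N and returns whether the infinite-precision result overflowed, rather than trapping.

```c
#include <limits.h>
#include <stdbool.h>

/* Checked signed addition: defined for all inputs.  *RES receives the
   result wrapped modulo 2^N; the return value says whether it overflowed.  */
bool checked_add(int a, int b, int *res)
{
  return __builtin_add_overflow(a, b, res);
}
```

Per the commit, this should hold whether or not -ftrapv is in effect.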
2024-04-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114753
* internal-fn.cc (expand_mul_overflow): Save flag_trapv and
temporarily clear it for the duration of the function, then
restore previous value.
(expand_vector_ubsan_overflow): Likewise.
(expand_arith_overflow): Likewise.
Kewen Lin [Thu, 18 Apr 2024 03:20:07 +0000 (22:20 -0500)]
testsuite, rs6000: Fix builtins-6-p9-runnable.c for BE [PR114744]
Test case builtins-6-p9-runnable.c doesn't work well on BE
due to two problems:
- When applying vec_xl_len onto data_128 and data_u128
with length 8, it expects to load 1280000[01] from
the memory, but when 1280000[01] is assigned to
a {vector} {u,}int128 type variable, the value isn't
guaranteed to be at the beginning of storage (in the
low part of memory), which means the loaded value can
be unexpected (as shown on BE). So this patch
introduces getU128, which ensures the given value
shows up as expected, and also updates some dumping
code for debugging.
- When applying vec_xl_len_r with length 16, on BE it's
just like the normal vector load, so the expected data
should not be reversed from the original.
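A 64-bit analogue of the first problem, in C (lowest_address_byte is a hypothetical name): the byte at the lowest address of an integer object is its least significant byte on little-endian targets but its most significant byte on big-endian ones, so a fixed-length load from the start of a wide integer's storage does not see the same part of the value on BE.

```c
#include <stdint.h>
#include <string.h>

/* Byte stored at the lowest address of V's storage: the least significant
   byte on little-endian targets, the most significant one on big-endian.  */
unsigned lowest_address_byte(uint64_t v)
{
  unsigned char bytes[sizeof v];
  memcpy(bytes, &v, sizeof v);
  return bytes[0];
}
```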
PR testsuite/114744
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-6-p9-runnable.c: Adjust for BE by fixing
data_{u,}128, their uses and vec_uc_expected1, also adjust some formats.
gcc/testsuite/
* gcc.target/powerpc/bcd-4.c: Enable the case to be tested on P9.
Enable the case to be run on big endian. Fix function maxbcd and
other misc. problems.
Jonathan Wakely [Thu, 21 Mar 2024 23:09:14 +0000 (23:09 +0000)]
libstdc++: Implement "Printing blank lines with println" for C++23
This was recently approved for C++26 at the Tokyo meeting. As suggested
by Stephan T. Lavavej, I'm defining it as an extension for C++23 mode
(when std::print and std::println were first added) rather than as a new
C++26 feature. Both MSVC and libc++ have agreed to do this too.
libstdc++-v3/ChangeLog:
* include/std/ostream (println(ostream&)): Define new overload.
* include/std/print (println(FILE*), println()): Likewise.
* testsuite/27_io/basic_ostream/print/2.cc: New test.
* testsuite/27_io/print/1.cc: Remove unused header.
* testsuite/27_io/print/3.cc: New test.
Jakub Jelinek [Wed, 17 Apr 2024 14:17:22 +0000 (16:17 +0200)]
DOCUMENTATION_ROOT_URL vs. release branches [PR114738]
Starting with GCC 14 we have the nice URLification of the options printed
in diagnostics; for example, in
test.c:4:23: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long int’ [-Wformat=]
the -Wformat= is underlined in some terminals and hovering on it shows
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wformat
link.
This works nicely on the GCC trunk, where the online documentation is
regenerated every day from a cron job and more importantly, people rarely
use the trunk snapshots for too long, so it is unlikely that further changes
in the documentation will make too many links stale, because users will
simply regularly update to newer snapshots.
I think it doesn't work properly on release branches though.
Some users only use the released versions (i.e. MAJOR.MINOR.0) from tarballs
but can use them for a couple of years, others use snapshots from the
release branches, but again they could be in use for months or years and
the above mentioned online docs which represent just the GCC trunk might
diverge significantly.
Now, for releases we always also publish online docs for the release,
which unlike the trunk online docs will not change further, under
e.g.
https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wformat
or
https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Warning-Options.html#index-Wformat
etc.
So, I think at least for the MAJOR.MINOR.0 releases we want to use
URLs like above rather than the trunk ones and we can use the same process
of updating *.opt.urls as well for that.
For the snapshots from release branches, we don't have such docs.
One option (implemented in the patch below for the URL printing side) is to
point to the MAJOR.MINOR.0 docs even for MAJOR.MINOR.1 snapshots.
Most of the links will work fine; options newly added on the release
branches (a rare thing, but it still happens) will have no URLs until
the next point release, and will get them with that release.
The question is what to do about make regenerate-opt-urls for the release
branch snapshots. Either just document that users shouldn't run
make regenerate-opt-urls on release branches (and filter out *.opt.urls
changes from their commits), make running make regenerate-opt-urls the RM's
responsibility before making the first release candidate from a branch, and
adjust the autoregen CI to know about that. Or add a separate goal
which, instead of relying on files created by make html, would download a
copy of the html files of the last release from the web (kind of
mirroring the https://gcc.gnu.org/onlinedocs/gcc-14.1.0/ subtree locally)
and run regenerate-opt-urls on top of that? But how do we catch the
point when first release candidate is made and we want to update to
what will be the URLs once the release is made (but will be stale URLs
for a week or so)?
Another option would be to add to cron daily regeneration of the online
docs for the release branches. I don't think that is a good idea though,
because as I wrote earlier, not all users update to the latest snapshot
frequently, so there can be users that use gcc 13.1.1 20230525 for months
or years, and other users which use gcc 13.1.1 20230615 for years etc.
Another question is what is most sensible for users who want to override
the default root and use the --with-documentation-root-url= configure
option. Do we expect them to grab the whole onlinedocs tree or for release
branches at least include gcc-14.1.0/ subdirectory under the root?
If so, the patch below deals with that. Or should we just change the
default documentation root url, so if user doesn't specify
--with-documentation-root-url= and we are on a release branch, default that
to https://gcc.gnu.org/onlinedocs/gcc-14.1.0/ or
https://gcc.gnu.org/onlinedocs/gcc-14.2.0/ etc. and don't add any infix in
get_option_url/make_doc_url, but when people supply their own, let them
point to the root of the tree which contains the right docs?
Then such changes would go into gcc/configure.ac, some case based on
"$gcc_version", from that decide if it is a release branch or trunk.
2024-04-17 Jakub Jelinek <jakub@redhat.com>
PR other/114738
* opts.cc (get_option_url): On release branches append
gcc-MAJOR.MINOR.0/ after DOCUMENTATION_ROOT_URL.
* gcc-urlifier.cc (gcc_urlifier::make_doc_url): Likewise.
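The URL composition described above can be sketched in C. make_doc_url here is a hypothetical helper modelled on the text, not the signature of gcc_urlifier::make_doc_url or get_option_url, and whether we are on a release branch is simply passed in as a flag.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: on release branches, insert gcc-MAJOR.MINOR.0/
   after the documentation root URL.  Returns the snprintf result, i.e. the
   length of the full URL (which may exceed SIZE - 1 if truncated).  */
int make_doc_url(char *buf, size_t size, const char *root,
                 int major, int minor, int on_release_branch,
                 const char *page)
{
  if (on_release_branch)
    return snprintf(buf, size, "%sgcc-%d.%d.0/%s", root, major, minor, page);
  return snprintf(buf, size, "%s%s", root, page);
}
```

With root https://gcc.gnu.org/onlinedocs/, version 14.1 and the release-branch flag set, the page gcc/Warning-Options.html#index-Wformat becomes the gcc-14.1.0 URL quoted in the message; without the flag it is the trunk URL.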
Richard Biener [Wed, 17 Apr 2024 08:40:04 +0000 (10:40 +0200)]
tree-optimization/114749 - reset partial vector decision for no-SLP retry
The following makes sure to reset LOOP_VINFO_USING_PARTIAL_VECTORS_P
to its default of false when re-trying without SLP as otherwise
analysis may run into bogus asserts.
PR tree-optimization/114749
* tree-vect-loop.cc (vect_analyze_loop_2): Reset
LOOP_VINFO_USING_PARTIAL_VECTORS_P when re-trying without SLP.
PR libstdc++/114750
* include/experimental/bits/simd_builtin.h
(_SimdImplBuiltin::_S_load, _S_store): Fall back to copying
scalars if the memory type cannot be vectorized for the target.
.ABNORMAL_DISPATCHER is currently the only internal function with
ECF_NORETURN, and asan likes to instrument ECF_NORETURN calls by adding
some builtin call before them, which breaks the .ABNORMAL_DISPATCHER
discovery added in gsi_safe_*.
The following patch fixes asan not to instrument .ABNORMAL_DISPATCHER
calls, like it doesn't instrument a couple of specific builtin calls
as well.
2024-04-17 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/114743
* asan.cc (maybe_instrument_call): Don't instrument calls to
.ABNORMAL_DISPATCHER.
* gcc.dg/asan/pr112709-2.c (freddy): New function from
gcc.dg/ubsan/pr112709-2.c version of the test.
Harald Anlauf [Sat, 13 Apr 2024 17:09:24 +0000 (19:09 +0200)]
Fortran: ALLOCATE of fixed-length CHARACTER with SOURCE/MOLD [PR113793]
F2008 requires for ALLOCATE with SOURCE= or MOLD= specifier that the kind
type parameters of allocate-object and source-expr have the same values.
Add compile-time diagnostics for different character length and a runtime
check (under -fcheck=bounds). Use length from allocate-object to prevent
heap corruption and to allow string padding or truncation on assignment.
gcc/fortran/ChangeLog:
PR fortran/113793
* resolve.cc (resolve_allocate_expr): Reject ALLOCATE with SOURCE=
or MOLD= specifier for unequal length.
* trans-stmt.cc (gfc_trans_allocate): If an allocatable character
variable has fixed length, use it and do not use the source length.
With bounds-checking enabled, add a runtime check for same length.
gcc/testsuite/ChangeLog:
PR fortran/113793
* gfortran.dg/allocate_with_source_29.f90: New test.
* gfortran.dg/allocate_with_source_30.f90: New test.
* gfortran.dg/allocate_with_source_31.f90: New test.
Andrew Pinski [Tue, 16 Apr 2024 00:13:36 +0000 (17:13 -0700)]
Document that vector_size works with typedefs [PR92880]
This just adds a clause to make it more obvious that the vector_size
attribute extension works with typedefs.
Note this whole section needs a rewrite to be a similar format as other
extensions. But that is for another day.
gcc/ChangeLog:
PR c/92880
* doc/extend.texi (Using Vector Instructions): Add that
the base_types could be a typedef of them.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Richard Biener [Tue, 16 Apr 2024 09:33:48 +0000 (11:33 +0200)]
tree-optimization/114736 - SLP DFS walk issue
The following fixes a DFS walk issue when identifying latch edges to be
ignored. We have (bogus) SLP_TREE_REPRESENTATIVEs for VEC_PERM nodes, so
those have to be explicitly ignored as possible PHIs.
PR tree-optimization/114736
* tree-vect-slp.cc (vect_optimize_slp_pass::is_cfg_latch_edge):
Do not consider VEC_PERM_EXPRs as PHI use.
OpenACC 2.7: Adjust acc_map_data/acc_unmap_data interaction with reference counters
This patch adjusts the implementation of acc_map_data/acc_unmap_data API library
routines to more fit the description in the OpenACC 2.7 specification.
Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA
special value to mark acc_map_data-created mappings. Adjustments to the
mapping-related code to respect OpenACC semantics are also added.
libgomp/ChangeLog:
* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
initialize dynamic_refcount as 1.
(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA.
(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
(goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case.
* target.c (gomp_increment_refcount): Return early for
REFCOUNT_ACC_MAP_DATA case.
(gomp_decrement_refcount): Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase.
* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
testcase error output scan test.
Jakub Jelinek [Tue, 16 Apr 2024 07:55:25 +0000 (09:55 +0200)]
Fix some comment nits
While studying the TYPE_CANONICAL/TYPE_STRUCTURAL_EQUALITY_P stuff,
I've noticed some nits in comments, the following patch fixes them.
2024-04-16 Jakub Jelinek <jakub@redhat.com>
* tree.cc (array_type_nelts): Ensure 2 spaces after . in comment
instead of just one.
(build_variant_type_copy): Likewise.
(tree_check_failed): Likewise.
(build_atomic_base): Likewise.
* ipa-free-lang-data.cc (fld_incomplete_type_of): Use an indefinite
article rather than a.
On 2024-04-15T13:14:42+0200, I wrote:
> I now wonder: instead of 'AC_CHECK_TOOL', shouldn't this use
> 'AC_CHECK_PROG'? (We always want plain 'cargo', not host-prefixed
> 'aarch64-linux-gnu-cargo' etc., right?) I'll look into changing this.
* configure: Regenerate.
config/
* acx.m4 (ACX_PROG_CARGO): Use 'AC_CHECK_PROGS'.
Jakub Jelinek [Tue, 16 Apr 2024 07:39:19 +0000 (09:39 +0200)]
c++: Handle ARRAY_TYPE in check_bit_cast_type [PR114706]
https://eel.is/c++draft/bit.cast#3 says that std::bit_cast isn't constexpr
if To, From and the types of all subobjects have certain properties which
check_bit_cast_type checks (such as it isn't a pointer, reference, union,
member pointer, volatile). The function doesn't cp_walk_tree though, so
I've missed one important case, for ARRAY_TYPEs we need to recurse on the
element type. I think we don't need to handle VECTOR_TYPEs/COMPLEX_TYPEs,
because those will not have a pointer/reference/union/member pointer in
the element type and if the element type is volatile, I think the whole
derived type is volatile as well.
When one of the two input operands is 0, ADD and IOR are functionally
equivalent.
ADD is slightly preferred over IOR because ADD has a higher likelihood
of being implemented as a compressed instruction when compared to IOR.
C.ADD uses the CR format, which can use any of the 32 RVI registers,
while C.OR uses the CA format, which is limited to just 8 of them.
Conditional select, if zero case:
rd = (rc == 0) ? rs1 : rs2
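As an illustration of why one operand is always zero in this pattern, here is a C++ sketch that emulates a branchless conditional select. It assumes RISC-V Zicond-style semantics (czero.eqz zeroes its result when the condition is zero, czero.nez when it is non-zero); the helper names are hypothetical and this is not GCC's expander code.

```cpp
#include <cstdint>

// Hypothetical emulation of czero.eqz: result is 0 when rc == 0, else rs1.
static std::uint64_t czero_eqz(std::uint64_t rs1, std::uint64_t rc) {
  return rc == 0 ? 0 : rs1;
}

// Hypothetical emulation of czero.nez: result is 0 when rc != 0, else rs1.
static std::uint64_t czero_nez(std::uint64_t rs1, std::uint64_t rc) {
  return rc != 0 ? 0 : rs1;
}

// rd = (rc == 0) ? rs1 : rs2, combined with ADD.
std::uint64_t select_if_zero_add(std::uint64_t rc, std::uint64_t rs1,
                                 std::uint64_t rs2) {
  std::uint64_t t1 = czero_nez(rs1, rc);  // rs1 when rc == 0, else 0
  std::uint64_t t2 = czero_eqz(rs2, rc);  // rs2 when rc != 0, else 0
  return t1 + t2;  // at least one of t1/t2 is always 0, so ADD == IOR here
}

// The same select combined with IOR instead.
std::uint64_t select_if_zero_or(std::uint64_t rc, std::uint64_t rs1,
                                std::uint64_t rs2) {
  return czero_nez(rs1, rc) | czero_eqz(rs2, rc);
}
```

Because the two partial results never both carry set bits, the choice between the two combining instructions can be made purely on encoding size.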
[strub] improve handling of indirected volatile parms [PR112938]
The earlier patch for PR112938 arranged for volatile parms to be made
indirect in internal strub wrapped bodies.
The first remaining problem, and the more evident one, was that the
indirected parameter remained volatile despite the indirection, but it
wasn't regimplified, so indirecting it produced malformed gimple.
Regimplifying turned out not to be needed. The best course of action
was to drop the volatility from the by-reference parm, which was being
unexpectedly inherited from the original volatile parm.
That exposed another problem: the dereferences would then lose their
volatile status, so we had to bring volatile back to them.
for gcc/ChangeLog
PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Drop volatility from
indirected parm.
(maybe_make_indirect): Restore volatility in dereferences.
Jakub Jelinek [Mon, 15 Apr 2024 20:32:37 +0000 (22:32 +0200)]
gotools: Workaround non-reproducibility of automake
The regen bot recently flagged a difference in gotools/Makefile.in.
Trying it locally, it seems pretty random:
for i in `seq 20`; do PATH=~/automake-1.15.1/bin:~/autoconf-2.69/bin:$PATH automake; echo -n `git diff Makefile.in | wc -l`" "; done; echo
for i in `seq 20`; do PATH=~/automake-1.15.1/bin:~/autoconf-2.69/bin:$PATH setarch x86_64 -R automake; echo -n `git diff Makefile.in | wc -l`" "; done; echo
14 14 14 0 0 0 14 0 14 0 14 14 14 14 0 14 14 0 0 0
14 0 14 0 0 14 14 14 0 14 14 0 0 14 14 14 0 0 0 14
The 14 line git diff is
diff --git a/gotools/Makefile.in b/gotools/Makefile.in
index 36c2ec2abd3..f40883c39be 100644
--- a/gotools/Makefile.in
+++ b/gotools/Makefile.in
@@ -704,8 +704,8 @@ distclean-generic:
maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
-@NATIVE_FALSE@install-exec-local:
@NATIVE_FALSE@uninstall-local:
+@NATIVE_FALSE@install-exec-local:
clean: clean-am
clean-am: clean-binPROGRAMS clean-generic clean-noinstPROGRAMS \
so whether it is
@NATIVE_FALSE@install-exec-local:
@NATIVE_FALSE@uninstall-local:
or
@NATIVE_FALSE@uninstall-local:
@NATIVE_FALSE@install-exec-local:
seems to depend on some hash table traversal order or something similar.
I'm not familiar enough with automake/m4 to debug that, so I'm
instead offering a workaround; with this patch the order is deterministic.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
* Makefile.am (install-exec-local, uninstall-local): Add goals
on the else branch of if NATIVE to ensure reproducibility.
* Makefile.in: Regenerate.
Jonathan Wakely [Fri, 22 Mar 2024 13:20:21 +0000 (13:20 +0000)]
libstdc++: Add std::reference_wrapper comparison operators for C++26
This C++26 change was just approved in Tokyo, in P2944R3. It adds
operator== and operator<=> overloads to std::reference_wrapper.
The operator<=> overloads in the paper cause compilation errors for any
type without <=> so they're implemented here with deduced return types
and constrained by a requires clause.
libstdc++-v3/ChangeLog:
* include/bits/refwrap.h (reference_wrapper): Add comparison
operators as proposed by P2944R3.
* include/bits/version.def (reference_wrapper): Define.
* include/bits/version.h: Regenerate.
* include/std/functional: Enable feature test macro.
* testsuite/20_util/reference_wrapper/compare.cc: New test.
I'm only treating this as a DR for C++20 for now, because it's less work
and only requires changes to operator== and operator<=>. To do this for
older standards would require changes to the six relational operators
used pre-C++20.
libstdc++-v3/ChangeLog:
PR libstdc++/113386
* include/bits/stl_pair.h (operator==, operator<=>): Support
heterogeneous comparisons, as per LWG 3865.
* testsuite/20_util/pair/comparison_operators/lwg3865.cc: New
test.
Jonathan Wakely [Thu, 4 Apr 2024 09:33:33 +0000 (10:33 +0100)]
libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
A negative delim value passed to std::istream::ignore can never match
any character in the stream, because the comparison is done using
traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns
negative values (except at EOF). The optimized version of ignore for the
std::istream specialization uses traits_type::find to locate the delim
character in the streambuf, which _can_ match a negative delim on
platforms where char is signed, but then we do another comparison using
eq_int_type which fails. The code then keeps looping forever, with
traits_type::find locating the character and traits_type::eq_int_type
saying it's not a match, so traits_type::find is used again and finds
the same character again.
A possible fix would be to check with eq_int_type after a successful
find, to see whether we really have a match. However, that would be
suboptimal since we know that a negative delimiter will never match
using eq_int_type. So a better fix is to adjust the check at the top of
the function that handles delim==eof(), so that we treat all negative
delim values as equivalent to EOF. That way we don't bother using find
to search for something that will never match with eq_int_type.
The version of ignore in the primary template doesn't need a change,
because it doesn't use traits_type::find, instead characters are
extracted one-by-one and always matched using eq_int_type. That avoids
the inconsistency between find and eq_int_type. The specialization for
std::wistream does use traits_type::find, but traits_type::to_int_type
is equivalent to an implicit conversion from wchar_t to wint_t, so
passing a wchar_t directly to ignore without using to_int_type works.
libstdc++-v3/ChangeLog:
PR libstdc++/93672
* src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
Treat all negative delimiter values as eof().
* testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
* testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New
test.
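A minimal sketch of the portable way to pass a char delimiter to ignore: convert it with traits_type::to_int_type so that it compares equal to what sgetc() returns for that byte. The helper name is hypothetical. With the fix described above, passing a negative delimiter directly is treated like eof() rather than looping forever, but it still would not stop at the intended byte.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: discard everything up to and including the first
// 0x80 byte, then return the remainder of the input.
std::string rest_after_0x80(const std::string& input) {
  std::istringstream in(input);
  typedef std::istringstream::traits_type traits;
  // to_int_type maps '\x80' to the non-negative int_type value 128, which
  // is what sgetc() reports for that byte. Passing '\x80' directly would
  // give -128 where char is signed, and that can never match.
  in.ignore(1000, traits::to_int_type('\x80'));
  std::string rest;
  std::getline(in, rest);
  return rest;
}
```

If the delimiter never appears, ignore stops at end of input and the helper returns an empty string.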
Jakub Jelinek [Mon, 15 Apr 2024 15:46:03 +0000 (17:46 +0200)]
m68k: Quiet up cppcheck warning [PR114689]
cppcheck apparently warns on the | !!sticky part of the expression and
using | (!!sticky) quiets it up (it is correct as is).
The following patch adds the ()s, and also adds them around mant >> 1 just
in case it makes it clearer to all readers that the expression is parsed
that way already.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114689
* config/m68k/fpgnulib.c (__truncdfsf2): Add parentheses around
!!sticky bitwise or operand to quiet up cppcheck. Add parentheses
around mant >> 1 bitwise or operand.
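To show that the added parentheses only document the existing parse (shift binds more tightly than bitwise or, and !! yields 0 or 1), here is a pair of hypothetical helpers; the names and operand values are illustrative and are not taken from fpgnulib.c.

```cpp
// Explicitly parenthesized form, as after the patch.
constexpr unsigned with_parens(unsigned mant, unsigned sticky) {
  return (mant >> 1) | (!!sticky);
}

// Unparenthesized form: parsed identically, since >> binds tighter than |.
constexpr unsigned without_parens(unsigned mant, unsigned sticky) {
  return mant >> 1 | !!sticky;
}

static_assert(with_parens(8u, 3u) == without_parens(8u, 3u), "same parse");
```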
Guard the longjmp to not infinitely loop. The longjmp (jump) function is
called unconditionally to make test flow simpler, but the jump
destination would return to a point in main that would call longjmp
again. The longjmp is really there to exercise the then-branch of
setjmp, to verify coverage is accurately counted in the presence of
complex edges.
PR gcov-profile/114720
gcc/testsuite/ChangeLog:
* gcc.misc-tests/gcov-22.c: Guard longjmp to not loop.
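A minimal sketch of the guarding idea, assuming a single static flag: the jump function is still called unconditionally, but it only performs the longjmp the first time, so the then-branch of setjmp is exercised exactly once. The function and variable names are hypothetical and do not come from gcov-22.c.

```cpp
#include <csetjmp>

static std::jmp_buf env;
static volatile int jumped = 0;  // guard so the longjmp happens only once

// Called unconditionally, but only jumps on the first call.
static void jump() {
  if (!jumped) {
    jumped = 1;
    std::longjmp(env, 1);
  }
}

int then_branch_count() {
  volatile int hits = 0;  // volatile: its value must survive the longjmp
  if (setjmp(env)) {
    hits = hits + 1;  // reached only via longjmp
  }
  jump();  // second call is a no-op thanks to the guard
  return hits;
}
```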
Richard Biener [Mon, 15 Apr 2024 09:09:17 +0000 (11:09 +0200)]
gcov-profile/114715 - missing coverage for switch
The following avoids missing coverage for the line of a switch statement
which happens when gimplification emits a BIND_EXPR wrapping the switch
as that prevents us from setting locations on the containing statements
via annotate_all_with_location. Instead set the location of the GIMPLE
switch directly.
PR gcov-profile/114715
* gimplify.cc (gimplify_switch_expr): Set the location of the
GIMPLE switch.
H.J. Lu [Fri, 12 Apr 2024 22:42:12 +0000 (15:42 -0700)]
x86: Allow TImode offsettable memory only with 8-bit constant
The x86 instruction size limit is 15 bytes. If a NDD instruction has
a segment prefix byte, a 4-byte opcode prefix, a MODRM byte, a SIB byte,
a 4-byte displacement and a 4-byte immediate, adding an address size
prefix will exceed the size limit. Change TImode ADD, AND, OR and XOR
to allow offsettable memory only with 8-bit signed integer constant,
which is encoded with a 1-byte immediate, if the address size prefix
is used.
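Tallying only the components listed above, in order (segment prefix, opcode prefix, ModRM, SIB, displacement, immediate), gives the arithmetic behind the restriction. This is a sketch that counts bytes exactly as the message lists them and ignores any other encoding detail:

```latex
\begin{aligned}
\text{imm32:}\quad & 1 + 4 + 1 + 1 + 4 + 4 = 15, & 15 + 1_{\text{addr-size}} &= 16 > 15\\
\text{imm8:}\quad  & 1 + 4 + 1 + 1 + 4 + 1 = 12, & 12 + 1_{\text{addr-size}} &= 13 \le 15
\end{aligned}
```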
gcc/
PR target/114696
* config/i386/i386.md (isa): Add apx_ndd_64.
(enabled): Likewise.
(*add<dwi>3_doubleword): Change rjO to r,ro,jO with 8-bit
signed integer constant and enable jO only for apx_ndd_64.
(*add<dwi>3_doubleword_cc_overflow_1): Likewise.
(*and<dwi>3_doubleword): Likewise.
(*<code><dwi>3_doubleword): Likewise.
Tamar Christina [Mon, 15 Apr 2024 11:06:21 +0000 (12:06 +0100)]
middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].
This fixes a bug with the interaction between peeling for gaps and early break.
Before I go further, I'll first explain how I understand this to work for loops
with a single exit.
When peeling for gaps we peel N < VF iterations to scalar.
This happens by removing N iterations from the calculation of niters such that
vect_iters * VF == niters is always false.
In other words, when we exit the vector loop we always fall to the scalar loop.
The loop bounds adjustment guarantees this. Because of this we potentially
execute one vector loop iteration fewer. That is, if you're at the boundary
condition where niters % VF == 0, peeling one or more scalar iterations makes
the vector loop execute one iteration fewer.
This is accounted for by the adjustments in vect_transform_loops. This
adjustment happens differently based on whether the vector loop can be
partial or not:
Peeling for gaps sets the bias to 0 and then:
when not partial: we take the floor of (scalar_upper_bound / VF) - 1 to get the
vector latch iteration count.
when loop is partial: For a single exit this means the loop is masked, we take
the ceil to account for the fact that the loop can handle
the final partial iteration using masking.
Note that there's no difference between ceil and floor on the boundary condition.
There is a difference, however, when you're slightly above it, e.g. if the scalar
loop iterates 14 times, VF = 4 and we peel 1 iteration for gaps.
The partial loop does ((13 + 0) / 4) - 1 == 2 vector iterations, and in effect
the partial iteration is ignored and it's done as scalar.
This is fine because the niters modification has capped the vector iteration at
2. So that when we reduce the induction values you end up entering the scalar
code with ind_var.2 = ind_var.1 + 2 * VF.
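For the example above (14 scalar iterations, one peeled for gaps, VF = 4, bias 0), the floor and ceiling forms differ, whereas at a boundary value such as 12 they agree:

```latex
\left\lfloor \tfrac{13}{4} \right\rfloor - 1 = 2, \qquad
\left\lceil \tfrac{13}{4} \right\rceil - 1 = 3, \qquad
\left\lfloor \tfrac{12}{4} \right\rfloor - 1 = \left\lceil \tfrac{12}{4} \right\rceil - 1 = 2
```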
Now let's look at early breaks. To make it easier I'll focus on the specific
testcase:
which means we'll always fall through to the scalar code, as intended.
Here are two key things to note:
1. In this loop, the early exit will always be the one taken. When it's taken
we enter the scalar loop with the correct induction value to apply the gap
peeling.
2. If the main exit is taken, the induction values assume you've finished all
vector iterations. i.e. it assumes you have completed 24 iterations, as we
treat the main exit the same for normal loop vect and early break when not
PEELED.
This means the induction value is adjusted to ind_var.2 = ind_var.1 + 24 * VF;
So what's going wrong? The vectorizer's codegen is correct and efficient;
however, when we adjust the upper bounds, that code knows that the loop's upper
bound is based on the early exit, i.e. 8 latch iterations; in other words,
it thinks the loop iterates once.
This is incorrect, as the vector loop iterates twice: it has set up the
induction value such that it exits at the early exit, so in effect it iterates
2.5 times.
Because the upper bound is incorrect, when we unroll, the loop now exits from
the main exit, which uses the incorrect induction value.
So there are three ways to fix this:
1. If we take the position that the main exit should support both premature
exits and final exits then vect_update_ivs_after_vectorizer needs to be
skipped for this case, and vectorizable_induction updated with a third case
where we reduce with LAST reduction based on the IVs instead of assuming
you're at the end of the vector loop.
I don't like this approach. I don't think we should add a third induction
style to cover up an issue introduced by unrolling. It makes the code
harder to follow and makes main exits harder to reason about.
2. We could say that vec_init_loop_exit_info should pick the exit which has the
smallest known iteration count. This would turn this case into a PEELED case
and the induction values would be correct as we'd always recalculate them
from a reduction. This is suboptimal though as the reason we pick the latch
exit as the IV one is to prevent having to rotate the loop. This results
in more efficient code for what we assume is the common case, i.e. the main
exit.
3. In PR113734 we've established that for vectorization of early breaks we
must always treat the loop as partial. Here partiality means that we have
enough vector elements to start the iteration, but we may take an early exit
and so never reach the latch/main exit.
This requirement is overwritten by the peeling for gaps adjustment of the
upper bound. I believe the bug is simply that this shouldn't be done.
The adjustment here is to indicate that the main exit always leads to the
scalar loop when peeling for gaps.
But this invariant is already always true for all early exits. Remember that
early exits restart the scalar loop at the start of the vector iteration, so
the induction values will start it where we want to do the gaps peeling.
I think option 3 is the correct fix, and also one that doesn't degrade code quality.
gcc/ChangeLog:
PR tree-optimization/114403
* tree-vect-loop.cc (vect_transform_loop): Adjust upper bounds for when
peeling for gaps and early break.
gcc/testsuite/ChangeLog:
PR tree-optimization/114403
* gcc.dg/vect/vect-early-break_124-pr114403.c: New test.
* gcc.dg/vect/vect-early-break_125-pr114403.c: New test.
testsuite: i386: Restrict gcc.target/i386/fhardened-1.c etc. to Linux/GNU
The new gcc.target/i386/fhardened-1.c etc. tests FAIL on Solaris/x86 and
Darwin/x86:
FAIL: gcc.target/i386/fhardened-1.c (test for excess errors)
FAIL: gcc.target/i386/fhardened-2.c (test for excess errors)
Excess errors:
cc1: warning: '-fhardened' not supported for this target
Support for -fhardened is restricted to HAVE_FHARDENED_SUPPORT in
toplev.cc (process_options) which again is only defined for linux*|gnu*
targets in gcc/configure.ac.
Accordingly, this patch restricts the tests to those two, as is already
done in gcc.target/i386/cf_check-6.c.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
Jakub Jelinek [Mon, 15 Apr 2024 08:25:22 +0000 (10:25 +0200)]
attribs: Don't crash on NULL TREE_TYPE in diag_attr_exclusions [PR114634]
The enumerator still doesn't have TREE_TYPE set but diag_attr_exclusions
assumes that all decls must have types.
I think it is better in something as unimportant as diag_attr_exclusions
to be more robust, if there is no type, it can just diagnose exclusions
on the DECL_ATTRIBUTES, like for types it only diagnoses it on
TYPE_ATTRIBUTES.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
PR c++/114634
* attribs.cc (diag_attr_exclusions): Set attrs[1] to NULL_TREE for
decls with NULL TREE_TYPE.
Nathaniel Shead [Sat, 17 Feb 2024 12:10:49 +0000 (23:10 +1100)]
c++: Setup aliases imported from modules [PR106820]
I wonder if more generally we need to be doing more work when importing
definitions from header units especially to handle all the work that
'make_rtl_for_nonlocal_decl' and 'rest_of_decl_compilation' would have
been performing. But this patch fixes at least one missing step.
PR c++/106820
gcc/cp/ChangeLog:
* module.cc (trees_in::decl_value): Assemble alias when needed.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr106820_a.H: New test.
* g++.dg/modules/pr106820_b.C: New test.
Mark Wielaard [Sat, 13 Apr 2024 21:02:14 +0000 (23:02 +0200)]
Regenerate c.opt.urls
Fixes: df7bfdb7dbf2 ("c++: reference cast, conversion fn [PR113141]")
A new warning option -Wcast-user-defined was added to c.opt and
documented in doc/invoke.texi, but c.opt.urls wasn't regenerated.
Patrick Palka [Sat, 13 Apr 2024 14:52:32 +0000 (10:52 -0400)]
c++/modules: optimize tree flag streaming
One would expect consecutive calls to bytes_in/out::b for streaming
adjacent bits, as is done for tree flag streaming, to at least be
optimized by the compiler into individual bit operations using
statically known bit positions (and ideally combined into larger sized
reads/writes).
Unfortunately this doesn't happen because the compiler has trouble
tracking the values of this->bit_pos and this->bit_val across the
calls, likely because the compiler doesn't know the value of 'this'.
Thus for each consecutive bit stream operation, bit_pos and bit_val are
loaded from 'this', checked if buffering is needed, and finally the bit
is extracted from bit_val according to the (unknown) bit_pos, even
though relative to the previous operation (if we didn't need to buffer)
bit_val is unchanged and bit_pos is just 1 larger. This ends up being
quite slow, with tree_node_bools taking 10% of time when streaming in
the std module.
This patch improves this by making tracking of bit_pos and bit_val
easier for the compiler. Rather than bit_pos and bit_val being members
of the (effectively global) bytes_in/out objects, this patch factors out
the bit streaming code/state into separate classes bits_in/out that get
constructed locally as needed for bit streaming. Since these objects
are now clearly local, the compiler can more easily track their values
and optimize away redundant buffering checks.
And since bit streaming is intended to be batched it's natural for these
new classes to be RAII-enabled such that the bit stream is flushed upon
destruction.
In order to make the most of this improved tracking of bit position,
this patch changes parts where we conditionally stream a tree flag
to unconditionally stream (the flag or a dummy value). That way
the number of bits streamed and the respective bit positions are as
statically known as reasonably possible. In lang_decl_bools and
lang_type_bools this patch makes us flush the current bit buffer at the
start so that subsequent bit positions are in turn statically known.
And in core_bools, we can add explicit early exits utilizing invariants
that the compiler can't figure out itself (e.g. a tree code can't have
both TS_TYPE_COMMON and TS_DECL_COMMON, and if a tree code doesn't have
TS_DECL_COMMON then it doesn't have TS_DECL_WITH_VIS).
This patch also moves the definitions of the relevant streaming classes
into anonymous namespaces so that the compiler can make more informed
decisions about inlining their member functions.
After this patch, compile time for a simple Hello World using the std
module is reduced by 7% with a release compiler. The on-disk size of
the std module increases by 0.4% (presumably due to the extra flushing
done in lang_decl_bools and lang_type_bools).
The bit stream out performance isn't improved as much as the stream in
due to the spans/lengths instrumentation performed on stream out (which
maybe should be disabled for release builds?)
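The RAII idea described above can be sketched as a small local bit writer whose bit position and buffered word live in the object itself rather than in a long-lived stream object, and which flushes on destruction. This is a hypothetical, simplified illustration (32-bit words, a std::vector sink, no reader side), not the actual bits_out class in module.cc.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical local bit writer: state is per-object, flushed on scope exit.
class bits_out {
 public:
  explicit bits_out(std::vector<std::uint32_t>& sink) : sink_(sink) {}
  ~bits_out() { flush(); }
  bits_out(const bits_out&) = delete;
  bits_out& operator=(const bits_out&) = delete;

  // Append one bit; emit the buffered word first if it is already full.
  void b(bool x) {
    if (bit_pos_ == 32)
      flush();
    bit_val_ |= std::uint32_t(x) << bit_pos_;
    ++bit_pos_;
  }

  // Write out any buffered bits and start a fresh word.
  void flush() {
    if (bit_pos_ == 0)
      return;
    sink_.push_back(bit_val_);
    bit_val_ = 0;
    bit_pos_ = 0;
  }

 private:
  std::vector<std::uint32_t>& sink_;
  std::uint32_t bit_val_ = 0;
  unsigned bit_pos_ = 0;
};
```

Usage: construct a bits_out in a local scope, call b() for each flag, and let the destructor flush the partial word when the scope ends.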
gcc/cp/ChangeLog:
* module.cc: Update comment about classes defined within.
(class data): Enclose in an anonymous namespace.
(data::calc_crc): Moved from bytes::calc_crc.
(class bytes): Remove. Move bit_flush to namespace scope.
(class bytes_in): Enclose in an anonymous namespace. Inherit
directly from data and adjust accordingly. Move b and bflush
members to bits_in.
(class bytes_out): As above. Remove is_set static data member.
(bit_flush): Moved from class bytes.
(struct bytes_in::bits_in): Define.
(struct bytes_out::bits_out): Define.
(bytes_in::stream_bits): Define.
(bytes_out::stream_bits): Define.
(bytes_out::bflush): Moved to bits_out/in.
(bytes_in::bflush): Likewise.
(bytes_in::bfill): Removed.
(bytes_out::b): Moved to bits_out/in.
(bytes_in::b): Likewise.
(class trees_in): Enclose in an anonymous namespace.
(class trees_out): Enclose in an anonymous namespace.
(trees_out::core_bools): Add bits_out/in parameter and use it.
Unconditionally stream a bit for public_flag. Add early exits
as appropriate.
(trees_in::core_bools): Likewise.
(trees_out::lang_decl_bools): Add bits_out/in parameter and use
it. Flush the current bit buffer at the start. Unconditionally
stream a bit for module_keyed_decls_p.
(trees_in::lang_decl_bools): Likewise.
(trees_out::lang_type_bools): Add bits_out/in parameter and use
it. Flush the current bit buffer at the start.
(trees_in::lang_type_bools): Likewise.
(trees_out::tree_node_bools): Construct a bits_out object and
use/pass it.
(trees_in::tree_node_bools): Likewise.
(trees_out::decl_value): Likewise.
(trees_in::decl_value): Likewise.
(module_state::write_define): Likewise.
(module_state::read_define): Likewise.
Andrew Carlotti [Fri, 12 Apr 2024 01:09:57 +0000 (02:09 +0100)]
aarch64: Add rcpc3 dependency on rcpc2 and rcpc
We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
one will require extending the feature bitmask). Instead, make the
FEAT_LRCPC2 patterns available when either armv8.4-a or +rcpc3 is
specified. We already have a +rcpc flag, so this dependency can be
specified directly.
Also add an explicit dependency on +rcpc to the FEAT_LRCPC2 patterns, so
that they are disabled with armv8.4-a+norcpc.
The cpunative test needed updating because it used an invalid Features
list, since lrcpc3 requires both ilrcpc and lrcpc to be present.
Without this change, host_detect_local_cpu would return the architecture
string 'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def: Add RCPC to
RCPC3 dependencies.
* config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
RCPC3 bit.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.
Marek Polacek [Mon, 11 Mar 2024 21:45:55 +0000 (17:45 -0400)]
c++: ICE with temporary of class type in array DMI [PR109966]
This ICE started with the fairly complicated r13-765. We crash in
gimplify_var_or_parm_decl because a stray VAR_DECL leaked there.
The problem is ultimately that potential_prvalue_result_of wasn't
correctly handling arrays and replace_placeholders_for_class_temp_r
replaced a PLACEHOLDER_EXPR in a TARGET_EXPR which is used in the
context of copy elision. If I have
M m[2] = { M{""}, M{""} };
then we don't invoke the M(const M&) copy-ctor.
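A small sketch of the copy elision this relies on: when array elements are initialized from prvalues of the same class type, no copy constructor runs (guaranteed since C++17). The copy counter is hypothetical instrumentation, and this M is not the type from the PR's testcases.

```cpp
// Hypothetical M that counts how often its copy constructor runs.
struct M {
  static int copies;
  explicit M(const char*) {}
  M(const M&) { ++copies; }
};
int M::copies = 0;

int count_copies() {
  M m[2] = { M{""}, M{""} };  // each element is initialized directly
  (void)m;
  return M::copies;
}
```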
One part of the fix is to use TARGET_EXPR_ELIDING_P rather than
potential_prvalue_result_of. That unfortunately doesn't handle the
case like
struct N { N(M); };
N arr[2] = { M{""}, M{""} };
because TARGET_EXPRs that initialize a function argument are not
marked TARGET_EXPR_ELIDING_P even though gimplify_arg drops such
TARGET_EXPRs on the floor. We can use a pset to avoid replacing
placeholders in them.
I made an attempt to use set_target_expr_eliding in
convert_for_arg_passing but that regressed constexpr-diag1.C, and does
not seem like a prudent change in stage 4 anyway.
PR c++/109966
gcc/cp/ChangeLog:
* typeck2.cc (potential_prvalue_result_of): Remove.
(replace_placeholders_for_class_temp_r): Check TARGET_EXPR_ELIDING_P.
Use a pset. Don't replace_placeholders in TARGET_EXPRs that initialize
a function argument.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/nsdmi-aggr20.C: New test.
* g++.dg/cpp1y/nsdmi-aggr21.C: New test.
Will Schmidt [Fri, 12 Apr 2024 19:55:16 +0000 (14:55 -0500)]
rs6000: Add OPTION_MASK_POWER8 [PR101865]
The bug in PR101865 is the _ARCH_PWR8 predefine macro is conditional upon
TARGET_DIRECT_MOVE, which can be false for some -mcpu=power8 compiles if the
-mno-altivec or -mno-vsx options are used. The solution here is to create
a new OPTION_MASK_POWER8 mask that is true for -mcpu=power8, regardless of
Altivec or VSX enablement.
Unfortunately, the only way to create an OPTION_MASK_* mask is to create
a new option, which we have done here, but marked it as WarnRemoved since
we do not want users using it. For stage1, we will look into how we can
create ISA mask flags for use in the compiler without the need for explicit
options.
2024-04-12 Will Schmidt <will_schmidt@linux.ibm.com>
Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/101865
* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER8.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Use
OPTION_MASK_POWER8.
* config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Add OPTION_MASK_POWER8.
(ISA_2_7_MASKS_SERVER): Likewise.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Update
comment. Use OPTION_MASK_POWER8 and TARGET_POWER8.
* config/rs6000/rs6000.h (TARGET_SYNC_HI_QI): Use TARGET_POWER8.
* config/rs6000/rs6000.md (define_attr "isa"): Add p8.
(define_attr "enabled"): Handle it.
(define_insn "prefetch"): Use TARGET_POWER8.
* config/rs6000/rs6000.opt (mpower8-internal): New.
gcc/testsuite/
PR target/101865
* gcc.target/powerpc/predefine-p7-novsx.c: New test.
* gcc.target/powerpc/predefine-p8-noaltivec-novsx.c: New test.
* gcc.target/powerpc/predefine-p8-noaltivec.c: New test.
* gcc.target/powerpc/predefine-p8-novsx.c: New test.
* gcc.target/powerpc/predefine-p8-pragma-vsx.c: New test.
* gcc.target/powerpc/predefine-p9-novsx.c: New test.
Patrick Palka [Fri, 12 Apr 2024 19:50:04 +0000 (15:50 -0400)]
c++/modules: local type merging [PR99426]
One known missing piece in the modules implementation is merging of a
streamed-in local type (class or enum) with the corresponding in-TU
version of the local type. This missing piece turns out to cause a
hard-to-reduce use-after-free GC issue due to the entity_ary not being
marked as a GC root (deliberately), and manifests as a serialization
error on stream-in as in PR99426 (see comment #6 for a reduction). It's
also reproducible on trunk when running the xtreme-header tests without
-fno-module-lazy.
This patch implements this missing piece, making us merge such local
types according to their position within the containing function's
definition, analogous to how we merge FIELD_DECLs of a class according
to their index in the TYPE_FIELDS list.
PR c++/99426
gcc/cp/ChangeLog:
* module.cc (merge_kind::MK_local_type): New enumerator.
(merge_kind_name): Update.
(trees_out::chained_decls): Move BLOCK-specific handling
of DECL_LOCAL_DECL_P decls to ...
(trees_out::core_vals) <case BLOCK>: ... here. Stream
BLOCK_VARS manually.
(trees_in::core_vals) <case BLOCK>: Stream BLOCK_VARS
manually. Handle deduplicated local types.
(trees_out::key_local_type): Define.
(trees_in::key_local_type): Define.
(trees_out::get_merge_kind) <case FUNCTION_DECL>: Return
MK_local_type for a local type.
(trees_out::key_mergeable) <case FUNCTION_DECL>: Use
key_local_type.
(trees_in::key_mergeable) <case FUNCTION_DECL>: Likewise.
(trees_in::is_matching_decl): Be flexible with type mismatches
for local entities.
(trees_in::register_duplicate): Also register the
DECL_TEMPLATE_RESULT of a TEMPLATE_DECL as a duplicate.
(depset_cmp): Return 0 for equal IDENTIFIER_HASH_VALUEs.
gcc/testsuite/ChangeLog:
* g++.dg/modules/merge-17.h: New test.
* g++.dg/modules/merge-17_a.H: New test.
* g++.dg/modules/merge-17_b.C: New test.
* g++.dg/modules/xtreme-header-7_a.H: New test.
* g++.dg/modules/xtreme-header-7_b.C: New test.
Jason Merrill [Wed, 10 Apr 2024 19:12:26 +0000 (15:12 -0400)]
c++: reference cast, conversion fn [PR113141]
The second testcase in 113141 is a separate issue: we first decide that the
conversion is ill-formed, but then when recalculating the special c_cast_p
handling makes us think it's OK. We don't want that, it should continue to
fall back to the reinterpret_cast interpretation. And while we're here,
let's warn that we're not using the conversion function.
Note that the standard seems to say that in this case we should
treat (Matrix &) as const_cast<Matrix &>(static_cast<const Matrix &>(X)),
which would use the conversion operator, but that doesn't match existing
practice, so let's resolve that another day. I've raised this issue with
CWG; at the moment I lean toward never binding a temporary in a C-style cast
to reference type, which would also be a change from existing practice.
PR c++/113141
gcc/c-family/ChangeLog:
* c.opt: Add -Wcast-user-defined.
gcc/ChangeLog:
* doc/invoke.texi: Document -Wcast-user-defined.
gcc/cp/ChangeLog:
* call.cc (reference_binding): For an invalid cast, warn and don't
recalculate.