Pan Li [Thu, 9 May 2024 02:56:46 +0000 (10:56 +0800)]
RISC-V: Make full-vec-move1.c test robust for optimization
While investigating support for early-break autovec, we noticed that
the test full-vec-move1.c is optimized to 'return 0;' in the main
function body. The value of the V type is a compile-time constant,
so the second loop is treated as assert (true).
Thus, the ccp4 pass eliminates these statements and just returns 0.
  typedef int16_t V __attribute__((vector_size (128)));

  int main ()
  {
    V v;
    for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
      (v)[i] = i;

    V res = v;
    for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
      assert (res[i] == i); // will be optimized to assert (true)
  }
This patch introduces an extern function that uses res[i], which
prevents the ccp4 optimization from folding the checks away.
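A minimal single-file sketch of the idea, not the committed test. The names
use_element and check_full_vec_move are hypothetical, and use_element is
defined locally as noinline with a volatile sink only so that the sketch is
self-contained; the patch itself relies on an extern function.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t V __attribute__((vector_size (128)));

/* Hypothetical stand-in for the extern function.  In a real test it would
   be defined in another translation unit; here noinline plus a volatile
   sink gives every element an observable use.  */
static volatile int16_t sink;

__attribute__((noinline)) void
use_element (int16_t x)
{
  sink = x;
}

/* Fill a vector, copy it, and check the copy element by element.
   Returns the number of elements that were checked.  */
int
check_full_vec_move (void)
{
  V v;
  for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    v[i] = (int16_t) i;

  V res = v;
  int checked = 0;
  for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    {
      use_element (res[i]);
      assert (res[i] == (int16_t) i);
      checked++;
    }
  return checked;
}
```

Calling check_full_vec_move () walks all 64 int16_t elements of the 128-byte
vector.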
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c:
Introduce extern func use to get rid of ccp4 optimization.
cpymemsi expansion was available for RISC-V since the initial port.
However, there are no tests to detect regressions.
This patch adds such tests.
Three of the tests target the expansion requirements (known length and
alignment). One test reuses an existing memcpy test from the by-pieces
framework (gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c).
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymemsi-1.c: New test.
* gcc.target/riscv/cpymemsi-2.c: New test.
* gcc.target/riscv/cpymemsi-3.c: New test.
* gcc.target/riscv/cpymemsi.c: New test.
3 The discussion about Nan-box can be found on the website:
<https://www.mail-archive.com/search?q=Nan-box+the+result+of+movhf+on+soft-fp16&l=gcc-patches%40gcc.gnu.org>
4 The tests below pass with this patch:
* The full riscv regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Expand movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movbf_softfloat_boxing): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/_Bfloat16-nanboxing.c: New test.
Jeff Law [Wed, 8 May 2024 19:44:00 +0000 (13:44 -0600)]
[RISC-V][V2] Fix incorrect if-then-else nesting of Zbs usage in constant synthesis
Reposting without the patch that ignores whitespace. The CI system doesn't
like including both patches, that'll generate a failure to apply and none of
the tests actually get run.
So I managed to goof the if-then-else level of the bseti bits last week. They
were supposed to be a last ditch effort to improve the result, but ended up
inside a conditional where they don't really belong. I almost always use Zba,
Zbb and Zbs together, so it slipped by.
So it's NFC if you always test with Zbb and Zbs enabled together. But if you
enabled Zbs without Zbb you'd see a failure to use bseti.
RISC-V: Cover sign-extensions in lshr<GPR:mode>3_zero_extend_4
The lshr<GPR:mode>3_zero_extend_4 pattern targets bit extraction
with zero-extension. This pattern represents the canonical form
of zero-extensions of a logical right shift.
The same optimization can be applied to sign-extensions.
Given the two optimizations are so similar, this patch converts
the existing one to also cover the sign-extension case.
gcc/ChangeLog:
* config/riscv/iterators.md (ashiftrt): New code attribute
'extract_shift' and adding extractions to optab.
* config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): Rename to...
(*<any_extract:optab><GPR:mode>3):...this and add support for
sign-extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/extend-shift-helpers.h: Add helpers for
sign-extension.
* gcc.target/riscv/sign-extend-rshift-32.c: New test.
* gcc.target/riscv/sign-extend-rshift-64.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: New test.
The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instruction. Let's add an insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.
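The equivalence behind the slli/srli sequence can be sketched in plain C for
a 64-bit register. This is an illustration under stated assumptions, not the
pattern from riscv.md: the function names are hypothetical, and the field is
assumed to satisfy 1 <= width <= 63 and lo + width <= 64.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: logical right shift followed by a mask, i.e. a
   zero-extended extraction of WIDTH bits starting at bit LO.  */
static uint64_t
extract_mask (uint64_t x, unsigned lo, unsigned width)
{
  return (x >> lo) & ((UINT64_C (1) << width) - 1);
}

/* Two-shift form: the left shift (slli) moves the field to the top of
   the register, the logical right shift (srli) brings it back down and
   zero-fills the upper bits.  */
static uint64_t
extract_slli_srli (uint64_t x, unsigned lo, unsigned width)
{
  uint64_t t = x << (64 - lo - width);
  return t >> (64 - width);
}
```

For example, extracting 12 bits starting at bit 8 of 0x123456789abcdef0
yields 0xcde with either form.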
Tested with SPEC CPU 2017 (rv64gc).
PR target/111501
gcc/ChangeLog:
* config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): New
pattern for zero-extraction.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/extend-shift-helpers.h: New test.
* gcc.target/riscv/pr111501.c: New test.
* gcc.target/riscv/zero-extend-rshift-32.c: New test.
* gcc.target/riscv/zero-extend-rshift-64.c: New test.
* gcc.target/riscv/zero-extend-rshift.c: New test.
RISC-V: Cover sign-extensions in lshrsi3_zero_extend_2
The pattern lshrsi3_zero_extend_2 extracts the most significant bits of the
lower 32-bit word and zero-extends them back to DImode.
This is realized using srliw, which operates on 32-bit registers.
The same optimization can be applied to sign-extensions by emitting
a sraiw instead of the srliw.
Given these two optimizations are so similar, this patch simply
converts the existing one to also cover the sign-extension case.
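To make the srliw/sraiw difference concrete, here is a small C model of the
two instructions on a 64-bit register. The model_* and sext32 names are
hypothetical helpers written for this sketch, and the shift amount is assumed
to be in the range 0..31.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on
   implementation-defined signed conversions.  */
static uint64_t
sext32 (uint32_t w)
{
  return (w & UINT32_C (0x80000000)) ? (UINT64_C (0xffffffff00000000) | w) : w;
}

/* srliw: logical right shift of the low 32 bits; the 32-bit result is
   then sign-extended to 64 bits.  */
static uint64_t
model_srliw (uint64_t rs1, unsigned shamt)
{
  return sext32 ((uint32_t) rs1 >> shamt);
}

/* sraiw: arithmetic right shift of the low 32 bits; the 32-bit result
   is then sign-extended to 64 bits.  */
static uint64_t
model_sraiw (uint64_t rs1, unsigned shamt)
{
  uint32_t w = (uint32_t) rs1;
  uint32_t r = w >> shamt;
  if ((w & UINT32_C (0x80000000)) && shamt > 0)
    r |= ~(UINT32_C (0xffffffff) >> shamt);  /* replicate the sign bit */
  return sext32 (r);
}
```

With rs1 = 0xdeadbeeff1234567 and a shift of 16, the srliw model gives
0xf123 (zero-extended upper bits) while the sraiw model gives
0xfffffffffffff123 (sign-extended upper bits).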
gcc/ChangeLog:
* config/riscv/iterators.md (sraiw): New code iterator 'any_extract'.
New code attribute 'extract_sidi_shift'.
* config/riscv/riscv.md (*lshrsi3_zero_extend_2): Rename to...
(*lshrsi3_extend_2):...this and add support for sign-extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sign-extend-1.c: Test sraiw 24 and sraiw 16.
[committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P
This is almost exclusively work from the VRULL team.
As we've discussed in the Tuesday meeting in the past, we'd like to have a knob
in the tuning structure to indicate that overlapped stores during
move_by_pieces expansion of memcpy & friends are acceptable.
This patch adds that capability to our tuning structure. It's off for all
the uarchs upstream, but we have been using it inside Ventana for our uarch
with success. So technically it's NFC upstream, but puts in the infrastructure
multiple organizations likely need.
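As an illustration of what "overlapped" means here, the following C sketch
copies between 4 and 8 bytes with two 4-byte loads and two 4-byte stores,
letting the second access overlap the first when the length is below 8. The
function name is hypothetical and this is not the by-pieces expander itself,
only the access pattern it is allowed to produce when the knob is on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy N bytes, 4 <= N <= 8.  The tail access starts at N - 4, so for
   N < 8 it overlaps the head access instead of falling back to smaller
   2-byte and 1-byte pieces.  */
static void
copy_4_to_8_overlapping (void *dst, const void *src, size_t n)
{
  uint32_t head, tail;
  memcpy (&head, src, 4);
  memcpy (&tail, (const unsigned char *) src + n - 4, 4);
  memcpy (dst, &head, 4);
  memcpy ((unsigned char *) dst + n - 4, &tail, 4);
}
```

A 7-byte copy therefore takes two stores (bytes 0..3 and 3..6) rather than
three (4 + 2 + 1).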
gcc/
* config/riscv/riscv.cc (struct riscv_tune_param): Add new
"overlap_op_by_pieces" field.
(rocket_tune_info, sifive_7_tune_info): Set it.
(sifive_p400_tune_info, sifive_p600_tune_info): Likewise.
(thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise.
(generic_ooo_tune_info, optimize_size_tune_info): Likewise.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): define.
gcc/testsuite/
* gcc.target/riscv/memcpy-nonoverlapping.c: New test.
* gcc.target/riscv/memset-nonoverlapping.c: New test.
Jeff Law [Tue, 7 May 2024 17:43:09 +0000 (11:43 -0600)]
[RISC-V] [PATCH v2] Enable inlining str* by default
So with Christoph's patches from late 2022 we've had the ability to inline
strlen, and str[n]cmp (scalar). However, we never actually turned this
capability on by default!
This patch flips those defaults to allow inlining by default. It also
fixes one bug exposed by our internal testing when NBYTES is zero for strncmp.
I don't think that case happens enough to try and optimize it, so we just
disable inline expansion for that instance.
This has been bootstrapped and regression tested on rv64gc at various times as
well as cross tested on rv64gc more times than I can probably count (we've had
this patch internally for a while). More importantly, I just successfully
tested it on rv64gc and rv32gcv elf configurations with the trunk.
gcc/
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Do not inline
strncmp with zero size.
(emit_strcmp_scalar_compare_subword): Adjust rotation for rv32 vs rv64.
* config/riscv/riscv.opt (var_inline_strcmp): Enable by default.
(vriscv_inline_strncmp, riscv_inline_strlen): Likewise.
gcc/testsuite
* gcc.target/riscv/zbb-strlen-disabled-2.c: Turn off inlining.
Xiao Zeng [Mon, 6 May 2024 21:57:37 +0000 (15:57 -0600)]
[PATCH 1/1] RISC-V: Add Zfbfmin extension to the -march= option
This patch adds a new sub-extension (aka Zfbfmin) to the
-march= option. It introduces a new data type BF16.
1 The Zfbfmin extension depends on 'F', and the FLH, FSH, FMV.X.H, and
FMV.H.X instructions as defined in the Zfh extension.
2 The Zfhmin extension includes the following instructions from the
Zfh extension: FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S.
3 The Zfhmin extension depends on 'F'.
4 Simply put, just make Zfbfmin dependent on Zfhmin.
Perhaps in the future, we could propose making the FLH, FSH, FMV.X.H, and
FMV.H.X instructions an independent extension to achieve precise dependency
relationships for the Zfbfmin.
You can locate more information about Zfbfmin from below spec doc.
gcc/testsuite/
* gcc.target/riscv/arch-35.c: New test.
* gcc.target/riscv/arch-36.c: New test.
* gcc.target/riscv/predef-34.c: New test.
* gcc.target/riscv/predef-35.c: New test.
Xiao Zeng [Mon, 6 May 2024 21:39:12 +0000 (15:39 -0600)]
[RISC-V] Add support for _Bfloat16
1 At this point (see <https://github.com/riscv/riscv-bfloat16>),
BF16 has already completed "post public review".
2 LLVM has also added support for RISCV BF16 in
<https://reviews.llvm.org/D151313> and
<https://reviews.llvm.org/D150929>.
3 According to the discussion <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/367>,
this uses __bf16 and uses DF16b in riscv_mangle_type, like x86.
The tests below pass with this patch:
* The full riscv regression test.
gcc/ChangeLog:
* config/riscv/iterators.md: New mode iterator HFBF.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type _Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_init_libfuncs): Set the conversion method for BFmode and
HFmode.
(riscv_block_arith_comp_libfuncs_for_mode): Set the arithmetic
and comparison libfuncs for the mode.
* config/riscv/riscv.md (mode): Add BF.
(movhf): Support for BFmode.
(mov<mode>): Ditto.
(*movhf_softfloat): Ditto.
(*mov<mode>_softfloat): Ditto.
libgcc/ChangeLog:
* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): Ditto.
* config/riscv/t-softfp32: Add support for BF16 libfuncs.
* config/riscv/t-softfp64: Ditto.
* soft-fp/floatsibf.c: For si -> bf16.
* soft-fp/floatunsibf.c: For unsi -> bf16.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparison.c: New test.
* gcc.target/riscv/bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/bf16_integer_libcall_convert.c: New test.
Jeff Law [Mon, 6 May 2024 21:27:43 +0000 (15:27 -0600)]
So another constant synthesis improvement.
In this patch we're looking at cases where we'd like to be able to use
lui+slli, but can't because of the sign extending nature of lui on
TARGET_64BIT. For example: 0x8001100000020UL. The trunk currently generates 4
instructions for that constant, when it can be done with 3 (lui+slli.uw+addi).
When Zba is enabled, we can use lui+slli.uw as the slli.uw masks off the bits
32..63 before shifting, giving us the precise semantics we want.
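The effect can be checked with a small C model of the instructions involved.
The model_* helpers are hypothetical, and the operands used below (lui 0x80011,
shift by 20, addi 0x20) are my own derivation of a 3-instruction sequence for
the example constant, not output copied from the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* lui on RV64: imm20 << 12, sign-extended from bit 31.  */
static uint64_t
model_lui (uint32_t imm20)
{
  uint64_t v = (uint64_t) (imm20 & 0xfffff) << 12;
  return (v & UINT64_C (0x80000000)) ? (v | UINT64_C (0xffffffff00000000)) : v;
}

/* slli: plain 64-bit left shift.  */
static uint64_t
model_slli (uint64_t rs1, unsigned shamt)
{
  return rs1 << shamt;
}

/* slli.uw (Zba): zero-extend the low 32 bits, then shift left.  */
static uint64_t
model_slli_uw (uint64_t rs1, unsigned shamt)
{
  return (rs1 & UINT64_C (0xffffffff)) << shamt;
}

/* addi: add a sign-extended 12-bit immediate.  */
static uint64_t
model_addi (uint64_t rs1, int32_t imm12)
{
  return rs1 + (uint64_t) (int64_t) imm12;
}

/* lui + slli.uw + addi for the constant 0x8001100000020.  */
static uint64_t
synthesize_example (void)
{
  uint64_t t = model_lui (0x80011);   /* 0xffffffff80011000 */
  t = model_slli_uw (t, 20);          /* 0x0008001100000000 */
  return model_addi (t, 0x20);        /* 0x0008001100000020 */
}
```

Replacing model_slli_uw with model_slli in the middle step leaves the
sign-extension bits from lui in place and gives 0xfff8001100000000 before the
addi, which is why lui+slli alone cannot be used for this constant.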
I strongly suspect we'll want to do the same for a set of constants with
lui+add.uw, lui+shNadd.uw, so you'll see the beginnings of generalizing support
for lui followed by a "uw" instruction.
The new test just tests the set of cases that showed up while exploring a
particular space of the constant synthesis problem. It's not meant to be
exhaustive (failure to use shadd when profitable).
gcc/
* config/riscv/riscv.cc (riscv_integer_op): Add field tracking if we
want to use a "uw" instruction variant.
(riscv_build_integer_1): Initialize the new field in various places.
Use lui+slli.uw for some constants.
(riscv_move_integer): Handle slli.uw.
Jeff Law [Thu, 2 May 2024 23:13:12 +0000 (17:13 -0600)]
[committed][RISC-V] Fix nearbyint failure on rv32 and formatting nits
The CI system tripped an execution failure for rv32 with the ceil/round patch.
The fundamental problem is the FP->INT step in these sequences requires the
input size to match the output size. The output size was based on rv32/rv64.
Meaning that we'd try to do DF->SI->DF.
That doesn't preserve the semantics we want in at least two ways.
The net is we can't use this trick for DF values on rv32. While inside the
code I realized we had a similar problem for HF modes. HF modes we can support
only for Zfa. So I fixed that proactively.
The CI system also pointed out various formatting nits. I think this fixes all
but one overly long line.
Note I could have factored the TARGET_ZFA test. But I think as-written it's
clearer what the desired cases to transform are.
gcc/
* config/riscv/riscv.md (<round_pattern><ANYF:mode>2): Adjust
condition to match what can be properly implemented. Fix various
formatting issues.
(l<round_pattern><ANYF:mode>si2_sext): Fix formatting.
Jeff Law [Thu, 2 May 2024 20:06:22 +0000 (14:06 -0600)]
[RFA][RISC-V] Improve constant synthesis for constants with 2 bits set
In doing some preparation work for using zbkb's pack instructions for constant
synthesis I figured it would be wise to get a sense of how well our constant
synthesis is actually working and address any clear issues.
So the first glaring inefficiency is in our handling of constants with a small
number of bits set. Let's start with just two bits set. There are 2016
distinct constants in that space (rv64). With Zbs enabled the absolute worst
we should ever do is two instructions (bseti+bseti). Yet we have 503 cases
where we're generating 3+ instructions when there's just two bits set in the
constant. A constant like 0x8000000000001000 generates 4 instructions!
This patch adds bseti (and indirectly binvi if we needed it) as a first class
citizen for constant synthesis. There are two components to this change.
First, we can't generate an IOR with a constant like (1 << 45) as an operand.
The IOR/XOR define_insn is in riscv.md. The constant argument for those
patterns must match an arith_operand, which means it's not really usable for
generating bseti directly in the cases we care about (at least one of the bits
will be in the 32..63 range and thus won't match arith_operand).
We have a few things we could do. One would be to extend the existing pattern
to incorporate bseti cases. But I suspect folks like the separation of the
base architecture (riscv.md) from the Zb* extensions (bitmanip.md). We could
also try to generate the RTL for bseti directly, bypassing gen_fmt_ee (which
forces undesirable constants into registers based on the predicate of the
appropriate define_insn). Neither of these seemed particularly appealing to me.
So what I've done instead is to make ior/xor a define_expand and have the
expander allow a wider set of constant operands when Zbs is enabled. That
allows us to keep the bulk of Zb* support inside bitmanip.md and continue to
use gen_fmt_ee in the constant synthesis paths.
Note the code generation in this case is designed to first set as many bits as
we can with lui, then with addi since those can both set multiple bits at a
time. If there are any residual bits left to set we can emit bseti
instructions up to the current cost ceiling.
This results in fixing all of the 503 2-bit set cases where we emitted too many
instructions. It also significantly helps other scenarios with more bits set.
The testcase I'm including verifies the number of instructions we generate for
the full set of 2016 possible cases. Obviously this won't be possible as we
increase the number of bits (there are roughly 42k cases with just 3
bits set).
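The 2016 figure is the number of ways to choose 2 bit positions out of 64.
The sketch below enumerates that space and builds each constant with two
modelled bseti operations; model_bseti and count_two_bit_constants are
hypothetical names, and this is not the synthesis-1.c test itself.

```c
#include <assert.h>
#include <stdint.h>

/* bseti rd, rs1, imm (Zbs): set bit IMM (0..63) in RS1.  */
static uint64_t
model_bseti (uint64_t rs1, unsigned bit)
{
  return rs1 | (UINT64_C (1) << bit);
}

/* Enumerate every 64-bit constant with exactly two bits set, build it
   as bseti (bseti (0, lo), hi), and return how many were checked.  */
static unsigned
count_two_bit_constants (void)
{
  unsigned count = 0;
  for (unsigned lo = 0; lo < 64; lo++)
    for (unsigned hi = lo + 1; hi < 64; hi++)
      {
        uint64_t want = (UINT64_C (1) << lo) | (UINT64_C (1) << hi);
        uint64_t got = model_bseti (model_bseti (0, lo), hi);
        assert (got == want);
        count++;
      }
  return count;
}
```

Two instructions is only the upper bound for this space; many of these
constants can also be produced with lui and/or addi.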
gcc/
* config/riscv/predicates.md (arith_or_zbs_operand): New predicate.
* config/riscv/riscv.cc (riscv_build_integer_one): Use bseti to set
single bits when profitable.
* config/riscv/riscv.md (*<optab><mode>3): Renamed with '*' prefix.
(<optab><mode>3): New expander for IOR/XOR.
gcc/testsuite
* gcc.target/riscv/synthesis-1.c: New test.
Jeff Law [Thu, 2 May 2024 14:42:32 +0000 (08:42 -0600)]
[committed] [RISC-V] Don't run new rounding tests on newlib risc-v targets
The new round_32.c and round_64.c tests depend on the optimizers to recognize
the conversions feeding the floor/ceil calls and convert them into ceilf,
floorf and the like.
Those transformations only occur when the target indicates the C library has
the appropriate routines (fnclass == function_c99_misc). While newlib has
these routines, they are not exposed as available to the compiler and thus the
transformations the tests depend on do not happen. Naturally the scan-tests then
fail.
Note the explicit subreg. We can instead use a match_operand with QImode.
This ever-so-slightly simplifies the machine description.
It also means that if we have a QImode object lying around (say we loaded it
from memory in QImode), we can use it directly rather than first extending it
to X, then truncating to QI. So we end up with simpler RTL and in rare cases
improve the code we generate.
When used in a define_split or define_insn_and_split we need to make suitable
adjustments to the split RTL.
Bootstrapped a while back. Just re-tested with a cross.
gcc/
* config/riscv/bitmanip.md (splitter to use w-form division): Remove
explicit subregs.
(zero extended bitfield extraction): Similarly.
* config/riscv/thead.md (*th_memidx_operand): Similarly.
Jeff Law [Wed, 1 May 2024 17:28:41 +0000 (11:28 -0600)]
[committed] [RISC-V] Fix detection of store pair fusion cases
We've got the ability to count the number of store pair fusions happening in
the front-end of the pipeline. When comparing some code from last year vs the
current trunk we saw a fairly dramatic drop.
The problem is the store pair fusion detection code was actively harmful due to
a minor bug in checking offsets. So instead of pairing up 8 byte stores such
as sp+0 with sp+8, it tried to pair up sp+8 and sp+16.
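A sketch of the kind of offset test described above. This is an assumption
drawn from the sp+0/sp+8 versus sp+8/sp+16 example: the lower 8-byte store is
taken to need a 16-byte-aligned offset, with the other store exactly 8 bytes
above it. The function name is hypothetical and the actual condition in
riscv_macro_fusion_pair_p may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if two 8-byte stores off the same base register, at
   offsets OFF_A and OFF_B, form an aligned adjacent pair.
   Assumed rule: the lower offset is a multiple of 16 and the higher
   offset is exactly 8 bytes above it.  */
static bool
store_pair_candidate_p (int64_t off_a, int64_t off_b)
{
  int64_t lo = off_a < off_b ? off_a : off_b;
  int64_t hi = off_a < off_b ? off_b : off_a;
  return (lo % 16 == 0) && (hi - lo == 8);
}
```

Under this rule sp+0 with sp+8 is accepted in either order, while sp+8 with
sp+16 is rejected.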
Given uarch sensitivity I didn't try to pull together a testcase. But we could
certainly see the undesirable behavior in benchmarks as simplistic as dhrystone
up through spec2017.
Anyway, bootstrapped a while back. Also verified through our performance
counters that store pair fusion rates are back up. Regression tested with
crosses a few minutes ago.
gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Break out
tests for easier debugging in store pair fusion case. Fix offset
check in same.
This patch is primarily meant to improve the code we generate for FP rounding
such as ceil/floor. It also addresses some unnecessary sign extensions in the
same areas.
RISC-V's FP conversions have a bit of undesirable behavior that makes them
unsuitable as-is for ceil/floor and other related functions. These
deficiencies are addressed in the Zfa extension, but there's no reason not to
pick up a nice improvement when we can.
Basically we can still use the basic FP conversions for floor/ceil and friends
when we don't care about inexact exceptions by checking for the special cases
first, then emitting the conversion when the special cases don't apply. That's
still much faster than calling into glibc.
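The "check the special cases, then convert" shape can be sketched in portable
C for floor on double. This is an illustration only: floor_via_conversion is a
hypothetical name, and it uses C's truncating conversion plus a fix-up step,
whereas the expander in riscv.md would emit conversions with a suitable
rounding mode.

```c
#include <assert.h>
#include <stdint.h>

/* floor () built from FP -> INT -> FP conversions.  Does not try to
   preserve the inexact exception behavior of a real floor ().  */
static double
floor_via_conversion (double x)
{
  /* Special cases first: NaN fails both comparisons, and infinities
     and values with magnitude >= 2^52 are already integral.  */
  if (!(x > -0x1p52 && x < 0x1p52))
    return x;
  if (x == 0.0)
    return x;                         /* keeps the sign of -0.0 */

  double r = (double) (int64_t) x;    /* truncates toward zero */
  return r > x ? r - 1.0 : r;         /* fix up negative non-integers */
}
```

The range check is also what makes the integer conversion safe: every value
that reaches it fits in a 64-bit integer.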
The redundant sign extensions are eliminated using the same trick Jivan added
last year, just in a few more places ;-)
This eliminates roughly 10% of the dynamic instruction count for imagick. But
more importantly it's about a 17% performance improvement for that workload
within spec.
This has been bootstrapped as well as regression tested in a cross environment.
It's also successfully built & run specint/specfp correctly.
Pushing to the trunk and the coordination branch momentarily.
gcc/
* config/riscv/iterators.md (fix_ops, fix_uns): New iterators.
(RINT, rint_pattern, rint_rm): Remove unused iterators.
* config/riscv/riscv-protos.h (get_fp_rounding_coefficient): Prototype.
* config/riscv/riscv-v.cc (get_fp_rounding_coefficient): Give
external linkage.
* config/riscv/riscv.md (UNSPEC_LROUND): Remove.
(fix_trunc<ANYF:mode><GPR:mode>2): Replace with ...
(<fix_uns>_trunc<ANYF:mode>si2): New expander & associated insn.
(<fix_uns>_trunc<ANYF:mode>si2_ext): New insn.
(<fix_uns>_trunc<ANYF:mode>di2): Likewise.
(l<rint_pattern><ANYF:mode><GPR:mode>2): Replace with ...
(lrint<ANYF:mode>si2): New expander and associated insn.
(lrint<ANYF:mode>si2_ext, lrint<ANYF:mode>di2): New insns.
(<round_pattern><ANYF:mode>2): Replace with....
(l<round_pattern><ANYF:mode>si2): New expander and associated insn.
(l<round_pattern><ANYF:mode>si2_sext): New insn.
(l<round_pattern><ANYF:mode>di2): Likewise.
(<round_pattern><ANYF:mode>2): New expander.
gcc/testsuite/
* gcc.target/riscv/fix.c: New test.
* gcc.target/riscv/round.c: New test.
* gcc.target/riscv/round_32.c: New test.
* gcc.target/riscv/round_64.c: New test.
The extension parsing table entries for a range of Zic* extensions
does not match the mask definition in riscv.opt.
This results in broken TARGET_ZIC* macros, because the values of
riscv_zi_subext and riscv_zicmo_subext are set wrong.
This patch fixes this by moving Zic64b into riscv_zicmo_subext
and all other affected Zic* extensions to riscv_zi_subext.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Move ziccamoa, ziccif,
zicclsm, and ziccrse into riscv_zi_subext.
* config/riscv/riscv.opt: Define MASK_ZIC64B for
riscv_zicmo_subext.
Fangrui Song [Sat, 27 Apr 2024 01:14:33 +0000 (18:14 -0700)]
RISC-V: Add -X to link spec
--discard-locals (-X) instructs the linker to remove local .L* symbols,
which occur a lot due to label differences for linker relaxation. The
arm port has a similar need and passes -X to ld.
In contrast, the RISC-V port does not pass -X to ld and relies on the
default --discard-locals in GNU ld's riscv port. The arm way is more
conventional (compiler driver instead of the linker customizes the
default linker behavior) and works with lld.
Harald Anlauf [Mon, 13 May 2024 20:06:33 +0000 (22:06 +0200)]
Fortran: fix bounds check for assignment, class component [PR86100]
gcc/fortran/ChangeLog:
PR fortran/86100
* trans-array.cc (gfc_conv_ss_startstride): Use abridged_ref_name
to generate a more user-friendly name for bounds-check messages.
* trans-expr.cc (gfc_copy_class_to_class): Fix bounds check for
rank>1 by looping over the dimensions.
gcc/testsuite/ChangeLog:
PR fortran/86100
* gfortran.dg/bounds_check_25.f90: New test.
Jason Merrill [Wed, 22 May 2024 22:41:27 +0000 (18:41 -0400)]
c++: deleting array temporary [PR115187]
Decaying the array temporary to a pointer and then deleting that crashes in
verify_gimple_stmt, because the TARGET_EXPR is first evaluated inside the
TRY_FINALLY_EXPR, but the cleanup point is outside. Fixed by using
get_target_expr instead of save_expr.
I also adjust the stabilize_expr comment to prevent me from again thinking
it's a suitable replacement.
PR c++/115187
gcc/cp/ChangeLog:
* init.cc (build_delete): Use get_target_expr instead of save_expr.
* tree.cc (stabilize_expr): Update comment.
c++: Propagate using decls from partitions [PR114868]
The modules code currently neglects to set OVL_USING_P on the dependency
created for a using-decl, which causes it not to remember that the
OVL_EXPORT_P flag had been set on it when emitted from the primary
interface unit. This patch ensures that it occurs.
PR c++/114868
gcc/cp/ChangeLog:
* module.cc (depset::hash::add_binding_entity): Propagate
OVL_USING_P for using-declarations.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-15_a.C: New test.
* g++.dg/modules/using-15_b.C: New test.
* g++.dg/modules/using-15_c.C: New test.
c++: Fix instantiation of imported temploid friends [PR114275]
This patch fixes a number of issues with the handling of temploid friend
declarations.
The primary issue is that instantiations of friend declarations should
attach the declaration to the same module as the befriending class, by
[module.unit] p7.1 and [temp.friend] p2; this could be a different
module from the current TU, and so needs special handling.
The other main issue here is that we can't assume that just because name
lookup didn't find a definition for a hidden class template, that it
doesn't exist at all: it could be a non-exported entity that we've
nevertheless streamed in from an imported module. We need to ensure
that, when instantiating template friend classes, we return the same
TEMPLATE_DECL that we got from our imports, otherwise we will get later
issues with 'duplicate_decls' (rightfully) complaining that they're
different when trying to merge.
This doesn't appear necessary for function templates due to the existing
name lookup handling already finding these hidden declarations.
* cp-tree.h (propagate_defining_module): Declare.
(remove_defining_module): Declare.
(lookup_imported_hidden_friend): Declare.
* decl.cc (duplicate_decls): Also check if hidden decls can be
redeclared in this module. Call remove_defining_module on
to-be-freed newdecl.
* module.cc (imported_temploid_friends): New.
(init_modules): Initialize it.
(trees_out::decl_value): Write it; don't consider imported
temploid friends as attached to a module.
(trees_in::decl_value): Read it for non-discarded decls.
(get_originating_module_decl): Follow the owning decl for an
imported temploid friend.
(propagate_defining_module): New.
(remove_defining_module): New.
* name-lookup.cc (get_mergeable_namespace_binding): New.
(lookup_imported_hidden_friend): New.
* pt.cc (tsubst_friend_function): Propagate defining module for
new friend functions.
(tsubst_friend_class): Lookup imported hidden friends. Check
for valid module attachment of existing names. Propagate
defining module for new classes.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-friend-10_a.C: New test.
* g++.dg/modules/tpl-friend-10_b.C: New test.
* g++.dg/modules/tpl-friend-10_c.C: New test.
* g++.dg/modules/tpl-friend-10_d.C: New test.
* g++.dg/modules/tpl-friend-11_a.C: New test.
* g++.dg/modules/tpl-friend-11_b.C: New test.
* g++.dg/modules/tpl-friend-12_a.C: New test.
* g++.dg/modules/tpl-friend-12_b.C: New test.
* g++.dg/modules/tpl-friend-12_c.C: New test.
* g++.dg/modules/tpl-friend-12_d.C: New test.
* g++.dg/modules/tpl-friend-12_e.C: New test.
* g++.dg/modules/tpl-friend-12_f.C: New test.
* g++.dg/modules/tpl-friend-13_a.C: New test.
* g++.dg/modules/tpl-friend-13_b.C: New test.
* g++.dg/modules/tpl-friend-13_c.C: New test.
* g++.dg/modules/tpl-friend-13_d.C: New test.
* g++.dg/modules/tpl-friend-13_e.C: New test.
* g++.dg/modules/tpl-friend-13_f.C: New test.
* g++.dg/modules/tpl-friend-13_g.C: New test.
* g++.dg/modules/tpl-friend-14_a.C: New test.
* g++.dg/modules/tpl-friend-14_b.C: New test.
* g++.dg/modules/tpl-friend-14_c.C: New test.
* g++.dg/modules/tpl-friend-14_d.C: New test.
* g++.dg/modules/tpl-friend-9.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Currently different places calling 'module_may_redeclare' all emit very
similar but slightly different error messages, and handle different
kinds of declarations differently. This patch makes the function
emit its own error messages so that they're all in one place, and
prepares it for use with temploid friends.
Martin Jambor [Thu, 9 May 2024 14:39:44 +0000 (16:39 +0200)]
sra: Do not leave work for DSE (that it can sometimes not perform)
When looking again at the g++.dg/tree-ssa/pr109849.C testcase we
discovered that it generates terrible store-to-load forwarding stalls
because SRA was leaving behind aggregate loads but all the stores were
by scalar parts and DSE failed to remove the useless load. SRA has
all the knowledge to remove the statement even now, so this small
patch makes it do so.
With this patch, the g++.dg/tree-ssa/pr109849.C micro-benchmark runs 9
times faster (on an AMD EPYC 75F3 machine).
gcc/ChangeLog:
2024-04-18 Martin Jambor <mjambor@suse.cz>
* tree-sra.cc (sra_modify_assign): Remove the original statement
also when dealing with a store to a fully covered aggregate from a
non-candidate.
gcc/testsuite/ChangeLog:
2024-04-23 Martin Jambor <mjambor@suse.cz>
* g++.dg/tree-ssa/pr109849.C: Also check that the aggregate store
to cur disappears.
* gcc.dg/tree-ssa/ssa-dse-26.c: Instead of relying on DSE,
check that the unwanted stores were removed at early SRA time.
Marek Polacek [Wed, 8 May 2024 21:02:49 +0000 (17:02 -0400)]
c++: failure to suppress -Wsizeof-array-div in template [PR114983]
-Wsizeof-array-div offers a way to suppress the warning by wrapping
the second operand of the division in parens:
sizeof (samplesBuffer) / (sizeof(unsigned char))
but this doesn't work in a template, because we fail to propagate
the suppression bits. Do it, then.
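For reference, the suppression idiom itself looks like this in plain C. The
sketch only shows the parenthesized divisor; it does not reproduce the
template case the patch fixes, which is specific to C++. The element type
uint32_t and the function name are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t samplesBuffer[16];

/* Deliberately computes a byte count rather than an element count, so
   the divisor type does not match the array element type.  The
   parentheses around the divisor are the documented way to tell the
   compiler that this is intentional.  */
static size_t
samples_buffer_bytes (void)
{
  return sizeof (samplesBuffer) / (sizeof (unsigned char));
}
```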
The finish_parenthesized_expr hunk is not needed because suppress_warning
isn't very fine-grained. But I think it makes sense to be explicit and
not rely on OPT_Wparentheses also suppressing OPT_Wsizeof_array_div.
PR c++/114983
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) <case SIZEOF_EXPR>: Use copy_warning.
* semantics.cc (finish_parenthesized_expr): Also suppress
-Wsizeof-array-div.
Eric Botcazou [Wed, 22 May 2024 16:10:39 +0000 (18:10 +0200)]
Fix internal error in seh_cfa_offset with -O2 -fno-omit-frame-pointer
The problem directly comes from the -ffold-mem-offsets pass interfering with
the prologue and the frame-related instructions, which is a no-no with SEH,
so the fix simply disconnects the pass in these circumstances.
gcc/
PR rtl-optimization/115038
* fold-mem-offsets.cc (fold_offsets): Return 0 if the defining
instruction of the register is frame related.
Jonathan Wakely [Fri, 17 May 2024 09:55:32 +0000 (10:55 +0100)]
libstdc++: Implement std::formatter<std::thread::id> without <sstream> [PR115099]
The std::thread::id formatter uses std::basic_ostringstream without
including <sstream>, which went unnoticed because the test for it uses
a stringstream to check the output is correct.
The fix implemented here is to stop using basic_ostringstream for
formatting thread::id and just use std::format instead.
As a drive-by fix, the formatter specialization is constrained to
require that the thread::id::native_handle_type can be formatted, to
avoid making the formatter ill-formed if the pthread_t type is not a
pointer or integer. Since non-void pointers can't be formatted, ensure
that we convert pointers to const void* for formatting. Make a similar
change to the existing operator<< overload so that in the unlikely case
that pthread_t is a typedef for char* we don't treat it as a
null-terminated string when inserting into a stream.
libstdc++-v3/ChangeLog:
PR libstdc++/115099
* include/bits/std_thread.h: Declare formatter as friend of
thread::id.
* include/std/thread (operator<<): Convert non-void pointers to
void pointers for output.
(formatter): Add constraint that thread::native_handle_type is a
pointer or integer.
(formatter::format): Reimplement without basic_ostringstream.
* testsuite/30_threads/thread/id/output.cc: Check output
compiles before <sstream> has been included.
Jakub Jelinek [Wed, 22 May 2024 07:13:50 +0000 (09:13 +0200)]
strlen: Fix up !si->full_string_p handling in count_nonzero_bytes_addr [PR115152]
The following testcase is miscompiled because
strlen_pass::count_nonzero_bytes_addr doesn't handle correctly
the !si->full_string_p case.
If si->full_string_p, it correctly computes minlen and maxlen as
minimum and maximum length of the '\0' terminated string and
clears *nulterm (ie. makes sure !full_string_p in the ultimate
caller) if minlen is equal to or larger than nbytes and so
'\0' isn't guaranteed to be among those bytes.
But in the !si->full_string_p case, all we know is that there
are [minlen,maxlen] non-zero bytes followed by unknown bytes,
so effectively the maxlen is infinite (but caller cares about only
the first nbytes bytes) and furthermore, we never know if there is
any '\0' char among those, so *nulterm needs to be always cleared.
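The two cases can be summarized with a simplified C sketch. The struct and
function names are hypothetical, the initial value of nulterm is an assumption,
and the real strlen_pass::count_nonzero_bytes_addr tracks considerably more
state than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified view of what is known about a string.  */
struct strinfo_sketch
{
  unsigned minlen, maxlen;   /* known bounds on the non-zero prefix */
  bool full_string_p;        /* true if a terminating '\0' is known */
};

struct byte_range
{
  unsigned minlen, maxlen;
  bool nulterm;
};

/* Bounds on the non-zero bytes among the first NBYTES bytes.  */
static struct byte_range
nonzero_bytes_sketch (struct strinfo_sketch si, unsigned nbytes)
{
  struct byte_range r = { si.minlen, si.maxlen, true };
  if (si.full_string_p)
    {
      /* The '\0' may lie beyond the bytes being looked at.  */
      if (r.minlen >= nbytes)
        r.nulterm = false;
    }
  else
    {
      /* Unknown bytes follow the known prefix: the length is unbounded
         as far as the caller is concerned, and no '\0' is guaranteed.  */
      r.maxlen = nbytes;
      r.nulterm = false;
    }
  return r;
}
```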
2024-05-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/115152
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes_addr): If
!si->full_string_p, clear *nulterm and set maxlen to nbytes.
Jakub Jelinek [Wed, 22 May 2024 07:12:28 +0000 (09:12 +0200)]
ubsan: Use right address space for MEM_REF created for bool/enum sanitization [PR115172]
The following testcase is miscompiled, because -fsanitize=bool,enum
creates a MEM_REF without propagating the address space qualifiers to it,
so what should be normally loaded using say %gs:/%fs: segment prefix
isn't. Together with asan it then causes that load to be sanitized.
2024-05-22 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/115172
* ubsan.cc (instrument_bool_enum_load): If rhs is not in generic
address space, use qualified version of utype with the right
address space. Formatting fix.
Haochen Jiang [Tue, 21 May 2024 06:10:43 +0000 (14:10 +0800)]
i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW
Since vpermq is really slow, we should avoid using it for permutation
when vpmovwb is not available (needs AVX512BW) for ix86_expand_vecop_qihi2
and fall back to ix86_expand_vecop_qihi.
gcc/ChangeLog:
PR target/115069
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Do not enable the optimization when AVX512BW is not enabled.
Patrick Palka [Tue, 21 May 2024 19:54:10 +0000 (15:54 -0400)]
c++: folding non-dep enumerator from current inst [PR115139]
After the tsubst_copy removal r14-4796-g3e3d73ed5e85e7 GCC 14 ICEs during
fold_non_dependent_expr for 'e1 | e2' below ultimately because we no
longer exit early when substituting the CONST_DECLs for e1 and e2 with
args=NULL_TREE and instead proceed to substitute the class context A<Ts...>
(also with args=NULL_TREE) which ends up ICEing from tsubst_pack_expansion
(due to processing_template_decl being cleared).
Incidentally, the ICE went away on trunk ever since the tsubst_aggr_type
removal r15-123-gf04dc89a991ddc since it changed the CONST_DECL case of
tsubst_expr to use tsubst to substitute the context, which short circuits
for empty args and so avoids the ICE.
This patch fixes this ICE for GCC 14 by narrowly restoring the early exit
for empty args that would've happened in tsubst_copy when substituting an
enumerator CONST_DECL. We might as well apply this to trunk too, as a
small optimization.
PR c++/115139
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) <case CONST_DECL>: Exit early if args
is empty.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent33.C: New test.
Reviewed-by: Marek Polacek <mpolacek@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit f0c0bced62b9c728ed1e672747aa234d918da22c)
Andrew Pinski [Mon, 20 May 2024 07:16:40 +0000 (00:16 -0700)]
match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]
The problem here is the pattern added in r13-1162-g9991d84d2a8435
assumes that it is well defined to multiply zero_one_valuep by the truncated
converted integer constant. It is well defined for all types except for signed 1bit types.
For those, `a * -1` is produced, which is undefined.
So disable this pattern for 1bit signed types.
Note the pattern added in r14-3432-gddd64a6ec3b38e is able to work around the undefinedness except when
`-fsanitize=undefined` is turned on; this is why I added a testcase for that.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/115154
gcc/ChangeLog:
* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.
gcc/testsuite/ChangeLog:
* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.
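To make the representability problem in the entry above concrete, here is a minimal C sketch. It is not the testcase from the patch; the struct and function names are hypothetical. A signed 1-bit field can hold only 0 and -1, so a product such as `-1 * -1` computed in that narrow type has no representable result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type for illustration: a signed 1-bit field can hold
   only the values 0 and -1.  */
struct flags
{
  signed int bit : 1;
};

/* Store V (which must already be representable) and read it back.  */
int
store_and_reload (int v)
{
  assert (v == 0 || v == -1);
  struct flags f = { 0 };
  f.bit = v;
  return f.bit;
}

/* True if V is representable in a signed 1-bit type.  */
bool
fits_signed_1bit (long v)
{
  return v == 0 || v == -1;
}
```

In C source the multiplication itself is done in `int` after promotion, so this only models the range check; the undefined multiplication described above arises when the compiler performs it directly in the 1-bit type.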
Andrew Pinski [Sat, 18 May 2024 18:55:58 +0000 (11:55 -0700)]
PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]
The problem here is even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines a ssa name
which is used in that statement so we need to add a check to make sure
that the phi nodes are empty for the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/115143
gcc/ChangeLog:
* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.
Patrick Palka [Fri, 17 May 2024 13:02:52 +0000 (09:02 -0400)]
c++: aggregate CTAD w/ paren init and bases [PR115114]
During aggregate CTAD with paren init, we're accidentally overlooking
base classes since TYPE_FIELDS of a template type doesn't contain
corresponding base fields. So we need to consider them separately.
PR c++/115114
gcc/cp/ChangeLog:
* pt.cc (maybe_aggr_guide): Consider bases in the paren init case.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-aggr15.C: New test.
Patrick Palka [Wed, 15 May 2024 02:55:16 +0000 (22:55 -0400)]
c++: lvalueness of non-dependent assignment expr [PR114994]
r14-4111-g6e92a6a2a72d3b made us check non-dependent simple assignment
expressions ahead of time and give them a type, as was already done for
compound assignments. Unlike for compound assignments however, if a
simple assignment resolves to an operator overload we represent it as a
(typed) MODOP_EXPR instead of a CALL_EXPR to the selected overload.
(I reckoned this was at worst a pessimization -- we'll just have to repeat
overload resolution at instantiation time.)
But this turns out to break the below testcase ultimately because
MODOP_EXPR (of non-reference type) is always treated as an lvalue
according to lvalue_kind, which is incorrect for the MODOP_EXPR
representing x=42.
We can fix this by representing such class assignment expressions as
CALL_EXPRs as well, but this turns out to require some tweaking of our
-Wparentheses warning logic and may introduce other fallout making it
unsuitable for backporting.
So this patch instead fixes lvalue_kind to consider the type of a
MODOP_EXPR representing a class assignment.
PR c++/114994
gcc/cp/ChangeLog:
* tree.cc (lvalue_kind) <case MODOP_EXPR>: For a class
assignment, consider the result type.
The libgcc implementation of __clzhi2 can be tweaked by
one cycle in some situations by re-arranging the instructions.
It also reduces the WCET by 1 cycle.
Paul Thomas [Fri, 17 May 2024 14:19:26 +0000 (15:19 +0100)]
Fortran: Fix select type regression due to r14-9489 [PR114874]
2024-05-17 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/114874
* gfortran.h: Add 'assoc_name_inferred' to gfc_namespace.
* match.cc (gfc_match_select_type): Set 'assoc_name_inferred'
in select type namespace if the selector has inferred type.
* primary.cc (gfc_match_varspec): If a select type temporary
is apparently scalar and a left parenthesis has been detected,
check the current namespace has 'assoc_name_inferred' set. If
so, set inferred_type.
* resolve.cc (resolve_variable): If the namespace of a select
type temporary is marked with 'assoc_name_inferred' call
gfc_fixup_inferred_type_refs to ensure references are OK.
(gfc_fixup_inferred_type_refs): Catch invalid array refs.
gcc/testsuite/
PR fortran/114874
* gfortran.dg/pr114874_1.f90: New test for valid code.
* gfortran.dg/pr114874_2.f90: New test for invalid code.
Richard Biener [Fri, 10 May 2024 12:19:49 +0000 (14:19 +0200)]
tree-optimization/114998 - use-after-free with loop distribution
When loop distribution releases a PHI node of the original IL, it
can end up clobbering memory that has since been re-used: upon releasing
its RDG it resets all stmt UIDs back to -1, even those of stmts that got released.
The fix is to avoid resetting UIDs based on stmts in the RDG but
instead reset only those still present in the loop.
PR tree-optimization/114998
* tree-loop-distribution.cc (free_rdg): Take loop argument.
Reset UIDs of stmts still in the IL rather than all stmts
referenced from the RDG.
(loop_distribution::build_rdg): Pass loop to free_rdg.
(loop_distribution::distribute_loop): Likewise.
(loop_distribution::transform_reduction_loop): Likewise.
Richard Biener [Fri, 3 May 2024 08:44:50 +0000 (10:44 +0200)]
middle-end/114931 - type_hash_canon and structural equality types
TYPE_STRUCTURAL_EQUALITY_P is part of our type system so we have
to make sure to include that into the type unification done via
type_hash_canon. This requires the flag to be set before querying
the hash which is the biggest part of the patch.
PR middle-end/114931
gcc/
* tree.cc (type_hash_canon_hash): Hash TYPE_STRUCTURAL_EQUALITY_P.
(type_cache_hasher::equal): Compare TYPE_STRUCTURAL_EQUALITY_P.
(build_array_type_1): Set TYPE_STRUCTURAL_EQUALITY_P before
probing with type_hash_canon.
(build_function_type): Likewise.
(build_method_type_directly): Likewise.
(build_offset_type): Likewise.
(build_complex_type): Likewise.
* attribs.cc (build_type_attribute_qual_variant): Likewise.
gcc/c-family/
* c-common.cc (complete_array_type): Set TYPE_STRUCTURAL_EQUALITY_P
before probing with type_hash_canon.
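The idea of the entry above, that a flag which is part of a type's identity must take part in both the hash and the equality check and must be set before the table is probed, can be sketched as follows. This is a heavily simplified model, not GCC's data structures; `type_node`, `type_hash`, `type_equal` and `type_canon` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a type node.  */
struct type_node
{
  int elem_type_id;
  int domain_id;
  bool structural_equality;	/* models TYPE_STRUCTURAL_EQUALITY_P */
};

/* The flag is mixed into the hash ...  */
unsigned
type_hash (const struct type_node *t)
{
  unsigned h = 17;
  h = h * 31 + (unsigned) t->elem_type_id;
  h = h * 31 + (unsigned) t->domain_id;
  h = h * 31 + (unsigned) t->structural_equality;
  return h;
}

/* ... and compared in the equality predicate.  */
bool
type_equal (const struct type_node *a, const struct type_node *b)
{
  return a->elem_type_id == b->elem_type_id
	 && a->domain_id == b->domain_id
	 && a->structural_equality == b->structural_equality;
}

#define TABLE_SIZE 16
static const struct type_node *table[TABLE_SIZE];

/* Return an existing equal node, or record FRESH and return it.
   FRESH must be fully initialized, flag included, before the call.  */
const struct type_node *
type_canon (const struct type_node *fresh)
{
  assert (fresh != NULL);
  unsigned h = type_hash (fresh);
  for (unsigned i = 0; i < TABLE_SIZE; i++)
    {
      unsigned slot = (h + i) % TABLE_SIZE;
      if (table[slot] == NULL)
	{
	  table[slot] = fresh;
	  return fresh;
	}
      if (type_equal (table[slot], fresh))
	return table[slot];
    }
  return fresh;	/* table full: give up on sharing in this sketch */
}
```

A caller that gets back a node other than the one it passed in should return it unchanged rather than continue initializing it, which is the point of the follow-up change to avoid modifying an existing type.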
Richard Biener [Fri, 3 May 2024 09:48:07 +0000 (11:48 +0200)]
Avoid changing type in the type_hash_canon hash
When building a type and type_hash_canon returns an existing type
avoid changing it, in particular its TYPE_CANONICAL.
PR middle-end/114931
* tree.cc (build_array_type_1): Return early when type_hash_canon
returned an older existing type.
(build_function_type): Likewise.
(build_method_type_directly): Likewise.
(build_offset_type): Likewise.
Jonathan Wakely [Thu, 11 Apr 2024 14:35:11 +0000 (15:35 +0100)]
libstdc++: Update ABI test to disallow adding to released symbol versions
If we update the list of "active" symbol versions now, rather than when
adding a new symbol version, we will notice if new symbols get added to
the wrong version (as in PR 114692).
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_abi.cc: Update latest versions to
new versions that should be used in future.
Jonathan Wakely [Wed, 1 May 2024 16:09:39 +0000 (17:09 +0100)]
libstdc++: Fix handling of incomplete UTF-8 sequences in _Unicode_view
Eddie Nolan reported to me that _Unicode_view was not correctly
implementing the substitution of ill-formed subsequences with U+FFFD,
due to failing to increment the counter when the iterator reaches the
end of the sequence before a multibyte sequence is complete. As a
result, the incomplete sequence was not completely consumed, and then
the remaining character was treated as another ill-formed sequence,
giving two U+FFFD characters instead of one.
To avoid similar mistakes in future, this change introduces a lambda
that increments the iterator and the counter together. This ensures the
counter is always incremented when the iterator is incremented, so that
we always know how many characters have been consumed.
libstdc++-v3/ChangeLog:
* include/bits/unicode.h (_Unicode_view::_M_read_utf8): Ensure
count of characters consumed is correct when the end of the
input is reached unexpectedly.
* testsuite/ext/unicode/view.cc: Test incomplete UTF-8
sequences.
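The approach described above, advancing the position and the consumed-byte counter through a single helper so they cannot drift apart, can be sketched in C roughly as follows. This is not the libstdc++ code (which is C++ and uses a lambda); the names `cursor`, `advance`, `decode_one` and `decode_all` are hypothetical, and the decoder is simplified in that it does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* Hypothetical cursor: position plus a count of bytes consumed so far.  */
struct cursor
{
  const unsigned char *p;
  const unsigned char *end;
  size_t consumed;
};

/* Advance position and counter together.  */
static void
advance (struct cursor *c)
{
  assert (c->p != c->end);
  ++c->p;
  ++c->consumed;
}

/* Decode one code point from the non-empty range [p, end) and store
   the number of bytes consumed in *len.  */
uint32_t
decode_one (const unsigned char *p, const unsigned char *end, size_t *len)
{
  assert (p != end);
  struct cursor c = { p, end, 0 };
  unsigned char lead = *c.p;
  uint32_t cp;
  int need;

  advance (&c);
  if (lead < 0x80)
    { cp = lead; need = 0; }
  else if ((lead & 0xE0) == 0xC0)
    { cp = lead & 0x1F; need = 1; }
  else if ((lead & 0xF0) == 0xE0)
    { cp = lead & 0x0F; need = 2; }
  else if ((lead & 0xF8) == 0xF0)
    { cp = lead & 0x07; need = 3; }
  else
    { cp = REPLACEMENT; need = 0; }

  while (need-- > 0)
    {
      /* End of input or a non-continuation byte: ill-formed sequence.
	 Everything consumed so far is covered by one replacement.  */
      if (c.p == c.end || (*c.p & 0xC0) != 0x80)
	{
	  cp = REPLACEMENT;
	  break;
	}
      cp = (cp << 6) | (uint32_t) (*c.p & 0x3F);
      advance (&c);
    }
  *len = c.consumed;
  return cp;
}

/* Decode N bytes at S into OUT and return the number of code points.
   OUT must have room for at least N entries.  */
size_t
decode_all (const unsigned char *s, size_t n, uint32_t *out)
{
  const unsigned char *end = s + n;
  size_t count = 0;
  while (s != end)
    {
      size_t len;
      out[count++] = decode_one (s, end, &len);
      s += len;
    }
  return count;
}
```

With this structure a truncated three-byte sequence such as the bytes E2 82 at the end of the input is consumed as a whole and yields a single U+FFFD, rather than one per leftover byte.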
During maybe_aggr_guide with a nested class template and paren init,
like with list init we need to consider the generic template type rather
than the partially instantiated type since partial instantiations don't
have (partially instantiated) TYPE_FIELDS. In turn we need to partially
substitute PARMs in the paren init case as well. As a drive-by improvement
it seems better to use outer_template_args instead of DECL_TI_ARGS during
this partial substitution so that we lower instead of substitute the
innermost template parameters, which is generally more robust.
And during alias_ctad_tweaks with a nested class template, even though
the guides may be already partially instantiated we still need to
substitute the outermost arguments into its constraints.
PR c++/114974
PR c++/114901
PR c++/114903
gcc/cp/ChangeLog:
* pt.cc (maybe_aggr_guide): Fix obtaining TYPE_FIELDS in
the paren init case. Hoist out partial substitution logic
to apply to the paren init case as well.
(alias_ctad_tweaks): Substitute outer template arguments into
a guide's constraints.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-aggr14.C: New test.
* g++.dg/cpp2a/class-deduction-alias20.C: New test.
* g++.dg/cpp2a/class-deduction-alias21.C: New test.
Gerald Pfeifer [Sun, 12 May 2024 13:31:33 +0000 (15:31 +0200)]
doc: Describe limitations re Ada, D, and Go on FreeBSD
gcc:
PR target/69374
PR target/112959
* doc/install.texi (Specific) <*-*-freebsd*>: The Ada and D
run-time libraries are broken on i386 which also can affect
64-bit builds. Go is broken.
Gerald Pfeifer [Sun, 12 May 2024 13:30:18 +0000 (15:30 +0200)]
doc: FreeBSD no longer has a GNU toolchain in base
gcc:
PR target/69374
PR target/112959
* doc/install.texi (Specific) <*-*-freebsd*>: No longer refer
to GCC or binutils in base. Recommend bootstrap using binutils.
Jakub Jelinek [Fri, 10 May 2024 07:21:38 +0000 (09:21 +0200)]
c++, mingw: Fix up types of dtor hooks to __cxa_{,thread_}atexit/__cxa_throw on mingw ia32 [PR114968]
__cxa_atexit/__cxa_thread_atexit/__cxa_throw functions accept function
pointers that usually point directly to destructors rather than to
wrappers around them.
Now, mingw ia32 uses implicitly __attribute__((thiscall)) calling
conventions for METHOD_TYPE (where the this pointer is passed in %ecx
register, the rest on the stack), so these functions use:
in config/os/mingw32/os_defines.h:
#if defined (__i386__)
#define _GLIBCXX_CDTOR_CALLABI __thiscall
#endif
in libsupc++/cxxabi.h
__cxa_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void*) _GLIBCXX_NOTHROW;
__cxa_thread_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void *) _GLIBCXX_NOTHROW;
__cxa_throw(void*, std::type_info*, void (_GLIBCXX_CDTOR_CALLABI *) (void *))
__attribute__((__noreturn__));
Now, mingw for some weird reason uses
#define TARGET_CXX_USE_ATEXIT_FOR_CXA_ATEXIT hook_bool_void_true
so it never actually uses __cxa_atexit, but does use __cxa_thread_atexit
and __cxa_throw. Recent changes for modules result in more detailed
__cxa_*atexit/__cxa_throw prototypes precreated by the compiler, and if
that happens and one also includes <cxxabi.h>, the compiler complains about
mismatches in the prototypes.
One thing is the missing thiscall attribute on the FUNCTION_TYPE, the
other problem is that all of atexit/__cxa_atexit/__cxa_thread_atexit
get function pointer types created by a single function,
get_atexit_fn_ptr_type (), which creates it depending on if atexit
or __cxa_atexit will be used as either void(*)(void) or void(*)(void *),
but when using atexit and __cxa_thread_atexit it uses the wrong function
type for __cxa_thread_atexit.
The following patch adds a target hook to add the thiscall attribute to the
function pointers, and splits the get_atexit_fn_ptr_type () function into
get_atexit_fn_ptr_type () and get_cxa_atexit_fn_ptr_type (), the former always
creates shared void(*)(void) type, the latter creates either
void(*)(void*) (on most targets) or void(__attribute__((thiscall))*)(void*)
(on mingw ia32). So that we don't waste another GTY global tree for it,
because cleanup_type used for the same purpose for __cxa_throw should be
the same, the code changes it to use that type too.
In register_dtor_fn then based on the decision whether to use atexit,
__cxa_atexit or __cxa_thread_atexit it picks the right function pointer
type, and also if it decides to emit a __tcf_* wrapper for the cleanup,
uses that type for that wrapper so that it agrees on calling convention.
2024-05-10 Jakub Jelinek <jakub@redhat.com>
PR target/114968
gcc/
* target.def (use_atexit_for_cxa_atexit): Remove spurious space
from comment.
(adjust_cdtor_callabi_fntype): New cxx target hook.
* targhooks.h (default_cxx_adjust_cdtor_callabi_fntype): Declare.
* targhooks.cc (default_cxx_adjust_cdtor_callabi_fntype): New
function.
* doc/tm.texi.in (TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE): Add.
* doc/tm.texi: Regenerate.
* config/i386/i386.cc (ix86_cxx_adjust_cdtor_callabi_fntype): New
function.
(TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE): Redefine.
gcc/cp/
* cp-tree.h (atexit_fn_ptr_type_node, cleanup_type): Adjust macro
comments.
(get_cxa_atexit_fn_ptr_type): Declare.
* decl.cc (get_atexit_fn_ptr_type): Adjust function comment, only
build type for atexit argument.
(get_cxa_atexit_fn_ptr_type): New function.
(get_atexit_node): Call get_cxa_atexit_fn_ptr_type rather than
get_atexit_fn_ptr_type when using __cxa_atexit.
(get_thread_atexit_node): Call get_cxa_atexit_fn_ptr_type
rather than get_atexit_fn_ptr_type.
(start_cleanup_fn): Add ob_parm argument, call
get_cxa_atexit_fn_ptr_type or get_atexit_fn_ptr_type depending
on it and create PARM_DECL also based on that argument.
(register_dtor_fn): Adjust start_cleanup_fn caller, use
get_cxa_atexit_fn_ptr_type rather than get_atexit_fn_ptr_type
for use_dtor casts.
* except.cc (build_throw): Use get_cxa_atexit_fn_ptr_type ().
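The difference between the two function pointer shapes discussed above can be illustrated with a small C model. It is a sketch only: it does not use the real __cxa_atexit, and `register_cxa_style`, `run_cleanups`, `destroy_object` and `tcf_object` are hypothetical names. The calling convention attribute that matters on mingw ia32 is left out because it is target specific.

```c
#include <assert.h>
#include <stddef.h>

/* Shape accepted by atexit: no argument.  */
typedef void (*atexit_fn) (void);
/* Shape accepted by __cxa_atexit-style hooks: takes the object.  */
typedef void (*cxa_atexit_fn) (void *);

struct cleanup
{
  cxa_atexit_fn fn;
  void *obj;
};

static struct cleanup cleanups[8];
static size_t n_cleanups;

/* Model of a __cxa_atexit-style registration; returns 0 on success.  */
int
register_cxa_style (cxa_atexit_fn fn, void *obj)
{
  assert (fn != NULL);
  if (n_cleanups == sizeof cleanups / sizeof cleanups[0])
    return -1;
  cleanups[n_cleanups].fn = fn;
  cleanups[n_cleanups].obj = obj;
  n_cleanups++;
  return 0;
}

/* Run the registered cleanups in reverse order of registration.  */
void
run_cleanups (void)
{
  while (n_cleanups > 0)
    {
      struct cleanup c = cleanups[--n_cleanups];
      c.fn (c.obj);
    }
}

int demo_object;

/* Stands in for a destructor: receives the object pointer directly.  */
void
destroy_object (void *p)
{
  *(int *) p = -1;
}

/* Wrapper in the style of a __tcf_* function: adapts the destructor
   to the argument-less atexit shape for one fixed object.  */
void
tcf_object (void)
{
  destroy_object (&demo_object);
}
```

The destructor-like function can be registered directly only with the hook that passes the object pointer; for plain atexit a wrapper with the other type is needed, which is why the two cases need distinct function pointer types.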
Xi Ruoyao [Wed, 8 May 2024 03:25:57 +0000 (11:25 +0800)]
driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]
In GCC 14 we started to emit URLs for "command-line option <option> is
valid for <language> but not <another language>" and "-Werror= argument
'-Werror=<option>' is not valid for <language>" warnings. So we should
have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
-fdiagnostics-urls= wouldn't be able to control URLs in these warnings.
No test cases are added because with TERM=xterm-256colors PR114980
already triggers some test failures.
gcc/ChangeLog:
PR driver/114980
* opts-common.cc (prune_options): Move -fdiagnostics-urls=
early like -fdiagnostics-color=.
Harald Anlauf [Mon, 29 Apr 2024 17:52:52 +0000 (19:52 +0200)]
Fortran: fix issues with class(*) assignment [PR114827]
gcc/fortran/ChangeLog:
PR fortran/114827
* trans-array.cc (gfc_alloc_allocatable_for_assignment): Take into
account _len of unlimited polymorphic entities when calculating
the effective element size for allocation size and array span.
Set _len of lhs to _len of rhs.
* trans-expr.cc (trans_class_assignment): Take into account _len
of unlimited polymorphic entities for allocation size.
gcc/testsuite/ChangeLog:
PR fortran/114827
* gfortran.dg/asan/unlimited_polymorphic_34.f90: New test.
Jakub Jelinek [Thu, 9 May 2024 09:18:21 +0000 (11:18 +0200)]
testsuite: Fix up vector-subaccess-1.C test for ia32 [PR89224]
The test FAILs on i686-linux due to
.../gcc/testsuite/g++.dg/torture/vector-subaccess-1.C:16:6: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi]
excess warnings.
This fixes it by adding -Wno-psabi, like commonly done in other tests.
2024-05-09 Jakub Jelinek <jakub@redhat.com>
PR c++/89224
* g++.dg/torture/vector-subaccess-1.C: Add -Wno-psabi as additional
options.
AVR: target/114981 - Support __builtin_powi[l] / __powidf2.
This supports __powidf2 by means of a double wrapper for already
existing f7_powi (renamed to __f7_powi by f7-renames.h).
It tweaks the implementation so that it does not perform trivial
multiplications with 1.0 any more, but instead uses a move.
It also fixes the last statement of f7_powi, which was wrong.
Notice that f7_powi was unused until now.
PR target/114981
libgcc/config/avr/libf7/
* libf7-common.mk (F7_ASM_PARTS): Add D_powi
* libf7-asm.sx (F7MOD_D_powi_, __powidf2): New module and function.
* libf7.c (f7_powi): Fix last (wrong) statement.
Tweak trivial multiplications with 1.0.
gcc/testsuite/
* gcc.target/avr/pr114981-powil.c: New test.
Objective-C, NeXT, v2: Correct a regression in code-gen.
There have been several changes in the ABI of Objective-C which
depend on the OS version targeted. In this case Protocols and
LabelProtocols should be made weak/hidden/extern from macOS 10.7
however there was a mistake in the code causing this to occur
from macOS 10.6. Fixed thus.
gcc/objc/ChangeLog:
* objc-next-runtime-abi-02.cc (WEAK_PROTOCOLS_AFTER): New.
(next_runtime_abi_02_protocol_decl): Use WEAK_PROTOCOLS_AFTER
to determine this ABI change.
(build_v2_protocol_list_address_table): Likewise.
Jakub Jelinek [Wed, 8 May 2024 08:17:32 +0000 (10:17 +0200)]
reassoc: Fix up optimize_range_tests_to_bit_test [PR114965]
The optimize_range_tests_to_bit_test optimization normally emits a range
test first:
if (entry_test_needed)
{
tem = build_range_check (loc, optype, unshare_expr (exp),
false, lowi, high);
if (tem == NULL_TREE || is_gimple_val (tem))
continue;
}
so during the bit test we already know that exp is in the [lowi, high]
range, but skips it if we have range info which tells us this isn't
necessary.
Also, normally it emits shifts by exp - lowi counter, but has an
optimization to use just exp counter if the mask isn't a more expensive
constant in that case and lowi is > 0 and high is smaller than prec.
The following testcase is miscompiled because the two abnormal cases
are triggered. The range of exp is [43, 43][48, 48][95, 95], so we on
64-bit arch decide we don't need the entry test, because 95 - 43 < 64.
And we also decide to use just exp as counter, because the range test
tests just for exp == 43 || exp == 48, so high is smaller than 64 too.
Because 95 is in the exp range, we can't do that, we'd either need to
do a range test first, i.e.
if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1)
or need to subtract lowi from the shift counter, i.e.
if ((1UL << (exp - 43)) & mask2)
but can't do both unless r.upper_bound () is < prec.
The following patch ensures that.
2024-05-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114965
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to
optimize away exp - lowi subtraction from shift count unless entry
test is emitted or unless r.upper_bound () is smaller than prec.
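The two safe forms named above for testing exp == 43 || exp == 48 when exp may also be 95 can be sketched in C like this. It is an illustration under the assumption of a 64-bit mask; the function names and mask constants are made up for this example and are not what the pass emits.

```c
#include <assert.h>
#include <stdint.h>

/* Form 1: keep the entry range test, then shift by exp itself.  The
   range test guarantees exp < 64 before the shift is evaluated.  */
int
in_set_entry_test (unsigned exp)
{
  const uint64_t mask1 = (UINT64_C (1) << 43) | (UINT64_C (1) << 48);
  return exp - 43U <= 48U - 43U && ((UINT64_C (1) << exp) & mask1) != 0;
}

/* Form 2: no entry test, but subtract the low bound from the shift
   count.  Valid only if range info guarantees exp - 43 < 64, which is
   modelled here by the assertion.  */
int
in_set_biased (unsigned exp)
{
  const uint64_t mask2 = (UINT64_C (1) << 0) | (UINT64_C (1) << 5);
  assert (exp - 43U < 64U);
  return ((UINT64_C (1) << (exp - 43U)) & mask2) != 0;
}
```

Dropping both the entry test and the subtraction would shift a 64-bit value by 95 for exp == 95, which is the miscompilation the patch prevents.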
Andrew Pinski [Tue, 20 Feb 2024 21:38:28 +0000 (13:38 -0800)]
c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]
After r7-987-gf17a223de829cb, the access for the elements of a vector type would lose the qualifiers.
So if we had `constvector[0]`, the type of the element of the array would not have const on it.
This was due to a missing build_qualified_type for the inner type of the vector when building the array type.
We need to add back the call to build_qualified_type and now the access has the correct qualifiers. So
overload resolution, and even whether the access is an lvalue or rvalue, is now handled correctly.
Note we correctly now reject the testcase gcc.dg/pr83415.c which was incorrectly accepted after r7-987-gf17a223de829cb.
Built and tested for aarch64-linux-gnu.
PR c++/89224
gcc/c-family/ChangeLog:
* c-common.cc (convert_vector_to_array_for_subscript): Call build_qualified_type
for the inner type.
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_array_reference): Compare main variants
for the vector/array types instead of the types directly.
gcc/testsuite/ChangeLog:
* g++.dg/torture/vector-subaccess-1.C: New test.
* gcc.dg/pr83415.c: Change warning to error.
c++/modules: Stream unmergeable temporaries by value again [PR114856]
In r14-9266-g2823b4d96d9ec4 I gave all temporary vars a DECL_CONTEXT,
including those at namespace or global scope, so that they could be
properly merged across importers. However, not all of these temporary
vars are actually supposed to be mergeable.
For instance, in the attached testcase we have an unnamed temporary var
used in the NSDMI of a class member, which cannot be properly merged -- but
it also doesn't need to be, as it'll be thrown away when the class type
itself is merged anyway.
This patch reverts the change made above and instead makes a weaker
adjustment that only causes temporary vars with linkage to have a
DECL_CONTEXT to merge from. This way these unnamed, "unmergeable"
temporaries are properly streamed by value again.
PR c++/114856
gcc/cp/ChangeLog:
* call.cc (make_temporary_var_for_ref_to_temp): Set context for
temporaries with linkage.
* init.cc (create_temporary_var): Revert to only set context
when in a function decl.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr114856.h: New test.
* g++.dg/modules/pr114856_a.H: New test.
* g++.dg/modules/pr114856_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
(cherry picked from commit e60032b382364897a58e67994baac896bcd03327)
Jakub Jelinek [Tue, 7 May 2024 19:30:21 +0000 (21:30 +0200)]
expansion: Use __trunchfbf2 calls rather than __extendhfbf2 [PR114907]
The HF and BF modes have the same size/precision and neither is
a subset nor superset of the other.
So, using either __extendhfbf2 or __trunchfbf2 is weird.
The expansion apparently emits __extendhfbf2, but on the libgcc side
we apparently have __trunchfbf2 implemented.
I think it is easier to switch to using what is available rather than
adding new entrypoints to libgcc, even alias, because this is backportable.
2024-05-07 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114907
* expr.cc (convert_mode_scalar): Use trunc_optab rather than
sext_optab for HF->BF conversions.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.
Jakub Jelinek [Tue, 7 May 2024 19:29:14 +0000 (21:29 +0200)]
tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]
In r9-5742 we've started allowing to inline always_inline functions into
functions which have disabled e.g. address sanitization even when the
always_inline function is implicitly sanitized because of command line options.
This mostly works fine because most of the asan instrumentation is done only
late after ipa, but as the following testcase shows, the .ASAN_MARK ifn calls
the gimplifier adds can result in ICEs.
Fixed by dropping those during inlining, similarly to how we drop
.TSAN_FUNC_EXIT calls.
2024-05-07 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/114956
* tree-inline.cc: Include asan.h.
(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
sanitization disabled.
Gaius Mulley [Tue, 7 May 2024 18:51:08 +0000 (19:51 +0100)]
[PR modula2/113768][PR modula2/114133] bugfix constants must be cast prior to vararg call
This bug fix corrects the test codes below by converting the constant
literals to the type required by C. In the testcases below the values, 1
etc were converted into the INTEGER type before being passed to a C
vararg function. By default in modula2 constant literal ordinals are
represented as the ZTYPE (the largest GCC integer type node).
gcc/testsuite/ChangeLog:
PR modula2/113768
PR modula2/114133
* gm2/extensions/run/pass/callingc10.mod: Convert constant literal
numbers into INTEGER.
* gm2/extensions/run/pass/callingc11.mod: Ditto.
* gm2/extensions/run/pass/vararg2.mod: Ditto.
* gm2/iso/run/pass/packed.mod: Emit a printf as a runtime
diagnostic.
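Seen from the C side, the requirement is that every variadic argument has exactly the type the callee reads with va_arg. A minimal sketch follows, with a hypothetical `sum_ints` function that is not part of the testsuite.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum COUNT variadic arguments.  Each one is read as int, so each one
   must be passed as int by the caller.  */
int
sum_ints (int count, ...)
{
  assert (count >= 0);
  va_list ap;
  int total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```

A call such as `sum_ints (3, 1, 2, (int) 3LL)` is fine, whereas passing the wider `3LL` without the cast would be undefined behavior; the Modula-2 conversion to INTEGER plays the same role as the cast here.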
Jakub Jelinek [Thu, 2 May 2024 09:56:16 +0000 (11:56 +0200)]
libgomp: Add gfx90c, 1036 and 1103 declare variant tests
Recently -march=gfx{90c,1036,1103} support has been added, but corresponding
changes weren't done in the testsuite.
The following patch adds that.
Tested on x86_64-linux (with fiji and gfx1103 devices; had to use
OMP_DEFAULT_DEVICE=1 there, fiji doesn't really work due to LLVM dropping
support, but we still list those as offloading devices).
2024-05-02 Jakub Jelinek <jakub@redhat.com>
* testsuite/libgomp.c/declare-variant-4.h (gfx90c, gfx1036, gfx1103):
New functions.
(f): Add #pragma omp declare variant directives for those.
* testsuite/libgomp.c/declare-variant-4-gfx90c.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx1036.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx1103.c: New test.
Jakub Jelinek [Tue, 30 Apr 2024 09:22:32 +0000 (11:22 +0200)]
gimple-ssa-sprintf: Use [0, 1] range for %lc with (wint_t) 0 argument [PR114876]
It seems that when Martin S. implemented this, he coded a strict reading
of the standard, which said that %lc with (wint_t) 0 argument is handled
as wchar_t[2] temp = { arg, 0 }; %ls with temp arg and so shouldn't print
any values. But, most of the libc implementations actually handled that
case like %c with '\0' argument, adding a single NUL character, the only
known exception is musl.
Recently, C23 changed this in response to GB-141 and POSIX in
https://austingroupbugs.net/view.php?id=1647
so that it should have the same behavior as %c with '\0'.
Because there is implementation divergence, the following patch uses
a range rather than hardcoding it to all 1s (i.e. the %c behavior),
though the likely case is still 1 (forward looking plus most of
implementations).
The res.knownrange = true; assignment removed is redundant due to
the same assignment done unconditionally before the if statement,
rest is formatting fixes.
I don't think the min >= 0 && min < 128 case is right either, I'd think
it should be min >= 0 && max < 128, otherwise it is just some possible
inputs are (maybe) ASCII and there can be others, but this code is a total
mess anyway, with the min, max, likely (somewhere in [min, max]?) and then
unlikely possibly larger than max, dunno, perhaps for at least some chars
in the ASCII range the likely case could be for the ascii case; so perhaps
just the one_2_one_ascii shouldn't set max to 1 and mayfail should be true
for max >= 128. Anyway, didn't feel I should touch that right now.
2024-04-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114876
* gimple-ssa-sprintf.cc (format_character): For min == 0 && max == 0,
set max, likely and unlikely members to 1 rather than 0. Remove
useless res.knownrange = true;. Formatting fixes.
* gcc.dg/pr114876.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Adjust expected
diagnostics.
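The implementation divergence that motivates the [0, 1] range can be observed with a few lines of C. This is a sketch, not the testcase added by the patch; `lc_zero_length` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Return the number of bytes "%lc" produces for a (wint_t) 0
   argument on the C library in use.  */
int
lc_zero_length (void)
{
  char buf[8];
  int n = snprintf (buf, sizeof buf, "%lc", (wint_t) 0);
  /* 0 under the strict older reading, 1 when handled like %c with
     '\0'; both are covered by the range the patch uses.  */
  assert (n == 0 || n == 1);
  return n;
}
```

Because either result is possible depending on the C library, a compiler that assumed exactly 0 (or exactly 1) bytes could mis-size buffers or fold the return value incorrectly, which is why a range is the conservative choice.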
Patrick Palka [Tue, 30 Apr 2024 01:27:59 +0000 (21:27 -0400)]
c++/modules: imported spec befriending class tmpl [PR114889]
When adding to CLASSTYPE_BEFRIENDING_CLASSES as part of installing an
imported class definition, we need to look through TEMPLATE_DECL like
make_friend_class does.
Otherwise in the below testcase we won't add _Hashtable<int, int> to
CLASSTYPE_BEFRIENDING_CLASSES of _Map_base, which leads to a bogus
access check failure for _M_hash_code.
PR c++/114889
gcc/cp/ChangeLog:
* module.cc (trees_in::read_class_def): Look through
TEMPLATE_DECL when adding to CLASSTYPE_BEFRIENDING_CLASSES.
gcc/testsuite/ChangeLog:
* g++.dg/modules/friend-8_a.H: New test.
* g++.dg/modules/friend-8_b.C: New test.
AVR: ipa/92606 - Don't optimize PROGMEM data against non-PROGMEM.
ipa/92606: Inter-procedural analysis optimizes data across
address-spaces and PROGMEM. As of v14, the PROGMEM part is
still not fixed (and there is still no target hook as proposed
in PR92932). Just disable respective bogus optimization.
PR ipa/92606
gcc/
* config/avr/avr.cc (avr_option_override): Set
flag_ipa_icf_variables = 0.
gcc/testsuite/
* gcc.target/avr/torture/pr92606.c: New test.