gcc.gnu.org Git - gcc.git/log

contrib: Improve dg-extract-results.sh's Python detection [PR109668]

'python' on some systems (e.g. SLES 15) might be Python 2. Prefer python3,
then python, then python2 (as the script still tries to work there).

PR other/109668
* dg-extract-results.sh: Check for python3 before python. Check for
python2 last.

testsuite: Fix up pr113617 test for darwin [PR113617]

The test attempts to link a shared library, and apparently Darwin by
default doesn't allow shared libraries to contain undefined symbols.

The following patch just adds dummy definitions for the symbols, so that
the library no longer has any undefined symbols at least in my linux
testing.
Furthermore, for target { !shared } targets (like darwin until it is
fixed in target-supports.exp), because we then link a program rather than
a shared library, the patch also adds a dummy main definition so that it
can link.

2024-03-08 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/113617
PR target/114233
* g++.dg/other/pr113617.C: Define -DSHARED when linking with -shared.
* g++.dg/other/pr113617-aux.cc: Add definitions for used methods and
templates not defined elsewhere.

tree-optimization/114269 - 434.zeusmp regression after SCEV analysis fix

The following addresses a performance regression caused by the recent
SCEV analysis fix with regard to folding multiplications and undefined
behavior on overflow. We do not handle (T) { a, +, b } * c but can
treat sign-conversions from unsigned by performing the multiplication
in the unsigned type. That's what we already do for additions (but
that misses one case that turns out important).

This fixes the 434.zeusmp regression for me.
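
As a rough illustration of why the multiplication is performed in the
unsigned type (a standalone sketch, not code from tree-chrec.cc; the
name mul_in_unsigned is hypothetical): signed overflow is undefined,
whereas unsigned arithmetic wraps, so the product can be formed in
unsigned and converted back afterwards.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper, not part of GCC: multiply two ints by doing the
// arithmetic in unsigned, where overflow wraps instead of being undefined.
// The conversion back to int is modular on GCC (and in C++20 generally).
int mul_in_unsigned(int a, int b)
{
  unsigned ua = static_cast<unsigned>(a);
  unsigned ub = static_cast<unsigned>(b);
  return static_cast<int>(ua * ub);
}
```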

PR tree-optimization/114269
PR tree-optimization/114074
* tree-chrec.cc (chrec_fold_plus_1): Handle sign-conversions
in the third CASE_CONVERT case as well.
(chrec_fold_multiply): Handle sign-conversions from unsigned
by performing the operation in the unsigned type.

modula2: Rebuild bootstrap tools with faster dynamic arrays

This patch configures the larger dynamic arrays to use a larger
growth factor and larger initial size. It also rebuilds mc and pge
using the improved default array sizes in Indexing.mod.
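
The idea of a dynamic array with a tunable initial size and growth
factor can be sketched as follows. This is an assumption-laden
illustration in C++, not a translation of Indexing.mod; TunedArray and
all of its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical sketch, not a translation of Indexing.mod: a growable
// array of ints whose initial capacity and growth factor are chosen
// by the caller.
class TunedArray
{
public:
  TunedArray(std::size_t initial, std::size_t grow_factor)
    : cap_(initial), grow_(grow_factor), data_(new int[initial]())
  {
    assert(initial > 0 && grow_factor > 1);
  }

  // Store value at index i, growing the storage geometrically if needed.
  void put(std::size_t i, int value)
  {
    if (i >= cap_)
      {
        std::size_t new_cap = cap_;
        while (i >= new_cap)
          new_cap *= grow_;
        std::unique_ptr<int[]> bigger(new int[new_cap]());
        for (std::size_t k = 0; k < cap_; ++k)
          bigger[k] = data_[k];
        data_ = std::move(bigger);
        cap_ = new_cap;
      }
    data_[i] = value;
  }

  int get(std::size_t i) const { assert(i < cap_); return data_[i]; }
  std::size_t capacity() const { return cap_; }

private:
  std::size_t cap_;
  std::size_t grow_;
  std::unique_ptr<int[]> data_;
};
```

A larger initial size avoids early reallocations, and a larger growth
factor reduces how often the contents are copied as the array grows.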

gcc/m2/ChangeLog:

* gm2-compiler/M2Quads.mod (Init): Use InitIndexTuned with
default size 65K.
* gm2-compiler/SymbolConversion.mod (Init): Ditto.
* gm2-compiler/SymbolTable.mod (BEGIN): Ditto.
* mc-boot/GM2Dependent.cc: Rebuild.
* mc-boot/GM2Dependent.h: Rebuild.
* mc-boot/GM2RTS.cc: Rebuild.
* pge-boot/GIndexing.cc: Rebuild.
* pge-boot/GIndexing.h: Rebuild.
* pge-boot/GM2Dependent.cc: Rebuild.
* pge-boot/GM2Dependent.h: Rebuild.
* pge-boot/GM2RTS.cc: Rebuild.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

AVR: Add an insn combine pattern for offset computation.

Computing  uint16_t += 2 * uint8_t  can occur when an offset
into a 16-bit array is computed.  Without this pattern it costs
six instructions: A move (1), a zero-extend (1), a shift (2) and
an addition (2).  With this pattern it costs 4.
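
For reference, the kind of source expression meant here might look like
the following sketch (add_scaled_index is a hypothetical name, and
whether the new pattern actually matches depends on the surrounding
code):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: byte offset of element `index` in a 16-bit array,
// added to a 16-bit base.  The result wraps modulo 2^16.
std::uint16_t add_scaled_index(std::uint16_t base, std::uint8_t index)
{
  return static_cast<std::uint16_t>(base + 2u * index);
}
```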

gcc/
* config/avr/avr.md (*addhi3_zero_extend.ashift1): New pattern.
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS]: Compute its cost.

bb-reorder: Fix assertion

When touching bb-reorder yesterday, I noticed the checking assert
doesn't actually check what it was meant to.
That is because asm_noperands returns >= 0 for inline asm patterns (in
that case the number of input+output+label operands, so asm goto has at
least one) and -1 if it isn't inline asm.

The following patch fixes the assertion to actually check that it is
asm goto.
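
The pitfall can be sketched with a stand-in for the return convention
described above. fake_asm_noperands and both check functions are
hypothetical, and the exact expression used in bb-reorder.cc is not
reproduced here; the sketch only contrasts a bare truthiness test with
a sign-aware comparison.

```cpp
#include <cassert>

// Hypothetical stand-in for the return convention described above:
// -1 when the insn is not inline asm, otherwise the operand count
// (inputs + outputs + labels).
int fake_asm_noperands(bool is_inline_asm, int operand_count)
{
  return is_inline_asm ? operand_count : -1;
}

// A bare truthiness test accepts -1, so it also passes for non-asm insns.
bool truthy_check(int noperands) { return noperands != 0; }

// A sign-aware test rejects -1 and only passes when operands exist.
bool positive_check(int noperands) { return noperands > 0; }
```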

2024-03-08 Jakub Jelinek <jakub@redhat.com>

* bb-reorder.cc (fix_up_fall_thru_edges): Fix up checking assert,
asm_noperands < 0 means it is not asm goto too.

i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

The following patch guards the noreturn no_callee_saved_registers (except bp)
optimization with an option that is not enabled by default.
The reason is that most noreturn functions should be called just once in a
program (unless they are recursive or invoke longjmp or similar, for exceptions
we already punt), so it isn't that essential to save a few instructions in their
prologue, but more importantly because it interferes with debugging.
And unlike most other optimizations, it doesn't actually make it harder to debug
the given function (which could be solved by recompiling the given function if
it is too hard to debug), but makes it harder to debug the callers of that
noreturn function. Those can be from a different translation unit, different
binary or even different package, so if e.g. glibc abort needs to use all
of the callee saved registers (%rbx, %rbp, %r12, %r13, %r14, %r15), debugging
any programs which abort will be harder because any DWARF expressions which
use those registers will be optimized out, not just in the immediate caller,
but in other callers as well until some frame restores a particular register
from some stack slot.

2024-03-08 Jakub Jelinek <jakub@redhat.com>

PR target/38534
* config/i386/i386.opt (mnoreturn-no-callee-saved-registers): New
option.
* config/i386/i386-options.cc (ix86_set_func_type): Don't use
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP unless
ix86_noreturn_no_callee_saved_registers is enabled.
* doc/invoke.texi (-mnoreturn-no-callee-saved-registers): Document.

* gcc.target/i386/pr38534-1.c: Add -mnoreturn-no-callee-saved-registers
to dg-options.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/pr38534-5.c: Likewise.
* gcc.target/i386/pr38534-6.c: Likewise.
* gcc.target/i386/pr114097-1.c: Likewise.
* gcc.target/i386/stack-check-17.c: Likewise.

c-family, c++: Fix up handling of types which may have padding in __atomic_{compare_}exchange

On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
> Ah, although __atomic_compare_exchange only takes pointers, the
> compiler replaces that with a call to __atomic_compare_exchange_n
> which takes the newval by value, which presumably uses an 80-bit FP
> register and so the padding bits become indeterminate again.

The problem is that __atomic_{,compare_}exchange lowering if it has
a supported atomic 1/2/4/8/16 size emits code like:
  _3 = *p2;
  _4 = VIEW_CONVERT_EXPR<I_type> (_3);
so if long double or some small struct etc. has some carefully filled
padding bits, those bits can be lost on the assignment.  The library call
for __atomic_{,compare_}exchange would actually work because it would
load the value from memory using integral type or memcpy.
E.g. on
void
foo (long double *a, long double *b, long double *c)
{
  __atomic_compare_exchange (a, b, c, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
at -O0 we end up with:
        fldt    (%rax)
        fstpt   -48(%rbp)
        movq    -48(%rbp), %rax
        movq    -40(%rbp), %rdx
i.e. load *c from memory into 387 register, store it back to uninitialized
stack slot (the padding bits are now random in there) and then load a
__uint128_t (pair of GPR regs).  The problem is that we first load it using
whatever type the pointer points to and then VIEW_CONVERT_EXPR that value:
  p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
  p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
The following patch fixes that by creating a MEM_REF instead, with the
I_type type, but with the original pointer type on the second argument for
aliasing purposes, so we actually preserve the padding bits that way.
With this patch instead of the above assembly we emit
        movq    8(%rax), %rdx
        movq    (%rax), %rax
I had to add support for MEM_REF in pt.cc, though with the assumption
that it has been already originally created with non-dependent
types/operands (which is the case here for the __atomic*exchange lowering).
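
The difference between loading a value and copying its bytes can be
sketched in plain C++. This is an illustration of the principle only,
not the MEM_REF lowering itself; copy_representation and
padding_survives_bytewise_copy are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Illustrative sketch (not the GCC lowering itself): copy the full object
// representation of a long double, padding bytes included, by treating the
// storage as raw bytes instead of loading it as a floating-point value.
void copy_representation(unsigned char *dst, const long double *src)
{
  std::memcpy(dst, src, sizeof(long double));
}

// Returns true if every byte of a pattern-filled long double survives
// the byte-wise copy.
bool padding_survives_bytewise_copy()
{
  long double value;
  std::memset(&value, 0xAA, sizeof value);   // fill value bits and padding

  unsigned char image[sizeof(long double)];
  copy_representation(image, &value);

  for (unsigned char byte : image)
    if (byte != 0xAA)
      return false;
  return true;
}
```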

2024-03-08  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
* c-common.cc (resolve_overloaded_atomic_exchange): Instead of setting
p1 to VIEW_CONVERT_EXPR<I_type> (*p1), set it to MEM_REF with p1 and
(typeof (p1)) 0 operands and I_type type.
(resolve_overloaded_atomic_compare_exchange): Similarly for p2.
gcc/cp/
* pt.cc (tsubst_expr): Handle MEM_REF.
gcc/testsuite/
* g++.dg/ext/atomic-5.C: New test.

dwarf2out: Emit DW_AT_export_symbols on anon unions/structs [PR113918]

DWARF5 added DW_AT_export_symbols for use on inline namespaces (where
we emit it), but also on anonymous unions/structs (where we didn't emit
that attribute).
The following patch fixes it.
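
As a reminder of what "exporting" means for an anonymous union, here is
a small sketch (Value and read_int are hypothetical names): the union's
members are named directly in the enclosing scope, which is what the
attribute communicates to a debugger.

```cpp
#include <cassert>

// Illustrative type: the members of the anonymous union are named directly
// in the scope of Value, as if the union "exported" them.
struct Value
{
  int tag;
  union
  {
    int i;
    float f;
  };
};

// Reads the int member without naming the union.
int read_int(const Value &v) { return v.i; }
```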

2024-03-08 Jakub Jelinek <jakub@redhat.com>

PR debug/113918
gcc/
* dwarf2out.cc (gen_field_die): Emit DW_AT_export_symbols
on anonymous unions or structs for -gdwarf-5 or -gno-strict-dwarf.
gcc/c/
* c-tree.h (c_type_dwarf_attribute): Declare.
* c-objc-common.h (LANG_HOOKS_TYPE_DWARF_ATTRIBUTE): Redefine.
* c-objc-common.cc: Include dwarf2.h.
(c_type_dwarf_attribute): New function.
gcc/cp/
* cp-objcp-common.cc (cp_type_dwarf_attribute): Return 1
for DW_AT_export_symbols on anonymous structs or unions.
gcc/testsuite/
* c-c++-common/dwarf2/pr113918.c: New test.

c++: Fix up parameter pack diagnostics on xobj vs. varargs functions [PR113802]

The simple presence of ellipsis as next token after the parameter
declaration doesn't imply it is a parameter pack, it sometimes is, e.g.
if its type is a pack, but sometimes is not and in that case it acts
the same as if the next tokens were , ... instead of just ...
The "xobj param cannot be a function parameter pack" diagnostic, though,
treats both the declarator->parameter_pack_p and token->type == CPP_ELLIPSIS
as sufficient conditions for the error.  The conditions for CPP_ELLIPSIS
are checked a little bit later in the same function and are complex enough
that IMHO they shouldn't be repeated; on the other hand, for the
declarator->parameter_pack_p case we clear that flag for xobj params
for error recovery reasons.

This patch just moves the diagnostics later (after the CPP_ELLIPSIS handling)
and changes the error recovery behavior by pretending the this specifier
didn't appear if an error is reported.
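
The two meanings of a trailing ellipsis can be shown without explicit
object parameters at all. The following sketch uses hypothetical
function names and ordinary (non-xobj) declarations.

```cpp
#include <cassert>
#include <type_traits>

// Not a pack: `int...` declares an int parameter followed by C-style
// varargs, exactly like `int, ...`.
void varargs_no_comma(int...);
void varargs_comma(int, ...);

// A pack: here the ellipsis expands the template parameter pack Ts.
template <typename... Ts>
constexpr unsigned pack_size(Ts...) { return sizeof...(Ts); }

static_assert(std::is_same<decltype(varargs_no_comma),
                           decltype(varargs_comma)>::value,
              "both declarations have type void(int, ...)");
```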

2024-03-08  Jakub Jelinek  <jakub@redhat.com>

PR c++/113802
* parser.cc (cp_parser_parameter_declaration): Move the xobj_param_p
pack diagnostics after ellipsis handling and if an error is reported,
pretend this specifier didn't appear.  Formatting fix.

* g++.dg/cpp23/explicit-obj-diagnostics3.C (S0, S1, S2, S3, S4): Don't
expect any diagnostics on f and fd member function templates, add
similar templates with ...Selves instead of Selves as k and kd and
expect diagnostics for those.  Expect extra diagnostics in error
recovery for g and gd member function templates.

MAINTAINERS: Fix order in Write After Approval

ChangeLog:

* MAINTAINERS: Fix order of names in Write After Approval

Signed-off-by: Filip Kastl <fkastl@suse.cz>

testsuite/108355 - make gcc.dg/tree-ssa/ssa-fre-104.c properly XFAIL

The testcase only XFAILs on targets where int has an alignment
of sizeof(int). Align the respective array this way to make it
XFAIL consistently.

PR testsuite/108355
* gcc.dg/tree-ssa/ssa-fre-104.c: Align e.

modula2: Add constant aggregate tests

This patch adds four constant aggregate tests and assignment of
arrays by a constant in two different scopes.

gcc/testsuite/ChangeLog:

* gm2/iso/pass/arrayconst.mod: New test.
* gm2/iso/pass/arrayconst2.mod: New test.
* gm2/iso/pass/arrayconst3.mod: New test.
* gm2/iso/pass/arrayconst4.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

RISC-V: Fix ICE in riscv vector costs

The following code can result in an ICE when compiled with
-march=rv64gcv --param riscv-autovec-lmul=dynamic -O3:

char *jpeg_difference7_input_buf;
void jpeg_difference7(int *diff_buf) {
  unsigned width;
  int samp, Rb;
  while (--width) {
    Rb = samp = *jpeg_difference7_input_buf;
    *diff_buf++ = -(int)(samp + (long)Rb >> 1);
  }
}

A biggest_mode update was missed in one branch, which triggers the
assertion failure:
gcc_assert (biggest_size >= mode_size);

Tested on RV64 with no regressions.

PR target/114264

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc: Fix ICE

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr114264.c: New test.

Signed-off-by: demin.han <demin.han@starfivetech.com>

fwprop: Avoid volatile rtx to be propagated

The patch for PR111267 (commit id 86de9b66480b710202a2898cf513db105d8c432f)
introduced an exception for propagation into single set insns: a
propagation which might not be profitable (as checked by profitable_p) is
still allowed into a single set insn.  This has a potential problem: a
volatile operand might be propagated into a single set insn.  If the
defining insn is not eliminated after propagation, the volatile operand will
be executed multiple times.  This patch fixes the problem by skipping
volatile set source rtx in propagation.

gcc/
* fwprop.cc (forward_propagate_into): Return false for volatile set
source rtx.

gcc/testsuite/
* gcc.target/powerpc/fwprop-1.c: New.

Daily bump.

libstdc++: Use std::from_chars to speed up parsing subsecond durations

With std::from_chars we can parse subsecond durations much faster than
with std::num_get, as shown in the microbenchmarks below. We were using
std::num_get and std::numpunct in order to parse a number with the
locale's decimal point character. But we copy the chars from the input
stream into a new buffer anyway, so we can replace the locale's decimal
point with '.' in that buffer, and then we can use std::from_chars on
it.

Benchmark                Time             CPU   Iterations
----------------------------------------------------------
from_chars_millisec    158 ns          158 ns      4524046
num_get_millisec       192 ns          192 ns      3644626
from_chars_microsec    164 ns          163 ns      4330627
num_get_microsec       205 ns          205 ns      3413452
from_chars_nanosec     173 ns          173 ns      4072653
num_get_nanosec        227 ns          227 ns      3105161
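
The buffer rewrite described above can be sketched as follows. This is
a hypothetical helper, not the chrono_io.h code; it assumes C++17 and a
library whose std::from_chars supports floating-point types.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper, not the chrono_io.h code: parse a decimal number
// whose decimal point is locale-specific by rewriting it to '.' in a
// private copy of the text and then calling std::from_chars.
// Returns true on success and stores the parsed value in `out`.
bool parse_with_decimal_point(std::string text, char decimal_point,
                              double &out)
{
  for (char &c : text)
    if (c == decimal_point)
      c = '.';

  const char *first = text.data();
  const char *last = first + text.size();
  std::from_chars_result res = std::from_chars(first, last, out);
  return res.ec == std::errc() && res.ptr == last;
}
```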

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (_Parser::operator()): Use
std::from_chars to parse fractional seconds.

libstdc++: Fix parsing of fractional seconds [PR114244]

When converting a chrono::duration<long double> to a result type with an
integer representation we should use chrono::round<_Duration> so that we
don't truncate towards zero. Rounding ensures that e.g. 0.001999s
becomes 2ms not 1ms.

We can also remove some redundant uses of chrono::duration_cast to
convert from seconds to _Duration, because the _Parser class template
requires _Duration type to be able to represent seconds without loss of
precision.

This also fixes a bug where no fractional part would be parsed for
chrono::duration<long double> because its period is ratio<1>. We should
also consider treat_as_floating_point<rep> when deciding whether to skip
reading a fractional part.
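
The truncation-versus-rounding difference is easy to reproduce outside
the parser. The sketch below uses hypothetical helper names and assumes
C++17 for chrono::round.

```cpp
#include <cassert>
#include <chrono>

// Illustrative only: convert fractional seconds to whole milliseconds,
// once by truncation (duration_cast) and once by rounding (chrono::round).
using fractional_seconds = std::chrono::duration<long double>;

// A floating-point rep is what makes a ratio<1> duration able to hold
// a fractional part at all.
static_assert(
  std::chrono::treat_as_floating_point<fractional_seconds::rep>::value,
  "long double is treated as a floating-point representation");

std::chrono::milliseconds truncate_to_ms(fractional_seconds s)
{
  return std::chrono::duration_cast<std::chrono::milliseconds>(s);
}

std::chrono::milliseconds round_to_ms(fractional_seconds s)
{
  return std::chrono::round<std::chrono::milliseconds>(s);
}
```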

libstdc++-v3/ChangeLog:

PR libstdc++/114244
* include/bits/chrono_io.h (_Parser::operator()): Remove
redundant uses of duration_cast. Use chrono::round to convert
long double value to durations with integer representations.
Check representation type when deciding whether to skip parsing
fractional seconds.
* testsuite/20_util/duration/114244.cc: New test.
* testsuite/20_util/duration/io.cc: Check that a floating-point
duration with ratio<1> precision can be parsed.

c++: Redetermine whether to write vtables on stream-in [PR114229]

We currently always stream DECL_INTERFACE_KNOWN, which is needed since
many kinds of declarations already have their interface determined at
parse time. But for vtables and type-info declarations we need to
re-evaluate on stream-in as whether they need to be emitted or not
changes in each TU, so this patch clears DECL_INTERFACE_KNOWN on these
kinds of declarations so that they can go through 'import_export_decl'
again.

Note that the precise details of the virt-2 tests will need to change
when we implement the resolution of [1], for now I just updated the test
to not fail with the new (current) semantics.

[1]: https://github.com/itanium-cxx-abi/cxx-abi/pull/171

PR c++/114229

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Redetermine
DECL_INTERFACE_KNOWN on stream-in for vtables and tinfo.
* decl2.cc (import_export_decl): Add fixme for ABI changes with
module vtables and tinfo.

gcc/testsuite/ChangeLog:

* g++.dg/modules/virt-2_b.C: Update test to acknowledge that we
now emit vtables here too.
* g++.dg/modules/virt-3_a.C: New test.
* g++.dg/modules/virt-3_b.C: New test.
* g++.dg/modules/virt-3_c.C: New test.
* g++.dg/modules/virt-3_d.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: member alias tmpl partial inst [PR103994]

Alias templates are weird in that their specializations can appear in
both decl_specializations and type_specializations.  They're always in
the decl table, and additionally appear in the type table only at parse
time via finish_template_type.  There seems to be no good reason for
them to appear in both tables, and the code paths end up stepping over
each other in particular for a partial instantiation such as
A<B>::key_arg<T> in the below modules testcase: the type code path
(lookup_template_class) wants to set TI_TEMPLATE to the most general
template whereas the decl code path (tsubst_template_decl called during
instantiation of A<B>) already set TI_TEMPLATE to the partially
instantiated TEMPLATE_DECL.  This TI_TEMPLATE change ends up confusing
modules which decides to stream the logically equivalent TYPE_DECL and
TEMPLATE_DECL for this partial instantiation separately.

This patch fixes this by making lookup_template_class dispatch to
instantiate_alias_template early for alias template specializations.
In turn we now add such specializations only to the decl table.  This
admits some nice simplification in the modules code which otherwise has
to cope with such specializations appearing in both tables.

PR c++/103994

gcc/cp/ChangeLog:

* cp-tree.h (add_mergeable_specialization): Remove second
parameter.
* module.cc (depset::disc_bits::DB_ALIAS_TMPL_INST_BIT): Remove.
(depset::disc_bits::DB_ALIAS_SPEC_BIT): Remove.
(depset::is_alias_tmpl_inst): Remove.
(depset::is_alias): Remove.
(merge_kind::MK_tmpl_alias_mask): Remove.
(merge_kind::MK_alias_spec): Remove.
(merge_kind_name): Remove entries for alias specializations.
(trees_out::core_vals) <case TEMPLATE_DECL>: Adjust after
removing is_alias_tmpl_inst.
(trees_in::decl_value): Adjust add_mergeable_specialization
calls.
(trees_out::get_merge_kind) <case depset::EK_SPECIALIZATION>:
Use MK_decl_spec for alias template specializations.
(trees_out::key_mergeable): Simplify after MK_tmpl_alias_mask
removal.
(depset::hash::make_dependency): Adjust after removing
DB_ALIAS_TMPL_INST_BIT.
(specialization_add): Don't allow alias templates when !decl_p.
(depset::hash::add_specializations): Remove now-dead code
accommodating alias template specializations in the type table.
* pt.cc (lookup_template_class): Dispatch early to
instantiate_alias_template for alias templates.  Simplify
accordingly.
(add_mergeable_specialization): Remove alias_p parameter and
simplify accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99425-1_b.H: s/alias/decl in dump scan.
* g++.dg/modules/tpl-alias-1_a.H: Likewise.
* g++.dg/modules/tpl-alias-2_a.H: New test.
* g++.dg/modules/tpl-alias-2_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

AArch64: memcpy/memset expansions should not emit LDP/STP [PR113618]

The new RTL introduced for LDP/STP results in regressions due to use of UNSPEC.
Given the new LDP fusion pass is good at finding LDP opportunities, change the
memcpy, memmove and memset expansions to emit single vector loads/stores.
This fixes the regression and enables more RTL optimization on the standard
memory accesses. Handling of unaligned tail of memcpy/memmove is improved
with -mgeneral-regs-only. SPEC2017 performance improves slightly. Codesize
is a bit worse due to missed LDP opportunities as discussed in the PR.

gcc/ChangeLog:
PR target/113618
* config/aarch64/aarch64.cc (aarch64_copy_one_block): Remove.
(aarch64_expand_cpymem): Emit single load/store only.
(aarch64_set_one_block): Emit single stores only.

gcc/testsuite/ChangeLog:
PR target/113618
* gcc.target/aarch64/pr113618.c: New test.

c++/modules: inline namespace abi_tag streaming [PR110730]

The unreduced testcase from PR110730 crashes at runtime ultimately
because we don't stream the abi_tag attribute on inline namespaces and
so the filesystem::current_path() call resolves to the non-C++11 ABI
version even though the C++11 ABI is active, leading to a crash when
destroying the path temporary (which contains an std::string member).
Similar story for the PR105512 testcase.

While we do stream the DECL_ATTRIBUTES of all decls that go through
the generic tree streaming routines, it seems namespaces are streamed
separately from other decls and we don't use the generic routines for
them. So this patch makes us stream the abi_tag manually for (inline)
namespaces.

PR c++/110730
PR c++/105512

gcc/cp/ChangeLog:

* module.cc (module_state::write_namespaces): Stream the
abi_tag attribute of an inline namespace.
(module_state::read_namespaces): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/hello-2_a.C: New test.
* g++.dg/modules/hello-2_b.C: New test.
* g++.dg/modules/namespace-6_a.H: New test.
* g++.dg/modules/namespace-6_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

libstdc++: Do not define lock-free atomic aliases if not fully lock-free [PR114103]

The whole point of these typedefs is to guarantee lock-freedom, so if
the target has no such types, we shouldn't define the typedefs at all.
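
User code that wants to cope with the aliases being absent can test the
feature-test macro, roughly as in this sketch (counter_type and bump
are hypothetical names; the fallback type is not guaranteed to be
lock-free):

```cpp
#include <atomic>
#include <cassert>

// Sketch: use the lock-free alias only when the library advertises it via
// the feature-test macro; otherwise fall back to std::atomic<int>, which
// is not guaranteed to be lock-free.
#if defined(__cpp_lib_atomic_lock_free_type_aliases)
using counter_type = std::atomic_signed_lock_free;
#else
using counter_type = std::atomic<int>;
#endif

// Increment the counter and return the new value.
int bump(counter_type &counter)
{
  return static_cast<int>(counter.fetch_add(1) + 1);
}
```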

libstdc++-v3/ChangeLog:

PR libstdc++/114103
* include/bits/version.def (atomic_lock_free_type_aliases): Add
extra_cond to check for at least one always-lock-free type.
* include/bits/version.h: Regenerate.
* include/std/atomic (atomic_signed_lock_free)
(atomic_unsigned_lock_free): Only use always-lock-free types.
* src/c++20/tzdb.cc (time_zone::_Impl::RulesCounter): Don't use
atomic counter if lock-free aliases aren't available.
* testsuite/29_atomics/atomic/lock_free_aliases.cc: XFAIL for
targets without lock-free word-size compare_exchange.

libstdc++: Update expiry times for leap seconds lists

The list in tzdb.cc isn't the only hardcoded list of leap seconds in the
library, there's the one defined inline in <chrono> (to avoid loading
the tzdb for the common case) and another in a testcase. This updates
them to note that there are no new leap seconds in 2024 either, until at
least 2024-12-28.

libstdc++-v3/ChangeLog:

* include/std/chrono (__get_leap_second_info): Update expiry
time for hardcoded list of leap seconds.
* testsuite/std/time/tzdb/leap_seconds.cc: Update comment.

libstdc++: Replace unnecessary uses of built-ins in testsuite

I don't see why we should rely on __builtin_memset etc. in tests. We can
just include <cstring> and use the public API.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/allocator/default_init.cc: Use
std::memset instead of __builtin_memset.
* testsuite/23_containers/forward_list/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/list/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/map/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/set/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/unordered_map/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/unordered_set/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/vector/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/vector/bool/allocator/default_init.cc:
Likewise.
* testsuite/29_atomics/atomic/compare_exchange_padding.cc:
Likewise.
* testsuite/util/atomic/wait_notify_util.h: Likewise.

libstdc++: Better diagnostics for std::format errors

This adds two new static_assert messages to the internals of
std::make_format_args to give better diagnostics for invalid format
args. Rather than just getting an error saying that basic_format_arg
cannot be constructed, we get more specific errors for the cases where
std::formatter isn't specialized for the type at all, and where it's
specialized but only meets the BasicFormatter requirements and so can
only format non-const arguments.

Also add a test for the existing static_assert when constructing a
format_string for non-formattable args.

libstdc++-v3/ChangeLog:

* include/std/format (_Arg_store::_S_make_elt): Add two
static_assert checks to give more user-friendly error messages.
* testsuite/lib/prune.exp (libstdc++-dg-prune): Prune another
form of "in requirements with" note.
* testsuite/std/format/arguments/args_neg.cc: Check for
user-friendly diagnostics for non-formattable types.
* testsuite/std/format/string_neg.cc: Likewise.

testsuite, darwin: improve check for -shared support

The undefined symbols are allowed for C checks, but when
this is run as C++, the mangled foo() symbol is still
seen as undefined, and the testsuite thinks darwin does not
support -shared.

gcc/testsuite/ChangeLog:

PR target/114233
* lib/target-supports.exp: Fix test for C++.

vect: Do not peel epilogue for partial vectors.

r14-7036-gcbf569486b2dec added an epilogue vectorization guard for early
break but PR114196 shows that we also run into the problem without early
break. Therefore merge the condition into the topmost vectorization
guard.

gcc/ChangeLog:

PR middle-end/114196

* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Merge
vectorization guards.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr114196.c: New test.
* gcc.target/riscv/rvv/autovec/pr114196.c: New test.

PR modula2/109969 Linking large project causes an ICE

This patch contains a re-write of M2LexBuf.mod which removes the linked
list of token buckets and simplifies the implementation using a dynamic
array. It contains more checking (for empty source files for example).
The patch also contains a fix for an ICE in gcc/m2/gm2-gcc/builtins.cc

gcc/m2/ChangeLog:

PR modula2/109969
* gm2-compiler/M2LexBuf.def (TokenToLineNo): Rename parameter.
(TokenToColumnNo): Rename parameter.
(TokenToLocation): Rename parameter.
(FindFileNameFromToken): Rename parameter.
(DumpTokens): Rewrite comment.
* gm2-compiler/M2LexBuf.mod: Rewrite.
* gm2-compiler/P0SyntaxCheck.bnf (CheckInsertCandidate):
DumpTokens before and after inserting recovery token.
* gm2-gcc/m2builtins.cc (do_target_support_exists): Add
bf_c99_compl case.
* gm2-libs/Indexing.def (InitIndexTuned): New procedure
function.
(IsEmpty): New procedure function.
* gm2-libs/Indexing.mod (InitIndexTuned): New procedure
function.
(IsEmpty): New procedure function.
(Index): New field GrowFactor.
(PutIndice): Use GrowFactor to extend dynamic array.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: ICE with variable template and [[deprecated]] [PR110031]

lookup_and_finish_template_variable already has and uses the complain
parameter but it is not passing it down to mark_used so we got the
default tf_warning_or_error, which causes various problems when
lookup_and_finish_template_variable gets called with complain=tf_none.

PR c++/110031

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Pass complain to
mark_used.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/inline-var11.C: New test.

doc: Fix docs for -dD regarding predefined macros

The manual has always claimed that -dD differs from -dM by not
outputting predefined macros, but that's untrue. It has been untrue
since GCC 3.0 (probably with the change to use libcpp as the default
preprocessor implementation).

gcc/ChangeLog:

* doc/cppopts.texi: Remove incorrect claim about -dD not
outputting predefined macros.

rs6000: Don't ICE when compiling the __builtin_vsx_splat_2di [PR113950]

When we expand the __builtin_vsx_splat_2di built-in, we were allowing an
immediate value for the second operand, which causes an unrecognizable insn
ICE. Even though the immediate value was forced into a register, it wasn't
correctly assigned to the second operand. So correct the assignment of op1
to operands[1].

2024-03-07 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/
PR target/113950
* config/rs6000/vsx.md (vsx_splat_<mode>): Correct assignment to operand1
and simplify else if with else.

gcc/testsuite/
PR target/113950
* gcc.target/powerpc/pr113950.c: New testcase.

Fix bogus error on allocator for array type with Dynamic_Predicate

This is a regression present on all active branches: the compiler gives
a bogus error on an allocator for an unconstrained array type declared
with a Dynamic_Predicate because Apply_Predicate_Check is invoked directly
on a subtype reference, which it cannot handle.

This moves the check to the resulting access value (after dereference) like
in Expand_Allocator_Expression.

gcc/ada/
PR ada/113979
* exp_ch4.adb (Expand_N_Allocator): In the subtype indication case,
call Apply_Predicate_Check on the resulting access value if needed.

gcc/testsuite/
* gnat.dg/predicate15.adb: New test.

Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

When building gcc's C++ sources against recent libc++, the poisoning of
the ctype macros due to including safe-ctype.h before including C++
standard headers such as <list>, <map>, etc, causes many compilation
errors, similar to:

  In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
  In file included from /home/dim/src/gcc/master/gcc/system.h:233:
  In file included from /usr/include/c++/v1/vector:321:
  In file included from
  /usr/include/c++/v1/__format/formatter_bool.h:20:
  In file included from
  /usr/include/c++/v1/__format/formatter_integral.h:32:
  In file included from /usr/include/c++/v1/locale:202:
  /usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
  only applies to structs, variables, functions, and namespaces
    546 |     _LIBCPP_INLINE_VISIBILITY
        |     ^
  /usr/include/c++/v1/__config:813:37: note: expanded from macro
  '_LIBCPP_INLINE_VISIBILITY'
    813 | #  define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
        |                                     ^
  /usr/include/c++/v1/__config:792:26: note: expanded from macro
  '_LIBCPP_HIDE_FROM_ABI'
    792 |
    __attribute__((__abi_tag__(_LIBCPP_TOSTRING(
  _LIBCPP_VERSIONED_IDENTIFIER))))
        |                          ^
  In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
  In file included from /home/dim/src/gcc/master/gcc/system.h:233:
  In file included from /usr/include/c++/v1/vector:321:
  In file included from
  /usr/include/c++/v1/__format/formatter_bool.h:20:
  In file included from
  /usr/include/c++/v1/__format/formatter_integral.h:32:
  In file included from /usr/include/c++/v1/locale:202:
  /usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
  declaration list
    547 |     char_type toupper(char_type __c) const
        |                                     ^
  /usr/include/c++/v1/__locale:553:48: error: too many arguments
  provided to function-like macro invocation
    553 |     const char_type* toupper(char_type* __low, const
    char_type* __high) const
        |                                                ^
  /home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
  macro 'toupper' defined here
    146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
        |         ^

This is because libc++ uses different transitive includes than
libstdc++, and some of those transitive includes pull in various ctype
declarations (typically via <locale>).

There was already a special case for including <string> before
safe-ctype.h, so move the rest of the C++ standard header includes to
the same location, to fix the problem.

gcc/ChangeLog:

* system.h: Include safe-ctype.h after C++ standard headers.

Signed-off-by: Dimitry Andric <dimitry@andric.com>

analyzer: Fix up some -Wformat* warnings

I'm seeing warnings like
../../gcc/analyzer/access-diagram.cc: In member function ‘void ana::bit_size_expr::print(pretty_printer*) const’:
../../gcc/analyzer/access-diagram.cc:399:26: warning: unknown conversion type character ‘E’ in format [-Wformat=]
  399 |         pp_printf (pp, _("%qE bytes"), bytes_expr);
      |                          ^~~~~~~~~~~
when building stage2/stage3 gcc.  While such warnings would be
understandable when building stage1 because one could e.g. have some
older host compiler which doesn't understand some of the format specifiers,
the above seems to be because we have in pretty-print.h
#ifdef GCC_DIAG_STYLE
#define GCC_PPDIAG_STYLE GCC_DIAG_STYLE
#else
#define GCC_PPDIAG_STYLE __gcc_diag__
#endif
and use GCC_PPDIAG_STYLE e.g. for pp_printf, and while
diagnostic-core.h has
#ifndef GCC_DIAG_STYLE
#define GCC_DIAG_STYLE __gcc_tdiag__
#endif
(and similarly various FE headers include their own GCC_DIAG_STYLE)
when including pretty-print.h before diagnostic-core.h we end up
with __gcc_diag__ style rather than __gcc_tdiag__ style, which I think
is the right thing for the analyzer, because analyzer seems to use
default_tree_printer everywhere:
grep pp_format_decoder.*=.default_tree_printer analyzer/* | wc -l
57

The following patch fixes that by making sure diagnostic-core.h is included
before pretty-print.h.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

* access-diagram.cc: Include diagnostic-core.h before including
diagnostic.h or diagnostic-path.h.
* sm-malloc.cc: Likewise.
* diagnostic-manager.cc: Likewise.
* call-summary.cc: Likewise.
* record-layout.cc: Likewise.

contrib: Update test_mklog to correspond to mklog

contrib/ChangeLog:

* test_mklog.py: "Moved to..." -> "Move to..."

Signed-off-by: Filip Kastl <fkastl@suse.cz>

c++: Fix ICE diagnosing incomplete type of overloaded function set [PR98356]

In the linked PR the result of 'get_first_fn' is a USING_DECL against
the template parameter, to be filled in on instantiation. But we don't
actually need to get the first set of the member functions: it's enough
to know that we have a (possibly overloaded) member function at all.

PR c++/98356

gcc/cp/ChangeLog:

* typeck2.cc (cxx_incomplete_type_diagnostic): Don't assume
'member' will be a FUNCTION_DECL (or something like it).

gcc/testsuite/ChangeLog:

* g++.dg/pr98356.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Stream DECL_CONTEXT for template template parms [PR98881]

When streaming in a nested template-template parameter as in the
attached testcase, we end up reaching the containing template-template
parameter in 'tpl_parms_fini'. We should not set the DECL_CONTEXT to
this (nested) template-template parameter, as it should already be the
struct that the outer template-template parameter is declared on.

The precise logic for what DECL_CONTEXT should be for a template
template parameter in various situations seems rather obscure. Rather
than trying to determine the assumptions that need to hold, it seems
simpler to just always re-stream the DECL_CONTEXT as needed for now.

PR c++/98881

gcc/cp/ChangeLog:

* module.cc (trees_out::tpl_parms_fini): Stream out DECL_CONTEXT
for template template parameters.
(trees_in::tpl_parms_fini): Read it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-tpl-parm-3.h: New test.
* g++.dg/modules/tpl-tpl-parm-3_a.H: New test.
* g++.dg/modules/tpl-tpl-parm-3_b.C: New test.
* g++.dg/modules/tpl-tpl-parm-3_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]

The following testcase ICEs, because fix_crossing_unconditional_branches
thinks that asm goto is an unconditional jump and removes it, replacing it
with unconditional jump to one of the labels.
This doesn't happen on x86 because the function in question isn't invoked
there at all:
  /* If the architecture does not have unconditional branches that
     can span all of memory, convert crossing unconditional branches
     into indirect jumps.  Since adding an indirect jump also adds
     a new register usage, update the register usage information as
     well.  */
  if (!HAS_LONG_UNCOND_BRANCH)
    fix_crossing_unconditional_branches ();
I think for the asm goto case, for the non-fallthru edge if any we should
handle it like any other fallthru (and fix_crossing_unconditional_branches
doesn't really deal with those, it only looks at explicit branches at the
end of bbs and we are in cfglayout mode at that point) and for the labels
we just pass the labels as immediates to the assembly and it is up to the
user to figure out how to store them/branch to them or whatever they want to
do.
So, the following patch fixes this by not treating asm goto as a simple
unconditional jump.

I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
somewhere else, where outofcfglayout or whatever should actually create
those indirect jumps on the crossing edges instead of adding normal
unconditional jumps, I see e.g. in
__attribute__((cold)) int bar (char *);
__attribute__((hot)) int baz (char *);
void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; l1: baz (""); }
void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: return; l1: bar (""); goto l2; }
with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
just b .L? jumps which I believe are +-32MB, so if .text is larger than
32MB, it could fail to link, but this patch doesn't address that.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/110079
* bb-reorder.cc (fix_crossing_unconditional_branches): Don't adjust
asm goto.

* gcc.dg/pr110079.c: New test.

expand: Fix UB in choose_mult_variant [PR105533]

As documented in the function comment, choose_mult_variant attempts to
compute costs of 3 different cases, val, -val and val - 1.
The -val case is actually only done if val fits into host int, so there
should be no overflow, but the val - 1 case is done unconditionally.
val is shwi (but inside of synth_mult already uhwi), so when val is
HOST_WIDE_INT_MIN, val - 1 invokes UB.  The following patch fixes that
by using val - HOST_WIDE_INT_1U, but I'm not really convinced it would
DTRT for > 64-bit modes, so I've guarded it as well.  Though, an arch
would need to have really strange costs for something that could be
expressed as x << 63 to be better expressed as (x * 0x7fffffffffffffff) + 1.
In the long term, I think we should just rewrite
choose_mult_variant/synth_mult etc. to work on wide_int.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/105533
* expmed.cc (choose_mult_variant): Only try the val - 1 variant
if val is not HOST_WIDE_INT_MIN or if mode has exactly
HOST_BITS_PER_WIDE_INT precision.  Avoid triggering UB while computing
val - 1.

* gcc.dg/pr105533.c: New test.

sccvn: Avoid UB in ao_ref_init_from_vn_reference [PR105533]

When compiling libgcc or on e.g.
int a[64];
int p;

void
foo (void)
{
  int s = 1;
  while (p)
    {
      s -= 11;
      a[s] != 0;
    }
}
sccvn invokes UB in the compiler as detected by ubsan:
../../gcc/poly-int.h:1089:5: runtime error: left shift of negative value -40
The problem is that we still use C++11..C++17 as the implementation language
and in those C++ versions shifting negative values left is UB (well defined
since C++20) and above in
           offset += op->off << LOG2_BITS_PER_UNIT;
op->off is poly_int64 with -40 value (in libgcc with -8).
I understand the offset_int << LOG2_BITS_PER_UNIT shifts but it is then well
defined during underlying implementation which is done on the uhwi limbs,
but for poly_int64 we use
                offset += pop->off * BITS_PER_UNIT;
a few lines earlier and I think that is both more readable in what it
actually does and triggers UB only if there would be signed multiply
overflow.  In the end, the compiler will treat them the same at least at the
RTL level (at least, if not and they aren't the same cost, it should).

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/105533
* tree-ssa-sccvn.cc (ao_ref_init_from_vn_reference) <case ARRAY_REF>:
Multiply op->off by BITS_PER_UNIT instead of shifting it left by
LOG2_BITS_PER_UNIT.

LoongArch: testsuite: Fix problems with incorrect results in vector test cases.

In simd_correctness_check.h, the role of the macro ASSERTEQ_64 is to check the
result of the passed vector values for the 64-bit data of each array element.
It turns out that it uses the abs() function, which checks only the lower 32
bits of the data at a time, so this patch replaces abs() with the llabs() function.

However, the following two problems may occur after modification:

1. FAIL in lasx-xvfrint_s.c and lsx-vfrint_s.c
Vector test cases that use __m{128,256} to define vector types are composed
of 32-bit primitive types, so they should use ASSERTEQ_32 instead of
ASSERTEQ_64 to check for correctness.

2. FAIL in lasx-xvshuf_b.c and lsx-vshuf.c
The expected results of the functions in the test cases were incorrect.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c: Replace
ASSERTEQ_64 with the macro ASSERTEQ_32.
* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Modify the expected
test results of some functions according to the function of the vector
instruction.
* gcc.target/loongarch/vector/lsx/lsx-vfrint_s.c: Same
modification as lasx-xvfrint_s.c.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Same
modification as lasx-xvshuf_b.c.
* gcc.target/loongarch/vector/simd_correctness_check.h: Use the llabs()
function instead of abs() to check the correctness of the results.

LoongArch: Use /lib instead of /lib64 as the library search path for MUSL.

gcc/ChangeLog:

* config.gcc: Add a case for loongarch*-*-linux-musl*.
* config/loongarch/linux.h: Disable the multilib-compatible
treatment for *musl* targets.
* config/loongarch/musl.h: New file.

match.pd: Optimize a * !a to 0 [PR114009]

The following patch attempts to fix an optimization regression through
adding a simple simplification.  We already have the
/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
(if (!canonicalize_math_p ())
(for cmp (tcc_comparison)
  (simplify
   (mult:c (convert (cmp@0 @1 @2)) @3)
   (if (INTEGRAL_TYPE_P (type)
        && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
     (cond @0 @3 { build_zero_cst (type); })))
optimization which otherwise triggers during the a * !a multiplication,
but that is done only late, and we aren't yet able to optimize it through
range assumptions anyway.

The patch adds a specific simplification for it.
If a is zero, then a * !a will be 0 * 1 (or for signed 1-bit 0 * -1)
and so 0.
If a is non-zero, then a * !a will be a * 0 and so again 0.
The pattern is valid for scalar integers, complex integers and vector types,
but I think will actually trigger only for the scalar integers.  For
vector types I've added other two with VEC_COND_EXPR in it, for complex
there are different GENERIC trees to match and it is something that likely
would be never matched in GIMPLE, so I didn't handle that.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/114009
* genmatch.cc (decision_tree::gen): Emit ARG_UNUSED for captures
argument even for GENERIC, not just for GIMPLE.
* match.pd (a * !a -> 0): New simplifications.

* gcc.dg/tree-ssa/pr114009.c: New test.

RISC-V: Refactor expand_vec_cmp [NFC]

There are two expand_vec_cmp functions.
They have same structure and similar code.
We can use default arguments instead of overloading.

Tested on RV32 and RV64.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_cmp): Change proto
* config/riscv/riscv-v.cc (expand_vec_cmp): Use default arguments
(expand_vec_cmp_float): Adapt arguments

Signed-off-by: demin.han <demin.han@starfivetech.com>

Fortran: Fix issue with using snprintf function.

The previous patch used snprintf to set the message
string. The message string is not a formatted string
and the snprintf will interpret '%' related characters
as format specifiers when there are no associated
output variables. A segfault ensues.

This change replaces snprintf with a fortran string copy
function and null terminates the message string.

PR libfortran/105456

libgfortran/ChangeLog:

* io/list_read.c (list_formatted_read_scalar): Use fstrcpy
from libgfortran/runtime/string.c to replace snprintf.
(nml_read_obj): Likewise.
* io/transfer.c (unformatted_read): Likewise.
(unformatted_write): Likewise.
(formatted_transfer_scalar_read): Likewise.
(formatted_transfer_scalar_write): Likewise.
* io/write.c (list_formatted_write_scalar): Likewise.
(nml_write_obj): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105456.f90: Revise using '%' characters
in users error message.

Daily bump.

i386: Fix and improve insn constraint for V2QI arithmetic/shift insns

optimize_function_for_size_p predicate is not stable during optab selection,
because it also depends on node->count/node->frequency of the current function,
which are updated during IPA, so they may change between early opts and
late opts.  Use optimize_size instead - optimize_size implies
optimize_function_for_size_p (cfun), so if a named pattern uses
"&& optimize_size" and the insn it splits into uses
optimize_function_for_size_p (cfun), it shouldn't fail.

PR target/114232

gcc/ChangeLog:

* config/i386/mmx.md (negv2qi2): Enable for optimize_size instead
of optimize_function_for_size_p.  Explicitly enable for TARGET_SSE2.
(negv2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(<plusminus:insn>v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.  Explicitly enable for TARGET_SSE2.
(<plusminus:insn>v2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(<any_shift:insn>v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.

RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads [PR114200].

Three-operand instructions like vmacc are modeled with an implicit
output reload when the output does not match one of the operands. For
this we use vmv.v.v which is subject to length masking.

In a situation where the current vl is less than the full vlenb
and the fma's result value is used as input for a vector reduction
(which is never length masked) we effectively only reduce vl
elements. The masked-out elements are relevant for the
reduction, though, leading to a wrong result.

This patch replaces the vmv reloads by full-register reloads.

gcc/ChangeLog:

PR target/114200
PR target/114202

* config/riscv/vector.md: Use vmv[1248]r.v instead of vmv.v.v.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr114200.c: New test.
* gcc.target/riscv/rvv/autovec/pr114202.c: New test.

RISC-V: Adjust vec unit-stride load/store costs.

Scalar loads provide offset addressing while unit-stride vector
instructions cannot.  The offset must be loaded into a general-purpose
register before it can be used.  In order to account for this, this
patch adds an address arithmetic heuristic that keeps track of data
reference operands.  If we haven't seen the operand before we add the
cost of a scalar statement.

This helps to get rid of an lbm regression when vectorizing (roughly
0.5% fewer dynamic instructions).  gcc5 improves by 0.2% and deepsjeng
by 0.25%.  wrf and nab degrade by 0.1%.  This is because we now
adjust the cost of SLP as well as loop-vectorized instructions, whereas
we would only adjust loop-vectorized instructions before.
Considering higher scalar_to_vec costs (3 vs 1) for all vectorization
types causes some snippets not to get vectorized anymore.  Given these
costs the decision looks correct but appears worse when just counting
dynamic instructions.

In total SPECint 2017 has 4 billion fewer dynamic instructions and SPECfp
0.7 billion fewer.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Move...
(costs::adjust_stmt_cost): ... to here and add vec_load/vec_store
offset handling.
(costs::add_stmt_cost): Also adjust cost for statements without
stmt_info.
* config/riscv/riscv-vector-costs.h: Define zero constant.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-2.c: New test.

ARM: Fix conditional execution [PR113915]

By default most patterns can be conditionalized on Arm targets.  However
Thumb-2 predication requires the "predicable" attribute be explicitly
set to "yes".  Most patterns are shared between Arm and Thumb(-2) and are
marked with "predicable".  Given this sharing, it does not make sense to
use a different default for Arm.  So only consider conditional execution
of instructions that have the predicable attribute set to yes.  This ensures
that patterns not explicitly marked as such are never conditionally executed.

gcc/ChangeLog:
PR target/113915
* config/arm/arm.md (NOCOND): Improve comment.
(arm_rev*): Add predicable.
* config/arm/arm.cc (arm_final_prescan_insn): Add check for
PREDICABLE_YES.

gcc/testsuite/ChangeLog:
PR target/113915
* gcc.target/arm/builtin-bswap-1.c: Fix test to allow conditional
execution both for Arm and Thumb-2.

Revert "Set num_threads to 50 on 32-bit hppa in two libgomp loop tests"

This reverts commit b14209715e659f6d3ca0f9eef9a4851e7bd6e373.

[PR target/113001] Fix incorrect operand swapping in conditional move

This bug totally fell off my radar.  Sorry about that.

We have some special casing in the conditional move expander to simplify a
conditional move when comparing a register against zero and that same register
is one of the arms.

Specifically a (eq (reg) (const_int 0)) where reg is also the true arm or (ne
(reg) (const_int 0)) where reg is the false arm need not use the fully
generalized conditional move, thus saving an instruction for those cases.

In the NE case we swapped the operands, but didn't swap the condition, which
led to the ICE due to an unrecognized pattern.  The backend actually has
distinct patterns for those two cases.  So swapping the operands is neither
needed nor advisable.

Regression tested on rv64gc and verified the new tests pass.

Pushing to the trunk.

PR target/113001
PR target/112871
gcc/
* config/riscv/riscv.cc (expand_conditional_move): Do not swap
operands when the comparison operand is the same as the false
arm for a NE test.

gcc/testsuite
* gcc.target/riscv/zicond-ice-3.c: New test.
* gcc.target/riscv/zicond-ice-4.c: New test.

Fortran: error recovery while simplifying expressions [PR103707,PR106987]

When an exception is encountered during simplification of arithmetic
expressions, the result may depend on whether range-checking is active
(-frange-check) or not.  However, the code path in the front-end should
stay the same for "soft" errors for which the exception is triggered by the
check, while "hard" errors should always terminate the simplification, so
that error recovery is independent of the flag.  Separation of arithmetic
error codes into "hard" and "soft" errors shall be done consistently via
is_hard_arith_error().

PR fortran/103707
PR fortran/106987

gcc/fortran/ChangeLog:

* arith.cc (is_hard_arith_error): New helper function to determine
whether an arithmetic error is "hard" or not.
(check_result): Use it.
(gfc_arith_divide): Set "Division by zero" only for regular
numerators of real and complex divisions.
(reduce_unary): Use is_hard_arith_error to determine whether a hard
or (recoverable) soft error was encountered.  Terminate immediately
on hard error, otherwise remember code of first soft error.
(reduce_binary_ac): Likewise.
(reduce_binary_ca): Likewise.
(reduce_binary_aa): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr99350.f90:
* gfortran.dg/arithmetic_overflow_3.f90: New test.

c++: ICE with noexcept and local specialization [PR114114]

Here we ICE because we call register_local_specialization while
local_specializations is null, so

  local_specializations->put ();

crashes on null this.  It's null since maybe_instantiate_noexcept calls
push_to_top_level which creates a new scope.  Normally, I would have
guessed that we need a new local_specialization_stack.  But here we're
dealing with an operand of a noexcept, which is an unevaluated operand,
and those aren't registered in the hash map.  maybe_instantiate_noexcept
wasn't signalling that it's substituting an unevaluated operand though.

PR c++/114114

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Save/restore
cp_unevaluated_operand, c_inhibit_evaluation_warnings, and
cp_noexcept_operand around the tsubst_expr call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept84.C: New test.

i386: Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move

Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move and
use generic code instead.

No functional changes.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_move) [TARGET_MACHO]:
Eliminate common code and use generic code instead.

amdgcn: additional gfx1030/gfx1100 support: adjust test cases

The "SDWA" changes in commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support" caused a few regressions:

    PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler zero_extendv64qiv64si2

    PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler zero_extendv64hiv64si2

    PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler zero_extendv64qiv64si2

    PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler zero_extendv64hiv64si2

Those test cases need corresponding adjustment.

gcc/testsuite/
* gcc.target/gcn/sram-ecc-3.c: Adjust.
* gcc.target/gcn/sram-ecc-4.c: Likewise.
* gcc.target/gcn/sram-ecc-7.c: Likewise.
* gcc.target/gcn/sram-ecc-8.c: Likewise.

AVR: Adjust rtx cost of plus + zero_extend.

gcc/
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS+ZERO_EXTEND]: Adjust
rtx cost.

tree-optimization/114239 - rework reduction epilogue driving

The following reworks vectorizable_live_operation to pass the
live stmt to vect_create_epilog_for_reduction also for early breaks
and a peeled main exit.  This is to be able to figure the scalar
definition to replace.  This reverts the PR114192 fix as it is
subsumed by this cleanup.

PR tree-optimization/114239
* tree-vect-loop.cc (vect_get_vect_def): Remove.
(vect_create_epilog_for_reduction): The passed in stmt_info
should now be the live stmt that produces the scalar reduction
result.  Revert PR114192 fix.  Base reduction info off
info_for_reduction.  Remove special handling of
early-break/peeled, restore original vector def gathering.
Make sure to pick the correct exit PHIs.
(vectorizable_live_operation): Pass in the proper stmt_info
for early break exits.

* gcc.dg/vect/vect-early-break_122-pr114239.c: New testcase.

LoongArch: testsuite: Rewrite {x,}vfcmp-{d,f}.c to avoid named registers

Loops on named vector registers are not vectorized (see comment 11 of
PR113622), so these test cases have been failing for a while.
Rewrite them using check-function-bodies to remove hard coding register
names. A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.

aarch64: Define out-of-class static constants

While reworking the aarch64 feature descriptions, I forgot
to add out-of-class definitions of some static constants.
This could lead to a build failure with some compilers.

This was seen with some WIP to increase the number of extensions
beyond 64. It's latent on trunk though, and a regression from
before the rework.

gcc/
* config/aarch64/aarch64-feature-deps.h (feature_deps::info): Add
out-of-class definitions of static constants.

c++: Fix template deduction for conversion operators with xobj parameters [PR113629]

Unification for conversion operators (DEDUCE_CONV) doesn't perform
transformations like handling forwarding references. This is correct in
general, but not for xobj parameters, which should be handled "normally"
for the purposes of deduction: [temp.deduct.conv] only applies to the
return type of the conversion function.

PR c++/113629

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Only use DEDUCE_CONV for the
return type of a conversion function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-conv-op.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

tree-optimization/114249 - ICE with BB reduction vectorization

When we scrap the last def of an odd lane numbered BB reduction
we can end up recording a pattern def which will later wreck
code generation. The following puts this logic where it better
belongs, avoiding this issue.

PR tree-optimization/114249
* tree-vect-slp.cc (vect_build_slp_instance): Move making
a BB reduction lane number even ...
(vect_slp_check_for_roots): ... here to avoid leaking
pattern defs.

* gcc.dg/vect/bb-slp-pr114249.c: New testcase.

tree-optimization/114246 - invalid call argument from DSE

The following makes sure to strip type conversions added by
build_fold_addr_expr before placing the result in a call argument.

PR tree-optimization/114246
* tree-ssa-dse.cc (increment_start_addr): Strip useless
type conversions from the adjusted address.

* gcc.dg/torture/pr114246.c: New testcase.

i386: Fix up the vzeroupper REG_DEAD/REG_UNUSED note workaround [PR114190]

When writing the rest_of_handle_insert_vzeroupper workaround to manually
remove all the REG_DEAD/REG_UNUSED notes from the IL, I've missed that
there is a df_analyze () call right after it and that the problems added
earlier in the pass, like df_note_add_problem () done during mode switching,
don't affect just the next df_analyze () call right after it, but all
other df_analyze () calls until the end of the current pass where
df_finish_pass removes the optional problems.

So, as can be seen on the following patch, the workaround doesn't actually
work there, because while rest_of_handle_insert_vzeroupper carefully removes
all REG_DEAD/REG_UNUSED notes, the df_analyze () call at the end of the
function immediately adds them in again (so, I must say I have no idea
why the workaround worked on the earlier testcases).

Now, I could move the df_analyze () call just before the REG_DEAD/REG_UNUSED
note removal loop, but I think the following patch is better, because
the df_analyze () call doesn't have to recompute the problem when we don't
care about it and will actively strip all traces of it away.

2024-03-06 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/114190
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Call df_remove_problem for df_note before calling df_analyze.

* gcc.target/i386/avx-pr114190.c: New test.

Fortran: Add user defined error messages for UDTIO.

The defines IOMSG_LEN and MSGLEN were redundant so these are combined
into IOMSG_LEN as defined in io.h.

The remainder of the patch adds checks for when a user defined
derived type IO procedure sets the IOSTAT or IOMSG variables
independent of the library defined I/O messages.

PR libfortran/105456

libgfortran/ChangeLog:

* io/io.h (IOMSG_LEN): Moved to here.
* io/list_read.c (MSGLEN): Removed MSGLEN.
(convert_integer): Changed MSGLEN to IOMSG_LEN.
(parse_repeat): Likewise.
(read_logical): Likewise.
(read_integer): Likewise.
(read_character): Likewise.
(parse_real): Likewise.
(read_complex): Likewise.
(read_real): Likewise.
(check_type): Likewise.
(list_formatted_read_scalar): Adjust to IOMSG_LEN.
(nml_read_obj): Add user defined error message.
* io/transfer.c (unformatted_read): Add user defined error
message.
(unformatted_write): Add user defined error message.
(formatted_transfer_scalar_read): Add user defined error message.
(formatted_transfer_scalar_write): Add user defined error message.
* io/write.c (list_formatted_write_scalar): Add user defined error message.
(nml_write_obj): Add user defined error message.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105456-nmlr.f90: New test.
* gfortran.dg/pr105456-nmlw.f90: New test.
* gfortran.dg/pr105456-ruf.f90: New test.
* gfortran.dg/pr105456-wf.f90: New test.
* gfortran.dg/pr105456-wuf.f90: New test.

c++/modules: befriending template from current class scope

Here the TEMPLATE_DECL representing the template friend declaration
naming B has class scope since the template B has class scope, but
get_merge_kind assumes all DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P
TEMPLATE_DECL have namespace scope and wrongly returns MK_named instead
of MK_local_friend for the friend.

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind) <case depset::EK_DECL>:
Accommodate class-scope DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P
TEMPLATE_DECL. Consolidate IDENTIFIER_ANON_P cases.

gcc/testsuite/ChangeLog:

* g++.dg/modules/friend-7.h: New test.
* g++.dg/modules/friend-7_a.H: New test.
* g++.dg/modules/friend-7_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

ctf: fix incorrect CTF for multi-dimensional array types

PR debug/114186

DWARF DIEs of type DW_TAG_subrange_type are linked together to represent
the information about the subsequent dimensions. The CTF processing was
so far working through them in the opposite (incorrect) order.

While fixing the issue, refactor the code a bit for readability.

Co-authored-by: Indu Bhagat <indu.bhagat@oracle.com>

gcc/
PR debug/114186
* dwarf2ctf.cc (gen_ctf_array_type): Invoke the ctf_add_array ()
in the correct order of the dimensions.
(gen_ctf_subrange_type): Refactor out handling of
DW_TAG_subrange_type DIE to here.

gcc/testsuite/
PR debug/114186
* gcc.dg/debug/ctf/ctf-array-6.c: Add test.

asan: Handle poly-int sizes in ASAN_MARK [PR97696]

This patch makes the expansion of IFN_ASAN_MARK let through
poly-int-sized objects. The expansion itself was already generic
enough, but the tests for the fast path were too strict.

gcc/
PR sanitizer/97696
* asan.cc (asan_expand_mark_ifn): Allow the length to be a poly_int.

gcc/testsuite/
PR sanitizer/97696
* gcc.target/aarch64/sve/pr97696.c: New test.

aarch64: Remove SME2.1 forms of LUTI2/4

I was over-eager when adding support for strided SME2 instructions
and accidentally included forms of LUTI2 and LUTI4 that are only
available with SME2.1, not SME2. This patch removes them for now.
We're planning to add proper support for SME2.1 in the GCC 15
timeframe.

Sorry for the blunder :(

gcc/
* config/aarch64/aarch64.md (stride_type): Remove luti_consecutive
and luti_strided.
* config/aarch64/aarch64-sme.md
(@aarch64_sme_lut<LUTI_BITS><mode>): Remove stride_type attribute.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided2): Delete.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
* config/aarch64/aarch64-early-ra.cc (is_stride_candidate)
(early_ra::maybe_convert_to_strided_access): Remove support for
strided LUTI2 and LUTI4.

gcc/testsuite/
* gcc.target/aarch64/sme/strided_1.c (test5): Remove.

arm: check for low register before applying peephole [PR113510]

For thumb1, when using a peephole to fuse

mov reg, #const
add reg, reg, SP

into

add reg, SP, #const

we must first check that reg is a low register, otherwise we will ICE
when trying to recognize the resulting insn.

gcc/ChangeLog:

PR target/113510
* config/arm/thumb1.md (peephole2 to fuse mov imm/add SP): Use
low_register_operand.

Fix testcase pr112337.c to check the options [PR112337]

gcc.target/arm/pr112337.c was failing to validate that adding MVE options
was compatible with the test environment, so add the missing checks.

gcc/testsuite/ChangeLog:

PR target/112337
* gcc.target/arm/pr112337.c: Check for, then use the right MVE
options.

AVR: Add two RTL peepholes.

Register alloc may expand a 3-operand arithmetic X = Y o CST as
   X = CST
   X o= Y
where it may be better to instead:
   X = Y
   X o= CST
because 1) the first insn may use MOVW for "X = Y", and 2) the
operation may be more efficient when performed with a constant,
for example when ADIW or SBIW can be used, or some bytes of
the constant are 0x00 or 0xff.
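
As a rough illustration only (plain C, not the RTL peephole itself), the two
orderings compute the same value for a commutative operation such as PLUS.
The function names and the constant 0x0100 are made up for this sketch; the
constant was picked because its low byte is 0x00.

```c
#include <stdint.h>

/* "X = CST; X o= Y": load the constant first, then combine with Y.  */
uint16_t add_const_first(uint16_t y)
{
    uint16_t x = 0x0100;
    x += y;
    return x;
}

/* "X = Y; X o= CST": copy Y first, then combine with the constant.
   The result is the same because addition is commutative; only the
   instruction sequence a backend could pick differs.  */
uint16_t add_reg_first(uint16_t y)
{
    uint16_t x = y;
    x += 0x0100;
    return x;
}
```

The same reasoning applies to IOR and AND, the other two operations named
in the ChangeLog entry.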

gcc/
* config/avr/avr.md: Add two RTL peepholes for PLUS, IOR and AND
in HI, PSI, SI that swap operation order from "X = CST, X o= Y"
to "X = Y, X o= CST".

Regenerate c.opt.urls

Fixes: 08edf85f747b ("c++/modules: relax diagnostic about GMF contents")
gcc/c-family/ChangeLog:

* c.opt.urls: Regenerate.

LoongArch: Allow s9 as a register alias

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/regname-fp-s9.c: New test.

AVR: Improve output of insn "*insv.any_shift.<mode>_split".

The instructions printed by insn "*insv.any_shift.<mode>_split" were
sub-optimal.  The code to print the improved output is lengthy and
performed by new function avr_out_insv.  As it turns out, the function
can also handle shift-offsets of zero, which covers "*andhi3", "*andpsi3"
and "*andsi3".  Thus, these three insns get a new 3-operand alternative
where the 3rd operand is an exact power of 2.

gcc/
* config/avr/avr-protos.h (avr_out_insv): New proto.
* config/avr/avr.cc (avr_out_insv): New function.
(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle case.
(avr_cbranch_cost) [ZERO_EXTRACT]: Adjust rtx costs.
* config/avr/avr.md (define_attr "adjust_len"): Add insv.
(andhi3, *andhi3, andpsi3, *andpsi3, andsi3, *andsi3):
Add constraint alternative where the 3rd operand is a power
of 2, and the source register may differ from the destination.
(*insv.any_shift.<mode>_split): Call avr_out_insv to output
instructions.  Set attr "length" to "insv".
* config/avr/constraints.md (Cb2, Cb3, Cb4): New constraints.

gcc/testsuite/
* gcc.target/avr/torture/insv-anyshift-hi.c: New test.
* gcc.target/avr/torture/insv-anyshift-si.c: New test.

tree-optimization/114231 - use patterns for BB SLP discovery root stmts

The following makes sure to use recognized patterns when vectorizing
roots during BB SLP discovery. We need to apply those late since
during root discovery we've not yet done pattern recognition.
All parts of the vectorizer assume patterns get used; for the testcase
we mix this up when doing live lane computation.

PR tree-optimization/114231
* tree-vect-slp.cc (vect_analyze_slp): Lookup patterns when
processing a BB SLP root.

* gcc.dg/vect/pr114231.c: New testcase.

lower-subreg: Fix ROTATE handling [PR114211]

On the following testcase, we have
(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
        (rotate:TI (reg/v:TI 106 [ h ])
            (const_int 64 [0x40]))) "pr114211.c":8:5 1042 {rotl64ti2_doubleword}
     (nil))
before subreg1 and the pass decides to use
(reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
register pair instead of (reg/v:TI 106 [ h ]).
resolve_operand_for_swap_move_operator implements it by pretending it is
an assignment from
(concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
to
(concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
The problem is that if the rotate argument is the same as destination or
if there is even an overlap between the first half of the destination with
second half of the source we emit incorrect code, because the store to
(reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
move.  The following patch detects that case and uses a temporary pseudo
to hold the original (reg:DI 128 [ h+8 ]) value across the first store.
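
The overlap problem can be sketched in plain C. This is only an analogy,
not the lower-subreg code: struct pair and both function names are
hypothetical, with lo and hi standing in for (reg:DI 127) and (reg:DI 128).

```c
#include <stdint.h>

/* Hypothetical stand-in for a double-word value held in two word-sized
   registers.  */
struct pair { uint64_t lo, hi; };

/* A rotate by one word swaps the halves.  Doing it as two plain moves
   is wrong when dst and src overlap: the store to hi destroys the value
   that the second move still needs.  */
void rotate_by_word_naive(struct pair *dst, const struct pair *src)
{
    dst->hi = src->lo;
    dst->lo = src->hi;  /* reads the already overwritten hi if dst == src */
}

/* Keeping the original hi in a temporary across the first store makes
   the swap correct even when dst and src are the same object.  */
void rotate_by_word_with_temp(struct pair *dst, const struct pair *src)
{
    uint64_t tmp = src->hi;
    dst->hi = src->lo;
    dst->lo = tmp;
}
```

Calling the naive version with dst == src leaves both halves holding the
old low word, which mirrors the wrong code described above.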

2024-03-05  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/114211
* lower-subreg.cc (resolve_simple_move): For double-word
rotates by BITS_PER_WORD if there is overlap between source
and destination use a temporary.

* gcc.dg/pr114211.c: New test.

bitint: Handle BIT_FIELD_REF lowering [PR114157]

The following patch adds support for BIT_FIELD_REF lowering with
large/huge _BitInt lhs. BIT_FIELD_REF requires mode argument first
operand, so the operand shouldn't be any huge _BitInt.
If we only access limbs from inside of BIT_FIELD_REF using constant
indexes, we can just create a new BIT_FIELD_REF to extract the limb,
but if we need to use variable index in a loop, I'm afraid we need
to spill it into memory, which is what the following patch does.
If there is some bitwise type for the extraction, it extracts just
what we need and not more than that, otherwise it spills the whole
first argument of BIT_FIELD_REF and uses MEM_REF with an offset
with VIEW_CONVERT_EXPR around it.

2024-03-05 Jakub Jelinek <jakub@redhat.com>

PR middle-end/114157
* gimple-lower-bitint.cc: Include stor-layout.h.
(mergeable_op): Return true for BIT_FIELD_REF.
(struct bitint_large_huge): Declare handle_bit_field_ref method.
(bitint_large_huge::handle_bit_field_ref): New method.
(bitint_large_huge::handle_stmt): Use it for BIT_FIELD_REF.

* gcc.dg/bitint-98.c: New test.
* gcc.target/i386/avx2-pr114157.c: New test.
* gcc.target/i386/avx512f-pr114157.c: New test.

i386: For noreturn functions save at least the bp register if it is used [PR114116]

As mentioned in the PR, on x86_64 currently a lot of ICEs end up
with crashes in the unwinder like:
during RTL pass: expand
pr114044-2.c: In function ‘foo’:
pr114044-2.c:5:3: internal compiler error: in expand_fn_using_insn, at internal-fn.cc:208
    5 |   __builtin_clzg (a);
      |   ^~~~~~~~~~~~~~~~~~
0x7d9246 expand_fn_using_insn
        ../../gcc/internal-fn.cc:208

pr114044-2.c:5:3: internal compiler error: Segmentation fault
0x1554262 crash_signal
        ../../gcc/toplev.cc:319
0x2b20320 x86_64_fallback_frame_state
        ./md-unwind-support.h:63
0x2b20320 uw_frame_state_for
        ../../../libgcc/unwind-dw2.c:1013
0x2b2165d _Unwind_Backtrace
        ../../../libgcc/unwind.inc:303
0x2acbd69 backtrace_full
        ../../libbacktrace/backtrace.c:127
0x2a32fa6 diagnostic_context::action_after_output(diagnostic_t)
        ../../gcc/diagnostic.cc:781
0x2a331bb diagnostic_action_after_output(diagnostic_context*, diagnostic_t)
        ../../gcc/diagnostic.h:1002
0x2a331bb diagnostic_context::report_diagnostic(diagnostic_info*)
        ../../gcc/diagnostic.cc:1633
0x2a33543 diagnostic_impl
        ../../gcc/diagnostic.cc:1767
0x2a33c26 internal_error(char const*, ...)
        ../../gcc/diagnostic.cc:2225
0xe232c8 fancy_abort(char const*, int, char const*)
        ../../gcc/diagnostic.cc:2336
0x7d9246 expand_fn_using_insn
        ../../gcc/internal-fn.cc:208
Segmentation fault (core dumped)

The problem is the PR38534 r14-8470 changes, which avoid saving call-saved
registers in noreturn functions.  If such functions ever touch the
bp register but because of the r14-8470 changes don't save it in the
prologue, the caller or any other function in the backtrace uses a frame
pointer and the noreturn function or anything it calls directly or
indirectly calls backtrace, then the unwinder crashes, because bp register
contains some unrelated value, but in the frames which do use frame pointer
CFA is based on the bp register.

In theory this could happen with any other call-saved register, e.g. code
written by hand in assembly with .cfi_* directives could use any other
call-saved register as the register in which to store the CFA or something
related to that, but in reality, at least for compiler generated code and
usual assembly, just making sure bp doesn't contain garbage is probably
enough for backtrace purposes.  In the debugger of course it will not be
enough, the values of the arguments etc. can be lost (if DW_CFA_undefined
is emitted) or garbage.

So, I think for noreturn functions we should at least save the bp register
if we use it.  If user asks for it using no_callee_saved_registers
attribute, let's honor what is asked for (but then it is up to the user
to make sure e.g. backtrace isn't called from the function or anything it
calls).  As discussed in the PR, whether to save bp or not shouldn't be
based on whether compiling with -g or -g0, because we don't want code
generation changes without/with debugging, it would also break
-fcompare-debug, and users can call backtrace(3), which doesn't use debug
info, just unwind info; even backtrace_symbols{,_fd}(3) don't use debug info
but just look at the dynamic symbol table.

The patch also adds a check for the no_caller_saved_registers
attribute in the implicit addition of not saving callee saved registers
in noreturn functions, because I think
__attribute__((no_caller_saved_registers, noreturn)) will otherwise
error that no_caller_saved_registers and no_callee_saved_registers
attributes are incompatible (but user didn't specify anything like that).

2024-03-05  Jakub Jelinek  <jakub@redhat.com>

PR target/114116
* config/i386/i386.h (enum call_saved_registers_type): Add
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP enumerator.
* config/i386/i386-options.cc (ix86_set_func_type): Remove
has_no_callee_saved_registers variable, add no_callee_saved_registers
instead, initialize it depending on whether it is
no_callee_saved_registers function or not.  Don't set it if
no_caller_saved_registers attribute is present.  Adjust users.
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Handle
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP like
TYPE_NO_CALLEE_SAVED_REGISTERS.
(ix86_save_reg): Handle TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP.

* gcc.target/i386/pr38534-1.c: Allow push/pop of bp.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr114097-1.c: Likewise.
* gcc.target/i386/stack-check-17.c: Expect no pop on ! ia32.

RISC-V: Cleanup unused code in riscv_v_adjust_bytesize [NFC]

Clean up mode_size related code which is no longer used. The tests below
pass with this patch.

* The RVV full regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_v_adjust_bytesize): Cleanup unused
mode_size related code.

Signed-off-by: Pan Li <pan2.li@intel.com>

c++/modules: relax diagnostic about GMF contents

Issuing a hard error when the GMF doesn't consist only of preprocessing
directives happens to be inconvenient for automated testcase reduction
via cvise. This patch relaxes this diagnostic into a pedwarn that can
be disabled with -Wno-global-module.

gcc/c-family/ChangeLog:

* c.opt (Wglobal-module): New warning.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_translation_unit): Relax GMF contents
error into a pedwarn.

gcc/ChangeLog:

* doc/invoke.texi (-Wno-global-module): Document.

gcc/testsuite/ChangeLog:

* g++.dg/modules/friend-6_a.C: Pass -Wno-global-module instead
of -Wno-pedantic. Remove now unnecessary preprocessing
directives from GMF.

Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

c++: Support exporting using-decls in same namespace as target

Currently a using-declaration bringing a name into its own namespace is
a no-op, except for functions. This prevents people from being able to
redeclare a name brought in from the GMF as exported, however, which
this patch fixes.

Apart from marking declarations as exported they are also now marked as
effectively being in the module purview (due to the using-decl) so that
they are properly processed, as 'add_binding_entity' assumes that
declarations not in the module purview cannot possibly be exported.

gcc/cp/ChangeLog:

* name-lookup.cc (walk_module_binding): Remove completed FIXME.
(do_nonmember_using_decl): Mark redeclared entities as exported
when needed. Check for re-exporting internal linkage types.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-12.C: New test.
* g++.dg/modules/using-13.h: New test.
* g++.dg/modules/using-13_a.C: New test.
* g++.dg/modules/using-13_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

PR modula2/114227 InstallTerminationProcedure does not work with -fiso

This patch moves the initial/termination user procedure functionality in
pim and iso versions of M2RTS into M2Dependent. This ensures that
finalization/initialization procedures will always be invoked for both -fiso
and -fpim. Prior to this patch M2Dependent called M2RTS for
termination procedure cleanup and always invoked the pim M2RTS.

gcc/m2/ChangeLog:

PR modula2/114227
* gm2-libs-iso/M2RTS.mod (ProcedureChain): Remove.
(ProcedureList): Remove.
(ExecuteReverse): Remove.
(ExecuteTerminationProcedures): Rewrite.
(ExecuteInitialProcedures): Rewrite.
(AppendProc): Remove.
(InstallTerminationProcedure): Rewrite.
(InstallInitialProcedure): Rewrite.
(InitProcList): Remove.
* gm2-libs/M2Dependent.def (InstallTerminationProcedure):
New procedure.
(ExecuteTerminationProcedures): New procedure.
(InstallInitialProcedure): New procedure.
(ExecuteInitialProcedures): New procedure.
* gm2-libs/M2Dependent.mod (ProcedureChain): New type.
(ProcedureList): New type.
(ExecuteReverse): New procedure.
(ExecuteTerminationProcedures): New procedure.
(ExecuteInitialProcedures): New procedure.
(AppendProc): New procedure.
(InstallTerminationProcedure): New procedure.
(InstallInitialProcedure): New procedure.
(InitProcList): New procedure.
* gm2-libs/M2RTS.mod (ProcedureChain): Remove.
(ProcedureList): Remove.
(ExecuteReverse): Remove.
(ExecuteTerminationProcedures): Rewrite.
(ExecuteInitialProcedures): Rewrite.
(AppendProc): Remove.
(InstallTerminationProcedure): Rewrite.
(InstallInitialProcedure): Rewrite.
(InitProcList): Remove.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

libstdc++: Add missing std::tuple constructor [PR114147]

I caused a regression with commit r10-908 by adding a constraint to the
non-explicit allocator-extended default constructor, but seemingly
forgot to add an explicit overload with the corresponding constraint.

libstdc++-v3/ChangeLog:

PR libstdc++/114147
* include/std/tuple (tuple::tuple(allocator_arg_t, const Alloc&)):
Add missing overload of allocator-extended default constructor.
(tuple<T1,T2>::tuple(allocator_arg_t, const Alloc&)): Likewise.
* testsuite/20_util/tuple/cons/114147.cc: New test.

bpf: add inline memset expansion

Similar to memmove and memcpy, the BPF backend cannot fall back on a
library call to implement __builtin_memset, and should always expand
calls to it inline if possible.

This patch implements simple inline expansion of memset in the BPF
backend in a verifier-friendly way. Similar to memcpy and memmove, the
size must be an integer constant, as is also required by clang.
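
A minimal sketch of the kind of call this is about, assuming a made-up
struct packet_meta and helper clear_meta (neither comes from the patch or
its test). The point is only that the size argument is an integer constant:

```c
#include <stddef.h>

/* Hypothetical fixed-size record; the layout is for illustration only.  */
struct packet_meta { unsigned char tag[16]; };

void clear_meta(struct packet_meta *m)
{
    /* sizeof yields an integer constant here, so a backend that only
       expands constant-size memset inline has what it needs.  A size
       known only at run time would not qualify.  */
    __builtin_memset(m->tag, 0, sizeof m->tag);
}
```

On a host compiler this behaves like an ordinary memset; the sketch says
nothing about the instruction sequence the BPF backend emits.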

gcc/
* config/bpf/bpf-protos.h (bpf_expand_setmem): New prototype.
* config/bpf/bpf.cc (bpf_expand_setmem): New.
* config/bpf/bpf.md (setmemdi): New define_expand.

gcc/testsuite/
* gcc.target/bpf/memset-1.c: New test.

Update gcc sv.po

* sv.po: Update.

combine: Fix recent WORD_REGISTER_OPERATIONS check [PR113010]

On Mon, Mar 04, 2024 at 05:18:39PM +0100, Rainer Orth wrote:
> unfortunately, the patch broke Solaris/SPARC bootstrap
> (sparc-sun-solaris2.11):
>
> .../gcc/combine.cc: In function 'rtx_code simplify_comparison(rtx_code, rtx_def**, rtx_def**)':
> .../gcc/combine.cc:12101:25: error: '*(unsigned int*)((char*)&inner_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))' may be used uninitialized [-Werror=maybe-uninitialized]
> 12101 |   scalar_int_mode mode, inner_mode, tmode;
>       |                         ^~~~~~~~~~

I don't see how it could ever work properly, inner_mode in that spot is
just uninitialized.

I think we shouldn't worry about paradoxical subregs of non-scalar_int_mode
REGs/MEMs and for the scalar_int_mode ones should initialize inner_mode
before we use it.
Another option would be to use
maybe_lt (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))), BITS_PER_WORD)
and
load_extend_op (GET_MODE (SUBREG_REG (op0))) == ZERO_EXTEND,
or set machine_mode smode = GET_MODE (SUBREG_REG (op0)); and use it in
those two spots.

2024-03-04  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/113010
* combine.cc (simplify_comparison): Guard the
WORD_REGISTER_OPERATIONS check on scalar_int_mode of SUBREG_REG
and initialize inner_mode.

arm: Fix a wrong attribute use and remove unused unspecs and iterators

This patch fixes the erroneous use of a mode attribute without a mode iterator
in the pattern and removes unused unspecs and iterators.

gcc/ChangeLog:

* config/arm/iterators.md (supf): Remove VMLALDAVXQ_U, VMLALDAVXQ_P_U,
VMLALDAVAXQ_U cases.
(VMLALDAVXQ): Remove iterator.
(VMLALDAVXQ_P): Likewise.
(VMLALDAVAXQ): Likewise.
* config/arm/mve.md (mve_vstrwq_p_fv4sf): Replace use of <MVE_VPRED>
mode iterator attribute with V4BI mode.
* config/arm/unspecs.md (VMLALDAVXQ_U, VMLALDAVXQ_P_U,
VMLALDAVAXQ_U): Remove unused unspecs.

arm: Annotate instructions with mve_safe_imp_xlane_pred

This patch annotates some MVE across lane instructions with a new attribute.
We use this attribute to let the compiler know that these instructions can be
safely implicitly predicated when tail predicating if their operands are
guaranteed to have zeroed tail predicated lanes. These instructions were
selected because having the value 0 in those lanes or 'tail-predicating' those
lanes have the same effect.

gcc/ChangeLog:

* config/arm/arm.md (mve_safe_imp_xlane_pred): New attribute.
* config/arm/iterators.md (mve_vmaxmin_safe_imp): New iterator
attribute.
* config/arm/mve.md (vaddvq_s, vaddvq_u, vaddlvq_s, vaddlvq_u,
vaddvaq_s, vaddvaq_u, vmaxavq_s, vmaxvq_u, vmladavq_s, vmladavq_u,
vmladavxq_s, vmlsdavq_s, vmlsdavxq_s, vaddlvaq_s, vaddlvaq_u,
vmlaldavq_u, vmlaldavq_s, vmlaldavq_u, vmlaldavxq_s, vmlsldavq_s,
vmlsldavxq_s, vrmlaldavhq_u, vrmlaldavhq_s, vrmlaldavhxq_s,
vrmlsldavhq_s, vrmlsldavhxq_s, vrmlaldavhaq_s, vrmlaldavhaq_u,
vrmlaldavhaxq_s, vrmlsldavhaq_s, vrmlsldavhaxq_s, vabavq_s, vabavq_u,
vmladavaq_u, vmladavaq_s, vmladavaxq_s, vmlsdavaq_s, vmlsdavaxq_s,
vmlaldavaq_s, vmlaldavaq_u, vmlaldavaxq_s, vmlsldavaq_s,
vmlsldavaxq_s): Added mve_safe_imp_xlane_pred.

arm: Add define_attr to create a mapping between MVE predicated and unpredicated insns

This patch adds an attribute to the mve md patterns to be able to identify
predicable MVE instructions and what their predicated and unpredicated variants
are. This attribute is used to encode the icode of the unpredicated variant of
an instruction in its predicated variant.

This will make it possible for us to transform VPT-predicated insns in
the insn chain into their unpredicated equivalents when transforming the loop
into a MVE Tail-Predicated Low Overhead Loop. For example:
`mve_vldrbq_z_<supf><mode> -> mve_vldrbq_<supf><mode>`.

gcc/ChangeLog:

* config/arm/arm.md (mve_unpredicated_insn): New attribute.
* config/arm/arm.h (MVE_VPT_PREDICATED_INSN_P): New define.
(MVE_VPT_UNPREDICATED_INSN_P): Likewise.
(MVE_VPT_PREDICABLE_INSN_P): Likewise.
* config/arm/vec-common.md (mve_vshlq_<supf><mode>): Add attribute.
* config/arm/mve.md (arm_vcx1q<a>_p_v16qi): Add attribute.
(arm_vcx1q<a>v16qi): Likewise.
(arm_vcx1qav16qi): Likewise.
(arm_vcx1qv16qi): Likewise.
(arm_vcx2q<a>_p_v16qi): Likewise.
(arm_vcx2q<a>v16qi): Likewise.
(arm_vcx2qav16qi): Likewise.
(arm_vcx2qv16qi): Likewise.
(arm_vcx3q<a>_p_v16qi): Likewise.
(arm_vcx3q<a>v16qi): Likewise.
(arm_vcx3qav16qi): Likewise.
(arm_vcx3qv16qi): Likewise.
(@mve_<mve_insn>q_<supf><mode>): Likewise.
(@mve_<mve_insn>q_int_<supf><mode>): Likewise.
(@mve_<mve_insn>q_<supf>v4si): Likewise.
(@mve_<mve_insn>q_n_<supf><mode>): Likewise.
(@mve_<mve_insn>q_r_<supf><mode>): Likewise.
(@mve_<mve_insn>q_f<mode>): Likewise.
(@mve_<mve_insn>q_m_<supf><mode>): Likewise.
(@mve_<mve_insn>q_m_n_<supf><mode>): Likewise.
(@mve_<mve_insn>q_m_r_<supf><mode>): Likewise.
(@mve_<mve_insn>q_m_f<mode>): Likewise.
(@mve_<mve_insn>q_int_m_<supf><mode>): Likewise.
(@mve_<mve_insn>q_p_<supf>v4si): Likewise.
(@mve_<mve_insn>q_p_<supf><mode>): Likewise.
(@mve_<mve_insn>q<mve_rot>_<supf><mode>): Likewise.
(@mve_<mve_insn>q<mve_rot>_f<mode>): Likewise.
(@mve_<mve_insn>q<mve_rot>_m_<supf><mode>): Likewise.
(@mve_<mve_insn>q<mve_rot>_m_f<mode>): Likewise.
(mve_v<absneg_str>q_f<mode>): Likewise.
(mve_<mve_addsubmul>q<mode>): Likewise.
(mve_<mve_addsubmul>q_f<mode>): Likewise.
(mve_vadciq_<supf>v4si): Likewise.
(mve_vadciq_m_<supf>v4si): Likewise.
(mve_vadcq_<supf>v4si): Likewise.
(mve_vadcq_m_<supf>v4si): Likewise.
(mve_vandq_<supf><mode>): Likewise.
(mve_vandq_f<mode>): Likewise.
(mve_vandq_m_<supf><mode>): Likewise.
(mve_vandq_m_f<mode>): Likewise.
(mve_vandq_s<mode>): Likewise.
(mve_vandq_u<mode>): Likewise.
(mve_vbicq_<supf><mode>): Likewise.
(mve_vbicq_f<mode>): Likewise.
(mve_vbicq_m_<supf><mode>): Likewise.
(mve_vbicq_m_f<mode>): Likewise.
(mve_vbicq_m_n_<supf><mode>): Likewise.
(mve_vbicq_n_<supf><mode>): Likewise.
(mve_vbicq_s<mode>): Likewise.
(mve_vbicq_u<mode>): Likewise.
(@mve_vclzq_s<mode>): Likewise.
(mve_vclzq_u<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op>q_<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op>q_n_<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op>q_f<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op>q_n_f<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op1>q_m_f<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op1>q_m_n_<supf><mode>): Likewise.
(@mve_vcmp_<mve_cmp_op1>q_m_<supf><mode>): Likewise.
(@mve_vcmp_<mve_cmp_op1>q_m_n_f<mode>): Likewise.
(mve_vctp<MVE_vctp>q<MVE_vpred>): Likewise.
(mve_vctp<MVE_vctp>q_m<MVE_vpred>): Likewise.
(mve_vcvtaq_<supf><mode>): Likewise.
(mve_vcvtaq_m_<supf><mode>): Likewise.
(mve_vcvtbq_f16_f32v8hf): Likewise.
(mve_vcvtbq_f32_f16v4sf): Likewise.
(mve_vcvtbq_m_f16_f32v8hf): Likewise.
(mve_vcvtbq_m_f32_f16v4sf): Likewise.
(mve_vcvtmq_<supf><mode>): Likewise.
(mve_vcvtmq_m_<supf><mode>): Likewise.
(mve_vcvtnq_<supf><mode>): Likewise.
(mve_vcvtnq_m_<supf><mode>): Likewise.
(mve_vcvtpq_<supf><mode>): Likewise.
(mve_vcvtpq_m_<supf><mode>): Likewise.
(mve_vcvtq_from_f_<supf><mode>): Likewise.
(mve_vcvtq_m_from_f_<supf><mode>): Likewise.
(mve_vcvtq_m_n_from_f_<supf><mode>): Likewise.
(mve_vcvtq_m_n_to_f_<supf><mode>): Likewise.
(mve_vcvtq_m_to_f_<supf><mode>): Likewise.
(mve_vcvtq_n_from_f_<supf><mode>): Likewise.
(mve_vcvtq_n_to_f_<supf><mode>): Likewise.
(mve_vcvtq_to_f_<supf><mode>): Likewise.
(mve_vcvttq_f16_f32v8hf): Likewise.
(mve_vcvttq_f32_f16v4sf): Likewise.
(mve_vcvttq_m_f16_f32v8hf): Likewise.
(mve_vcvttq_m_f32_f16v4sf): Likewise.
(mve_vdwdupq_m_wb_u<mode>_insn): Likewise.
(mve_vdwdupq_wb_u<mode>_insn): Likewise.
(mve_veorq_s<mode>): Likewise.
(mve_veorq_u<mode>): Likewise.
(mve_veorq_f<mode>): Likewise.
(mve_vidupq_m_wb_u<mode>_insn): Likewise.
(mve_vidupq_u<mode>_insn): Likewise.
(mve_viwdupq_m_wb_u<mode>_insn): Likewise.
(mve_viwdupq_wb_u<mode>_insn): Likewise.
(mve_vldrbq_<supf><mode>): Likewise.
(mve_vldrbq_gather_offset_<supf><mode>): Likewise.
(mve_vldrbq_gather_offset_z_<supf><mode>): Likewise.
(mve_vldrbq_z_<supf><mode>): Likewise.
(mve_vldrdq_gather_base_<supf>v2di): Likewise.
(mve_vldrdq_gather_base_wb_<supf>v2di_insn): Likewise.
(mve_vldrdq_gather_base_wb_z_<supf>v2di_insn): Likewise.
(mve_vldrdq_gather_base_z_<supf>v2di): Likewise.
(mve_vldrdq_gather_offset_<supf>v2di): Likewise.
(mve_vldrdq_gather_offset_z_<supf>v2di): Likewise.
(mve_vldrdq_gather_shifted_offset_<supf>v2di): Likewise.
(mve_vldrdq_gather_shifted_offset_z_<supf>v2di): Likewise.
(mve_vldrhq_<supf><mode>): Likewise.
(mve_vldrhq_fv8hf): Likewise.
(mve_vldrhq_gather_offset_<supf><mode>): Likewise.
(mve_vldrhq_gather_offset_fv8hf): Likewise.
(mve_vldrhq_gather_offset_z_<supf><mode>): Likewise.
(mve_vldrhq_gather_offset_z_fv8hf): Likewise.
(mve_vldrhq_gather_shifted_offset_<supf><mode>): Likewise.
(mve_vldrhq_gather_shifted_offset_fv8hf): Likewise.
(mve_vldrhq_gather_shifted_offset_z_<supf><mode>): Likewise.
(mve_vldrhq_gather_shifted_offset_z_fv8hf): Likewise.
(mve_vldrhq_z_<supf><mode>): Likewise.
(mve_vldrhq_z_fv8hf): Likewise.
(mve_vldrwq_<supf>v4si): Likewise.
(mve_vldrwq_fv4sf): Likewise.
(mve_vldrwq_gather_base_<supf>v4si): Likewise.
(mve_vldrwq_gather_base_fv4sf): Likewise.
(mve_vldrwq_gather_base_wb_<supf>v4si_insn): Likewise.
(mve_vldrwq_gather_base_wb_fv4sf_insn): Likewise.
(mve_vldrwq_gather_base_wb_z_<supf>v4si_insn): Likewise.
(mve_vldrwq_gather_base_wb_z_fv4sf_insn): Likewise.
(mve_vldrwq_gather_base_z_<supf>v4si): Likewise.
(mve_vldrwq_gather_base_z_fv4sf): Likewise.
(mve_vldrwq_gather_offset_<supf>v4si): Likewise.
(mve_vldrwq_gather_offset_fv4sf): Likewise.
(mve_vldrwq_gather_offset_z_<supf>v4si): Likewise.
(mve_vldrwq_gather_offset_z_fv4sf): Likewise.
(mve_vldrwq_gather_shifted_offset_<supf>v4si): Likewise.
(mve_vldrwq_gather_shifted_offset_fv4sf): Likewise.
(mve_vldrwq_gather_shifted_offset_z_<supf>v4si): Likewise.
(mve_vldrwq_gather_shifted_offset_z_fv4sf): Likewise.
(mve_vldrwq_z_<supf>v4si): Likewise.
(mve_vldrwq_z_fv4sf): Likewise.
(mve_vmvnq_s<mode>): Likewise.
(mve_vmvnq_u<mode>): Likewise.
(mve_vornq_<supf><mode>): Likewise.
(mve_vornq_f<mode>): Likewise.
(mve_vornq_m_<supf><mode>): Likewise.
(mve_vornq_m_f<mode>): Likewise.
(mve_vornq_s<mode>): Likewise.
(mve_vornq_u<mode>): Likewise.
(mve_vorrq_<supf><mode>): Likewise.
(mve_vorrq_f<mode>): Likewise.
(mve_vorrq_m_<supf><mode>): Likewise.
(mve_vorrq_m_f<mode>): Likewise.
(mve_vorrq_m_n_<supf><mode>): Likewise.
(mve_vorrq_n_<supf><mode>): Likewise.
(mve_vorrq_s<mode>): Likewise.
(mve_vorrq_s<mode>): Likewise.
(mve_vsbciq_<supf>v4si): Likewise.
(mve_vsbciq_m_<supf>v4si): Likewise.
(mve_vsbcq_<supf>v4si): Likewise.
(mve_vsbcq_m_<supf>v4si): Likewise.
(mve_vshlcq_<supf><mode>): Likewise.
(mve_vshlcq_m_<supf><mode>): Likewise.
(mve_vshrq_m_n_<supf><mode>): Likewise.
(mve_vshrq_n_<supf><mode>): Likewise.
(mve_vstrbq_<supf><mode>): Likewise.
(mve_vstrbq_p_<supf><mode>): Likewise.
(mve_vstrbq_scatter_offset_<supf><mode>_insn): Likewise.
(mve_vstrbq_scatter_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrdq_scatter_base_<supf>v2di): Likewise.
(mve_vstrdq_scatter_base_p_<supf>v2di): Likewise.
(mve_vstrdq_scatter_base_wb_<supf>v2di): Likewise.
(mve_vstrdq_scatter_base_wb_p_<supf>v2di): Likewise.
(mve_vstrdq_scatter_offset_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_offset_p_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn): Likewise.
(mve_vstrhq_<supf><mode>): Likewise.
(mve_vstrhq_fv8hf): Likewise.
(mve_vstrhq_p_<supf><mode>): Likewise.
(mve_vstrhq_p_fv8hf): Likewise.
(mve_vstrhq_scatter_offset_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_offset_p_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Likewise.
(mve_vstrwq_<supf>v4si): Likewise.
(mve_vstrwq_fv4sf): Likewise.
(mve_vstrwq_p_<supf>v4si): Likewise.
(mve_vstrwq_p_fv4sf): Likewise.
(mve_vstrwq_scatter_base_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_fv4sf): Likewise.
(mve_vstrwq_scatter_base_p_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_p_fv4sf): Likewise.
(mve_vstrwq_scatter_base_wb_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_wb_fv4sf): Likewise.
(mve_vstrwq_scatter_base_wb_p_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Likewise.
(mve_vstrwq_scatter_offset_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_offset_p_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_offset_p_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Likewise.

doc: update [[gnu::no_dangling]]

...to offer a more realistic example.

gcc/ChangeLog:

* doc/extend.texi: Update [[gnu::no_dangling]].

vect: Fix integer overflow calculating mask

The masks and bitvectors were broken when nunits==32 on hosts where int is
32-bit.
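
For illustration, a sketch of the general pitfall rather than the code in
dojump.cc or expr.cc: building an all-ones mask with (1 << nunits) - 1
shifts a plain int, which is undefined for nunits == 32 when int is 32 bits
wide. The helper name low_bits_mask is hypothetical.

```c
#include <stdint.h>

/* Return a mask with the low NUNITS bits set.  The shift is done on a
   64-bit unsigned operand so that nunits == 32 is well defined even
   when int is only 32 bits wide.  */
uint64_t low_bits_mask(unsigned nunits)
{
    if (nunits >= 64)
        return UINT64_MAX;  /* a shift by the full width is undefined too */
    return (UINT64_C(1) << nunits) - 1;
}
```

With this helper, low_bits_mask (32) is 0xffffffff, which is the value the
int-based expression cannot be relied on to produce.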

gcc/ChangeLog:

* dojump.cc (do_compare_and_jump): Use full-width integers for shifts.
* expr.cc (store_constructor): Likewise.
(do_store_flag): Likewise.

Regenerate opt.urls

There were several commits that didn't regenerate the opt.urls files.

Fixes: 438ef143679e ("rs6000: Neuter option -mpower{8,9}-vector")
Fixes: 50c549ef3db6 ("gccrs: enable -Winfinite-recursion warnings by default")
Fixes: 25bb8a40abd9 ("Move docs for -Wuse-after-free and -Wuseless-cast")
Fixes: 48448055fb70 ("AVR: Support .rodata in Flash for AVR64* and AVR128*")
Fixes: 42503cc257fb ("AVR: Document option -mskip-bug")
Fixes: 7de5bb642c12 ("i386: [APX] Document inline asm behavior and new switch")
Fixes: 49a14ee488b8 ("Add -mevex512 into invoke.texi")
Fixes: 4666cbde5e6d ("Sort warning options in c-family/c.opt.")
Fixes: cda383616183 ("AVR: target/114100 - Better indirect accesses for reduced Tiny")
gcc/c-family/ChangeLog:

* c.opt.urls: Regenerate.

gcc/ChangeLog:

* common.opt.urls: Regenerate.
* config/avr/avr.opt.urls: Likewise.
* config/i386/i386.opt.urls: Likewise.
* config/pru/pru.opt.urls: Likewise.
* config/riscv/riscv.opt.urls: Likewise.
* config/rs6000/rs6000.opt.urls: Likewise.

gcc/rust/ChangeLog:

* lang.opt.urls: Regenerate.

Fix 20101011-1.c on H8

Excerpt from gcc.sum:
[...]
PASS: gcc.c-torture/execute/20101011-1.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/execute/20101011-1.c   -O0  execution test
PASS: gcc.c-torture/execute/20101011-1.c   -O1  (test for excess errors)
FAIL: gcc.c-torture/execute/20101011-1.c   -O1  execution test
[ ... ]

This is because H8 MCUs do not throw a "divide by zero" exception.

gcc/testsuite
* gcc.c-torture/execute/20101011-1.c: Do not test on H8 series.

tree-optimization/114197 - unexpected if-conversion for vectorization

The following avoids lowering a volatile bitfield access and, in case
the if-converted and original loops end up in different outer loops
because of simplifications enabled, scraps the result since that is not
how the vectorizer expects the loops to be laid out.

PR tree-optimization/114197
* tree-if-conv.cc (bitfields_to_lower_p): Do not lower if
there are volatile bitfield accesses.
(pass_if_conversion::execute): Throw away result if the
if-converted and original loops are not nested as expected.

* gcc.dg/torture/pr114197.c: New testcase.

tree-optimization/114164 - unsupported SIMD clone call, unsupported VEC_COND

The following avoids creating unsupported VEC_COND_EXPRs as part of
SIMD clone call mask argument setup during vectorization which results
in inefficient decomposing of the operation during vector lowering.

PR tree-optimization/114164
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Fail if
the code generated for mask argument setup is not supported.

libgomp: Use void (*) (void *) rather than void (*)() for host_fn type [PR114216]

For the type of the target callbacks we use elsewhere void (*) (void *) and
IMHO should use that for the reverse offload fallback as well (where the actual
callback is emitted using the same code as for host fallback or device kernel
entry routines), even though it is also ok to use void (*) () before C23 and
we aren't building libgomp with C23 yet. On some arches void (*) () could
perhaps result in worse code generation, because calls through an unprototyped
function type sometimes need to pass the argument in two different spots,
so that it works both when passed through ... and as a named argument.
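
A small sketch of the prototyped form, with hypothetical names (host_fn_t,
add_one and run_fallback are not libgomp identifiers):

```c
/* Prototyped callback type: exactly one void * argument.  */
typedef void (*host_fn_t) (void *);

/* Hypothetical callback that bumps the int it is handed.  */
void add_one(void *data)
{
    *(int *) data += 1;
}

/* The caller sees a full prototype, so the argument is passed as an
   ordinary named argument and is type checked at the call site.  */
void run_fallback(host_fn_t fn, void *data)
{
    fn (data);
}
```

With void (*) () before C23 the same call would compile, but the compiler
would know nothing about the callee's parameters at the call site.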

2024-03-04 Jakub Jelinek <jakub@redhat.com>

PR libgomp/114216
* target.c (gomp_target_rev): Change host_fn type and corresponding
cast from void (*)() to void (*) (void *).