gcc.gnu.org Git - gcc.git/log

c++: print conversion error at candidate location

In testcases like this one, the printing of candidates in a diagnostic has
been longer than necessary because it jumps back and forth between the call
site and the candidate site. So here, we first say at the call site that no
match was found; then we note the candidate site, and then explain why it's
not suitable back at the call site, which means printing the call site line
with caret again. With this patch, the conversion diagnostic is at the same
location as the candidate, so we don't need to print any input line.

gcc/cp/ChangeLog:

* call.cc (print_conversion_rejection): Use iloc_sentinel.

gcc/testsuite/ChangeLog:

* g++.dg/template/copy1.C: Adjust error lines.

RISC-V: Add required tls to read thread pointer test

The read-thread-pointer test may require the gcc configured
with --enable-tls. If no, there x4 (aka tp) register will not
be presented in the assembly code.

This patch requires the tls for the dg checking. It will perform
the test checking if --enable-tls and mark the test as unsupported
if --disable-tls.

Configured with --enable-tls:
=== gcc Summary ===
of expected passes 16

Configured with --disable-tls:
=== gcc Summary ===
of unsupported tests 8

gcc/testsuite/ChangeLog:

* gcc.target/riscv/read-thread-pointer.c: Add required tls.

Signed-off-by: Pan Li <pan2.li@intel.com>

PHIOPT: Allow MIN/MAX to have up to 2 MIN/MAX expressions for early phiopt

In the early PHIOPT mode, the original minmax_replacement, would
replace a PHI node with up to 2 min/max expressions in some cases,
this allows for that too.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (phiopt_early_allow): Allow for
up to 2 min/max expressions in the sequence/match code.

MIN/MAX should be treated similar as comparisons for trapping

While looking into moving optimizations from minmax_replacement
in phiopt to match.pd, I Noticed that min/max were considered
trapping even if -ffinite-math-only was being used. This changes
those expressions to be similar as comparisons so that they are
not considered trapping if -ffinite-math-only is on.

OK? Bootstrapped and tested with no regressions on x86_64-linux-gnu.

gcc/ChangeLog:

* rtlanal.cc (may_trap_p_1): Treat SMIN/SMAX similar as
COMPARISON.
* tree-eh.cc (operation_could_trap_helper_p): Treate
MIN_EXPR/MAX_EXPR similar as other comparisons.

PHIOPT: Move store_elim_worker into pass_cselim::execute

This simple patch moves the body of store_elim_worker
direclty into pass_cselim::execute.

Also removes unneeded prototypes too.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (cond_store_replacement): Remove
prototype.
(cond_if_else_store_replacement): Likewise.
(get_non_trapping): Likewise.
(store_elim_worker): Move into ...
(pass_cselim::execute): This.

PHIOPT: Rename tree_ssa_phiopt_worker to pass_phiopt::execute

Now that store elimination and phiopt does not
share outer code, we can move tree_ssa_phiopt_worker
directly into pass_phiopt::execute and remove
many declarations (prototypes) from the file.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (two_value_replacement): Remove
prototype.
(match_simplify_replacement): Likewise.
(factor_out_conditional_conversion): Likewise.
(value_replacement): Likewise.
(minmax_replacement): Likewise.
(spaceship_replacement): Likewise.
(cond_removal_in_builtin_zero_pattern): Likewise.
(hoist_adjacent_loads): Likewise.
(tree_ssa_phiopt_worker): Move into ...
(pass_phiopt::execute): this.

PHIOPT: Split out store elimination from phiopt

Since the last cleanups, it made easier to see
that we should split out the store elimination
worker from tree_ssa_phiopt_worker function.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove
do_store_elim argument and split that part out to ...
(store_elim_worker): This new function.
(pass_cselim::execute): Call store_elim_worker.
(pass_phiopt::execute): Update call to tree_ssa_phiopt_worker.

MAINTAINERS: Change my email address.

I'm at Ventana now. Change my email address accordingly. Also, add
myself to the DCO list.

ChangeLog:

* MAINTAINERS: Change my email address.

Unloop loops that no longer loops in tree-ssa-loop-ch

I noticed this after adding sanity check that the upper bound on number
of iterations never drop to -1. It seems to be relatively common case
(happening few hundred times in testsuite and also during bootstrap)
that loop-ch duplicates enough so the loop itself no longer loops.

This is later detected in loop unrolling but since we test the number
of iterations anyway it seems better to do that earlier.

* cfgloopmanip.h (unloop_loops): Export.
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Unloop loops
that no longer loop.
* tree-ssa-loop-ivcanon.cc (unloop_loops): Export; do not free
vectors of loops to unloop.
(canonicalize_induction_variables): Free vectors here.
(tree_unroll_loops_completely): Free vectors here.

tree-optimization/109170 - bogus use-after-free with __builtin_expect

The following generalizes the range-op for __builtin_expect
by using the fnspec machinery.

PR tree-optimization/109170
* gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call):
Handle __builtin_expect and similar via cfn_pass_through_arg1
and inspecting the calls fnspec.
* builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
and BUILT_IN_EXPECT_WITH_PROBABILITY.

Use CONFIG_SHELL-/bin/sh in genmultilib

There are still shells on some systems that lack the ability to start
scripts when not using the shell name explicitly. Adjust genmultilib
to use ${CONFIG_SHELL-/bin/sh} the same way configure does.

for gcc/ChangeLog

* genmultilib: Use CONFIG_SHELL to run sub-scripts.

Normalize addresses in IPA before calling range_op_handler [PR109639]

The old legacy code would allow building ranges containing symbolics,
even though the entire ranger ecosystem does not handle them.  These
were normalized into non-zero ranges by helper functions in VRP
(range_fold_*_expr) before calling the ranger.  The only users of
these functions should have been legacy VRP, which is no more.
However, a handful of users crept into IPA, even though these
functions shouldn't never been called outside of VRP or vr-values.

The issue here is that IPA is building a range of [&foo, &foo] and
expecting range_fold_binary to normalize it to non-zero.  Fixed by
adding a helper function before calling the range_op handler.

I think these covers the problematic ranges.  If not, I'll come up
with something more generalized that does not involve polluting
irange::set with the normalization code.  After all, this only
involves a handful of IPA places.

I've also added an assert in irange::set() making it easier to detect
any possible fallout without having to drill deep into the setter.

gcc/ChangeLog:

PR tree-optimization/109639
* ipa-cp.cc (ipa_value_range_from_jfunc): Normalize range.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
* ipa-prop.h (ipa_range_set_and_normalize): New.
* value-range.cc (irange::set): Assert min and max are INTEGER_CST.

wrong GIMPLE from (bit_field_ref CTOR ..) simplification

When we simplify a BIT_FIELD_REF of a CTOR like { _1, _2, _3, _4 }
and attempt to produce (view converted) { _1, _2 } for a selected
subset we fail to realize this cannot be done from match.pd since
we have no way to write the resulting CTOR "operation" and the
built CTOR { _1, _2 } isn't a GIMPLE value.

This kind of simplifications have to be done in forwprop (or would
need a match.pd syntax extension) where we can split out the CTOR
to a separate stmt.

The following disables this particular simplification when we are
simplifying GIMPLE. With enhanced IL checking this otherwise
causes ICEs in the testsuite from vectorized code.

* match.pd (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2): Do not
create a CTOR operand in the result when simplifying GIMPLE.

Properly gimplify handled component chains on registers

When for example complex lowering wants to extract the imaginary
part of a complex variable for lowering a complex move we can
end up with it generating __imag <VIEW_CONVERT_EXPR <_22> > which
is valid GENERIC. It then feeds that to the gimplifier via
force_gimple_operand but that fails to split up this chain
of handled components, generating invalid GIMPLE catched by
verification when PR109644 is fixed.

The following rectifies this by noting in gimplify_compound_lval
when the base object which we gimplify first ends up being a
register.

* gimplify.cc (gimplify_compound_lval): When the base
gimplified to a register make sure to split up chains
of operations.

ipa/109607 - properly gimplify conversions introduced by IPA param manipulation

The following addresses IPA param manipulation (through IPA SRA)
replacing

  BIT_FIELD_REF <*this_8(D), 8, 56>

with

  BIT_FIELD_REF <VIEW_CONVERT_EXPR<const struct profile_count>(ISRA.814), 8, 56>

which is supposed to be invalid GIMPLE (ISRA.814 is a register).
There's currently insufficient checking in place to catch this in the
IL verifier but I am working on that as part of fixing PR109594.

The solution for the particular testcase I am running into this is
to split the conversion to a separate stmt.  Generally the modification
phase is set up for this but the extra_stmts sequence isn't passed
around everywhere.  The following passes it to modify_expression
from modify_assignment when rewriting the RHS.

PR ipa/109607
* ipa-param-manipulation.h
(ipa_param_body_adjustments::modify_expression): Add extra_stmts
argument.
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::modify_expression): Likewise.
When we need a conversion and the replacement is a register
split the conversion out.
(ipa_param_body_adjustments::modify_assignment): Pass
extra_stmts to RHS modify_expression.

* g++.dg/torture/pr109607.C: New testcase.

libstdc++: Fix typos in doxygen comments

libstdc++-v3/ChangeLog:

* include/bits/mofunc_impl.h: Fix typo in doxygen comment.
* include/std/format: Likewise.

libstdc++: Remove obsolete options from Doxygen config

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (FORMULA_TRANSPARENT, DOT_FONTNAME)
(DOT_FONTSIZE, DOT_TRANSPARENT): Remove obsolete options.

libstdc++: Reduce Doxygen output for PDF

Including the header source code in the doxygen-generated PDF file makes
it too large, and causes pdflatex to run out of memory. If we only set
SOURCE_BROWSER=YES for the HTML docs then we won't include the sources
in the PDF file.

There are several macros defined for std::valarray that are only used to
generate repetitive code and then #undef'd. Those aren't useful in the
doxygen docs, especially the ones that reuse the same name in different
files. Omitting them avoids warnings about duplicate labels in the
refman.tex file.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (SOURCE_BROWSER): Only set to YES for
HTML docs.
* include/bits/gslice_array.h (_DEFINE_VALARRAY_OPERATOR): Omit
from doxygen docs.
* include/bits/indirect_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/mask_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/slice_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/std/valarray (_DEFINE_VALARRAY_UNARY_OPERATOR)
(_DEFINE_VALARRAY_AUGMENTED_ASSIGNMENT)
(_DEFINE_VALARRAY_EXPR_AUGMENTED_ASSIGNMENT)
(_DEFINE_BINARY_OPERATOR): Likewise.

libstdc++: Improve doxygen docs for <memory_resource>

libstdc++-v3/ChangeLog:

* include/bits/memory_resource.h: Improve doxygen comments.
* include/std/memory_resource: Likewise.

libstdc++: Add @headerfile and @since to doxygen comments [PR40380]

libstdc++-v3/ChangeLog:

PR libstdc++/40380
* include/bits/basic_string.h: Improve doxygen comments.
* include/bits/cow_string.h: Likewise.
* include/bits/forward_list.h: Likewise.
* include/bits/fs_dir.h: Likewise.
* include/bits/fs_path.h: Likewise.
* include/bits/quoted_string.h: Likewise.
* include/bits/stl_bvector.h: Likewise.
* include/bits/stl_map.h: Likewise.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_vector.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/std/filesystem: Likewise.
* include/std/iomanip: Likewise.

libstdc++: Make std::random_device throw std::system_error [PR105081]

This changes std::random_device constructors to throw std::system_error
(with EINVAL as the error code) when the constructor argument is
invalid. We can also throw std::system_error when read(2) fails so that
the exception includes the additional information provided by errno.

As noted in the PR, this is consistent with libc++, and doesn't break
any existing code which catches std::runtime_error, because those
handlers will still catch std::system_error.

libstdc++-v3/ChangeLog:

PR libstdc++/105081
* src/c++11/random.cc (__throw_syserr): New function.
(random_device::_M_init, random_device::_M_init_pretr1): Use new
function for bad tokens.
(random_device::_M_getval): Use new function for read errors.
* testsuite/util/testsuite_random.h (random_device_available):
Change catch handler to use std::system_error.

c: Fix up error-recovery on non-empty VLA initializers [PR109409]

On the following testcase we ICE, because after we emit the
variable-sized object may not be initialized except with an empty initializer
error we don't really reset the initializer to error_mark_node and then at
-Wformat checking time we ICE on seeing STRING_CST initializer for a VLA.

The following patch just arranges for error_mark_node to be returned after
the error diagnostics.

2023-04-27 Jakub Jelinek <jakub@redhat.com>

PR c/109409
* c-parser.cc (c_parser_initializer): Move diagnostics about
initialization of variable sized object with non-empty initializer
after c_parser_expr_no_commas call and ret.set_error (); after it.

* gcc.dg/pr109409.c: New test.

c: Fix up error-recovery on functions initialized as variables [PR109412]

The change to allow empty initializers in C broke error-recovery on the
following testcase.  We are emitting function %qD is initialized like a
variable error early; if the initializer is non-empty, we just emit
another error that the initializer is invalid.  Previously if it was empty,
we'd emit another error that scalar is being initialized by empty
initializer (not really correct), but now we instead just try to
build_zero_cst for the FUNCTION_TYPE and ICE on it.

The following patch just emits the same diagnostics for the empty
initializers as we emit for the non-empty ones.

2023-04-27  Jakub Jelinek  <jakub@redhat.com>

PR c/107682
PR c/109412
* c-typeck.cc (pop_init_level): If constructor_type is FUNCTION_TYPE,
reject empty initializer as invalid.

* gcc.dg/pr109412.c: New test.

doc: Add explanation of zero-length array example

gcc/ChangeLog:

* doc/extend.texi (Zero Length): Describe example.

tree-optimization/109594 - wrong register promotion

We fail to verify the constraints under which we allow handled
components to wrap registers.  The gcc.dg/pr70022.c testcase shows
that we happily end up with

  _2 = VIEW_CONVERT_EXPR<int[4]>(v_1(D))

as produced by SSA rewrite and update_address_taken.  But the intent
was that we wrap registers with at most a single level of handled
components and specifically only allow __real, __imag, BIT_FIELD_REF
and VIEW_CONVERT_EXPR on them, but not ARRAY_REF or COMPONENT_REF.

Together with the improved gimple_load predicate taking advantage
of the above and ASAN this eventually ICEd.

The following fixes update_address_taken as to this constraint.

PR tree-optimization/109594
* tree-ssa.cc (non_rewritable_mem_ref_base): Constrain
what we rewrite to a register based on the above.

testsuite: adjust NOP expectations for RISC-V

RISC-V will emit ".option nopic" when -fno-pie is in effect, which
matches the generic pattern. Just like done for Alpha, special-case
RISC-V.

gcc/testsuite/

* c-c++-common/patchable_function_entry-decl.c: Special-case
RISC-V.
* c-c++-common/patchable_function_entry-default.c: Likewise.
* c-c++-common/patchable_function_entry-definition.c: Likewise.

c++: restore instantiate_decl assert

For PR61445 I removed this assert, but PR108242 demonstrated why it's still
useful; to avoid regressing the former testcase I check pattern_defined
in the assert.

This reverts r212524.

PR c++/61445

gcc/cp/ChangeLog:

* pt.cc (instantiate_decl): Assert !defer_ok for local
class members.

Daily bump.

libgcc CRIS: Define TARGET_HAS_NO_HW_DIVIDE

With this, execution time for e.g. __moddi3 go from 59 to 40 cycles in
the "fast" case or from 290 to 200 cycles in the "slow" case (when the
!TARGET_HAS_NO_HW_DIVIDE variant calls division and modulus functions
for 32-bit SImode), as exposed by gcc.c-torture/execute/arith-rand-ll.c
compiled for -march=v10.

Unfortunately, it just puts a performance improvement "dent" of 0.07%
in a arith-rand-ll.c-based performance test - where all loops are also
reduced to 1/10.

The size of every affected libgcc function is reduced to less than
half and they are all now leaf functions.

* config/cris/t-cris (HOST_LIBGCC2_CFLAGS): Add
-DTARGET_HAS_NO_HW_DIVIDE.

RISC-V: Fix sync.md and riscv.cc whitespace errors

This patch fixes whitespace errors introduced with
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html

2023-04-26 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/riscv.cc: Fix whitespace.
* config/riscv/sync.md: Fix whitespace.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

c++: remove nsdmi_inst hashtable

It occurred to me that we have a perfectly good DECL_INITIAL field to put
the instantiated DMI into, we don't need a separate hash table.

gcc/cp/ChangeLog:

* init.cc (nsdmi_inst): Remove.
(maybe_instantiate_nsdmi_init): Use DECL_INITIAL instead.

c++: local class in nested generic lambda [PR109241]

The earlier fix for PR109241 avoided the crash by handling a type with no
TREE_BINFO. But we want to move toward doing the partial substitution of
classes in generic lambdas, so let's take a step in that direction.

PR c++/109241

gcc/cp/ChangeLog:

* pt.cc (instantiate_class_template): Do partially instantiate.
(tsubst_expr): Do call complete_type for partial instantiations.

c++: unique friend shenanigans [PR69836]

Normally we re-instantiate a function declaration when we start to
instantiate the body in case of multiple declarations.  In this wacky
testcase, this causes a problem because the type of the w_counter parameter
depends on its declaration not being in scope yet, so the name lookup only
finds the previous declaration.  This isn't a problem for member functions,
since they aren't subject to argument-dependent lookup.  So let's just skip
the regeneration for hidden friends.

PR c++/69836

gcc/cp/ChangeLog:

* pt.cc (regenerate_decl_from_template): Skip unique friends.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend76.C: New test.

c++: micro-optimize most_specialized_partial_spec

This introduces an early exit test to most_specialized_partial_spec for
templates which have no partial specializations, saving some unnecessary
work during class template instantiation in the common case.  In passing,
modernize the code a bit.

gcc/cp/ChangeLog:

* pt.cc (most_specialized_partial_spec): Exit early when
DECL_TEMPLATE_SPECIALIZATIONS is empty.  Move local variable
declarations closer to their first use.  Remove redundant
flag_concepts test.  Remove redundant forward declaration.

Create a lazy ssa_cache.

Sparsely used ssa caches can benefit from using a bitmap to
determine if a name already has an entry. Utilize it in the path query
and remove its private bitmap for tracking the same info.
Also use it in the "assume" query class.

PR tree-optimization/108697
* gimple-range-cache.cc (ssa_global_cache::clear_range): Do
not clear the vector on an out of range query.
(ssa_cache::dump): Use dump_range_query instead of get_range.
(ssa_cache::dump_range_query): New.
(ssa_lazy_cache::dump_range_query): New.
(ssa_lazy_cache::set_range): New.
* gimple-range-cache.h (ssa_cache::dump_range_query): New.
(class ssa_lazy_cache): New.
(ssa_lazy_cache::ssa_lazy_cache): New.
(ssa_lazy_cache::~ssa_lazy_cache): New.
(ssa_lazy_cache::get_range): New.
(ssa_lazy_cache::clear_range): New.
(ssa_lazy_cache::clear): New.
(ssa_lazy_cache::dump): New.
* gimple-range-path.cc (path_range_query::path_range_query): Do
not allocate a ssa_cache object nor has_cache bitmap.
(path_range_query::~path_range_query): Do not free objects.
(path_range_query::clear_cache): Remove.
(path_range_query::get_cache): Adjust.
(path_range_query::set_cache): Remove.
(path_range_query::dump): Don't call through a pointer.
(path_range_query::internal_range_of_expr): Set cache directly.
(path_range_query::reset_path): Clear cache directly.
(path_range_query::ssa_range_in_phi): Fold with globals only.
(path_range_query::compute_ranges_in_phis): Simply set range.
(path_range_query::compute_ranges_in_block): Call cache directly.
* gimple-range-path.h (class path_range_query): Replace bitmap
and cache pointer with lazy cache object.
* gimple-range.h (class assume_query): Use ssa_lazy_cache.

Rename ssa_global_cache to ssa_cache and add has_range

This renames the ssa_global_cache to be ssa_cache.  The original use was
to function as a global cache, but its uses have expanded.  Remove all mention
of "global" from the class and methods.  Also add a has_range method.

* gimple-range-cache.cc (ssa_cache::ssa_cache): Rename.
(ssa_cache::~ssa_cache): Rename.
(ssa_cache::has_range): New.
(ssa_cache::get_range): Rename.
(ssa_cache::set_range): Rename.
(ssa_cache::clear_range): Rename.
(ssa_cache::clear): Rename.
(ssa_cache::dump): Rename and use get_range.
(ranger_cache::get_global_range): Use get_range and set_range.
(ranger_cache::range_of_def): Use get_range.
* gimple-range-cache.h (class ssa_cache): Rename class and methods.
(class ranger_cache): Use ssa_cache.
* gimple-range-path.cc (path_range_query::path_range_query): Use
ssa_cache.
(path_range_query::get_cache): Use get_range.
(path_range_query::set_cache): Use set_range.
* gimple-range-path.h (class path_range_query): Use ssa_cache.
* gimple-range.cc (assume_query::assume_range_p): Use get_range.
(assume_query::range_of_expr): Use get_range.
(assume_query::assume_query): Use set_range.
(assume_query::calculate_op): Use get_range and set_range.
* gimple-range.h (class assume_query): Use ssa_cache.

Add sbr_lazy_vector and adjust (e)vrp sparse cache

Add a sparse vector class for cache and use if by default.
Rename the evrp_* params to vrp_*, and add a param for small CFGS which use
just the original basic vector.

* gimple-range-cache.cc (sbr_vector::sbr_vector): Add parameter
and local to optionally zero memory.
(br_vector::grow): Only zero memory if flag is set.
(class sbr_lazy_vector): New.
(sbr_lazy_vector::sbr_lazy_vector): New.
(sbr_lazy_vector::set_bb_range): New.
(sbr_lazy_vector::get_bb_range): New.
(sbr_lazy_vector::bb_range_p): New.
(block_range_cache::set_bb_range): Check flags and Use sbr_lazy_vector.
* gimple-range-gori.cc (gori_map::calculate_gori): Use
param_vrp_switch_limit.
(gori_compute::gori_compute): Use param_vrp_switch_limit.
* params.opt (vrp_sparse_threshold): Rename from evrp_sparse_threshold.
(vrp_switch_limit): Rename from evrp_switch_limit.
(vrp_vector_threshold): New.

Quicker relation check.

If either of the SSA names in a comparison do not have any equivalences
or relations, we can short-circuit the check slightly.

* value-relation.cc (dom_oracle::query_relation): Check early for lack
of any relation.
* value-relation.h (equiv_oracle::has_equiv_p): New.

Don't save ssa-name pointer in dependency cache.

If the direct dependence fields point directly to an ssa-name,
its possible that an optimization frees an ssa-name, and the value
pointed to may now be in the free list. Simply maintain the ssa
version number instead.

PR tree-optimization/109417
* gimple-range-gori.cc (range_def_chain::register_dependency):
Save the ssa version number, not the pointer.
(gori_compute::may_recompute_p): No need to check if a dependency
is in the free list.
* gimple-range-gori.h (class range_def_chain): Change ssa1 and ssa2
fields to be unsigned int instead of trees.
(ange_def_chain::depend1): Adjust.
(ange_def_chain::depend2): Adjust.
* gimple-range.h: Include "ssa.h" to inline ssa_name().

aix: Default AIX 7.2 to POWER7 server and AIX 7.3 to POWER8 server.

AIX 7.2 minimum ISA is POWER7 and AIX 7.3 minimum ISA is POWER8.
This patch changes the aix72.h configuration to POWER7 with VSX enabled
by default (with the AIX VSX ABI limitations), matching LLVM on AIX,
and changes the aix73.h configuration to POWER8.

gcc/ChangeLog:
* config/rs6000/aix72.h (TARGET_DEFAULT): Use ISA_2_6_MASKS_SERVER.
* config/rs6000/aix73.h (TARGET_DEFAULT): Use ISA_2_7_MASKS_SERVER.
(PROCESSOR_DEFAULT): Use PROCESSOR_POWER8.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>

RISCV: Inline subword atomic ops

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.

2023-04-18 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:
PR target/104338
* config/riscv/riscv-protos.h: Add helper function stubs.
* config/riscv/riscv.cc: Add helper functions for subword masking.
* config/riscv/riscv.opt: Add command-line flag.
* config/riscv/sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* doc/invoke.texi: Add blurb regarding command-line flag.

libgcc/ChangeLog:
PR target/104338
* config/riscv/atomic.c: Add reference to duplicate logic.

gcc/testsuite/ChangeLog:
PR target/104338
* gcc.target/riscv/inline-atomics-1.c: New test.
* gcc.target/riscv/inline-atomics-2.c: New test.
* gcc.target/riscv/inline-atomics-3.c: New test.
* gcc.target/riscv/inline-atomics-4.c: New test.
* gcc.target/riscv/inline-atomics-5.c: New test.
* gcc.target/riscv/inline-atomics-6.c: New test.
* gcc.target/riscv/inline-atomics-7.c: New test.
* gcc.target/riscv/inline-atomics-8.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

MAINTAINERS: Add myself to write after approval

2023-04-26 Patrick O'Neill <patrick@rivosinc.com>

* MAINTAINERS: Add myself.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

aarch64: Reimplement RSHRN2 intrinsic patterns with standard RTL codes

Similar to the previous patch, we can reimplement the rshrn2 patterns using standard RTL codes
for shift, truncate and plus with the appropriate constants.
This allows us to get rid of UNSPEC_RSHRN entirely.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_rshrn2<mode>_insn_le):
Reimplement using standard RTL codes instead of unspec.
(aarch64_rshrn2<mode>_insn_be): Likewise.
(aarch64_rshrn2<mode>): Adjust for the above.
* config/aarch64/aarch64.md (UNSPEC_RSHRN): Delete.

aarch64: Reimplement RSHRN intrinsic patterns with standard RTL codes

This patch reimplements the backend patterns for the rshrn intrinsics using standard RTL codes rather than UNSPECS.
We already represent shrn as truncate of a shift. rshrn can be represented as truncate (src + (1 << (shft - 1)) >> shft),
similar to how LLVM treats it.

I have a follow-up patch to do the same for the rshrn2 pattern, which will allow us to remove the UNSPEC_RSHRN entirely.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_rshrn<mode>_insn_le): Reimplement
with standard RTL codes instead of an UNSPEC.
(aarch64_rshrn<mode>_insn_be): Likewise.
(aarch64_rshrn<mode>): Adjust for the above.
* config/aarch64/predicates.md (aarch64_simd_rshrn_imm_vec): Define.

libsanitizer: change LOCAL_PATCHES revision

libsanitizer/ChangeLog:

* LOCAL_PATCHES: Change revision.

libsanitizer: Apply local patches

libsanitizer: merge from upstream (3185e47b5a8444e9fd).

RISC-V: Legitimise the const0_rtx for RVV load/store address

This patch try to legitimise the const0_rtx (aka zero register)
as the base register for the RVV load/store instructions.

For example:
vint32m1_t test_vle32_v_i32m1_shortcut (size_t vl)
{
  return __riscv_vle32_v_i32m1 ((int32_t *)0, vl);
}

Before this patch:
li      a5,0
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(a5)  <- can propagate the const 0 to a5 here
vs1r.v  v24,0(a0)

After this patch:
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(zero)
vs1r.v  v24,0(a0)

As above, this patch allow you to propagate the const 0 (aka zero
register) to the base register of the RVV Unit-Stride load in the
combine pass. This may benefit the underlying RVV auto-vectorization.

However, the indexed load failed to perform the optimization and it
will be take care of in another PATCH.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_classify_address): Allow
const0_rtx for the RVV load/store.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Remove legacy range support.

This patch removes all the code paths guarded by legacy_mode_p(), thus
allowing us to re-use the int_range<1> idiom for a range of one
sub-range. This allows us to represent these simple ranges in a more
efficient manner.

gcc/ChangeLog:

* range-op.cc (range_op_cast_tests): Remove legacy support.
* value-range-storage.h (vrange_allocator::alloc_irange): Same.
* value-range.cc (irange::operator=): Same.
(get_legacy_range): Same.
(irange::copy_legacy_to_multi_range): Delete.
(irange::copy_to_legacy): Delete.
(irange::irange_set_anti_range): Delete.
(irange::set): Remove legacy support.
(irange::verify_range): Same.
(irange::legacy_lower_bound): Delete.
(irange::legacy_upper_bound): Delete.
(irange::legacy_equal_p): Delete.
(irange::operator==): Remove legacy support.
(irange::singleton_p): Same.
(irange::value_inside_range): Same.
(irange::contains_p): Same.
(intersect_ranges): Delete.
(irange::legacy_intersect): Delete.
(union_ranges): Delete.
(irange::legacy_union): Delete.
(irange::legacy_verbose_union_): Delete.
(irange::legacy_verbose_intersect): Delete.
(irange::irange_union): Remove legacy support.
(irange::irange_intersect): Same.
(irange::intersect): Same.
(irange::invert): Same.
(ranges_from_anti_range): Delete.
(gt_pch_nx): Adjust for legacy removal.
(gt_ggc_mx): Same.
(range_tests_legacy): Delete.
(range_tests_misc): Adjust for legacy removal.
(range_tests): Same.
* value-range.h (class irange): Same.
(irange::legacy_mode_p): Delete.
(ranges_from_anti_range): Delete.
(irange::nonzero_p): Adjust for legacy removal.
(irange::lower_bound): Same.
(irange::upper_bound): Same.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
* vr-values.cc (simplify_using_ranges::legacy_fold_cond_overflow): Same.

Remove range_has_numeric_bounds_p.

gcc/ChangeLog:

* value-range.cc (irange::copy_legacy_to_multi_range): Rewrite use
of range_has_numeric_bounds_p with irange API.
(range_has_numeric_bounds_p): Delete.
* value-range.h (range_has_numeric_bounds_p): Delete.

Remove range_int_cst_p.

gcc/ChangeLog:

* tree-data-ref.cc (compute_distributive_range): Replace uses of
range_int_cst_p with irange API.
* tree-ssa-strlen.cc (get_range_strlen_dynamic): Same.
* tree-vrp.h (range_int_cst_p): Delete.
* vr-values.cc (check_for_binary_op_overflow): Replace usees of
range_int_cst_p with irange API.
(vr_set_zero_nonzero_bits): Same.
(range_fits_type_p): Same.
(simplify_using_ranges::simplify_casted_cond): Same.
* tree-vrp.cc (range_int_cst_p): Remove.

Convert compare_nonzero_chars to wide_ints.

gcc/ChangeLog:

* tree-ssa-strlen.cc (compare_nonzero_chars): Convert to wide_ints.

Remove some uses of deprecated irange API.

gcc/ChangeLog:

* builtins.cc (expand_builtin_strnlen): Rewrite deprecated irange
API uses to new API.
* gimple-predicate-analysis.cc (find_var_cmp_const): Same.
* internal-fn.cc (get_min_precision): Same.
* match.pd: Same.
* tree-affine.cc (expr_to_aff_combination): Same.
* tree-data-ref.cc (dr_step_indicator): Same.
* tree-dfa.cc (get_ref_base_and_extent): Same.
* tree-scalar-evolution.cc (iv_can_overflow_p): Same.
* tree-ssa-phiopt.cc (two_value_replacement): Same.
* tree-ssa-pre.cc (insert_into_preds_of_block): Same.
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Same.
* tree-ssa-strlen.cc (compare_nonzero_chars): Same.
* tree-switch-conversion.cc (bit_test_cluster::emit): Same.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Same.
* tree.cc (get_range_pos_neg): Same.

Replace ad-hoc value_range dumpers with irange::dump.

This causes a regression in gcc.c-torture/unsorted/dump-noaddr.c.

The test is asserting that two dumps are identical, but they are not
because irange dumps the type which varies between runs:

< VR [irange] void (*<T3dc>) (int) [1, +INF]
> VR [irange] void (*<T3da>) (int) [1, +INF]

I have changed the pretty printer for irange types to pass TDF_NOUID,
thus avoiding this problem.

gcc/ChangeLog:

* ipa-prop.cc (ipa_print_node_jump_functions_for_edge): Use
vrange::dump instead of ad-hoc dumper.
* tree-ssa-strlen.cc (dump_strlen_info): Same.
* value-range-pretty-print.cc (visit): Pass TDF_NOUID to
dump_generic_node.

Fix swapping of ranges.

The legacy range code has logic to swap out of order endpoints in the
irange constructor.  The new irange code expects the caller to fix any
inconsistencies, thus speeding up the common case.  However, this means
that when we remove legacy, any stragglers must be fixed.  This patch
fixes the 3 culprits found during the conversion.

gcc/ChangeLog:

* range-op.cc (operator_cast::op1_range): Use
create_possibly_reversed_range.
(operator_bitwise_and::simple_op1_range_solver): Same.
* value-range.cc (swap_out_of_order_endpoints): Delete.
(irange::set): Remove call to swap_out_of_order_endpoints.

Convert users of legacy API to get_legacy_range() function.

This patch converts the users of the legacy API to a function called
get_legacy_range() which will return the pieces of the soon to be
removed API (min, max, and kind).  This is a temporary measure while
these users are converted.

In upcoming patches I will convert most users, but most of the
middle-end warning uses will remain.  Naive attempts to remove them
showed that a lot of these uses are quite dependant on the anti-range
idiom, and converting them to the new API broke the tests, even when
the conversion was conceptually correct.  Perhaps someone who
understands these passes could take a stab at it.  In the meantime,
the legacy uses can be trivially found by grepping for
get_legacy_range.

gcc/ChangeLog:

* builtins.cc (determine_block_size): Convert use of legacy API to
get_legacy_range.
* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Same.
(array_bounds_checker::check_array_ref): Same.
* gimple-ssa-warn-restrict.cc
(builtin_memref::extend_offset_range): Same.
* ipa-cp.cc (ipcp_store_vr_results): Same.
* ipa-fnsummary.cc (set_switch_stmt_execution_predicate): Same.
* ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Same.
(ipa_write_jump_function): Same.
* pointer-query.cc (get_size_range): Same.
* tree-data-ref.cc (split_constant_offset): Same.
* tree-ssa-strlen.cc (get_range): Same.
(maybe_diag_stxncpy_trunc): Same.
(strlen_pass::get_len_or_size): Same.
(strlen_pass::count_nonzero_bytes_addr): Same.
* tree-vect-patterns.cc (vect_get_range_info): Same.
* value-range.cc (irange::maybe_anti_range): Remove.
(get_legacy_range): New.
(irange::copy_to_legacy): Use get_legacy_range.
(ranges_from_anti_range): Same.
* value-range.h (class irange): Remove maybe_anti_range.
(get_legacy_range): New.
* vr-values.cc (check_for_binary_op_overflow): Convert use of
legacy API to get_legacy_range.
(compare_ranges): Same.
(compare_range_with_value): Same.
(bounds_of_var_in_loop): Same.
(find_case_label_ranges): Same.
(simplify_using_ranges::simplify_switch_using_ranges): Same.

Remove irange::constant_p.

gcc/ChangeLog:

* value-range-pretty-print.cc (vrange_printer::visit): Remove
constant_p use.
* value-range.cc (irange::constant_p): Remove.
(irange::get_nonzero_bits_from_range): Remove constant_p use.
* value-range.h (class irange): Remove constant_p.
(irange::num_pairs): Remove constant_p use.

Remove symbolics from irange.

gcc/ChangeLog:

* value-range.cc (irange::copy_legacy_to_multi_range): Remove
symbolics support.
(irange::set): Same.
(irange::legacy_lower_bound): Same.
(irange::legacy_upper_bound): Same.
(irange::contains_p): Same.
(range_tests_legacy): Same.
(irange::normalize_addresses): Remove.
(irange::normalize_symbolics): Remove.
(irange::symbolic_p): Remove.
* value-range.h (class irange): Remove symbolic_p,
normalize_symbolics, and normalize_addresses.
* vr-values.cc (simplify_using_ranges::two_valued_val_range_p):
Remove symbolics support.

Remove irange::may_contain_p.

The deprecated irange::may_contain_p method differed from contains_p
in that it could handle symbolics, which no longer exist in VRP.

gcc/ChangeLog:

* value-range.cc (irange::may_contain_p): Remove.
* value-range.h (range_includes_zero_p): Rewrite may_contain_p
usage with contains_p.
* vr-values.cc (compare_range_with_value): Same.

Remove range_fold_{unary,binary}_expr.

gcc/ChangeLog:

* tree-vrp.cc (supported_types_p): Remove.
(defined_ranges_p): Remove.
(range_fold_binary_expr): Remove.
(range_fold_unary_expr): Remove.
* tree-vrp.h (range_fold_unary_expr): Remove.
(range_fold_binary_expr): Remove.

Remove deprecated range_fold_{unary,binary}_expr uses from ipa-*.

gcc/ChangeLog:

* ipa-cp.cc (ipa_vr_operation_and_type_effects): Convert to ranger API.
(ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
* vr-values.cc (bounds_of_var_in_loop): Same.

Remove range_query::get_value_range.

gcc/ChangeLog:

* gimple-array-bounds.cc (array_bounds_checker::get_value_range):
Add irange argument.
(check_out_of_bounds_and_warn): Remove check for vr.
(array_bounds_checker::check_array_ref): Remove pointer qualifier
for vr and adjust accordingly.
* gimple-array-bounds.h (get_value_range): Add irange argument.
* value-query.cc (class equiv_allocator): Delete.
(range_query::get_value_range): Delete.
(range_query::range_query): Remove allocator access.
(range_query::~range_query): Same.
* value-query.h (get_value_range): Delete.
* vr-values.cc
(simplify_using_ranges::op_with_boolean_value_range_p): Remove
call to get_value_range.
(check_for_binary_op_overflow): Same.
(simplify_using_ranges::legacy_fold_cond_overflow): Same.
(simplify_using_ranges::simplify_abs_using_ranges): Same.
(simplify_using_ranges::simplify_cond_using_ranges_1): Same.
(simplify_using_ranges::simplify_casted_cond): Same.
(simplify_using_ranges::simplify_switch_using_ranges): Same.
(simplify_using_ranges::two_valued_val_range_p): Same.

Refactor vrp_evaluate_conditional* and rename it.

gcc/ChangeLog:

* vr-values.cc
(simplify_using_ranges::vrp_evaluate_conditional_warnv_with_ops):
Rename to...
(simplify_using_ranges::legacy_fold_cond_overflow): ...this.
(simplify_using_ranges::vrp_visit_cond_stmt): Rename to...
(simplify_using_ranges::legacy_fold_cond): ...this.
(simplify_using_ranges::fold_cond): Rename
vrp_evaluate_conditional_warnv_with_ops to
legacy_fold_cond_overflow.
* vr-values.h (class vr_values): Replace vrp_visit_cond_stmt and
vrp_evaluate_conditional_warnv_with_ops with legacy_fold_cond and
legacy_fold_cond_overflow respectively.

Remove compare_names* from legacy cond folding.

In a test run I have asserted that the legacy conditional folding only
gets overflows, so this removal is safe.

gcc/ChangeLog:

* vr-values.cc (get_vr_for_comparison): Remove.
(compare_name_with_value): Same.
(vrp_evaluate_conditional_warnv_with_ops): Remove calls to
compare_name_with_value.
* vr-values.h: Remove compare_name_with_value.
Remove get_vr_for_comparison.

[xstormy16] Add support for byte and word swapping instructions.

This patch adds support for xstormy16's swpb (swap bytes) and swpw (swap
words) instructions.  The most obvious application of these to implement
the __builtin_bswap16 and __builtin_bswap32 intrinsics.

Currently, __builtin_bswap16 is implemented as:
foo:    mov r7,r2
        shl r7,#8
        shr r2,#8
        or r2,r7
        ret

but with this patch becomes:
foo: swpb r2
ret

Likewise, __builtin_bswap32 now becomes:
foo: swpb r2 | swpb r3 | swpw r2,r3
        ret

Finally, the swpw instruction on its own can be used to exchange
two word mode registers without a temporary, so a new pattern and
peephole2 have been added to catch this.  As described in the
PR rtl-optimization/106518, register allocation can (in theory)
be more efficient on targets that provide a swap/exchange instruction.
The slightly unusual swap<mode> naming matches that used in i386.md.

2024-04-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.md (bswaphi2): New define_insn.
(bswapsi2): New define_insn.
(swaphi): New define_insn to exchange two registers (swpw).
(define_peephole2): Recognize exchange of registers as swaphi.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/bswap16.c: New test case.
* gcc.target/xstormy16/bswap32.c: Likewise.
* gcc.target/xstormy16/swpb.c: Likewise.
* gcc.target/xstormy16/swpw-1.c: Likewise.
* gcc.target/xstormy16/swpw-2.c: Likewise.

MAINTAINERS: fix alphabetic sorting

ChangeLog:

* MAINTAINERS: fix sorting

Update gennews for GCC 13.

2023-04-26 Jakub Jelinek <jakub@redhat.com>

* gennews (files): Add files for GCC 13.

More last_stmt removal

This adjusts more users of last_stmt where it is clear that debug
stmt skipping is unnecessary.  In most cases this also allowed
significant code simplification.

gcc/c/
* gimple-parser.cc (c_parser_parse_gimple_body): Avoid
last_stmt.

gcc/
* gimple-range-path.cc (path_range_query::compute_outgoing_relations):
Avoid last_stmt.
* ipa-pure-const.cc (pass_nothrow::execute): Likewise.
* predict.cc (apply_return_prediction): Likewise.
* sese.cc (set_ifsese_condition): Likewise.  Simplify.
* tree-cfg.cc (assert_unreachable_fallthru_edge_p): Avoid last_stmt.
(make_edges_bb): Likewise.
(make_cond_expr_edges): Likewise.
(end_recording_case_labels): Likewise.
(make_gimple_asm_edges): Likewise.
(cleanup_dead_labels): Likewise.
(group_case_labels): Likewise.
(gimple_can_merge_blocks_p): Likewise.
(gimple_merge_blocks): Likewise.
(find_taken_edge): Likewise.  Also handle empty fallthru blocks.
(gimple_duplicate_sese_tail): Avoid last_stmt.
(find_loop_dist_alias): Likewise.
(gimple_block_ends_with_condjump_p): Likewise.
(gimple_purge_dead_eh_edges): Likewise.
(gimple_purge_dead_abnormal_call_edges): Likewise.
(pass_warn_function_return::execute): Likewise.
(execute_fixup_cfg): Likewise.
* tree-eh.cc (redirect_eh_edge_1): Likewise.
(pass_lower_resx::execute): Likewise.
(pass_lower_eh_dispatch::execute): Likewise.
(cleanup_empty_eh): Likewise.
* tree-if-conv.cc (if_convertible_bb_p): Likewise.
(predicate_bbs): Likewise.
(ifcvt_split_critical_edges): Likewise.
* tree-loop-distribution.cc (create_edge_for_control_dependence):
Likewise.
(loop_distribution::transform_reduction_loop): Likewise.
* tree-parloops.cc (transform_to_exit_first_loop_alt): Likewise.
(try_transform_to_exit_first_loop_alt): Likewise.
(transform_to_exit_first_loop): Likewise.
(create_parallel_loop): Likewise.
* tree-scalar-evolution.cc (get_loop_exit_condition): Likewise.
* tree-ssa-dce.cc (mark_last_stmt_necessary): Likewise.
(eliminate_unnecessary_stmts): Likewise.
* tree-ssa-dom.cc
(dom_opt_dom_walker::set_global_ranges_from_unreachable_edges):
Likewise.
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Likewise.
(pass_tree_ifcombine::execute): Likewise.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Likewise.
(should_duplicate_loop_header_p): Likewise.
* tree-ssa-loop-ivcanon.cc (create_canonical_iv): Likewise.
(tree_estimate_loop_size): Likewise.
(try_unroll_loop_completely): Likewise.
* tree-ssa-loop-ivopts.cc (tree_ssa_iv_optimize_loop): Likewise.
* tree-ssa-loop-manip.cc (ip_normal_pos): Likewise.
(canonicalize_loop_ivs): Likewise.
* tree-ssa-loop-niter.cc (determine_value_range): Likewise.
(bound_difference): Likewise.
(number_of_iterations_popcount): Likewise.
(number_of_iterations_cltz): Likewise.
(number_of_iterations_cltz_complement): Likewise.
(simplify_using_initial_conditions): Likewise.
(number_of_iterations_exit_assumptions): Likewise.
(loop_niter_by_eval): Likewise.
(estimate_numbers_of_iterations): Likewise.

RISC-V: Fine tune vmadc/vmsbc RA constraint

gcc/ChangeLog:

* config/riscv/vector.md: Refine vmadc/vmsbc RA constraint.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/narrow_constraint-13.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-14.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-15.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-16.c: New test.

rs6000: Guard power9-vector for vsx_scalar_cmp_exp_qp_* [PR108758]

__builtin_vsx_scalar_cmp_exp_qp_{eq,gt,lt,unordered} used
to be guarded with condition TARGET_P9_VECTOR before new
bif framework was introduced (r12-5752-gd08236359eb229),
since r12-5752 they are placed under stanza ieee128-hw,
that is to check condition TARGET_FLOAT128_HW, it caused
test case float128-cmp2-runnable.c to fail at -m32 as the
condition TARGET_FLOAT128_HW isn't satisified with -m32.

By checking the commit history, I didn't see any notes on
why this condition change on them was made, so this patch
is to move these bifs from stanza ieee128-hw to stanza
power9-vector as before.

PR target/108758

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def
(__builtin_vsx_scalar_cmp_exp_qp_eq, __builtin_vsx_scalar_cmp_exp_qp_gt
__builtin_vsx_scalar_cmp_exp_qp_lt,
__builtin_vsx_scalar_cmp_exp_qp_unordered): Move from stanza ieee128-hw
to power9-vector.

rs6000: Fix predicate for const vector in sldoi_to_mov [PR109069]

As PR109069 shows, commit r12-6537-g080a06fcb076b3 which
introduces define_insn_and_split sldoi_to_mov adopts
easy_vector_constant for const vector of interest, but it's
wrong since predicate easy_vector_constant doesn't guarantee
each byte in the const vector is the same. One counter
example is the const vector in pr109069-1.c. This patch is
to introduce new predicate const_vector_each_byte_same to
ensure all bytes in the given const vector are the same by
considering both int and float, meanwhile for the constants
which don't meet easy_vector_constant we need to gen a move
instead of just a set, and uses VECTOR_MEM_ALTIVEC_OR_VSX_P
rather than VECTOR_UNIT_ALTIVEC_OR_VSX_P for V2DImode support
under VSX since vector long long type of vec_sld is guarded
under stanza vsx.

PR target/109069

gcc/ChangeLog:

* config/rs6000/altivec.md (sldoi_to_mov<mode>): Replace predicate
easy_vector_constant with const_vector_each_byte_same, add
handlings in preparation for !easy_vector_constant, and update
VECTOR_UNIT_ALTIVEC_OR_VSX_P with VECTOR_MEM_ALTIVEC_OR_VSX_P.
* config/rs6000/predicates.md (const_vector_each_byte_same): New
predicate.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109069-1.c: New test.
* gcc.target/powerpc/pr109069-2-run.c: New test.
* gcc.target/powerpc/pr109069-2.c: New test.
* gcc.target/powerpc/pr109069-2.h: New test.

RISC-V: Optimize comparison patterns for register allocation

Current RA constraint for RVV comparison instructions totall does not allow
registers between dest and source operand have any overlaps.

For example:
vmseq.vv vd, vs2, vs1
If LMUL = 8, vs2 = v8, vs1 = v16:

In current GCC RA constraint, GCC does not allow vd to be any regno in v8 ~ v23.
However, it is too conservative and not true according to RVV ISA.

Since the dest EEW of comparison is always EEW = 1, so it always follows the overlap
rules of Dest EEW < Source EEW. So in this case, we should allow GCC RA have the chance
to allocate v8 or v16 for vd, so that we can have better vector registers usage in RA.

gcc/ChangeLog:

* config/riscv/vector.md (*pred_cmp<mode>_merge_tie_mask): New pattern.
(*pred_ltge<mode>_merge_tie_mask): Ditto.
(*pred_cmp<mode>_scalar_merge_tie_mask): Ditto.
(*pred_eqne<mode>_scalar_merge_tie_mask): Ditto.
(*pred_cmp<mode>_extended_scalar_merge_tie_mask): Ditto.
(*pred_eqne<mode>_extended_scalar_merge_tie_mask): Ditto.
(*pred_cmp<mode>_narrow_merge_tie_mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vv_constraint-4.c: Adapt testcase.
* gcc.target/riscv/rvv/base/narrow_constraint-17.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-18.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-19.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-20.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-21.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-22.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-23.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-24.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-25.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-26.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-27.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-28.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-29.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-30.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-31.c: New test.

RISC-V: Fix redundant vmv1r.v instruction in vmsge.vx codegen

Current expansion of vmsge will make RA produce redundant vmv1r.v.

testcase:
void f1 (void * in, void *out, int32_t x)
{
    vbool32_t mask = *(vbool32_t*)in;
    asm volatile ("":::"memory");
    vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
    vint32m1_t v2 = __riscv_vle32_v_i32m1_m (mask, in, 4);
    vbool32_t m3 = __riscv_vmsge_vx_i32m1_b32 (v, x, 4);
    vbool32_t m4 = __riscv_vmsge_vx_i32m1_b32_mu (mask, m3, v, x, 4);
    m4 = __riscv_vmsge_vv_i32m1_b32_m (m4, v2, v2, 4);
    __riscv_vsm_v_b32 (out, m4, 4);
}

Before this patch:
f1:
vsetvli a5,zero,e8,mf4,ta,ma
vlm.v   v0,0(a0)
vsetivli zero,4,e32,m1,ta,mu
vle32.v v3,0(a0)
vle32.v v2,0(a0),v0.t
vmslt.vx v1,v3,a2
vmnot.m v1,v1
vmslt.vx v1,v3,a2,v0.t
vmxor.mm v1,v1,v0
vmv1r.v v0,v1
vmsge.vv v2,v2,v2,v0.t
vsm.v   v2,0(a1)
ret

After this patch:
f1:
vsetvli a5,zero,e8,mf4,ta,ma
vlm.v   v0,0(a0)
vsetivli zero,4,e32,m1,ta,mu
vle32.v v3,0(a0)
vle32.v v2,0(a0),v0.t
vmslt.vx v1,v3,a2
vmnot.m v1,v1
vmslt.vx v1,v3,a2,v0.t
vmxor.mm v0,v1,v0
vmsge.vv v2,v2,v2,v0.t
vsm.v   v2,0(a1)
ret

gcc/ChangeLog:

* config/riscv/vector.md: Fix redundant vmv1r.v.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vx_constraint-150.c: Adapt assembly
check.

RISC-V: Fine tune gather load RA constraint

For DEST EEW < SOURCE EEW, we can partial overlap register
according to RVV ISA.

gcc/ChangeLog:

* config/riscv/vector.md: Fix RA constraint.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/narrow_constraint-12.c: New test.

RISC-V: Bugfix for RVV vbool*_t vn_reference_equal

In most architecture the precision_size of vbool*_t types are caculated
like as the multiple of the type size.  For example:
precision_size = type_size * 8 (aka, bit count per bytes).

Unfortunately, some architecture like RISC-V will adjust the
precision_size
for the vbool*_t in order to align the ISA. For example as below.
type_size      = [1, 1, 1, 1,  2,  4,  8]
precision_size = [1, 2, 4, 8, 16, 32, 64]

Then the precision_size of RISC-V vbool*_t will not be the multiple of
the
type_size. This PATCH try to enrich this case when comparing the
vn_reference.

Given we have the below code:
void test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict
out) {
    vbool8_t v1 = *(vbool8_t*)in;
    vbool16_t v2 = *(vbool16_t*)in;

    *(vbool8_t*)(out + 100) = v1;
    *(vbool16_t*)(out + 200) = v2;
}

Before this PATCH:
csrr    t0,vlenb
slli    t1,t0,1
csrr    a3,vlenb
sub     sp,sp,t1
slli    a4,a3,1
add     a4,a4,sp
addi    a2,a1,100
vsetvli a5,zero,e8,m1,ta,ma
sub     a3,a4,a3
vlm.v   v24,0(a0)
vsm.v   v24,0(a2)
vsm.v   v24,0(a3)
addi    a1,a1,200
csrr    t0,vlenb
vsetvli a4,zero,e8,mf2,ta,ma
slli    t1,t0,1
vlm.v   v24,0(a3)
vsm.v   v24,0(a1)
add     sp,sp,t1
jr      ra

After this PATCH:
addi    a3,a1,100
vsetvli a4,zero,e8,m1,ta,ma
addi    a1,a1,200
vlm.v   v24,0(a0)
vsm.v   v24,0(a3)
vsetvli a5,zero,e8,mf2,ta,ma
vlm.v   v24,0(a0)
vsm.v   v24,0(a1)
ret

PR target/109272

gcc/ChangeLog:

* tree-ssa-sccvn.cc (vn_reference_eq): add type vector subparts
check for vn_reference equal.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr108185-4.c: Update test check
condition.
* gcc.target/riscv/rvv/base/pr108185-5.c: Likewise.
* gcc.target/riscv/rvv/base/pr108185-6.c: Likewise.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add auto-vectorization compile option for RVV

This patch is adding 2 compile option for RVV auto-vectorization.
1. -param=riscv-autovec-preference=
   This option is to specify the auto-vectorization approach for RVV.
   Currently, we only support scalable and fixed-vlmax.

    - scalable means VLA auto-vectorization. The vector-length to compiler is
      unknown and runtime invariant. Such approach can allow us compile the code
      run on any vector-length RVV CPU.

    - fixed-vlmax means the compile known the RVV CPU vector-length, compile option
      in fixed-length VLS auto-vectorization. Meaning if we specify vector-length=512.
      The execution file can only run on vector-length = 512 RVV CPU.

    - TODO: we may need to support min-length VLS auto-vectorization, means the execution
      file can run on larger length RVV CPU.
2. -param=riscv-autovec-lmul=
   Specify LMUL choosing for RVV auto-vectorization.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_preference_enum): Add enum for
auto-vectorization preference.
(enum riscv_autovec_lmul_enum): Add enum for choosing LMUL of RVV
auto-vectorization.
* config/riscv/riscv.opt: Add compile option for RVV auto-vectorization.

avoid splitting small constants in bcrli_nottwobits patterns

I have noticed that in the case when we try to clear two bits through a
small constant,
and ZBS is enabled then GCC split it into two "andi" instructions.
For example for the following C code:
  int foo(int a) {
    return a & ~ 0x101;
  }

GCC generates the following:
  foo:
     andi a0,a0,-2
     andi a0,a0,-257
     ret

but should be this one:
  foo:
     andi a0,a0,-258
     ret

This patch solves the mentioned issue.

gcc/ChangeLog
* config/riscv/bitmanip.md: Updated predicates of bclri<mode>_nottwobits
and bclridisi_nottwobits patterns.
* config/riscv/predicates.md: (not_uimm_extra_bit_or_nottwobits): Adjust
predicate to avoid splitting arith constants.
(const_nottwobits_not_arith_operand): New predicate.

gcc/testsuite
* gcc.target/riscv/zbs-bclri-nottwobits.c: New test.

PR modula2/108121 Re-implement overflow detection for constant literals

This patch fixes the overflow detection for constant literals.
The ZTYPE is changed to int128 (or int64) if int128 is unavailable and
constant literals are built from widest_int. The widest_int is converted
into the tree type and checked for overflow.
m2expr_interpret_integer and append_m2_digit are removed.

gcc/m2/ChangeLog:

PR modula2/108121
* gm2-compiler/M2ALU.mod (Less): Reformatted.
* gm2-compiler/SymbolTable.mod (DetermineSizeOfConstant): Remove
from import.
(ConstantStringExceedsZType): Import.
(GetConstLitType): Re-implement using ConstantStringExceedsZType.
* gm2-gcc/m2decl.cc (m2decl_DetermineSizeOfConstant): Remove.
(m2decl_ConstantStringExceedsZType): New function.
(m2decl_BuildConstLiteralNumber): Re-implement.
* gm2-gcc/m2decl.def (DetermineSizeOfConstant): Remove.
(ConstantStringExceedsZType): New function.
* gm2-gcc/m2decl.h (m2decl_DetermineSizeOfConstant): Remove.
(m2decl_ConstantStringExceedsZType): New function.
* gm2-gcc/m2expr.cc (append_digit): Remove.
(m2expr_interpret_integer): Remove.
(append_m2_digit): Remove.
(m2expr_StrToWideInt): New function.
(m2expr_interpret_m2_integer): Remove.
* gm2-gcc/m2expr.def (CheckConstStrZtypeRange): New function.
* gm2-gcc/m2expr.h (m2expr_StrToWideInt): New function.
* gm2-gcc/m2type.cc (build_m2_word64_type_node): New function.
(build_m2_ztype_node): New function.
(m2type_InitBaseTypes): Call build_m2_ztype_node.
* gm2-lang.cc (gm2_type_for_size): Re-write using early returns.

gcc/testsuite/ChangeLog:

PR modula2/108121
* gm2/pim/fail/largeconst.mod: Increased constant value test
to fail now that cc1gm2 uses widest_int to represent a ZTYPE.
* gm2/pim/fail/largeconst2.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Daily bump.

recog.cc: Correct comments referring to parameter match_len

* recog.cc (peep2_attempt, peep2_update_life): Correct
head-comment description of parameter match_len.

Regenerate gcc.pot

* gcc.pot: Regenerate.

c++: value dependence of by-ref lambda capture [PR108975]

We are still ICEing on the generic lambda version of the testcase from
this PR, even after r13-6743-g6f90de97634d6f, due to the by-ref capture
of the constant local variable 'dim' being considered value-dependent
when regenerating the lambda (at which point processing_template_decl is
set since the lambda is generic), which prevents us from constant folding
its uses.  Later during prune_lambda_captures we end up not thoroughly
walking the body of the lambda and overlook the (non-folded) uses of
'dim' within the array bound and using-decls.

We could fix this by making prune_lambda_captures walk the body of the
lambda more thoroughly so that it finds these uses of 'dim', but ideally
we should be able to constant fold all uses of 'dim' ahead of time and
prune the implicit capture after all.

To that end this patch makes value_dependent_expression_p return false
for such by-ref captures of constant local variables, allowing their
uses to get constant folded ahead of time.  It seems we just need to
disable the predicate's conservative early exit for reference variables
(added by r5-5022-g51d72abe5ea04e) when DECL_HAS_VALUE_EXPR_P.  This
effectively makes us treat by-value and by-ref captures more consistently
when it comes to value dependence.

PR c++/108975

gcc/cp/ChangeLog:

* pt.cc (value_dependent_expression_p) <case VAR_DECL>:
Suppress conservative early exit for reference variables
when DECL_HAS_VALUE_EXPR_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-const11a.C: New test.

riscv: relax splitter restrictions for creating pseudos

[partial addressing of PR/109279]

RISCV splitters have restrictions to not create pesudos due to a combine
limitatation. And despite this being a split-during-combine limitation,
all split passes take the hit due to way define*_split are used in gcc.

With the original combine issue being fixed 61bee6aed2 ("combine: Don't
record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]")
the RV splitters can now be relaxed.

This improves the codegen in general. e.g.

long long f(void) { return 0x0101010101010101ull; }

Before

li a0,0x01010000
addi a0,0x0101
slli a0,a0,16
addi a0,a0,0x0101
slli a0,a0,16
addi a0,a0,0x0101
ret

With patch

li a5,0x01010000
addi a5,a5,0x0101
mv a0,a5
slli a5,a5,32
add a0,a5,a0
ret

This reduces the qemu icounts, even if slightly, across SPEC2017.

500.perlbench_r 0 1235310737733 1231742384460 0.29%
1 744489708820 743515759958
2 714072106766 712875768625 0.17%
502.gcc_r 0 197365353269 197178223030
1 235614445254 235465240341
2 226769189971 226604663947
3 188315686133 188123584015
4 289372107644 289187945424
503.bwaves_r 0 326291538768 326291539697
1 515809487294 515809488863
2 401647004144 401647005463
3 488750661035 488750662484
505.mcf_r 0 681926695281 681925418147
507.cactuBSSN_r 0 3832240965352 3832226068734
508.namd_r 0 1919838790866 1919832527292
510.parest_r 0 3515999635520 3515878553435
511.povray_r 0 3073889223775 3074758622749
519.lbm_r 0 1194077464296 1194077464041
520.omnetpp_r 0 1014144252460 1011530791131 0.26%
521.wrf_r 0 3966715533120 3966265425092
523.xalancbmk_r 0 1064914296949 1064506711802
525.x264_r 0 509290028335 509258131632
1 2001424246635 2001677767181
2 1914660798226 1914869407575
526.blender_r 0 1726083839515 1725974286174
527.cam4_r 0 2336526136415 2333656336419
531.deepsjeng_r 0 1689007489539 1686541299243 0.15%
538.imagick_r 0 3247960667520 3247942048723
541.leela_r 0 2072315300365 2070248271250
544.nab_r 0 1527909091282 1527906483039
548.exchange2_r 0 2086120304280 2086314757502
549.fotonik3d_r 0 2261694058444 2261670330720
554.roms_r 0 2640547903140 2640512733483
557.xz_r 0 388736881767 386880875636 0.48%
1 959356981818 959993132842
2 547643353034 546374038310 0.23%
997.specrand_fr 0 512881578 512599641
999.specrand_ir 0 512881578 512599641

This is testsuite clean, no regression w/ patch.

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
rv64imafdc/  lp64d/ medlow |    2 /     2 |    1 /     1 |    6 /     1 |
   rv64imac/   lp64/ medlow |    3 /     3 |    1 /     1 |   43 /     8 |
rv32imafdc/ ilp32d/ medlow |    1 /     1 |    3 /     2 |    6 /     1 |
   rv32imac/  ilp32/ medlow |    1 /     1 |    3 /     2 |   43 /     8 |

This came up as part of IRC chat on PR/109279 and was suggested by
Andrew Pinski.

gcc/ChangeLog:

* config/riscv/riscv.md: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.
* config/riscv/riscv.cc: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.
riscv_force_temporary() drop in_splitter arg.
* config/riscv/riscv-protos.h: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

Avoid creating useless debug temporaries

insert_debug_temp_for_var_def has some strange code whereby it creates
debug temporaries for SINGLE_RHS (RHS for gimple_assign_single_p) but
not for other RHS in the same situation.

gcc/
* tree-ssa.cc (insert_debug_temp_for_var_def): Do not create
superfluous debug temporaries for single GIMPLE assignments.

tree-optimization/109609 - correctly interpret arg size in fnspec

By majority vote and a hint from the API name which is
arg_max_access_size_given_by_arg_p this interprets a memory access
size specified as given as other argument such as for strncpy
in the testcase which has "1cO313" as specifying the _maximum_
size read/written rather than the exact size. There are two
uses interpreting it that way already and one differing. The
following adjusts the differing and clarifies the documentation.

PR tree-optimization/109609
* attr-fnspec.h (arg_max_access_size_given_by_arg_p):
Clarify semantics.
* tree-ssa-alias.cc (check_fnspec): Correctly interpret
the size given by arg_max_access_size_given_by_arg_p as
maximum, not exact, size.

* gcc.dg/torture/pr109609.c: New testcase.

'omp scan' struct block seq update for OpenMP 5.x

While OpenMP 5.0 required a single structured block before and after the
'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence,
denoting 2 or more executable statements in OpenMP 5.1 (whoops!) and zero or
more in OpenMP 5.2. This commit updates C/C++ to accept zero statements (but
till requires the '{' ... '}' for the final-loop-body) and updates Fortran
to accept zero or more than one statements.

If there is no preceeding or succeeding executable statement, a warning is
shown.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.

gcc/fortran/ChangeLog:

* openmp.cc (gfc_resolve_omp_do_blocks): Handle zero
or more than one exec statements before/after 'omp scan'.
* trans-openmp.cc (gfc_trans_omp_do): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/scan-1.c: New test.
* testsuite/libgomp.c/scan-23.c: New test.
* testsuite/libgomp.fortran/scan-2.f90: New test.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/attrs-7.C: Update dg-error/dg-warning.
* gfortran.dg/gomp/loop-2.f90: Likewise.
* gfortran.dg/gomp/reduction5.f90: Likewise.
* gfortran.dg/gomp/reduction6.f90: Likewise.
* gfortran.dg/gomp/scan-1.f90: Likewise.
* gfortran.dg/gomp/taskloop-2.f90: Likewise.
* c-c++-common/gomp/scan-6.c: New test.
* gfortran.dg/gomp/scan-8.f90: New test.

testsuite: Fix up ext-floating2.C on powerpc64-linux

Another testcase that is failing on powerpc64-linux.  The test expects
a diagnostics when float64 && float128 or in another spot when
float32 && float128.  Now, float128 effective target is satisfied on
powerpc64-linux, despite __CPP_FLOAT128_T__ not being defined, because
one needs to add some extra options for it.  I think 32-bit arm has
similar case for float16.

2023-04-25  Jakub Jelinek  <jakub@redhat.com>

* g++.dg/cpp23/ext-floating2.C: Add dg-add-options for
float16, float32, float64 and float128.

aarch64: PR target/PR99195 Annotate more simple integer binary patterns with vcz subst rules

This patch adds more straightforward annotations to some more integer binary ops to
eliminate redundant fmovs around 64-bit SIMD results.

Bootstrapped and tested on aarch64-none-linux.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (orn<mode>3): Rename to...
(orn<mode>3<vczle><vczbe>): ... This.
(bic<mode>3): Rename to...
(bic<mode>3<vczle><vczbe>): ... This.
(<su><maxmin><mode>3): Rename to...
(<su><maxmin><mode>3<vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for orn, bic, max and min.

aarch64: Implement V2DI,V4SI division optabs for TARGET_SVE

Similar to the mulv2di case, we can use SVE instruction to implement the V4SI and V2DI optabs
for signed and unsigned integer division.
This allows us to generate much cleaner code for the testcase than the current:
food:
        fmov    x1, d1
        fmov    x0, d0
        umov    x2, v0.d[1]
        sdiv    x0, x0, x1
        umov    x1, v1.d[1]
        sdiv    x1, x2, x1
        fmov    d0, x0
        ins     v0.d[1], x1
        ret
which now becomes:
food:
        ptrue   p0.b, all
        sdiv    z0.d, p0/m, z0.d, z1.d
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (<su_optab>div<mode>3): New define_expand.
* config/aarch64/iterators.md (VQDIV): New mode iterator.
(vnx2di): New mode attribute.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve-neon-modes_3.c: New test.

testsuite: Fix up ext-floating15.C tests on powerpc64-linux [PR109278]

I've noticed this test FAILs on powerpc64-linux, with
FAIL: g++.dg/cpp23/ext-floating15.C -std=gnu++98 (test for excess errors)
Excess errors:
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:1: error: variable or field 'bar' declared void
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:6: error: expected primary-expression before '_Float128'
and similarly other std versions.
powerpc64-linux is float128 target, but needs to add some options for it.

Fixed by adding them.

2023-04-25 Jakub Jelinek <jakub@redhat.com>

PR c++/109278
* g++.dg/cpp23/ext-floating15.C: Add dg-add-options float128.

rtl-optimization/109585 - alias analysis typo

When r10-514-gc6b84edb6110dd2b4fb improved access path analysis
it introduced a typo that triggers when there's an access to a
trailing array in the first access path leading to false
disambiguation.

PR rtl-optimization/109585
* tree-ssa-alias.cc (aliasing_component_refs_p): Fix typo.

* gcc.dg/torture/pr109585.c: New testcase.

powerpc: Fix up *branch_anddi3_dot for -m32 -mpowerpc64 [PR109566]

The following testcase reduced from newlib ICEs on powerpc-linux,
with -O2 -m32 -mpowerpc64 since r12-6433 PR102239 optimization was
added and on the original testcase since some ranger improvements in
GCC 13 made it no longer latent on newlib.
The problem is that the *branch_anddi3_dot define_insn_and_split
relies on the *rotldi3_mask_dot define_insn_and_split being recognized
during splitting.  The rs6000_is_valid_rotate_dot_mask function checks whether
the mask is a CONST_INT which is a valid mask, but *rotl<mode>3_mask_dot in
addition to checking that it is a valid mask also has
  (<MODE>mode == Pmode || UINTVAL (operands[3]) <= 0x7fffffff)
test in the condition.  For TARGET_64BIT that doesn't add any further
requirements, but for !TARGET_64BIT && TARGET_POWERPC64 if the AND
second operand is larger than INT_MAX it will not be recognized.

The rs6000_is_valid_rotate_dot_mask function is used solely in one spot,
condition of *branch_anddi3_dot, so the following patch adjusts it
to check for that as well.

2023-04-25  Jakub Jelinek  <jakub@redhat.com>

PR target/109566
* config/rs6000/rs6000.cc (rs6000_is_valid_rotate_dot_mask): For
!TARGET_64BIT, don't return true if UINTVAL (mask) << (63 - nb)
is larger than signed int maximum.

* gcc.target/powerpc/pr109566.c: New test.

gcov: add info about "calls" to JSON output format

gcc/ChangeLog:

* doc/gcov.texi: Document the new "calls" field and document
the API bump. Mention also "block_ids" for lines.
* gcov.cc (output_intermediate_json_line): Output info about
calls and extend branches as well.
(generate_results): Bump version to 2.
(output_line_details): Use block ID instead of a non-sensual
index.

gcc/testsuite/ChangeLog:

* g++.dg/gcov/gcov-17.C: Add call to a noreturn function.
* g++.dg/gcov/test-gcov-17.py: Cover new format.
* lib/gcov.exp: Add options for gcov that emit the extra info.

[Committed] Correct zeroextendqihi2 insn length regression on xstormy16.

My recent tweak to the zeroextendqihi2 pattern on xstormy16 incorrectly
handled the case where the operand was a MEM.  MEM operands use a longer
encoding than REG operands, and the incorrect instruction length resulted
in assembler errors (as reported by Jeff Law).  This patch restores the
original length resolving this regression.  Sorry for the inconvenience.
Committed as obvious, after testing that a cross-compiler to xstormy16-elf
builds from x86_64-pc-linux-gnu, and that gcc.c-torture/execute/memset-2.c
no longer causes "operand out of range" issues in gas.  Committed as
obvious.

2023-04-25  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.md (zero_extendqihi2): Restore/fix
length attribute for the first (memory operand) alternative.

aarch64: Leveraging the use of STP instruction for vec_duplicate

The backend pattern for storing a pair of identical values in 32 and
64-bit modes with the machine instruction STP was missing, and
multiple instructions were needed to reproduce this behavior as a
result of failed RTL pattern match in combine pass.

For the test case:

typedef long long v2di __attribute__((vector_size (16)));
typedef int v2si __attribute__((vector_size (8)));

void
foo (v2di *x, long long a)
{
  v2di tmp = {a, a};
  *x = tmp;
}

void
foo2 (v2si *x, int a)
{
  v2si tmp = {a, a};
  *x = tmp;
}

at -O2 on aarch64 gives:

foo:
    stp x1, x1, [x0]
    ret
foo2:
    stp w1, w1, [x0]
    ret

instead of:

foo:
        dup     v0.2d, x1
        str     q0, [x0]
        ret
foo2:
        dup     v0.2s, w1
        str     d0, [x0]
        ret

Bootstrapped and regtested on aarch64-none-linux-gnu.

gcc/
* config/aarch64/aarch64-simd.md(aarch64_simd_stp<mode>): New.
* config/aarch64/constraints.md: Make "Umn" relaxed memory
constraint.
* config/aarch64/iterators.md(ldpstp_vel_sz): New.

gcc/testsuite/
* gcc.target/aarch64/stp_vec_dup_32_64-1.c: New.

Remove default constructor to nan_state.

I think it's best to specify the default behavior of nan_state, since
it's not obvious that nan_state() defaults to TRUE. Also, this avoids
the ugly nan_state(false, false) idiom.

gcc/ChangeLog:

* value-range.cc (frange::set): Adjust constructor.
* value-range.h (nan_state::nan_state): Replace default
constructor with one taking an argument.

MAINTAINERS: add myself to write after approval

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.

Remove obsolete configure code in gnattools

It was recently pointed out that we generate symbolic links to ghost files
when building the GNAT tools, as the mlib-tgt-specific-*.adb files are gone.

gnattools/
* configure.ac (TOOLS_TARGET_PAIRS): Remove obsolete settings.
(EXTRA_GNATTOOLS): Likewise.
* configure: Regenerate.

Pass correct type to irange::contains_p() in ipa-cp.cc.

There is a call to contains_p() in ipa-cp.cc which passes incompatible
types. This currently works because deep in the call chain, the legacy
code uses tree_int_cst_lt which performs the operation with
widest_int. With the upcoming removal of legacy, contains_p() will be
stricter.

gcc/ChangeLog:

* ipa-cp.cc (ipa_range_contains_p): New.
(decide_whether_version_node): Use it.

[PATCH v2] testsuite: Add testcase for sparc ICE [PR105573]

r11-10018-g33914983cf3734c2f8079963ba49fcc117499ef3 fixed PR105312 and added
a test case for target/arm but the duplicate PR105573 has a test case for
target/sparc that was uncommitted until now.

2023-04-21 Sam James <sam@gentoo.org>

PR tree-optimization/105312
PR target/105573
gcc/testsuite/
* gcc.target/sparc/pr105573.c: New test.