Patrick Palka [Sun, 7 May 2023 16:02:16 +0000 (12:02 -0400)]
c++: various code cleanups
* Harden some tree accessor macros and fix a couple of bad
PLACEHOLDER_TYPE_CONSTRAINTS accesses uncovered by this.
* Use strip_innermost_template_args in outer_template_args.
* Add !processing_template_decl early exit tests to some dependence
predicates.
gcc/cp/ChangeLog:
* cp-tree.h (PLACEHOLDER_TYPE_CONSTRAINTS_INFO): Harden via
TEMPLATE_TYPE_PARM_CHECK.
(TPARMS_PRIMARY_TEMPLATE): Harden via TREE_VEC_CHECK.
(TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL): Harden via
TEMPLATE_TEMPLATE_PARM_CHECK.
* cxx-pretty-print.cc (cxx_pretty_printer::simple_type_specifier):
Guard PLACEHOLDER_TYPE_CONSTRAINTS access.
* error.cc (dump_type) <case TEMPLATE_TYPE_PARM>: Use separate
variable to store CLASS_PLACEHOLDER_TEMPLATE result.
* pt.cc (outer_template_args): Use strip_innermost_template_args.
(any_type_dependent_arguments_p): Exit early if
!processing_template_decl. Use range-based for.
(any_dependent_template_arguments_p): Likewise.
Patrick Palka [Sun, 7 May 2023 15:57:22 +0000 (11:57 -0400)]
c++: parenthesized -> resolving to static data member [PR98283]
Here we're neglecting to propagate parenthesized-ness when the
member access (this->m) resolves to a static data member (and
thus finish_class_member_access_expr yields a VAR_DECL instead
of a COMPONENT_REF).
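Parenthesized-ness is observable through decltype, which is why it must survive substitution. A minimal illustration (not the PR's testcase) of the standard behavior the patch preserves:

```cpp
#include <type_traits>

struct A {
  static int m;
  template <class T>
  void f() {
    // Unparenthesized: decltype(this->m) is the member's declared type, int.
    static_assert(std::is_same<decltype(this->m), int>::value, "");
    // Parenthesized: the expression is an lvalue, so decltype yields int&.
    // REF_PARENTHESIZED_P is what lets the front end remember the parentheses
    // across substitution, even when this->m resolves to the static member's
    // VAR_DECL rather than a COMPONENT_REF.
    static_assert(std::is_same<decltype((this->m)), int&>::value, "");
  }
};
int A::m = 0;
```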
PR c++/98283
gcc/cp/ChangeLog:
* pt.cc (tsubst_copy_and_build) <case COMPONENT_REF>: Propagate
REF_PARENTHESIZED_P more generally via force_paren_expr.
* semantics.cc (force_paren_expr): Document default argument.
Patrick Palka [Sun, 7 May 2023 15:54:21 +0000 (11:54 -0400)]
c++: bound ttp in lambda function type [PR109651]
After r14-11-g2245459c85a3f4 we now coerce the template arguments of a
bound ttp again after level-lowering it. Notably a level-lowered ttp
doesn't have DECL_CONTEXT set, so during this coercion we fall back to
using current_template_parms to obtain the relevant set of in-scope
parameters.
But it turns out current_template_parms isn't properly set when
substituting the function type of a generic lambda, and so if the type
contains bound ttps that need to be lowered we'll crash during their
attempted coercion. Specifically in the first testcase below,
current_template_parms during the lambda type substitution (with T=int)
is "1 U" instead of the expected "2 TT, 1 U", and we crash when level
lowering TT<int>.
Ultimately the problem is that tsubst_lambda_expr does things in the
wrong order: we ought to substitute (and install) the in-scope template
parameters _before_ substituting anything that may use those template
parameters (such as the function type of a generic lambda). This patch
corrects this substitution order.
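A hedged sketch of the kind of code involved (this is an illustration, not the PR's testcases): a generic lambda whose parameter type uses a bound ttp TT<T>, so substituting T=int into the lambda's function type requires level-lowering TT.

```cpp
// TT is a template template parameter of the enclosing scope; the generic
// lambda (auto parameter) gets its own template parameter level, so TT<T>
// inside its function type must be level-lowered during substitution.
template <template <class> class TT>
struct wrap {
  template <class T>
  static auto f() {
    return [](TT<T>, auto y) { return y; };  // generic lambda, bound ttp TT<T>
  }
};

template <class> struct S {};

int g() { return wrap<S>::f<int>()(S<int>{}, 42); }
```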
PR c++/109651
gcc/cp/ChangeLog:
* pt.cc (coerce_template_args_for_ttp): Mention we can hit the
current_template_parms fallback when level-lowering a bound ttp.
(tsubst_template_decl): Add lambda_tparms parameter. Prefer to
use lambda_tparms instead of substituting DECL_TEMPLATE_PARMS.
(tsubst_decl) <case TEMPLATE_DECL>: Pass NULL_TREE as lambda_tparms
to tsubst_template_decl.
(tsubst_lambda_expr): For a generic lambda, substitute
DECL_TEMPLATE_PARMS and set current_template_parms to it
before substituting the function type. Pass the substituted
DECL_TEMPLATE_PARMS as lambda_tparms to tsubst_template_decl.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-generic-ttp1.C: New test.
* g++.dg/cpp2a/lambda-generic-ttp2.C: New test.
Andrew Pinski [Sun, 7 May 2023 01:38:17 +0000 (01:38 +0000)]
Fix aarch64/109762: push_options/push_options does not work sometimes
aarch64_isa_flags (and aarch64_asm_isa_flags) are both aarch64_feature_flags (uint64_t)
but since r12-8000-g14814e20161d, they are saved/restored as unsigned long. This
does not make a difference for LP64 targets, but on ILP32 and LLP64IL32 targets
it means they do not get restored correctly.
This patch changes over to use aarch64_feature_flags instead of unsigned long.
Committed as obvious after a bootstrap/test.
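The bug class is easy to model: saving a 64-bit flag set through a narrower integer type silently drops the high bits. In this sketch a 32-bit type stands in for a 32-bit unsigned long (the types here are illustrative, not the actual aarch64 code):

```cpp
#include <cstdint>

using aarch64_feature_flags = uint64_t;          // the real typedef's width
using narrow_save_type = uint32_t;               // stand-in for 32-bit "long"

// Broken shape: round-tripping through the narrow type loses high flags.
aarch64_feature_flags restore_narrow(aarch64_feature_flags f) {
  narrow_save_type saved = (narrow_save_type) f; // high 32 bits lost here
  return saved;
}

// The fix's shape: save/restore in the full-width flags type.
aarch64_feature_flags restore_wide(aarch64_feature_flags f) {
  aarch64_feature_flags saved = f;
  return saved;
}
```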
gcc/ChangeLog:
PR target/109762
* config/aarch64/aarch64-builtins.cc (aarch64_simd_switcher::aarch64_simd_switcher):
Change argument type to aarch64_feature_flags.
* config/aarch64/aarch64-protos.h (aarch64_simd_switcher): Change
constructor argument type to aarch64_feature_flags.
Change m_old_asm_isa_flags to be aarch64_feature_flags.
Patrick Palka [Sun, 7 May 2023 14:24:52 +0000 (10:24 -0400)]
c++: non-dep init folding and access checking [PR109480]
enforce_access currently checks processing_template_decl to decide
whether to defer the given access check until instantiation time.
But using this flag is unreliable because it gets cleared during e.g.
non-dependent initializer folding, and so can lead to premature access
check failures as in the below testcase. It seems better to check
current_template_parms instead.
PR c++/109480
gcc/cp/ChangeLog:
* semantics.cc (enforce_access): Check current_template_parms
instead of processing_template_decl when deciding whether to
defer the access check.
Patrick Palka [Sun, 7 May 2023 14:24:49 +0000 (10:24 -0400)]
c++: potentiality of templated memfn call [PR109480]
Here we're incorrectly deeming the templated call a.g() inside b's
initializer as potentially constant, despite g being non-constexpr,
which leads to us needlessly instantiating the initializer ahead of time
and which subsequently triggers a bug in access checking deferral (to be
fixed by the follow-up patch).
This patch fixes this by calling get_fns earlier during CALL_EXPR
potentiality checking so that when we extract a FUNCTION_DECL out of a
templated member function call (whose overall callee is typically a
COMPONENT_REF) we do the usual constexpr-eligibility checking for it.
In passing, I noticed the nearby special handling of the object argument
of a non-static member function call is effectively the same as the
generic argument handling a few lines below. So this patch just gets
rid of this special handling; otherwise we'd have to adapt it to handle
templated versions of such calls.
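A hedged sketch of the construct described above (not the PR's actual testcase): inside a template, the callee of a.g() is a COMPONENT_REF, and get_fns must dig out the FUNCTION_DECL for g so that, g being non-constexpr, the initializer is not deemed potentially constant.

```cpp
struct A {
  int g() { return 1; }        // deliberately not constexpr
};

template <class T>
int f() {
  A a;
  int b = a.g();               // templated member call; since g is not
                               // constexpr, this initializer must not be
                               // treated as potentially constant
  return b;
}

int call_f() { return f<int>(); }
```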
PR c++/109480
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>:
Reorganize to call get_fns sooner. Remove special handling of
the object argument of a non-static member function call. Remove
dead store to 'fun'.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/noexcept59.C: Make e() constexpr so that the
expected "without object" diagnostic isn't replaced by a
"call to non-constexpr function" diagnostic.
* g++.dg/template/non-dependent25.C: New test.
Jiufu Guo [Wed, 4 Jan 2023 06:27:30 +0000 (14:27 +0800)]
rs6000: Load high and low part of 64bit constant independently
Compared with the previous version, this patch only updates the comments.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608293.html
For a complicated 64-bit constant, below is one instruction sequence
to build it:
lis 9,0x800a
ori 9,9,0xabcd
sldi 9,9,32
oris 9,9,0xc167
ori 9,9,0xfa16
while we can also use the below sequence to build it:
lis 9,0xc167
lis 10,0x800a
ori 9,9,0xfa16
ori 10,10,0xabcd
rldimi 9,10,32,0
This sequence uses 2 registers to build the high and low parts first,
and then merges them.
In terms of parallelism, this sequence would be faster. (Of course, it
uses 1 more register, with potential register pressure.)
The two-register parallel instruction sequence can be generated only if
can_create_pseudo_p. Otherwise, the one-register sequence is generated.
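The two sequences can be checked against each other with a scalar simulation (the helper below models lis's sign-extended immediate load and the final rldimi merge; it is an illustration, not compiler code):

```cpp
#include <cstdint>

// lis: load imm << 16, sign-extended to 64 bits.
static uint64_t lis(uint16_t imm) {
  return (uint64_t)(int64_t)(int32_t)((uint32_t)imm << 16);
}

uint64_t build_serial() {                 // lis/ori/sldi/oris/ori
  uint64_t r = lis(0x800a);
  r |= 0xabcd;                            // ori
  r <<= 32;                               // sldi 32
  r |= (uint64_t)0xc167 << 16;            // oris
  r |= 0xfa16;                            // ori
  return r;
}

uint64_t build_parallel() {               // two registers, then merge
  uint64_t lo = lis(0xc167) | 0xfa16;     // lis/ori in r9
  uint64_t hi = lis(0x800a) | 0xabcd;     // lis/ori in r10
  return (lo & 0xffffffffu) | (hi << 32); // rldimi 9,10,32,0
}
```

Both build the constant 0x800aabcdc167fa16 from the commit message.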
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Generate
more parallel code if can_create_pseudo_p.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/parall_5insn_const.c: New test.
Roger Sayle [Sun, 7 May 2023 06:52:15 +0000 (07:52 +0100)]
Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.
Following up on posts/reviews by Segher and Uros, there's some question
over why the middle-end's lower subreg pass emits a clobber (of a
multi-word register) into the instruction stream before emitting the
sequence of moves of the word-sized parts. This clobber interferes
with (LRA) register allocation, preventing the multi-word pseudo from
remaining in the same hard registers. This patch eliminates this
(presumably superfluous) clobber and thereby improves register allocation.
A concrete example of the observed improvement is PR target/43644.
For the test case:
__int128 foo(__int128 x, __int128 y) { return x+y; }
on x86_64-pc-linux-gnu, gcc -O2 currently generates:
RISC-V: autovec: Verify that GET_MODE_NUNITS is a multiple of 2.
While working on autovectorizing for the RISCV port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
where GET_MODE_NUNITS is equal to one.
Tested on RISCV and x86_64-linux-gnu. Okay?
gcc/
* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
Xi Ruoyao [Sun, 23 Apr 2023 12:52:22 +0000 (20:52 +0800)]
LoongArch: Enable shrink wrapping
This commit implements the target macros for shrink wrapping of function
prologues/epilogues on LoongArch.
Bootstrapped and regtested on loongarch64-linux-gnu. I don't have
access to SPEC CPU, so I hope the reviewer can perform a benchmark to see
if there is real benefit.
Xi Ruoyao [Sat, 15 Apr 2023 11:55:50 +0000 (19:55 +0800)]
build: Use -nostdinc generating macro_list [PR109522]
This prevents a spurious message building a cross-compiler when target
libc is not installed yet:
cc1: error: no include path in which to search for stdc-predef.h
As stdc-predef.h was added to define __STDC_* macros by libc, it's
unlikely the header will ever contain bad definitions without the "__"
prefix, so it should be safe.
gcc/ChangeLog:
PR other/109522
* Makefile.in (s-macro_list): Pass -nostdinc to
$(GCC_FOR_TARGET).
* config/riscv/riscv-protos.h (preferred_simd_mode): New function.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
(preferred_simd_mode): Ditto.
* config/riscv/riscv.cc (riscv_get_arg_info): Handle RVV type in function arg.
(riscv_convert_vector_bits): Adjust for RVV auto-vectorization.
(riscv_preferred_simd_mode): New function.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New target hook support.
* config/riscv/vector.md: Add autovec.md.
* config/riscv/autovec.md: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add testcases for RVV auto-vectorization.
* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/scalable-1.c: New test.
* gcc.target/riscv/rvv/autovec/template-1.h: New test.
* gcc.target/riscv/rvv/autovec/v-1.c: New test.
* gcc.target/riscv/rvv/autovec/v-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.
Dan Horák [Wed, 3 May 2023 19:29:09 +0000 (14:29 -0500)]
libffi: fix handling of homogeneous float128 structs (#689)
If there is a homogeneous struct with float128 members, they should be
copied to vector register save area. The current code incorrectly copies
only the value of the first member, not increasing the pointer with each
iteration. Fix this.
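The shape of the fix is a classic one (the names below are hypothetical, not libffi's actual code): when flattening a homogeneous struct into a register save area, the destination pointer must advance with each member, otherwise every member overwrites the first slot.

```cpp
#include <cstring>

// Copy nmemb members of memb_size bytes each into the save area,
// advancing both pointers per iteration (the missing increment the
// commit adds).
void copy_members(char *save_area, const char *src,
                  unsigned nmemb, unsigned memb_size) {
  for (unsigned i = 0; i < nmemb; i++) {
    std::memcpy(save_area, src, memb_size);
    save_area += memb_size;
    src += memb_size;
  }
}
```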
Jakub Jelinek [Sat, 6 May 2023 08:57:41 +0000 (10:57 +0200)]
gimple-range-op: Improve handling of sin/cos ranges
Similarly to the earlier sqrt patch, this patch attempts to improve
sin/cos ranges. As the functions are periodic, for the reverse range
there is not much we can do (but I've discovered I forgot to take
into account the boundary ulps for the discovery of impossible result
ranges). For fold_range, we can do something only if the range is
narrow enough (narrower than 2*pi). The patch computes the value of
the functions (taking ulps into account) and also computes the derivative
to find out whether the function is increasing or decreasing at the boundaries, and
from that it figures out if the result range should be
[min (fn (lb), fn (ub)), max (fn (lb), fn (ub))] or if it needs to be
extended to 1 (actually using +Inf) and/or -1 (actually using -Inf) because
there must be a local minimum and/or maximum in the range.
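A rough sketch of the fold_range idea for sin, ignoring the ulp error bounds and, for simplicity, assuming ub - lb < pi so at most one extremum can lie inside the interval (the real patch handles intervals up to 2*pi):

```cpp
#include <cmath>
#include <algorithm>
#include <utility>

// Start from [min(sin lb, sin ub), max(sin lb, sin ub)] and widen to
// -1 or +1 when the sign of the derivative (cos) at the endpoints shows
// a local minimum or maximum lies inside [lb, ub].
std::pair<double, double> sin_fold_range(double lb, double ub) {
  double lo = std::min(std::sin(lb), std::sin(ub));
  double hi = std::max(std::sin(lb), std::sin(ub));
  if (std::cos(lb) > 0 && std::cos(ub) < 0)  // rising then falling: max inside
    hi = 1.0;
  if (std::cos(lb) < 0 && std::cos(ub) > 0)  // falling then rising: min inside
    lo = -1.0;
  return {lo, hi};
}
```

For example, [1.0, 2.0] contains pi/2, so the upper bound is widened to 1 even though neither endpoint reaches it.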
2023-05-06 Jakub Jelinek <jakub@redhat.com>
* real.h (dconst_pi): Define.
(dconst_e_ptr): Formatting fix.
(dconst_pi_ptr): Declare.
* real.cc (dconst_pi_ptr): New function.
* gimple-range-op.cc (cfn_sincos::fold_range): Intersect the generic
boundaries range with range computed from sin/cos of the particular
bounds if the argument range is shorter than 2*pi.
(cfn_sincos::op1_range): Take bulps into account when determining
which result ranges are always invalid or behave like known NAN.
Aldy Hernandez [Wed, 3 May 2023 08:48:41 +0000 (10:48 +0200)]
Remove type from vrange_storage::equal_p.
The equal_p method in vrange_storage is only used to compare ranges
that are the same type. No sense passing the type if it can be
determined from the range being compared.
gcc/ChangeLog:
* gimple-range-cache.cc (sbr_sparse_bitmap::set_bb_range): Do not
pass type to vrange_storage::equal_p.
* value-range-storage.cc (vrange_storage::equal_p): Remove type.
(irange_storage::equal_p): Same.
(frange_storage::equal_p): Same.
* value-range-storage.h (class frange_storage): Same.
Since we have backward demand fusion in Phase 1, the real demand of "vle8.v" is e32, m4.
However, parse_insn (vle8.v) gives e8, m1, which is not correct.
So this patch changes new_info = new_info.parse_insn (i)
into:
Since m_vector_manager->vector_insn_infos is a member variable of the
pass_vsetvl class, we remove the static void function "local_eliminate_vsetvl_insn"
and make it a member function of pass_vsetvl.
PR target/109748
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Remove it.
(pass_vsetvl::local_eliminate_vsetvl_insn): New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr109748.c: New test.
liuhongt [Tue, 21 Mar 2023 05:35:06 +0000 (13:35 +0800)]
Canonicalize vec_merge when mask is constant.
Use swap_commutative_operands_p for canonicalization. When both values
have the same operand precedence, the first bit in the mask should
select the first operand.
The canonicalization should help backends with pattern matching, i.e. the
x86 backend has lots of vec_merge patterns; combine will create either
form of vec_merge (mask or inverted mask), so the backend needs to add 2
patterns to match exactly 1 instruction. The canonicalization can
simplify the 2 patterns to 1.
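A scalar model of vec_merge semantics makes the redundancy concrete (an illustration, not RTL code): bit i of the mask selects element i from the first operand when set, from the second when clear, so vec_merge(a, b, mask) equals vec_merge(b, a, ~mask), and only one canonical form needs a backend pattern.

```cpp
#include <array>
#include <cstdint>
#include <cstddef>

template <std::size_t N>
std::array<int, N> vec_merge(const std::array<int, N> &a,
                             const std::array<int, N> &b, uint32_t mask) {
  std::array<int, N> r{};
  for (std::size_t i = 0; i < N; i++)
    r[i] = ((mask >> i) & 1) ? a[i] : b[i];   // bit set -> first operand
  return r;
}
```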
gcc/ChangeLog:
* combine.cc (maybe_swap_commutative_operands): Canonicalize
vec_merge when mask is constant.
* doc/md.texi: Document vec_merge canonicalization.
Jakub Jelinek [Sat, 6 May 2023 00:35:02 +0000 (02:35 +0200)]
gimple-range-op: Improve handling of sqrt ranges
The previous patch just added basic intrinsic ranges for sqrt
([-0.0, +Inf] +-NAN being the general result range of the function
and [-0.0, +Inf] the general operand range if result isn't NAN etc.),
the following patch intersects those ranges with particular range
computed from argument or result's exact range with the expected
error in ulps taken into account, and adds a helper function (an
frange_arithmetic variant) which can be used by other functions as well.
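A simplified sketch of the idea (ignoring the ulp widening and NAN handling the real patch does): sqrt is monotone, so the particular result range over [lb, ub] with lb >= 0 is just [sqrt(lb), sqrt(ub)], which is then intersected with the generic [-0.0, +Inf] range; op1_range goes the other way by squaring the result bounds.

```cpp
#include <cmath>
#include <utility>

// Particular fold_range bounds for sqrt on a non-negative interval,
// before intersecting with the generic boundaries range.
std::pair<double, double> sqrt_fold_range(double lb, double ub) {
  return {std::sqrt(lb), std::sqrt(ub)};
}
```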
2023-05-06 Jakub Jelinek <jakub@redhat.com>
* value-range.h (frange_arithmetic): Declare.
* range-op-float.cc (frange_arithmetic): No longer static.
* gimple-range-op.cc (frange_mpfr_arg1): New function.
(cfn_sqrt::fold_range): Intersect the generic boundaries range
with range computed from sqrt of the particular bounds.
(cfn_sqrt::op1_range): Intersect the generic boundaries range
with range computed from squared particular bounds.
Jakub Jelinek [Sat, 6 May 2023 00:32:17 +0000 (02:32 +0200)]
build: Replace seq for portability with GNU Make variant
Some hosts like AIX don't have seq command, this patch replaces it
with something that uses just GNU make features we've been using
for this already before for the parallel make check.
2023-05-06 Jakub Jelinek <jakub@redhat.com>
* Makefile.in (check_p_numbers): Rename to one_to_9999, move
earlier with helper variables also renamed.
(MATCH_SPLITS_SEQ): Use $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_9999))
instead of $(shell seq 1 $(NUM_MATCH_SPLITS)).
(check_p_subdirs): Use $(one_to_9999) instead of $(check_p_numbers).
Unfortunately, this doesn't cause a performance improvement for coremark,
but it happens a few times in newlib, just enough to affect coremark
0.01% by size (or 4 bytes, and three cycles; __fwalk_sglue and
__vfiprintf_r shrink by two bytes each).
gcc:
* config/cris/cris.md (splitop): Add PLUS.
* config/cris/cris.cc (cris_split_constant): Also handle
PLUS when a split into two insns may be useful.
gcc/testsuite:
* gcc.target/cris/peep2-addsplit1.c: New test.
CRIS: peephole2 a move of constant followed by and of same register
While moves of constants into registers are separately
optimizable, a combination of a move with a subsequent "and"
is slightly preferable even if the move can be generated
with the same number (and timing) of insns, as moves of
"just" registers are eliminated now and then in different
passes, loosely speaking. This movandsplit1 pattern feeds
into the opsplit1/AND peephole2, with matching occurrences
observed in the floating point functions in libgcc. Also, a
test-case to fit. Coremark improvements are unimpressive:
less than 0.0003% speed, 0.1% size.
But that was pre-LRA; after the switch to LRA this peephole2
doesn't match anymore (for any of coremark, local tests,
libgcc and newlib libc) and the test-case passes with and
without the patch. Still, there's no apparent reason why
LRA prefers "move R1,R2" "and I,R2" to "move I,R1" "and
R1,R2", or why that wouldn't "randomly" change (also seen
with other operations than "and"). Thus committed.
gcc:
* config/cris/cris.md (movandsplit1): New define_peephole2.
gcc/testsuite:
* gcc.target/cris/peep2-movandsplit1.c: New test.
Observed after opsplit1 with AND in libgcc floating-point
functions, like the first spottings of opsplit1/AND
opportunities. Two patterns are nominally needed, as the
peephole2 optimizer continues from the *first replacement*
insn, not from a minimum context for general matching; one
that includes it as the last match.
But, the "free-standing" opportunity (three shifts) didn't
match by itself in a gcc build of libraries plus running the
test-suite, and was thus deemed uninteresting and left out.
(As expected; if it had matched, that'd have indicated a
previously missed optimization or other problem elsewhere.)
Only the one that includes the previous define_peephole2
that may generate the sequence (i.e. opsplit1/AND), matches
easily.
Coremark results aren't impressive though: 0.003%
improvement in speed and slightly less than 0.1% in size.
A testcase is added to match and another one to cover a case
of movulsr checking that it's used; it's preferable to
lsrandsplit when both would match.
gcc:
* config/cris/cris.md (lsrandsplit1): New define_peephole2.
gcc/testsuite:
* gcc.target/cris/peep2-lsrandsplit1.c,
gcc.target/cris/peep2-movulsr2.c: New tests.
I was a bit surprised when my newly-added define_peephole2 didn't
match, but it was because it was expected to partially match the
generated output of a previous define_peephole2, which matched and
modified the last insn of a sequence to be matched. I had assumed
that the algorithm backed-up the size of the match-buffer, thereby
exposing newly created opportunities *with sufficient context* to all
define_peephole2's. While things can change in that direction, let's
start with documenting the current state.
* doc/md.texi (define_peephole2): Document order of scanning.
Harald Anlauf [Fri, 5 May 2023 19:22:12 +0000 (21:22 +0200)]
Fortran: overloading of intrinsic binary operators [PR109641]
Fortran allows overloading of intrinsic operators also for operands of
numeric intrinsic types. The intrinsic operator versions are used
according to the rules of F2018 table 10.2 and imply type conversion as
long as the operand ranks are conformable. Otherwise no type conversion
shall be performed to allow the resolution of a matching user-defined
operator.
gcc/fortran/ChangeLog:
PR fortran/109641
* arith.cc (eval_intrinsic): Check conformability of ranks of operands
for intrinsic binary operators before performing type conversions.
* gfortran.h (gfc_op_rank_conformable): Add prototype.
* resolve.cc (resolve_operator): Check conformability of ranks of
operands for intrinsic binary operators before performing type
conversions.
(gfc_op_rank_conformable): New helper function to compare ranks of
operands of binary operator.
gcc/testsuite/ChangeLog:
PR fortran/109641
* gfortran.dg/overload_5.f90: New test.
Pan Li [Thu, 4 May 2023 09:11:18 +0000 (17:11 +0800)]
RISC-V: Legitimise the const0_rtx for RVV indexed load/store
This patch tries to legitimise the const0_rtx (aka the zero register)
as the base register for the RVV indexed load/store instructions
by allowing the const as the operand of the indexed RTL pattern.
Then the underlying combine pass will try to perform the const
propagation.
Before this patch:
li a5,0 <- can be eliminated.
vl1re32.v v1,0(a1)
vsetvli zero,a2,e32,m1,ta,ma
vluxei32.v v1,(a5),v1 <- can propagate the const 0 to a5 here.
vs1r.v v1,0(a0)
ret
After this patch:
test_vluxei32_v_i32m1_shortcut:
vl1re32.v v1,0(a1)
vsetvli zero,a2,e32,m1,ta,ma
vluxei32.v v1,(0),v1
vs1r.v v1,0(a0)
ret
As above, this patch allows the const 0 (aka the zero
register) to be propagated to the base register of the RVV indexed load
in the combine pass. This may benefit the underlying RVV auto-vectorization.
gcc/ChangeLog:
* config/riscv/vector.md: Allow const as the operand of RVV
indexed load/store.
Christophe Lyon [Wed, 8 Feb 2023 21:18:28 +0000 (21:18 +0000)]
arm: [MVE intrinsics] add support for MODE_r
A few intrinsics have an additional mode (MODE_r), which does not
always support the same set of predicates as MODE_none and MODE_n.
For vqshlq they are the same, but for vshlq they are not.
Indeed we have:
vqshlq
vqshlq_m
vqshlq_n
vqshlq_m_n
vqshlq_r
vqshlq_m_r
Juzhe-Zhong [Fri, 5 May 2023 06:33:44 +0000 (14:33 +0800)]
RISC-V: Fix PR109615
This patch fixes the following case:
void f (int8_t * restrict in, int8_t * restrict out, int n, int m, int cond)
{
size_t vl = 101;
if (cond)
vl = m * 2;
else
vl = m * 2 * vl;
for (size_t i = 0; i < n; i++)
{
vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
__riscv_vse8_v_i8mf8 (out + i, v, vl);
vbool64_t mask = __riscv_vlm_v_b64 (in + i + 100, vl);
vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tumu (mask, v, in + i + 100, vl);
__riscv_vse8_v_i8mf8 (out + i + 100, v2, vl);
}
for (size_t i = 0; i < n; i++)
{
vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300, vl);
__riscv_vse8_v_i8mf8 (out + i + 300, v, vl);
}
}
The value of "vl" comes from different blocks, so it is wrapped in a PHI node in each
block.
In the first loop, the "vl" source is a PHI node from bb 4.
In the second loop, the "vl" source is a PHI node from bb 5.
Since bb 5 is dominated by bb 4, the PHI input of "vl" in the second loop is the PHI node of "vl"
in bb 4.
So when the 2 "vl" PHI nodes are both degenerate PHI nodes (phi->num_inputs () == 1) and their only
inputs are the same, it's safe for us to consider them compatible.
This patch only optimizes degenerate PHIs, since that is a safe and simple optimization.
Non-degenerate PHIs are considered incompatible unless the PHIs are the same in RTL-SSA.
TODO: non-degenerate PHIs are complicated; we can support them when necessary in the future.
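The compatibility rule for degenerate PHIs can be sketched as follows (hypothetical types standing in for the RTL-SSA API, not the pass's actual code):

```cpp
#include <vector>

// Stand-in for a PHI node: a list of input definitions (here just ints
// identifying the defining insn/set).
struct phi {
  std::vector<int> inputs;
  unsigned num_inputs() const { return inputs.size(); }
};

// Two degenerate PHIs (a single input each) are compatible when their
// single input is the same definition.
bool degenerate_phis_compatible(const phi &a, const phi &b) {
  return a.num_inputs() == 1 && b.num_inputs() == 1
         && a.inputs[0] == b.inputs[0];
}
```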
Tamar Christina [Fri, 5 May 2023 12:42:17 +0000 (13:42 +0100)]
match.pd: Use splits in makefile and make configurable.
This updates the build system to split up match.pd files into chunks of 10.
This also introduces a new flag --with-matchpd-partitions which can be used to
change the number of partitions.
For the analysis of why 10 please look at the previous patch in the series.
gcc/ChangeLog:
PR bootstrap/84402
* Makefile.in (NUM_MATCH_SPLITS, MATCH_SPLITS_SEQ,
GIMPLE_MATCH_PD_SEQ_SRC, GIMPLE_MATCH_PD_SEQ_O,
GENERIC_MATCH_PD_SEQ_SRC, GENERIC_MATCH_PD_SEQ_O): New.
(OBJS, MOSTLYCLEANFILES, .PRECIOUS): Use them.
(s-match): Split into s-generic-match and s-gimple-match.
* configure.ac (with-matchpd-partitions,
DEFAULT_MATCHPD_PARTITIONS): New.
* configure: Regenerate.
Following on from Richi's RFC[1] this is another attempt to split up match.pd
into multiple gimple-match and generic-match files. This version is fully
automated and requires no human intervention.
First things first, some perf numbers. The following shows the effect of the
patch on my desktop doing parallel compilation of gimple-match:
As can be seen there seems to be a point of diminishing returns in doing splits.
This comes from the fact that these match files consume a sizeable amount of
headers. At a certain point the parsing overhead of the headers dominates and
you start losing the gains.
As such from this I've made the default 10 splits per file to allow for some
room for growth in the future without needing changes to the split amount.
Since 5-10 show roughly the same gains it means we can afford to double the
file sizes before we need to up the split amount. This can be controlled
by the configure parameter --with-matchpd-partitions=.
The reason gimple-match-1.cc is so large is because it got allocated a very
large function: gimple_simplify_NE_EXPR.
Because of these sporadically large functions the allocation to a split happens
based on the amount of data already written to a split instead of just a simple
round robin allocation (though the patch supports that too). This means that
once gimple_simplify_NE_EXPR is allocated to gimple-match-1.cc nothing uses it
again until the rest of the files catch up.
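The size-based allocation policy described above can be sketched in a few lines (an illustration, not genmatch's actual code): each generated function goes to the partition with the fewest bytes written so far, so one huge function doesn't unbalance the build.

```cpp
#include <vector>
#include <cstddef>
#include <algorithm>

// Return the index of the output partition with the least data written,
// i.e. where the next generated function should be emitted.
std::size_t pick_partition(const std::vector<std::size_t> &bytes_written) {
  return std::min_element(bytes_written.begin(), bytes_written.end())
         - bytes_written.begin();
}
```

After a large function lands in one partition, subsequent functions fill the others until they catch up, matching the behavior described for gimple_simplify_NE_EXPR.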
To support this split a new header file *-match-auto.h is generated to allow
the individual files to compile separately.
Lastly for the auto generated files I use pragmas to silence the unused
predicate warnings instead of the previous Makefile way because I couldn't find
a way to set them without knowing the number of split files beforehand.
Finally with this change, bootstrap time has dropped 8 minutes on AArch64.
PR bootstrap/84402
* genmatch.cc (emit_func, SIZED_BASED_CHUNKS, get_out_file): New.
(decision_tree::gen): Accept list of files instead of single and update
to write function definition to header and main file.
(write_predicate): Likewise.
(write_header): Emit pragmas and new includes.
(main): Create file buffers and cleanup.
(showUsage, write_header_includes): New.
Tamar Christina [Fri, 5 May 2023 12:37:49 +0000 (13:37 +0100)]
genmatch: split shared code to gimple-match-exports.cc
In preparation for automatically splitting match.pd files, I split the
non-static helper functions that are shared between the match.pd functions off
to another file.
This file can be compiled in parallel and also allows us to later avoid
duplicate symbols errors.
Tamar Christina [Fri, 5 May 2023 12:36:43 +0000 (13:36 +0100)]
match.pd: CSE the dump output check.
This is a small improvement in QoL codegen for match.pd to save time not
re-evaluating the condition for printing debug information in every function.
There is a small but consistent runtime and compile time win here. The runtime
win comes from not having to evaluate the condition over again, and on Arm platforms
we now use the new test-and-branch support for booleans to only have a single
instruction here.
gcc/ChangeLog:
PR bootstrap/84402
* genmatch.cc (decision_tree::gen, write_predicate): Generate new
debug_dump var.
(dt_simplify::gen_1): Use it.
Tamar Christina [Fri, 5 May 2023 12:35:17 +0000 (13:35 +0100)]
match.pd: don't emit label if not needed
This is a small QoL codegen improvement for match.pd to not emit labels when
they are not needed. The codegen is nice and there is a small (but consistent)
improvement in compile time.
gcc/ChangeLog:
PR bootstrap/84402
* genmatch.cc (dt_simplify::gen_1): Only emit labels if used.
Richard Biener [Fri, 5 May 2023 06:54:28 +0000 (08:54 +0200)]
tree-optimization/109735 - conversion for vectorized pointer-diff
There's handling in vectorizable_operation for POINTER_DIFF_EXPR
requiring conversion of the result of the unsigned operation to
a signed type. But that's conditional on the "default" kind of
vectorization. In this PR it's shown the emulated vector path
needs it and I think the masked operation case will, too (though
we might eventually never mask an integral MINUS_EXPR). So the
following makes that handling unconditional.
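The conversion being made unconditional corresponds to the scalar shape of a pointer difference (an illustration of the semantics, not the vectorizer code): subtract in an unsigned type, where wraparound is well-defined, then convert the result to the signed type so negative differences come out right.

```cpp
#include <cstdint>
#include <cstddef>

// Unsigned subtraction followed by conversion to the signed result type,
// the same final conversion vectorized POINTER_DIFF_EXPR needs on every
// code path (default, emulated, masked).
std::ptrdiff_t pointer_diff(const char *a, const char *b) {
  std::uintptr_t ua = (std::uintptr_t) a, ub = (std::uintptr_t) b;
  return (std::ptrdiff_t)(ua - ub);
}
```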
PR tree-optimization/109735
* tree-vect-stmts.cc (vectorizable_operation): Perform
conversion for POINTER_DIFF_EXPR unconditionally.
Uros Bizjak [Fri, 5 May 2023 12:10:18 +0000 (14:10 +0200)]
i386: Introduce mulv2si3 instruction
For SSE2 targets the expander unpacks input elements into the correct
position in the V4SI vector and emits PMULUDQ instruction. The output
elements are then shuffled back to their positions in the V2SI vector.
For SSE4 targets PMULLD instruction is emitted directly.
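A scalar model of the SSE2 path (element placement is illustrative; this is not the expander's code): PMULUDQ multiplies the even 32-bit lanes of two V4SI vectors into 64-bit products, and keeping the low 32 bits of each product gives the V2SI multiply result.

```cpp
#include <cstdint>

// Each V2SI element is placed in an even V4SI lane; one PMULUDQ lane
// produces a 64-bit product, and the shuffle back keeps its low half.
void mulv2si_sse2_model(const uint32_t a[2], const uint32_t b[2],
                        uint32_t out[2]) {
  for (int i = 0; i < 2; i++) {
    uint64_t prod = (uint64_t) a[i] * b[i];  // one PMULUDQ lane
    out[i] = (uint32_t) prod;                // take the low 32 bits
  }
}
```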
gcc/ChangeLog:
* config/i386/mmx.md (mulv2si3): New expander.
(*mulv2si3): New insn pattern.
Alexandre Oliva [Fri, 5 May 2023 11:28:41 +0000 (08:28 -0300)]
[libstdc++] [testsuite] xfail double-prec from_chars for ldbl
When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail. Mark them as such on aarch64-*-vxworks.
for libstdc++-v3/ChangeLog
* testsuite/20_util/from_chars/4.cc: Skip long double test06
on aarch64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
aarch64-vxworks.
Tobias Burnus [Fri, 5 May 2023 09:27:32 +0000 (11:27 +0200)]
nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global
variables by NULL if a translation does not contain any executable code. It
works with CUDA 11.1. The code of this commit is about reverse offload;
having NULL values disables the side of reverse offload during image load.
Solution is the same as found by Thomas for a related issue: Adding a dummy
procedure. Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html
As that approach also works here:
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
gcc/
PR libgomp/108098
* config/nvptx/mkoffload.cc (process): Emit dummy procedure
alongside reverse-offload function table to prevent NULL values
of the function addresses.
Andrew Pinski [Thu, 4 May 2023 17:07:50 +0000 (10:07 -0700)]
PHIOPT: Fix diamond case of match_simplify_replacement
So it turns out I messed up checking which edge was true/false for the diamond
form. The edges e0 and e1 here are edges from the merge block, but the
true/false edges are from the conditional block, and with diamond/threeway
there is a bb in between on both edges.
Most of the time, the check that was in match_simplify_replacement happened
to be correct for the diamond form, as most of the time the first edge in
the conditional is the edge for the true side of the conditional.
This is why I didn't see the issue during bootstrap/testing.
I added a fragile gimple testcase which exposed the issue. Since there is
no way to specify the order of the edges in the gimple fe, we have to
have forwprop swap the false/true edges (not their order, just the
true/false flags) and hope no cfg cleanup runs between forwprop and the
first phiopt pass. This is the fragile part, really: it is not that we will
produce wrong code, just that we won't hit what was the failing case.
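For reference, a diamond-form CFG comes from source like the following minimal sketch (not the actual testcase):

```c
#include <assert.h>

/* Diamond form: the conditional bb branches to two intermediate bbs that
   both fall through to the merge bb holding the PHI for x; e0/e1 are the
   merge block's incoming edges, while true/false belong to the conditional
   block.  */
static int
diamond (int c, int a, int b)
{
  int x;
  if (c)
    x = a;	/* true arm */
  else
    x = b;	/* false arm */
  return x;	/* PHI <a, b> at the merge point */
}
```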
OK? Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/109732
gcc/ChangeLog:
* tree-ssa-phiopt.cc (match_simplify_replacement): Fix the selection
of the argtrue/argfalse.
gcc/testsuite/ChangeLog:
* gcc.dg/pr109732.c: New test.
* gcc.dg/pr109732-1.c: New test.
Jason Merrill [Thu, 4 May 2023 22:37:19 +0000 (18:37 -0400)]
Revert "c++: restore instantiate_decl assert"
In the testcase the assert fails because we use one member function from
another while we're in the middle of instantiating them all, which is
perfectly fine. It seems complicated to detect this situation, so let's
remove the assert again.
Uros Bizjak [Thu, 4 May 2023 18:26:12 +0000 (20:26 +0200)]
i386: Tighten ashift to lea splitter operand predicates [PR109733]
The predicates of ashift to lea post-reload splitter were too broad
so the splitter tried to convert the mask shift instruction. Tighten
operand predicates to match only general registers.
gcc/ChangeLog:
PR target/109733
* config/i386/predicates.md (index_reg_operand): New predicate.
* config/i386/i386.md (ashift to lea splitter): Use
general_reg_operand and index_reg_operand predicates.
Gaius Mulley [Thu, 4 May 2023 17:15:59 +0000 (18:15 +0100)]
PR modula2/109729 cannot use a CHAR type as a FOR loop iterator
This patch introduces a new quadruple ArithAddOp which is used in
the construction of FOR loop to ensure that when constant folding
is applied it does not concatenate two constant char operands into
a string constant. Overloading only occurs with constant operands.
gcc/m2/ChangeLog:
PR modula2/109729
* gm2-compiler/M2GenGCC.mod (CodeStatement): Detect
ArithAddOp and call CodeAddChecked.
(ResolveConstantExpressions): Detect ArithAddOp and call
FoldArithAdd.
(FoldArithAdd): New procedure.
(FoldAdd): Refactor to use FoldArithAdd.
* gm2-compiler/M2Quads.def (QuadOperator): Add ArithAddOp.
* gm2-compiler/M2Quads.mod: Remove commented imports.
(QuadFrame): Changed comments to use GNU coding standards.
(ArithPlusTok): New global variable.
(BuildForToByDo): Use ArithPlusTok instead of PlusTok.
(MakeOp): Detect ArithPlusTok and return ArithAddOp.
(WriteQuad): Add ArithAddOp clause.
(WriteOperator): Add ArithAddOp clause.
(Init): Initialize ArithPlusTok.
gcc/testsuite/ChangeLog:
PR modula2/109729
* gm2/pim/run/pass/ForChar.mod: New test.
Kyrylo Tkachov [Thu, 4 May 2023 14:22:04 +0000 (15:22 +0100)]
[2/2] aarch64: Reimplement (R){ADD,SUB}HN2 patterns with standard RTL codes
Similar to the previous patch, this one converts the high-half versions of the patterns.
With this patch we can remove the UNSPEC_* codes involved entirely.
Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur><addsub>hn2<mode>_insn_le):
Rename and reimplement with RTL codes to...
(aarch64_<optab>hn2<mode>_insn_le): .. This.
(aarch64_r<optab>hn2<mode>_insn_le): New pattern.
(aarch64_<sur><addsub>hn2<mode>_insn_be): Rename and reimplement with RTL
codes to...
(aarch64_<optab>hn2<mode>_insn_be): ... This.
(aarch64_r<optab>hn2<mode>_insn_be): New pattern.
(aarch64_<sur><addsub>hn2<mode>): Rename and adjust expander to...
(aarch64_<optab>hn2<mode>): ... This.
(aarch64_r<optab>hn2<mode>): New expander.
* config/aarch64/iterators.md (UNSPEC_ADDHN, UNSPEC_RADDHN,
UNSPEC_SUBHN, UNSPEC_RSUBHN): Delete unspecs.
(ADDSUBHN): Delete.
(sur): Remove handling of the above.
(addsub): Likewise.
Kyrylo Tkachov [Thu, 4 May 2023 14:19:52 +0000 (15:19 +0100)]
[1/2] aarch64: Reimplement (R){ADD,SUB}HN intrinsics with RTL codes
We can implement the halving-narrowing add/sub patterns with standard RTL codes as well rather than relying on unspecs.
This patch handles the low-part ones and the second patch does the high-part ones and removes the unspecs themselves.
The operation ADDHN on V4SI, for example, is represented as (truncate:V4HI ((src1:V4SI + src2:V4SI) >> 16))
and RADDHN as (truncate:V4HI ((src1:V4SI + src2:V4SI + (1 << 15)) >> 16)).
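The lane semantics described above can be modelled in scalar C (a sketch for 32-bit lanes narrowing to 16 bits; helper names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* ADDHN: high half of the sum; RADDHN: rounding variant that adds 1 << 15
   before taking the high half, matching the RTL forms quoted above.  */
static uint16_t
addhn_lane (uint32_t a, uint32_t b)
{
  return (uint16_t) ((a + b) >> 16);
}

static uint16_t
raddhn_lane (uint32_t a, uint32_t b)
{
  return (uint16_t) ((a + b + (1u << 15)) >> 16);
}
```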
Taking this opportunity I specified the patterns returning the narrow mode and annotated them with the
<vczle><vczbe> define_subst rules to get the vec_concat-zero meta-patterns too. This allows us to simplify
the expanders somewhat too. Tests are added to check that the combinations work.
Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur><addsub>hn<mode>_insn_le):
Delete.
(aarch64_<optab>hn<mode>_insn<vczle><vczbe>): New define_insn.
(aarch64_<sur><addsub>hn<mode>_insn_be): Delete.
(aarch64_r<optab>hn<mode>_insn<vczle><vczbe>): New define_insn.
(aarch64_<sur><addsub>hn<mode>): Delete.
(aarch64_<optab>hn<mode>): New define_expand.
(aarch64_r<optab>hn<mode>): Likewise.
* config/aarch64/predicates.md (aarch64_simd_raddsubhn_imm_vec):
New predicate.
into the proper location for OpenACC testing (thanks to Thomas for
spotting my mistake!), and also fixes a few additional problems --
missing diagnostics for non-pointer attaches, and a case where a pointer
was incorrectly dereferenced. Tests are also adjusted for vector-length
warnings on nvidia accelerators.
2023-04-29 Julian Brown <julian@codesourcery.com>
PR fortran/109622
gcc/fortran/
* openmp.cc (resolve_omp_clauses): Add diagnostic for
non-pointer/non-allocatable attach/detach.
* trans-openmp.cc (gfc_trans_omp_clauses): Remove dereference for
pointer-to-scalar derived type component attach/detach. Fix
attach/detach handling for descriptors.
gcc/testsuite/
* gfortran.dg/goacc/pr109622-5.f90: New test.
* gfortran.dg/goacc/pr109622-6.f90: New test.
Andrew Pinski [Fri, 28 Apr 2023 23:21:50 +0000 (16:21 -0700)]
PHIOPT: Improve replace_phi_edge_with_variable for diamond shaped bb
While looking at the differences between what minmax_replacement
and match_simplify_replacement do, I noticed that they sometimes
chose different edges to remove. I decided we should be able to do
better and be able to remove both empty basic blocks in the
case of match_simplify_replacement as that moves the statements.
This also updates the testcases as now match_simplify_replacement
will remove the unused MIN/MAX_EXPR and they were checking for
those.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Handle
diamond form bb with forwarder only empty blocks better.
Andrew Pinski [Tue, 2 May 2023 23:04:00 +0000 (16:04 -0700)]
Move copy_phi_arg_into_existing_phi to common location and use it
While improving replace_phi_edge_with_variable for the diamond formed bb
case, I need a way to copy phi entries from one edge to another as I am
removing a forwarder bb in between. It was pointed out to me that the jump
threading code has copy_phi_arg_into_existing_phi, which I can use.
I also noticed that both gimple_duplicate_sese_tail and
remove_forwarder_block have similar code so it makes sense to use that function
in those two locations too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-threadupdate.cc (copy_phi_arg_into_existing_phi): Move to ...
* tree-cfg.cc (copy_phi_arg_into_existing_phi): Here and remove static.
(gimple_duplicate_sese_tail): Use copy_phi_arg_into_existing_phi instead
of an inline version of it.
* tree-cfgcleanup.cc (remove_forwarder_block): Likewise.
* tree-cfg.h (copy_phi_arg_into_existing_phi): New declaration.
When I added the dce_ssa_names argument, I didn't realize bitmap is a
pointer type, so I used auto_bitmap() as the default argument value. But
instead we can just use nullptr and check that it is non-null
before calling simple_dce_from_worklist.
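The pattern, in a minimal sketch (a C rendering with simplified types; the real code is C++ with a nullptr default argument):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the change: pass NULL when no DCE worklist is wanted and guard
   the call, instead of materializing a default object for every caller.  */
struct worklist { int entries; };

static int
run_pass (struct worklist *dce_ssa_names)
{
  /* ... pass body ... */
  if (dce_ssa_names != NULL)	/* only run DCE when a worklist was supplied */
    {
      ++dce_ssa_names->entries;
      return 1;
    }
  return 0;
}
```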
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Change
the default argument value for dce_ssa_names to nullptr.
Check to make sure dce_ssa_names is a non-nullptr before
calling simple_dce_from_worklist.
Richard Biener [Thu, 4 May 2023 08:06:47 +0000 (10:06 +0200)]
tree-optimization/109721 - emulated vectors
When fixing PR109672 I noticed we let SImode AND through when
target_support_p, even though it isn't word_mode. I didn't want to
change that, but had to catch the case where SImode PLUS is supported
while emulated vectors rely on it being word_mode. The following
makes sure to preserve the word_mode check when !target_support_p
to avoid excessive lowering later, even for bit operations.
PR tree-optimization/109721
* tree-vect-stmts.cc (vectorizable_operation): Make sure
to test word_mode for all !target_support_p operations.
Kyrylo Tkachov [Thu, 4 May 2023 08:42:37 +0000 (09:42 +0100)]
aarch64: PR target/99195 annotate simple ternary ops for vec-concat with zero
We're now moving onto various simple ternary instructions, including some lane forms.
These include intrinsics that map down to mla, mls, fma, aba, bsl instructions.
Tests are added for lane 0 and lane 1 as for some of these instructions the lane 0 variants
use separate simpler patterns that need a separate annotation.
Bootstrapped and tested on aarch64-none-linux-gnu.
Kyrylo Tkachov [Thu, 4 May 2023 08:41:46 +0000 (09:41 +0100)]
aarch64: PR target/99195 annotate more simple binary ops for vec-concat with zero
More pattern annotations and tests to eliminate redundant vec-concat with zero instructions.
These are for the abd family of instructions and the pairwise floating-point max/min and fadd
operations too.
Bootstrapped and tested on aarch64-none-linux-gnu.
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for more binary ops.
* gcc.target/aarch64/simd/pr99195_2.c: Add testing for more binary ops.
Jakub Jelinek [Thu, 4 May 2023 07:36:05 +0000 (09:36 +0200)]
i386: Fix up handling of debug insns in STV [PR109676]
The following testcase ICEs because STV replaces there
(debug_insn 114 47 51 8 (var_location:TI D#3 (reg:TI 91 [ p ])) -1
(nil))
with
(debug_insn 114 47 51 8 (var_location:TI D#3 (reg:V1TI 91 [ p ])) -1
(nil))
which is invalid because of the mode mismatch.
STV has fix_debug_reg_uses function which is supposed to fix this up
and adjust such debug insns into
(debug_insn 114 47 51 8 (var_location:TI D#3 (subreg:TI (reg:V1TI 91 [ p ]) 0)) -1
(nil))
but it doesn't trigger here.
The IL before stv1 has:
(debug_insn 114 47 51 8 (var_location:TI D#3 (reg:TI 91 [ p ])) -1
(nil))
...
(insn 63 62 64 8 (set (mem/c:TI (reg/f:DI 89 [ .result_ptr ]) [0 <retval>.mStorage+0 S16 A32])
(reg:TI 91 [ p ])) "pr109676.C":4:48 87 {*movti_internal}
(expr_list:REG_DEAD (reg:TI 91 [ p ])
(nil)))
in bb 8 and
(insn 97 96 98 9 (set (reg:TI 91 [ p ])
(mem/c:TI (plus:DI (reg/f:DI 19 frame)
(const_int -32 [0xffffffffffffffe0])) [0 p+0 S16 A128])) "pr109676.C":26:12 87 {*movti_internal}
(nil))
(insn 98 97 99 9 (set (mem/c:TI (plus:DI (reg/f:DI 19 frame)
(const_int -64 [0xffffffffffffffc0])) [0 tmp+0 S16 A128])
(reg:TI 91 [ p ])) "pr109676.C":26:12 87 {*movti_internal}
(nil))
in bb9.
PUT_MODE on a REG is done in two spots in timode_scalar_chain::convert_insn,
one is:
switch (GET_CODE (dst))
{
case REG:
if (GET_MODE (dst) == TImode)
{
PUT_MODE (dst, V1TImode);
fix_debug_reg_uses (dst);
}
if (GET_MODE (dst) == V1TImode)
when seeing the REG in SET_DEST; the other is the hunk the patch adjusts.
Because bb 8 comes first in the order the pass walks the bbs, we first
notice the TImode pseudo on insn 63 where it is SET_SRC, use PUT_MODE there
unconditionally, so for a shared REG it changes all other uses in the IL,
and then don't call fix_debug_reg_uses because DF_REG_DEF_CHAIN (REGNO (src))
is non-NULL - the REG is set in insn 97 but we haven't processed it yet.
Later on we process insn 97, but because the REG in SET_DEST already has
V1TImode, we don't do anything, even when the src handling code earlier
relied on it being done.
The following patch fixes this by using similar code for both dst and src,
in particular calling fix_debug_reg_uses once when we actually change REG
mode from TImode to V1TImode, and not later on.
2023-05-04 Jakub Jelinek <jakub@redhat.com>
PR debug/109676
* config/i386/i386-features.cc (timode_scalar_chain::convert_insn):
If src is REG, change its mode to V1TImode and call fix_debug_reg_uses
for it only if it still has TImode. Don't decide whether to call
fix_debug_reg_uses based on whether SRC is ever set or not.
CRIS: peephole2 an "and" with a contiguous "one-sided" sequence of 1s
This kind of transformation seems pretty generic and might be a
candidate for adding to the middle-end, perhaps as part of combine.
I noticed these happened more often with LRA, which is the reason I
went on this track of low-hanging-fruit micro-optimizations that are
such an itch when noticed while inspecting generated code for libgcc.
Unfortunately, this one improves coremark only by a few cycles at the
beginning or end (<0.0005%) for cris-elf -march=v10. The size of the
coremark code is down by 0.4% (0.22% pre-lra).
Using an iterator from the start because other binary operations will
be added and their define_peephole2's would look exactly the same for
the .md part.
Some existing and-peephole2-related tests suffered because many of
them were using patterns with only contiguous 1s in them: adjusted.
Also, spotted and fixed, by adding a space, some
scan-assembler-strings that were prone to spurious identifier or file
name matches.
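The transformation can be modelled in scalar C (a sketch; which mask shapes are profitable to split is CRIS-specific in the real patch):

```c
#include <assert.h>
#include <stdint.h>

/* An AND with a contiguous "one-sided" run of 1s at the top, e.g.
   0xffffff00, equals a shift-right/shift-left pair, avoiding a load of
   the wide constant.  */
static uint32_t
and_high_ones (uint32_t x, unsigned low_zeros)
{
  return (x >> low_zeros) << low_zeros;	/* == x & (0xffffffffu << low_zeros) */
}

/* Likewise, a low run of 1s, e.g. 0x000000ff, equals a
   shift-left/shift-right pair (i.e. a zero-extension of the low bits).  */
static uint32_t
and_low_ones (uint32_t x, unsigned high_zeros)
{
  return (x << high_zeros) >> high_zeros;	/* == x & (~0u >> high_zeros) */
}
```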
gcc:
* config/cris/cris.cc (cris_split_constant): New function.
* config/cris/cris.md (splitop): New iterator.
(opsplit1): New define_peephole2.
* config/cris/cris-protos.h (cris_split_constant): Declare.
(cris_splittable_constant_p): New macro.
gcc/testsuite:
* gcc.target/cris/peep2-andsplit1.c: New test.
* gcc.target/cris/peep2-andu1.c, gcc.target/cris/peep2-andu2.c,
gcc.target/cris/peep2-xsrand.c, gcc.target/cris/peep2-xsrand2.c:
Adjust values to avoid interference with "opsplit1" with AND. Add
whitespace to match-strings that may be confused with identifiers
or file names.
This has no effect on arith-rand-ll (which suffers badly from LRA) and
marginal effects (0.01% improvement) on coremark, but the size of
coremark shrinks by 0.2%. An earlier version was tested with a tree
around 2023-03 which showed (marginally) that ALL_REGS is preferable
to GENERAL_REGS.
* config/cris/cris.cc (TARGET_SPILL_CLASS): Define
to ALL_REGS.
Gaius Mulley [Thu, 4 May 2023 00:37:05 +0000 (01:37 +0100)]
PR modula2/109675 implementation of writeAddress is non portable
The implementation of gcc/m2/gm2-libs/DynamicStrings.mod:writeAddress
is non portable as it casts a void * into an unsigned long int. This
procedure has been re-implemented to use snprintf. As it is a library
the support tools 'mc' and 'pge' have been rebuilt. There have been
linking changes in the library and the underlying boolean type is now
bool since the last rebuild hence the size of the patch.
This shows no difference in either arith-rand-ll or coremark
numbers. Comparing libgcc and newlib libc before/after, the only
difference can be seen in a few functions where it's mostly neutral
(newlib's _svfprintf_r et al) and one function (__gdtoa), which
improves ever so slightly (four bytes less; one load less, but one
instruction reading from memory instead of a register).
* config/cris/cris.cc (cris_side_effect_mode_ok): Use
lra_in_progress, not reload_in_progress.
* config/cris/cris.md ("movdi", "*addi_reload"): Ditto.
* config/cris/constraints.md ("Q"): Ditto.
Jakub Jelinek [Wed, 3 May 2023 20:32:50 +0000 (22:32 +0200)]
libstdc++: Fix up abi.exp FAILs on powerpc64le-linux
This is an ABI problem on powerpc64le-linux, introduced in 13.1.
When libstdc++ is configured against old glibc, the
_ZSt10from_charsPKcS0_RDF128_St12chars_format@@GLIBCXX_3.4.31
_ZSt8to_charsPcS_DF128_@@GLIBCXX_3.4.31
_ZSt8to_charsPcS_DF128_St12chars_format@@GLIBCXX_3.4.31
_ZSt8to_charsPcS_DF128_St12chars_formati@@GLIBCXX_3.4.31
symbols are exported from the library, while when it is configured against
new enough glibc, those symbols aren't exported and we export instead
_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format@@GLIBCXX_IEEE128_3.4.29
_ZSt8to_charsPcS_u9__ieee128@@GLIBCXX_IEEE128_3.4.29
_ZSt8to_charsPcS_u9__ieee128St12chars_format@@GLIBCXX_IEEE128_3.4.29
_ZSt8to_charsPcS_u9__ieee128St12chars_formati@@GLIBCXX_IEEE128_3.4.29
together with various other @@GLIBCXX_IEEE128_3.4.{29,30,31} and
@@CXXABI_IEEE128_1.3.13 symbols. The idea was that those *IEEE128* symbol
versions (similarly to *LDBL* symbol versions) are optional (but if it
appears, all symbols from it up to the version of the library appear),
but the base always appears.
My _Float128 from_chars/to_chars changes unfortunately broke this.
I believe nothing really uses those symbols if libstdc++ has been
configured against old glibc, so if 13.1 wasn't already released, it might
be best to make sure they aren't exported on powerpc64le-linux.
But as they were exported, I think the best resolution for this ABI
difference is to add those 4 symbols as aliases to the
GLIBCXX_IEEE128_3.4.29 *u9__ieee128* symbols, which the following patch
does.
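The aliasing mechanism, in a minimal sketch (symbol names simplified; the real patch aliases the mangled _ZSt... names inside libstdc++):

```c
#include <assert.h>

/* Sketch: export the legacy symbol name as an alias of the current
   implementation, so both names resolve to the same code.  */
int to_chars_impl (int x);

int
to_chars_impl (int x)
{
  return x + 1;	/* stand-in for the real conversion routine */
}

/* GNU alias attribute: legacy_to_chars is the very same function.  */
extern int legacy_to_chars (int x) __attribute__ ((alias ("to_chars_impl")));
```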
2023-05-03 Jakub Jelinek <jakub@redhat.com>
* src/c++17/floating_from_chars.cc
(_ZSt10from_charsPKcS0_RDF128_St12chars_format): New alias to
_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format.
* src/c++17/floating_to_chars.cc (_ZSt8to_charsPcS_DF128_): New alias to
_ZSt8to_charsPcS_u9__ieee128.
(_ZSt8to_charsPcS_DF128_St12chars_format): New alias to
_ZSt8to_charsPcS_u9__ieee128St12chars_format.
(_ZSt8to_charsPcS_DF128_St12chars_formati): New alias to
_ZSt8to_charsPcS_u9__ieee128St12chars_formati.
* config/abi/post/powerpc64le-linux-gnu/baseline_symbols.txt: Updated.