gcc.gnu.org Git - gcc.git/log

genmatch: fixup get_out_file

get_out_file did not follow the coding conventions (mixing three-space
and two-space indentation, missing linebreak before function name).

Take that as an excuse to reimplement it in a more terse manner and
rename as 'choose_output', which is hopefully more descriptive.

gcc/ChangeLog:

* genmatch.cc (get_out_file): Make static and rename to ...
(choose_output): ... this. Reimplement. Update all uses ...
(decision_tree::gen): ... here and ...
(main): ... here.

genmatch: clean up showUsage

Display usage more consistently and get rid of camelCase.

gcc/ChangeLog:

* genmatch.cc (showUsage): Reimplement as ...
(usage): ...this. Adjust all uses.
(main): Print usage when no arguments. Add missing 'return 1'.

genmatch: clean up emit_func

Eliminate boolean parameters of emit_func. The first ('open') just
prints 'extern' to generated header, which is unnecessary. Introduce a
separate function to use when finishing a declaration in place of the
second ('close').

Rename emit_func to 'fp_decl' (matching 'fprintf' in length) to unbreak
indentation in several places.

Reshuffle emitted line breaks in a few places to make generated
declarations less ugly.

gcc/ChangeLog:

* genmatch.cc (header_file): Make static.
(emit_func): Rename to...
(fp_decl): ... this. Adjust all uses.
(fp_decl_done): New function. Use it...
(decision_tree::gen): ... here and...
(write_predicate): ... here.
(main): Adjust.

aarch64: Avoid hard-coding specific register allocations

Some tests hard-coded specific allocations for temporary registers,
whereas the RA should be free to pick anything that doesn't force
unnecessary moves or spills.

gcc/testsuite/
* gcc.target/aarch64/asimd-mul-to-shl-sub.c: Allow any register
allocation for temporary results, rather than requiring specific
registers.
* gcc.target/aarch64/auto-init-padding-1.c: Likewise.
* gcc.target/aarch64/auto-init-padding-2.c: Likewise.
* gcc.target/aarch64/auto-init-padding-3.c: Likewise.
* gcc.target/aarch64/auto-init-padding-4.c: Likewise.
* gcc.target/aarch64/auto-init-padding-9.c: Likewise.
* gcc.target/aarch64/memset-corner-cases.c: Likewise.
* gcc.target/aarch64/memset-q-reg.c: Likewise.
* gcc.target/aarch64/simd/vaddlv_1.c: Likewise.
* gcc.target/aarch64/sve-neon-modes_1.c: Likewise.
* gcc.target/aarch64/sve-neon-modes_3.c: Likewise.
* gcc.target/aarch64/sve/load_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
* gcc.target/aarch64/sve/pr89007-1.c: Likewise.
* gcc.target/aarch64/sve/pr89007-2.c: Likewise.
* gcc.target/aarch64/sve/store_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/vadd_reduc-1.c: Likewise.
* gcc.target/aarch64/vadd_reduc-2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Allow the temporary
predicate register to be any of p4-p7, rather than requiring p4
specifically.
* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.

aarch64: Relax FP/vector register matches

There were many tests that used [0-9] to match an FP or vector register,
but that should allow any of 0-31 instead.

asm-x-constraint-1.c required s0-s7, but that's the range for "y"
rather than "x". "x" allows s0-s15.

sve/pcs/return_9.c required z2-z7 (the initial set of available
call-clobbered registers), but z24-z31 are OK too.

gcc/testsuite/
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: Allow any
FP/vector register, not just register 0-9.
* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
* gcc.target/aarch64/ldp_stp_8.c: Likewise.
* gcc.target/aarch64/ldp_stp_17.c: Likewise.
* gcc.target/aarch64/ldp_stp_21.c: Likewise.
* gcc.target/aarch64/simd/vpaddd_f64.c: Likewise.
* gcc.target/aarch64/simd/vpaddd_s64.c: Likewise.
* gcc.target/aarch64/simd/vpaddd_u64.c: Likewise.
* gcc.target/aarch64/sve/adr_1.c: Likewise.
* gcc.target/aarch64/sve/adr_2.c: Likewise.
* gcc.target/aarch64/sve/adr_3.c: Likewise.
* gcc.target/aarch64/sve/adr_4.c: Likewise.
* gcc.target/aarch64/sve/adr_5.c: Likewise.
* gcc.target/aarch64/sve/extract_1.c: Likewise.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/spill_3.c: Likewise.
* gcc.target/aarch64/vfp-1.c: Likewise.
* gcc.target/aarch64/asm-x-constraint-1.c: Allow s0-s15, not just
s0-s7.
* gcc.target/aarch64/sve/pcs/return_9.c: Allow z24-z31 as well as
z2-z7.

aarch64: Relax predicate register matches

Most governing predicate operands require p0-p7, but some
instructions also allow p8-p15. Non-gp uses of predicates
often also allow all of p0-p15.

This patch fixes up cases where we required p0-p7 unnecessarily.
In some cases we match the definition (typically a comparison,
PFALSE or PTRUE), sometimes we match the use (like a logic
instruction, MOV or SEL), and sometimes we match both.

gcc/testsuite/
* g++.target/aarch64/sve/vcond_1.C: Allow any predicate
register for the temporary results, not just p0-p7.
* gcc.target/aarch64/sve/acle/asm/dupq_b8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dupq_b16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dupq_b32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dupq_b64.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_5.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_6.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_7.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_9.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_10.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilelt_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilelt_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilelt_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/vcond_2.c: Likewise.
* gcc.target/aarch64/sve/vcond_3.c: Likewise.
* gcc.target/aarch64/sve/vcond_7.c: Likewise.
* gcc.target/aarch64/sve/vcond_18.c: Likewise.
* gcc.target/aarch64/sve/vcond_19.c: Likewise.
* gcc.target/aarch64/sve/vcond_20.c: Likewise.

aarch64: Relax ordering requirements in SVE dup tests

Some of the svdup tests expand to a SEL between two constant vectors.
This patch allows the constants to be formed in either order.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/dup_s16.c: When using SEL to select
between two constant vectors, allow the constant moves to appear in
either order.
* gcc.target/aarch64/sve/acle/asm/dup_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u64.c: Likewise.

aarch64: Allow moves after tied-register intrinsics

Some ACLE intrinsics map to instructions that tie the output
operand to an input operand. If all the operands are allocated
to different registers, and if MOVPRFX can't be used, we will need
a move either before the instruction or after it. Many tests only
matched the "before" case; this patch makes them accept the "after"
case too.

gcc/testsuite/
* gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: Allow
moves to occur after the intrinsic instruction, rather than requiring
them to happen before.
* gcc.target/aarch64/advsimd-intrinsics/bfdot-1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vdot-3-1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adda_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/brka_b.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/brkb_b.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/brkn_b.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clasta_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clasta_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clasta_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clastb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clastb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clastb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pfirst_b.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pnext_b16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pnext_b32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pnext_b64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pnext_b8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_u8.c: Likewise.

aarch64: Fix move-after-intrinsic function-body tests

Some of the SVE ACLE asm tests tried to be agnostic about the
instruction order, but only one of the alternatives was exercised
in practice. This patch fixes latent typos in the other versions.

gcc/testsuite/
* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Fix expected register
allocation in the case where a move occurs after the intrinsic
instruction.
* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.

ira: Don't create copies for earlyclobbered pairs

This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
("the previous patch").  To start by quoting that:

If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn.  Allocating the
same register to the input and the output then saves the cost of a move.

If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency.  This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.

This heuristic seems to work well in the vast majority of cases.
However, the problem fixed in the previous patch was that we
could create a copy for an operand pair even if, for all relevant
alternatives, the output and input register classes did not have
any registers in common.  It is then impossible for the output
operand to reuse the dying input register.

This left unfixed a further case where copies don't make sense:
there is no point trying to reuse the dying input register if,
for all relevant alternatives, the output is earlyclobbered and
the input doesn't match the output.  (Matched earlyclobbers are fine.)

Handling that case fixes several existing XFAILs and helps with
a follow-on aarch64 patch.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  A SPEC2017 run
on aarch64 showed no differences outside the noise.  Also, I tried
compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
per cpu directory, using the options -Os -fno-schedule-insns{,2}.
The results below summarise the tests that showed a difference in LOC:

Target               Tests   Good    Bad   Delta    Best   Worst  Median
======               =====   ====    ===   =====    ====   =====  ======
amdgcn-amdhsa           14      7      7       3     -18      10      -1
arm-linux-gnueabihf     16     15      1     -22      -4       2      -1
csky-elf                 6      6      0     -21      -6      -2      -4
hppa64-hp-hpux11.23      5      5      0      -7      -2      -1      -1
ia64-linux-gnu          16     16      0     -70     -15      -1      -3
m32r-elf                53      1     52      64      -2       8       1
mcore-elf                2      2      0      -8      -6      -2      -6
microblaze-elf         285    283      2    -909     -68       4      -1
mmix                     7      7      0   -2101   -2091      -1      -1
msp430-elf               1      1      0      -4      -4      -4      -4
pru-elf                  8      6      2     -12      -6       2      -2
rx-elf                  22     18      4     -40      -5       6      -2
sparc-linux-gnu         15     14      1     -40      -8       1      -2
sparc-wrs-vxworks       15     14      1     -40      -8       1      -2
visium-elf               2      1      1       0      -2       2      -2
xstormy16-elf            1      1      0      -2      -2      -2      -2

with other targets showing no sensitivity to the patch.  The only
target that seems to be negatively affected is m32r-elf; otherwise
the patch seems like an extremely minor but still clear improvement.

gcc/
* ira-conflicts.cc (can_use_same_reg_p): Skip over non-matching
earlyclobbers.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c: Remove XFAILs.
* gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.

c++: non-template friend of template [PR106740]

This was fixed by r13-1018, but the testcase seems needed.

PR c++/106740

gcc/testsuite/ChangeLog:

* g++.dg/template/friend78.C: New test.

Daily bump.

[x86_64] Introduce insvti_highpart define_insn_and_split.

This is a repost/respin of a patch that was conditionally approved:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html

This patch adds a convenient post-reload splitter for setting/updating
the highpart of a TImode variable, using i386's previously added
split_double_concat infrastructure.

For the new test case below:

__int128 foo(__int128 x, unsigned long long y)
{
  __int128 t = (__int128)y << 64;
  __int128 r = (x & ~0ull) | t;
  return r;
}

mainline GCC with -O2 currently generates:

foo:    movq    %rdi, %rcx
        xorl    %eax, %eax
        xorl    %edi, %edi
        orq     %rcx, %rax
        orq     %rdi, %rdx
        ret

with this patch, GCC instead now generates the much better:

foo: movq    %rdi, %rcx
        movq    %rcx, %rax
        ret

It turns out that the -m32 equivalent of this testcase, already
avoids using explict orl/xor instructions, as it gets optimized
(in combine) by a completely different path.  Given that this idiom
isn't seen in 32-bit code (so this pattern doesn't match with -m32),
and also that the shorter 32-bit AND bitmask is represented as a
CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
is implemented for just TARGET_64BIT rather than contort a "generic"
implementation using DWI mode iterators.

2023-05-08  Roger Sayle  <roger@nextmovesoftware.com>
    Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
* config/i386/i386.md (any_or_plus): Move definition earlier.
(*insvti_highpart_1): New define_insn_and_split to overwrite
(insv) the highpart of a TImode register/memory.

gcc/testsuite/ChangeLog
* gcc.target/i386/insvti_highpart-1.c: New test case.

Fix cfg maintenance after inlining in AutoFDO

Todo from early_inliner needs to be propagated so that
cleanup_tree_cfg () is called if necessary.

This bug was causing an assert in get_loop_body during
ipa-sra in autoprofiledbootstrap build since loops weren't
fixed up and one of the loops had num_nodes set to 0.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* auto-profile.cc (auto_profile): Check todo from early_inline
to see if cleanup_tree_vfg needs to be called.
(early_inline): Return todo from early_inliner.

Fix pr81192.c for int16 targets

I had missed when converting this
testcase to Gimple that there was a define
for int/unsigned type specifically to get
an INT32 type. This means when using a
literal integer constant you need to use the
`_Literal (type)` to form the types correctly on the
constants.

This fixes the issue and has been both tested on
xstormy16-elf and x86_64-linux-gnu.

Committed as obvious.

gcc/testsuite/ChangeLog:

PR testsuite/109776
* gcc.dg/pr81192.c: Fix integer constants for int16 targets.

RISC-V: Factor out vector manager code in vsetvli insertion pass. [NFC]

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info):
New.
(pass_vsetvl::get_block_info): New.
(pass_vsetvl::update_vector_info): New.
(pass_vsetvl::simple_vsetvl): Use get_vector_info.
(pass_vsetvl::compute_local_backward_infos): Ditto.
(pass_vsetvl::transfer_before): Ditto.
(pass_vsetvl::transfer_after): Ditto.
(pass_vsetvl::emit_local_forward_vsetvls): Ditto.
(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_insns): Ditto.
(pass_vsetvl::compute_local_backward_infos): Use
update_vector_info.

RISC-V: Improve portability of testcases

stdint.h will require having corresponding multi-lib existing, so using
stdint-gcc.h instead, also added a riscv_vector.h wrapper to
gcc.target/riscv/rvv/autovec/.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: Change
stdint.h to stdint-gcc.h.
* gcc.target/riscv/rvv/autovec/template-1.h: Ditto.
* gcc.target/riscv/rvv/autovec/riscv_vector.h: New.

Fix minor length computation on stormy16

Today's build of xstormy16-elf failed due to a branch to an out of range
target. Manual inspection of the assembly code for the affected function
(divdi3) showed that the zero-extension patterns were claiming a length
of 2, but clearly assembled into 4 bytes.

This patch adds an explicit length to the zero extension pattern and
appears to resolve the issue in my test builds.

gcc/

* config/stormy16/stormy16.md (zero_extendhisi2): Fix length.

libgomp C++ testsuite: Use 'lang_include_flags' instead of 'libstdcxx_includes'

With nvptx offloading configured, and supported, and CUDA available:

    $ make check-target-libgomp RUNTESTFLAGS="--all c.exp=context-1.c c++.exp=context-1.c"
    [...]
    Running [...]/libgomp.oacc-c/c.exp ...
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
    UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
    Running [...]/libgomp.oacc-c++/c++.exp ...
    PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
    PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
    PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
    PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
    UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
    [...]

..., but for 'c++.exp=context-1.c' alone, we currently get all-UNSUPPORTED:

    $ make check-target-libgomp RUNTESTFLAGS_="--all c++.exp=context-1.c"
    [...]
    Running [...]/libgomp.oacc-c++/c++.exp ...
    UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
    UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2
    UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
    [...]

That is, if 'c.exp' executes first, it does successfully evaluate
'dg-require-effective-target openacc_cublas' -- and does cache this result (so
it isn't reevaluated for 'c++.exp').  However, for 'c++.exp' alone (that is,
without the 'c.exp' result cached), we run into:

    spawn -ignore SIGHUP [xgcc] [...] -x c++ openacc_cublas2311907.c [...]
    In file included from /usr/include/cuda_fp16.h:3673,
                     from /usr/include/cublas_api.h:75,
                     from /usr/include/cublas_v2.h:65,
                     from openacc_cublas2311907.c:3:
    /usr/include/cuda_fp16.hpp:67:10: fatal error: utility: No such file or directory

We're missing include paths to C++/libstdc++ build-tree headers.

Fix this by using the mechanism introduced for Fortran in
r212268 (commit f707da16f714f7fe5a42391748212c84dfec639b) re
"libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}".

libgomp/
* testsuite/libgomp.c++/c++.exp: Use 'lang_include_flags' instead
of 'libstdcxx_includes'.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.

Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'

Otherwise, for example for 'RUNTESTFLAGS' of '--target_board=unix\{-m64,-m32\}'
vs. '--target_board=unix\{-m32,-m64\}', both variants exercise testing with
always the first flag variant's 'LTO_OPTIONS'/'LTO_TORTURE_OPTIONS', which
results in unequal test results between the two 'RUNTESTFLAGS' variants if one
of the flag variants has 'check_linker_plugin_available' but the other doesn't.

Fix-up for r180245 (commit c1a7cdbbcca90ad5260bfc543f8c10f3514e76c1)
"Update testsuite to run with slim LTO".

gcc/testsuite/
* g++.dg/guality/guality.exp: Move 'torture-init' earlier.
* gcc.dg/guality/guality.exp: Likewise.
* gfortran.dg/guality/guality.exp: Likewise.
* lib/c-torture.exp (LTO_TORTURE_OPTIONS): Don't set.
* lib/gcc-dg.exp (LTO_TORTURE_OPTIONS): Don't set.
* lib/lto.exp (lto_init, lto_finish): Let each 'lto_init'
determine the default 'LTO_OPTIONS'.
* lib/torture-options.exp (torture-init, torture-finish): Let each
'torture-init' determine the 'LTO_TORTURE_OPTIONS'.

libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation

... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it.

Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609
"libgomp/nvptx: Prepare for reverse-offload callback handling",
and commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8
"libgomp: Handle OpenMP's reverse offloads".

libgomp/
* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
* libgomp.h (gomp_target_rev): Adjust.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
(rev_off_host_to_dev_cpy): Remove.
(GOMP_OFFLOAD_run): Adjust.

libgm2: Remove 'autogen.sh'

... given that plain 'autoreconf' achieves the same.

libgm2/
* autogen.sh: Remove.

libgm2: Adjust 'autogen.sh' to 'ACLOCAL_AMFLAGS', and simplify

Specifying explicit '-I ..' before '-I ../config' is what (most) other GCC
components do. Specifying '-I .' is not necessary.

With the order of '-I's aligned, 'autogen.sh' and plain 'autoreconf' then
produce identical results.

libgm2/
* autogen.sh: For 'aclocal', 'autoreconf', remove '-I .',
add '-I ..'.
* Makefile.am (ACLOCAL_AMFLAGS): Remove '-I .'.
* libm2cor/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
* libm2iso/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
* libm2log/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
* libm2min/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
* libm2pim/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
* aclocal.m4: Regenerate.
* Makefile.in: Likewise.
* libm2cor/Makefile.in: Likewise.
* libm2iso/Makefile.in: Likewise.
* libm2log/Makefile.in: Likewise.
* libm2min/Makefile.in: Likewise.
* libm2pim/Makefile.in: Likewise.

c++: list CTAD and resolve_nondeduced_context [PR106214]

This extends the PR93107 fix, which made us do resolve_nondeduced_context
on the elements of an initializer list during auto deduction, to happen for
CTAD as well.

PR c++/106214
PR c++/93107

gcc/cp/ChangeLog:

* pt.cc (do_auto_deduction): Move up resolve_nondeduced_context
calls to happen before do_class_deduction. Add some
error_mark_node tests.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction114.C: New test.

Bump up precision size to 16 bits.

The new __dmr type that is being added as a possible future PowerPC instruction
set bumps into a structure field size issue.  The size of the __dmr type is 1024 bits.
The precision field in tree_type_common is currently 10 bits, so if you store
1,024 into field, you get a 0 back.  When you get 0 in the precision field, the
ccp pass passes this 0 to sext_hwi in hwint.h.  That function in turn generates
a shift that is equal to the host wide int bit size, which is undefined as
machine dependent for shifting in C/C++.

      int shift = HOST_BITS_PER_WIDE_INT - prec;
      return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) src << shift)) >> shift;

It turns out the x86_64 where I first did my tests returns the original input
before the two shifts, while the PowerPC always returns 0.  In the ccp pass, the
original input is -1, and so it worked.  When I did the runs on the PowerPC, the
result was 0, which ultimately led to the failure.

2023-02-01  Richard Biener  <rguenther@suse.de>
    Michael Meissner  <meissner@linux.ibm.com>

PR middle-end/108623
* tree-core.h (tree_type_common): Bump up precision field to 16 bits.
Align bit fields > 1 bit to at least an 8-bit boundary.

fortran: Fix coding style around free()

Fix coding-style errors introduced in ca2f64d5d08c1699ca4b7cb2bf6a76692e809e0f

gcc/fortran/ChangeLog:

* resolve.cc (resolve_select_type): Fix coding style.

libgfortran/ChangeLog:

* caf/single.c (_gfortran_caf_register): Fix coding style.
* io/async.c (update_pdt, async_io): Likewise.
* io/format.c (free_format_data): Likewise.
* io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise.
* io/unix.c (mem_close): Likewise.

PHIOPT: factor out unary operations instead of just conversions

After using factor_out_conditional_conversion with diamond bb,
we should be able do use it also for all normal unary gimple and not
just conversions. This allows to optimize PR 59424 for an example.
This is also a start to optimize PR 64700 and a few others.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

An example of this is:
```
static inline unsigned long long g(int t)
{
  unsigned t1 = t;
  return t1;
}
static int abs1(int a)
{
  if (a < 0)
    a = -a;
  return a;
}
unsigned long long f(int c, int d, int e)
{
  unsigned long long t;
  if (d > e)
    t = g(abs1(d));
  else
    t = g(abs1(e));
  return t;
}
```

Which should be optimized to:
  _9 = MAX_EXPR <d_5(D), e_6(D)>;
  _4 = ABS_EXPR <_9>;
  t_3 = (long long unsigned intD.16) _4;

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ...
(factor_out_conditional_operation): This and add support for all unary
operations.
(pass_phiopt::execute): Update call to factor_out_conditional_conversion
to call factor_out_conditional_operation instead.

PR tree-optimization/109424
PR tree-optimization/59424

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/abs-2.c: Update tree scan for
details change in wording.
* gcc.dg/tree-ssa/minmax-17.c: Likewise.
* gcc.dg/tree-ssa/pr103771.c: Likewise.
* gcc.dg/tree-ssa/minmax-18.c: New test.
* gcc.dg/tree-ssa/minmax-19.c: New test.

PHIOPT: Loop over calling factor_out_conditional_conversion

After adding diamond shaped bb support to factor_out_conditional_conversion,
we can get a case where we have two conversions that needs factored out
and then would have another phiopt happen.
An example is:
```
static inline unsigned long long g(int t)
{
  unsigned t1 = t;
  return t1;
}
unsigned long long f(int c, int d, int e)
{
  unsigned long long t;
  if (c > d)
    t = g(c);
  else
    t = g(d);
  return t;
}
```
In this case we should get a MAX_EXPR in phiopt1 with two casts.
Before this patch, we would just factor out the outer cast and then
wait till phiopt2 to factor out the inner cast.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (pass_phiopt::execute): Loop
over factor_out_conditional_conversion.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-17.c: New test.

PHIOPT: Add diamond bb form to factor_out_conditional_conversion

So the function factor_out_conditional_conversion already supports
diamond shaped bb forms, just need to be called for such a thing.

harden-cond-comp.c needed to be changed as we would optimize out the
conversion now and that causes the compare hardening not needing to
split the block which it was testing. So change it such that there
would be no chance of optimization.

Also add two testcases that showed the improvement. PR 103771 is
solved in ifconvert also for the vectorizer but now it is solved
in a general sense.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/49959
PR tree-optimization/103771

gcc/ChangeLog:

* tree-ssa-phiopt.cc (pass_phiopt::execute): Support
Diamond shapped bb form for factor_out_conditional_conversion.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/harden-cond-comp.c: Change testcase
slightly to avoid the new phiopt optimization.
* gcc.dg/tree-ssa/abs-2.c: New test.
* gcc.dg/tree-ssa/pr103771.c: New test.

RISC-V: Fix ugly && incorrect codes of RVV auto-vectorization

1. Add movmisalign pattern for TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
   targethook, current RISC-V has supported this target hook, we can't make
   it supported without movmisalign pattern.

2. Remove global extern of get_mask_policy_no_pred && get_tail_policy_no_pred.
   These 2 functions are comming from intrinsic builtin frameworks.
   We are sure we don't need them in auto-vectorization implementation.

3. Refine mask mode implementation.

4. We should not have "riscv_vector_" in riscv_vector namspace since it
   makes the codes inconsistent and ugly.

   For example:
   Before this patch:
   static opt_machine_mode
   riscv_get_mask_mode (machine_mode mode)
   {
     machine_mode mask_mode = VOIDmode;
     if (TARGET_VECTOR && riscv_vector::riscv_vector_get_mask_mode (mode).exists (&mask_mode))
      return mask_mode;
   ..

   After this patch:
   riscv_get_mask_mode (machine_mode mode)
   {
     machine_mode mask_mode = VOIDmode;
     if (TARGET_VECTOR && riscv_vector::get_mask_mode (mode).exists (&mask_mode))
      return mask_mode;
   ..

5. Fix fail testcase fixed-vlmax-1.c.

gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign<mode>): New pattern.
* config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): Delete.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(get_mask_mode): New function.
* config/riscv/riscv-v.cc (get_mask_policy_no_pred): Delete.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_mode): New function.
* config/riscv/riscv-vector-builtins.cc (use_real_merge_p): Remove
global extern.
(get_tail_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred): Ditto.
(get_mask_policy_for_pred): Ditto
* config/riscv/riscv.cc (riscv_get_mask_mode): Refine codes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix typo.

RISC-V: Handle multi-lib path correclty for linux

RISC-V Linux encodes the ABI into the path, so in theory, we can only use that
to select multi-lib paths, and no way to use different multi-lib paths between
`rv32i/ilp32` and `rv32ima/ilp32`, we'll mapping both to `/lib/ilp32`.

It's hard to do that with GCC's builtin multi-lib selection mechanism; builtin
mechanism did the option string compare and then enumerate all possible reuse
rules during the build time. However, it's impossible to RISC-V; we have a huge
number of combinations of `-march`, so implementing a customized multi-lib
selection becomes the only solution.

Multi-lib configuration is only used for determines which ISA should be used
when compiling the corresponding ABI variant after this patch.

During the multi-lib selection stage, only consider -mabi as the only key to
select the multi-lib path.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_select_multilib_by_abi): New.
(riscv_select_multilib): New.
(riscv_compute_multilib): Extract logic to riscv_select_multilib and
also handle select_by_abi.
* config/riscv/elf.h (RISCV_USE_CUSTOMISED_MULTI_LIB): Change it
to select_by_abi_arch_cmodel from 1.
* config/riscv/linux.h (RISCV_USE_CUSTOMISED_MULTI_LIB): Define.
* config/riscv/riscv-opts.h (enum riscv_multilib_select_kind): New.

Makefile.in: clean up match.pd-related dependencies

Clean up confusing changes from the recent refactoring for
parallel match.pd build.

gimple-match-head.o is not built. Remove related flags adjustment.

Autogenerated gimple-match-N.o files do not depend on
gimple-match-exports.cc.

{gimple,generic)-match-auto.h only depend on the prerequisites of the
corresponding s-{gimple,generic}-match stamp file, not any .cc file.

gcc/ChangeLog:

* Makefile.in: (gimple-match-head.o-warn): Remove.
(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
gimple-match-exports.cc.
(gimple-match-auto.h): Only depend on s-gimple-match.
(generic-match-auto.h): Likewise.

Move substitute_and_fold over to use simple_dce_from_worklist

While looking into a different issue, I noticed that it
would take until the second forwprop pass to do some
forward proping and it was because the ssa name was
used more than once but the second statement was
"dead" and we don't remove that until much later.

So this uses simple_dce_from_worklist instead of manually
removing of the known unused statements instead.
Propagate engine does not do a cleanupcfg afterwards either but manually
cleans up possible EH edges so simple_dce_from_worklist
needs to communicate that back to the propagate engine.

Some testcases needed to be updated/changed even because of better optimization.
gcc.dg/pr81192.c even had to be changed to be using the gimple FE so it would
be less fragile in the future too.
gcc.dg/tree-ssa/pr98737-1.c was failing because __atomic_fetch_ was being matched
but in those cases, the result was not being used so both __atomic_fetch_ and
__atomic_x_and_fetch_ are valid choices and would not make a code generation difference.
evrp7.c, evrp8.c, vrp35.c, vrp36.c: just needed a slightly change as the removal message
is different slightly.
kernels-alias-8.c: ccp1 is able to remove an unused load which causes ealias to have
one less load to analysis so update the expected scan #.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/109691
* tree-ssa-dce.cc (simple_dce_from_worklist): Add need_eh_cleanup
argument.
If the removed statement can throw, have need_eh_cleanup
include the bb of that statement.
* tree-ssa-dce.h (simple_dce_from_worklist): Update declaration.
* tree-ssa-propagate.cc (struct prop_stats_d): Remove
num_dce.
(substitute_and_fold_dom_walker::substitute_and_fold_dom_walker):
Initialize dceworklist instead of stmts_to_remove.
(substitute_and_fold_dom_walker::~substitute_and_fold_dom_walker):
Destore dceworklist instead of stmts_to_remove.
(substitute_and_fold_dom_walker::before_dom_children):
Set dceworklist instead of adding to stmts_to_remove.
(substitute_and_fold_engine::substitute_and_fold):
Call simple_dce_from_worklist instead of poping
from the list.
Don't update the stat on removal statements.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/evrp7.c: Update for output change.
* gcc.dg/tree-ssa/evrp8.c: Likewise.
* gcc.dg/tree-ssa/vrp35.c: Likewise.
* gcc.dg/tree-ssa/vrp36.c: Likewise.
* gcc.dg/tree-ssa/pr98737-1.c: Update scan-tree-dump-not
to check for assignment too instead of just a call.
* c-c++-common/goacc/kernels-alias-8.c: Update test
for removal of load.
* gcc.dg/pr81192.c: Rewrite testcase in gimple based test.

fortran: Remove conditionals around free()

gcc/fortran/ChangeLog:

* resolve.cc (resolve_select_type): Call free() unconditionally.

libgfortran/ChangeLog:

* caf/single.c (_gfortran_caf_register): Call free() unconditionally.
* io/async.c (update_pdt, async_io): Likewise.
* io/format.c (free_format_data): Likewise.
* io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise.
* io/unix.c (mem_close): Likewise.

Fortran: Fix mpz and mpfr memory leaks [PR fortran/68800]

gcc/fortran/ChangeLog:

PR fortran/68800
* expr.cc (find_array_section): Fix mpz memory leak.
* simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in
error paths.

Fortran: Reject semicolon after namelist name.

PR fortran/109662

libgfortran/ChangeLog:

* io/list_read.c: Add check for a semicolon after a namelist
name in read input. Issue a runtime error message.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr109662-a.f90: New test.

Daily bump.

c++: fix pretty printing of 'alignof' vs '__alignof__' [PR85979]

PR c++/85979

gcc/cp/ChangeLog:

* cxx-pretty-print.cc (cxx_pretty_printer::unary_expression)
<case ALIGNOF_EXPR>: Consider ALIGNOF_EXPR_STD_P.
* error.cc (dump_expr) <case ALIGNOF_EXPR>: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/alignof4.C: New test.

c++: goto entering scope of obj w/ non-trivial dtor [PR103091]

It seems ever since DR 2256 goto is permitted to cross the initialization
of a trivially initialized object with a non-trivial destructor. We
already supported this as an -fpermissive extension, so this patch just
makes us unconditionally support this.

DR 2256
PR c++/103091

gcc/cp/ChangeLog:

* decl.cc (decl_jump_unsafe): Return bool instead of int.
Don't consider TYPE_HAS_NONTRIVIAL_DESTRUCTOR.
(check_previous_goto_1): Simplify now that decl_jump_unsafe
returns bool instead of int.
(check_goto): Likewise.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.other/init9.C: Don't expect diagnostics for
goto made valid by DR 2256.
* g++.dg/init/goto4.C: New test.

c++: satisfaction of non-dep member alias template-id

constraints_satisfied_p already carefully checks dependence of template
arguments before proceeding with satisfaction, so the dependence check
in instantiate_alias_template is unnecessary and overly conservative.
Getting rid of it allows us to check satisfaction ahead of time in more
cases as in the below testcase.

gcc/cp/ChangeLog:

* pt.cc (instantiate_alias_template): Exit early upon
error from coerce_template_parms. Remove dependence test
guarding constraints_satisfied_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-alias6.C: New test.

c++: various code cleanups

* Harden some tree accessor macros and fix a couple of bad
  PLACEHOLDER_TYPE_CONSTRAINTS accesses uncovered by this.
* Use strip_innermost_template_args in outer_template_args.
* Add !processing_template_decl early exit tests to some dependence
  predicates.

gcc/cp/ChangeLog:

* cp-tree.h (PLACEHOLDER_TYPE_CONSTRAINTS_INFO): Harden via
TEMPLATE_TYPE_PARM_CHECK.
(TPARMS_PRIMARY_TEMPLATE): Harden via TREE_VEC_CHECK.
(TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL): Harden via
TEMPLATE_TEMPLATE_PARM_CHECK.
* cxx-pretty-print.cc (cxx_pretty_printer::simple_type_specifier):
Guard PLACEHOLDER_TYPE_CONSTRAINTS access.
* error.cc (dump_type) <case TEMPLATE_TYPE_PARM>: Use separate
variable to store CLASS_PLACEHOLDER_TEMPLATE result.
* pt.cc (outer_template_args): Use strip_innermost_template_args.
(any_type_dependent_arguments_p): Exit early if
!processing_template_decl.  Use range-based for.
(any_dependent_template_arguments_p): Likewise.

c++: parenthesized -> resolving to static data member [PR98283]

Here we're neglecting to propagate parenthesized-ness when the
member access (this->m) resolves to a static data member (and
thus finish_class_member_access_expr yields a VAR_DECL instead
of a COMPONENT_REF).

PR c++/98283

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy_and_build) <case COMPONENT_REF>: Propagate
REF_PARENTHESIZED_P more generally via force_paren_expr.
* semantics.cc (force_paren_expr): Document default argument.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/paren6.C: New test.

c++: bound ttp in lambda function type [PR109651]

After r14-11-g2245459c85a3f4 we now coerce the template arguments of a
bound ttp again after level-lowering it.  Notably a level-lowered ttp
doesn't have DECL_CONTEXT set, so during this coercion we fall back to
using current_template_parms to obtain the relevant set of in-scope
parameters.

But it turns out current_template_parms isn't properly set when
substituting the function type of a generic lambda, and so if the type
contains bound ttps that need to be lowered we'll crash during their
attempted coercion.  Specifically in the first testcase below,
current_template_parms during the lambda type substitution (with T=int)
is "1 U" instead of the expected "2 TT, 1 U", and we crash when level
lowering TT<int>.

Ultimately the problem is that tsubst_lambda_expr does things in the
wrong order: we ought to substitute (and install) the in-scope template
parameters _before_ substituting anything that may use those template
parameters (such as the function type of a generic lambda).  This patch
corrects this substitution order.

PR c++/109651

gcc/cp/ChangeLog:

* pt.cc (coerce_template_args_for_ttp): Mention we can hit the
current_template_parms fallback when level-lowering a bound ttp.
(tsubst_template_decl): Add lambda_tparms parameter.  Prefer to
use lambda_tparms instead of substituting DECL_TEMPLATE_PARMS.
(tsubst_decl) <case TEMPLATE_DECL>: Pass NULL_TREE as lambda_tparms
to tsubst_template_decl.
(tsubst_lambda_expr): For a generic lambda, substitute
DECL_TEMPLATE_PARMS and set current_template_parms to it
before substituting the function type.  Pass the substituted
DECL_TEMPLATE_PARMS as lambda_tparms to tsubst_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-generic-ttp1.C: New test.
* g++.dg/cpp2a/lambda-generic-ttp2.C: New test.

Fix aarch64/109762: push_options/push_options does not work sometimes

aarch64_isa_flags (and aarch64_asm_isa_flags) are both aarch64_feature_flags (uint64_t)
but since r12-8000-g14814e20161d, they are saved/restored as unsigned long. This
does not make a difference for LP64 targets but on ILP32 and LLP64IL32 targets,
it means it does not get restored correctly.
This patch changes over to use aarch64_feature_flags instead of unsigned long.

Committed as obvious after a bootstrap/test.

gcc/ChangeLog:

PR target/109762
* config/aarch64/aarch64-builtins.cc (aarch64_simd_switcher::aarch64_simd_switcher):
Change argument type to aarch64_feature_flags.
* config/aarch64/aarch64-protos.h (aarch64_simd_switcher): Change
constructor argument type to aarch64_feature_flags.
Change m_old_asm_isa_flags to be aarch64_feature_flags.

c++: non-dep init folding and access checking [PR109480]

enforce_access currently checks processing_template_decl to decide
whether to defer the given access check until instantiation time.
But using this flag is unreliable because it gets cleared during e.g.
non-dependent initializer folding, and so can lead to premature access
check failures as in the below testcase. It seems better to check
current_template_parms instead.

PR c++/109480

gcc/cp/ChangeLog:

* semantics.cc (enforce_access): Check current_template_parms
instead of processing_template_decl when deciding whether to
defer the access check.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent25a.C: New test.

c++: potentiality of templated memfn call [PR109480]

Here we're incorrectly deeming the templated call a.g() inside b's
initializer as potentially constant, despite g being non-constexpr,
which leads to us needlessly instantiating the initializer ahead of time
and which subsequently triggers a bug in access checking deferral (to be
fixed by the follow-up patch).

This patch fixes this by calling get_fns earlier during CALL_EXPR
potentiality checking so that when we extract a FUNCTION_DECL out of a
templated member function call (whose overall callee is typically a
COMPONENT_REF) we do the usual constexpr-eligibility checking for it.

In passing, I noticed the nearby special handling of the object argument
of a non-static member function call is effectively the same as the
generic argument handling a few lines below.  So this patch just gets
rid of this special handling; otherwise we'd have to adapt it to handle
templated versions of such calls.

PR c++/109480

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>:
Reorganize to call get_fns sooner.  Remove special handling of
the object argument of a non-static member function call.  Remove
dead store to 'fun'.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept59.C: Make e() constexpr so that the
expected "without object" diagnostic isn't replaced by a
"call to non-constexpr function" diagnostic.
* g++.dg/template/non-dependent25.C: New test.

rs6000: Load high and low part of 64bit constant independently

Compare with previous version, this patch updates the comments only.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608293.html

For a complicate 64bit constant, below is one instruction-sequence to
build:
lis 9,0x800a
ori 9,9,0xabcd
sldi 9,9,32
oris 9,9,0xc167
ori 9,9,0xfa16

while we can also use below sequence to build:
lis 9,0xc167
lis 10,0x800a
ori 9,9,0xfa16
ori 10,10,0xabcd
rldimi 9,10,32,0
This sequence is using 2 registers to build high and low part firstly,
and then merge them.

In parallel aspect, this sequence would be faster. (Ofcause, using 1 more
register with potential register pressure).

The instruction sequence with two registers for parallel version can be
generated only if can_create_pseudo_p. Otherwise, the one register
sequence is generated.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Generate
more parallel code if can_create_pseudo_p.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/parall_5insn_const.c: New test.

Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.

Following up on posts/reviews by Segher and Uros, there's some question
over why the middle-end's lower subreg pass emits a clobber (of a
multi-word register) into the instruction stream before emitting the
sequence of moves of the word-sized parts.  This clobber interferes
with (LRA) register allocation, preventing the multi-word pseudo to
remain in the same hard registers.  This patch eliminates this
(presumably superfluous) clobber and thereby improves register allocation.

A concrete example of the observed improvement is PR target/43644.
For the test case:
__int128 foo(__int128 x, __int128 y) { return x+y; }

on x86_64-pc-linux-gnu, gcc -O2 currently generates:

foo: movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret

with this patch, we now generate the much improved:

foo: movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret

2023-05-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/43644
* lower-subreg.cc (resolve_simple_move): Don't emit a clobber
immediately before moving a multi-word register by parts.

gcc/testsuite/ChangeLog
PR target/43644
* gcc.target/i386/pr43644.c: New test case.

Daily bump.

Delete duplicated riscv definition.

gcc/
* config/riscv/riscv-v.cc (riscv_vector_preferred_simd_mode): Delete.

RISC-V: autovec: Verify that GET_MODE_NUNITS is a multiple of 2.

While working on autovectorizing for the RISCV port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
where GET_MODE_NUNITS is equal to one.

Tested on RISCV and x86_64-linux-gnu. Okay?

gcc/
* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.

RISC-V:autovec: Add target vectorization hooks

gcc/
* config/riscv/riscv.cc
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(riscv_support_vector_misalignment): Implement
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): Ditto.

Remove duplicated definition in risc-v vector support.

gcc/

* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Remove
duplicate definition.

RISC-V:autovec: Add auto-vectorization support functions

* config/riscv/riscv-v.cc (autovec_use_vlmax_p): New function.
(riscv_vector_preferred_simd_mode): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.

RISC-V: autovec: Export policy functions to global scope

gcc/
* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.

RISC-V: autovec: Add new predicates and function prototypes

gcc/
* config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): New.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.

LoongArch: Enable shrink wrapping

This commit implements the target macros for shrink wrapping of function
prologues/epilogues shrink wrapping on LoongArch.

Bootstrapped and regtested on loongarch64-linux-gnu. I don't have an
access to SPEC CPU so I hope the reviewer can perform a benchmark to see
if there is real benefit.

gcc/ChangeLog:

* config/loongarch/loongarch.h (struct machine_function): Add
reg_is_wrapped_separately array for register wrapping
information.
* config/loongarch/loongarch.cc
(loongarch_get_separate_components): New function.
(loongarch_components_for_bb): Likewise.
(loongarch_disqualify_components): Likewise.
(loongarch_process_components): Likewise.
(loongarch_emit_prologue_components): Likewise.
(loongarch_emit_epilogue_components): Likewise.
(loongarch_set_handled_components): Likewise.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
(TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
(TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
(loongarch_for_each_saved_reg): Skip registers that are wrapped
separately.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/shrink-wrap.c: New test.

build: Use -nostdinc generating macro_list [PR109522]

This prevents a spurious message building a cross-compiler when target
libc is not installed yet:

cc1: error: no include path in which to search for stdc-predef.h

As stdc-predef.h was added to define __STDC_* macros by libc, it's
unlikely the header will ever contain some bad definitions w/o "__"
prefix so it should be safe.

gcc/ChangeLog:

PR other/109522
* Makefile.in (s-macro_list): Pass -nostdinc to
$(GCC_FOR_TARGET).

RISC-V: Enable basic RVV auto-vectorization support.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (preferred_simd_mode): New function.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
(preferred_simd_mode): Ditto.
* config/riscv/riscv.cc (riscv_get_arg_info): Handle RVV type in function arg.
(riscv_convert_vector_bits): Adjust for RVV auto-vectorization.
(riscv_preferred_simd_mode): New function.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New target hook support.
* config/riscv/vector.md: Add autovec.md.
* config/riscv/autovec.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add testcases for RVV auto-vectorization.
* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/scalable-1.c: New test.
* gcc.target/riscv/rvv/autovec/template-1.h: New test.
* gcc.target/riscv/rvv/autovec/v-1.c: New test.
* gcc.target/riscv/rvv/autovec/v-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.

libffi: fix handling of homogeneous float128 structs (#689)

If there is a homogeneous struct with float128 members, they should be
copied to vector register save area. The current code incorrectly copies
only the value of the first member, not increasing the pointer with each
iteration. Fix this.

Merged from upstream libffi commit: 464b4b66e3cf3b5489e730c1466ee1bf825560e0

2023-05-03 Dan Horák <dan@danny.cz>

libffi/
PR libffi/109447
* src/powerpc/ffi_linux64.c (ffi_prep_args64): Update arg.f128 pointer.

Fortran: Namelist read with invalid input accepted.

PR fortran/109662

libgfortran/ChangeLog:

* io/list_read.c: Add a check for a comma after a namelist
name in read input. Issue a runtime error message.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr109662.f90: New test.

gimple-range-op: Improve handling of sin/cos ranges

Similarly to the earlier sqrt patch, this patch attempts to improve
sin/cos ranges.  As the functions are periodic, for the reverse range
there is not much we can do (but I've discovered I forgot to take
into account the boundary ulps for the discovery of impossible result
ranges).  For fold_range, we can do something only if the range is
narrow enough (narrower than 2*pi).  The patch computes the value of
the functions (taking ulps into account) and also computes the derivative
to find out if the function is growing or declining on the boundaries and
from that it figures out if the result range should be
[min (fn (lb), fn (ub)), max (fn (lb), fn (ub))] or if it needs to be
extended to 1 (actually using +Inf) and/or -1 (actually using -Inf) because
there must be a local minimum and/or maximum in the range.

2023-05-06  Jakub Jelinek  <jakub@redhat.com>

* real.h (dconst_pi): Define.
(dconst_e_ptr): Formatting fix.
(dconst_pi_ptr): Declare.
* real.cc (dconst_pi_ptr): New function.
* gimple-range-op.cc (cfn_sincos::fold_range): Intersect the generic
boundaries range with range computed from sin/cos of the particular
bounds if the argument range is shorter than 2*pi.
(cfn_sincos::op1_range): Take bulps into account when determining
which result ranges are always invalid or behave like known NAN.

* gcc.dg/tree-ssa/range-sincos-2.c: New test.

Remove type from vrange_storage::equal_p.

The equal_p method in vrange_storage is only used to compare ranges
that are the same type. No sense passing the type if it can be
determined from the range being compared.

gcc/ChangeLog:

* gimple-range-cache.cc (sbr_sparse_bitmap::set_bb_range): Do not
pass type to vrange_storage::equal_p.
* value-range-storage.cc (vrange_storage::equal_p): Remove type.
(irange_storage::equal_p): Same.
(frange_storage::equal_p): Same.
* value-range-storage.h (class frange_storage): Same.

RISC-V: Fix incorrect demand info merge in local vsetvli optimization [PR109748]

This patch is fixing my recent optimization patch:
https://github.com/gcc-mirror/gcc/commit/d51f2456ee51bd59a79b4725ca0e488c25260bbf

In that patch, the new_info = parse_insn (i) is not correct.
Since consider the following case:

vsetvli a5,a4, e8,m1
..
vsetvli zero,a5, e32, m4
vle8.v
vmacc.vv
...

Since we have backward demand fusion in Phase 1, so the real demand of "vle8.v" is e32, m4.
However, if we use parse_insn (vle8.v) = e8, m1 which is not correct.

So this patch we change new_info = new_info.parse_insn (i)
into:

vector_insn_info new_info = m_vector_manager->vector_insn_infos[i->uid ()];

So that, we can correctly optimize codes into:

vsetvli a5,a4, e32, m4
..
.. (vsetvli zero,a5, e32, m4 is removed)
vle8.v
vmacc.vv

Since m_vector_manager->vector_insn_infos is the member variable of pass_vsetvl class.
We remove static void function "local_eliminate_vsetvl_insn", and make it as the member function
of pass_vsetvl class.

PR target/109748

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Remove it.
(pass_vsetvl::local_eliminate_vsetvl_insn): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109748.c: New test.

Canonicalize vec_merge when mask is constant.

Use swap_communattive_operands_p for canonicalization. When both value
has same operand precedence value, then first bit in the mask should
select first operand.

The canonicalization should help backends for pattern match. .i.e. x86
backend has lots of vec_merge patterns, combine will create any form
of vec_merge(mask, or inverted mask), then backend need to add 2
patterns to match exact 1 instruction. The canonicalization can
simplify 2 patterns to 1.

gcc/ChangeLog:

* combine.cc (maybe_swap_commutative_operands): Canonicalize
vec_merge when mask is constant.
* doc/md.texi: Document vec_merge canonicalization.

gimple-range-op: Improve handling of sqrt ranges

The previous patch just added basic intrinsic ranges for sqrt
([-0.0, +Inf] +-NAN being the general result range of the function
and [-0.0, +Inf] the general operand range if result isn't NAN etc.),
the following patch intersects those ranges with particular range
computed from argument or result's exact range with the expected
error in ulps taken into account and adds a function (frange_arithmetic
variant) which can be used by other functions as well as helper.

2023-05-06 Jakub Jelinek <jakub@redhat.com>

* value-range.h (frange_arithmetic): Declare.
* range-op-float.cc (frange_arithmetic): No longer static.
* gimple-range-op.cc (frange_mpfr_arg1): New function.
(cfn_sqrt::fold_range): Intersect the generic boundaries range
with range computed from sqrt of the particular bounds.
(cfn_sqrt::op1_range): Intersect the generic boundaries range
with range computed from squared particular bounds.

* gcc.dg/tree-ssa/range-sqrt-2.c: New test.

build: Replace seq for portability with GNU Make variant

Some hosts like AIX don't have seq command, this patch replaces it
with something that uses just GNU make features we've been using
for this already before for the parallel make check.

2023-05-06 Jakub Jelinek <jakub@redhat.com>

* Makefile.in (check_p_numbers): Rename to one_to_9999, move
earlier with helper variables also renamed.
(MATCH_SPLUT_SEQ): Use $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_9999))
instead of $(shell seq 1 $(NUM_MATCH_SPLITS)).
(check_p_subdirs): Use $(one_to_9999) instead of $(check_p_numbers).

Daily bump.

CRIS: peephole2 an add into two addq or subq

Unfortunately, doesn't cause a performance improvement for coremark,
but happens a few times in newlib, just enough to affect coremark
0.01% by size (or 4 bytes, and three cycles (__fwalk_sglue and
__vfiprintf_r each two bytes).

gcc:
* config/cris/cris.md (splitop): Add PLUS.
* config/cris/cris.cc (cris_split_constant): Also handle
PLUS when a split into two insns may be useful.

gcc/testsuite:
* gcc.target/cris/peep2-addsplit1.c: New test.

CRIS: peephole2 a move of constant followed by and of same register

While moves of constants into registers are separately
optimizable, a combination of a move with a subsequent "and"
is slightly preferable even if the move can be generated
with the same number (and timing) of insns, as moves of
"just" registers are eliminated now and then in different
passes, loosely speaking.  This movandsplit1 pattern feeds
into the opsplit1/AND peephole2, with matching occurrences
observed in the floating point functions in libgcc.  Also, a
test-case to fit.  Coremark improvements are unimpressive:
less than 0.0003% speed, 0.1% size.

But that was pre-LRA; after the switch to LRA this peephole2
doesn't match anymore (for any of coremark, local tests,
libgcc and newlib libc) and the test-case passes with and
without the patch.  Still, there's no apparent reason why
LRA prefers "move R1,R2" "and I,R2" to "move I,R1" "and
R1,R2", or why that wouldn't "randomly" change (also seen
with other operations than "and").  Thus committed.

gcc:
* config/cris/cris.md (movandsplit1): New define_peephole2.

gcc/testsuite:
* gcc.target/cris/peep2-movandsplit1.c: New test.

CRIS: peephole2 a lsrq into a lslq+lsrq pair

Observed after opsplit1 with AND in libgcc floating-point
functions, like the first spottings of opsplit1/AND
opportunities. Two patterns are nominally needed, as the
peephole2 optimizer continues from the *first replacement*
insn, not from a minimum context for general matching; one
that includes it as the last match.

But, the "free-standing" opportunity (three shifts) didn't
match by itself in a gcc build of libraries plus running the
test-suite, and thus deemed uninteresting and left out.
(As expected; if it had matched, that'd have indicated a
previously missed optimization or other problem elsewhere.)
Only the one that includes the previous define_peephole2
that may generate the sequence (i.e. opsplit1/AND), matches
easily.

Coremark results aren't impressive though: 0.003%
improvement in speed and slightly less than 0.1% in size.

A testcase is added to match and another one to cover a case
of movulsr checking that it's used; it's preferable to
lsrandsplit when both would match.

gcc:
* config/cris/cris.md (lsrandsplit1): New define_peephole2.

gcc/testsuite:
* gcc.target/cris/peep2-lsrandsplit1.c,
gcc.target/cris/peep2-movulsr2.c: New tests.

doc: Document order of define_peephole2 scanning

I was a bit surprised when my newly-added define_peephole2 didn't
match, but it was because it was expected to partially match the
generated output of a previous define_peephole2, which matched and
modified the last insn of a sequence to be matched. I had assumed
that the algorithm backed-up the size of the match-buffer, thereby
exposing newly created opportunities *with sufficient context* to all
define_peephole2's. While things can change in that direction, let's
start with documenting the current state.

* doc/md.texi (define_peephole2): Document order of scanning.

Fortran: overloading of intrinsic binary operators [PR109641]

Fortran allows overloading of intrinsic operators also for operands of
numeric intrinsic types. The intrinsic operator versions are used
according to the rules of F2018 table 10.2 and imply type conversion as
long as the operand ranks are conformable. Otherwise no type conversion
shall be performed to allow the resolution of a matching user-defined
operator.

gcc/fortran/ChangeLog:

PR fortran/109641
* arith.cc (eval_intrinsic): Check conformability of ranks of operands
for intrinsic binary operators before performing type conversions.
* gfortran.h (gfc_op_rank_conformable): Add prototype.
* resolve.cc (resolve_operator): Check conformability of ranks of
operands for intrinsic binary operators before performing type
conversions.
(gfc_op_rank_conformable): New helper function to compare ranks of
operands of binary operator.

gcc/testsuite/ChangeLog:

PR fortran/109641
* gfortran.dg/overload_5.f90: New test.

RISC-V: Legitimise the const0_rtx for RVV indexed load/store

This patch try to legitimise the const0_rtx (aka zero register)
as the base register for the RVV indexed load/store instructions
by allowing the const as the operand of the indexed RTL pattern.
Then the underlying combine pass will try to perform the const
propagation.

For example:
vint32m1_t
test_vluxei32_v_i32m1_shortcut (vuint32m1_t bindex, size_t vl)
{
  return __riscv_vluxei32_v_i32m1 ((int32_t *)0, bindex, vl);
}

Before this patch:
li         a5,0                 <- can be eliminated.
vl1re32.v  v1,0(a1)
vsetvli    zero,a2,e32,m1,ta,ma
vluxei32.v v1,(a5),v1           <- can propagate the const 0 to a5 here.
vs1r.v     v1,0(a0)
ret

After this patch:
test_vluxei32_v_i32m1_shortcut:
vl1re32.v       v1,0(a1)
vsetvli zero,a2,e32,m1,ta,ma
vluxei32.v      v1,(0),v1
vs1r.v  v1,0(a0)
ret

As above, this patch allow you to propagaate the const 0 (aka zero
register) to the base register of the RVV indexed load in the combine
pass. This may benefit the underlying RVV auto-vectorization.

gcc/ChangeLog:

* config/riscv/vector.md: Allow const as the operand of RVV
indexed load/store.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c:
Adjust indexed load/store check condition.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

When some RVV integer compare operators act on the same vector registers
without mask. They can be simplified to VMSET.

This PATCH allows the eq, le, leu, ge, geu to perform such kind of the
simplification by adding one macro in riscv for simplify rtx.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1                  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require less vector
registers.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
consumed by simplify_rtx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
Adjust test check condition.

arm: [MVE intrinsics] rework vshrq vrshrq

Implement vshrq and vrshrq using the new MVE builtins framework.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (vrshrq, vshrq): New.
* config/arm/arm-mve-builtins-base.def (vrshrq, vshrq): New.
* config/arm/arm-mve-builtins-base.h (vrshrq, vshrq): New.
* config/arm/arm_mve.h (vshrq): Remove.
(vrshrq): Remove.
(vrshrq_m): Remove.
(vshrq_m): Remove.
(vrshrq_x): Remove.
(vshrq_x): Remove.
(vshrq_n_s8): Remove.
(vshrq_n_s16): Remove.
(vshrq_n_s32): Remove.
(vshrq_n_u8): Remove.
(vshrq_n_u16): Remove.
(vshrq_n_u32): Remove.
(vrshrq_n_u8): Remove.
(vrshrq_n_s8): Remove.
(vrshrq_n_u16): Remove.
(vrshrq_n_s16): Remove.
(vrshrq_n_u32): Remove.
(vrshrq_n_s32): Remove.
(vrshrq_m_n_s8): Remove.
(vrshrq_m_n_s32): Remove.
(vrshrq_m_n_s16): Remove.
(vrshrq_m_n_u8): Remove.
(vrshrq_m_n_u32): Remove.
(vrshrq_m_n_u16): Remove.
(vshrq_m_n_s8): Remove.
(vshrq_m_n_s32): Remove.
(vshrq_m_n_s16): Remove.
(vshrq_m_n_u8): Remove.
(vshrq_m_n_u32): Remove.
(vshrq_m_n_u16): Remove.
(vrshrq_x_n_s8): Remove.
(vrshrq_x_n_s16): Remove.
(vrshrq_x_n_s32): Remove.
(vrshrq_x_n_u8): Remove.
(vrshrq_x_n_u16): Remove.
(vrshrq_x_n_u32): Remove.
(vshrq_x_n_s8): Remove.
(vshrq_x_n_s16): Remove.
(vshrq_x_n_s32): Remove.
(vshrq_x_n_u8): Remove.
(vshrq_x_n_u16): Remove.
(vshrq_x_n_u32): Remove.
(__arm_vshrq_n_s8): Remove.
(__arm_vshrq_n_s16): Remove.
(__arm_vshrq_n_s32): Remove.
(__arm_vshrq_n_u8): Remove.
(__arm_vshrq_n_u16): Remove.
(__arm_vshrq_n_u32): Remove.
(__arm_vrshrq_n_u8): Remove.
(__arm_vrshrq_n_s8): Remove.
(__arm_vrshrq_n_u16): Remove.
(__arm_vrshrq_n_s16): Remove.
(__arm_vrshrq_n_u32): Remove.
(__arm_vrshrq_n_s32): Remove.
(__arm_vrshrq_m_n_s8): Remove.
(__arm_vrshrq_m_n_s32): Remove.
(__arm_vrshrq_m_n_s16): Remove.
(__arm_vrshrq_m_n_u8): Remove.
(__arm_vrshrq_m_n_u32): Remove.
(__arm_vrshrq_m_n_u16): Remove.
(__arm_vshrq_m_n_s8): Remove.
(__arm_vshrq_m_n_s32): Remove.
(__arm_vshrq_m_n_s16): Remove.
(__arm_vshrq_m_n_u8): Remove.
(__arm_vshrq_m_n_u32): Remove.
(__arm_vshrq_m_n_u16): Remove.
(__arm_vrshrq_x_n_s8): Remove.
(__arm_vrshrq_x_n_s16): Remove.
(__arm_vrshrq_x_n_s32): Remove.
(__arm_vrshrq_x_n_u8): Remove.
(__arm_vrshrq_x_n_u16): Remove.
(__arm_vrshrq_x_n_u32): Remove.
(__arm_vshrq_x_n_s8): Remove.
(__arm_vshrq_x_n_s16): Remove.
(__arm_vshrq_x_n_s32): Remove.
(__arm_vshrq_x_n_u8): Remove.
(__arm_vshrq_x_n_u16): Remove.
(__arm_vshrq_x_n_u32): Remove.
(__arm_vshrq): Remove.
(__arm_vrshrq): Remove.
(__arm_vrshrq_m): Remove.
(__arm_vshrq_m): Remove.
(__arm_vrshrq_x): Remove.
(__arm_vshrq_x): Remove.

arm: [MVE intrinsics] factorize vsrhrq vrshrq

Factorize vsrhrq vrshrq so that they use the same pattern.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_VSHRQ_M_N, MVE_VSHRQ_N): New.
(mve_insn): Add vrshr, vshr.
* config/arm/mve.md (mve_vshrq_n_<supf><mode>)
(mve_vrshrq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vrshrq_m_n_<supf><mode>, mve_vshrq_m_n_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.

arm: [MVE intrinsics] add binary_rshift shape

This patch adds the binary_rshift shape description.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_rshift): New.
* config/arm/arm-mve-builtins-shapes.h (binary_rshift): New.

arm: [MVE intrinsics] rework vqrshrunbq vqrshruntq vqshrunbq vqshruntq

Implement vqrshrunbq, vqrshruntq, vqshrunbq, vqshruntq using the new
MVE builtins framework.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_N_NO_U_F): New.
(vqshrunbq, vqshruntq, vqrshrunbq, vqrshruntq): New.
* config/arm/arm-mve-builtins-base.def (vqshrunbq, vqshruntq)
(vqrshrunbq, vqrshruntq): New.
* config/arm/arm-mve-builtins-base.h (vqshrunbq, vqshruntq)
(vqrshrunbq, vqrshruntq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vqshrunbq,
vqshruntq, vqrshrunbq, vqrshruntq.
* config/arm/arm_mve.h (vqrshrunbq): Remove.
(vqrshruntq): Remove.
(vqrshrunbq_m): Remove.
(vqrshruntq_m): Remove.
(vqrshrunbq_n_s16): Remove.
(vqrshrunbq_n_s32): Remove.
(vqrshruntq_n_s16): Remove.
(vqrshruntq_n_s32): Remove.
(vqrshrunbq_m_n_s32): Remove.
(vqrshrunbq_m_n_s16): Remove.
(vqrshruntq_m_n_s32): Remove.
(vqrshruntq_m_n_s16): Remove.
(__arm_vqrshrunbq_n_s16): Remove.
(__arm_vqrshrunbq_n_s32): Remove.
(__arm_vqrshruntq_n_s16): Remove.
(__arm_vqrshruntq_n_s32): Remove.
(__arm_vqrshrunbq_m_n_s32): Remove.
(__arm_vqrshrunbq_m_n_s16): Remove.
(__arm_vqrshruntq_m_n_s32): Remove.
(__arm_vqrshruntq_m_n_s16): Remove.
(__arm_vqrshrunbq): Remove.
(__arm_vqrshruntq): Remove.
(__arm_vqrshrunbq_m): Remove.
(__arm_vqrshruntq_m): Remove.
(vqshrunbq): Remove.
(vqshruntq): Remove.
(vqshrunbq_m): Remove.
(vqshruntq_m): Remove.
(vqshrunbq_n_s16): Remove.
(vqshruntq_n_s16): Remove.
(vqshrunbq_n_s32): Remove.
(vqshruntq_n_s32): Remove.
(vqshrunbq_m_n_s32): Remove.
(vqshrunbq_m_n_s16): Remove.
(vqshruntq_m_n_s32): Remove.
(vqshruntq_m_n_s16): Remove.
(__arm_vqshrunbq_n_s16): Remove.
(__arm_vqshruntq_n_s16): Remove.
(__arm_vqshrunbq_n_s32): Remove.
(__arm_vqshruntq_n_s32): Remove.
(__arm_vqshrunbq_m_n_s32): Remove.
(__arm_vqshrunbq_m_n_s16): Remove.
(__arm_vqshruntq_m_n_s32): Remove.
(__arm_vqshruntq_m_n_s16): Remove.
(__arm_vqshrunbq): Remove.
(__arm_vqshruntq): Remove.
(__arm_vqshrunbq_m): Remove.
(__arm_vqshruntq_m): Remove.

arm: [MVE intrinsics] factorize vqrshrunb vqrshrunt vqshrunb vqshrunt

Factorize vqrshrunb, vqrshrunt, vqshrunb, vqshrunt so that they use
existing patterns.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_SHRN_N): Add VQRSHRUNBQ,
VQRSHRUNTQ, VQSHRUNBQ, VQSHRUNTQ.
(MVE_SHRN_M_N): Likewise.
(mve_insn): Add vqrshrunb, vqrshrunt, vqshrunb, vqshrunt.
(isu): Add VQRSHRUNBQ, VQRSHRUNTQ, VQSHRUNBQ, VQSHRUNTQ.
(supf): Likewise.
* config/arm/mve.md (mve_vqrshrunbq_n_s<mode>): Remove.
(mve_vqrshruntq_n_s<mode>): Remove.
(mve_vqshrunbq_n_s<mode>): Remove.
(mve_vqshruntq_n_s<mode>): Remove.
(mve_vqrshrunbq_m_n_s<mode>): Remove.
(mve_vqrshruntq_m_n_s<mode>): Remove.
(mve_vqshrunbq_m_n_s<mode>): Remove.
(mve_vqshruntq_m_n_s<mode>): Remove.

arm: [MVE intrinsics] add binary_rshift_narrow_unsigned shape

This patch adds the binary_rshift_narrow_unsigned shape description.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc
(binary_rshift_narrow_unsigned): New.
* config/arm/arm-mve-builtins-shapes.h
(binary_rshift_narrow_unsigned): New.

arm: [MVE intrinsics] rework vshrnbq vshrntq vrshrnbq vrshrntq vqshrnbq vqshrntq vqrshrnbq vqrshrntq

Implement vshrnbq, vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq,
vqrshrnbq, vqrshrntq using the new MVE builtins framework.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_N_NO_F): New.
(vshrnbq, vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq)
(vqrshrnbq, vqrshrntq): New.
* config/arm/arm-mve-builtins-base.def (vshrnbq, vshrntq)
(vrshrnbq, vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq):
New.
* config/arm/arm-mve-builtins-base.h (vshrnbq, vshrntq, vrshrnbq)
(vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vshrnbq,
vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq,
vqrshrntq.
* config/arm/arm_mve.h (vshrnbq): Remove.
(vshrntq): Remove.
(vshrnbq_m): Remove.
(vshrntq_m): Remove.
(vshrnbq_n_s16): Remove.
(vshrntq_n_s16): Remove.
(vshrnbq_n_u16): Remove.
(vshrntq_n_u16): Remove.
(vshrnbq_n_s32): Remove.
(vshrntq_n_s32): Remove.
(vshrnbq_n_u32): Remove.
(vshrntq_n_u32): Remove.
(vshrnbq_m_n_s32): Remove.
(vshrnbq_m_n_s16): Remove.
(vshrnbq_m_n_u32): Remove.
(vshrnbq_m_n_u16): Remove.
(vshrntq_m_n_s32): Remove.
(vshrntq_m_n_s16): Remove.
(vshrntq_m_n_u32): Remove.
(vshrntq_m_n_u16): Remove.
(__arm_vshrnbq_n_s16): Remove.
(__arm_vshrntq_n_s16): Remove.
(__arm_vshrnbq_n_u16): Remove.
(__arm_vshrntq_n_u16): Remove.
(__arm_vshrnbq_n_s32): Remove.
(__arm_vshrntq_n_s32): Remove.
(__arm_vshrnbq_n_u32): Remove.
(__arm_vshrntq_n_u32): Remove.
(__arm_vshrnbq_m_n_s32): Remove.
(__arm_vshrnbq_m_n_s16): Remove.
(__arm_vshrnbq_m_n_u32): Remove.
(__arm_vshrnbq_m_n_u16): Remove.
(__arm_vshrntq_m_n_s32): Remove.
(__arm_vshrntq_m_n_s16): Remove.
(__arm_vshrntq_m_n_u32): Remove.
(__arm_vshrntq_m_n_u16): Remove.
(__arm_vshrnbq): Remove.
(__arm_vshrntq): Remove.
(__arm_vshrnbq_m): Remove.
(__arm_vshrntq_m): Remove.
(vrshrnbq): Remove.
(vrshrntq): Remove.
(vrshrnbq_m): Remove.
(vrshrntq_m): Remove.
(vrshrnbq_n_s16): Remove.
(vrshrntq_n_s16): Remove.
(vrshrnbq_n_u16): Remove.
(vrshrntq_n_u16): Remove.
(vrshrnbq_n_s32): Remove.
(vrshrntq_n_s32): Remove.
(vrshrnbq_n_u32): Remove.
(vrshrntq_n_u32): Remove.
(vrshrnbq_m_n_s32): Remove.
(vrshrnbq_m_n_s16): Remove.
(vrshrnbq_m_n_u32): Remove.
(vrshrnbq_m_n_u16): Remove.
(vrshrntq_m_n_s32): Remove.
(vrshrntq_m_n_s16): Remove.
(vrshrntq_m_n_u32): Remove.
(vrshrntq_m_n_u16): Remove.
(__arm_vrshrnbq_n_s16): Remove.
(__arm_vrshrntq_n_s16): Remove.
(__arm_vrshrnbq_n_u16): Remove.
(__arm_vrshrntq_n_u16): Remove.
(__arm_vrshrnbq_n_s32): Remove.
(__arm_vrshrntq_n_s32): Remove.
(__arm_vrshrnbq_n_u32): Remove.
(__arm_vrshrntq_n_u32): Remove.
(__arm_vrshrnbq_m_n_s32): Remove.
(__arm_vrshrnbq_m_n_s16): Remove.
(__arm_vrshrnbq_m_n_u32): Remove.
(__arm_vrshrnbq_m_n_u16): Remove.
(__arm_vrshrntq_m_n_s32): Remove.
(__arm_vrshrntq_m_n_s16): Remove.
(__arm_vrshrntq_m_n_u32): Remove.
(__arm_vrshrntq_m_n_u16): Remove.
(__arm_vrshrnbq): Remove.
(__arm_vrshrntq): Remove.
(__arm_vrshrnbq_m): Remove.
(__arm_vrshrntq_m): Remove.
(vqshrnbq): Remove.
(vqshrntq): Remove.
(vqshrnbq_m): Remove.
(vqshrntq_m): Remove.
(vqshrnbq_n_s16): Remove.
(vqshrntq_n_s16): Remove.
(vqshrnbq_n_u16): Remove.
(vqshrntq_n_u16): Remove.
(vqshrnbq_n_s32): Remove.
(vqshrntq_n_s32): Remove.
(vqshrnbq_n_u32): Remove.
(vqshrntq_n_u32): Remove.
(vqshrnbq_m_n_s32): Remove.
(vqshrnbq_m_n_s16): Remove.
(vqshrnbq_m_n_u32): Remove.
(vqshrnbq_m_n_u16): Remove.
(vqshrntq_m_n_s32): Remove.
(vqshrntq_m_n_s16): Remove.
(vqshrntq_m_n_u32): Remove.
(vqshrntq_m_n_u16): Remove.
(__arm_vqshrnbq_n_s16): Remove.
(__arm_vqshrntq_n_s16): Remove.
(__arm_vqshrnbq_n_u16): Remove.
(__arm_vqshrntq_n_u16): Remove.
(__arm_vqshrnbq_n_s32): Remove.
(__arm_vqshrntq_n_s32): Remove.
(__arm_vqshrnbq_n_u32): Remove.
(__arm_vqshrntq_n_u32): Remove.
(__arm_vqshrnbq_m_n_s32): Remove.
(__arm_vqshrnbq_m_n_s16): Remove.
(__arm_vqshrnbq_m_n_u32): Remove.
(__arm_vqshrnbq_m_n_u16): Remove.
(__arm_vqshrntq_m_n_s32): Remove.
(__arm_vqshrntq_m_n_s16): Remove.
(__arm_vqshrntq_m_n_u32): Remove.
(__arm_vqshrntq_m_n_u16): Remove.
(__arm_vqshrnbq): Remove.
(__arm_vqshrntq): Remove.
(__arm_vqshrnbq_m): Remove.
(__arm_vqshrntq_m): Remove.
(vqrshrnbq): Remove.
(vqrshrntq): Remove.
(vqrshrnbq_m): Remove.
(vqrshrntq_m): Remove.
(vqrshrnbq_n_s16): Remove.
(vqrshrnbq_n_u16): Remove.
(vqrshrnbq_n_s32): Remove.
(vqrshrnbq_n_u32): Remove.
(vqrshrntq_n_s16): Remove.
(vqrshrntq_n_u16): Remove.
(vqrshrntq_n_s32): Remove.
(vqrshrntq_n_u32): Remove.
(vqrshrnbq_m_n_s32): Remove.
(vqrshrnbq_m_n_s16): Remove.
(vqrshrnbq_m_n_u32): Remove.
(vqrshrnbq_m_n_u16): Remove.
(vqrshrntq_m_n_s32): Remove.
(vqrshrntq_m_n_s16): Remove.
(vqrshrntq_m_n_u32): Remove.
(vqrshrntq_m_n_u16): Remove.
(__arm_vqrshrnbq_n_s16): Remove.
(__arm_vqrshrnbq_n_u16): Remove.
(__arm_vqrshrnbq_n_s32): Remove.
(__arm_vqrshrnbq_n_u32): Remove.
(__arm_vqrshrntq_n_s16): Remove.
(__arm_vqrshrntq_n_u16): Remove.
(__arm_vqrshrntq_n_s32): Remove.
(__arm_vqrshrntq_n_u32): Remove.
(__arm_vqrshrnbq_m_n_s32): Remove.
(__arm_vqrshrnbq_m_n_s16): Remove.
(__arm_vqrshrnbq_m_n_u32): Remove.
(__arm_vqrshrnbq_m_n_u16): Remove.
(__arm_vqrshrntq_m_n_s32): Remove.
(__arm_vqrshrntq_m_n_s16): Remove.
(__arm_vqrshrntq_m_n_u32): Remove.
(__arm_vqrshrntq_m_n_u16): Remove.
(__arm_vqrshrnbq): Remove.
(__arm_vqrshrntq): Remove.
(__arm_vqrshrnbq_m): Remove.
(__arm_vqrshrntq_m): Remove.

arm: [MVE intrinsics] factorize vshrntq vshrnbq vrshrnbq vrshrntq vqshrnbq vqshrntq vqrshrnbq vqrshrntq

Factorize vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq, vshrntq, vshrnbq,
vrshrnbq and vrshrntq so that they use the same pattern.

Introduce <isu> iterator for *shrn* so that we can use the same
pattern despite the different "s", "u" and "i" suffixes.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_SHRN_N, MVE_SHRN_M_N): New.
(mve_insn): Add vqrshrnb, vqrshrnt, vqshrnb, vqshrnt, vrshrnb,
vrshrnt, vshrnb, vshrnt.
(isu): New.
* config/arm/mve.md (mve_vqrshrnbq_n_<supf><mode>)
(mve_vqrshrntq_n_<supf><mode>, mve_vqshrnbq_n_<supf><mode>)
(mve_vqshrntq_n_<supf><mode>, mve_vrshrnbq_n_<supf><mode>)
(mve_vrshrntq_n_<supf><mode>, mve_vshrnbq_n_<supf><mode>)
(mve_vshrntq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vqrshrnbq_m_n_<supf><mode>, mve_vqrshrntq_m_n_<supf><mode>)
(mve_vqshrnbq_m_n_<supf><mode>, mve_vqshrntq_m_n_<supf><mode>)
(mve_vrshrnbq_m_n_<supf><mode>, mve_vrshrntq_m_n_<supf><mode>)
(mve_vshrnbq_m_n_<supf><mode>, mve_vshrntq_m_n_<supf><mode>):
Merge into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.

arm: [MVE intrinsics] add binary_rshift_narrow shape

This patch adds the binary_rshift_narrow shape description.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_rshift_narrow):
New.
* config/arm/arm-mve-builtins-shapes.h (binary_rshift_narrow): New.

arm: [MVE intrinsics] rework vmaxq vminq

Implement vmaxq and vminq using the new MVE builtins framework.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_NO_F): New.
(vmaxq, vminq): New.
* config/arm/arm-mve-builtins-base.def (vmaxq, vminq): New.
* config/arm/arm-mve-builtins-base.h (vmaxq, vminq): New.
* config/arm/arm_mve.h (vminq): Remove.
(vmaxq): Remove.
(vmaxq_m): Remove.
(vminq_m): Remove.
(vminq_x): Remove.
(vmaxq_x): Remove.
(vminq_u8): Remove.
(vmaxq_u8): Remove.
(vminq_s8): Remove.
(vmaxq_s8): Remove.
(vminq_u16): Remove.
(vmaxq_u16): Remove.
(vminq_s16): Remove.
(vmaxq_s16): Remove.
(vminq_u32): Remove.
(vmaxq_u32): Remove.
(vminq_s32): Remove.
(vmaxq_s32): Remove.
(vmaxq_m_s8): Remove.
(vmaxq_m_s32): Remove.
(vmaxq_m_s16): Remove.
(vmaxq_m_u8): Remove.
(vmaxq_m_u32): Remove.
(vmaxq_m_u16): Remove.
(vminq_m_s8): Remove.
(vminq_m_s32): Remove.
(vminq_m_s16): Remove.
(vminq_m_u8): Remove.
(vminq_m_u32): Remove.
(vminq_m_u16): Remove.
(vminq_x_s8): Remove.
(vminq_x_s16): Remove.
(vminq_x_s32): Remove.
(vminq_x_u8): Remove.
(vminq_x_u16): Remove.
(vminq_x_u32): Remove.
(vmaxq_x_s8): Remove.
(vmaxq_x_s16): Remove.
(vmaxq_x_s32): Remove.
(vmaxq_x_u8): Remove.
(vmaxq_x_u16): Remove.
(vmaxq_x_u32): Remove.
(__arm_vminq_u8): Remove.
(__arm_vmaxq_u8): Remove.
(__arm_vminq_s8): Remove.
(__arm_vmaxq_s8): Remove.
(__arm_vminq_u16): Remove.
(__arm_vmaxq_u16): Remove.
(__arm_vminq_s16): Remove.
(__arm_vmaxq_s16): Remove.
(__arm_vminq_u32): Remove.
(__arm_vmaxq_u32): Remove.
(__arm_vminq_s32): Remove.
(__arm_vmaxq_s32): Remove.
(__arm_vmaxq_m_s8): Remove.
(__arm_vmaxq_m_s32): Remove.
(__arm_vmaxq_m_s16): Remove.
(__arm_vmaxq_m_u8): Remove.
(__arm_vmaxq_m_u32): Remove.
(__arm_vmaxq_m_u16): Remove.
(__arm_vminq_m_s8): Remove.
(__arm_vminq_m_s32): Remove.
(__arm_vminq_m_s16): Remove.
(__arm_vminq_m_u8): Remove.
(__arm_vminq_m_u32): Remove.
(__arm_vminq_m_u16): Remove.
(__arm_vminq_x_s8): Remove.
(__arm_vminq_x_s16): Remove.
(__arm_vminq_x_s32): Remove.
(__arm_vminq_x_u8): Remove.
(__arm_vminq_x_u16): Remove.
(__arm_vminq_x_u32): Remove.
(__arm_vmaxq_x_s8): Remove.
(__arm_vmaxq_x_s16): Remove.
(__arm_vmaxq_x_s32): Remove.
(__arm_vmaxq_x_u8): Remove.
(__arm_vmaxq_x_u16): Remove.
(__arm_vmaxq_x_u32): Remove.
(__arm_vminq): Remove.
(__arm_vmaxq): Remove.
(__arm_vmaxq_m): Remove.
(__arm_vminq_m): Remove.
(__arm_vminq_x): Remove.
(__arm_vmaxq_x): Remove.

arm: [MVE intrinsics] factorize vmaxq vminq

Factorize vmaxq and vminq so that they use the same pattern.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MAX_MIN_SU): New.
(max_min_su_str): New.
(max_min_supf): New.
* config/arm/mve.md (mve_vmaxq_s<mode>, mve_vmaxq_u<mode>)
(mve_vminq_s<mode>, mve_vminq_u<mode>): Merge into ...
(mve_<max_min_su_str>q_<max_min_supf><mode>): ... this.

arm: [MVE intrinsics] rework vqshlq vshlq

Implement vqshlq, vshlq using the new MVE builtins framework.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_M_N_R): New.
(vqshlq, vshlq): New.
* config/arm/arm-mve-builtins-base.def (vqshlq, vshlq): New.
* config/arm/arm-mve-builtins-base.h (vqshlq, vshlq): New.
* config/arm/arm_mve.h (vshlq): Remove.
(vshlq_r): Remove.
(vshlq_n): Remove.
(vshlq_m_r): Remove.
(vshlq_m): Remove.
(vshlq_m_n): Remove.
(vshlq_x): Remove.
(vshlq_x_n): Remove.
(vshlq_s8): Remove.
(vshlq_s16): Remove.
(vshlq_s32): Remove.
(vshlq_u8): Remove.
(vshlq_u16): Remove.
(vshlq_u32): Remove.
(vshlq_r_u8): Remove.
(vshlq_n_u8): Remove.
(vshlq_r_s8): Remove.
(vshlq_n_s8): Remove.
(vshlq_r_u16): Remove.
(vshlq_n_u16): Remove.
(vshlq_r_s16): Remove.
(vshlq_n_s16): Remove.
(vshlq_r_u32): Remove.
(vshlq_n_u32): Remove.
(vshlq_r_s32): Remove.
(vshlq_n_s32): Remove.
(vshlq_m_r_u8): Remove.
(vshlq_m_r_s8): Remove.
(vshlq_m_r_u16): Remove.
(vshlq_m_r_s16): Remove.
(vshlq_m_r_u32): Remove.
(vshlq_m_r_s32): Remove.
(vshlq_m_u8): Remove.
(vshlq_m_s8): Remove.
(vshlq_m_u16): Remove.
(vshlq_m_s16): Remove.
(vshlq_m_u32): Remove.
(vshlq_m_s32): Remove.
(vshlq_m_n_s8): Remove.
(vshlq_m_n_s32): Remove.
(vshlq_m_n_s16): Remove.
(vshlq_m_n_u8): Remove.
(vshlq_m_n_u32): Remove.
(vshlq_m_n_u16): Remove.
(vshlq_x_s8): Remove.
(vshlq_x_s16): Remove.
(vshlq_x_s32): Remove.
(vshlq_x_u8): Remove.
(vshlq_x_u16): Remove.
(vshlq_x_u32): Remove.
(vshlq_x_n_s8): Remove.
(vshlq_x_n_s16): Remove.
(vshlq_x_n_s32): Remove.
(vshlq_x_n_u8): Remove.
(vshlq_x_n_u16): Remove.
(vshlq_x_n_u32): Remove.
(__arm_vshlq_s8): Remove.
(__arm_vshlq_s16): Remove.
(__arm_vshlq_s32): Remove.
(__arm_vshlq_u8): Remove.
(__arm_vshlq_u16): Remove.
(__arm_vshlq_u32): Remove.
(__arm_vshlq_r_u8): Remove.
(__arm_vshlq_n_u8): Remove.
(__arm_vshlq_r_s8): Remove.
(__arm_vshlq_n_s8): Remove.
(__arm_vshlq_r_u16): Remove.
(__arm_vshlq_n_u16): Remove.
(__arm_vshlq_r_s16): Remove.
(__arm_vshlq_n_s16): Remove.
(__arm_vshlq_r_u32): Remove.
(__arm_vshlq_n_u32): Remove.
(__arm_vshlq_r_s32): Remove.
(__arm_vshlq_n_s32): Remove.
(__arm_vshlq_m_r_u8): Remove.
(__arm_vshlq_m_r_s8): Remove.
(__arm_vshlq_m_r_u16): Remove.
(__arm_vshlq_m_r_s16): Remove.
(__arm_vshlq_m_r_u32): Remove.
(__arm_vshlq_m_r_s32): Remove.
(__arm_vshlq_m_u8): Remove.
(__arm_vshlq_m_s8): Remove.
(__arm_vshlq_m_u16): Remove.
(__arm_vshlq_m_s16): Remove.
(__arm_vshlq_m_u32): Remove.
(__arm_vshlq_m_s32): Remove.
(__arm_vshlq_m_n_s8): Remove.
(__arm_vshlq_m_n_s32): Remove.
(__arm_vshlq_m_n_s16): Remove.
(__arm_vshlq_m_n_u8): Remove.
(__arm_vshlq_m_n_u32): Remove.
(__arm_vshlq_m_n_u16): Remove.
(__arm_vshlq_x_s8): Remove.
(__arm_vshlq_x_s16): Remove.
(__arm_vshlq_x_s32): Remove.
(__arm_vshlq_x_u8): Remove.
(__arm_vshlq_x_u16): Remove.
(__arm_vshlq_x_u32): Remove.
(__arm_vshlq_x_n_s8): Remove.
(__arm_vshlq_x_n_s16): Remove.
(__arm_vshlq_x_n_s32): Remove.
(__arm_vshlq_x_n_u8): Remove.
(__arm_vshlq_x_n_u16): Remove.
(__arm_vshlq_x_n_u32): Remove.
(__arm_vshlq): Remove.
(__arm_vshlq_r): Remove.
(__arm_vshlq_n): Remove.
(__arm_vshlq_m_r): Remove.
(__arm_vshlq_m): Remove.
(__arm_vshlq_m_n): Remove.
(__arm_vshlq_x): Remove.
(__arm_vshlq_x_n): Remove.
(vqshlq): Remove.
(vqshlq_r): Remove.
(vqshlq_n): Remove.
(vqshlq_m_r): Remove.
(vqshlq_m_n): Remove.
(vqshlq_m): Remove.
(vqshlq_u8): Remove.
(vqshlq_r_u8): Remove.
(vqshlq_n_u8): Remove.
(vqshlq_s8): Remove.
(vqshlq_r_s8): Remove.
(vqshlq_n_s8): Remove.
(vqshlq_u16): Remove.
(vqshlq_r_u16): Remove.
(vqshlq_n_u16): Remove.
(vqshlq_s16): Remove.
(vqshlq_r_s16): Remove.
(vqshlq_n_s16): Remove.
(vqshlq_u32): Remove.
(vqshlq_r_u32): Remove.
(vqshlq_n_u32): Remove.
(vqshlq_s32): Remove.
(vqshlq_r_s32): Remove.
(vqshlq_n_s32): Remove.
(vqshlq_m_r_u8): Remove.
(vqshlq_m_r_s8): Remove.
(vqshlq_m_r_u16): Remove.
(vqshlq_m_r_s16): Remove.
(vqshlq_m_r_u32): Remove.
(vqshlq_m_r_s32): Remove.
(vqshlq_m_n_s8): Remove.
(vqshlq_m_n_s32): Remove.
(vqshlq_m_n_s16): Remove.
(vqshlq_m_n_u8): Remove.
(vqshlq_m_n_u32): Remove.
(vqshlq_m_n_u16): Remove.
(vqshlq_m_s8): Remove.
(vqshlq_m_s32): Remove.
(vqshlq_m_s16): Remove.
(vqshlq_m_u8): Remove.
(vqshlq_m_u32): Remove.
(vqshlq_m_u16): Remove.
(__arm_vqshlq_u8): Remove.
(__arm_vqshlq_r_u8): Remove.
(__arm_vqshlq_n_u8): Remove.
(__arm_vqshlq_s8): Remove.
(__arm_vqshlq_r_s8): Remove.
(__arm_vqshlq_n_s8): Remove.
(__arm_vqshlq_u16): Remove.
(__arm_vqshlq_r_u16): Remove.
(__arm_vqshlq_n_u16): Remove.
(__arm_vqshlq_s16): Remove.
(__arm_vqshlq_r_s16): Remove.
(__arm_vqshlq_n_s16): Remove.
(__arm_vqshlq_u32): Remove.
(__arm_vqshlq_r_u32): Remove.
(__arm_vqshlq_n_u32): Remove.
(__arm_vqshlq_s32): Remove.
(__arm_vqshlq_r_s32): Remove.
(__arm_vqshlq_n_s32): Remove.
(__arm_vqshlq_m_r_u8): Remove.
(__arm_vqshlq_m_r_s8): Remove.
(__arm_vqshlq_m_r_u16): Remove.
(__arm_vqshlq_m_r_s16): Remove.
(__arm_vqshlq_m_r_u32): Remove.
(__arm_vqshlq_m_r_s32): Remove.
(__arm_vqshlq_m_n_s8): Remove.
(__arm_vqshlq_m_n_s32): Remove.
(__arm_vqshlq_m_n_s16): Remove.
(__arm_vqshlq_m_n_u8): Remove.
(__arm_vqshlq_m_n_u32): Remove.
(__arm_vqshlq_m_n_u16): Remove.
(__arm_vqshlq_m_s8): Remove.
(__arm_vqshlq_m_s32): Remove.
(__arm_vqshlq_m_s16): Remove.
(__arm_vqshlq_m_u8): Remove.
(__arm_vqshlq_m_u32): Remove.
(__arm_vqshlq_m_u16): Remove.
(__arm_vqshlq): Remove.
(__arm_vqshlq_r): Remove.
(__arm_vqshlq_n): Remove.
(__arm_vqshlq_m_r): Remove.
(__arm_vqshlq_m_n): Remove.
(__arm_vqshlq_m): Remove.

arm: [MVE intrinsics] add unspec_mve_function_exact_insn_vshl

Introduce a function that will be used to build vshl intrinsics. They
are special because they have to handle MODE_r.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn_vshl): New.

arm: [MVE intrinsics] add binary_lshift_r shape

This patch adds the binary_lshift_r shape description.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_lshift_r): New.
* config/arm/arm-mve-builtins-shapes.h (binary_lshift_r): New.

arm: [MVE intrinsics] add support for MODE_r

A few intrinsics have an additional mode (MODE_r), which does not
always support the same set of predicates as MODE_none and MODE_n.
For vqshlq they are the same, but for vshlq they are not.

Indeed we have:
vqshlq
vqshlq_m
vqshlq_n
vqshlq_m_n
vqshlq_r
vqshlq_m_r

vshlq
vshlq_m
vshlq_x
vshlq_n
vshlq_m_n
vshlq_x_n
vshlq_r
vshlq_m_r

This patch adds support for it.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins.cc (has_inactive_argument)
(finish_opt_n_resolution): Handle MODE_r.
* config/arm/arm-mve-builtins.def (r): New mode.

arm: [MVE intrinsics] add binary_lshift shape

This patch adds the binary_lshift shape description.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_lshift): New.
* config/arm/arm-mve-builtins-shapes.h (binary_lshift): New.

arm: [MVE intrinsics] rework vabdq

Implement vabdq using the new MVE builtins framework.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_N): New.
(vabdq): New.
* config/arm/arm-mve-builtins-base.def (vabdq): New.
* config/arm/arm-mve-builtins-base.h (vabdq): New.
* config/arm/arm_mve.h (vabdq): Remove.
(vabdq_m): Remove.
(vabdq_x): Remove.
(vabdq_u8): Remove.
(vabdq_s8): Remove.
(vabdq_u16): Remove.
(vabdq_s16): Remove.
(vabdq_u32): Remove.
(vabdq_s32): Remove.
(vabdq_f16): Remove.
(vabdq_f32): Remove.
(vabdq_m_s8): Remove.
(vabdq_m_s32): Remove.
(vabdq_m_s16): Remove.
(vabdq_m_u8): Remove.
(vabdq_m_u32): Remove.
(vabdq_m_u16): Remove.
(vabdq_m_f32): Remove.
(vabdq_m_f16): Remove.
(vabdq_x_s8): Remove.
(vabdq_x_s16): Remove.
(vabdq_x_s32): Remove.
(vabdq_x_u8): Remove.
(vabdq_x_u16): Remove.
(vabdq_x_u32): Remove.
(vabdq_x_f16): Remove.
(vabdq_x_f32): Remove.
(__arm_vabdq_u8): Remove.
(__arm_vabdq_s8): Remove.
(__arm_vabdq_u16): Remove.
(__arm_vabdq_s16): Remove.
(__arm_vabdq_u32): Remove.
(__arm_vabdq_s32): Remove.
(__arm_vabdq_m_s8): Remove.
(__arm_vabdq_m_s32): Remove.
(__arm_vabdq_m_s16): Remove.
(__arm_vabdq_m_u8): Remove.
(__arm_vabdq_m_u32): Remove.
(__arm_vabdq_m_u16): Remove.
(__arm_vabdq_x_s8): Remove.
(__arm_vabdq_x_s16): Remove.
(__arm_vabdq_x_s32): Remove.
(__arm_vabdq_x_u8): Remove.
(__arm_vabdq_x_u16): Remove.
(__arm_vabdq_x_u32): Remove.
(__arm_vabdq_f16): Remove.
(__arm_vabdq_f32): Remove.
(__arm_vabdq_m_f32): Remove.
(__arm_vabdq_m_f16): Remove.
(__arm_vabdq_x_f16): Remove.
(__arm_vabdq_x_f32): Remove.
(__arm_vabdq): Remove.
(__arm_vabdq_m): Remove.
(__arm_vabdq_x): Remove.

arm: [MVE intrinsics] factorize vabdq

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_FP_M_BINARY): Add vabdq.
(MVE_FP_VABDQ_ONLY): New.
(mve_insn): Add vabd.
* config/arm/mve.md (mve_vabdq_f<mode>): Move into ...
(@mve_<mve_insn>q_f<mode>): ... this.
(mve_vabdq_m_f<mode>): Remove.

arm: [MVE intrinsics] rework vqrdmulhq

Implement vqrdmulhq using the new MVE builtins framework.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (vqrdmulhq): New.
* config/arm/arm-mve-builtins-base.def (vqrdmulhq): New.
* config/arm/arm-mve-builtins-base.h (vqrdmulhq): New.
* config/arm/arm_mve.h (vqrdmulhq): Remove.
(vqrdmulhq_m): Remove.
(vqrdmulhq_s8): Remove.
(vqrdmulhq_n_s8): Remove.
(vqrdmulhq_s16): Remove.
(vqrdmulhq_n_s16): Remove.
(vqrdmulhq_s32): Remove.
(vqrdmulhq_n_s32): Remove.
(vqrdmulhq_m_n_s8): Remove.
(vqrdmulhq_m_n_s32): Remove.
(vqrdmulhq_m_n_s16): Remove.
(vqrdmulhq_m_s8): Remove.
(vqrdmulhq_m_s32): Remove.
(vqrdmulhq_m_s16): Remove.
(__arm_vqrdmulhq_s8): Remove.
(__arm_vqrdmulhq_n_s8): Remove.
(__arm_vqrdmulhq_s16): Remove.
(__arm_vqrdmulhq_n_s16): Remove.
(__arm_vqrdmulhq_s32): Remove.
(__arm_vqrdmulhq_n_s32): Remove.
(__arm_vqrdmulhq_m_n_s8): Remove.
(__arm_vqrdmulhq_m_n_s32): Remove.
(__arm_vqrdmulhq_m_n_s16): Remove.
(__arm_vqrdmulhq_m_s8): Remove.
(__arm_vqrdmulhq_m_s32): Remove.
(__arm_vqrdmulhq_m_s16): Remove.
(__arm_vqrdmulhq): Remove.
(__arm_vqrdmulhq_m): Remove.

arm: [MVE intrinsics] factorize vqshlq vshlq

Factorize vqshlq and vshlq so that they use the same pattern.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_SHIFT_M_R, MVE_SHIFT_M_N)
(MVE_SHIFT_N, MVE_SHIFT_R): New.
(mve_insn): Add vqshl, vshl.
* config/arm/mve.md (mve_vqshlq_n_<supf><mode>)
(mve_vshlq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vqshlq_r_<supf><mode>, mve_vshlq_r_<supf><mode>): Merge into
...
(@mve_<mve_insn>q_r_<supf><mode>): ... this.
(mve_vqshlq_m_r_<supf><mode>, mve_vshlq_m_r_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_r_<supf><mode>): ... this.
(mve_vqshlq_m_n_<supf><mode>, mve_vshlq_m_n_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
* config/arm/vec-common.md (mve_vshlq_<supf><mode>): Transform
into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.

arm: [MVE intrinsics] rework vrshlq vqrshlq

Implement vrshlq, vqrshlq using the new MVE builtins framework.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins-base.def (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins-base.h (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins.cc (has_inactive_argument): Handle
vqrshlq, vrshlq.
* config/arm/arm_mve.h (vrshlq): Remove.
(vrshlq_m_n): Remove.
(vrshlq_m): Remove.
(vrshlq_x): Remove.
(vrshlq_u8): Remove.
(vrshlq_n_u8): Remove.
(vrshlq_s8): Remove.
(vrshlq_n_s8): Remove.
(vrshlq_u16): Remove.
(vrshlq_n_u16): Remove.
(vrshlq_s16): Remove.
(vrshlq_n_s16): Remove.
(vrshlq_u32): Remove.
(vrshlq_n_u32): Remove.
(vrshlq_s32): Remove.
(vrshlq_n_s32): Remove.
(vrshlq_m_n_u8): Remove.
(vrshlq_m_n_s8): Remove.
(vrshlq_m_n_u16): Remove.
(vrshlq_m_n_s16): Remove.
(vrshlq_m_n_u32): Remove.
(vrshlq_m_n_s32): Remove.
(vrshlq_m_s8): Remove.
(vrshlq_m_s32): Remove.
(vrshlq_m_s16): Remove.
(vrshlq_m_u8): Remove.
(vrshlq_m_u32): Remove.
(vrshlq_m_u16): Remove.
(vrshlq_x_s8): Remove.
(vrshlq_x_s16): Remove.
(vrshlq_x_s32): Remove.
(vrshlq_x_u8): Remove.
(vrshlq_x_u16): Remove.
(vrshlq_x_u32): Remove.
(__arm_vrshlq_u8): Remove.
(__arm_vrshlq_n_u8): Remove.
(__arm_vrshlq_s8): Remove.
(__arm_vrshlq_n_s8): Remove.
(__arm_vrshlq_u16): Remove.
(__arm_vrshlq_n_u16): Remove.
(__arm_vrshlq_s16): Remove.
(__arm_vrshlq_n_s16): Remove.
(__arm_vrshlq_u32): Remove.
(__arm_vrshlq_n_u32): Remove.
(__arm_vrshlq_s32): Remove.
(__arm_vrshlq_n_s32): Remove.
(__arm_vrshlq_m_n_u8): Remove.
(__arm_vrshlq_m_n_s8): Remove.
(__arm_vrshlq_m_n_u16): Remove.
(__arm_vrshlq_m_n_s16): Remove.
(__arm_vrshlq_m_n_u32): Remove.
(__arm_vrshlq_m_n_s32): Remove.
(__arm_vrshlq_m_s8): Remove.
(__arm_vrshlq_m_s32): Remove.
(__arm_vrshlq_m_s16): Remove.
(__arm_vrshlq_m_u8): Remove.
(__arm_vrshlq_m_u32): Remove.
(__arm_vrshlq_m_u16): Remove.
(__arm_vrshlq_x_s8): Remove.
(__arm_vrshlq_x_s16): Remove.
(__arm_vrshlq_x_s32): Remove.
(__arm_vrshlq_x_u8): Remove.
(__arm_vrshlq_x_u16): Remove.
(__arm_vrshlq_x_u32): Remove.
(__arm_vrshlq): Remove.
(__arm_vrshlq_m_n): Remove.
(__arm_vrshlq_m): Remove.
(__arm_vrshlq_x): Remove.
(vqrshlq): Remove.
(vqrshlq_m_n): Remove.
(vqrshlq_m): Remove.
(vqrshlq_u8): Remove.
(vqrshlq_n_u8): Remove.
(vqrshlq_s8): Remove.
(vqrshlq_n_s8): Remove.
(vqrshlq_u16): Remove.
(vqrshlq_n_u16): Remove.
(vqrshlq_s16): Remove.
(vqrshlq_n_s16): Remove.
(vqrshlq_u32): Remove.
(vqrshlq_n_u32): Remove.
(vqrshlq_s32): Remove.
(vqrshlq_n_s32): Remove.
(vqrshlq_m_n_u8): Remove.
(vqrshlq_m_n_s8): Remove.
(vqrshlq_m_n_u16): Remove.
(vqrshlq_m_n_s16): Remove.
(vqrshlq_m_n_u32): Remove.
(vqrshlq_m_n_s32): Remove.
(vqrshlq_m_s8): Remove.
(vqrshlq_m_s32): Remove.
(vqrshlq_m_s16): Remove.
(vqrshlq_m_u8): Remove.
(vqrshlq_m_u32): Remove.
(vqrshlq_m_u16): Remove.
(__arm_vqrshlq_u8): Remove.
(__arm_vqrshlq_n_u8): Remove.
(__arm_vqrshlq_s8): Remove.
(__arm_vqrshlq_n_s8): Remove.
(__arm_vqrshlq_u16): Remove.
(__arm_vqrshlq_n_u16): Remove.
(__arm_vqrshlq_s16): Remove.
(__arm_vqrshlq_n_s16): Remove.
(__arm_vqrshlq_u32): Remove.
(__arm_vqrshlq_n_u32): Remove.
(__arm_vqrshlq_s32): Remove.
(__arm_vqrshlq_n_s32): Remove.
(__arm_vqrshlq_m_n_u8): Remove.
(__arm_vqrshlq_m_n_s8): Remove.
(__arm_vqrshlq_m_n_u16): Remove.
(__arm_vqrshlq_m_n_s16): Remove.
(__arm_vqrshlq_m_n_u32): Remove.
(__arm_vqrshlq_m_n_s32): Remove.
(__arm_vqrshlq_m_s8): Remove.
(__arm_vqrshlq_m_s32): Remove.
(__arm_vqrshlq_m_s16): Remove.
(__arm_vqrshlq_m_u8): Remove.
(__arm_vqrshlq_m_u32): Remove.
(__arm_vqrshlq_m_u16): Remove.
(__arm_vqrshlq): Remove.
(__arm_vqrshlq_m_n): Remove.
(__arm_vqrshlq_m): Remove.

arm: [MVE intrinsics] factorize vqrshlq vrshlq

Factorize vqrshlq, vrshlq so that they use the same pattern.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_RSHIFT_M_N, MVE_RSHIFT_N): New.
(mve_insn): Add vqrshl, vrshl.
* config/arm/mve.md (mve_vqrshlq_n_<supf><mode>)
(mve_vrshlq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vqrshlq_m_n_<supf><mode>, mve_vrshlq_m_n_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.

arm: [MVE intrinsics] add binary_round_lshift shape

This patch adds the binary_round_lshift shape description.

2022-09-08 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_round_lshift): New.
* config/arm/arm-mve-builtins-shapes.h (binary_round_lshift): New.

RISC-V: Fix PR109615

This patch is to fix following case:
void f (int8_t * restrict in, int8_t * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  if (cond)
    vl = m * 2;
  else
    vl = m * 2 * vl;

  for (size_t i = 0; i < n; i++)
    {
      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
      __riscv_vse8_v_i8mf8 (out + i, v, vl);

      vbool64_t mask = __riscv_vlm_v_b64 (in + i + 100, vl);

      vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tumu (mask, v, in + i + 100, vl);
      __riscv_vse8_v_i8mf8 (out + i + 100, v2, vl);
    }

  for (size_t i = 0; i < n; i++)
    {
      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300, vl);
      __riscv_vse8_v_i8mf8 (out + i + 300, v, vl);
    }
}

The value of "vl" is coming from different blocks so it will be wrapped as a PHI node of each
block.

In the first loop, the "vl" source is a PHI node from bb 4.
In the second loop, the "vl" source is a PHI node from bb 5.
since bb 5 is dominated by bb 4, the PHI input of "vl" in the second loop is the PHI node of "vl"
in bb 4.
So when 2 "vl" PHI node are both degenerate PHI node (the phi->num_inputs () == 1) and their only
input are same, it's safe for us to consider they are compatible.

This patch is only optimize degenerate PHI since it's safe and simple optimization.

non-dengerate PHI are considered as incompatible unless the PHI are the same in RTL_SSA.
TODO: non-generate PHI is complicated, we can support it when it is necessary in the future.

Before this patch:

...
.L2:
addi    a4,a1,100
add     t1,a0,a2
mv      t0,a0
beq     a2,zero,.L1
vsetvli zero,a3,e8,mf8,tu,mu
.L4:
addi    a6,t0,100
addi    a7,a4,-100
vle8.v  v1,0(t0)
addi    t0,t0,1
vse8.v  v1,0(a7)
vlm.v   v0,0(a6)
vle8.v  v1,0(a6),v0.t
vse8.v  v1,0(a4)
addi    a4,a4,1
bne     t0,t1,.L4
addi    a0,a0,300
addi    a1,a1,300
add     a2,a0,a2
vsetvli zero,a3,e8,mf8,ta,ma
.L5:
vle8.v  v2,0(a0)
addi    a0,a0,1
vse8.v  v2,0(a1)
addi    a1,a1,1
bne     a2,a0,.L5
.L1:
ret

After this patch:

...
.L2:
addi    a4,a1,100
add     t1,a0,a2
mv      t0,a0
beq     a2,zero,.L1
vsetvli zero,a3,e8,mf8,tu,mu
.L4:
addi    a6,t0,100
addi    a7,a4,-100
vle8.v  v1,0(t0)
addi    t0,t0,1
vse8.v  v1,0(a7)
vlm.v   v0,0(a6)
vle8.v  v1,0(a6),v0.t
vse8.v  v1,0(a4)
addi    a4,a4,1
bne     t0,t1,.L4
addi    a0,a0,300
addi    a1,a1,300
add     a2,a0,a2
.L5:
vle8.v  v2,0(a0)
addi    a0,a0,1
vse8.v  v2,0(a1)
addi    a1,a1,1
bne     a2,a0,.L5
.L1:
ret

PR target/109615

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (avl_info::multiple_source_equal_p): Add
denegrate PHI optmization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-74.c: Adapt testcase.
* gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/pr109615.c: New test.

i386: Rename index_register_operand predicate to register_no_SP_operand

Rename index_register_operand predicate to what it really does.

No functional change.

gcc/ChangeLog:

* config/i386/predicates.md (register_no_SP_operand):
Rename from index_register_operand.
(call_register_operand): Update for rename.
* config/i386/i386.md (*lea<mode>_general_[1234]): Update for rename.