RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

Consider the following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < 4; i++)
    if (a[i] < min_v)
      last = i;

  return last;
}

--param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8

condition_reduction:
vsetvli a4,zero,e32,m8,ta,ma
li a5,32
vmv.v.x v8,a1
vl8re32.v v0,0(a0)
vid.v v16
vmslt.vv v0,v0,v8
vsetvli zero,a5,e8,m2,ta,ma
vcpop.m a5,v0
beq a5,zero,.L2
addi a5,a5,-1
vsetvli a4,zero,e32,m8,ta,ma
vcompress.vm v8,v16,v0
vslidedown.vx v8,v8,a5
vmv.x.s a0,v8
ret
.L2:
li a0,66
ret

--param=riscv-autovec-preference=scalable

condition_reduction:
csrr a6,vlenb
mv a2,a0
li a3,32
li a0,66
srli a6,a6,2
vsetvli a4,zero,e32,m1,ta,ma
vmv.v.x v4,a1
vid.v v1
.L4:
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli zero,a5,e32,m1,ta,ma    ----> redundant vsetvl
vle32.v v0,0(a2)
vsetvli a4,zero,e32,m1,ta,ma
slli a1,a5,2
vmv.v.x v2,a6
vmslt.vv v0,v0,v4
sub a3,a3,a5
vmv1r.v v3,v1
vadd.vv v1,v1,v2
vsetvli zero,a5,e8,mf4,ta,ma
vcpop.m a5,v0
beq a5,zero,.L3
addi a5,a5,-1
vsetvli a4,zero,e32,m1,ta,ma
vcompress.vm v2,v3,v0
vslidedown.vx v2,v2,a5
vmv.x.s a0,v2
.L3:
sext.w a0,a0
add a2,a2,a1
bne a3,zero,.L4
ret

There is a redundant vsetvli instruction in the VLA vectorized code, which is a VSETVL pass issue.

The vsetvl issue is not addressed in this patch but will be fixed soon.
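
As a rough scalar model of the sequences above (a sketch only, not the expander's code): vcpop.m counts the active mask bits, vcompress.vm packs the selected index elements to the front, and vslidedown.vx followed by vmv.x.s reads element count - 1. The function name fold_extract_last_model and the bound MAX_VL are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VL 64 /* Hypothetical upper bound on the vector length.  */

/* Scalar model of one LEN_FOLD_EXTRACT_LAST step as lowered for RVV:
   return the element of VALUES selected by the last set entry of MASK
   among the first LEN elements, or DEFAULT_VALUE if no entry is set.  */
int
fold_extract_last_model (int default_value, const unsigned char *mask,
                         const int *values, size_t len)
{
  int compressed[MAX_VL];
  size_t count = 0;

  assert (len <= MAX_VL);

  /* vcpop.m + vcompress.vm: count the active elements and pack them
     to the front of the destination.  */
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      compressed[count++] = values[i];

  /* beq ...,zero: nothing matched, so keep the incoming value.  */
  if (count == 0)
    return default_value;

  /* vslidedown.vx by count - 1, then vmv.x.s reads element 0.  */
  return compressed[count - 1];
}
```

With values holding the lane indices 0, 1, 2, ... (the vid.v result), the returned value is the index of the last element that satisfied the condition.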

gcc/ChangeLog:

* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.

(cherry picked from commit e7545cadbedfc167749d801bd574cf9fe22ed5c5)

RISC-V: Add Types to Un-Typed Sync Instructions:

Updates the sync instructions to ensure that no insn is left without
a type attribute. Updates a total of 9 insns to have type "atomic"
or type "multi" based on the number of assembly instructions generated.

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md: updated types to "multi" or
"atomic" based on number of assembly lines generated
* config/riscv/sync-ztso.md: likewise
* config/riscv/sync.md: likewise

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit df177510665c4e1045bdaadf10d837f1bdc4ea06)

RISC-V: Make stack_save_restore tests more robust

Spurred by Jivan's patch and a desire for cleaner test results, I went ahead and
made the stack_save_restore tests independent of the precise stack size by
using a regexp.

gcc/testsuite/
* gcc.target/riscv/stack_save_restore_1.c: Robustify.
* gcc.target/riscv/stack_save_restore_2.c: Robustify.

(cherry picked from commit e1f096a3cc96c71907cfbc7b8baf67a3d863cb6d)

[committed] RISC-V: Fix minor testsuite problem with zicond

I thought I had already fixed this, but clearly if I did, I didn't include it
in any upstream commits.

With -Og the optimizers are hindered in various ways and this prevents using
zicond. So skip this test with -Og (it was already being skipped at -O0).

gcc/testsuite
* gcc.target/riscv/zicond-primitiveSemantics.c: Disable for -Og.

(cherry picked from commit 3cd2b73079bac374ce1c542b9c9e354e00a8713d)

[PATCH v10] RISC-V: Add support for the Zfa extension

This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb

The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html

Two points need special explanation:
1. According to the RISC-V spec, "The FCVTMOD.W.D instruction was added principally to
  accelerate the processing of JavaScript Numbers.", so it seems that no implementation
  is required.

2. The instructions FMINM and FMAXM correspond to the C23 library functions fminimum and fmaximum.
  Therefore, this patch simply implements the patterns fminm<hf\sf\df>3 and
  fmaxm<hf\sf\df>3 to prepare for later use.
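
For reference, the semantics of C23 fminimum and fmaximum can be sketched in scalar C as below. This is an illustrative model, not code from the patch; the names model_fminimumf and model_fmaximumf are made up for the example. The two points that distinguish these functions from fmin and fmax are that a NaN operand always yields NaN and that -0.0 is ordered below +0.0.

```c
#include <math.h>

/* Sketch of C23 fminimumf: NaN if either operand is NaN, and -0.0 is
   treated as smaller than +0.0.  */
float
model_fminimumf (float x, float y)
{
  if (isnan (x) || isnan (y))
    return NAN;
  if (x == 0.0f && y == 0.0f)
    return signbit (x) ? x : y; /* Prefer the negative zero.  */
  return x < y ? x : y;
}

/* Sketch of C23 fmaximumf: NaN if either operand is NaN, and +0.0 is
   treated as larger than -0.0.  */
float
model_fmaximumf (float x, float y)
{
  if (isnan (x) || isnan (y))
    return NAN;
  if (x == 0.0f && y == 0.0f)
    return signbit (x) ? y : x; /* Prefer the positive zero.  */
  return x > y ? x : y;
}
```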

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension version, which depends on
the F extension.
* config/riscv/constraints.md (zfli): Constrain the floating point number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is
not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is
required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate
in assembly.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm<mode>3): New.
(fmaxm<mode>3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_zfa): New.
* config/riscv/riscv.opt: New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fli-1.c: New test.
* gcc.target/riscv/zfa-fli-2.c: New test.
* gcc.target/riscv/zfa-fli-3.c: New test.
* gcc.target/riscv/zfa-fli-4.c: New test.
* gcc.target/riscv/zfa-fli-6.c: New test.
* gcc.target/riscv/zfa-fli-7.c: New test.
* gcc.target/riscv/zfa-fli-8.c: New test.

Co-authored-by: Tsukasa OI <research_trasio@irq.a4lg.com>
(cherry picked from commit 30699b999e94b66ff8706d3b07a35b2a9554d10c)

RISC-V: Enable Hoist to GCSE simple constants

Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates. For a simple const (say 6 which needs a separate insn "LI 6")
the backend currently returns 0, causing Hoist to bail and elide GCSE.

Note that constants requiring more than one insn to set up were working
fine since riscv_rtx_costs () was returning non-zero (although that
itself might need refining: see bugzilla 111139).

To keep testsuite parity, some V tests that started failing under the new
costing regime need updating.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_rtx_costs): Adjust const_int
cost. Add some comments about different constants handling.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Remove test
for Jump.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit b41d7eb0e14785ff0ad6e6922cbd4c880e680bf9)

RISC-V: Add early continue for ENTRY and EXIT block

Committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::compute_local_properties):
Add early continue.

(cherry picked from commit 449ab115dece8ac8e8f27d2d7b5bc653a2c75d3a)

RISC-V: Move vector-abi testcases into rvv/base folder

Resolves failures like this on rv32gcv linux:
compiler exited with status 1
output is:
In file included from /tc-baseline/build-linux-gcv/sysroot/usr/include/features.h:515,
                 from /tc-baseline/build-linux-gcv/sysroot/usr/include/bits/libc-header-start.h:33,
                 from /tc-baseline/build-linux-gcv/sysroot/usr/include/stdint.h:26,
                 from /tc-baseline/build-linux-gcv/lib/gcc/riscv32-unknown-linux-gnu/14.0.0/include/stdint.h:9,
                 from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/stdint.h:9,
                 from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/riscv_vector.h:28,
                 from /tc-baseline/gcc/gcc/testsuite/gcc.target/riscv/vector-abi-1.c:4:
/tc-baseline/build-linux-gcv/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.

Tested using:
rv{32/64}{gc/gcv} newlib
rv{32/64}gcv linux

gcc/testsuite/ChangeLog:

* gcc.target/riscv/vector-abi-1.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-1.c: ...here.
* gcc.target/riscv/vector-abi-2.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-2.c: ...here.
* gcc.target/riscv/vector-abi-3.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-3.c: ...here.
* gcc.target/riscv/vector-abi-4.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-4.c: ...here.
* gcc.target/riscv/vector-abi-5.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-5.c: ...here.
* gcc.target/riscv/vector-abi-6.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-6.c: ...here.
* gcc.target/riscv/vector-abi-7.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-7.c: ...here.
* gcc.target/riscv/vector-abi-8.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-8.c: ...here.
* gcc.target/riscv/vector-abi-9.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-9.c: ...here.

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
(cherry picked from commit 3ea624da71095cd480c31983d13db45bd9c5a738)

RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS testcases

This patch depends on the middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html

We already have the COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS patterns.

Remove TARGET_PREFERRED_ELSE_VALUE since it prevents the COND_LEN_FMS/COND_LEN_FNMS STMT fold.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_preferred_else_value): Remove it since
it forbids the COND_LEN_FMS/COND_LEN_FNMS STMT fold.
(TARGET_PREFERRED_ELSE_VALUE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adapt test.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-9.c: New test.

(cherry picked from commit 1fbcae1c6452c9939a4be818a64cd01883abd80e)

RISC-V: Enable pressure-aware scheduling by default.

This patch enables pressure-aware scheduling for riscv. There have been
various requests for it so I figured I'd just go ahead and send
the patch.

There is some slight regression in code quality for a number of
vector tests where we spill more due to a different instruction order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.

As cost and scheduling models mature I expect the situation to improve
and for now I think it's generally favorable to enable pressure-aware
scheduling so we can work with it rather than trying to find every
possible problem in advance.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add -fsched-pressure.
* config/riscv/riscv.cc (riscv_option_override): Set sched
pressure algorithm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/narrow_constraint-1.c: Add
-fno-sched-pressure.
* gcc.target/riscv/rvv/base/narrow_constraint-17.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-18.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-19.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-20.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-21.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-22.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-23.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-24.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-25.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-26.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-27.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-28.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-29.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-30.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-31.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-4.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-5.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.

(cherry picked from commit a047513c9222f14adc6e5a015e038b207bb9a653)

RISC-V: Allow const 17-31 for vector shift.

This patch adds a missing constraint in order to be able to print (and
not ICE) vector immediates 17-31 for vector shifts.
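
For illustration, a loop of the following shape shifts every element by a constant in that range. This is a hypothetical example rather than the test added by the patch; the function name and SHIFT_AMOUNT are assumptions, and whether the loop is vectorized into an immediate-form vector shift depends on the compiler version and flags.

```c
#include <stdint.h>

#define SHIFT_AMOUNT 17 /* Any constant in the range 17..31.  */

/* Shift each of the N elements of SRC left by a compile-time constant
   and store the result in DST.  */
void
shift_left_by_const (uint32_t *restrict dst, const uint32_t *restrict src,
                     int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] << SHIFT_AMOUNT;
}
```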

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Allow vk operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-immediate.c: New test.

(cherry picked from commit b6ba0cc9339f2cc81398863ae779daa6c8853ad6)

RISC-V: Add missing conversion tests.

This adds some missing tests for vf[nw]cvt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c:
Add tests.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.

(cherry picked from commit e7aec3ae38ce740885e73255e12675174790758d)

RISC-V: Fix reduc_strict_run-1 test case.

This patch fixes the reduc_strict_run-1 testcase by introducing
a variable that holds the reference result. This is necessary
because in presence of _Float16 emulation an intermediate
result used in a comparison is computed in higher precision.
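
The effect can be reproduced by analogy with float and double standing in for _Float16 and its wider emulation type (an assumption made so the sketch builds on hosts without _Float16 support; the function name is hypothetical). A sum kept in the wider type can differ from the same sum after it has been stored to a variable of the narrower type, which is why holding the reference result in a variable makes the comparison well defined.

```c
/* Return nonzero if rounding the wide sum of A and B to float changes
   its value, i.e. if comparing against the unrounded intermediate would
   give a different answer than comparing against a stored float.  */
int
rounding_changes_sum (float a, float b)
{
  double wide = (double) a + (double) b; /* Higher-precision intermediate.  */
  float stored = (float) wide;           /* Value after a store to float.  */
  return wide != (double) stored;
}
```

For example, 16777216.0f + 1.0f is exactly representable in double but not in float, so the two values differ.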

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c:
Add variable to hold reference result.

(cherry picked from commit 8c3146ce0ee14bc6747fb92947879d82d43f3bb2)

gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

Hi, Richard and Richi.

Currently, GCC supports COND_LEN_FMA for floating point without -ffast-math.
It is handled in tree-ssa-math-opts.cc. However, GCC fails to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.

Consider the following case:
#define TEST_TYPE(TYPE)                                                        \
  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,            \
                                              TYPE *__restrict a,              \
                                              TYPE *__restrict b, int n)       \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      dst[i] -= a[i] * b[i];                                                   \
  }

#define TEST_ALL()                                                             \
  TEST_TYPE (float)

TEST_ALL ()

Gimple IR for RVV:

...
_39 = -vect__8.14_26;
vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, vect__4.8_34, vect__4.8_34, _46, 0);
...

This is because of the following piece of code in tree-ssa-math-opts.cc:

      if (len)
        fma_stmt
          = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
                                        addop, else_value, len, bias);
      else if (cond)
        fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
                                               op2, addop, else_value);
      else
        fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
      gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
                                                                   use_stmt));
      gsi_replace (&gsi, fma_stmt, true);
      /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
         regardless of where the negation occurs.  */
      gimple *orig_stmt = gsi_stmt (gsi);
      if (fold_stmt (&gsi, follow_all_ssa_edges))
        {
          if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
            gcc_unreachable ();
          update_stmt (gsi_stmt (gsi));
        }

'fold_stmt' fails to fold NEGATE_EXPR + COND_LEN_FMA into COND_LEN_FNMA.

This patch supports folding the STMT into:

vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, vect__4.8_34, { 0.0, ... }, _46, 0);

Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.

Extend maximum num ops:
-  static const unsigned int MAX_NUM_OPS = 5;
+  static const unsigned int MAX_NUM_OPS = 7;
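
The equivalence behind the fold can be sketched with a scalar model. This is an illustration under assumptions, not GCC code: the function names are hypothetical, the lane selection rule (active when the lane index is below len + bias and the mask bit is set) follows my reading of the cond_len_* patterns, and a separate multiply and add stands in for the fused operation.

```c
/* Scalar sketch of .COND_LEN_FMA: lanes below LEN + BIAS whose mask
   bit is set compute a * b + c; all other lanes take ELSE_V.  */
void
model_cond_len_fma (float *r, const unsigned char *mask, const float *a,
                    const float *b, const float *c, const float *else_v,
                    int len, int bias, int nunits)
{
  for (int i = 0; i < nunits; i++)
    r[i] = (i < len + bias && mask[i]) ? a[i] * b[i] + c[i] : else_v[i];
}

/* Same lane selection, but active lanes compute -(a * b) + c.  */
void
model_cond_len_fnma (float *r, const unsigned char *mask, const float *a,
                     const float *b, const float *c, const float *else_v,
                     int len, int bias, int nunits)
{
  for (int i = 0; i < nunits; i++)
    r[i] = (i < len + bias && mask[i]) ? -(a[i] * b[i]) + c[i] : else_v[i];
}
```

Calling model_cond_len_fma with a negated multiplicand gives the same active lanes as calling model_cond_len_fnma with the original one, which is the rewrite that removes the separate negation statement.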

Bootstrap and Regtest on X86 passed.
Tested on aarch64 Qemu.

Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.

gcc/ChangeLog:

* genmatch.cc (decision_tree::gen): Support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
* gimple-match-exports.cc (gimple_simplify): Ditto.
(gimple_resimplify6): New function.
(gimple_resimplify7): New function.
(gimple_match_op::resimplify): Support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
(convert_conditional_op): Ditto.
(build_call_internal): Ditto.
(try_conditional_simplification): Ditto.
(gimple_extract): Ditto.
* gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
* internal-fn.cc (CASE): Ditto.

VECT: Apply LEN_FOLD_EXTRACT_LAST into loop vectorizer

Hi.

This patch applies LEN_FOLD_EXTRACT_LAST in the loop vectorizer.

Consider the following case:

/* Simple condition reduction.  */

int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < N; i++)
    if (a[i] < min_v)
      last = i;

  return last;
}

With this patch, we can generate the following IR:

  _44 = .SELECT_VL (ivtmp_42, POLY_INT_CST [4, 4]);
  _34 = vect_vec_iv_.5_33 + { POLY_INT_CST [4, 4], ... };
  ivtmp_36 = _44 * 4;
  vect__4.8_39 = .MASK_LEN_LOAD (vectp_a.6_37, 32B, { -1, ... }, _44, 0);

  mask__11.9_41 = vect__4.8_39 < vect_cst__40;
  last_5 = .LEN_FOLD_EXTRACT_LAST (last_14, mask__11.9_41, vect_vec_iv_.5_33, _44, 0);
  ...
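
A scalar sketch of what this IR computes is given below. It is a model under assumptions rather than vectorizer output: VF is a hypothetical fixed vectorization factor, .SELECT_VL is modelled as min (remaining, VF), and the helper names are invented for the example.

```c
#include <stddef.h>

#define VF 4 /* Hypothetical vectorization factor.  */

/* Model of .LEN_FOLD_EXTRACT_LAST (prev, mask, vec, len, 0): the element
   of VEC selected by the last set entry of MASK among the first LEN
   lanes, or PREV if no entry is set.  */
static int
len_fold_extract_last (int prev, const unsigned char *mask, const int *vec,
                       size_t len)
{
  int result = prev;
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      result = vec[i];
  return result;
}

/* Strip-mined model of condition_reduction over N elements.  */
int
condition_reduction_model (const int *a, int min_v, size_t n)
{
  int last = 66;
  int iv[VF];
  unsigned char mask[VF];

  for (size_t base = 0; base < n;)
    {
      /* .SELECT_VL, modelled as the minimum of what is left and VF.  */
      size_t len = n - base < VF ? n - base : VF;

      for (size_t i = 0; i < len; i++)
        {
          iv[i] = (int) (base + i);      /* Vector induction variable.  */
          mask[i] = a[base + i] < min_v; /* Vector comparison.  */
        }

      last = len_fold_extract_last (last, mask, iv, len);
      base += len;
    }
  return last;
}
```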

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_reduction): Apply
LEN_FOLD_EXTRACT_LAST.
* tree-vect-stmts.cc (vectorizable_condition): Ditto.

(cherry picked from commit a28d4fce8ec2540259a257149de7081f27fb027e)

tree-optimization/111128 - fix shift pattern recog

The following fixes placement of shift operand sanitization with
MIN when the original shift operand was external but the actual
one is not.

PR tree-optimization/111128
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Emit external shift operand inline if we promoted it with
another pattern stmt.

* gcc.dg/torture/pr111128.c: New testcase.

(cherry picked from commit 7b67cab154d4b5ec2a6bb62755da31cefbe63536)

RISC-V: Fix one typo in autovec.md pattern comment

vfmsac => vfnmacc
vfmsub => vfnmadd

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec.md: Fix typo.

(cherry picked from commit 1c51805e2468bc10057bc0f2fc12fab909d21d04)

RISC-V: Refactor RVV class by frm_op_type template arg

As suggested by Kito, add a new frm_op_type template arg to the op
class to avoid duplicating the expand function.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): Removed.
(class reverse_binop_frm): Ditto.
(class widen_binop_frm): Ditto.
(class vfmacc_frm): Ditto.
(class vfnmacc_frm): Ditto.
(class vfmsac_frm): Ditto.
(class vfnmsac_frm): Ditto.
(class vfmadd_frm): Ditto.
(class vfnmadd_frm): Ditto.
(class vfmsub_frm): Ditto.
(class vfnmsub_frm): Ditto.
(class vfwmacc_frm): Ditto.
(class vfwnmacc_frm): Ditto.
(class vfwmsac_frm): Ditto.
(class vfwnmsac_frm): Ditto.
(class unop_frm): Ditto.
(class vfrec7_frm): Ditto.
(class binop): Add frm_op_type template arg.
(class unop): Ditto.
(class widen_binop): Ditto.
(class widen_binop_fp): Ditto.
(class reverse_binop): Ditto.
(class vfmacc): Ditto.
(class vfnmsac): Ditto.
(class vfmadd): Ditto.
(class vfnmsub): Ditto.
(class vfnmacc): Ditto.
(class vfmsac): Ditto.
(class vfnmadd): Ditto.
(class vfmsub): Ditto.
(class vfwmacc): Ditto.
(class vfwnmacc): Ditto.
(class vfwmsac): Ditto.
(class vfwnmsac): Ditto.
(class float_misc): Ditto.

(cherry picked from commit 0345152f922c3a58ae0a8ee014e37dcfab35592c)

Improve quality of code from LRA register elimination

This is primarily Jivan's work, I'm mostly responsible for the write-up and
coordinating with Vlad on a few questions.

On targets with limitations on immediates usable in arithmetic instructions,
LRA's register elimination phase can construct fairly poor code.

This example (from the GCC testsuite) illustrates the problem well.

int  consume (void *);
int foo (void) {
  int x[1000000];
  return consume (x + 1000);
}

If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc -mabi=lp64d", then
you'll get this code (up to the call to consume()).

        .cfi_startproc
        li      t0,-4001792
        li      a0,-3997696
        li      a5,4001792
        addi    sp,sp,-16
        .cfi_def_cfa_offset 16
        addi    t0,t0,1792
        addi    a0,a0,1696
        addi    a5,a5,-1792
        sd      ra,8(sp)
        add     a5,a5,a0
        add     sp,sp,t0
        .cfi_def_cfa_offset 4000016
        .cfi_offset 1, -8
        add     a0,a5,sp
        call    consume

Of particular interest is the value in a0 when we call consume. We compute that
horribly inefficiently.   If we back-substitute from the final assignment to a0
we get...

a0 = a5 + sp
a0 = a5 + (sp + t0)
a0 = (a5 + a0) + (sp + t0)
a0 = ((a5 - 1792) + a0) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
a0 = (a5 + (a0 + 1696)) + (sp + t0)  // removed offsetting terms
a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
a0 = (-3997696 + 1696) + (sp - 16) // removed offsetting terms
a0 = sp - 3996016

That's a pretty convoluted way to compute sp - 3996016.

Something like this would be notably better (not great, but we need both the
stack adjustment and the address of the object to pass to consume):

   addi sp,sp,-16
   sd ra,8(sp)
   li t0,-4001792
   addi t0,t0,1792
   add sp,sp,t0
   li a0,4096
   addi a0,a0,-96
   add a0,sp,a0
   call consume
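
The two sequences above can be checked against each other by replaying their register arithmetic in C. This is only a sanity-check sketch: the function names are invented, each local variable stands for the register of the same name, and the argument is the value of sp on entry to foo.

```c
#include <stdint.h>

/* Replay the sequence currently generated and return the value that
   ends up in a0 at the call to consume.  */
int64_t
original_a0 (int64_t sp)
{
  int64_t t0 = -4001792; /* li   t0,-4001792 */
  int64_t a0 = -3997696; /* li   a0,-3997696 */
  int64_t a5 = 4001792;  /* li   a5,4001792  */
  sp += -16;             /* addi sp,sp,-16   */
  t0 += 1792;            /* addi t0,t0,1792  */
  a0 += 1696;            /* addi a0,a0,1696  */
  a5 += -1792;           /* addi a5,a5,-1792 */
  a5 += a0;              /* add  a5,a5,a0    */
  sp += t0;              /* add  sp,sp,t0    */
  return a5 + sp;        /* add  a0,a5,sp    */
}

/* Replay the shorter sequence suggested above.  */
int64_t
improved_a0 (int64_t sp)
{
  sp += -16;             /* addi sp,sp,-16   */
  int64_t t0 = -4001792; /* li   t0,-4001792 */
  t0 += 1792;            /* addi t0,t0,1792  */
  sp += t0;              /* add  sp,sp,t0    */
  int64_t a0 = 4096;     /* li   a0,4096     */
  a0 += -96;             /* addi a0,a0,-96   */
  return sp + a0;        /* add  a0,sp,a0    */
}
```

For any entry value of sp, both functions return the same address, so the shorter sequence passes the same pointer to consume.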

The problem is that LRA's elimination code is not handling the case where we have
(plus (reg1) (reg2)) where reg1 is an eliminable register and reg2 has a known
equivalency, particularly a constant.

If we can determine that reg2 is equivalent to a constant and treat (plus
(reg1) (reg2)) in the same way we'd treat (plus (reg1) (const_int)) then we can
get the desired code.

This eliminates about 19b instructions, or roughly 1% for deepsjeng on rv64.
There are improvements elsewhere, but they're relatively small.  This may
ultimately lessen the value of Manolis's fold-mem-offsets patch.  So we'll have
to evaluate that again once he posts a new version.

Bootstrapped and regression tested on x86_64 as well as bootstrapped on rv64.
Earlier versions have been tested against spec2017.  Pre-approved by Vlad in a
private email conversation (thanks Vlad!).

Committed to the trunk.

gcc/
* lra-eliminations.cc (eliminate_regs_in_insn): Use equivalences to
help simplify code further.

(cherry picked from commit 6619b3d4c15cd754798b1048c67f3806bbcc2e6d)

[PATCH] RISC-V: Add a more appropriate type attribute

Due to the more accurate type attribute added to the clz, ctz, and pcnt
operations in https://github.com/gcc-mirror/gcc/commit/07e2576d6f3 the
same type attribute should be used here.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*<bitmanip_optab>disi2_sext): Add a more
appropriate type attribute.

(cherry picked from commit 18befd6f050e70f11ecca1dd58624f0ee3c68cc7)

RISC-V: Add conditional unary neg/abs/not autovec patterns

Hi,

This patch adds conditional unary neg/abs/not autovec patterns to the RISC-V backend.
For this C code:

void
test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n)
{
  for (int i = 0; i < n; i += 1)
    {
      a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i];
    }
}

Before this patch:
        ...
        vsetvli a7,zero,e32,m1,ta,ma
        vfabs.v v2,v2
        vmerge.vvm      v1,v1,v2,v0
        ...

After this patch:
        ...
        vsetvli a7,zero,e32,m1,ta,mu
        vfabs.v v1,v2,v0.t
        ...

For the int neg/not and FP neg patterns, defining the corresponding cond_xxx patterns
is enough.
For the FP abs pattern, we need to change the definition of the `abs<mode>2` and
`@vcond_mask_<mode><vm>` patterns from define_expand to define_insn_and_split
in order to fuse them into a new pattern `*cond_abs<mode>` in the combine pass.
The fusion process is similar to the one below:

(insn 30 29 31 4 (set (reg:RVVM1SF 152 [ vect_iftmp.15 ])
        (abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))) "float.c":15:56 discrim 1 12799 {absrvvm1sf2}
     (expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
        (nil)))

(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
        (if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
            (reg:RVVM1SF 152 [ vect_iftmp.15 ])
            (reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 12707 {vcond_mask_rvvm1sfrvvmf32bi}
     (expr_list:REG_DEAD (reg:RVVM1SF 152 [ vect_iftmp.15 ])
        (expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
            (expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
                (nil)))))
==>

(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
        (if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
            (abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))
            (reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 13444 {*cond_absrvvm1sf}
     (expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
        (expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
            (expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
                (nil)))))

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_abs<mode>): New combine pattern.
(*copysign<mode>_neg): Ditto.
* config/riscv/autovec.md (@vcond_mask_<mode><vm>): Adjust.
(<optab><mode>2): Ditto.
(cond_<optab><mode>): New.
(cond_len_<optab><mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New.
(expand_cond_len_unop): New helper func.
* config/riscv/riscv-v.cc (shuffle_merge_patterns): Adjust.
(expand_cond_len_unop): New helper func.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.

(cherry picked from commit 92f2ec417c57e980b92b8966226fc2bfbf042af8)

RISC-V: Fix potential ICE of global vsetvl elimination

Committed ahead of the following VSETVL refactor patch to make the V2 patch easier to review.
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(pass_vsetvl::global_eliminate_vsetvl_insn): Fix potential ICE.

(cherry picked from commit 3beef5e6b5b12b5c90040c8485f1836e2dd6cf83)

RISC-V: Fix VTYPE fuse rule bug

This bug was exposed by the refactor patch.
It has been split out and committed separately.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (ge_sew_ratio_unavailable_p):
Fix fuse rule bug.
* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Ditto.

(cherry picked from commit 29487eb237b893c673e9ecc6409b175e22792f13)

RISC-V: Fix gather_load_run-12.c test

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c:
Add vsetvli asm.

(cherry picked from commit 5f3c8075f230309c4417b0e5256283d010ac99d2)

RISC-V: Add attribute to vtype change only vsetvl

This is a preparatory patch for the VSETVL pass.

Committed.

gcc/ChangeLog:

* config/riscv/vector.md: Add attribute.

(cherry picked from commit ea1eb12a38f09e494d5ef072e55653a6463d57eb)

RISC-V: Adapt live-1.c testcase

Committed.

Fix failures:

FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c scan-tree-dump-times optimized ".VEC_EXTRACT" 10
FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c scan-tree-dump-times optimized ".VEC_EXTRACT" 10

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/live-1.c: Adapt test.

(cherry picked from commit d18296e844f35c529f338569622b85fc44d68b5f)

RISC-V: Clang format riscv-vsetvl.cc[NFC]

Committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (change_insn): Clang format.
(vector_infos_manager::all_same_ratio_p): Ditto.
(vector_infos_manager::all_same_avl_p): Ditto.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::commit_vsetvls): Ditto.
(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::compute_probabilities): Ditto.

(cherry picked from commit 10a7d31dd5cdb2689272b5664627daece9b154ee)

RISC-V: Add riscv-vsetvl.def to t-riscv

This patch will be backported to GCC 13 and committed to trunk.
gcc/ChangeLog:

* config/riscv/t-riscv: Add riscv-vsetvl.def

(cherry picked from commit b817bfad31b3bb8701ad1b6bd350b841e45693df)

RISC-V: output Autovec params explicitly in --help ...

... otherwise the user has no clue which --param to actually change

gcc/ChangeLog:
* config/riscv/riscv.opt: Add --param names
riscv-autovec-preference and riscv-autovec-lmul

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit 3571cc93511b39f7a403fe5eab0e316cd7e86220)

RISC-V: Add multiarch support on riscv-linux-gnu

This adds multiarch support to the RISC-V port so that bootstraps work with
Debian out-of-the-box. Without this patch the stage1 compiler is unable to
find headers/libraries when building the stage1 runtime.

This is functionally (and possibly textually) equivalent to Debian's fix for
the same problem.

gcc/
* config/riscv/t-linux: Add MULTIARCH_DIRNAME.

(cherry picked from commit 47f95bc4be4eb14730ab3eaaaf8f6e71fda47690)

VECT: Add LEN_FOLD_EXTRACT_LAST pattern

Hi, Richard and Richi.

This is the last autovec pattern I want to add for RVV (length loop control).

This patch is intended to handle the following case:

int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v, int n)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < n; i++)
    if (a[i] < min_v)
      last = i;

  return last;
}

ARM SVE IR:

  ...
  mask__7.11_39 = vect__4.10_37 < vect_cst__38;
  _40 = loop_mask_36 & mask__7.11_39;
  last_5 = .FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32);
  ...

The RVV IR we want to see:
...
loop_len = SELECT_VL
mask__7.11_39 = vect__4.10_37 < vect_cst__38;
last_5 = .LEN_FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32, loop_len, bias);
...
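
As a rough illustration of the semantics sketched above, here is a scalar C model of one LEN_FOLD_EXTRACT_LAST step, assuming the internal function returns the last element whose index is below len + bias and whose mask bit is set, and otherwise the incoming value. The names len_fold_extract_last, condition_reduction_model and VF are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of one LEN_FOLD_EXTRACT_LAST step (illustration only).
   Returns the last vec[i] with i < len + bias whose mask bit is set,
   or FALLBACK when no such element exists.  */
int
len_fold_extract_last (int fallback, const bool *mask, const int *vec,
                       size_t len, long bias)
{
  int result = fallback;
  long active = (long) len + bias;
  for (long i = 0; i < active; i++)
    if (mask[i])
      result = vec[i];
  return result;
}

/* Hypothetical vectorization factor used by the model below.  */
enum { VF = 4 };

/* The loop from the commit message, processed in chunks of at most VF
   elements, the way a length-controlled vector loop would run.  */
int
condition_reduction_model (const int *a, int min_v, int n)
{
  int last = 66; /* High start value.  */

  for (int base = 0; base < n; base += VF)
    {
      /* Stand-in for SELECT_VL: elements handled in this iteration.  */
      size_t loop_len = (size_t) (n - base < VF ? n - base : VF);
      bool mask[VF] = { false };
      int iv[VF] = { 0 };

      for (size_t i = 0; i < loop_len; i++)
        {
          mask[i] = a[base + i] < min_v; /* the vector compare */
          iv[i] = base + (int) i;        /* the vector induction variable */
        }
      last = len_fold_extract_last (last, mask, iv, loop_len, 0);
    }
  return last;
}
```

The point of the length operand is that lanes at or beyond loop_len are never inspected, so no separate loop mask has to be ANDed into the comparison mask.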

gcc/ChangeLog:

* doc/md.texi: Add LEN_FOLD_EXTRACT_LAST pattern.
* internal-fn.cc (fold_len_extract_direct): Ditto.
(expand_fold_len_extract_optab_fn): Ditto.
(direct_fold_len_extract_optab_supported_p): Ditto.
* internal-fn.def (LEN_FOLD_EXTRACT_LAST): Ditto.
* optabs.def (OPTAB_D): Ditto.

(cherry picked from commit f4658e025424ac281dd8b7e61f798f435dbf1cab)

VECT: Support loop len control on EXTRACT_LAST vectorization

Hi, @Richi and @Richard, based on the previous discussion, I simply fixed the issues for
powerpc and s390 following your suggestions:

-  machine_mode len_load_mode = get_len_load_store_mode
-    (loop_vinfo->vector_mode, true).require ();
-  machine_mode len_store_mode = get_len_load_store_mode
-    (loop_vinfo->vector_mode, false).require ();
+  machine_mode len_load_mode, len_store_mode;
+  if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
+        .exists (&len_load_mode))
+    return false;
+  if (!get_len_load_store_mode (loop_vinfo->vector_mode, false)
+        .exists (&len_store_mode))
+    return false;

Co-Authored-By: Kewen.Lin <linkw@linux.ibm.com>
gcc/ChangeLog:

* tree-vect-loop.cc (vect_verify_loop_lens): Add exists check.
(vectorizable_live_operation): Add live vectorization for length loop
control.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/live-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/live_run-1.c: New test.

(cherry picked from commit c27f06260b248062c3b22f3963858ce3e1ee1882)

RISC-V: Change fnms testcases assertion to xfail

Hi,

This patch fixes inappropriate assertions in fnms testcases since
we want to generate .COND_FNMS but actually generate .FNMS + .VCOND_MASK.
A patch to do this optimization will follow.

Best,
Lehua

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto.

(cherry picked from commit eaabae8e305d8df244a00177b92e5d1101600ab0)

RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

This patch adds support for the rounding mode API for
VFWREDUSUM.VS, as in the samples below.

* __riscv_vfwredusum_vs_f32m1_f64m1_rm
* __riscv_vfwredusum_vs_f32m1_f64m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfwredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredusum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.

(cherry picked from commit 1d17e3d66736cc8d875bf02530f3f6aa498f0d09)

[PATCH] RISC-V: Add Types to Missing Bitmanip Instructions

This patch updates the bitmanip instructions to ensure that no insn is left
without a type attribute. Updates a total of 8 insns to have type "bitmanip"

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/ChangeLog:

* config/riscv/bitmanip.md: Added bitmanip type to insns
that are missing types.

(cherry picked from commit 36788c9ff6d044210ddee23154306ba54bc3087b)

tree-optimization/110897 - Fix missed vectorization of shift on both RISC-V and aarch64

Consider the following case:

#include <stdint.h>

#define TEST2_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vshiftr_##TYPE (TYPE *__restrict dst, TYPE *__restrict a, TYPE *__restrict b, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] = (a[i]) >> b[i]; \
  }

#define TEST_ALL() \
TEST2_TYPE(uint8_t) \
TEST2_TYPE(uint16_t) \
TEST2_TYPE(uint32_t) \
TEST2_TYPE(uint64_t) \

TEST_ALL()

On trunk GCC, both RISC-V and aarch64 fail to vectorize uint8_t/uint16_t, with the following missed-optimization report:

<source>:17:1: missed: couldn't vectorize loop
<source>:17:1: missed: not vectorized: relevant stmt not supported: patt_46 = MIN_EXPR <_6, 7>;
<source>:17:1: missed: couldn't vectorize loop
<source>:17:1: missed: not vectorized: relevant stmt not supported: patt_47 = MIN_EXPR <_7, 15>;
Compiler returned: 0

GCC 13.1 can vectorize both; see:

https://godbolt.org/z/6vaMK5M1o

Bootstrap and regression on X86 passed.

Ok for trunk ?

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Add op vectype.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-1.c: Adapt testcase.

(cherry picked from commit c5f673dbc252e35e6b66e9b8abd30a4027193e0b)

tree-optimization/110838 - vectorization of widened right shifts

The following fixes a problem with my last attempt of avoiding
out-of-bound shift values for vectorized right shifts of widened
operands. Instead of truncating the shift amount with a bitwise
and we actually need to saturate it to the target precision.

The following does that and adds test coverage for the constant
and invariant but variable case that would previously have failed.
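
To make the difference between truncating and saturating the shift amount concrete, here is a small C sketch for a 16-bit operand widened to 32 bits. The function names are hypothetical, and the sketch assumes that right-shifting a negative signed value is an arithmetic shift, as GCC documents; it is not the vectorizer's code.

```c
#include <stdint.h>

/* Reference: the shift as written in the source, done in 32 bits.
   Requires s < 32.  */
int16_t
shift_wide (int16_t x, unsigned s)
{
  return (int16_t) ((int32_t) x >> s);
}

/* Narrowed shift with the amount saturated to precision - 1 (15).
   With an amount of at most 15 the result matches a 16-bit shift.  */
int16_t
shift_narrow_saturate (int16_t x, unsigned s)
{
  unsigned amt = s < 15 ? s : 15;
  return (int16_t) (x >> amt);
}

/* Narrowed shift with the amount truncated by a bitwise AND.
   Amounts of 16 and above wrap around and give the wrong result.  */
int16_t
shift_narrow_mask (int16_t x, unsigned s)
{
  return (int16_t) (x >> (s & 15));
}
```

For a shift amount of 16, the wide shift of 1234 yields 0 and the saturated narrow shift agrees, while the masked variant shifts by 0 and returns 1234 unchanged.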

PR tree-optimization/110838
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Fix right-shift value sanitizing. Properly emit external
def mangling in the preheader rather than in the pattern
def sequence where it will fail vectorizing.

* gcc.dg/vect/pr110838.c: New testcase.

(cherry picked from commit 1a599caab86464006ea8c9501aff6c6638e891eb)

tree-optimization/110838 - vectorization of widened shifts

The following makes sure to limit the shift operand when vectorizing
(short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
operand otherwise invokes undefined behavior. When we determine
whether we can demote the operand we know we at most shift in the
sign bit so we can adjust the shift amount.

Note this has the possibility of un-CSEing common shift operands
as there's no good way to share pattern stmts between patterns.
We'd have to separately pattern recognize the definition.

PR tree-optimization/110838
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Adjust the shift operand of RSHIFT_EXPRs.

* gcc.dg/torture/pr110838.c: New testcase.

(cherry picked from commit 29370f1387274ad5a35a020db6a5d06c0324e6c1)

vect: Handle demoting FLOAT and promoting FIX_TRUNC.

The recent changes that allowed multi-step conversions for
"non-packing/unpacking", i.e. modifier == NONE targets included
promoting to-float and demoting to-int variants. This patch
adds the missing demoting to-float and promoting to-int handling.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_conversion): Handle
more demotion/promotion for modifier == NONE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c: New test.

(cherry picked from commit fe39eca4136bf2f9dc740c05e7957027736fc11b)

Use cvt_op to save intermediate type operand instead of "subtle" vec_dest.

When there are multiple operands in vec_oprnds0, vec_dest is
overwritten with vectype_out, but in the multi_step_cvt case cvt_type is
expected. This caused an ICE in verify_gimple_in_cfg.

gcc/ChangeLog:

PR tree-optimization/110371
PR tree-optimization/110018
* tree-vect-stmts.cc (vectorizable_conversion): Use cvt_op to
save intermediate type operand instead of "subtle" vec_dest
for case NONE.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110371.c: New test.

(cherry picked from commit 1bfe7e5352d1f4ac525317454aca45aa80a517ba)

Don't use intermediate type for FIX_TRUNC_EXPR when ftrapping-math.

> > Hmm, good question.  GENERIC has a direct truncation to unsigned char
> > for example, the C standard generally says if the integral part cannot
> > be represented then the behavior is undefined.  So I think we should be
> > safe here (0x1.0p32 doesn't fit an int).
>
> We should be following Annex F (unspecified value plus "invalid" exception
> for out-of-range floating-to-integer conversions rather than undefined
> behavior).  But we don't achieve that very well at present (see bug 93806
> comments 27-29 for examples of how such conversions produce wobbly
> values).

That would mean guarding this with !flag_trapping_math would be the appropriate
thing to do.

gcc/ChangeLog:

PR tree-optimization/110371
PR tree-optimization/110018
* tree-vect-stmts.cc (vectorizable_conversion): Don't use
intermediate type for FIX_TRUNC_EXPR when ftrapping-math.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110018-1.c: Add -fno-trapping-math to dg-options.
* gcc.target/i386/pr110018-2.c: Ditto.

(cherry picked from commit 77a50c772771f681085922b493922516c3c03e9a)

Use intermediate integer type for float_expr/fix_trunc_expr when direct optab does not exist.

We already use an intermediate type in the WIDEN case, but not for NONE;
this patch extends that.

gcc/ChangeLog:

PR target/110018
* tree-vect-stmts.cc (vectorizable_conversion): Use
intermediate integer type for float_expr/fix_trunc_expr when
direct optab does not exist.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110018-1.c: New test.

(cherry picked from commit 6f19cf7526168f840fd22f6af3f0cb67efb90dc8)

vect: Cost intermediate conversions

g:6f19cf7526168f8 extended N-vector to N-vector conversions
to handle cases where an intermediate integer extension or
truncation is needed. This patch adjusts the cost to account
for these intermediate conversions.

gcc/
* tree-vect-stmts.cc (vectorizable_conversion): Take multi_step_cvt
into account when costing non-widening/truncating conversions.

(cherry picked from commit 9302b0743b366037379af0568534c23ab597b4d4)

vect: Refactor to allow internal_fn's

Refactor vect-patterns to allow patterns to be internal_fns starting
with widening_plus/minus patterns

2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>

gcc/ChangeLog:
* tree-vect-patterns.cc: Add include for gimple-iterator.
(vect_recog_widen_op_pattern): Refactor to use code_helper.
(vect_gimple_build): New function.
* tree-vect-stmts.cc (simple_integer_narrowing): Refactor to use
code_helper.
(vectorizable_call): Likewise.
(vect_gen_widened_results_half): Likewise.
(vect_create_vectorized_demotion_stmts): Likewise.
(vect_create_vectorized_promotion_stmts): Likewise.
(vect_create_half_widening_stmts): Likewise.
(vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.
* tree-vectorizer.h (supportable_widening_operation): Change
prototype to use code_helper.
(supportable_narrowing_operation): Likewise.
(vect_gimple_build): New function prototype.
* tree.h (code_helper::safe_as_tree_code): New function.
(code_helper::safe_as_fn_code): New function.

(cherry picked from commit fe29963d40a721d18b5f688b9d54dd9021bfb90a)

Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

Similar to WIDEN FLOAT_EXPR, when a direct optab does not exist, try an
intermediate integer type whenever the gimple ranger can tell it's safe.

i.e.
When there's no direct optab for vector long long -> vector float, but
the value range of the integer can be represented as int, try vector int
-> vector float if available.
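
A scalar C sketch of the idea, assuming the value is already known to fit in 32 bits (the role the range information plays in the vectorizer). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct conversion: long long -> float.  */
float
cvt_direct (int64_t v)
{
  return (float) v;
}

/* Two-step conversion: long long -> int -> float.
   Only valid when V is known to lie in [INT32_MIN, INT32_MAX].  */
float
cvt_via_int32 (int64_t v)
{
  return (float) (int32_t) v;
}

/* Stand-in for the range query: true if the whole range [LO, HI]
   can be represented as a 32-bit int.  */
bool
range_fits_int32 (int64_t lo, int64_t hi)
{
  return lo >= INT32_MIN && hi <= INT32_MAX;
}
```

Within that range the truncation to int is lossless, so both paths round the same integer value to float and produce the same result.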

gcc/ChangeLog:

PR tree-optimization/108804
* tree-vect-patterns.cc (vect_get_range_info): Remove static.
* tree-vect-stmts.cc (vect_create_vectorized_demotion_stmts):
Add new parameter narrow_src_p.
(vectorizable_conversion): Enhance NARROW FLOAT_EXPR
vectorization by truncating to lower precision.
* tree-vectorizer.h (vect_get_range_info): New declare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108804.c: New test.

genmatch: split shared code to gimple-match-exports.cc

In preparation for automatically splitting match.pd files I split off the
non-static helper functions that are shared between the match.pd functions off
to another file.

This file can be compiled in parallel and also allows us to later avoid
duplicate symbols errors.

gcc/ChangeLog:

PR bootstrap/84402
* Makefile.in (OBJS): Add gimple-match-exports.o.
* genmatch.cc (decision_tree::gen): Export gimple_gimplify helpers.
* gimple-match-head.cc (gimple_simplify, gimple_resimplify1,
gimple_resimplify2, gimple_resimplify3, gimple_resimplify4,
gimple_resimplify5, constant_for_folding, convert_conditional_op,
maybe_resimplify_conditional_op, gimple_match_op::resimplify,
maybe_build_generic_op, build_call_internal, maybe_push_res_to_seq,
do_valueize, try_conditional_simplification, gimple_extract,
gimple_extract_op, canonicalize_code, commutative_binary_op_p,
commutative_ternary_op_p, first_commutative_argument,
associative_binary_op_p, directly_supported_p,
get_conditional_internal_fn): Moved to gimple-match-exports.cc
* gimple-match-exports.cc: New file.

(cherry picked from commit 27fcf994c5515e1bbf2ff03d28fd2fa927c7e7b5)

tree-optimization/110897 - Fix missed vectorization of shift on both RISC-V and aarch64

[ Partial, just the testsuite to make comparisons against trunk easier ]

Consider the following case:

#include <stdint.h>

#define TEST2_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vshiftr_##TYPE (TYPE *__restrict dst, TYPE *__restrict a, TYPE *__restrict b, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] = (a[i]) >> b[i]; \
  }

#define TEST_ALL() \
TEST2_TYPE(uint8_t) \
TEST2_TYPE(uint16_t) \
TEST2_TYPE(uint32_t) \
TEST2_TYPE(uint64_t) \

TEST_ALL()

On trunk GCC, both RISC-V and aarch64 fail to vectorize uint8_t/uint16_t, with the following missed-optimization report:

<source>:17:1: missed: couldn't vectorize loop
<source>:17:1: missed: not vectorized: relevant stmt not supported: patt_46 = MIN_EXPR <_6, 7>;
<source>:17:1: missed: couldn't vectorize loop
<source>:17:1: missed: not vectorized: relevant stmt not supported: patt_47 = MIN_EXPR <_7, 15>;
Compiler returned: 0

GCC 13.1 can vectorize both; see:

https://godbolt.org/z/6vaMK5M1o

Bootstrap and regression on X86 passed.

Ok for trunk ?

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Add op vectype.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-1.c: Adapt testcase.

(cherry picked from commit c5f673dbc252e35e6b66e9b8abd30a4027193e0b)

vect: Handle demoting FLOAT and promoting FIX_TRUNC.

[Partial, testsuite only to make comparisons against the trunk easier ]

The recent changes that allowed multi-step conversions for
"non-packing/unpacking", i.e. modifier == NONE targets included
promoting to-float and demoting to-int variants. This patch
adds the missing demoting to-float and promoting to-int handling.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_conversion): Handle
more demotion/promotion for modifier == NONE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c: New test.

(cherry picked from commit fe39eca4136bf2f9dc740c05e7957027736fc11b)

RISC-V: Fix wrong select_kind in riscv_compute_multilib

It seems I broke bare-metal toolchain multilib selection while
fixing Linux multilib selection...

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_compute_multilib):
Fix wrong select_kind...

(cherry picked from commit 008cbecf622a413ebcc8b41a737f30fd7e2a1abf)

RISC-V: Suppress unused parameter warning in riscv-common.cc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_select_multilib_by_abi):
Drop unused parameter.
(riscv_select_multilib): Ditto.
(riscv_compute_multilib): Update call sites of
riscv_select_multilib_by_abi and riscv_select_multilib.

(cherry picked from commit 7a7f6b26259d22115ee4813ce130622ad1073d16)

poly_int: Handle more can_div_trunc_p cases

can_div_trunc_p (a, b, &Q, &r) tries to compute a Q and r that
satisfy the usual conditions for truncating division:

     (1) a = b * Q + r
     (2) |b * Q| <= |a|
     (3) |r| < |b|

We can compute Q using the constant component (the case when
all indeterminates are zero).  Since |r| < |b| for the constant
case, the requirements for indeterminate xi with coefficients
ai (for a) and bi (for b) are:

     (2') |bi * Q| <= |ai|
     (3') |ai - bi * Q| <= |bi|

(See the big comment for more details, restrictions, and reasoning).

However, the function works on abstract arithmetic types, and so
it has to be careful not to introduce new overflow.  The code
therefore only handled the extreme for (3'), that is:

     |ai - bi * Q| = |bi|

for the case where Q is zero.

Looking at it again, the overflow issue is a bit easier to handle than
I'd originally thought (or so I hope).  This patch therefore extends the
code to handle |ai - bi * Q| = |bi| for all Q, with Q = 0 no longer
being a separate case.

The net effect is to allow the function to succeed for things like:

     (a0 + b1 (Q+1) x) / (b0 + b1 x)

where Q = a0 / b0, with various sign conditions.  E.g. we now handle:

     (7 + 8x) / (4 + 4x)

with Q = 1 and r = 3 + 4x.
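
The example can be checked numerically against conditions (1) to (3). The following C sketch does so for a two-coefficient polynomial, assuming the indeterminate x ranges over nonnegative integers. The type poly2 and the helper names are hypothetical and unrelated to the poly_int implementation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* c0 + c1 * x, a toy stand-in for a two-coefficient poly_int.  */
struct poly2
{
  long c0;
  long c1;
};

long
poly2_eval (struct poly2 p, long x)
{
  return p.c0 + p.c1 * x;
}

/* Check conditions (1), (2) and (3) for one value of x.  */
bool
div_trunc_holds_at (struct poly2 a, struct poly2 b, long q,
                    struct poly2 r, long x)
{
  long av = poly2_eval (a, x);
  long bv = poly2_eval (b, x);
  long rv = poly2_eval (r, x);

  return av == bv * q + rv             /* (1) a = b * Q + r */
         && labs (bv * q) <= labs (av) /* (2) |b * Q| <= |a| */
         && labs (rv) < labs (bv);     /* (3) |r| < |b| */
}

/* Check the conditions for every x in [0, limit].  */
bool
div_trunc_holds_up_to (struct poly2 a, struct poly2 b, long q,
                       struct poly2 r, long limit)
{
  for (long x = 0; x <= limit; x++)
    if (!div_trunc_holds_at (a, b, q, r, x))
      return false;
  return true;
}
```

With a = 7 + 8x, b = 4 + 4x, Q = 1 and r = 3 + 4x the check passes for every x tried, and the coefficient of x sits exactly on the boundary |a1 - b1 * Q| = |b1| = 4. Changing a to 7 + 9x breaks condition (3) at x = 1, where r = 8 equals b = 8.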

gcc/
* poly-int.h (can_div_trunc_p): Succeed for more boundary conditions.

gcc/testsuite/
* gcc.dg/plugin/poly-int-tests.h (test_can_div_trunc_p_const)
(test_can_div_trunc_p_const): Add more tests.

(cherry picked from commit 9524718654c3e4a13dd88bc1ac6409da1ec44e71)

[RISCV][committed] Remove spurious newline in ztso sequence

amo-table-ztso-load-3 fails on the coordination branch after merging up the Ztso changes
due to a spurious newline in the output causing scan-function-body to fail.
There's probably an over-zealous .* or similar regexp in the framework. I
didn't see it in a quick scan, but could have easily missed it.

Regardless, fixing the extraneous newline is easy :-)

gcc/
* config/riscv/sync-ztso.md (atomic_load_ztso<mode>): Avoid extraneous
newline.

(cherry picked from commit 39491441a3aca7725d5a6dfeea4b01229d30c899)

[PATCH 2/2] RISC-V: Add quotes to #error messages (all)

From: Tsukasa OI <research_trasio@irq.a4lg.com>

In commit 1aaf3a64e92a ("[PATCH] RISC-V: Deduplicate #error messages in
testsuite"), the author made a mistake to miss the test after adding
quotes around extension names. To avoid future errors and for consistency
with other #error uses in the RISC-V testsuite, this commit quotes all
unquoted #error messages.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadba.c: Quote unquoted #error message.
* gcc.target/riscv/xtheadbb.c: Ditto.
* gcc.target/riscv/xtheadbs.c: Ditto.
* gcc.target/riscv/xtheadcmo.c: Ditto.
* gcc.target/riscv/xtheadcondmov.c: Ditto.
* gcc.target/riscv/xtheadfmemidx.c: Ditto.
* gcc.target/riscv/xtheadfmv.c: Ditto.
* gcc.target/riscv/xtheadint.c: Ditto.
* gcc.target/riscv/xtheadmac.c: Ditto.
* gcc.target/riscv/xtheadmemidx.c: Ditto.
* gcc.target/riscv/xtheadmempair.c: Ditto.
* gcc.target/riscv/xtheadsync.c: Ditto.
* gcc.target/riscv/zawrs.c: Ditto.
* gcc.target/riscv/zvbb.c: Ditto.
* gcc.target/riscv/zvbc.c: Ditto.
* gcc.target/riscv/zvkg.c: Ditto.
* gcc.target/riscv/zvkned.c: Ditto.
* gcc.target/riscv/zvknha.c: Ditto.
* gcc.target/riscv/zvknhb.c: Ditto.
* gcc.target/riscv/zvksed.c: Ditto.
* gcc.target/riscv/zvksh.c: Ditto.
* gcc.target/riscv/zvkt.c: Ditto.

(cherry picked from commit ab7de14eaf1d454cb8cbc37dbde89688ec6b7f5a)

[PATCH 1/2] RISC-V: Add quotes to #error messages

In commit 1aaf3a64e92a ("[PATCH] RISC-V: Deduplicate #error messages in
testsuite"), the author made a mistake to miss the test after adding
quotes around extension names. To avoid future errors and for consistency
with other #error uses in the RISC-V testsuite, this commit quotes #error
messages where necessary to avoid current test case failures.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvkn.c: Quote #error messages.
* gcc.target/riscv/zvkn-1.c: Ditto.
* gcc.target/riscv/zvknc.c: Ditto.
* gcc.target/riscv/zvknc-1.c: Ditto.
* gcc.target/riscv/zvknc-2.c: Ditto.
* gcc.target/riscv/zvkng.c: Ditto.
* gcc.target/riscv/zvkng-1.c: Ditto.
* gcc.target/riscv/zvkng-2.c: Ditto.
* gcc.target/riscv/zvks.c: Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvksc.c: Ditto.
* gcc.target/riscv/zvksc-1.c: Ditto.
* gcc.target/riscv/zvksc-2.c: Ditto.
* gcc.target/riscv/zvksg.c: Ditto.
* gcc.target/riscv/zvksg-1.c: Ditto.
* gcc.target/riscv/zvksg-2.c: Ditto.

(cherry picked from commit 56c28ce7b52d181641904b4a4a441301a848cf48)

LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

This patch exports 'compute_antinout_edge' and 'compute_earliest' at global scope;
they will be used by the VSETVL pass of the RISC-V backend.

Demand fusion merges VSETVL information so that a single emitted VSETVL dominates and pre-configures most
of the RVV instructions, which lets redundant VSETVLs be elided.

For example:

for
  for
    for
      if (cond)
        VSETVL demand 1: SEW/LMUL = 16 and TU policy
      else
        VSETVL demand 2: SEW = 32

The VSETVL pass should be able to fuse demand 1 and demand 2 into a new demand: SEW = 32, LMUL = M2, TU policy,
and then emit that VSETVL outside the outermost for loop to get the best codegen and run-time execution.

Currently, Phase 3 (demand fusion) of the VSETVL pass is messy, unreliable, and hard to maintain.
After rereading the Dragon Book and Morgan's book, I found that "earliest" allows us to do
demand fusion in a very reliable and optimal way.

So this patch exports these 2 functions, which are very helpful for the VSETVL pass.
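
As a toy illustration of the fusion in the example above (not the data structures or algorithm used by riscv-vsetvl.cc), the following C sketch merges two partial demands. It assumes a demand may leave SEW or the SEW/LMUL ratio unspecified, encoded as 0, and that only integer LMUL values occur. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified VSETVL demand.  A field value of 0 means
   that the instruction places no demand on that property.  */
struct demand
{
  unsigned sew;          /* element width in bits */
  unsigned ratio;        /* SEW/LMUL ratio */
  bool tail_undisturbed; /* true if TU policy is demanded */
};

/* Fully specified configuration produced by fusing two demands.  */
struct fused_config
{
  unsigned sew;
  unsigned lmul;
  bool tail_undisturbed;
};

/* Fuse D1 and D2 into *OUT.  Returns false when the demands conflict
   or when they do not determine an integer LMUL together.  */
bool
fuse_demands (struct demand d1, struct demand d2, struct fused_config *out)
{
  if (d1.sew && d2.sew && d1.sew != d2.sew)
    return false;
  if (d1.ratio && d2.ratio && d1.ratio != d2.ratio)
    return false;

  unsigned sew = d1.sew ? d1.sew : d2.sew;
  unsigned ratio = d1.ratio ? d1.ratio : d2.ratio;
  if (sew == 0 || ratio == 0 || sew % ratio != 0)
    return false;

  out->sew = sew;
  out->lmul = sew / ratio; /* ratio = SEW / LMUL */
  out->tail_undisturbed = d1.tail_undisturbed || d2.tail_undisturbed;
  return true;
}
```

Fusing demand 1 (ratio 16, TU) with demand 2 (SEW 32) in this model gives SEW = 32, LMUL = 2 and TU policy, matching the fused demand described above.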

gcc/ChangeLog:

* lcm.cc (compute_antinout_edge): Export as global use.
(compute_earliest): Ditto.
(compute_rev_insert_delete): Ditto.
* lcm.h (compute_antinout_edge): Ditto.
(compute_earliest): Ditto.

(cherry picked from commit d5dfba19aee783a6ba90fdba1993d576c7ec310b)

RISC-V: Fix -march error of zhinxmin testcases

This small patch fixes the -march error in a zhinxmin testcase I added earlier
and in an old zhinxmin testcase, since these testcases are for the zhinxmin extension
and not the zfhmin extension.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Float16-zhinxmin-3.c: Adjust.
* gcc.target/riscv/_Float16-zhinxmin-4.c: Ditto.

(cherry picked from commit b4c8c551c48f5f29d9a719c4c7fc4fa4cec28fe7)

RISC-V: Add the missed half floating-point mode patterns of local_pic_load/store when only use zfhmin or zhinxmin

Hi,

There is a newly failing RISC-V testcase (testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c)
on the current trunk branch when medany is used as the default cmodel.
The reason is that the load of the half floating-point immediate changes from RTL 1 to RTL
2 when the cmodel changes from medlow to medany. This change lets insn 7 be
combined with the @pred_broadcast pattern (insn 8) in the combine pass.
Insn 6 and insn 7 are combined for SF and DF mode, but not for HF mode, and
that failed combination leads to insn 7 and insn 8 being combined instead. The
combination fails because the local_pic_loadhf pattern doesn't exist when only
zfhmin (implied by zvfh) is enabled.

Therefore, when only zfhmin but not zfh is enabled, the define_insn of
*local_pic_load<ANYF:mode> must also be able to produce the
*local_pic_loadhf pattern, since the zfhmin extension also includes
half floating-point load/store instructions. So I added an ANYLSF iterator
and applied it to the local_pic_load/store define_insns. I have checked other ANYF
usage scenarios and believe this is the only place that needs to be corrected.
I may have missed something; please correct me if so. Thanks.

RTL 1:

(insn 6 3 7 2 (set (reg:DI 137)
        (high:DI (symbol_ref/u:DI ("*.LC0") [flags 0x82]))) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":7:1 discrim 3 179 {*movdi_64bit}
     (nil))
(insn 7 6 8 2 (set (reg:HF 136)
        (mem/u/c:HF (lo_sum:DI (reg:DI 137)
                (symbol_ref/u:DI ("*.LC0") [flags 0x82])) [0  S2 A16])) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":7:1 discrim 3 126 {*movhf_hardfloat}
     (expr_list:REG_EQUAL (const_double:HF 8.8828125e+0 [0x0.8e2p+4])
        (nil)))

RTL 2:

(insn 6 3 7 2 (set (reg/f:DI 137)
        (symbol_ref/u:DI ("*.LC0") [flags 0x82])) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":7:1 discrim 3 179 {*movdi_64bit}
     (nil))
(insn 7 6 8 2 (set (reg:HF 136)
        (mem/u/c:HF (reg/f:DI 137) [0  S2 A16])) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":7:1 discrim 3 126 {*movhf_hardfloat}
     (expr_list:REG_EQUAL (const_double:HF 8.8828125e+0 [0x0.8e2p+4])
        (nil)))
(insn 8 7 9 2 (set (reg:V2HF 135)
        (if_then_else:V2HF (unspec:V2BI [
                    (const_vector:V2BI [
                            (const_int 1 [0x1]) repeated x2
                        ])
                    (const_int 2 [0x2]) repeated x3
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (vec_duplicate:V2HF (reg:HF 136))
            (unspec:V2HF [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":6:1 discrim 3 1389 {*pred_broadcastv2hf}
     (nil))

Best,
Lehua

gcc/ChangeLog:

* config/riscv/iterators.md (TARGET_HARD_FLOAT || TARGET_ZFINX): New.
* config/riscv/pic.md (*local_pic_load<ANYF:mode>): Change ANYF.
(*local_pic_load<ANYLSF:mode>): To ANYLSF.
(*local_pic_load_32d<ANYF:mode>): Ditto.
(*local_pic_load_32d<ANYLSF:mode>): Ditto.
(*local_pic_store<ANYF:mode>): Ditto.
(*local_pic_store<ANYLSF:mode>): Ditto.
(*local_pic_store_32d<ANYF:mode>): Ditto.
(*local_pic_store_32d<ANYLSF:mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Float16-zfhmin-4.c: New test.
* gcc.target/riscv/_Float16-zhinxmin-4.c: New test.

(cherry picked from commit 3709ca091bec43ee3203b96146585652c5d84728)

RISC-V: Revert the convert from vmv.s.x to vmv.v.i

Hi,

This patch reverts the conversion from vmv.s.x to vmv.v.i and adds a new pattern
to optimize the special case where the scalar operand is zero.

Currently, a broadcast pattern whose scalar operand is an immediate
is converted from vmv.s.x to vmv.v.i, and the mask operand is
converted from 00..01 to 11..11. After discussing the advantages and
disadvantages of both forms with Juzhe offline, we chose not to do
this transform.

Before:

  Advantages: The vsetvli info required by vmv.s.x has better compatibility since
  vmv.s.x only requires SEW and VLEN be zero or one. That means there
  are more opportunities to combine with other vsetvl infos in the vsetvl pass.

  Disadvantages: For a non-zero scalar immediate, one more `li rd, imm` instruction
  is needed.

After:

  Advantages: No `li rd, imm` instruction is needed since vmv.v.i supports an immediate operand.

  Disadvantages: The converse of the advantages above. Worse compatibility leads to more
  vsetvl instructions being needed.

Consider the C code below and the asm after autovec.
There is an extra insn (vsetivli zero, 1, e32, m1, ta, ma)
after converting vmv.s.x to vmv.v.i.

```
int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
      sum += a[i] * b[i];

    return sum;
}
```

asm (Before):

```
foo1:
        ble     a3,zero,.L7
        vsetvli a2,zero,e32,m1,ta,ma
        vmv.v.i v1,0
.L6:
        vsetvli a5,a3,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a3,a3,a5
        vle32.v v2,0(a0)
        vle32.v v3,0(a1)
        add     a0,a0,a4
        add     a1,a1,a4
        vmacc.vv        v1,v3,v2
        bne     a3,zero,.L6
        vsetvli a2,zero,e32,m1,ta,ma
        vmv.s.x v2,zero
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L7:
        li      a0,0
        ret
```

asm (After):

```
foo1:
        ble     a3,zero,.L4
        vsetvli a2,zero,e32,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a3,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a3,a3,a5
        vle32.v v2,0(a0)
        vle32.v v3,0(a1)
        add     a0,a0,a4
        add     a1,a1,a4
        vmacc.vv        v1,v3,v2
        bne     a3,zero,.L3
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.v.i v2,0
        vsetvli a2,zero,e32,m1,ta,ma
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L4:
        li      a0,0
        ret
```

Best,
Lehua

Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_0_operand): New.
* config/riscv/vector.md (*pred_broadcast<mode>_zero): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Update.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.

(cherry picked from commit 86d80395cf3c8832b669135b1ca7ea8258790c19)

RISC-V: Forbidden fuse vlmax vsetvl to DEMAND_NONZERO_AVL vsetvl

Hi,

This little patch fixes the failing testcase
(gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c)
that appears after applying this patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627121.html).
The specific reason is that the vsetvl pass has a bug, and this patch
forbids the fusion in this case. This patch needs to be committed
before that patch to work.

Best,
Lehua

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::backward_demand_fusion):
Forbidden.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:
Address failure due to uninitialized vtype register.

(cherry picked from commit c43916857c6586e65f10713fdc5a65909918a8cc)

RISCV: Add rotate immediate regression test

This adds new regression tests to ensure half-register rotations are
correctly optimized into rori instructions.
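
As an illustration only, a half-register rotation can be written in C as
below. The function names are hypothetical and are not taken from the new
tests; the sketch just shows the shape of source code that a Zbb-enabled
compiler is expected to turn into a rotate-immediate instruction.

```c
#include <stdint.h>

/* Rotate a 64-bit value by half its width (32 bits).  Rotating left or
   right by half the width gives the same result.  */
uint64_t
rotate_half_u64 (uint64_t x)
{
  return (x << 32) | (x >> 32);
}

/* Rotate a 32-bit value by half its width (16 bits).  */
uint32_t
rotate_half_u32 (uint32_t x)
{
  return (x << 16) | (x >> 16);
}
```

Both shift counts are strictly smaller than the operand width, so the
expressions are well defined in C.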

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-rol-ror-08.c: New test.
* gcc.target/riscv/zbb-rol-ror-09.c: New test.

Co-authored-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
(cherry picked from commit d7b6cad9d6c40f1dab907abd8e71e713bb2a5bf5)

targhooks: Extend legitimate_address_p with code_helper [PR110248]

As PR110248 shows, some middle-end passes like IVOPTs can
query the target hook legitimate_address_p with an
artificially constructed rtx to determine whether the target
supports a given addressing mode for some gimple
statement. The existing legitimate_address_p, however,
only checks the given mode, so it cannot distinguish
some special cases. For example, the Power port expands the
LEN_LOAD ifn with the lxvl hardware
insn, which only supports one register to hold the address
(the other register holds the length), so the
base (reg) + index (reg) addressing mode is not supported
there. But the hook only considers the
given mode, which would be some vector mode for the LEN_LOAD
ifn, and base + index addressing is supported for
normal vector load and store insns, so the hook unexpectedly
returns true for the query.

This patch introduces one extra argument of type
code_helper for the hook legitimate_address_p, which lets
targets handle special cases like the one described
above.
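
The idea can be sketched with a toy model in C. Everything below is
hypothetical: the enums and function names are stand-ins invented for this
illustration and are not GCC's real types or hook signatures. It only
shows why an extra "which operation is asking" argument lets a target
reject base + index addressing for a LEN_LOAD-like operation while still
accepting it for ordinary vector accesses.

```c
#include <stdbool.h>

/* Hypothetical address shapes; not GCC's rtx representation.  */
enum demo_addr_form { DEMO_ADDR_REG, DEMO_ADDR_REG_PLUS_REG };

/* Hypothetical stand-in for code_helper.  DEMO_OP_UNKNOWN plays the
   role of ERROR_MARK, i.e. "no specific operation given".  */
enum demo_op { DEMO_OP_UNKNOWN, DEMO_OP_LEN_LOAD };

/* Old shape of the hook: only the mode (folded away here) is known, so
   base + index is accepted for every vector access.  */
bool
demo_legitimate_address_old (enum demo_addr_form form)
{
  (void) form;
  return true;
}

/* New shape: the extra argument lets the target special-case an
   operation whose instruction only takes a single address register.  */
bool
demo_legitimate_address_new (enum demo_addr_form form, enum demo_op op)
{
  if (op == DEMO_OP_LEN_LOAD)
    return form == DEMO_ADDR_REG;
  return true;
}
```

Callers that have no specific operation in mind pass the "unknown" value
and get the old behaviour, which mirrors how the ChangeLog below passes
ERROR_MARK at the existing call sites.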

PR tree-optimization/110248

gcc/ChangeLog:

* coretypes.h (class code_helper): Add forward declaration.
* doc/tm.texi: Regenerate.
* lra-constraints.cc (valid_address_p): Call target hook
targetm.addr_space.legitimate_address_p with an extra parameter
ERROR_MARK as its prototype changes.
* recog.cc (memory_address_addr_space_p): Likewise.
* reload.cc (strict_memory_address_addr_space_p): Likewise.
* target.def (legitimate_address_p, addr_space.legitimate_address_p):
Extend with one more argument of type code_helper, update the
documentation accordingly.
* targhooks.cc (default_legitimate_address_p): Adjust for the
new code_helper argument.
(default_addr_space_legitimate_address_p): Likewise.
* targhooks.h (default_legitimate_address_p): Likewise.
(default_addr_space_legitimate_address_p): Likewise.
* config/aarch64/aarch64.cc (aarch64_legitimate_address_hook_p): Adjust
with extra unnamed code_helper argument with default ERROR_MARK.
* config/alpha/alpha.cc (alpha_legitimate_address_p): Likewise.
* config/arc/arc.cc (arc_legitimate_address_p): Likewise.
* config/arm/arm-protos.h (arm_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/arm/arm.cc (arm_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/avr/avr.cc (avr_addr_space_legitimate_address_p): Likewise.
* config/bfin/bfin.cc (bfin_legitimate_address_p): Likewise.
* config/bpf/bpf.cc (bpf_legitimate_address_p): Likewise.
* config/c6x/c6x.cc (c6x_legitimate_address_p): Likewise.
* config/cris/cris-protos.h (cris_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/cris/cris.cc (cris_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/csky/csky.cc (csky_legitimate_address_p): Likewise.
* config/epiphany/epiphany.cc (epiphany_legitimate_address_p):
Likewise.
* config/frv/frv.cc (frv_legitimate_address_p): Likewise.
* config/ft32/ft32.cc (ft32_addr_space_legitimate_address_p): Likewise.
* config/gcn/gcn.cc (gcn_addr_space_legitimate_address_p): Likewise.
* config/h8300/h8300.cc (h8300_legitimate_address_p): Likewise.
* config/i386/i386.cc (ix86_legitimate_address_p): Likewise.
* config/ia64/ia64.cc (ia64_legitimate_address_p): Likewise.
* config/iq2000/iq2000.cc (iq2000_legitimate_address_p): Likewise.
* config/lm32/lm32.cc (lm32_legitimate_address_p): Likewise.
* config/loongarch/loongarch.cc (loongarch_legitimate_address_p):
Likewise.
* config/m32c/m32c.cc (m32c_legitimate_address_p): Likewise.
(m32c_addr_space_legitimate_address_p): Likewise.
* config/m32r/m32r.cc (m32r_legitimate_address_p): Likewise.
* config/m68k/m68k.cc (m68k_legitimate_address_p): Likewise.
* config/mcore/mcore.cc (mcore_legitimate_address_p): Likewise.
* config/microblaze/microblaze-protos.h (tree.h): New include for
tree_code ERROR_MARK.
(microblaze_legitimate_address_p): Adjust with extra unnamed
code_helper argument with default ERROR_MARK.
* config/microblaze/microblaze.cc (microblaze_legitimate_address_p):
Likewise.
* config/mips/mips.cc (mips_legitimate_address_p): Likewise.
* config/mmix/mmix.cc (mmix_legitimate_address_p): Likewise.
* config/mn10300/mn10300.cc (mn10300_legitimate_address_p): Likewise.
* config/moxie/moxie.cc (moxie_legitimate_address_p): Likewise.
* config/msp430/msp430.cc (msp430_legitimate_address_p): Likewise.
(msp430_addr_space_legitimate_address_p): Adjust with extra code_helper
argument with default ERROR_MARK and adjust the call to function
msp430_legitimate_address_p.
* config/nds32/nds32.cc (nds32_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/nios2/nios2.cc (nios2_legitimate_address_p): Likewise.
* config/nvptx/nvptx.cc (nvptx_legitimate_address_p): Likewise.
* config/or1k/or1k.cc (or1k_legitimate_address_p): Likewise.
* config/pa/pa.cc (pa_legitimate_address_p): Likewise.
* config/pdp11/pdp11.cc (pdp11_legitimate_address_p): Likewise.
* config/pru/pru.cc (pru_addr_space_legitimate_address_p): Likewise.
* config/riscv/riscv.cc (riscv_legitimate_address_p): Likewise.
* config/rl78/rl78-protos.h (rl78_as_legitimate_address): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/rl78/rl78.cc (rl78_as_legitimate_address): Adjust with
extra unnamed code_helper argument with default ERROR_MARK.
* config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Likewise.
(rs6000_debug_legitimate_address_p): Adjust with extra code_helper
argument and adjust the call to function rs6000_legitimate_address_p.
* config/rx/rx.cc (rx_is_legitimate_address): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/s390/s390.cc (s390_legitimate_address_p): Likewise.
* config/sh/sh.cc (sh_legitimate_address_p): Likewise.
* config/sparc/sparc.cc (sparc_legitimate_address_p): Likewise.
* config/v850/v850.cc (v850_legitimate_address_p): Likewise.
* config/vax/vax.cc (vax_legitimate_address_p): Likewise.
* config/visium/visium.cc (visium_legitimate_address_p): Likewise.
* config/xtensa/xtensa.cc (xtensa_legitimate_address_p): Likewise.
* config/stormy16/stormy16-protos.h (xstormy16_legitimate_address_p):
Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/stormy16/stormy16.cc (xstormy16_legitimate_address_p):
Adjust with extra unnamed code_helper argument with default
ERROR_MARK.

(cherry picked from commit 165b1f6ad1d3969e2c23417797362d0528e65c79)

[PATCH] RISC-V: Deduplicate #error messages in testsuite

"#error Feature macro not defined" is required to test the existence of an
extension through the preprocessor. However, multiple occurrences of the
exact same error message will confuse the developer once an error is
encountered.

This commit replaces such error messages with
"#error Feature macro for `EXT' not defined" to make clear which
macro is missing.
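
A minimal sketch of the resulting style is shown below. DEMO_EXT_A and
DEMO_EXT_B are hypothetical stand-ins for compiler-provided feature
macros; they are defined by hand here only so that the fragment compiles
on any host. In the real tests the macros would come from the -march
string.

```c
/* Hypothetical stand-ins for compiler-provided feature macros.  */
#define DEMO_EXT_A 1
#define DEMO_EXT_B 1

/* One distinct message per macro, so a failure names the culprit.  */
#ifndef DEMO_EXT_A
#error "Feature macro for `EXT_A' not defined"
#endif

#ifndef DEMO_EXT_B
#error "Feature macro for `EXT_B' not defined"
#endif

/* Reached only if every check above passed.  */
int
demo_feature_macros_present (void)
{
  return 1;
}
```

Removing either #define makes compilation stop with a message that
identifies the missing macro, rather than one of several identical
messages.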

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvkn.c: Deduplicate #error messages.
* gcc.target/riscv/zvkn-1.c: Ditto.
* gcc.target/riscv/zvknc.c: Ditto.
* gcc.target/riscv/zvknc-1.c: Ditto.
* gcc.target/riscv/zvknc-2.c: Ditto.
* gcc.target/riscv/zvkng.c: Ditto.
* gcc.target/riscv/zvkng-1.c: Ditto.
* gcc.target/riscv/zvkng-2.c: Ditto.
* gcc.target/riscv/zvks.c: Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvksc.c: Ditto.
* gcc.target/riscv/zvksc-1.c: Ditto.
* gcc.target/riscv/zvksc-2.c: Ditto.
* gcc.target/riscv/zvksg.c: Ditto.
* gcc.target/riscv/zvksg-1.c: Ditto.
* gcc.target/riscv/zvksg-2.c: Ditto.

(cherry picked from commit 1aaf3a64e92ada283f6d3052151858df2ad99e77)

RISC-V: Fix XPASS slp testcases

This patch fixes the XPASS slp testcases on trunk by
making the conditions for xfail stricter.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Fix.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto.

(cherry picked from commit 903d937569992a885faf8a1bf7d120e9e66f456b)

RISC-V: Support RVV VFWREDOSUM.VS rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFWREDOSUM.VS, as in the samples below.

* __riscv_vfwredosum_vs_f32m1_f64m1_rm
* __riscv_vfwredosum_vs_f32m1_f64m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(widen_freducop): Add frm_opt_type template arg.
(vfwredosum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredosum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wredosum.c: New test.

(cherry picked from commit c6259c4975e84b30d7de1f64afaece614d7c4500)

RISC-V: Support RVV VFREDOSUM.VS rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFREDOSUM.VS, as in the samples below.

* __riscv_vfredosum_vs_f32m1_f32m1_rm
* __riscv_vfredosum_vs_f32m1_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfredosum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfredosum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-redosum.c: New test.

(cherry picked from commit 3a68ef2cccb8a7f15ca188dbffd754d112d75898)

RISC-V: Support RVV VFREDUSUM.VS rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFREDUSUM.VS, as in the samples below.

* __riscv_vfredusum_vs_f32m1_f32m1_rm
* __riscv_vfredusum_vs_f32m1_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class freducop): Add frm_op_type template arg.
(vfredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfredusum_frm): New intrinsic function def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct reduc_alu_frm_def): New class for frm shape.
(SHAPE): New declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-redusum.c: New test.

(cherry picked from commit 3d903a26d7b6b4e32ad9f1f8c6fb5adf766c7cc7)

RISC-V: Support RVV VFNCVT.F.{X|XU|F}.W rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFNCVT.F.{X|XU|F}.W, as in the samples below.

* __riscv_vfncvt_f_x_w_f32m1_rm
* __riscv_vfncvt_f_x_w_f32m1_rm_m
* __riscv_vfncvt_f_xu_w_f32m1_rm
* __riscv_vfncvt_f_xu_w_f32m1_rm_m
* __riscv_vfncvt_f_f_w_f32m1_rm
* __riscv_vfncvt_f_f_w_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfncvt_f): Add frm_op_type template arg.
(vfncvt_f_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_f_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-ncvt-f.c: New test.

(cherry picked from commit 20e1db413ee8bb4d5233d97484e19e4e1d85f4ac)

RISC-V: Support RVV VFNCVT.XU.F.W rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFNCVT.XU.F.W, as in the samples below.

* __riscv_vfncvt_xu_f_w_u16mf2_rm
* __riscv_vfncvt_xu_f_w_u16mf2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfncvt_xu_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_xu_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-ncvt-xu.c: New test.

(cherry picked from commit 72fc7e9d6aefbc4de1d3827062e47277fca83ef5)

RISC-V: Support RVV VFNCVT.X.F.W rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFNCVT.X.F.W, as in the samples below.

* __riscv_vfncvt_x_f_w_i16mf2_rm
* __riscv_vfncvt_x_f_w_i16mf2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfncvt_x): Add frm_op_type template arg.
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_x_frm): New intrinsic function def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct narrow_alu_frm_def): New shape function for frm.
(SHAPE): New declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-ncvt-x.c: New test.

(cherry picked from commit 3d18a528bfd05f0bfdb27f52c2f6c2445f15e4ca)

RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]

void foo(_Float16 y, int64_t *i64p)
{
  vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
  vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}

zve64f:
foo:
vsetivli zero,1,e16,mf4,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0
vsetvli zero,zero,e64,m1,ta,ma
vadd.vv v1,v1,v1

zve64d:
foo:
vsetivli zero,1,e64,m1,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0
vadd.vv v1,v1,v1

gcc/ChangeLog:

PR target/111037
* config/riscv/riscv-vsetvl.cc (float_insn_valid_sew_p): New function.
(second_sew_less_than_first_sew_p): Fix bug.
(first_sew_less_than_second_sew_p): Ditto.

gcc/testsuite/ChangeLog:

PR target/111037
* gcc.target/riscv/rvv/base/pr111037-1.c: New test.
* gcc.target/riscv/rvv/base/pr111037-2.c: New test.

(cherry picked from commit 29547511f7bae06f9f424f8c8583014878240016)

[PATCH] RISC-V: Support simplify (-1-x) for vector.

From: Yanzhang Wang <yanzhang.wang@intel.com>

The pattern is enabled for scalar but not for vector. This patch makes
them consistent and converts the code below,

shortcut_for_riscv_vrsub_case_1_32:
        vl1re32.v       v1,0(a1)
        vsetvli zero,a2,e32,m1,ta,ma
        vrsub.vi        v1,v1,-1
        vs1r.v  v1,0(a0)
        ret

to,

shortcut_for_riscv_vrsub_case_1_32:
        vl1re32.v       v1,0(a1)
        vsetvli zero,a2,e32,m1,ta,ma
        vnot.v  v1,v1
        vs1r.v  v1,0(a0)
        ret
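
The transform rests on the two's complement identity -1 - x == ~x. A
scalar C sketch of that identity follows; the function names are
hypothetical and unsigned arithmetic is used so that wraparound is well
defined.

```c
#include <stdint.h>

/* -1 - x, computed in unsigned 32-bit arithmetic (what vrsub.vi with
   an immediate of -1 computes per element).  */
uint32_t
minus_one_minus_x (uint32_t x)
{
  return (uint32_t) -1 - x;
}

/* Bitwise NOT (what vnot.v computes per element).  */
uint32_t
bitwise_not (uint32_t x)
{
  return ~x;
}
```

Both functions return the same value for every input, which is why the
reverse subtract can be replaced by a bitwise NOT.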

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Use
CONSTM1_RTX.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/simplify-vrsub.c: New test.

(cherry picked from commit e7a36e4715c7162ccfd7cd32da985d629bbd9c61)

RISC-V: Implement vector "average" autovec pattern.

This patch adds vector average patterns

op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1;

If there is no direct support, the vectorizer can synthesize the pattern
but, presumably, due to lack of narrowing operation support, won't try a
narrowing shift. Therefore, this patch implements the expanders
instead.
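
For reference, a scalar C model of the two patterns for unsigned 8-bit
elements could look like the sketch below. The function names are
hypothetical; the point is only that the sum is formed in a wider type
before the shift, so it cannot overflow the narrow type.

```c
#include <stdint.h>

/* Floor average: (narrow) (((wide) a + (wide) b) >> 1).  */
uint8_t
avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b) >> 1);
}

/* Ceil average: (narrow) (((wide) a + (wide) b + 1) >> 1).  */
uint8_t
avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b + 1) >> 1);
}
```

A loop applying either function element-wise over two arrays is the kind
of code these expanders are meant to cover.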

gcc/ChangeLog:

* config/riscv/autovec.md (<u>avg<v_double_trunc>3_floor):
Implement expander.
(<u>avg<v_double_trunc>3_ceil): Ditto.
* config/riscv/vector-iterators.md (ashiftrt): New iterator.
(ASHIFTRT): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/vec-avg-run.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-template.h: New test.

(cherry picked from commit 694242930906d9f7ad15977cac6dcbeae1f3d3f2)

internal-fn: Fix vector extraction into promoted subreg.

This patch fixes the case where vec_extract gets passed a promoted
subreg (e.g. from a return value). This is achieved by using
expand_convert_optab_fn instead of a separate expander function.

gcc/ChangeLog:

* internal-fn.cc (vec_extract_direct): Change type argument
numbers.
(expand_vec_extract_optab_fn): Call convert_optab_fn.
(direct_vec_extract_optab_supported_p): Use
convert_optab_supported_p.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: New test.

(cherry picked from commit c94e0f52f40310b6faeae11bae3366ccb1435199)

RISC-V: Support RVV VFWCVT.XU.F.V rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFWCVT.XU.F.V, as in the samples below.

* __riscv_vfwcvt_xu_f_v_u64m2_rm
* __riscv_vfwcvt_xu_f_v_u64m2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwcvt_xu_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wcvt-xu.c: New test.

(cherry picked from commit 1b7418ba1baf0d43fff6c6a68b8134813a35c1d9)

RISC-V: Fix one build error for template default arg

With some build option combinations, the default value may result in
the error below. This patch fixes it by passing an explicit
argument.

riscv-vector-builtins-bases.cc:2495:24: error: invalid use of template-name \
‘riscv_vector::vfcvt_f’ without an argument list

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Use explicit argument.

(cherry picked from commit ac6b74e9a5a40c28aeb715f43d117a7c4d32f43f)

RISC-V: Support RVV VFWCVT.X.F.V rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFWCVT.X.F.V, as in the samples below.

* __riscv_vfwcvt_x_f_v_i64m2_rm
* __riscv_vfwcvt_x_f_v_i64m2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwcvt_x_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wcvt-x.c: New test.

(cherry picked from commit f2bec0ac481fb97fc88b976d8255cc85bff7e20e)

RISC-V: Support RVV VFCVT.F.X.V and VFCVT.F.XU.V rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFCVT.F.X.V and VFCVT.F.XU.V, as in the samples below.

* __riscv_vfcvt_f_x_v_f32m1_rm
* __riscv_vfcvt_f_x_v_f32m1_rm_m
* __riscv_vfcvt_f_xu_v_f32m1_rm
* __riscv_vfcvt_f_xu_v_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_f_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-cvt-f.c: New test.

(cherry picked from commit dc2529e8243859faf35c66d994756c40978f0ce5)

RISC-V: Support RVV VFCVT.XU.F.V rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFCVT.XU.F.V, as in the samples below.

* __riscv_vfcvt_xu_f_v_u32m1_rm
* __riscv_vfcvt_xu_f_v_u32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_xu_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-cvt-xu.c: New test.

(cherry picked from commit 567258f057913229084c21396b84c219f3fef05d)

RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

This patch allows us to auto-vectorize the following case:

  void __attribute__ ((noinline, noclone))                                     \
  NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src,                  \
    MASKTYPE *__restrict cond, intptr_t n)                             \
  {                                                                            \
    for (intptr_t i = 0; i < n; ++i)                                           \
      if (cond[i])                                                             \
        dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]                \
           + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]          \
           + src[i * 8 + 6] + src[i * 8 + 7]);                         \
  }

  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t)                               \

  TEST2 (NAME##_i32, OUTTYPE, int32_t)                                         \

  TEST1 (NAME##_i32, int32_t)                                                  \

TEST (test)

ASM:

test_i32_i32_f32_8:
ble a3,zero,.L5
.L3:
vsetvli a4,a3,e8,mf4,ta,ma
vle32.v v0,0(a2)
vsetvli a5,zero,e32,m1,ta,ma
vmsne.vi v0,v0,0
vsetvli zero,a4,e32,m1,ta,ma
vlseg8e32.v v8,(a1),v0.t
vsetvli a5,zero,e32,m1,ta,ma
slli a6,a4,2
vadd.vv v1,v9,v8
slli a7,a4,5
vadd.vv v1,v1,v10
sub a3,a3,a4
vadd.vv v1,v1,v11
vadd.vv v1,v1,v12
vadd.vv v1,v1,v13
vadd.vv v1,v1,v14
vadd.vv v1,v1,v15
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v1,0(a0),v0.t
add a2,a2,a6
add a1,a1,a7
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
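
For readers who want to try the loop without the test macros, a concrete
instance can be written out as below. This is a sketch under assumptions:
the function name is hypothetical and int32_t is assumed for the output,
input and mask element types. In the ASM above, the eight-field read
appears to correspond to the masked segment load vlseg8e32.v.

```c
#include <stdint.h>

/* Hypothetical concrete instance of the macro-generated loop, assuming
   int32_t for the output, input and mask element types.  */
void
sum8_cond_i32 (int32_t *restrict dest, const int32_t *restrict src,
               const int32_t *restrict cond, intptr_t n)
{
  for (intptr_t i = 0; i < n; ++i)
    if (cond[i])
      /* Eight consecutive fields of "record" i: a stride-8 access.  */
      dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]
                 + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]
                 + src[i * 8 + 6] + src[i * 8 + 7]);
}
```

Elements of dest whose cond entry is zero are left untouched, which is
why both the load and the store in the vectorized code are masked.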

gcc/ChangeLog:

* config/riscv/autovec.md (vec_mask_len_load_lanes<mode><vsingle>):
New pattern.
(vec_mask_len_store_lanes<mode><vsingle>): Ditto.
* config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
* config/riscv/riscv-v.cc (get_mask_mode): Add tuple mask mode.
(expand_lanes_load_store): New function.
* config/riscv/vector-iterators.md: New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add lanes tests.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-10.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-11.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-12.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-13.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-14.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-15.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-16.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-17.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-18.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-8.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-9.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-15.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-16.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-17.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-18.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c: New test.

(cherry picked from commit fe5788862ba8d5ac4551658d842f2d038bd8d363)

RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

This patch adds support for the rounding mode API of
VFCVT.X.F.V, as in the samples below.

* __riscv_vfcvt_x_f_v_i32m1_rm
* __riscv_vfcvt_x_f_v_i32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(enum frm_op_type): New type for frm.
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_x_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-cvt-x.c: New test.

(cherry picked from commit c6f65ce9483131b1996cbddf8aaaebe0d8e5141c)

RISC-V: Fix autovec_length_operand predicate[PR110989]

Currently, an incorrect configuration of the autovec_length_operand
predicate is exposed by PR110989 in the following situation:

vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0); ---> dummy length = VF.

The current autovec_length_operand predicate fails to recognize the VF dummy length.

-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=scalable -Ofast -fno-schedule-insns -fno-schedule-insns2:

Before this patch:

srli a4,s0,2
addi a4,a4,-3
srli s0,s0,3
vsetvli a5,zero,e64,m1,ta,ma
vid.v v1
vmul.vx v1,v1,a4
addi a4,s0,-2
vadd.vx v1,v1,a4
addi a4,s0,-1
vslide1up.vx v2,v1,a4
vmv.v.x v1,a4
vand.vv v1,v2,v1
vl1re64.v v3,0(t2)
vrgather.vv v2,v3,v1
vmv.v.i v1,0
vmfeq.vv v0,v2,v1
vsetvli zero,s0,e32,mf2,ta,ma ---> s0 = POLY (2,2)
vle32.v v3,0(t3),v0.t
vsetvli a5,zero,e64,m1,ta,ma
vmfne.vv v0,v2,v1
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.f.x.v v1,v3
vsetvli zero,zero,e64,m1,ta,ma
vmerge.vvm v1,v1,v2,v0
vslidedown.vx v1,v1,a4
vfmv.f.s fa5,v1
j .L6

After this patch:

srli a4,s0,2
addi a4,a4,-3
srli s0,s0,3
vsetvli a5,zero,e64,m1,ta,ma
vid.v v1
vmul.vx v1,v1,a4
addi a4,s0,-2
vadd.vx v1,v1,a4
addi s0,s0,-1
vslide1up.vx v2,v1,s0
vmv.v.x v1,s0
vand.vv v1,v2,v1
vl1re64.v v3,0(t2)
vrgather.vv v2,v3,v1
vmv.v.i v1,0
vmfeq.vv v0,v2,v1
vle32.v v3,0(t3),v0.t
vmfne.vv v0,v2,v1
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.f.x.v v1,v3
vsetvli zero,zero,e64,m1,ta,ma
vmerge.vvm v1,v1,v2,v0
vslidedown.vx v1,v1,s0
vfmv.f.s fa5,v1
j .L6

Two vsetvli insns are eliminated.

gcc/ChangeLog:

PR target/110989
* config/riscv/predicates.md: Fix predicate.

gcc/testsuite/ChangeLog:

PR target/110989
* gcc.target/riscv/rvv/autovec/pr110989.c: Add vsetvli assembly check.

(cherry picked from commit 0618adfa80fcd2fd7ae03b30553c60a6b1abf573)

Mode-Switching: Fix SET_SRC ICE for create_pre_exit

In some cases, like gcc/testsuite/gcc.dg/pr78148.c on RISC-V, there will
be only 1 operand when SET_SRC is applied in create_pre_exit. For example:

(insn 13 9 14 2 (clobber (reg/i:TI 10 a0)) "gcc/testsuite/gcc.dg/pr78148.c":24:1 -1
(expr_list:REG_UNUSED (reg/i:TI 10 a0)
(nil)))

Unfortunately, SET_SRC requires at least two operands, so this results in a
segmentation fault. The SH4-specific part, where the segmentation fault
occurs, appears to be valid only when return_copy_pat is a load or something
similar. This patch therefore fixes the issue by applying SET_SRC only to SET insns.
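
The guard can be illustrated with a toy model. This is only a sketch: toy_rtx, toy_code and toy_set_src are hypothetical names, not GCC's rtx representation or its SET_SRC macro. The point it shows is that the code of the pattern is checked before the second operand is read.

```c
#include <stddef.h>

/* Toy stand-in for an RTL expression (hypothetical, not GCC's rtx).  */
enum toy_code { TOY_REG, TOY_SET, TOY_CLOBBER };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *ops[2]; /* ops[1] is meaningful only for TOY_SET.  */
  int regno;                    /* Used only by TOY_REG.  */
};

/* Return the source operand of PAT, or NULL when PAT is not a SET.
   A CLOBBER has a single operand, so its ops[1] must never be read.  */
const struct toy_rtx *
toy_set_src (const struct toy_rtx *pat)
{
  if (pat == NULL || pat->code != TOY_SET)
    return NULL;
  return pat->ops[1];
}
```

With this model, a clobber of a0 like the insn shown above yields NULL instead of an out-of-range operand access.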

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* mode-switching.cc (create_pre_exit): Add SET insn check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mode-switch-ice-1.c: New test.

(cherry picked from commit d5ef0ee307058c5efade84e45228a7576c0141c7)

RISC-V: Support RVV VFREC7 rounding mode intrinsic API

Update in v2:

1. Remove the template of vfrec7 frm class.
2. Update the vfrec7_frm_obj declaration.

Original logs:

This patch adds support for the rounding mode API for
VFREC7, as in the samples below.

* __riscv_vfrec7_v_f32m1_rm
* __riscv_vfrec7_v_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfrec7_frm): New class for frm.
(vfrec7_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfrec7_frm): New intrinsic function definition.
* config/riscv/vector-iterators.md
(VFMISC): Remove VFREC7.
(misc_op): Ditto.
(float_insn_type): Ditto.
(VFMISC_FRM): New int iterator.
(misc_frm_op): New op for frm.
(float_frm_insn_type): New type for frm.
* config/riscv/vector.md (@pred_<misc_frm_op><mode>):
New pattern for misc frm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-rec7.c: New test.

(cherry picked from commit 469711f06865979854e587263d3d43137f256b57)

RISC-V: Add ZC* test for failed march args being passed.

Add tests for invalid -march arguments involving ZC* extensions.

Co-Authored by: Nandni Jamnadas <nandni.jamnadas@embecosm.com>
Co-Authored by: Jiawei <jiawei@iscas.ac.cn>
Co-Authored by: Mary Bennett <mary.bennett@embecosm.com>
Co-Authored by: Simon Cook <simon.cook@embecosm.com>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-24.c: New test.
* gcc.target/riscv/arch-25.c: New test.

(cherry picked from commit 7879f589af911ea6a910d08919014b0b2df1b4b1)

RISC-V: Enable compressible features when use ZC* extensions.

This patch enables the compressible features with ZC* extensions.

Since all ZC* extensions depend on the Zca extension, it is sufficient to add
only the Zca target to extend the RVC target.

Co-Authored by: Mary Bennett <mary.bennett@embecosm.com>
Co-Authored by: Nandni Jamnadas <nandni.jamnadas@embecosm.com>
Co-Authored by: Simon Cook <simon.cook@embecosm.com>

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Enable compressed builtins when ZC* extensions enabled.
* config/riscv/riscv-shorten-memrefs.cc:
Enable shorten_memrefs pass when ZC* extensions enabled.
* config/riscv/riscv.cc (riscv_compressed_reg_p):
Enable compressible registers when ZC* extensions enabled.
(riscv_rtx_costs): Allow adjusting rtx costs when ZC* extensions enabled.
(riscv_address_cost): Allow adjusting address cost when ZC* extensions enabled.
(riscv_first_stack_step): Allow compression of the register saves
without adding extra instructions.
* config/riscv/riscv.h (FUNCTION_BOUNDARY): Adjusts function boundary
to 16 bits when ZC* extensions enabled.

(cherry picked from commit 6e46fcdf24f99ce1272305aac93cac51d45c04d6)

RISC-V: Minimal support for ZC* extensions.

This patch adds minimal support for the ZC* extensions, including the extension
names, masks and target definitions. It also defines the dependencies involving
the Zca and Zce extensions. Note that all ZC* extensions depend on the Zca
extension. Zce includes all ZC* extensions relevant to microcontrollers. Zce
implies Zcf when the 'f' extension is enabled on rv32.
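
The implication rules can be sketched as a small resolver. This is an illustration only, not the code in riscv-common.cc: zc_config and zc_resolve are hypothetical names, and the set of extensions that Zce pulls in (Zcb, Zcmp, Zcmt, plus Zcf on rv32 with 'f') is taken from the ratified Zc* specification rather than from this commit message.

```c
#include <stdbool.h>

/* Hypothetical flat view of the relevant extension flags.  */
struct zc_config
{
  bool rv32; /* XLEN is 32.  */
  bool f;    /* 'f' extension enabled.  */
  bool zca, zcb, zcf, zcd, zcmp, zcmt, zce;
};

/* Apply the implications: first expand Zce, then make every
   ZC* extension pull in Zca.  */
void
zc_resolve (struct zc_config *c)
{
  if (c->zce)
    {
      c->zcb = c->zcmp = c->zcmt = true;
      if (c->rv32 && c->f)
        c->zcf = true;
    }
  if (c->zcb || c->zcf || c->zcd || c->zcmp || c->zcmt || c->zce)
    c->zca = true;
}
```

Expanding Zce before checking for Zca keeps the resolver to a single pass.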

Co-Authored by: Charlie Keaney <charlie.keaney@embecosm.com>
Co-Authored by: Mary Bennett <mary.bennett@embecosm.com>
Co-Authored by: Nandni Jamnadas <nandni.jamnadas@embecosm.com>
Co-Authored by: Simon Cook <simon.cook@embecosm.com>
Co-Authored by: Sinan Lin <sinan.lin@linux.alibaba.com>
Co-Authored by: Shihua Liao <shihua@iscas.ac.cn>
Co-Authored by: Yulong Shi <yulong@iscas.ac.cn>

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_subset_list::parse): New extensions.
* config/riscv/riscv-opts.h (MASK_ZCA): New mask.
(MASK_ZCB): Ditto.
(MASK_ZCE): Ditto.
(MASK_ZCF): Ditto.
(MASK_ZCD): Ditto.
(MASK_ZCMP): Ditto.
(MASK_ZCMT): Ditto.
(TARGET_ZCA): New target.
(TARGET_ZCB): Ditto.
(TARGET_ZCE): Ditto.
(TARGET_ZCF): Ditto.
(TARGET_ZCD): Ditto.
(TARGET_ZCMP): Ditto.
(TARGET_ZCMT): Ditto.
* config/riscv/riscv.opt: New target variable.

(cherry picked from commit 17c22f466162d3a1759f8c607b7e81e7dd631cd9)

Revert "Fix type error of 'switch (SUBREG_BYTE (op)).'"

This reverts commit 6c6f96040a13e3403a418803cd9f539701c4c00e.

(cherry picked from commit 9ec5d6de7355c15b3811150d1581dab5bd489966)

RISC-V: Support RVV VFSQRT rounding mode intrinsic API

This patch adds support for the rounding mode API for
VFSQRT, as in the samples below.

* __riscv_vfsqrt_v_f32m1_rm
* __riscv_vfsqrt_v_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class unop_frm): New class for frm.
(vfsqrt_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfsqrt_frm): New intrinsic function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-sqrt.c: New test.

(cherry picked from commit 9be93b80585fc875a2fc6b7d490b640b7fe04365)

RISC-V: Support RVV VFWNMSAC rounding mode intrinsic API

This patch adds support for the rounding mode API for
VFWNMSAC, as in the samples below.

* __riscv_vfwnmsac_vv_f64m2_rm
* __riscv_vfwnmsac_vv_f64m2_rm_m
* __riscv_vfwnmsac_vf_f64m2_rm
* __riscv_vfwnmsac_vf_f64m2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfwnmsac_frm): New class for frm.
(vfwnmsac_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwnmsac_frm): New intrinsic function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wnmsac.c: New test.

(cherry picked from commit c944ded09595946290778a26794074e69cc65f3e)

RISC-V: Support RVV VFWMSAC rounding mode intrinsic API

This patch adds support for the rounding mode API for
VFWMSAC, as in the samples below.

* __riscv_vfwmsac_vv_f64m2_rm
* __riscv_vfwmsac_vv_f64m2_rm_m
* __riscv_vfwmsac_vf_f64m2_rm
* __riscv_vfwmsac_vf_f64m2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfwmsac_frm): New class for frm.
(vfwmsac_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwmsac_frm): New intrinsic function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wmsac.c: New test.

(cherry picked from commit d9577b4b4c2a7b4e8bc869d33b7de98a0e507e7c)

RISC-V: Support RVV VFWNMACC rounding mode intrinsic API

This patch adds support for the rounding mode API for
VFWNMACC, as in the samples below.

* __riscv_vfwnmacc_vv_f64m2_rm
* __riscv_vfwnmacc_vv_f64m2_rm_m
* __riscv_vfwnmacc_vf_f64m2_rm
* __riscv_vfwnmacc_vf_f64m2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfwnmacc_frm): New class for frm.
(vfwnmacc_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwnmacc_frm): New intrinsic function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wnmacc.c: New test.

(cherry picked from commit a66873593817f72bbccd86f41128dc5ae404e8b9)

RISC-V: Support RVV VFWMACC rounding mode intrinsic API

This patch adds support for the rounding mode API for
VFWMACC, as in the samples below.

* __riscv_vfwmacc_vv_f64m2_rm
* __riscv_vfwmacc_vv_f64m2_rm_m
* __riscv_vfwmacc_vf_f64m2_rm
* __riscv_vfwmacc_vf_f64m2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfwmacc_frm): New class for vfwmacc frm.
(vfwmacc_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwmacc_frm): Function definition for vfwmacc.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_widen_ternop_insn): Add frm support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-fwmacc.c: New test.

(cherry picked from commit d15840aa05bc16580b2c79b356012974928e07a4)

RISC-V: Support RVV VFNMSUB rounding mode intrinsic API

This patch adds support for the rounding mode API for
VFNMSUB, as in the samples below.

* __riscv_vfnmsub_vv_f32m1_rm
* __riscv_vfnmsub_vv_f32m1_rm_m
* __riscv_vfnmsub_vf_f32m1_rm
* __riscv_vfnmsub_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfnmsub_frm): New class for vfnmsub frm.
(vfnmsub_frm): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfnmsub_frm): New function declaration.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-nmsub.c: New test.

(cherry picked from commit 4ecc18554bbf789174efe4c9e0be40182898a8ce)

RISC-V: Add TARGET_VECTOR check into VLS modes

This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110994

The bug is caused by VLS modes producing incorrect code in register allocation.

The original test case that triggers the ICE is Fortran code, but it can be
reproduced with C code.

gcc/ChangeLog:

PR target/110994
* config/riscv/riscv-opts.h (TARGET_VECTOR_VLS): Add TARGET_VECTOR.

gcc/testsuite/ChangeLog:

PR target/110994
* gcc.target/riscv/rvv/autovec/vls/pr110994.c: New test.

(cherry picked from commit 9890f377013cf1e4f5b9fab8a7287a5380dade1f)

RISC-V: Fix vec_series expander[PR110985]

This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110985

gcc/ChangeLog:
PR target/110985
* config/riscv/riscv-v.cc (expand_vec_series): Refactor the expander.

gcc/testsuite/ChangeLog:
PR target/110985
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c: New test.

(cherry picked from commit 685abdb4a1fe46a12da5cc9ae1d5aaef9344a339)

RISC-V: Allow CONST_VECTOR for VLS modes

This patch enables CONST_VECTOR for VLS modes.

void foo1 (int * __restrict a)
{
    for (int i = 0; i < 16; i++)
      a[i] = 8;
}

void foo2 (int * __restrict a)
{
    for (int i = 0; i < 16; i++)
      a[i] = i;
}

Compile option: -O3 --param=riscv-autovec-preference=scalable

Before this patch:

foo1:
        lui     a5,%hi(.LC0)
        addi    a5,a5,%lo(.LC0)
        vsetivli        zero,4,e32,m1,ta,ma
        addi    a4,a0,16
        vle32.v v1,0(a5)
        vse32.v v1,0(a0)
        vse32.v v1,0(a4)
        addi    a4,a0,32
        vse32.v v1,0(a4)
        addi    a0,a0,48
        vse32.v v1,0(a0)
        ret
foo2:
        lui     a5,%hi(.LC1)
        addi    a5,a5,%lo(.LC1)
        vsetivli        zero,4,e32,m1,ta,ma
        vle32.v v1,0(a5)
        lui     a5,%hi(.LC2)
        addi    a5,a5,%lo(.LC2)
        vse32.v v1,0(a0)
        vle32.v v1,0(a5)
        lui     a5,%hi(.LC3)
        addi    a4,a0,16
        addi    a5,a5,%lo(.LC3)
        vse32.v v1,0(a4)
        vle32.v v1,0(a5)
        addi    a4,a0,32
        lui     a5,%hi(.LC4)
        vse32.v v1,0(a4)
        addi    a0,a0,48
        addi    a5,a5,%lo(.LC4)
        vle32.v v1,0(a5)
        vse32.v v1,0(a0)
        ret

After this patch:

foo1:
vsetivli zero,16,e32,mf2,ta,ma
vmv.v.i v1,8
vse32.v v1,0(a0)
ret
.size foo1, .-foo1
.align 1
.globl foo2
.type foo2, @function
foo2:
vsetivli zero,16,e32,mf2,ta,ma
vid.v v1
vse32.v v1,0(a0)
ret

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS CONST_VECTOR.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS CONST_VECTOR tests.
* gcc.target/riscv/rvv/autovec/vls/const-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/const-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/const-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/const-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/const-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/series-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/series-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/series-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/series-4.c: New test.

(cherry picked from commit e80c55e4ca68867ddb3cb3720f857bd22762768c)

VECT: Fix ICE on MASK_LEN_{LOAD, STORE} when no LEN recorded[PR110989]

This ICE is caused by the following situation:

mask__49.21_99 = vect__17.19_96 == { 0.0, ... };
...
vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0);

The MASK_LEN_LOAD uses a real MASK, produced by the EQ comparison, whereas the LEN
is a dummy LEN equal to the vectorization factor.

In this situation, we do not enter 'vect_record_loop_len' since there is no LEN loop control.
As a result, 'LOOP_VINFO_RGROUP_IV_TYPE' is not a suitable type for the 'build_int_cst' call that
produces the LEN argument of 'MASK_LEN_LOAD', so sizetype is used instead, which matches the
RVV length requirement.
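
The role of the dummy LEN can be seen in a scalar model of the operation. This is a sketch under assumptions: model_mask_len_load is a hypothetical name, the element type is fixed to int32_t, and inactive lanes (undefined in the real operation) are simply left untouched. Lane i is loaded only when i < len + bias and mask[i] is set, so a LEN equal to the vectorization factor leaves the mask as the only effective control.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked, length-controlled load (hypothetical helper).
   Lanes outside [0, len + bias) or with a clear mask bit are not written.  */
void
model_mask_len_load (int32_t *dst, const int32_t *src,
                     const unsigned char *mask, size_t len, ptrdiff_t bias)
{
  size_t active = (size_t) ((ptrdiff_t) len + bias);
  for (size_t i = 0; i < active; i++)
    if (mask[i])
      dst[i] = src[i];
}
```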

gcc/ChangeLog:
PR middle-end/110989
* tree-vect-stmts.cc (vectorizable_store): Replace iv_type with sizetype.
(vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:
PR middle-end/110989
* gcc.target/riscv/rvv/autovec/pr110989.c: New test.

(cherry picked from commit 5bfb5e772f6cf121563f08d27d2c652ea469bbfb)

RISC-V: Specify -mabi for ztso testcases

On rv32 targets, this patch fixes errors in the ztso testcases such as:
cc1: error: ABI requires '-march=rv32'

2023-08-11 Patrick O'Neill <patrick@rivosinc.com>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add -mabi=lp64d
to dg-options.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
(cherry picked from commit 2b1b804de2687c9f705cc3625e467dfa18723a45)

RISC-V: Support RVV VFMSUB rounding mode intrinsic API

This patch adds support for the rounding mode API for
VFMSUB, as in the samples below.

* __riscv_vfmsub_vv_f32m1_rm
* __riscv_vfmsub_vv_f32m1_rm_m
* __riscv_vfmsub_vf_f32m1_rm
* __riscv_vfmsub_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfmsub_frm): New class for vfmsub frm.
(vfmsub_frm): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmsub_frm): New function declaration.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-msub.c: New test.

(cherry picked from commit 6a8203b7dc0ab694c3f3f4aef503107975bb59aa)

VECT: Add vec_mask_len_{load_lanes,store_lanes} patterns

This patch adds the vec_mask_len_{load_lanes,store_lanes} autovectorization patterns.

We want to support the following autovectorization:

void
foo (int8_t *__restrict a,
     int8_t *__restrict b,
     int8_t *__restrict cond,
     int n)
{
  for (intptr_t i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i] = b[i * 2] + b[i * 2 + 1];
    }
}

ARM SVE IR:

https://godbolt.org/z/cro1Eqc6a

  # loop_mask_60 = PHI <next_mask_82(4), max_mask_81(3)>
  ...
  mask__39.12_63 = vect__3.11_61 != { 0, ... };
  vec_mask_and_66 = loop_mask_60 & mask__39.12_63;
  ...
  vect_array.15 = .MASK_LOAD_LANES (_57, 8B, vec_mask_and_66);
  ...

For RVV, we would like to see IR:

  loop_len = SELECT_VL;
  ...
  mask__39.12_63 = vect__3.11_61 != { 0, ... };
  ...
  vect_array.15 = .MASK_LEN_LOAD_LANES (_57, 8B, mask__39.12_63, loop_len, bias);
  ...
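
The intended RVV IR can be modelled in scalar C. This is a sketch under assumptions: model_foo and MODEL_VF are hypothetical, SELECT_VL is approximated by min (remaining, MODEL_VF) although the real operation may pick a smaller value, and the bias is taken to be 0. Each iteration de-interleaves the two lanes of b for the active elements, then performs the masked add and store.

```c
#include <stddef.h>
#include <stdint.h>

enum { MODEL_VF = 4 }; /* Hypothetical fixed vector length for the model.  */

void
model_foo (int8_t *a, const int8_t *b, const int8_t *cond, int n)
{
  for (int i = 0; i < n;)
    {
      /* loop_len = SELECT_VL, modelled as min (remaining, MODEL_VF).  */
      int loop_len = n - i < MODEL_VF ? n - i : MODEL_VF;
      int8_t lane0[MODEL_VF], lane1[MODEL_VF];

      /* Model of .MASK_LEN_LOAD_LANES: split b[2k] and b[2k + 1] into
         two lanes, only for elements below loop_len with cond set.  */
      for (int k = 0; k < loop_len; k++)
        if (cond[i + k])
          {
            lane0[k] = b[2 * (i + k)];
            lane1[k] = b[2 * (i + k) + 1];
          }

      /* Masked, length-controlled add and store.  */
      for (int k = 0; k < loop_len; k++)
        if (cond[i + k])
          a[i + k] = (int8_t) (lane0[k] + lane1[k]);

      i += loop_len;
    }
}
```

Unlike the SVE form, no loop mask is combined with the condition mask; the length operand takes that role.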

Bootstrap and regression tests on x86 passed.

OK for trunk?

gcc/ChangeLog:

* doc/md.texi: Add vec_mask_len_{load_lanes,store_lanes} patterns.
* internal-fn.cc (expand_partial_load_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
* internal-fn.def (MASK_LEN_LOAD_LANES): Ditto.
(MASK_LEN_STORE_LANES): Ditto.
* optabs.def (OPTAB_CD): Ditto.

(cherry picked from commit 59d789b34810d43ddba734e4adb80c29c210e49c)