gcc.gnu.org Git - gcc.git/log

RISC-V: Fix VSETVL PASS AVL/VL fetch bug[111295]

Fix bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111295

gcc/ChangeLog:

PR target/111295
* config/riscv/riscv-vsetvl.cc (insert_vsetvl): Bug fix.

gcc/testsuite/ChangeLog:

PR target/111295
* gcc.target/riscv/rvv/autovec/pr111295.c: New test.

(cherry picked from commit 1b4c70d4271a00514ae20970d483c3b78d9d66ef)

RISC-V: Remove unreasonable TARGET_64BIT for VLS modes with size = 64bit

Previously, I added a TARGET_64BIT condition to block VLS modes with size = 64 bits
(e.g. V8QI) on RV32 systems, since I realized such modes may cause inferior codegen
in some situations on RV32.

However, this is really quite ugly and it causes an ICE in some cases on RV32:

FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (test for excess errors)

The inferior codegen on RV32 systems should be addressed with a different, more reasonable approach.

Remove those TARGET_64BIT conditions and fix the ICE.

gcc/ChangeLog:

* config/riscv/riscv-vector-switch.def (VLS_ENTRY): Remove TARGET_64BIT

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl1024b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl2048b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl256b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl4096b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl512b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl1024b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl2048b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl256b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl4096b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl512b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: Ditto.

(cherry picked from commit ee21f79f72980732214156bae2eb5daf7e089bda)

RISC-V: Fix incorrect folder for VRGATHERI16 test case

The test file was put in the wrong folder; this patch moves it to the
correct one.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/intrisinc-vrgatherei16.c: Moved to...
* gcc.target/riscv/rvv/base/intrisinc-vrgatherei16.c: ...here.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit 0574a19047fa66f26a38e79c1b9ae6a8207bba89)

riscv: xtheadbb: Fix xtheadbb-li-rotr test for rv32

The test was introduced recently and tests an RV64-only feature.
However, when testing an RV32 compiler, the test gets executed as well
and fails with "cc1: error: ABI requires '-march=rv32'".
This patch fixes this by adding '-mabi=lp64' (like it is done for
other RV64-only tests as well).

Retested with RV32 and RV64 to ensure this won't pop up again.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: Don't run for RV32.

(cherry picked from commit 57d1c9c1fe57a0de66e5c20538f77f49b1298071)

RISC-V: Keep vlmax vector operators in simple form until split1 pass

This patch keeps vlmax vector patterns in simple form until the split1 pass,
which allows more optimization (e.g. combine) before split1.
This patch changes the vlmax patterns in autovec.md to define_insn_and_split
as much as possible and cleans up some combine patterns that are no longer needed.
This patch also fixes PR111232, which was caused by a combine failure.

PR target/111232

gcc/ChangeLog:

* config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>):
Delete.
(*pred_widen_mulsu<mode>): Delete.
(*pred_single_widen_mul<mode>): Delete.
(*dual_widen_<any_widen_binop:optab><any_extend:su><mode>):
Add new combine patterns.
(*single_widen_sub<any_extend:su><mode>): Ditto.
(*single_widen_add<any_extend:su><mode>): Ditto.
(*single_widen_mult<any_extend:su><mode>): Ditto.
(*dual_widen_mulsu<mode>): Ditto.
(*dual_widen_mulus<mode>): Ditto.
(*dual_widen_<optab><mode>): Ditto.
(*single_widen_add<mode>): Ditto.
(*single_widen_sub<mode>): Ditto.
(*single_widen_mult<mode>): Ditto.
* config/riscv/autovec.md (<optab><mode>3):
Change define_expand to define_insn_and_split.
(<optab><mode>2): Ditto.
(abs<mode>2): Ditto.
(smul<mode>3_highpart): Ditto.
(umul<mode>3_highpart): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-4.c: Add more testcases.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111232.c: New test.

(cherry picked from commit 9ee40b9a7bee83394fc7ba6fef71cb76d91b49c8)

RISC-V: Part-3: Output .variant_cc directive for vector function

Functions that follow the vector calling convention variant need to be annotated with
the .variant_cc directive according to the RISC-V Assembly Programmer's Manual[1] and the
RISC-V ELF Specification[2].

[1] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#dynamic-linking

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_declare_function_name): Add protos.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.cc (riscv_asm_output_variant_cc):
Output .variant_cc directive for vector function.
(riscv_declare_function_name): Ditto.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.h (ASM_DECLARE_FUNCTION_NAME):
Implement ASM_DECLARE_FUNCTION_NAME.
(ASM_OUTPUT_DEF_FROM_DECLS): Implement ASM_OUTPUT_DEF_FROM_DECLS.
(ASM_OUTPUT_EXTERNAL): Implement ASM_OUTPUT_EXTERNAL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: New test.

(cherry picked from commit 4abcc5009c1ad852e235f368f535c0bf6bfa7697)

RISC-V: Part-2: Save/Restore vector registers which need to be preserved

Functions that follow the vector calling convention variant have callee-saved
vector registers, whereas functions that follow the standard calling convention
do not. We therefore need to know which convention the callee follows so that we
can tell GCC exactly which vector registers the callee will clobber. Like AArch64,
I encode the callee's calling convention information in the call RTX pattern.
The old operands 2 and 3 of the call pattern, which were copied from the MIPS
target, are unused according to my analysis and have been removed.

gcc/ChangeLog:

* config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): Pass riscv_cc.
* config/riscv/riscv.cc (struct riscv_frame_info): Add new fields.
(riscv_frame_info::reset): Reset new fields.
(riscv_call_tls_get_addr): Pass riscv_cc.
(riscv_function_arg): Return riscv_cc for call pattern.
(get_riscv_cc): New function return riscv_cc from rtl call_insn.
(riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
(riscv_save_reg_p): Add vector callee-saved check.
(riscv_stack_align): Add vector save area comment.
(riscv_compute_frame_info): Ditto.
(riscv_restore_reg): Update for type change.
(riscv_for_each_saved_v_reg): New function to save vector registers.
(riscv_first_stack_step): Handle function with vector callee-saved registers.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_output_mi_thunk): Pass riscv_cc.
(TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
* config/riscv/riscv.h (get_riscv_cc): Export get_riscv_cc function.
* config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.

(cherry picked from commit fdd59c0f73e9e681cd5f4d0eee2dd58d60d8dbe1)

RISC-V: Part-1: Select suitable vector registers for vector type args and returns

The vector register calling convention rules from the proposal[1] are quoted
directly here:

v0 is used to pass the first vector mask argument to a function, and to return
vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the rest vector mask arguments to a
function, and to return vector data and vector tuple results from a function.

Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL indicates the number of
vector registers in the vector register group and requires the first vector
register number in the vector register group must be a multiple of it. For
example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be
allocated to this type, but v9-v16 can not because the v9 register number is
not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a
vector mask type, its LMUL is 1.

Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.

The rules for passing vector arguments are as follows:

1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.

2. For vector data arguments or rest vector mask arguments, starting from the
v8 register, if a vector register group between v8-v23 that has not been
allocated can be found and the first register number is a multiple of LMUL,
then allocate this vector register group to the argument and mark these
registers as allocated. Otherwise, pass it by reference. The argument has now
been allocated.

3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive vector register groups between v8-v23 that have not been allocated
can be found and the first register number is a multiple of LMUL, then allocate
these vector register groups to the argument and mark these registers as
allocated. Otherwise, pass it by reference. The argument has now been allocated.

NOTE: It should be stressed that the search for the appropriate vector register
groups starts at v8 each time and does not start at the next register after the
registers are allocated for the previous vector argument. Therefore, it is
possible that the vector register number allocated to a vector argument can be
less than the vector register number allocated to previous vector arguments.
For example, for the function
`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
and v9 will be allocated to `c`. This approach allows more vector registers to
be allocated to arguments in some cases.

Vector values are returned in the same manner as the first named argument of
the same type would be passed.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389

gcc/ChangeLog:

* config/riscv/riscv-protos.h (builtin_type_p): New function for checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declaration.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function to check vector returns.
(riscv_arguments_is_vector_type_p): New function to check vector arguments.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: New test.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return.c: New test.

(cherry picked from commit 94a4b93292f8ab19910c844bb9b63e4a68b55d33)

RISC-V: Add conditional sqrt autovec pattern

This patch adds a combine pattern that merges vfsqrt.v and vcond_mask.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><mode>):
Add sqrt + vcond_mask combine pattern.
* config/riscv/autovec.md (<optab><mode>2):
Change define_expand to define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.

(cherry picked from commit c1597e7fb9f9ecb9d7c33b5afa48031f284375de)

RISC-V: typo: add closing paren to a comment

gcc/ChangeLog:

* config/riscv/zicond.md: Add closing parenthesis to a comment.

(cherry picked from commit 254100a9a003a16255a58eec3fa24168e6dc7124)

RISC-V: Fix Zicond ICE on large constants

Large constant cons and/or alt values trigger ICEs when building GCC target
libraries (libgomp and libatomic) with the 'Zicond' extension enabled.

For instance, zicond-ice-2.c (new test case in this commit) causes
an ICE when SOME_NUMBER is 0x1000 or larger.  While the negated values
corresponding to cons/alt (the two temp2 variables) are checked, cons/alt
themselves are not, which causes 2 ICEs when building
GCC target libraries as of this writing:

1.  gcc/libatomic/config/posix/lock.c
2.  gcc/libgomp/fortran.c

Coercing a large value into a register fixes the issue.

It also coerces a large cons into a register in the "imm, imm" case (the author
could not reproduce an ICE there, but one seems possible).

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Force
large constant cons/alt into a register.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ice-2.c: New test.  This is based on
an ICE at libat_lock_n func on gcc/libatomic/config/posix/lock.c
but heavily minimized.

(cherry picked from commit ce65641354d98fc80912d5516b7fea87c344c2cc)

riscv: Synthesize all 11-bit-rotate constants with rori

Some constants can be built up using LI+RORI instructions.
The current implementation requires at least one of the upper 32 bits
to be zero, which is not necessary.
Let's drop this requirement in order to be able to synthesize
a constant like 0xffffffff00ffffffL.

The tests for LI+RORI are made stricter to detect regressions
in the calculation of the LI constant and the rotation amount.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Don't
require one zero bit in the upper 32 bits for LI+RORI synthesis.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: New tests.
* gcc.target/riscv/zbb-li-rotr.c: Likewise.

(cherry picked from commit 102dd3e8067f12beee1b8b0bec6848733d107aee)

RISC-V: Expose bswapsi for TARGET_64BIT

Various bswapsi tests are failing for rv64.  More importantly, we're generating
crappy code.

Let's take the first test from bswapsi-1.c as an example.

> typedef unsigned int uint32_t;
>
> #define __const_swab32(x) ((uint32_t)(                                \
>         (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |            \
>         (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |            \
>         (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |            \
>         (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
>
> /* This byte swap implementation is used by the Linux kernel and the
>    GNU C library.  */
>
> uint32_t
> swap32_a (uint32_t in)
> {
>   return __const_swab32 (in);
> }
>
>
>

We generate this for rv64gc_zba_zbb_zbs:

>         srliw   a1,a0,24
>         slliw   a5,a0,24
>         slliw   a3,a0,8
>         li      a2,16711680
>         li      a4,65536
>         or      a5,a5,a1
>         and     a3,a3,a2
>         addi    a4,a4,-256
>         srliw   a0,a0,8
>         or      a5,a5,a3
>         and     a0,a0,a4
>         or      a0,a5,a0
>         ret

Urgh!

After this patch we generate:

>         rev8    a0,a0
>         srai    a0,a0,32
>         ret
Clearly better.

The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that the
RTL expanders already know how to widen a bswap, which is definitely true.  But
failing to expose bswapsi prevents the 32bit bswap optimizations in gimple store
merging from triggering.  Thus we get crappy code.

To fix this we expose bswapsi on TARGET_64BIT.  gimple-store-merging then
detects the 32bit bswap idioms and generates suitable __builtin calls.  The
expander will "FAIL" expansion for TARGET_64BIT which forces the generic
expander code to synthesize the operation (we could synthesize in here, but
that'd result in duplicate code).

Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the testsuite
without any regressions.

gcc/
* config/riscv/bitmanip.md (bswapsi2): Expose for TARGET_64BIT.

(cherry picked from commit fbc01748ba46eb26074388a8fb7b44d25a414a72)

RISC-V: Add Types to Un-Typed Risc-v Instructions

Update RISC-V instructions to ensure that no instruction is left
without a type attribute. Add new types "trap" and "cbo" (for
cache-related instructions).

Tested for regressions using rv32/64 multilib with newlib/linux and
rv32/64 gcv for linux.

gcc/Changelog:

* config/riscv/riscv.md: Update/Add types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit decbf9ec81f33052be12296b89cd86ea65ae10da)

RISC-V: Add Types to Un-Typed Pic Instructions

Updates pic instructions to ensure that no instruction is left
without a type attribute.

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/Changelog:

* config/riscv/pic.md: Update types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit c85db606d46774283ca4ec037dc3051719828f41)

riscv: xtheadbb: Enable constant synthesis with th.srri

Some constants can be built up using rotate-right instructions.
The code that enables this can be found in riscv_build_integer_1().
However, this functionality is only available for Zbb, which
includes the rori instruction. This patch enables this also for
XTheadBb, which includes the th.srri instruction.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Enable constant
synthesis with rotate-right for XTheadBb.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: New test.

(cherry picked from commit af5cb06ec17780736749ed51cfc6dfad9397156c)

RISC-V: zicond: Fix opt2 pattern

Fixes: 1d5bc3285e8a ("[committed][RISC-V] Fix 20010221-1.c with zicond")
This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since, in the
failing case, the pattern semantics did not match the asm czero.nez.

We start with the following src code snippet:

      if (a == 0)
        return 0;
      else
        return x;
    }

which is equivalent to:  "x = (a != 0) ? x : a" where x is NOT 0.
                                                ^^^^^^^^^^^^^^^^

and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2"

| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
|        (if_then_else:DI (ne (reg/v:DI 134 [ a ])
|                (const_int 0 [0]))
|            (reg/v:DI 136 [ x ])
|            (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}

The corresponding asm pattern generates
    czero.nez x, x, a   ; %0, %2, %1

which implies
    "x = (a != 0) ? 0 : a"

clearly not what the pattern wants to do.

Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez if X
is not guaranteed to be 0.

However this can be fixed with a small tweak

"x = (a != 0) ? x : a"

   is same as

"x = (a == 0) ? a : x"

and since the middle operand is 0 when a == 0, it is equivalent to

"x = (a == 0) ? 0 : x"

which can be expressed with CZERO.eqz

before fix            after fix
-----------------     -----------------
li        a5,1        li        a5,1
ld        a4,8(sp)    ld        a4,8(sp)
czero.nez a0,a4,a5    czero.eqz a0,a4,a5

The issue only happens at -O1 as at higher optimization levels, the
whole conditional move gets optimized away.

This fixes 4 testsuite failures in a zicond build:

FAIL: gcc.c-torture/execute/pr60003.c   -O1  execution test
FAIL: gcc.dg/setjmp-3.c execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1  execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1 -fpic execution test

gcc/ChangeLog:
* config/riscv/zicond.md: Fix opt2 pattern.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit e87212ead5e9f36945b5e2d290187e2adca34da5)

RISC-V: Emit .note.GNU-stack for non-linux target as well

Previously we only emitted it for Linux targets, which was not a problem until
QEMU fixed a bug so that QEMU user mode honors PT_GNU_STACK[1]; that change
causes problems when testing bare-metal toolchains with QEMU.

The straightforward fix is to emit it for non-Linux toolchains as well, at the
cost of a few extra bytes in each binary.

[1] https://github.com/qemu/qemu/commit/872f3d046f2381e3f416519e82df96bd60818311

gcc/ChangeLog:

* config/riscv/linux.h (TARGET_ASM_FILE_END): Move ...
* config/riscv/riscv.cc (TARGET_ASM_FILE_END): to here.

(cherry picked from commit fba0f47e4617e164716d3bce587fc6948088e225)

RISC-V: Support FP SGNJ autovec for VLS mode

This patch allows VLS-mode autovectorization of the floating-point
sign-injection operation (copysign).

Given below code example:

void test(float * restrict out, float * restrict in1, float * restrict in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = __builtin_copysignf (in1[i], in2[i]);
}

Before this patch:
test:
  csrr    a4,vlenb
  slli    a4,a4,1
  li      a5,128
  bleu    a5,a4,.L2
  mv      a5,a4
.L2:
  vsetvli zero,a5,e32,m8,ta,ma
  vle32.v v8,0(a1)
  vle32.v v16,0(a2)
  vsetvli a4,zero,e32,m8,ta,ma
  vfsgnj.vv       v8,v8,v16
  vsetvli zero,a5,e32,m8,ta,ma
  vse32.v v8,0(a0)
  ret

After this patch:
test:
  li      a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vle32.v v2,0(a2)
  vfsgnj.vv       v1,v1,v2
  vse32.v v1,0(a0)
  ret

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (copysign<mode>3): New pattern.
* config/riscv/vector.md: Extend iterator for VLS.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c: New test.

(cherry picked from commit a7b048c0f42198a0f8d4244f1bd25211cf48383f)

RISC-V: Export functions as global extern preparing for dynamic LMUL patch use

These functions need to be used by the COST model for dynamic LMUL.
Extracted as a separate patch and committed.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (lookup_vector_type_attribute): Export global.
(get_all_predecessors): New function.
(get_all_successors): Ditto.
* config/riscv/riscv-v.cc (get_all_predecessors): Ditto.
(get_all_successors): Ditto.
* config/riscv/riscv-vector-builtins.cc (sizeless_type_p): Export global.
* config/riscv/riscv-vsetvl.cc (get_all_predecessors): Remove it.

(cherry picked from commit 509c10a62546b9b3430040e455b7258322a024e6)

riscv: xtheadcondmov: Don't run tests with -Oz

Recently, these xtheadcondmov tests regressed with -Oz:
* FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c
* FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c
* FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c
* FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c

As -Oz stands for "Optimize aggressively for size rather than speed.",
we need to inspect the generated code, which looks like this:

  -Oz
  0000000000000000 <not_int_int>:
     0:   e199                    bnez    a1,6 <.L2>
     2:   40100513                li      a0,1025
  0000000000000006 <.L2>:
     6:   8082                    ret

  -O2:
  0000000000000000 <not_int_int>:
     0:   40100793                li      a5,1025
     4:   40b7950b                th.mveqz        a0,a5,a1
     8:   8082                    ret

As the generated code with -Oz consumes less size, there is nothing
wrong in the code generation. Instead, let's not run the xtheadcondmov
tests with -Oz.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c: Disable for -Oz.
* gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-eqz.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-not.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-cond.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-nez.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit 8451fbd56871267e8c1cd781db6d8f02e826f66c)

RISC-V: Fix Dynamic LMUL compile option

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Fix Dynamic status.
* config/riscv/riscv-v.cc (preferred_simd_mode): Ditto.
(autovectorize_vector_modes): Ditto.
(vectorize_related_mode): Ditto.

(cherry picked from commit 6f94ef6c86074a8348ec21d8aade04ce67b4e292)

RISC-V: Support FP16 for RVV VRGATHEREI16 intrinsic

This patch adds FP16 support for the VRGATHEREI16
intrinsic, i.e.:

* __riscv_vrgatherei16_vv_f16mf4
* __riscv_vrgatherei16_vv_f16mf4_m

as well as the f16mf2 to f16m8 variants.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add FP16 intrinsic def.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/intrisinc-vrgatherei16.c: New test.

(cherry picked from commit d99a868a9b100ab5a4b270a1acece60b5b6153a3)

RISC-V: Support FP MAX/MIN autovec for VLS mode

This patch allows VLS-mode autovectorization of the
floating-point binary operations MAX/MIN.

Given below code example:

void test (float *out, float *in1, float *in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = in1[i] > in2[i] ? in1[i] : in2[i];
    // Or out[i] = fmax (in1[i], in2[i]);
}

Before this patch:
test:
  csrr    a4,vlenb
  slli    a4,a4,1
  li      a5,128
  bleu    a5,a4,.L2
  mv      a5,a4
.L2:
  vsetvli zero,a5,e32,m8,ta,ma
  vle32.v v16,0(a1)
  vle32.v v8,0(a2)
  vsetvli a3,zero,e32,m8,ta,ma
  vmfgt.vv        v0,v16,v8
  vmerge.vvm      v8,v8,v16,v0
  vsetvli zero,a5,e32,m8,ta,ma
  vse32.v v8,0(a0)
  ret

After this patch:
test:
  li      a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vle32.v v2,0(a2)
  vfmax.vv        v1,v1,v2
  vse32.v v1,0(a0)
  ret

This MAX/MIN autovectorization also applies to calls to fmaxf/fmax from
math.h, and it depends on the -ffast-math option.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
fmax/fmin
* config/riscv/vector.md: Add VLS modes to vfmax/vfmin.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: New macros.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c: New test.

(cherry picked from commit a7d052b3200c7928d903a0242b8cfd75d131e374)

RISC-V: Add conditional autovec convert(INT<->FP) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><mode><vconvert>):
New combine pattern.
(*cond_<float_cvt><vconvert><mode>): Ditto.
(*cond_<optab><vnconvert><mode>): Ditto.
(*cond_<float_cvt><vnconvert><mode>): Ditto.
(*cond_<optab><mode><vnconvert>): Ditto.
(*cond_<float_cvt><mode><vnconvert>2): Ditto.
* config/riscv/autovec.md (<optab><mode><vconvert>2): Adjust.
(<float_cvt><vconvert><mode>2): Adjust.
(<optab><vnconvert><mode>2): Adjust.
(<float_cvt><vnconvert><mode>2): Adjust.
(<optab><mode><vnconvert>2): Adjust.
(<float_cvt><mode><vnconvert>2): Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add INT->FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c: New test.

(cherry picked from commit 258af9c7004cdc7963f783dd510404e79f0b5362)

RISC-V: Add conditional autovec convert(FP<->FP) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_extend<v_double_trunc><mode>):
New combine pattern.
(*cond_trunc<mode><v_double_trunc>): Ditto.
* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c: New test.

(cherry picked from commit 75a243c7c7c7efa9f12038480b46260ada739202)

RISC-V: Add conditional autovec convert(INT<->INT) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><v_double_trunc><mode>):
New combine pattern.
(*cond_<optab><v_quad_trunc><mode>): Ditto.
(*cond_<optab><v_oct_trunc><mode>): Ditto.
(*cond_trunc<mode><v_double_trunc>): Ditto.
* config/riscv/autovec.md (<optab><v_quad_trunc><mode>2): Adjust.
(<optab><v_oct_trunc><mode>2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New test.

(cherry picked from commit a1e5fd2c9adc35ef435dcc96991320d69453919a)

RISC-V: Adjust expand_cond_len_{unary,binop,op} api

This patch changes the `rtx_code code` argument of expand_cond_len_{unary,binop}
to `unsigned icode` and uses the icode directly to determine whether the
rounding_mode operand is required.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-protos.h (expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Ditto.
(expand_cond_len_op): Ditto.
(expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
(expand_cond_len_ternop): Ditto.

(cherry picked from commit 4d1c8b04ec8731b57ddbc80d76e40a61d8fa3324)

RISC-V: Enable VECT_COMPARE_COSTS by default

Since we have added the COST framework, enable VECT_COMPARE_COSTS by default.

Also, add 16/32/64 to provide more choices for COST comparison.

This patch doesn't change any behavior in the current testsuite since we are still
using the default COST model.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (autovectorize_vector_modes): Enable
VECT_COMPARE_COSTS by default.

(cherry picked from commit 5f2098cce6c75117927fef317c714dd2088b0189)

RISC-V: Add vec_extract for BI -> QI.

This patch adds a vec_extract expander that extracts a QImode from a
vector mask mode.  In doing so, it helps recognize a "live
operation"/extract last idiom for mask modes.  It fixes the ICE in
tree-vect-live-6.c by circumventing the fallback code in
extract_bit_field_1.  The problem there is still latent, though, and
needs to be addressed separately.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract<mode>qi): New expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/live-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/live_run-2.c: New test.

(cherry picked from commit ffbb19c6afc016f6dc001ad0f567d3216ff601b1)

testsuite/vect: Make match patterns more accurate.

On some targets we fail to vectorize with the first type the vectorizer
tries but succeed with the second. This patch changes several regex
patterns to reflect that behavior.

Before, we would look for a single occurrence of e.g.
"vect_recog_dot_prod_pattern" but could possibly have two (one for each
attempted mode). The new pattern tries to match sequences where we
first have a "vect_recog_dot_prod_pattern" and then a "succeeded",
while making sure there is no "failed" or "Re-trying" in between.
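The matching rule above can be sketched procedurally. This is a minimal C sketch of the logic only, not the actual Tcl regex used in the testsuite; the function name `count_successful_matches` and the dump lines in the test are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how often PATTERN is eventually followed by
   a "succeeded" line with no "failed" or "Re-trying" line in between.  */
int count_successful_matches (const char *const *lines, size_t n,
                              const char *pattern)
{
  int count = 0;
  int pending = 0;  /* PATTERN seen; waiting for the outcome.  */

  for (size_t i = 0; i < n; i++)
    {
      if (strstr (lines[i], pattern))
        pending = 1;
      else if (!pending)
        continue;
      else if (strstr (lines[i], "failed") || strstr (lines[i], "Re-trying"))
        pending = 0;  /* This attempt did not succeed; discard it.  */
      else if (strstr (lines[i], "succeeded"))
        {
          count++;
          pending = 0;
        }
    }
  return count;
}
```

A naive count of "vect_recog_dot_prod_pattern" would report two hits for a dump where the first mode fails and the second succeeds; this rule reports one.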

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-outer-4c-big-array.c: Adjust regex pattern.
* gcc.dg/vect/vect-reduc-dot-s16a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8b.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Ditto.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Ditto.

(cherry picked from commit e40edf6499576993862801640227e076b868241b)

RISC-V: Add dynamic LMUL compile option

We are going to add dynamic LMUL support; this patch adds the compile option for it.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Add
dynamic enum.
* config/riscv/riscv.opt: Add dynamic compile option.

(cherry picked from commit ef4e916b526a65411a577126d34c3b0bb97b6111)

RISC-V: Support FP ADD/SUB/MUL/DIV autovec for VLS mode

This patch allows VLS mode autovec for the floating-point
binary operations ADD/SUB/MUL/DIV.

Given the code example below:

void test (float *out, float *in1, float *in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = in1[i] + in2[i];
}

Before this patch:
test:
  csrr a4,vlenb
  slli a4,a4,1
  li   a5,128
  bleu a5,a4,.L38
  mv   a5,a4
.L38:
  vsetvli  zero,a5,e32,m8,ta,ma
  vle32.v  v16,0(a1)
  vsetvli  a4,zero,e32,m8,ta,ma
  vmv.v.i  v8,0
  vsetvli  zero,a5,e32,m8,tu,ma
  vle32.v  v24,0(a2)
  vfadd.vv v8,v24,v16
  vse32.v  v8,0(a0)
  ret

After this patch:
test:
  li       a5,128
  vsetvli  zero,a5,e32,m1,ta,ma
  vle32.v  v1,0(a2)
  vle32.v  v2,0(a1)
  vfadd.vv v1,v1,v2
  vse32.v  v1,0(a0)
  ret

Please note this patch also fixes the execution failures of the following
vect test cases:

* vect-alias-check-10.c
* vect-alias-check-11.c
* vect-alias-check-12.c
* vect-alias-check-14.c

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
vls floating-point autovec.
* config/riscv/vector-iterators.md: New iterator for
floating-point V and VLS.
* config/riscv/vector.md: Add VLS to floating-point binop.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h:
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c: New test.

(cherry picked from commit ed60ffd814c86a225a4586da649f6e76718490db)

RISC-V: Support rounding mode for VFNMADD/VFNMACC autovec

Consider the following case, which combines intrinsics and autovec:

vfadd RTZ   <- intrinsic static rounding
vfnmadd     <- autovec/autovec-opt

The autovec-generated vfnmadd should use the DYN mode, and the
frm must be restored before the vfnmadd insn. This patch
fixes this issue by:

* Add the frm operand to the autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsics should be:

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfnmadd     <- autovec/autovec-opt
| ...
+------------
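To make the required ordering concrete, here is a tiny C model of the frm CSR. It is a hypothetical simulation, not how GCC implements the flow; the names `frm_csr` and `run_sequence` are made up, and the enum values follow the RISC-V F extension rounding-mode encoding (RNE=0, RTZ=1, RDN=2, RUP=3, RMM=4). The point is that an insn using DYN reads whatever frm currently holds, so the intrinsic's static mode must be undone before it.

```c
/* Hypothetical model of the frm CSR; values follow the RISC-V F
   extension encoding of the rounding-mode field.  */
enum frm_value { FRM_RNE = 0, FRM_RTZ = 1, FRM_RDN = 2, FRM_RUP = 3,
                 FRM_RMM = 4 };

unsigned frm_csr = FRM_RNE;  /* Live value of the frm CSR.  */

/* Simulate the sequence in the diagram and record the rounding mode
   each vector insn actually executes under.  */
void run_sequence (unsigned static_rm, unsigned *vfadd_rm,
                   unsigned *vfnmadd_rm)
{
  unsigned a5 = frm_csr;   /* frrm  a5: save the caller's mode.    */
  frm_csr = static_rm;     /* fsrmi: install the static mode.      */
  *vfadd_rm = frm_csr;     /* vfadd (intrinsic) uses static mode.  */
  frm_csr = a5;            /* fsrm  a5: restore the saved mode.    */
  *vfnmadd_rm = frm_csr;   /* vfnmadd (autovec, DYN) reads frm.    */
}
```

Without the fsrm restore, the vfnmadd would silently execute under the intrinsic's static mode instead of the dynamic one.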

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmadd/vfnmacc.
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-4.c: New test.

(cherry picked from commit af0c625f6085567522cf55b2ced05f07ec7be67a)

RISC-V: Support rounding mode for VFNMSAC/VFNMSUB autovec

Consider the following case, which combines intrinsics and autovec:

vfadd RTZ   <- intrinsic static rounding
vfnmsub     <- autovec/autovec-opt

The autovec-generated vfnmsub should use the DYN mode, and the
frm must be restored before the vfnmsub insn. This patch
fixes this issue by:

* Add the frm operand to the autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsics should be:

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfnmsub     <- autovec/autovec-opt
| ...
+------------
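For reference, the RVV specification defines vfnmsac.vv as vd[i] = -(vs1[i] * vs2[i]) + vd[i] and vfnmsub.vv as vd[i] = -(vs1[i] * vd[i]) + vs2[i]. The C sketch below models both on unmasked elements; the function names are hypothetical, and the arithmetic here is unfused (two roundings), whereas the real instructions round once, in the mode held by frm when DYN is used.

```c
#include <stddef.h>

/* Hypothetical scalar reference models over the first VL elements.
   Not fused: the product and the sum are rounded separately.  */
void ref_vfnmsac (float *vd, const float *vs1, const float *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = -(vs1[i] * vs2[i]) + vd[i];  /* vd is the addend.  */
}

void ref_vfnmsub (float *vd, const float *vs1, const float *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = -(vs1[i] * vd[i]) + vs2[i];  /* vd is a multiplicand.  */
}
```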

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmsac/vfnmsub
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit a7cefeaead68e5d89f65ba3a558eddef9b0b0f75)

RISC-V: Support rounding mode for VFMSAC/VFMSUB autovec

Consider the following case, which combines intrinsics and autovec:

vfadd RTZ   <- intrinsic static rounding
vfmsub      <- autovec/autovec-opt

The autovec-generated vfmsub should use the DYN mode, and the
frm must be restored before the vfmsub insn. This patch
fixes this issue by:

* Add the frm operand to the autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsics should be:

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfmsub      <- autovec/autovec-opt
| ...
+------------
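The two instructions differ only in which operand vd plays: per the RVV specification, vfmsac.vv computes vd[i] = +(vs1[i] * vs2[i]) - vd[i], while vfmsub.vv computes vd[i] = +(vs1[i] * vd[i]) - vs2[i]. A minimal C sketch with hypothetical names; as above, it is unfused, unlike the hardware.

```c
#include <stddef.h>

/* Hypothetical scalar reference models over the first VL elements.  */
void ref_vfmsac (float *vd, const float *vs1, const float *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = (vs1[i] * vs2[i]) - vd[i];  /* vd is subtracted.  */
}

void ref_vfmsub (float *vd, const float *vs1, const float *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = (vs1[i] * vd[i]) - vs2[i];  /* vd is a multiplicand.  */
}
```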

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmsac/vfmsub
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-2.c: New test.

(cherry picked from commit 625962440ba5c737d6f35f7a1c9af1e9ef6bef3a)

RISC-V: Support rounding mode for VFMADD/VFMACC autovec

Consider the following case, which combines intrinsics and autovec:

vfadd RTZ   <- intrinsic static rounding
vfmadd      <- autovec/autovec-opt

The autovec-generated vfmadd should use the DYN mode, and the
frm must be restored before the vfmadd insn. This patch
fixes this issue by:

* Add the frm operand to the vfmadd/vfmacc autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsics should be:

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfmadd      <- autovec/autovec-opt
| ...
+------------

However, we use an unspec instead of a use to consume the FRM register
because of some restrictions in the combine pass. Some code paths of
try_combine may require XVECLEN (pat, 0) == 2 for recog_for_combine;
adding a new use would make XVECLEN (pat, 0) == 3 and cause the vfwmacc
optimization to fail, for example in the tests widen-complicate-5.c and
widen-8.c.

Finally, the other FMA cases will be covered in follow-up patches.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmadd/vfmacc.
* config/riscv/autovec.md: Ditto.
* config/riscv/vector-iterators.md: Add UNSPEC_VFFMA.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c: New test.

(cherry picked from commit 3e37e8231849ded7e214042f60f59fdcec75d7d3)

RISC-V: Add vector_scalar_shift_operand

The vector shift immediates happen to have the same constraints as some
of the CSR-related operands, but it's a different usage. This adds a
name for them, so I don't get confused again next time.

gcc/ChangeLog:

* config/riscv/autovec.md (shifts): Use
vector_scalar_shift_operand.
* config/riscv/predicates.md (vector_scalar_shift_operand): New
predicate.

(cherry picked from commit 0337555c7a2524bd334bafdc06dd801818eb34b6)

RISC-V: Add Vector cost model framework for RVV

Currently, RVV vectorization only supports picking LMUL according to the
compile option --param=riscv-autovec-lmul=, which is not ideal.

Compiler should be able to pick optimal LMUL/vectorization factor to
vectorize the loop according to the loop_vec_info and SSA-based register
pressure analysis.

The current GCC cost model provides an approach that lets us choose the
LMUL/vectorization factor by adjusting the COST.

This patch only adds the minimal COST model framework, which still
applies the default cost model (no vector code changes from before).

All regression tests passed with no differences.
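As an illustration of the register-pressure reasoning described above, here is a hypothetical heuristic, not the algorithm in riscv-vector-costs.cc (this patch implements none): RVV has 32 vector registers, so LMUL = k leaves 32 / k register groups, and one could pick the largest LMUL whose group count covers the maximum number of simultaneously live vector values. The function name `pick_lmul` is made up, and the sketch ignores v0's role as the mask register.

```c
/* Hypothetical: largest LMUL that avoids spilling, given the maximum
   number of vector values live at the same time.  */
int pick_lmul (int max_live_vectors)
{
  static const int lmuls[] = { 8, 4, 2, 1 };

  for (int i = 0; i < 4; i++)
    if (max_live_vectors <= 32 / lmuls[i])  /* 32 / k groups available.  */
      return lmuls[i];
  return 1;  /* Even LMUL = 1 spills; fall back to the smallest groups.  */
}
```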

gcc/ChangeLog:

* config.gcc: Add vector cost model framework for RVV.
* config/riscv/riscv.cc (riscv_vectorize_create_costs): Ditto.
(TARGET_VECTORIZE_CREATE_COSTS): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-vector-costs.cc: New file.
* config/riscv/riscv-vector-costs.h: New file.

(cherry picked from commit 4da3065a6422062b029df9660a226297802455f4)

RISC-V: Change vsetvl tail and mask policy to default policy

This patch changes the vsetvl policy to the default policy
(returned by get_prefer_mask_policy and get_prefer_tail_policy) instead
of a fixed policy. Any policy can now be returned, allowing a change to
agnostic or undisturbed. In the future, users may be able to control the
default policy, for example keeping agnostic via compiler options.
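The difference between the two tail policies can be modeled in C. In RVV 1.0, tail elements (indices >= vl) are left unchanged under tail-undisturbed, while under tail-agnostic each may either be left unchanged or be overwritten with all 1s. The sketch below uses a hypothetical `vadd_model` and always picks the all-1s option for agnostic so that the difference is visible.

```c
#include <stddef.h>

enum tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC };

/* Hypothetical model of an element-wise add on a VLMAX-element register
   where only the first VL elements are active.  */
void vadd_model (int *vd, const int *vs1, const int *vs2,
                 size_t vl, size_t vlmax, enum tail_policy tp)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] + vs2[i];

  /* Tail: "tu" keeps the old values; "ta" may write all 1s (-1).  */
  if (tp == TAIL_AGNOSTIC)
    for (size_t i = vl; i < vlmax; i++)
      vd[i] = -1;
}
```

Code that is correct under "ta" must not read the tail afterwards, which is what gives the compiler freedom to pick either policy when the tail is dead.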

gcc/ChangeLog:

* config/riscv/riscv-protos.h (IS_AGNOSTIC): Move to here.
* config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx):
Change to default policy.
* config/riscv/riscv-vector-builtins-bases.cc: Change to default policy.
* config/riscv/riscv-vsetvl.h (IS_AGNOSTIC): Delete.
* config/riscv/riscv.cc (riscv_print_operand): Use IS_AGNOSTIC to test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Adjust.
* gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-24.c: New test.

(cherry picked from commit e69d050fd990f8e72e19e6dfb1bf7da2f09236f7)

RISC-V: Refactor and clean emit_{vlmax,nonvlmax}_xxx functions

This patch refactors the emit_{vlmax,nonvlmax}_xxx functions, which are
used to generate RVV insns. There are currently 31 such functions, some
of them duplicates. So many functions are needed because RVV
instructions come in many shapes: there are patterns that don't have a
mask operand, patterns that don't have a merge operand, patterns that
don't need a tail policy operand, etc.

Previously there was the insn_type enum, but its value was only used
to indicate how many operands were passed in by the caller. The rest of
the operand information was scattered throughout these functions.
For example, emit_vlmax_fp_insn indicates that a rounding mode operand
of FRM_DYN should also be passed, and emit_vlmax_merge_insn means that
there is no mask operand or mask policy operand.

I introduced a new enum insn_flags to indicate some properties of these
RVV patterns. These insn_flags are then used to define the insn_type enum.
For example, the definition of WIDEN_TERNARY_OP is:

  WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
                       | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P,

These flags mean the RVV pattern has no merge operand; this combination only
applies to vwmacc instructions. After defining the desired insn_type, all the
emit_{vlmax,nonvlmax}_xxx functions are unified into three functions:

  emit_vlmax_insn (icode, insn_flags, ops);
  emit_nonvlmax_insn (icode, insn_flags, ops, vl);
  emit_vlmax_insn_lra (icode, insn_flags, ops, vl);

Callers can then select the appropriate insn_type and emit_xxx function to
generate RVV patterns as needed.
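A minimal sketch of how bit flags can compose into an insn_type value and be queried. The flag names come from the commit message, except HAS_MERGE_P, which is assumed here for illustration; all bit values and the helper functions are hypothetical, not the definitions in riscv-protos.h.

```c
/* Hypothetical bit assignments for illustration only.  */
enum insn_flags
{
  HAS_DEST_P = 1u << 0,
  HAS_MASK_P = 1u << 1,
  HAS_MERGE_P = 1u << 2,          /* Assumed flag, not from the text.  */
  USE_ALL_TRUES_MASK_P = 1u << 3,
  TDEFAULT_POLICY_P = 1u << 4,
  MDEFAULT_POLICY_P = 1u << 5,
  TERNARY_OP_P = 1u << 6
};

/* Composition taken from the WIDEN_TERNARY_OP definition above:
   note the absence of HAS_MERGE_P.  */
enum
{
  WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
                     | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P
};

/* Hypothetical queries an emit_* helper could make on the type.  */
int has_merge_operand (unsigned type) { return (type & HAS_MERGE_P) != 0; }
int is_ternary (unsigned type) { return (type & TERNARY_OP_P) != 0; }
```

The design point is that each emit_* function reads the operand layout from the flags instead of encoding it in its own name.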

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Adjust.
* config/riscv/autovec-vls.md: Ditto.
* config/riscv/autovec.md: Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add insn_type.
(enum insn_flags): Add insn flags.
(emit_vlmax_insn): Adjust.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_insn_lra): Add.
* config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): New.
(emit_vlmax_insn): Adjust.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_insn_lra): Add.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_nonvlmax_slide_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_insn): Delete.
(emit_nonvlmax_masked_insn): Delete.
(emit_vlmax_masked_store_insn): Delete.
(emit_nonvlmax_masked_store_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_vlmax_masked_fp_mu_insn): Delete.
(emit_nonvlmax_tu_insn): Delete.
(emit_nonvlmax_fp_tu_insn): Delete.
(emit_nonvlmax_tumu_insn): Delete.
(emit_nonvlmax_fp_tumu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_cpop_insn): Delete.
(emit_vlmax_integer_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_gather_insn): Delete.
(emit_vlmax_masked_gather_mu_insn): Delete.
(emit_vlmax_compress_insn): Delete.
(emit_nonvlmax_compress_insn): Delete.
(emit_vlmax_reduction_insn): Delete.
(emit_vlmax_fp_reduction_insn): Delete.
(emit_nonvlmax_fp_reduction_insn): Delete.
(expand_vec_series): Adjust.
(expand_const_vector): Adjust.
(legitimize_move): Adjust.
(sew64_scalar_helper): Adjust.
(expand_tuple_move): Adjust.
(expand_vector_init_insert_elems): Adjust.
(expand_vector_init_merge_repeating_sequence): Adjust.
(expand_vec_cmp): Adjust.
(expand_vec_cmp_float): Adjust.
(expand_vec_perm): Adjust.
(shuffle_merge_patterns): Adjust.
(shuffle_compress_patterns): Adjust.
(shuffle_decompress_patterns): Adjust.
(expand_load_store): Adjust.
(expand_cond_len_op): Adjust.
(expand_cond_len_unop): Adjust.
(expand_cond_len_binop): Adjust.
(expand_gather_scatter): Adjust.
(expand_cond_len_ternop): Adjust.
(expand_reduction): Adjust.
(expand_lanes_load_store): Adjust.
(expand_fold_extract_last): Adjust.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Adjust.
* config/riscv/vector.md: Adjust.

(cherry picked from commit 79ab19bcbae6e54c91bfca4ffa45cbc5eb0374bc)

RISC-V: Fix vsetvl pass ICE

This patch fixes PR111234, a vsetvl pass ICE when fusing a mask-any
VLMAX vsetvl_vtype_change_only insn with a mu vsetvl insn.

PR target/111234

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.

(cherry picked from commit ac55f9710fe82a4ed8cb132f57303775ce60e5d1)

test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

As for ARM SVE, add an xfail for RVV variable-length vectorization.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-7.c: Add RVV.

(cherry picked from commit 282c33c5f1c9b2965c18877aea8466701ab4e678)

test: Adapt slp-26.c check for RVV

Fix FAILs:
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0

RVV is able to vectorize it with VLS modes, like amdgcn.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-26.c: Adapt for RVV.

(cherry picked from commit 5d34a42f3b64fde9bb8be74231d8d11590c8d1db)

RISC-V: Remove movmisalign pattern for VLA modes

This patch fixes the following failures in the "vect" testsuite:
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-1.c execution test
FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-2.c execution test

Spike report:
z  0000000000000000 ra 00000000000100f4 sp 0000003ffffffb30 gp 0000000000012cc8
tp 0000000000000000 t0 00000000000102d4 t1 000000000000000f t2 0000000000000000
s0 0000000000000000 s1 0000000000000000 a0 00000000000101a6 a1 0000000000000008
a2 0000000000000010 a3 0000000000012401 a4 0000000000012480 a5 0000000000000020
a6 000000000000001f a7 00000000000000d6 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 00000000000101ec va/inst 000000000206dc07 sr 8000000200006620
Load access fault!

(spike)
core   0: 0x0000000000010204 (0x02065087) vle16.v v1, (a2)
core   0: exception trap_load_address_misaligned, epc 0x0000000000010204
core   0:           tval 0x0000000000012c81
(spike) reg 0 a2
0x0000000000012c81

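According to the RVV ISA, we cannot use "vle16.v" when the address is only
byte-aligned: the access may raise an address-misaligned exception, as it
does on Spike above.

This issue is caused by the following GIMPLE IR: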

vect__1.15_17 = .MASK_LEN_LOAD (vectp_t.13_15, 8B, { -1, ... }, _24, 0);

For partial vectorization, the "8B" byte alignment here is incorrect.
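The failing access can be checked with a natural-alignment test: a 16-bit element needs an even address, and the tval reported by Spike (0x12c81) is odd. A minimal C sketch; `naturally_aligned` is a hypothetical helper and assumes the element size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if ADDR is naturally aligned for ELEM_BYTES-sized elements
   (ELEM_BYTES must be a power of two).  */
int naturally_aligned (uintptr_t addr, size_t elem_bytes)
{
  return (addr & (elem_bytes - 1)) == 0;
}
```

For example, naturally_aligned (0x12c81, 2) is 0, so a vle16.v from that address relies on the implementation tolerating misaligned elements, while a byte-element load (vle8.v) from the same address is always aligned.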

After this patch, the loop is no longer vectorized:

sll     a5,a4,0x1
add     a5,a5,a1
lhu     a3,64(a5)
lbu     a5,66(a5)
addw    a4,a4,1
srl     a3,a3,0x8
sll     a5,a5,0x8
or      a5,a5,a3
sh      a5,0(a2)
add     a2,a2,2
bne     a4,a0,101f8 <foo+0x14>

I will enable auto-vectorization using another approach in a follow-up patch.

gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign<mode>): Delete.

(cherry picked from commit f7bff24905a6959f85f866390db2fff1d6f95520)

test: Fix XPASS of RVV

XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1

As for ARM SVE, fix these XPASSes for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-double-reduc-5.c: Add riscv.
* gcc.dg/vect/vect-outer-4e.c: Ditto.
* gcc.dg/vect/vect-outer-4f.c: Ditto.
* gcc.dg/vect/vect-outer-4g.c: Ditto.
* gcc.dg/vect/vect-outer-4k.c: Ditto.
* gcc.dg/vect/vect-outer-4l.c: Ditto.

(cherry picked from commit ece3884b4b5d64dff1f112d0ec13c9b71dd0fc6a)

test: Add xfail for riscv_vector

As with ARM SVE, when scalable vectorization is enabled for RVV,
we cannot yet constant-fold these cases for either ARM SVE or RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr88598-1.c: Add riscv_vector.
* gcc.dg/vect/pr88598-2.c: Ditto.
* gcc.dg/vect/pr88598-3.c: Ditto.

(cherry picked from commit 586ca3db52228ac1c5f2b5ce754928ced4e8e434)

RISC-V: support cm.mva01s cm.mvsa01 in zcmp

Signed-off-by: Die Li <lidie@eswincomputing.com>
Co-Authored-By: Fei Gao <gaofei@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/peephole.md: New pattern.
* config/riscv/predicates.md (a0a1_reg_operand): New predicate.
(zcmp_mv_sreg_operand): New predicate.
* config/riscv/riscv.md: New predicate.
* config/riscv/zc.md (*mva01s<X:mode>): New pattern.
(*mvsa01<X:mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cm_mv_rv32.c: New test.

(cherry picked from commit 490bf0b9756368b34221348b0260e061634e497b)

RISC-V: support cm.popretz in zcmp

Generate cm.popretz instead of cm.popret if return value is 0.

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_zcmp_can_use_popretz): true if popretz can be used
(riscv_gen_multi_pop_insn): interface to generate cm.pop[ret][z]
(riscv_expand_epilogue): expand cm.pop[ret][z] in epilogue
* config/riscv/riscv.md: define A0_REGNUM
* config/riscv/zc.md
(@gpr_multi_popretz_up_to_ra_<mode>): md for popretz ra
(@gpr_multi_popretz_up_to_s0_<mode>): md for popretz ra, s0
(@gpr_multi_popretz_up_to_s1_<mode>): likewise
(@gpr_multi_popretz_up_to_s2_<mode>): likewise
(@gpr_multi_popretz_up_to_s3_<mode>): likewise
(@gpr_multi_popretz_up_to_s4_<mode>): likewise
(@gpr_multi_popretz_up_to_s5_<mode>): likewise
(@gpr_multi_popretz_up_to_s6_<mode>): likewise
(@gpr_multi_popretz_up_to_s7_<mode>): likewise
(@gpr_multi_popretz_up_to_s8_<mode>): likewise
(@gpr_multi_popretz_up_to_s9_<mode>): likewise
(@gpr_multi_popretz_up_to_s11_<mode>): likewise

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: add testcase for cm.popretz in rv32e
* gcc.target/riscv/rv32i_zcmp.c: add testcase for cm.popretz in rv32i

(cherry picked from commit b27d323a368033f0b37e93c57a57a35fd9997864)

RISC-V: support cm.push cm.pop cm.popret in zcmp

Zcmp can share the same logic as save-restore in stack allocation: pre-allocation
by cm.push, step 1 and step 2.

Pre-allocation not only saves callee saved GPRs, but also saves callee saved FPRs and
local variables if any.

Note that cm.push pushes ra, s0-s11 in the reverse order of save-restore,
so this patch adapts the .cfi directives accordingly.
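In Zcmp, the stack adjustment of cm.push/cm.pop is a base size (the pushed registers, rounded up to 16 bytes) plus spimm * 16 with spimm in 0..3, and the {ra, s0-s10} register list is not encodable, so saving s10 forces s11 to be pushed too. The C sketch below models that rule; the helper names are hypothetical and only loosely correspond to zcmp_base_adj, zcmp_additional_adj and riscv_zcmp_valid_stack_adj_bytes_p in the ChangeLog.

```c
/* Registers pushed for N_SREGS callee-saved s-registers plus ra.
   s0-s10 (11 sregs) cannot be encoded, so it is widened to s0-s11.  */
unsigned zcmp_push_count (unsigned n_sregs)
{
  return 1 + (n_sregs == 11 ? 12 : n_sregs);
}

/* Base adjustment: pushed registers rounded up to 16 bytes.  */
unsigned zcmp_base_bytes (unsigned nregs, unsigned xlen_bytes)
{
  return (nregs * xlen_bytes + 15u) & ~15u;
}

/* Valid if ADJ = base + spimm * 16 with 0 <= spimm <= 3.  */
int zcmp_valid_adj (unsigned adj, unsigned nregs, unsigned xlen_bytes)
{
  unsigned base = zcmp_base_bytes (nregs, xlen_bytes);
  if (adj < base)
    return 0;
  unsigned extra = adj - base;
  return extra % 16 == 0 && extra / 16 <= 3;
}
```

For example, pushing ra and s0-s11 on RV64 needs 13 * 8 = 104 bytes, rounded to a 112-byte base, so cm.push can adjust sp by 112, 128, 144 or 160 bytes.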

gcc/ChangeLog:

* config/riscv/iterators.md
(slot0_offset): slot 0 offset in stack GPRs area in bytes
(slot1_offset): slot 1 offset in stack GPRs area in bytes
(slot2_offset): likewise
(slot3_offset): likewise
(slot4_offset): likewise
(slot5_offset): likewise
(slot6_offset): likewise
(slot7_offset): likewise
(slot8_offset): likewise
(slot9_offset): likewise
(slot10_offset): likewise
(slot11_offset): likewise
(slot12_offset): likewise
* config/riscv/predicates.md
(stack_push_up_to_ra_operand): predicates of stack adjust pushing ra
(stack_push_up_to_s0_operand): predicates of stack adjust pushing ra, s0
(stack_push_up_to_s1_operand): likewise
(stack_push_up_to_s2_operand): likewise
(stack_push_up_to_s3_operand): likewise
(stack_push_up_to_s4_operand): likewise
(stack_push_up_to_s5_operand): likewise
(stack_push_up_to_s6_operand): likewise
(stack_push_up_to_s7_operand): likewise
(stack_push_up_to_s8_operand): likewise
(stack_push_up_to_s9_operand): likewise
(stack_push_up_to_s11_operand): likewise
(stack_pop_up_to_ra_operand): predicates of stack adjust poping ra
(stack_pop_up_to_s0_operand): predicates of stack adjust poping ra, s0
(stack_pop_up_to_s1_operand): likewise
(stack_pop_up_to_s2_operand): likewise
(stack_pop_up_to_s3_operand): likewise
(stack_pop_up_to_s4_operand): likewise
(stack_pop_up_to_s5_operand): likewise
(stack_pop_up_to_s6_operand): likewise
(stack_pop_up_to_s7_operand): likewise
(stack_pop_up_to_s8_operand): likewise
(stack_pop_up_to_s9_operand): likewise
(stack_pop_up_to_s11_operand): likewise
* config/riscv/riscv-protos.h
(riscv_zcmp_valid_stack_adj_bytes_p): declaration
* config/riscv/riscv.cc (struct riscv_frame_info): comment change
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_multi_push_regs_count): num of regs in multi-push
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place
(riscv_save_libcall_count): no functional change
(riscv_compute_frame_info): add zcmp frame info
(riscv_for_each_saved_reg): save or restore fprs in specified slot for zcmp
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_gen_multi_push_pop_insn): gen function for multi push and pop
(get_multi_push_fpr_mask): get mask for the fprs pushed by cm.push
(riscv_expand_prologue): allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): allocate stack by cm.pop[ret]
(zcmp_base_adj): calculate stack adjustment base size
(zcmp_additional_adj): calculate stack adjustment additional size
(riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment valid
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_MAX_GRP_SLOTS): max slots of pushing and poping in zcmp
(CALLEE_SAVED_FREG_NUMBER): get x of fsx(fs0 ~ fs11)
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file. machine description for zcmp

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_push_fpr.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.

(cherry picked from commit 3d1d3132b9d4dc8b6069ad95dad624371124f297)

middle-end: Apply MASK_LEN_LOAD_LANES/MASK_LEN_STORE_LANES to ivopts/alias

Add the MASK_LEN_ variants, as for MASK_LOAD_LANES/MASK_STORE_LANES.

Bootstrap and regression testing on x86 passed.

gcc/ChangeLog:

* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.

(cherry picked from commit 0394184cebc15e5e3f13d04d9ffbc787a16018bd)

RISC-V: Make arch-24.c to test "success" case

arch-24.c and arch-25.c are exactly the same and redundant. The author
suspects that the original author intended to test two base ISAs (RV32I and
RV64I) so this commit changes arch-24.c to test that RV32I+Zcf does not
cause any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-24.c: Test RV32I+Zcf instead.

(cherry picked from commit a248e1cc860821b96a42be96478257c4964a7c2a)

RISC-V: Make sure we get VL REG operand for VLMAX vsetvl

Fix ICE in "vect" testsuite:

FAIL: gcc.dg/vect/pr64495.c (internal compiler error: in df_uses_record, at df-scan.cc:2958)
FAIL: gcc.dg/vect/pr64495.c (test for excess errors)

After this patch, all VSETVL PASS related bugs found so far in "vect" are fixed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(vector_insn_info::get_avl_or_vl_reg): Fix bug.

(cherry picked from commit 7accc6208befae77699a56f67a94da1e247ed069)

RISC-V: Enable movmisalign for VLS modes

The previous patch removed the movmisalign pattern for VLA modes to fix a run-time
bug, but it also disabled vectorization of misaligned data movement.

After checking the LLVM code, I found that LLVM supports misaligned accesses for VLS modes.

Before this patch:

sll     a5,a4,0x1
add     a5,a5,a1
lhu     a3,64(a5)
lbu     a5,66(a5)
addw    a4,a4,1
srl     a3,a3,0x8
sll     a5,a5,0x8
or      a5,a5,a3
sh      a5,0(a2)
add     a2,a2,2
bne     a4,a0,101f8 <foo+0x14>

After this patch:

foo:
lui a0,%hi(.LANCHOR0)
addi a0,a0,%lo(.LANCHOR0)
addi sp,sp,-16
addi a1,a0,1
li a2,64
sd ra,8(sp)
vsetvli zero,a2,e8,m4,ta,ma
addi a0,a0,128
vle8.v v4,0(a1)
vse8.v v4,0(a0)
call memcmp
bne a0,zero,.L6
ld ra,8(sp)
addi sp,sp,16
jr ra
.L6:
call abort

Note that this patch passes all alignment-related testcases in "vect".
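The "before" listing assembles each 16-bit value from a byte-misaligned source with an lhu/lbu pair plus shifts and an or. The C below is a hypothetical reconstruction of that kind of loop, not the contents of misalign-1.c: the names `src_bytes`, `dst16`, `copy_misaligned_u16` and the element count 32 (64 bytes, matching the "after" listing) are assumptions. Using memcpy for each element keeps the code valid C for any alignment; whether a particular GCC build turns it into VLS vle8.v/vse8.v depends on the target options.

```c
#include <stdint.h>
#include <string.h>

#define N 32 /* assumed element count: 32 x 2 bytes = 64 bytes */

/* Source buffer; the copy starts at offset 1, so every 16-bit load
   is misaligned with respect to a 2-byte boundary. */
static uint8_t src_bytes[2 * N + 1];
static uint16_t dst16[N];

/* Copy N 16-bit values starting one byte into src_bytes.  memcpy
   expresses the unaligned load without undefined behaviour. */
static void copy_misaligned_u16(void)
{
  for (int i = 0; i < N; i++)
    {
      uint16_t v;
      memcpy(&v, src_bytes + 1 + 2 * i, sizeof v);
      dst16[i] = v;
    }
}
```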

gcc/ChangeLog:

* config/riscv/autovec-vls.md (movmisalign<mode>): New pattern.
* config/riscv/riscv.cc (riscv_support_vector_misalignment): Support
VLS misalign.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: New test.

(cherry picked from commit 260f743aa476abce8f88cceaca12abcb8115b02f)

RISC-V: Use splitter to generate zicond in another case

So in analyzing Ventana's internal tree against the trunk it became apparent
that the current zicond code is missing a case that helps coremark's bitwise
CRC implementation.

Here's a minimized testcase:

long xor1(long crc, long poly)
{
  if (crc & 1)
    crc ^= poly;

  return crc;
}

i.e., it's just a conditional xor.

We generate this:

        andi    a5,a0,1
        neg     a5,a5
        and     a5,a5,a1
        xor     a0,a5,a0
        ret

But we should instead generate:

        andi    a5,a0,1
        czero.eqz       a5,a1,a5
        xor     a0,a5,a0
        ret

Combine wants to generate:

Trying 7, 8 -> 9:
    7: r140:DI=r137:DI&0x1
    8: r141:DI=-r140:DI
      REG_DEAD r140:DI
    9: r142:DI=r141:DI&r144:DI
      REG_DEAD r144:DI
      REG_DEAD r141:DI
Failed to match this instruction:
(set (reg:DI 142)
    (and:DI (sign_extract:DI (reg/v:DI 137 [ crc ])
            (const_int 1 [0x1])
            (const_int 0 [0]))
        (reg:DI 144)))

A splitter can rewrite the above into a suitable if-then-else construct and
squeeze an instruction out of that pesky CRC loop.  Sadly it doesn't really
help anything else.
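The splitter relies on a simple identity: for a 1-bit value c, -c is either 0 or all ones, so (-c) & poly equals what czero.eqz produces (poly when c is non-zero, otherwise 0). The sketch below models czero.eqz in portable C and checks both sequences against the source-level conditional xor; `czero_eqz` and the two `xor1_*` helpers are illustrative names, not GCC or Zicond APIs.

```c
/* Model of Zicond czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1. */
static long czero_eqz(long rs1, long rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* Sequence emitted before the patch: andi, neg, and, xor. */
static long xor1_neg_and(long crc, long poly)
{
  long bit = crc & 1;   /* andi a5,a0,1 */
  long mask = -bit;     /* neg: 0 or all ones */
  return (mask & poly) ^ crc;
}

/* Sequence the splitter aims for: andi, czero.eqz, xor. */
static long xor1_czero(long crc, long poly)
{
  long bit = crc & 1;   /* andi a5,a0,1 */
  return czero_eqz(poly, bit) ^ crc;
}
```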

The patch includes two variants: one uses ZBS, the other uses an ANDI
to produce the input condition.

gcc/
* config/riscv/zicond.md: New splitters to rewrite single bit
sign extension as the condition to a czero in the desired form.

gcc/testsuite
* gcc.target/riscv/zicond-xor-01.c: New test.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
(cherry picked from commit 94b950df6f8c46925799f642e5c44f42638f2b5e)

RISC-V: Added zvfh support for zfa extensions.

This is a follow-up to the zfa extension support, added according to the recommendations
for zvfh and the patch by Tsukasa OI <research_trasio@irq.a4lg.com>. The new test
zfa-fli-5.c is also based on that patch.

Ref:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627284.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628492.html

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
zvfh can generate zfa extended instruction fli.h, just like zfh.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fli-7.c: Change fa0 to fa\[0-9\] so the test does
not fail when a register other than fa0 is assigned.
* gcc.target/riscv/zfa-fli-8.c: Ditto.
* gcc.target/riscv/zfa-fli-5.c: New test.

(cherry picked from commit fce74ce2535aa3b7648ba82e7e61eb77d0175546)

RISC-V: generate builtin macro for compilation with strict alignment

Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false.

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Generate
__riscv_unaligned_avoid with value 1 or
__riscv_unaligned_slow with value 1 or
__riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override): Define
riscv_user_wants_strict_align. Set
riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align
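A sketch of how user code might consume the new macros. The three macro names come from the ChangeLog above; the helper names are hypothetical, and the sketch assumes that at most one of the macros is defined (presumably an explicit -mstrict-align is what selects `__riscv_unaligned_avoid`). When compiling for another target, or with a compiler that predates this change, none are defined and the helper reports "unknown". The load itself always goes through memcpy, so the macros only inform what code the compiler is expected to generate, not correctness.

```c
#include <stdint.h>
#include <string.h>

/* Report the unaligned-access policy advertised by the compiler. */
static const char *unaligned_policy(void)
{
#if defined(__riscv_unaligned_avoid)
  return "avoid";   /* user asked for strict alignment */
#elif defined(__riscv_unaligned_slow)
  return "slow";    /* allowed, but the tuning says it is slow */
#elif defined(__riscv_unaligned_fast)
  return "fast";
#else
  return "unknown"; /* not RISC-V, or an older compiler */
#endif
}

/* Alignment-safe 32-bit load; memcpy is valid for any alignment. */
static uint32_t load_u32(const void *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}
```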

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for
__riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit 6e23440b5df4011bbe1dbee74d47641125dd7d16)

RISC-V: Add Types to Un-Typed Vector Instructions

Updates vector instructions to ensure that no instruction is left
without a type attribute, and creates a placeholder type "vector" for
instructions whose type is not clear.

Tested for regressions using rv32/rv64 gc/gcv multilib with newlib/linux.

gcc/ChangeLog:

* config/riscv/autovec-vls.md: Update types
* config/riscv/riscv.md: Add vector placeholder type
* config/riscv/vector.md: Update types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit 4b70c7c849331d45c0d6a1a4e1cf96b103be9aa6)

RISC-V: Fix one ICE for vect test vect-multitypes-5

Building vect-multitypes-5.c with a command line like the one below triggers an ICE:

riscv64-unknown-elf-gcc -O3 \
  -march=rv64imafdcv -mabi=lp64d -mcmodel=medlow \
  -fdiagnostics-plain-output -flto -ffat-lto-objects \
  --param riscv-autovec-preference=scalable -Wno-psabi \
  -ftree-vectorize -fno-tree-loop-distribute-patterns \
  -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details \
  gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c -o test.elf -lm

The RTL below is not handled by riscv_legitimize_const_move, so it falls
through to the default path. There, force_const_mem returns NULL_RTX, and
operating on that NULL_RTX causes the ICE.

(const:DI
  (plus:DI
    (symbol_ref:DI ("ic") [flags 0x2] <var_decl 0x7fe57740be10 ic>)
    (const_poly_int:DI [16, 16])))

This patch handles this RTL in riscv_legitimize_const_move.
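In the RTL above, const_poly_int [16, 16] denotes the run-time value 16 + 16*x, where x is the poly_int indeterminate (for RVV its value depends on the vector length, which is only known at run time). The C model below, with hypothetical names, evaluates symbol + offset for a few values of x; this is the quantity riscv_legitimize_const_move has to materialize. How the patch actually splits it into instructions is not shown in this log.

```c
#include <stdint.h>

/* A degree-1 polynomial c0 + c1*x, like a two-coefficient GCC poly_int. */
struct poly2 { int64_t c0, c1; };

static int64_t poly2_eval(struct poly2 p, int64_t x)
{
  return p.c0 + p.c1 * x;
}

/* Value of (const (plus (symbol_ref) (const_poly_int [c0, c1]))) for a
   given symbol address and indeterminate x. */
static uintptr_t sym_plus_poly(uintptr_t sym, struct poly2 off, int64_t x)
{
  return sym + (uintptr_t) poly2_eval(off, x);
}
```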

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_poly_move): New declaration.
(riscv_legitimize_const_move): Handle ref plus const poly.

(cherry picked from commit d16af3ebea84749ac673db29a4124d2dc7cd369e)

RISC-V: Add stub support for existing extensions (unprivileged)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembly, GCC should support almost all extensions that
Binutils supports, even if GCC itself does nothing with them.

This commit adds stub support for standard unprivileged extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c, except for
'Zce', 'Zcmp' and 'Zcmt', whose support is not yet merged into Binutils).
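riscv_implied_info records rules of the form "extension A implies extension B", and implied extensions are added transitively. The C sketch below computes that closure with a simple fixed-point loop over a small table. The struct, function names and table are illustrative assumptions, not GCC's data structures; the entries are the 'Zk' and 'Zkn' expansions from the RISC-V scalar cryptography specification.

```c
#include <stdbool.h>
#include <string.h>

/* One implication rule: having EXT implies having IMPLIED. */
struct implied_rule { const char *ext; const char *implied; };

/* Illustrative subset: Zk = Zkn + Zkr + Zkt,
   Zkn = Zbkb + Zbkc + Zbkx + Zkne + Zknd + Zknh. */
static const struct implied_rule rules[] = {
  {"zk", "zkn"}, {"zk", "zkr"}, {"zk", "zkt"},
  {"zkn", "zbkb"}, {"zkn", "zbkc"}, {"zkn", "zbkx"},
  {"zkn", "zkne"}, {"zkn", "zknd"}, {"zkn", "zknh"},
};

#define MAX_EXTS 32

struct ext_set { const char *names[MAX_EXTS]; int count; };

static bool ext_set_has(const struct ext_set *s, const char *name)
{
  for (int i = 0; i < s->count; i++)
    if (strcmp(s->names[i], name) == 0)
      return true;
  return false;
}

/* Returns true only if NAME was actually added. */
static bool ext_set_add(struct ext_set *s, const char *name)
{
  if (ext_set_has(s, name) || s->count >= MAX_EXTS)
    return false;
  s->names[s->count++] = name;
  return true;
}

/* Apply the rules until a full pass adds nothing new. */
static void expand_implied(struct ext_set *s)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++)
        if (ext_set_has(s, rules[r].ext) && ext_set_add(s, rules[r].implied))
          changed = true;
    }
}
```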

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.

(cherry picked from commit f30d6a48635b5b180e46c51138d0938d33abd942)

RISC-V: Add stub support for existing extensions (vendor)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembly, GCC should support almost all extensions that
Binutils supports, even if GCC itself does nothing with them.

This commit adds stub support for vendor extensions to
riscv_ext_version_table (no riscv_implied_info entries to add; all
information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add stub support for all vendor extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-30.c: New test for a stub
vendor extension 'XVentanaCondOps'.

(cherry picked from commit fea5442127daf8472966360279d402023dba3379)

RISC-V: Add stub support for existing extensions (privileged)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembly, GCC should support almost all extensions that
Binutils supports, even if GCC itself does nothing with them.

As a start, this commit adds stub support for *privileged* extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from privileged extensions.
(riscv_ext_version_table): Add stub support for all privileged
extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-29.c: New test for a stub privileged
extension 'Smstateen' with some implications.

(cherry picked from commit 4053d295fdd81d3e05c4977e3cd9c647e8cc6bc2)

RISC-V: Make PR 102957 tests more comprehensive

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" test failures caused by that change (on pr102957.c,
by testing the error message after the first change).

However, the latter change partially defeats the original intent of the PR
102957 test case, which was to make sure that we can parse a valid
two-letter extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit adds pr102957-2.c to make sure that there will be no errors if
we parse a valid two-letter extension name.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957-2.c: New test case using the 'Zk'
extension to continue testing whether we can use valid two-letter
extensions.

(cherry picked from commit 8b0662254cdac3e0b670c1c54752e1d43113b0f4)

RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}

This patch refactors expand_cond_len_{unop,binop,ternop}. It introduces
a new unified function, expand_cond_len_op, that does the main work.
The expand_cond_len_{unop,binop,ternop} functions now only care about how
to pass the operands to the intrinsic patterns.
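To illustrate the shape of this refactor, here is a scalar C model, not GCC's code: one `cond_len_op` does the shared work (length, mask and else-value handling), and thin unop/binop/ternop wrappers only arrange operands. The per-element rule assumes the cond_len_* optab semantics, dest[i] = (i < len + bias && mask[i]) ? op(...) : else[i], with bias 0 or -1; every name below is hypothetical.

```c
#include <stdbool.h>

#define MAX_OPS 3 /* ternary is the widest case */

/* Element operation on up to MAX_OPS inputs. */
typedef int (*elem_fn)(const int *args);

static int op_neg(const int *x) { return -x[0]; }
static int op_add(const int *x) { return x[0] + x[1]; }
static int op_fma(const int *x) { return x[0] * x[1] + x[2]; }

/* Shared worker: element i is active when i < len + bias and mask[i]
   is set; active elements get FN applied, inactive ones ELSE_VALS[i].
   NSRCS must not exceed MAX_OPS. */
static void cond_len_op(int n, int *dst, const bool *mask,
                        const int *const srcs[], int nsrcs,
                        const int *else_vals, int len, int bias, elem_fn fn)
{
  for (int i = 0; i < n; i++)
    {
      if (i < len + bias && mask[i])
        {
          int args[MAX_OPS];
          for (int k = 0; k < nsrcs; k++)
            args[k] = srcs[k][i];
          dst[i] = fn(args);
        }
      else
        dst[i] = else_vals[i];
    }
}

/* Thin wrappers: they only arrange the operands. */
static void cond_len_unop(int n, int *dst, const bool *mask, const int *a,
                          const int *else_vals, int len, int bias, elem_fn fn)
{
  const int *const srcs[] = {a};
  cond_len_op(n, dst, mask, srcs, 1, else_vals, len, bias, fn);
}

static void cond_len_binop(int n, int *dst, const bool *mask, const int *a,
                           const int *b, const int *else_vals, int len,
                           int bias, elem_fn fn)
{
  const int *const srcs[] = {a, b};
  cond_len_op(n, dst, mask, srcs, 2, else_vals, len, bias, fn);
}

static void cond_len_ternop(int n, int *dst, const bool *mask, const int *a,
                            const int *b, const int *c, const int *else_vals,
                            int len, int bias, elem_fn fn)
{
  const int *const srcs[] = {a, b, c};
  cond_len_op(n, dst, mask, srcs, 3, else_vals, len, bias, fn);
}
```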

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust
* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
(get_vlmax_rtx): Exported.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
(emit_vlmax_masked_gather_mu_insn): Adjust.
(get_vlmax_rtx): New func.
(expand_load_store): Adjust.
(expand_cond_len_unop): Call expand_cond_len_op.
(expand_cond_len_op): New subroutine.
(expand_cond_len_binop): Call expand_cond_len_op.
(expand_cond_len_ternop): Call expand_cond_len_op.
(expand_lanes_load_store): Adjust.

(cherry picked from commit b3176bdc86c04da6545a4bd8e2fb7f38d3f2db8d)

vect test: Remove xfail for riscv

We are planning to enable the "vect" testsuite with scalable vector auto-vectorization.

As on ARM SVE, this case XPASSes:
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1

gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-12.c: Add riscv xfail.

(cherry picked from commit 97aafa9cbb68ffa23aa9f018cc5cb30648a72427)

RISC-V: Fix ASM check of vlmax_switch_vtype-16.c

There is a failure:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c -O2 scan-assembler-times vsetvli\\s+zero,\\s*zero 2

Change the expected count from "2" to "3"; the generated assembly is correct and better.

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Fix ASM check.

(cherry picked from commit 58a48781efa31e08b570f035fbceaaa8018c3412)

RISC-V: Fix AVL/VL get ICE[VSETVL PASS]

Fix a bunch of ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vector_insn_info::get_avl_or_vl_reg): New function.
(pass_vsetvl::compute_local_properties): Fix bug.
(pass_vsetvl::commit_vsetvls): Ditto.
* config/riscv/riscv-vsetvl.h: New function.

(cherry picked from commit 818cc9f2d2f3dbbd4004ff85d3125d92d1e430c9)

RISC-V: Fix error combine of pred_mov pattern

This patch fixes PR110943, in which wrong code is generated because combine
incorrectly merges some pred_mov patterns. Consider this code:

```

#include <stddef.h>
#include <stdint.h>
#include <riscv_vector.h>

void foo9 (void *base, void *out, size_t vl)
{
    int64_t scalar = *(int64_t*)(base + 100);
    vint64m2_t v = __riscv_vmv_v_x_i64m2 (0, 1);
    *(vint64m2_t*)out = v;
}
```

RTL before combine pass:

```
(insn 11 10 12 2 (set (reg/v:RVVM2DI 134 [ v ])
        (if_then_else:RVVM2DI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM2DI repeat [
                    (const_int 0 [0])
                ])
            (unspec:RVVM2DI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/app/example.c":6:20 1089 {pred_movrvvm2di})
(insn 14 13 0 2 (set (mem:RVVM2DI (reg/v/f:DI 136 [ out ]) [1 MEM[(vint64m2_t *)out_4(D)]+0 S[32, 32] A128])
        (reg/v:RVVM2DI 134 [ v ])) "/app/example.c":7:23 717 {*movrvvm2di_whole})
```

RTL after combine pass:
```
(insn 14 13 0 2 (set (mem:RVVM2DI (reg:DI 138) [1 MEM[(vint64m2_t *)out_4(D)]+0 S[32, 32] A128])
        (if_then_else:RVVM2DI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM2DI repeat [
                    (const_int 0 [0])
                ])
            (unspec:RVVM2DI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/app/example.c":7:23 1089 {pred_movrvvm2di})
```

This combination changes the semantics of insn 14. I split the @pred_mov pattern and
restrict the condition of @pred_mov.

PR target/110943

gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_int_or_double_0_operand):
New predicate.
* config/riscv/riscv-vector-builtins.cc (function_expander::function_expander):
force_reg mem target operand.
* config/riscv/vector.md (@pred_mov<mode>): Wrapper.
(*pred_mov<mode>): Remove imm -> reg pattern.
(*pred_broadcast<mode>_imm): Add imm -> reg pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Adjust.
* gcc.target/riscv/rvv/base/pr110943.c: New test.

(cherry picked from commit 973eb0deb467c79cc21f265a710a81054cfd3e8c)

RISC-V: Fix documentation of __builtin_riscv_pause

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.

(cherry picked from commit cf64ab18e3f820376ff20c663c7c7bf1af290f02)

RISC-V: __builtin_riscv_pause for all environment

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension (in
the assembler). However, GCC emits "pause" unconditionally, causing an
assembler error when code using __builtin_riscv_pause is compiled with the
'Zihintpause' extension disabled.

However, the "pause" instruction code (0x0100000f) is a HINT and emitting its
instruction code is safe in any environment.
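The value 0x0100000f can be checked by hand: pause is a FENCE (MISC-MEM opcode 0x0f) with pred = W, succ = 0, fm = 0 and rd = rs1 = x0. FENCE's pred field occupies bits 27:24 with W as its lowest bit, so apart from the opcode only bit 24 is set. The `encode_fence` helper below is a hypothetical encoder written only for this check.

```c
#include <stdint.h>

/* FENCE ordering bits within the 4-bit pred/succ fields (I O R W). */
enum { FENCE_W = 1u << 0, FENCE_R = 1u << 1, FENCE_O = 1u << 2, FENCE_I = 1u << 3 };

#define OPCODE_MISC_MEM 0x0fu /* 0b0001111 */

/* Encode FENCE with the given fm, pred, succ, rs1 and rd; funct3 = 000. */
static uint32_t encode_fence(uint32_t fm, uint32_t pred, uint32_t succ,
                             uint32_t rs1, uint32_t rd)
{
  return (fm & 0xfu) << 28 | (pred & 0xfu) << 24 | (succ & 0xfu) << 20
         | (rs1 & 0x1fu) << 15 | (rd & 0x1fu) << 7 | OPCODE_MISC_MEM;
}
```

For comparison, `fence rw,rw` encodes as 0x0330000f with the same helper.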

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x0100000f" instead of "pause" only if the extension is disabled (making
the diagnostics better).
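A typical use of the builtin is a spin-wait loop. The sketch below is an assumption about usage, not taken from the patch: `cpu_relax` and `spin_until_set` are hypothetical names, the builtin is only called when GCC targets RISC-V, and elsewhere the hint is simply omitted. Because pause is a HINT, the loop behaves the same with or without it; only the timing can change.

```c
#include <stdatomic.h>

/* Issue the pause hint when GCC targets RISC-V; a no-op elsewhere in
   this sketch. */
static inline void cpu_relax(void)
{
#if defined(__riscv) && defined(__GNUC__) && !defined(__clang__)
  __builtin_riscv_pause();
#endif
}

/* Spin until *flag becomes non-zero or max_spins iterations have passed.
   Returns the number of iterations spent waiting. */
static unsigned spin_until_set(atomic_int *flag, unsigned max_spins)
{
  unsigned spins = 0;
  while (!atomic_load_explicit(flag, memory_order_acquire) && spins < max_spins)
    {
      cpu_relax();
      spins++;
    }
  return spins;
}
```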

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Implement the 'Zihintpause' extension, version 2.0.
(riscv_ext_flag_table): Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause".
(riscv_builtins): Add "pause" extension.
* config/riscv/riscv-opts.h (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Adjust output based on
TARGET_ZIHINTPAUSE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c: New test when the 'Zihintpause'
extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
* gcc.target/riscv/zihintpause-noarch.c: New test when the 'Zihintpause'
extension is disabled.

(cherry picked from commit c2d04dd659c499d8df19f68d0602ad4c7d7065c2)

RISC-V: Fix uninitialized probability for GIMPLE IR tests

This patch fixes uninitialized probabilities in GIMPLE IR tests:
FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10b.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10b.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10c.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10c.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10d.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10d.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10e.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10e.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Skip
never probability.
(pass_vsetvl::compute_probabilities): Fix uninitialized probability.

(cherry picked from commit 421cf6109ad23ae0f5d3da9adb582eb464e8826c)

RISC-V: Disable user vsetvl fusion into EMPTY or DIRTY (Polluted EMPTY) block

This patch fixes this bunch of ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Fix bug.

(cherry picked from commit e7b585a468aa4980955ae25fa9f4b41a3dc2995e)

RISC-V: Fix VSETVL test failures

Committed.

Fix failures:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vxrm-8.c: Adapt tests.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.

(cherry picked from commit 1671ad9ecff9f361870aeb26d5c5c6d9808826d7)

RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

This patch refactors Phase 3 (Demand fusion) and renames it to Earliest fusion.
I am doing the refactor for the following reasons:

  1. The current implementation of phase 3 does too many things, which makes the code
     messy and hard to maintain.
  2. The previous demand fusion made every decision explicitly: how to fuse
     VSETVLs, where the VSETVL fusion should happen, whether the fusion point (location)
     is correct and optimal, etc.

     We were doing too much of this ourselves, so I had added the following functions:

        enum fusion_type get_backward_fusion_type (const bb_info *,
                                                   const vector_insn_info &);
        bool hard_empty_block_p (const bb_info *, const vector_insn_info &) const;
        bool backward_demand_fusion (void);
        bool forward_demand_fusion (void);
        bool cleanup_illegal_dirty_blocks (void);

     to make sure the VSETVL fusion is optimal and correct. However, I found in much downstream
     testing that this approach is neither reliable nor optimal.

     Instead, this patch uses 'compute_earliest', an LCM function, to fuse multiple
     'compatible' VSETVL demand infos when they share the same earliest edge. We let LCM decide
     almost everything about demand fusion for us. The only thing we do ourselves (not LCM) is
     check whether the VSETVL demand infos are compatible. That's all we need to do.
     I believe this approach is much more reliable and optimal than before (we already have many
     testcases to check this refactor patch).
  3. Using LCM to do the demand fusion is more reliable and handles the CFG better than before.
  ...

Here are the basics of this patch's approach:

Consider the following case:

for
  for
    for
      ...
        for
          if (...)
            VSETVL 1 demand: RATIO = 32 and TU policy.
          else if (...)
            VSETVL 2 demand: SEW = 16.
          else
            VSETVL 3 demand: MU policy.

   - 'compute_earliest' outputs the earliest edge of VSETVL 1, VSETVL 2 and VSETVL 3.
     They all have the same earliest edge, which is outside the innermost loop.

   - Then we check that these 3 VSETVL demand infos are compatible and fuse them into a single VSETVL info:
     demand SEW = 16, LMUL = MF2 (RATIO = SEW / LMUL = 16 / (1/2) = 32), TU, MU.

   - Then the later phase (phase 4), LCM PRE (partial redundancy elimination), hoists that VSETVL
     to the outermost loop, so that we get optimal codegen.

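To make the compatibility check concrete, here is a minimal sketch, assuming a hypothetical and much simplified `struct demand` and `fuse_demands` helper (neither is GCC's vector_insn_info or its merge logic). It fuses the three demands from the example above and derives LMUL = MF2 from SEW = 16 and RATIO = 32; deciding *where* the fused VSETVL goes is left to LCM and is not modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a VSETVL "demand": the vtype fields an
   instruction actually cares about.  This is not GCC's vector_insn_info.  */
struct demand
{
  bool has_sew;   int sew;      /* element width in bits */
  bool has_ratio; int ratio;    /* SEW / LMUL */
  bool has_lmul;  int lmul_x8;  /* LMUL * 8, so MF2 == 4 and M1 == 8 */
  bool tu;                      /* demands tail-undisturbed */
  bool mu;                      /* demands mask-undisturbed */
};

/* If B demands a field, R must either not demand it or agree with it.  */
static bool
merge_field (bool *r_has, int *r_val, bool b_has, int b_val)
{
  if (!b_has)
    return true;
  if (*r_has && *r_val != b_val)
    return false;
  *r_has = true;
  *r_val = b_val;
  return true;
}

/* Fuse demand B into A.  On conflict return false and leave A untouched.  */
static bool
fuse_demands (struct demand *a, const struct demand *b)
{
  struct demand r = *a;   /* work on a copy so A survives a failed fusion */
  if (!merge_field (&r.has_sew, &r.sew, b->has_sew, b->sew)
      || !merge_field (&r.has_ratio, &r.ratio, b->has_ratio, b->ratio)
      || !merge_field (&r.has_lmul, &r.lmul_x8, b->has_lmul, b->lmul_x8))
    return false;
  /* With SEW and RATIO both fixed, LMUL = SEW / RATIO is implied.  */
  if (r.has_sew && r.has_ratio)
    {
      if (r.ratio <= 0 || (8 * r.sew) % r.ratio != 0)
        return false;
      int lmul_x8 = 8 * r.sew / r.ratio;
      /* Valid LMULs are MF8 .. M8, i.e. powers of two from 1 to 64 here.  */
      if (lmul_x8 < 1 || lmul_x8 > 64 || (lmul_x8 & (lmul_x8 - 1)) != 0
          || !merge_field (&r.has_lmul, &r.lmul_x8, true, lmul_x8))
        return false;
    }
  /* Policy demands simply accumulate.  */
  r.tu = r.tu || b->tu;
  r.mu = r.mu || b->mu;
  *a = r;
  return true;
}
```

Fusing VSETVL 1, 2 and 3 in turn into an empty demand yields SEW = 16, lmul_x8 = 4 (MF2), TU and MU; a later demand of SEW = 32 is rejected as incompatible.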
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p):
New function.
(after_or_same_p): Ditto.
(find_reg_killed_by): Delete.
(has_vsetvl_killed_avl_p): Ditto.
(anticipatable_occurrence_p): Refactor.
(any_set_in_bb_p): Delete.
(count_regno_occurrences): Ditto.
(backward_propagate_worthwhile_p): Ditto.
(demands_can_be_fused_p): Ditto.
(earliest_pred_can_be_fused_p): New function.
(vsetvl_dominated_by_p): Ditto.
(vector_insn_info::parse_insn): Refactor.
(vector_insn_info::merge): Refactor.
(vector_insn_info::dump): Refactor.
(vector_infos_manager::vector_infos_manager): Refactor.
(vector_infos_manager::all_empty_predecessor_p): Delete.
(vector_infos_manager::all_same_avl_p): Ditto.
(vector_infos_manager::create_bitmap_vectors): Refactor.
(vector_infos_manager::free_bitmap_vectors): Refactor.
(vector_infos_manager::dump): Refactor.
(pass_vsetvl::update_block_info): New function.
(enum fusion_type): Ditto.
(pass_vsetvl::get_backward_fusion_type): Delete.
(pass_vsetvl::hard_empty_block_p): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.
(pass_vsetvl::forward_demand_fusion): Ditto.
(pass_vsetvl::demand_fusion): Ditto.
(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
(pass_vsetvl::compute_local_properties): Ditto.
(pass_vsetvl::earliest_fusion): New function.
(pass_vsetvl::vsetvl_fusion): Ditto.
(pass_vsetvl::commit_vsetvls): Refactor.
(get_first_vsetvl_before_rvv_insns): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_earliest_vsetvls): New function.
(pass_vsetvl::df_post_optimization): Refactor.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-27.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-28.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-29.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-30.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-35.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-36.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-50.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-51.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-67.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-69.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-70.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-71.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-72.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-76.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-77.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-84.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-93.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-94.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-96.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/ffload-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-45.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-103.c: New test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-13.c: New test.

(cherry picked from commit e030af3e6f6d3ae555d6f70047ea3a2bf5744b7e)

RISC-V: Fix spill-11.c testsuite failure

Jivan's work also results in a different save/restore function being used for
the spill-11 test, so the expected output needs minor adjusting.

gcc/testsuite
* gcc.target/riscv/rvv/base/spill-11.c: Adjust expected output.

(cherry picked from commit 3745feb19ed072e0865b12a891d7dbf7ba12c337)

RISC-V: Fix spill-12 test

Jivan's recent work on IRA results in more efficient code for this test. This
adjusts the expected output for the removal of 5 instructions and conversion of
an addi into a simple mv.

gcc/testsuite
* gcc.target/riscv/rvv/base/spill-12.c: Update expected output.

(cherry picked from commit 6567837fd823a93f7f7948a73ff9dc1153592e8c)

RISC-V: Fix xtheadcondmov-indirect.c

The pressure-sensitive scheduling change perturbs the output of this test ever
so slightly. It seemed easiest to just turn that off rather than generalize the
expected output enough to work across all the relevant optimization options.

gcc/testsuite/
* gcc.target/riscv/xtheadcondmov-indirect.c: Turn off pressure
sensitive scheduling.

RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

Consider the following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < 4; i++)
    if (a[i] < min_v)
      last = i;

  return last;
}

--param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8

condition_reduction:
vsetvli a4,zero,e32,m8,ta,ma
li a5,32
vmv.v.x v8,a1
vl8re32.v v0,0(a0)
vid.v v16
vmslt.vv v0,v0,v8
vsetvli zero,a5,e8,m2,ta,ma
vcpop.m a5,v0
beq a5,zero,.L2
addi a5,a5,-1
vsetvli a4,zero,e32,m8,ta,ma
vcompress.vm v8,v16,v0
vslidedown.vx v8,v8,a5
vmv.x.s a0,v8
ret
.L2:
li a0,66
ret

--param=riscv-autovec-preference=scalable

condition_reduction:
csrr a6,vlenb
mv a2,a0
li a3,32
li a0,66
srli a6,a6,2
vsetvli a4,zero,e32,m1,ta,ma
vmv.v.x v4,a1
vid.v v1
.L4:
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli zero,a5,e32,m1,ta,ma    ----> redundant vsetvl
vle32.v v0,0(a2)
vsetvli a4,zero,e32,m1,ta,ma
slli a1,a5,2
vmv.v.x v2,a6
vmslt.vv v0,v0,v4
sub a3,a3,a5
vmv1r.v v3,v1
vadd.vv v1,v1,v2
vsetvli zero,a5,e8,mf4,ta,ma
vcpop.m a5,v0
beq a5,zero,.L3
addi a5,a5,-1
vsetvli a4,zero,e32,m1,ta,ma
vcompress.vm v2,v3,v0
vslidedown.vx v2,v2,a5
vmv.x.s a0,v2
.L3:
sext.w a0,a0
add a2,a2,a1
bne a3,zero,.L4
ret

The VLA vectorized code contains a redundant vsetvli instruction, which is a VSETVL PASS issue.

That issue is not addressed by this patch; it will be fixed soon.

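The expansion shown above (vcpop.m, branch on zero, vcompress.vm, vslidedown.vx, vmv.x.s) can be modelled lane by lane. The following is a hedged scalar sketch, assuming a bias of 0 and a caller-provided scratch buffer; `len_fold_extract_last` is a hypothetical helper name, not a GCC function.

```c
#include <stddef.h>

/* Scalar model of LEN_FOLD_EXTRACT_LAST (DEFLT, MASK, VALS, LEN) with bias 0:
   return the last VALS[i] with i < LEN and MASK[i] set, or DEFLT if no
   lane is active.  BUF must have room for LEN elements.  */
static int
len_fold_extract_last (int deflt, const unsigned char *mask,
                       const int *vals, size_t len, int *buf)
{
  size_t count = 0;
  /* vcompress.vm: pack the active elements to the front of BUF.
     COUNT ends up equal to what vcpop.m would return.  */
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      buf[count++] = vals[i];
  /* vcpop.m == 0: no active lane, so the reduction keeps its old value.  */
  if (count == 0)
    return deflt;
  /* vslidedown.vx by COUNT - 1 followed by vmv.x.s.  */
  return buf[count - 1];
}
```

In the condition_reduction example, VALS is the induction vector (vid.v), MASK is a[i] < min_v, and DEFLT is the running value of `last` (initially 66).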
gcc/ChangeLog:

* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.

(cherry picked from commit e7545cadbedfc167749d801bd574cf9fe22ed5c5)

RISC-V: Add Types to Un-Typed Sync Instructions:

Update the sync instructions so that no insn is left without a type
attribute. A total of 9 insns now have type "atomic" or type "multi",
based on the number of assembly instructions generated.

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md: Update types to "multi" or
"atomic" based on the number of assembly lines generated.
* config/riscv/sync-ztso.md: Likewise.
* config/riscv/sync.md: Likewise.

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit df177510665c4e1045bdaadf10d837f1bdc4ea06)

RISC-V: Make stack_save_restore tests more robust

Spurred by Jivan's patch and a desire for cleaner test results, I went ahead and
made the stack_save_restore tests independent of the precise stack size by
using a regexp.

gcc/testsuite/
* gcc.target/riscv/stack_save_restore_1.c: Robustify.
* gcc.target/riscv/stack_save_restore_2.c: Robustify.

(cherry picked from commit e1f096a3cc96c71907cfbc7b8baf67a3d863cb6d)

[committed] RISC-V: Fix minor testsuite problem with zicond

I thought I had already fixed this, but clearly if I did, I didn't include it
in any upstream commits.

With -Og the optimizers are hindered in various ways and this prevents using
zicond. So skip this test with -Og (it was already being skipped at -O0).

gcc/testsuite
* gcc.target/riscv/zicond-primitiveSemantics.c: Disable for -Og.

(cherry picked from commit 3cd2b73079bac374ce1c542b9c9e354e00a8713d)

[PATCH v10] RISC-V: Add support for the Zfa extension

This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb

The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html

What needs special explanation is:
1. According to the RISC-V spec, "The FCVTMOD.W.D instruction was added principally to
  accelerate the processing of JavaScript Numbers.", so it seems that no implementation
  is required.

2. The instructions FMINM and FMAXM correspond to the C23 library functions fminimum and fmaximum.
  Therefore, this patch simply implements the patterns fminm<hf\sf\df>3 and
  fmaxm<hf\sf\df>3 to prepare for later.

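For reference, fminimum and fmaximum differ from the older fmin and fmax in two ways: a NaN operand propagates to the result instead of being ignored, and -0.0 is ordered below +0.0. A portable sketch of those semantics follows; `my_fminimum` and `my_fmaximum` are hypothetical names (older C libraries may not provide the C23 functions), and which NaN comes back is not modelled (Zfa returns the canonical NaN).

```c
#include <math.h>

/* fminimum-style minimum: NaN propagates, and -0.0 < +0.0.  */
static double
my_fminimum (double x, double y)
{
  if (isnan (x) || isnan (y))
    return x + y;                   /* any NaN operand yields a NaN */
  if (x == y)                       /* also true for -0.0 == +0.0 */
    return signbit (x) ? x : y;     /* prefer the negative zero */
  return x < y ? x : y;
}

/* fmaximum-style maximum: NaN propagates, and +0.0 > -0.0.  */
static double
my_fmaximum (double x, double y)
{
  if (isnan (x) || isnan (y))
    return x + y;
  if (x == y)
    return signbit (x) ? y : x;     /* prefer the positive zero */
  return x > y ? x : y;
}
```

By contrast, fmin (NAN, 1.0) returns 1.0, which is why FMIN.S and FMINM.S need separate patterns.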
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension version, which depends on
the F extension.
* config/riscv/constraints.md (zfli): Constrain the floating point number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is
not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is
required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate
in assembly.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm<mode>3): New.
(fmaxm<mode>3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_zfa): New.
* config/riscv/riscv.opt: New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fli-1.c: New test.
* gcc.target/riscv/zfa-fli-2.c: New test.
* gcc.target/riscv/zfa-fli-3.c: New test.
* gcc.target/riscv/zfa-fli-4.c: New test.
* gcc.target/riscv/zfa-fli-6.c: New test.
* gcc.target/riscv/zfa-fli-7.c: New test.
* gcc.target/riscv/zfa-fli-8.c: New test.

Co-authored-by: Tsukasa OI <research_trasio@irq.a4lg.com>
(cherry picked from commit 30699b999e94b66ff8706d3b07a35b2a9554d10c)

RISC-V: Enable Hoist to GCSE simple constants

Hoist's want_to_gcse_p () calls rtx_cost () to compute the max distance for
hoist candidates. For a simple const (say 6, which needs a separate insn "LI 6"),
the backend currently returns 0, causing Hoist to bail and elide GCSE.

Note that constants requiring more than one insn to set up were working
fine, since riscv_rtx_costs () was returning non-zero (although that
itself might need refining: see bugzilla 111139).

To keep testsuite parity, some V tests that started failing in the new
costing regime needed updating.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_rtx_costs): Adjust const_int
cost. Add some comments about different constants handling.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Remove test
for Jump.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit b41d7eb0e14785ff0ad6e6922cbd4c880e680bf9)

RISC-V: Add early continue for ENTRY and EXIT block

Committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::compute_local_properties):
Add early continue.

(cherry picked from commit 449ab115dece8ac8e8f27d2d7b5bc653a2c75d3a)

RISC-V: Move vector-abi testcases into rvv/base folder

Resolves failures like this on rv32gcv linux:
compiler exited with status 1
output is:
In file included from /tc-baseline/build-linux-gcv/sysroot/usr/include/features.h:515,
                 from /tc-baseline/build-linux-gcv/sysroot/usr/include/bits/libc-header-start.h:33,
                 from /tc-baseline/build-linux-gcv/sysroot/usr/include/stdint.h:26,
                 from /tc-baseline/build-linux-gcv/lib/gcc/riscv32-unknown-linux-gnu/14.0.0/include/stdint.h:9,
                 from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/stdint.h:9,
                 from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/riscv_vector.h:28,
                 from /tc-baseline/gcc/gcc/testsuite/gcc.target/riscv/vector-abi-1.c:4:
/tc-baseline/build-linux-gcv/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.

Tested using:
rv{32/64}{gc/gcv} newlib
rv{32/64}gcv linux

gcc/testsuite/ChangeLog:

* gcc.target/riscv/vector-abi-1.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-1.c: ...here.
* gcc.target/riscv/vector-abi-2.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-2.c: ...here.
* gcc.target/riscv/vector-abi-3.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-3.c: ...here.
* gcc.target/riscv/vector-abi-4.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-4.c: ...here.
* gcc.target/riscv/vector-abi-5.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-5.c: ...here.
* gcc.target/riscv/vector-abi-6.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-6.c: ...here.
* gcc.target/riscv/vector-abi-7.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-7.c: ...here.
* gcc.target/riscv/vector-abi-8.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-8.c: ...here.
* gcc.target/riscv/vector-abi-9.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-9.c: ...here.

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
(cherry picked from commit 3ea624da71095cd480c31983d13db45bd9c5a738)

RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS testcases

This patch depends on the middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html

We already had COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS patterns.

Remove TARGET_PREFERRED_ELSE_VALUE since it prevents the COND_LEN_FMS/COND_LEN_FNMS STMT fold.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_preferred_else_value): Remove it since
it prevents the COND_LEN_FMS/COND_LEN_FNMS STMT fold.
(TARGET_PREFERRED_ELSE_VALUE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adapt test.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-9.c: New test.

(cherry picked from commit 1fbcae1c6452c9939a4be818a64cd01883abd80e)

RISC-V: Enable pressure-aware scheduling by default.

This patch enables pressure-aware scheduling for riscv. There have been
various requests for it, so I figured I'd just go ahead and send
the patch.

There is some slight regression in code quality for a number of
vector tests where we spill more due to different instructions order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.

As cost and scheduling models mature I expect the situation to improve
and for now I think it's generally favorable to enable pressure-aware
scheduling so we can work with it rather than trying to find every
possible problem in advance.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add -fsched-pressure.
* config/riscv/riscv.cc (riscv_option_override): Set sched
pressure algorithm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/narrow_constraint-1.c: Add
-fno-sched-pressure.
* gcc.target/riscv/rvv/base/narrow_constraint-17.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-18.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-19.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-20.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-21.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-22.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-23.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-24.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-25.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-26.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-27.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-28.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-29.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-30.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-31.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-4.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-5.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.

(cherry picked from commit a047513c9222f14adc6e5a015e038b207bb9a653)

RISC-V: Allow const 17-31 for vector shift.

This patch adds a missing constraint in order to be able to print (and
not ICE) vector immediates 17-31 for vector shifts.

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Allow vk operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-immediate.c: New test.

(cherry picked from commit b6ba0cc9339f2cc81398863ae779daa6c8853ad6)

RISC-V: Add missing conversion tests.

This adds some missing tests for vf[nw]cvt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c:
Add tests.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.

(cherry picked from commit e7aec3ae38ce740885e73255e12675174790758d)

RISC-V: Fix reduc_strict_run-1 test case.

This patch fixes the reduc_strict_run-1 testcase by introducing
a variable that holds the reference result. This is necessary
because, in the presence of _Float16 emulation, an intermediate
result used in a comparison is computed in higher precision.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c:
Add variable to hold reference result.

(cherry picked from commit 8c3146ce0ee14bc6747fb92947879d82d43f3bb2)

gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

Hi, Richard and Richi.

Currently, GCC supports COND_LEN_FMA for floating point **without** -ffast-math.
It's supported in tree-ssa-math-opts.cc. However, GCC does not support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.

Consider the following case:

  #define TEST_TYPE(TYPE)                                                      \
    __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,         \
                                                TYPE *__restrict a,           \
                                                TYPE *__restrict b, int n)    \
    {                                                                          \
      for (int i = 0; i < n; i++)                                              \
        dst[i] -= a[i] * b[i];                                                 \
    }

  #define TEST_ALL() TEST_TYPE (float)

  TEST_ALL ()

Gimple IR for RVV:

...
_39 = -vect__8.14_26;
vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, vect__4.8_34, vect__4.8_34, _46, 0);
...

This is because of the following code in tree-ssa-math-opts.cc:

      if (len)
        fma_stmt
          = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
                                        addop, else_value, len, bias);
      else if (cond)
        fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
                                               op2, addop, else_value);
      else
        fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
      gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
      gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
                                                                   use_stmt));
      gsi_replace (&gsi, fma_stmt, true);
      /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
         regardless of where the negation occurs.  */
      gimple *orig_stmt = gsi_stmt (gsi);
      if (fold_stmt (&gsi, follow_all_ssa_edges))
        {
          if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
            gcc_unreachable ();
          update_stmt (gsi_stmt (gsi));
        }

'fold_stmt' fails to fold NEGATE_EXPR + COND_LEN_FMA into COND_LEN_FNMA.

This patch supports folding the statement into:

vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, vect__4.8_34, { 0.0, ... }, _46, 0);

Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.

So extend the maximum number of operands:
-  static const unsigned int MAX_NUM_OPS = 5;
+  static const unsigned int MAX_NUM_OPS = 7;
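To see why the fold is valid, here is a hedged scalar model of the four conditional, length-controlled ternary operations, assuming the operand order (COND, A, B, C, ELSE, LEN, BIAS) seen in the IR above. `cond_len_tern` and `enum tern_op` are hypothetical names. The arithmetic is unfused, which is exact for the small integer values in the check, and in this sketch every inactive lane simply receives the ELSE value.

```c
#include <stddef.h>

/* FMA = a*b + c, FMS = a*b - c, FNMA = -a*b + c, FNMS = -a*b - c.  */
enum tern_op { OP_FMA, OP_FMS, OP_FNMA, OP_FNMS };

static double
tern (enum tern_op op, double a, double b, double c)
{
  switch (op)
    {
    case OP_FMA:  return a * b + c;
    case OP_FMS:  return a * b - c;
    case OP_FNMA: return -(a * b) + c;
    default:      return -(a * b) - c;   /* OP_FNMS */
    }
}

/* Lane i is computed only if i < LEN + BIAS and COND[i] is set;
   otherwise it takes ELSE_VAL[i].  */
static void
cond_len_tern (enum tern_op op, double *lhs, const unsigned char *cond,
               const double *a, const double *b, const double *c,
               const double *else_val, size_t nunits, long len, long bias)
{
  for (size_t i = 0; i < nunits; i++)
    lhs[i] = ((long) i < len + bias && cond[i])
             ? tern (op, a[i], b[i], c[i]) : else_val[i];
}
```

Because negation is exact, A * (-B) + C equals -(B * A) + C lane by lane, which is the rewrite from COND_LEN_FMA with a negated multiplicand to COND_LEN_FNMA shown above.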

Bootstrap and Regtest on X86 passed.
Tested on aarch64 Qemu.

Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.

gcc/ChangeLog:

* genmatch.cc (decision_tree::gen): Support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
* gimple-match-exports.cc (gimple_simplify): Ditto.
(gimple_resimplify6): New function.
(gimple_resimplify7): New function.
(gimple_match_op::resimplify): Support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
(convert_conditional_op): Ditto.
(build_call_internal): Ditto.
(try_conditional_simplification): Ditto.
(gimple_extract): Ditto.
* gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
* internal-fn.cc (CASE): Ditto.

VECT: Apply LEN_FOLD_EXTRACT_LAST into loop vectorizer

Hi.

This patch applies LEN_FOLD_EXTRACT_LAST in the loop vectorizer.

Consider the following case:

/* Simple condition reduction.  */

int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < N; i++)
    if (a[i] < min_v)
      last = i;

  return last;
}

With this patch, we can generate the following IR:

  _44 = .SELECT_VL (ivtmp_42, POLY_INT_CST [4, 4]);
  _34 = vect_vec_iv_.5_33 + { POLY_INT_CST [4, 4], ... };
  ivtmp_36 = _44 * 4;
  vect__4.8_39 = .MASK_LEN_LOAD (vectp_a.6_37, 32B, { -1, ... }, _44, 0);

  mask__11.9_41 = vect__4.8_39 < vect_cst__40;
  last_5 = .LEN_FOLD_EXTRACT_LAST (last_14, mask__11.9_41, vect_vec_iv_.5_33, _44, 0);
  ...

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_reduction): Apply
LEN_FOLD_EXTRACT_LAST.
* tree-vect-stmts.cc (vectorizable_condition): Ditto.

(cherry picked from commit a28d4fce8ec2540259a257149de7081f27fb027e)

tree-optimization/111128 - fix shift pattern recog

The following fixes placement of shift operand sanitization with
MIN when the original shift operand was external but the actual
one is not.
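As a hedged illustration of why the shift amount is sanitized with MIN at all, the sketch below compares a 32-bit arithmetic right shift of a sign-extended 16-bit value with a 16-bit-lane version whose amount is clamped to MIN (s, 15). The helper names are hypothetical, the code relies on GCC's arithmetic right shift of negative values (implementation-defined in ISO C), and it does not model where the vectorizer emits the MIN statement, which is what this fix changes.

```c
#include <stdint.h>

/* Reference: sign-extend X to 32 bits, then shift right by S (0..31).  */
static int16_t
wide_shift (int16_t x, unsigned s)
{
  return (int16_t) ((int32_t) x >> s);
}

/* Narrowed form: stay within 16-bit lanes, clamping S to the lane
   precision minus one so the shift amount never reaches the lane width.  */
static int16_t
narrow_shift (int16_t x, unsigned s)
{
  unsigned amt = s < 15 ? s : 15;   /* MIN (s, precision - 1) */
  return (int16_t) (x >> amt);
}
```

For any shift of 16 or more, both forms produce 0 for non-negative values and -1 for negative ones, so the clamp preserves the result.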

PR tree-optimization/111128
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Emit external shift operand inline if we promoted it with
another pattern stmt.

* gcc.dg/torture/pr111128.c: New testcase.

(cherry picked from commit 7b67cab154d4b5ec2a6bb62755da31cefbe63536)

RISC-V: Fix one typo in autovec.md pattern comment

vfmsac => vfnmacc
vfmsub => vfnmadd

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec.md: Fix typo.

(cherry picked from commit 1c51805e2468bc10057bc0f2fc12fab909d21d04)

RISC-V: Refactor RVV class by frm_op_type template arg

As suggested by Kito, add a new frm_op_type template argument to the
op classes to avoid duplicating the expand functions.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): Removed.
(class reverse_binop_frm): Ditto.
(class widen_binop_frm): Ditto.
(class vfmacc_frm): Ditto.
(class vfnmacc_frm): Ditto.
(class vfmsac_frm): Ditto.
(class vfnmsac_frm): Ditto.
(class vfmadd_frm): Ditto.
(class vfnmadd_frm): Ditto.
(class vfmsub_frm): Ditto.
(class vfnmsub_frm): Ditto.
(class vfwmacc_frm): Ditto.
(class vfwnmacc_frm): Ditto.
(class vfwmsac_frm): Ditto.
(class vfwnmsac_frm): Ditto.
(class unop_frm): Ditto.
(class vfrec7_frm): Ditto.
(class binop): Add frm_op_type template arg.
(class unop): Ditto.
(class widen_binop): Ditto.
(class widen_binop_fp): Ditto.
(class reverse_binop): Ditto.
(class vfmacc): Ditto.
(class vfnmsac): Ditto.
(class vfmadd): Ditto.
(class vfnmsub): Ditto.
(class vfnmacc): Ditto.
(class vfmsac): Ditto.
(class vfnmadd): Ditto.
(class vfmsub): Ditto.
(class vfwmacc): Ditto.
(class vfwnmacc): Ditto.
(class vfwmsac): Ditto.
(class vfwnmsac): Ditto.
(class float_misc): Ditto.

(cherry picked from commit 0345152f922c3a58ae0a8ee014e37dcfab35592c)

Improve quality of code from LRA register elimination

This is primarily Jivan's work, I'm mostly responsible for the write-up and
coordinating with Vlad on a few questions.

On targets with limitations on immediates usable in arithmetic instructions,
LRA's register elimination phase can construct fairly poor code.

This example (from the GCC testsuite) illustrates the problem well.

int  consume (void *);
int foo (void) {
  int x[1000000];
  return consume (x + 1000);
}

If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc -mabi=lp64d", then
you'll get this code (up to the call to consume()).

        .cfi_startproc
        li      t0,-4001792
        li      a0,-3997696
        li      a5,4001792
        addi    sp,sp,-16
        .cfi_def_cfa_offset 16
        addi    t0,t0,1792
        addi    a0,a0,1696
        addi    a5,a5,-1792
        sd      ra,8(sp)
        add     a5,a5,a0
        add     sp,sp,t0
        .cfi_def_cfa_offset 4000016
        .cfi_offset 1, -8
        add     a0,a5,sp
        call    consume

Of particular interest is the value in a0 when we call consume.  We compute it
horribly inefficiently.  If we back-substitute from the final assignment to a0,
we get...

a0 = a5 + sp
a0 = a5 + (sp + t0)
a0 = (a5 + a0) + (sp + t0)
a0 = ((a5 - 1792) + a0) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
a0 = (a5 + (a0 + 1696)) + (sp + t0)  // removed offsetting terms
a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
a0 = (-3997696 + 1696) + (sp - 16) // removed offsetting terms
a0 = sp - 3996016

That's a pretty convoluted way to compute sp - 3996016.

Something like this would be notably better (not great, but we need both the
stack adjustment and the address of the object to pass to consume):

   addi sp,sp,-16
   sd ra,8(sp)
   li t0,-4001792
   addi t0,t0,1792
   add sp,sp,t0
   li a0,4096
   addi a0,a0,-96
   add a0,sp,a0
   call consume
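
The back-substitution can be checked by replaying both RV64 sequences with plain 64-bit integer arithmetic. The sketch below is a hand translation of the two listings. The names `original_seq`, `improved_seq` and `check_sequences_agree` are illustrative. The `sd` store and the `.cfi` directives are left out because they do not change sp, a0, a5 or t0.

```c
#include <assert.h>
#include <stdint.h>

/* SP0 is the incoming stack pointer.  Each function returns the value
   left in a0 (the argument to consume) and stores the final sp
   through SP_OUT.  */
static int64_t
original_seq (int64_t sp0, int64_t *sp_out)
{
  int64_t sp = sp0;
  int64_t t0 = -4001792;        /* li   t0,-4001792 */
  int64_t a0 = -3997696;        /* li   a0,-3997696 */
  int64_t a5 = 4001792;         /* li   a5,4001792  */
  sp += -16;                    /* addi sp,sp,-16   */
  t0 += 1792;                   /* addi t0,t0,1792  */
  a0 += 1696;                   /* addi a0,a0,1696  */
  a5 += -1792;                  /* addi a5,a5,-1792 */
  a5 += a0;                     /* add  a5,a5,a0    */
  sp += t0;                     /* add  sp,sp,t0    */
  a0 = a5 + sp;                 /* add  a0,a5,sp    */
  *sp_out = sp;
  return a0;
}

static int64_t
improved_seq (int64_t sp0, int64_t *sp_out)
{
  int64_t sp = sp0;
  sp += -16;                    /* addi sp,sp,-16   */
  int64_t t0 = -4001792;        /* li   t0,-4001792 */
  t0 += 1792;                   /* addi t0,t0,1792  */
  sp += t0;                     /* add  sp,sp,t0    */
  int64_t a0 = 4096;            /* li   a0,4096     */
  a0 += -96;                    /* addi a0,a0,-96   */
  a0 = sp + a0;                 /* add  a0,sp,a0    */
  *sp_out = sp;
  return a0;
}

/* Both sequences must leave the same sp and pass the same a0.  */
static void
check_sequences_agree (int64_t sp0)
{
  int64_t sp_a, sp_b;
  int64_t a0_a = original_seq (sp0, &sp_a);
  int64_t a0_b = improved_seq (sp0, &sp_b);
  assert (a0_a == a0_b && sp_a == sp_b);
}
```

In both sequences the final sp is the incoming sp minus 4000016. The value passed in a0 is that final sp plus 4000, which is x + 1000 for 4-byte int. Relative to the incoming sp, this is sp - 3996016.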

The problem is that LRA's elimination code does not handle the case where we
have (plus (reg1) (reg2)), where reg1 is an eliminable register and reg2 has a
known equivalence, particularly a constant.

If we can determine that reg2 is equivalent to a constant and treat (plus
(reg1) (reg2)) in the same way we'd treat (plus (reg1) (const_int)) then we can
get the desired code.
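
The idea can be shown with a toy model. It uses hypothetical types and names, not LRA's real data structures. Eliminating reg1 replaces it with sp plus an elimination offset. When reg2 is known to equal a constant, that constant can be folded into the same offset, exactly as for (plus (reg1) (const_int)). The result is a single sp-relative constant instead of an expression that keeps reg2 live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of eliminating reg1 in (plus (reg1) (reg2)).  */
struct elim_result
{
  bool folded;      /* true: the whole expression became sp + offset */
  int64_t offset;   /* constant added to sp */
};

/* reg1 is eliminated to sp + ELIM_OFFSET.  If reg2 has a known
   constant equivalence REG2_CONST, fold it in so the expression is a
   single (plus (sp) (const_int)).  Otherwise the result stays
   sp + ELIM_OFFSET + reg2, which keeps reg2 live and still needs the
   elimination offset materialized.  */
static struct elim_result
eliminate_plus_reg_reg (int64_t elim_offset, bool reg2_has_const_equiv,
                        int64_t reg2_const)
{
  struct elim_result r = { false, elim_offset };
  if (reg2_has_const_equiv)
    {
      /* Guard against signed overflow when combining the offsets.  */
      assert (reg2_const >= 0 ? elim_offset <= INT64_MAX - reg2_const
                              : elim_offset >= INT64_MIN - reg2_const);
      r.folded = true;
      r.offset = elim_offset + reg2_const;
    }
  return r;
}
```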

This eliminates about 19 billion dynamic instructions, or roughly 1%, for deepsjeng on rv64.
There are improvements elsewhere, but they're relatively small.  This may
ultimately lessen the value of Manolis's fold-mem-offsets patch.  So we'll have
to evaluate that again once he posts a new version.

Bootstrapped and regression tested on x86_64 as well as bootstrapped on rv64.
Earlier versions have been tested against spec2017.  Pre-approved by Vlad in a
private email conversation (thanks Vlad!).

Committed to the trunk.

gcc/
* lra-eliminations.cc (eliminate_regs_in_insn): Use equivalences to
help simplify code further.

(cherry picked from commit 6619b3d4c15cd754798b1048c67f3806bbcc2e6d)

RISC-V: Add a more appropriate type attribute

Commit https://github.com/gcc-mirror/gcc/commit/07e2576d6f3 gave the clz, ctz,
and pcnt operations a more accurate type attribute, so the same type attribute
should be used here as well.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*<bitmanip_optab>disi2_sext): Add a more
appropriate type attribute.

(cherry picked from commit 18befd6f050e70f11ecca1dd58624f0ee3c68cc7)

RISC-V: Add conditional unary neg/abs/not autovec patterns

Hi,

This patch adds conditional unary neg/abs/not autovec patterns to the RISC-V backend.
For this C code:

void
test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n)
{
  for (int i = 0; i < n; i += 1)
    {
      a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i];
    }
}

Before this patch:
        ...
        vsetvli a7,zero,e32,m1,ta,ma
        vfabs.v v2,v2
        vmerge.vvm      v1,v1,v2,v0
        ...

After this patch:
        ...
        vsetvli a7,zero,e32,m1,ta,mu
        vfabs.v v1,v2,v0.t
        ...

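The two sequences are equivalent because the masked vfabs.v runs under the mask-undisturbed (mu) policy. Under mu, inactive destination elements keep their previous value, which is a[i]. The unmasked sequence can use ma, because its vmerge.vvm selects every element explicitly. The scalar sketch below models one strip of elements. The mask is represented as an int array, and the names are illustrative, not GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Before: vfabs.v over every element, then vmerge.vvm picks |b[i]|
   where the mask is set and keeps a[i] elsewhere.  */
static void
abs_then_merge (float *a, const float *b, const int *pred, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float tmp = __builtin_fabsf (b[i]);   /* vfabs.v v2,v2 */
      a[i] = pred[i] ? tmp : a[i];          /* vmerge.vvm v1,v1,v2,v0 */
    }
}

/* After: one masked vfabs.v with the mu policy; inactive elements of
   the destination (a[i]) are left undisturbed.  */
static void
masked_abs (float *a, const float *b, const int *pred, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (pred[i])
      a[i] = __builtin_fabsf (b[i]);        /* vfabs.v v1,v2,v0.t */
}

/* Runs both models on copies of A and checks that they agree.
   Assumes n <= 16 and no NaNs, since results are compared with ==.  */
static void
check_fusion (const float *a, const float *b, const int *pred, size_t n)
{
  float a1[16], a2[16];
  assert (n <= 16);
  for (size_t i = 0; i < n; i++)
    a1[i] = a2[i] = a[i];
  abs_then_merge (a1, b, pred, n);
  masked_abs (a2, b, pred, n);
  for (size_t i = 0; i < n; i++)
    assert (a1[i] == a2[i]);
}
```
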
For the integer neg/not and FP neg patterns, defining the corresponding cond_xxx
patterns is enough.
For the FP abs pattern, we need to change the definitions of `abs<mode>2` and
`@vcond_mask_<mode><vm>` from define_expand to define_insn_and_split so that
the combine pass can fuse them into the new pattern `*cond_abs<mode>`.
The fusion proceeds roughly as follows:

(insn 30 29 31 4 (set (reg:RVVM1SF 152 [ vect_iftmp.15 ])
        (abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))) "float.c":15:56 discrim 1 12799 {absrvvm1sf2}
     (expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
        (nil)))

(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
        (if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
            (reg:RVVM1SF 152 [ vect_iftmp.15 ])
            (reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 12707 {vcond_mask_rvvm1sfrvvmf32bi}
     (expr_list:REG_DEAD (reg:RVVM1SF 152 [ vect_iftmp.15 ])
        (expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
            (expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
                (nil)))))
==>

(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
        (if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
            (abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))
            (reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 13444 {*cond_absrvvm1sf}
     (expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
        (expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
            (expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
                (nil)))))

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_abs<mode>): New combine pattern.
(*copysign<mode>_neg): Ditto.
* config/riscv/autovec.md (@vcond_mask_<mode><vm>): Adjust.
(<optab><mode>2): Ditto.
(cond_<optab><mode>): New.
(cond_len_<optab><mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New.
(expand_cond_len_unop): New helper func.
* config/riscv/riscv-v.cc (shuffle_merge_patterns): Adjust.
(expand_cond_len_unop): New helper func.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.

(cherry picked from commit 92f2ec417c57e980b92b8966226fc2bfbf042af8)

RISC-V: Fix potential ICE of global vsetvl elimination

Committed separately ahead of the upcoming VSETVL refactor patch, to make V2 of
that patch easier to review.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(pass_vsetvl::global_eliminate_vsetvl_insn): Fix potential ICE.

(cherry picked from commit 3beef5e6b5b12b5c90040c8485f1836e2dd6cf83)

RISC-V: Fix VTYPE fuse rule bug

This bug was exposed by the refactor patch.
The fix has been separated out and committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (ge_sew_ratio_unavailable_p):
Fix fuse rule bug.
* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Ditto.

(cherry picked from commit 29487eb237b893c673e9ecc6409b175e22792f13)