gcc.gnu.org Git - gcc.git/log

RISC-V: Update Types for Vector Instructions

Adds types to vector instructions that were added after or were
missed by the original patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628594.html

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Update types
* config/riscv/autovec.md: likewise

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit aa512cc0146d1be957ccc35a8f4a45ebad0de598)

RISC-V: Enable RVV scalable vectorization by default[PR111311]

This patch is not ready but they all will be fixed very soon.

gcc/ChangeLog:

PR target/111311
* config/riscv/riscv.opt: Set default as scalable vectorization.

(cherry picked from commit 88a0a883960910530bfefa750461168f539f4a00)

RISC-V: Remove redundant functions

I just finished V2 version of LMUL cost model.
Turns out we don't these redundant functions.

Remove them.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (get_all_predecessors): Remove.
(get_all_successors): Ditto.
* config/riscv/riscv-v.cc (get_all_predecessors): Ditto.
(get_all_successors): Ditto.

(cherry picked from commit 48d4ab698036de859e194edc037faed2ef9b58a5)

RISC-V: Use dominance analysis in global vsetvl elimination

I found that it's more reasonable to use existing dominance analysis.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::global_eliminate_vsetvl_insn):
Use dominance analysis.
(pass_vsetvl::init): Ditto.
(pass_vsetvl::done): Ditto.

(cherry picked from commit 7f9083ffe262cb14c49d042fc6363514badea6cb)

RISC-V: Add VLS modes VEC_PERM support[PR111311]

This patch add VLS modes VEC_PERM support which fix these following
FAILs in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311:

FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 1

These FAILs are fixed after this patch.

PR target/111311

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS modes.
* config/riscv/riscv-protos.h (cmp_lmul_le_one): New function.
(cmp_lmul_gt_one): Ditto.
* config/riscv/riscv-v.cc (cmp_lmul_le_one): Ditto.
(cmp_lmul_gt_one): Ditto.
* config/riscv/riscv.cc (riscv_print_operand): Add VLS modes.
(riscv_vectorize_vec_perm_const): Ditto.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-7.c: New test.

(cherry picked from commit d05aac047e0643d5c32b706c4c3b12e13f35e19a)

RISC-V: Add missing VLS mask bool mode reg -> reg patterns

Committed.

gcc/ChangeLog:

* config/riscv/autovec-vls.md (*mov<mode>_vls): New pattern.
* config/riscv/vector-iterators.md: New iterator

(cherry picked from commit 4ab2520ec424fa097ec839f2cde33522b220e93a)

RISC-V: Expand fixed-vlmax/vls vector permutation in targethook

When debugging FAIL: gcc.dg/pr92301.c execution test.
Realize a vls vector permutation situation failed to vectorize since early return false:

-  /* For constant size indices, we dont't need to handle it here.
-     Just leave it to vec_perm<mode>.  */
-  if (d->perm.length ().is_constant ())
-    return false;

To avoid more potential failed vectorization case. Now expand it in targethook.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_generic_patterns): Expand
fixed-vlmax/vls vector permutation.

(cherry picked from commit 108779056eb4b56e715a094fac48a699d2dc91b3)

RISC-V: Avoid unnecessary slideup in compress pattern of vec_perm

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_compress_patterns): Avoid unnecessary slideup.

(cherry picked from commit e390872aebcfebb7c9bc95d8fb7e44f2bec996d3)

RISC-V: Fix dump FILE of VSETVL PASS[PR111311]

To make the dump FILE not too big, add TDF_DETAILS.

This patch fix these following FAILs in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

FAIL: gcc.c-torture/unsorted/dump-noaddr.c.*r.vsetvl, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions comparison
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.*r.vsetvl, -O3 -g comparison

gcc/ChangeLog:

PR target/111311
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::vsetvl_fusion): Add TDF_DETAILS.
(pass_vsetvl::pre_vsetvl): Ditto.
(pass_vsetvl::init): Ditto.
(pass_vsetvl::lazy_vsetvl): Ditto.

(cherry picked from commit 0d50facd937bda26e3083046dc5dec8fca47e1e6)

RISC-V: Fix VLS floating-point operations predicate

VLS vfadd should depend on ZVFH instead of ZVFHMIN.
Obvious fix and committed.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix floating-point operations predicate.

(cherry picked from commit df9a25384e6c484643b48b59b4e6e07504889b61)

Support folding min(poly,poly) to const

This patch adds support that tries to fold `MIN (poly, poly)` to
a constant. Consider the following C Code:

```
void foo2 (int* restrict a, int* restrict b, int n)
{
    for (int i = 0; i < 3; i += 1)
      a[i] += b[i];
}
```

Before this patch:

```
void foo2 (int * restrict a, int * restrict b, int n)
{
  vector([4,4]) int vect__7.27;
  vector([4,4]) int vect__6.26;
  vector([4,4]) int vect__4.23;
  unsigned long _32;

  <bb 2> [local count: 268435456]:
  _32 = MIN_EXPR <3, POLY_INT_CST [4, 4]>;
  vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, _32, 0);
  vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, _32, 0);
  vect__7.27_9 = vect__6.26_15 + vect__4.23_20;
  .MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, _32, 0, vect__7.27_9); [tail call]
  return;

}
```

After this patch:

```
void foo2 (int * restrict a, int * restrict b, int n)
{
  vector([4,4]) int vect__7.27;
  vector([4,4]) int vect__6.26;
  vector([4,4]) int vect__4.23;

  <bb 2> [local count: 268435456]:
  vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, 3, 0);
  vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, 3, 0);
  vect__7.27_9 = vect__6.26_15 + vect__4.23_20;
  .MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, 3, 0, vect__7.27_9); [tail call]
  return;

}
```

For RISC-V RVV, csrr and branch instructions can be reduced:

Before this patch:

```
foo2:
        csrr    a4,vlenb
        srli    a4,a4,2
        li      a5,3
        bleu    a5,a4,.L5
        mv      a5,a4
.L5:
        vsetvli zero,a5,e32,m1,ta,ma
        ...
```

After this patch.

```
foo2:
vsetivli zero,3,e32,m1,ta,ma
        ...
```

gcc/ChangeLog:

* fold-const.cc (can_min_p): New function.
(poly_int_binop): Try fold MIN_EXPR.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/div-1.c: Adjust.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/fold-min-poly.c: New test.

(cherry picked from commit 7547f65f60c0bbf8de704c569c92c7a0e31a6175)

riscv: xtheadbb: Fix extendqi<SUPERQI> insn

Recently three SPEC CPU 2017 benchmarks broke when using xtheadbb:
* 500.perlbench_r
* 525.x264_r
* 557.xz_r

Tracing the issue down revealed, that we emit a 'th.ext xN,xN,15,0'
for a extendqi<SUPERQI> insn, which is obviously wrong.
This patch splits the common 'extend<SHORT:mode><SUPERQI:mode>2_th_ext'
insn into two 'extendqi<SUPERQI>' and 'extendhi<SUPERQI>' insns,
which emit the right extension instruction.
Additionally, this patch adds test cases for these insns.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext):
Remove broken INSN.
(*extendhi<SUPERQI:mode>2_th_ext): New INSN.
(*extendqi<SUPERQI:mode>2_th_ext): New INSN.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-ext-2.c: New test.
* gcc.target/riscv/xtheadbb-ext-3.c: New test.

(cherry picked from commit d8bdc978dc9cd4a6210997edacedb954375af70d)

riscv: thead: Fix mode attribute for extension patterns

The mode attribute of an extension pattern is usually set to the target type.
Let's follow this convention consistently for xtheadbb.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/thead.md: Use more appropriate mode attributes
for extensions.

(cherry picked from commit 0e25761b373f075a41d43b9462366a653dbf1121)

riscv: bitmanip: Remove duplicate zero_extendhi<GPR:mode>2 pattern

We currently have two identical zero_extendhi<GPR:mode>2 patterns:
* '*zero_extendhi<GPR:mode>2_zbb'
* '*zero_extendhi<GPR:mode>2_bitmanip'

This patch removes the *_zbb pattern and ensures that all sign- and
zero-extensions use the postfix '_bitmanip'.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/bitmanip.md (*extend<SHORT:mode><SUPERQI:mode>2_zbb):
Rename postfix to _bitmanip.
(*extend<SHORT:mode><SUPERQI:mode>2_bitmanip): Renamed pattern.
(*zero_extendhi<GPR:mode>2_zbb): Remove duplicated pattern.

(cherry picked from commit 0c37fef39fa0a8f77ea4fc67d1bbf5067d4bddb9)

RISC-V: Suppress bogus warning for VLS types

This patch fixes over 100+ bogus FAILs due to experimental vector ABI warning.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_pass_in_vector_p): Only allow RVV type.

(cherry picked from commit a0e042d61dadc6bdcbeaa3b712b7a83415a12547)

RISC-V: Fix incorrect nregs calculation for VLS modes

This patch fixes obvious bug: TARGET_MIN_VLEN is bitsize.

All these following bugs are fixed with this patch:
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O0  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O1  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O3 -g  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O3 -g  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -Os  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/mov-13.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/mov-13.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-1.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-1.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-2.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-3.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-3.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-4.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-4.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-5.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-5.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-6.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-6.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-sp-adjust.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-sp-adjust.c (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_hard_regno_nregs): Fix bug.

(cherry picked from commit f9cb357ae962ba2922b8507f4d96227780a063b9)

RISC-V: Add VLS mask modes mov patterns

This patterns fix these following ICE FAILs when running the whole GCC testsuite
with enabling scalable vector by default.

All of these FAILs are fixed:
FAIL: c-c++-common/opaque-vector.c  -std=c++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -std=c++14 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -std=c++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -std=c++17 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -std=c++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -std=c++20 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -std=c++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -std=c++98 (test for excess errors)
FAIL: c-c++-common/pr105998.c  -std=c++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -std=c++14 (test for excess errors)
FAIL: c-c++-common/pr105998.c  -std=c++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -std=c++17 (test for excess errors)
FAIL: c-c++-common/pr105998.c  -std=c++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -std=c++20 (test for excess errors)
FAIL: c-c++-common/pr105998.c  -std=c++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -std=c++98 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -std=c++14 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -std=c++17 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -std=c++20 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -std=c++98 (test for excess errors)
FAIL: g++.dg/ext/vector36.C  -std=gnu++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/ext/vector36.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/ext/vector36.C  -std=gnu++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/ext/vector36.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/ext/vector36.C  -std=gnu++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/ext/vector36.C  -std=gnu++20 (test for excess errors)
FAIL: g++.dg/ext/vector36.C  -std=gnu++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/ext/vector36.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/pr58950.C  -std=gnu++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/pr58950.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/pr58950.C  -std=gnu++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/pr58950.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/pr58950.C  -std=gnu++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/pr58950.C  -std=gnu++20 (test for excess errors)
FAIL: g++.dg/pr58950.C  -std=gnu++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/pr58950.C  -std=gnu++98 (test for excess errors)
FAIL: c-c++-common/torture/builtin-shufflevector-2.c   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/torture/vector-compare-2.c   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/torture/vector-compare-2.c   -O0  (test for excess errors)
FAIL: g++.dg/torture/pr104450.C   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/torture/pr104450.C   -O0  (test for excess errors)

FAIL: gcc.dg/analyzer/pr96713.c (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/analyzer/pr96713.c (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -Wc++-compat  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/pr105998.c  -Wc++-compat  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -Wc++-compat  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -Wc++-compat  (test for excess errors)
FAIL: gcc.dg/pr100239.c (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/pr100239.c (test for excess errors)
FAIL: gcc.dg/pr97238.c (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/pr97238.c (test for excess errors)
FAIL: c-c++-common/torture/builtin-shufflevector-2.c   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/torture/pr70310.c   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/torture/pr70310.c   -O0  (test for excess errors)

gcc/ChangeLog:

* config/riscv/autovec-vls.md: Add VLS mask modes mov patterns.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.

(cherry picked from commit 6aba1fa7a4ceaf66adc587da23834d1f317f871d)

RISC-V: Remove incorrect earliest vsetvl post optimization[PR111313]

This patch removes the incorrect earliest poset vsetvl optimization,
such bug was found in vect-double-reduc-5.c which is runtime(execution fail) and also in PR111313.

For VLMAX intrinsics, we always emit a bogus patter which is vlmax_avl (see vector.md) to
occupy a scalar register which is used by the following RVV instruction which is VLMAX AVL.

Then for O2, O3, Ofast, earliest LCM works so well.
However, for O1, the vlmax_avl is not well optimized in the before pass which confused LCM earliest
so that we will end up with some redundant vsetvli zero,zero instructions in O1. (Note that O2 O3 Ofast are all good).

To elide those redundant vsetvli zero,zero, I added cleanup_earliest_vsetvls to elide those redundant vsetvls.

Now, after I review the implementation of this post optimizaiton again, I found it is incorrect and it is hard to
do the post optimizations for vsetvls that earliest LCM failed to eliminate.

Besides, such performance issues only happen in O1 or O0, such issues may not be serious.
So remove it and we may will find another way (E.g. adjust vlmax_avl pattern COST)
to optimize it if we really need to care about performance for O1.

PR target/111313

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::cleanup_earliest_vsetvls): Remove.
(pass_vsetvl::df_post_optimization): Remove incorrect function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-13.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-17.c: Skip check for O1.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-18.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-19.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-20.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-13.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-16.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-17.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-18.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-19.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-20.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-21.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-22.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-23.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-24.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-25.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-26.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-27.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-28.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111313.c: New test.

(cherry picked from commit 572abb52f5761a647035ee39d0e443c1c3622e75)

RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

'XVentanaCondOps' is a vendor extension from Ventana Micro Systems
containing two instructions for conditional move and will be supported on
their Veyron V1 CPU.

And most notably (for historical reasons), 'XVentanaCondOps' and the
standard 'Zicond' extension are functionally equivalent (only encodings and
instruction names are different).

*   czero.eqz == vt.maskc
*   czero.nez == vt.maskcn

This commit adds support for the 'XVentanaCondOps' extension by extending
'Zicond' extension support.  With this, we can now reuse the optimization
using the 'Zicond' extension for the 'XVentanaCondOps' extension.

The specification for the 'XVentanaCondOps' extension is based on:
<https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.1/ventana-custom-extensions-v1.0.1.pdf>

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_flag_table):
Parse 'XVentanaCondOps' extension.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): New.
(TARGET_XVENTANACONDOPS): Ditto.
(TARGET_ZICOND_LIKE): New to represent targets with conditional
moves like 'Zicond'.  It includes RV64 + 'XVentanaCondOps'.
* config/riscv/riscv.cc (riscv_rtx_costs): Replace TARGET_ZICOND
with TARGET_ZICOND_LIKE.
(riscv_expand_conditional_move): Ditto.
* config/riscv/riscv.md (mov<mode>cc): Replace TARGET_ZICOND with
TARGET_ZICOND_LIKE.
* config/riscv/riscv.opt: Add new riscv_xventana_subext.
* config/riscv/zicond.md: Modify description.
(eqz_ventana): New to match corresponding czero instructions.
(nez_ventana): Ditto.
(*czero.<eqz>.<GPR><X>): Emit a 'XVentanaCondOps' instruction if
'Zicond' is not available but 'XVentanaCondOps' + RV64 is.
(*czero.<eqz>.<GPR><X>): Ditto.
(*czero.eqz.<GPR><X>.opt1): Ditto.
(*czero.nez.<GPR><X>.opt2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-primitiveSemantics.c: New test,
* gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c: New
test to make sure that XVentanaCondOps instructions are disabled
on RV32.
* gcc.target/riscv/xventanacondops-xor-01.c: New test,

(cherry picked from commit af88776caa20342482b11ccb580742a46c621250)

RISC-V: Fix incorrect mode tieable which cause ICE in RA[PR111296]

This patch fix incorrect mode tieable between DI and V2SI which cause ICE
in RA.

gcc/ChangeLog:

PR target/111296
* config/riscv/riscv.cc (riscv_modes_tieable_p): Fix incorrect mode
tieable for RVV modes.

gcc/testsuite/ChangeLog:

PR target/111296
* g++.target/riscv/rvv/base/pr111296.C: New test.

(cherry picked from commit 6b96de22d6bcadb45530c1898b264e4738afa4fd)

RISC-V: Fix VSETVL PASS AVL/VL fetch bug[111295]

Fix bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111295

gcc/ChangeLog:

PR target/111295
* config/riscv/riscv-vsetvl.cc (insert_vsetvl): Bug fix.

gcc/testsuite/ChangeLog:

PR target/111295
* gcc.target/riscv/rvv/autovec/pr111295.c: New test.

(cherry picked from commit 1b4c70d4271a00514ae20970d483c3b78d9d66ef)

RISC-V: Remove unreasonable TARGET_64BIT for VLS modes with size = 64bit

Previously, I add TARGET_64BIT condtion to block VLS modes with size = 64bit in RV32 system
E.g. V8QI

Since I realized such modes may cause inferior codegen for some situations in RV32 system.

However, this is really quite ugly and it cause ICE for some cases in RV32:

FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (internal compiler error: in require, at machmode.h:313)
3937FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (test for excess errors)

For inferior codegen in RV32 system, we should try another reasonable approach to fix it.

Remove those TARGET_64BIT and fix ICE.

gcc/ChangeLog:

* config/riscv/riscv-vector-switch.def (VLS_ENTRY): Remove TARGET_64BIT

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl1024b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl2048b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl256b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl4096b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl512b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl1024b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl2048b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl256b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl4096b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl512b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: Ditto.

(cherry picked from commit ee21f79f72980732214156bae2eb5daf7e089bda)

RISC-V: Fix incorrect folder for VRGATHERI16 test case

Put the test file to the incorrect folder, this patch would like to
fix it.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/intrisinc-vrgatherei16.c: Moved to...
* gcc.target/riscv/rvv/base/intrisinc-vrgatherei16.c: ...here.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit 0574a19047fa66f26a38e79c1b9ae6a8207bba89)

riscv: xtheadbb: Fix xtheadbb-li-rotr test for rv32

The test was introduced recently and tests a RV64-only feature.
However, when testing an RV32 compiler, the test gets executed as well
and fails with "cc1: error: ABI requires '-march=rv32'".
This patch fixes this by adding '-mabi=lp64' (like it is done for
other RV64-only tests as well).

Retested with RV32 and RV64 to ensure this won't pop up again.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: Don't run for RV32.

(cherry picked from commit 57d1c9c1fe57a0de66e5c20538f77f49b1298071)

RISC-V: Keep vlmax vector operators in simple form until split1 pass

This patch keep vlmax vector pattern in simple before split1 pass which
will allow more optimization (e.g. combine) before split1 pass.
This patch changes the vlmax pattern in autovec.md to define_insn_and_split
as much as possible and clean up some combine patterns that are no longer needed.
This patch also fixed PR111232 bug which was caused by a combined failed.

PR target/111232

gcc/ChangeLog:

* config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>):
Delete.
(*pred_widen_mulsu<mode>): Delete.
(*pred_single_widen_mul<mode>): Delete.
(*dual_widen_<any_widen_binop:optab><any_extend:su><mode>):
Add new combine patterns.
(*single_widen_sub<any_extend:su><mode>): Ditto.
(*single_widen_add<any_extend:su><mode>): Ditto.
(*single_widen_mult<any_extend:su><mode>): Ditto.
(*dual_widen_mulsu<mode>): Ditto.
(*dual_widen_mulus<mode>): Ditto.
(*dual_widen_<optab><mode>): Ditto.
(*single_widen_add<mode>): Ditto.
(*single_widen_sub<mode>): Ditto.
(*single_widen_mult<mode>): Ditto.
* config/riscv/autovec.md (<optab><mode>3):
Change define_expand to define_insn_and_split.
(<optab><mode>2): Ditto.
(abs<mode>2): Ditto.
(smul<mode>3_highpart): Ditto.
(umul<mode>3_highpart): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-4.c: Add more testcases.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111232.c: New test.

(cherry picked from commit 9ee40b9a7bee83394fc7ba6fef71cb76d91b49c8)

RISC-V: Part-3: Output .variant_cc directive for vector function

Functions which follow vector calling convention variant need be annotated by
.variant_cc directive according the RISC-V Assembly Programmer's Manual[1] and
RISC-V ELF Specification[2].

[1] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#dynamic-linking

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_declare_function_name): Add protos.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.cc (riscv_asm_output_variant_cc):
Output .variant_cc directive for vector function.
(riscv_declare_function_name): Ditto.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.h (ASM_DECLARE_FUNCTION_NAME):
Implement ASM_DECLARE_FUNCTION_NAME.
(ASM_OUTPUT_DEF_FROM_DECLS): Implement ASM_OUTPUT_DEF_FROM_DECLS.
(ASM_OUTPUT_EXTERNAL): Implement ASM_OUTPUT_EXTERNAL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: New test.

(cherry picked from commit 4abcc5009c1ad852e235f368f535c0bf6bfa7697)

RISC-V: Part-2: Save/Restore vector registers which need to be preversed

Because functions which follow vector calling convention variant has
callee-saved vector reigsters but functions which follow standard calling
convention don't have. We need to distinguish which function callee is so that
we can tell GCC exactly which vector registers callee will clobber. So I encode
the callee's calling convention information into the calls rtx pattern like
AArch64. The old operand 2 and 3 of call pattern which copy from MIPS target are
useless and removed according to my analysis.

gcc/ChangeLog:

* config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): Pass riscv_cc.
* config/riscv/riscv.cc (struct riscv_frame_info): Add new fileds.
(riscv_frame_info::reset): Reset new fileds.
(riscv_call_tls_get_addr): Pass riscv_cc.
(riscv_function_arg): Return riscv_cc for call patterm.
(get_riscv_cc): New function return riscv_cc from rtl call_insn.
(riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
(riscv_save_reg_p): Add vector callee-saved check.
(riscv_stack_align): Add vector save area comment.
(riscv_compute_frame_info): Ditto.
(riscv_restore_reg): Update for type change.
(riscv_for_each_saved_v_reg): New function save vector registers.
(riscv_first_stack_step): Handle funciton with vector callee-saved registers.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_output_mi_thunk): Pass riscv_cc.
(TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
* config/riscv/riscv.h (get_riscv_cc): Export get_riscv_cc function.
* config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.

(cherry picked from commit fdd59c0f73e9e681cd5f4d0eee2dd58d60d8dbe1)

RISC-V: Part-1: Select suitable vector registers for vector type args and returns

I post the vector register calling convention rules from in the proposal[1]
directly here:

v0 is used to pass the first vector mask argument to a function, and to return
vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the rest vector mask arguments to a
function, and to return vector data and vector tuple results from a function.

Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL indicates the number of
vector registers in the vector register group and requires the first vector
register number in the vector register group must be a multiple of it. For
example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be
allocated to this type, but v9-v16 can not because the v9 register number is
not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a
vector mask type, its LMUL is 1.

Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.

The rules for passing vector arguments are as follows:

1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.

2. For vector data arguments or rest vector mask arguments, starting from the
v8 register, if a vector register group between v8-v23 that has not been
allocated can be found and the first register number is a multiple of LMUL,
then allocate this vector register group to the argument and mark these
registers as allocated. Otherwise, pass it by reference. The argument has now
been allocated.

3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive vector register groups between v8-v23 that have not been allocated
can be found and the first register number is a multiple of LMUL, then allocate
these vector register groups to the argument and mark these registers as
allocated. Otherwise, pass it by reference. The argument has now been allocated.

NOTE: It should be stressed that the search for the appropriate vector register
groups starts at v8 each time and does not start at the next register after the
registers are allocated for the previous vector argument. Therefore, it is
possible that the vector register number allocated to a vector argument can be
less than the vector register number allocated to previous vector arguments.
For example, for the function
`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
and v9 will be allocated to `c`. This approach allows more vector registers to
be allocated to arguments in some cases.

Vector values are returned in the same manner as the first named argument of
the same type would be passed.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389

gcc/ChangeLog:

* config/riscv/riscv-protos.h (builtin_type_p): New function for checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declare.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function for check vector arguments.
(riscv_arguments_is_vector_type_p): New function for check vector returns.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: New test.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return.c: New test.

(cherry picked from commit 94a4b93292f8ab19910c844bb9b63e4a68b55d33)

RISC-V: Add conditional sqrt autovec pattern

This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><mode>):
Add sqrt + vcond_mask combine pattern.
* config/riscv/autovec.md (<optab><mode>2):
Change define_expand to define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.

(cherry picked from commit c1597e7fb9f9ecb9d7c33b5afa48031f284375de)

RISC-V: typo: add closing paren to a comment

gcc/ChangeLog:

* config/riscv/zicond.md: Add closing parent to a comment.

(cherry picked from commit 254100a9a003a16255a58eec3fa24168e6dc7124)

RISC-V: Fix Zicond ICE on large constants

Large constant cons and/or alt will trigger ICEs building GCC target
libraries (libgomp and libatomic) when the 'Zicond' extension is enabled.

For instance, zicond-ice-2.c (new test case in this commit) will cause
an ICE when SOME_NUMBER is 0x1000 or larger.  While opposite numbers
corresponding cons/alt (two temp2 variables) are checked, cons/alt
themselves are not checked and causing 2 ICEs building
GCC target libraries as of this writing:

1.  gcc/libatomic/config/posix/lock.c
2.  gcc/libgomp/fortran.c

Coercing a large value into a register will fix the issue.

It also coerce a large cons into a register on "imm, imm" case (the author
could not reproduce but possible to cause an ICE).

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Force
large constant cons/alt into a register.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ice-2.c: New test.  This is based on
an ICE at libat_lock_n func on gcc/libatomic/config/posix/lock.c
but heavily minimized.

(cherry picked from commit ce65641354d98fc80912d5516b7fea87c344c2cc)

riscv: Synthesize all 11-bit-rotate constants with rori

Some constants can be built up using LI+RORI instructions.
The current implementation requires one of the upper 32-bits
to be a zero bit, which is not neccesary.
Let's drop this requirement in order to be able to synthesize
a constant like 0xffffffff00ffffffL.

The tests for LI+RORI are made more strict to detect regression
in the calculation of the LI constant and the rotation amount.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Don't
require one zero bit in the upper 32 bits for LI+RORI synthesis.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: New tests.
* gcc.target/riscv/zbb-li-rotr.c: Likewise.

(cherry picked from commit 102dd3e8067f12beee1b8b0bec6848733d107aee)

RISC-V: Expose bswapsi for TARGET_64BIT

Various bswapsi tests are failing for rv64.  More importantly, we're generating
crappy code.

Let's take the first test from bswapsi-1.c as an example.

> typedef unsigned int uint32_t;
>
> #define __const_swab32(x) ((uint32_t)(                                \
>         (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |            \
>         (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |            \
>         (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |            \
>         (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
>
> /* This byte swap implementation is used by the Linux kernel and the
>    GNU C library.  */
>
> uint32_t
> swap32_a (uint32_t in)
> {
>   return __const_swab32 (in);
> }
>
>
>

We generate this for rv64gc_zba_zbb_zbs:

>         srliw   a1,a0,24
>         slliw   a5,a0,24
>         slliw   a3,a0,8
>         li      a2,16711680
>         li      a4,65536
>         or      a5,a5,a1
>         and     a3,a3,a2
>         addi    a4,a4,-256
>         srliw   a0,a0,8
>         or      a5,a5,a3
>         and     a0,a0,a4
>         or      a0,a5,a0
>         retUrgh!

After this patch we generate:

>         rev8    a0,a0
>         srai    a0,a0,32
>         ret
Clearly better.

The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that the
RTL expanders already know how to widen a bswap, which is definitely true.  But
it's the case that failure to expose a bswapsi will cause the 32bit bswap
optimizations in gimple store merging to not trigger.  Thus we get crappy code.

To fix this we expose bswapsi on TARGET_64BIT.  gimple-store-merging then
detects the 32bit bswap idioms and generates suitable __builtin calls.  The
expander will "FAIL" expansion for TARGET_64BIT which forces the generic
expander code to synthesize the operation (we could synthesize in here, but
that'd result in duplicate code).

Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the testsuite
without any regressions.

gcc/
* config/riscv/bitmanip.md (bswapsi2): Expose for TARGET_64BIT.

(cherry picked from commit fbc01748ba46eb26074388a8fb7b44d25a414a72)

RISC-V: Add Types to Un-Typed Risc-v Instructions

Updates risc-v instructions to ensure that no instruction is left
without a type attribute. Added new types "trap" and "cbo" (for
cache related instructions)

Tested for regressions using rv32/64 multilib with newlib/linux and
rv32/64 gcv for linux.

gcc/Changelog:

* config/riscv/riscv.md: Update/Add types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit decbf9ec81f33052be12296b89cd86ea65ae10da)

RISC-V: Add Types to Un-Typed Pic Instructions

Updates pic instructions to ensure that no instruction is left
without a type attribute.

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/Changelog:

* config/riscv/pic.md: Update types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit c85db606d46774283ca4ec037dc3051719828f41)

riscv: xtheadbb: Enable constant synthesis with th.srri

Some constants can be built up using rotate-right instructions.
The code that enables this can be found in riscv_build_integer_1().
However, this functionality is only available for Zbb, which
includes the rori instruction. This patch enables this also for
XTheadBb, which includes the th.srri instruction.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Enable constant
synthesis with rotate-right for XTheadBb.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: New test.

(cherry picked from commit af5cb06ec17780736749ed51cfc6dfad9397156c)

RISC-V: zicond: Fix opt2 pattern

Fixes: 1d5bc3285e8a ("[committed][RISC-V] Fix 20010221-1.c with zicond")
This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since in
failing case, pattern semantics were not matching with asm czero.nez

We start with the following src code snippet:

      if (a == 0)
return 0;
      else
return x;
    }

which is equivalent to:  "x = (a != 0) ? x : a" where x is NOT 0.
                                                ^^^^^^^^^^^^^^^^

and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2"

| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
|        (if_then_else:DI (ne (reg/v:DI 134 [ a ])
|                (const_int 0 [0]))
|            (reg/v:DI 136 [ x ])
|            (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}

The corresponding asm pattern generates
    czero.nez x, x, a   ; %0, %2, %1

which implies
    "x = (a != 0) ? 0 : a"

clearly not what the pattern wants to do.

Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez if X
is not guaranteed to be 0.

However this can be fixed with a small tweak

"x = (a != 0) ? x : a"

   is same as

"x = (a == 0) ? a : x"

and since middle operand is 0 when a == 0, it is equivalent to

"x = (a == 0) ? 0 : x"

which can be expressed with CZERO.eqz

before fix after fix
----------------- -----------------
li        a5,1         li        a5,1
ld        a4,8(sp) ld        a4,8(sp)
czero.nez a0,a4,a5 czero.eqz a0,a4,a5

The issue only happens at -O1 as at higher optimization levels, the
whole conditional move gets optimized away.

This fixes 4 testsuite failues in a zicond build:

FAIL: gcc.c-torture/execute/pr60003.c   -O1  execution test
FAIL: gcc.dg/setjmp-3.c execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1  execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1 -fpic execution test

gcc/ChangeLog:
* config/riscv/zicond.md: Fix op2 pattern.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit e87212ead5e9f36945b5e2d290187e2adca34da5)

RISC-V: Emit .note.GNU-stack for non-linux target as well

We only emit that on linux target before, that not problem before,
however Qemu has fix a bug to make qemu user mode honor PT_GNU_STACK[1],
that will cause problem when we test baremetal with qemu.

So the straightforward is enable that as well for non-linux toolchian,
the price is that will increase few bytes for each binary.

[1] https://github.com/qemu/qemu/commit/872f3d046f2381e3f416519e82df96bd60818311

gcc/ChangeLog:

* config/riscv/linux.h (TARGET_ASM_FILE_END): Move ...
* config/riscv/riscv.cc (TARGET_ASM_FILE_END): to here.

(cherry picked from commit fba0f47e4617e164716d3bce587fc6948088e225)

RISC-V: Support FP SGNJ autovec for VLS mode

This patch would like to allow the VLS mode autovec for the
floating-point binary operation MAX/MIN.

Given below code example:

void test(float * restrict out, float * restrict in1, float * restrict in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = __builtin_copysignf (in1[i], in2[i]);
}

Before this patch:
test:
  csrr    a4,vlenb
  slli    a4,a4,1
  li      a5,128
  bleu    a5,a4,.L2
  mv      a5,a4
.L2:
  vsetvli zero,a5,e32,m8,ta,ma
  vle32.v v8,0(a1)
  vle32.v v16,0(a2)
  vsetvli a4,zero,e32,m8,ta,ma
  vfsgnj.vv       v8,v8,v16
  vsetvli zero,a5,e32,m8,ta,ma
  vse32.v v8,0(a0)
  ret

After this patch:
test:
  li      a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vle32.v v2,0(a2)
  vfsgnj.vv       v1,v1,v2
  vse32.v v1,0(a0)
  ret

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (copysign<mode>3): New pattern.
* config/riscv/vector.md: Extend iterator for VLS.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c: New test.

(cherry picked from commit a7b048c0f42198a0f8d4244f1bd25211cf48383f)

RISC-V: Export functions as global extern preparing for dynamic LMUL patch use

Notice those functions need to be use by COST model for dynamic LMUL use.
Extract as a single patch and committed.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (lookup_vector_type_attribute): Export global.
(get_all_predecessors): New function.
(get_all_successors): Ditto.
* config/riscv/riscv-v.cc (get_all_predecessors): Ditto.
(get_all_successors): Ditto.
* config/riscv/riscv-vector-builtins.cc (sizeless_type_p): Export global.
* config/riscv/riscv-vsetvl.cc (get_all_predecessors): Remove it.

(cherry picked from commit 509c10a62546b9b3430040e455b7258322a024e6)

riscv: xtheadcondmov: Don't run tests with -Oz

Recently, these xtheadcondmov tests regressed with -Oz:
* FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c
* FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c
* FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c
* FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c

As -Oz stands for "Optimize aggressively for size rather than speed.",
we need to inspect the generated code, which looks like this:

  -Oz
  0000000000000000 <not_int_int>:
     0:   e199                    bnez    a1,6 <.L2>
     2:   40100513                li      a0,1025
  0000000000000006 <.L2>:
     6:   8082                    ret

  -O2:
  0000000000000000 <not_int_int>:
     0:   40100793                li      a5,1025
     4:   40b7950b                th.mveqz        a0,a5,a1
     8:   8082                    ret

As the generated code with -Oz consumes less size, there is nothing
wrong in the code generation. Instead, let's not run the xtheadcondmov
tests with -Oz.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c: Disable for -Oz.
* gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-eqz.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-not.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-cond.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-nez.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit 8451fbd56871267e8c1cd781db6d8f02e826f66c)

RISC-V: Fix Dynamic LMUL compile option

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Fix Dynamic status.
* config/riscv/riscv-v.cc (preferred_simd_mode): Ditto.
(autovectorize_vector_modes): Ditto.
(vectorize_related_mode): Ditto.

(cherry picked from commit 6f94ef6c86074a8348ec21d8aade04ce67b4e292)

RISC-V: Support FP16 for RVV VRGATHEREI16 intrinsic

This patch would like to add FP16 support for the VRGATHEREI16
intrinsic. Aka:

* __riscv_vrgatherei16_vv_f16mf4
* __riscv_vrgatherei16_vv_f16mf4_m

As well as f16mf2 to f16m8 types.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add FP16 intrinsic def.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/intrisinc-vrgatherei16.c: New test.

(cherry picked from commit d99a868a9b100ab5a4b270a1acece60b5b6153a3)

RISC-V: Support FP MAX/MIN autovec for VLS mode

This patch would like to allow the VLS mode autovec for the
floating-point binary operation MAX/MIN.

Given below code example:

test (float *out, float *in1, float *in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = in1[i] > in2[i] ? in1[i] : in2[i];
    // Or out[i] = fmax (in1[i], in2[i]);
}

Before this patch:
test:
  csrr    a4,vlenb
  slli    a4,a4,1
  li      a5,128
  bleu    a5,a4,.L2
  mv      a5,a4
.L2:
  vsetvli zero,a5,e32,m8,ta,ma
  vle32.v v16,0(a1)
  vle32.v v8,0(a2)
  vsetvli a3,zero,e32,m8,ta,ma
  vmfgt.vv        v0,v16,v8
  vmerge.vvm      v8,v8,v16,v0
  vsetvli zero,a5,e32,m8,ta,ma
  vse32.v v8,0(a0)
  ret

After this patch:
test:
  li      a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vle32.v v2,0(a2)
  vfmax.vv        v1,v1,v2
  vse32.v v1,0(a0)
  ret

This MAX/MIN autovec acts on function call like fmaxf/fmax in math.h
too. And it depends on the option -ffast-math.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
fmax/fmin
* config/riscv/vector.md: Add VLS modes to vfmax/vfmin.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: New macros.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c: New test.

(cherry picked from commit a7d052b3200c7928d903a0242b8cfd75d131e374)

RISC-V: Add conditional autovec convert(INT<->FP) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><mode><vconvert>):
New combine pattern.
(*cond_<float_cvt><vconvert><mode>): Ditto.
(*cond_<optab><vnconvert><mode>): Ditto.
(*cond_<float_cvt><vnconvert><mode>): Ditto.
(*cond_<optab><mode><vnconvert>): Ditto.
(*cond_<float_cvt><mode><vnconvert>2): Ditto.
* config/riscv/autovec.md (<optab><mode><vconvert>2): Adjust.
(<float_cvt><vconvert><mode>2): Adjust.
(<optab><vnconvert><mode>2): Adjust.
(<float_cvt><vnconvert><mode>2): Adjust.
(<optab><mode><vnconvert>2): Adjust.
(<float_cvt><mode><vnconvert>2): Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add INT->FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c: New test.

(cherry picked from commit 258af9c7004cdc7963f783dd510404e79f0b5362)

RISC-V: Add conditional autovec convert(FP<->FP) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_extend<v_double_trunc><mode>):
New combine pattern.
(*cond_trunc<mode><v_double_trunc>): Ditto.
* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c: New test.

(cherry picked from commit 75a243c7c7c7efa9f12038480b46260ada739202)

RISC-V: Add conditional autovec convert(INT<->INT) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><v_double_trunc><mode>):
New combine pattern.
(*cond_<optab><v_quad_trunc><mode>): Ditto.
(*cond_<optab><v_oct_trunc><mode>): Ditto.
(*cond_trunc<mode><v_double_trunc>): Ditto.
* config/riscv/autovec.md (<optab><v_quad_trunc><mode>2): Adjust.
(<optab><v_oct_trunc><mode>2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New test.

(cherry picked from commit a1e5fd2c9adc35ef435dcc96991320d69453919a)

RISC-V: Adjust expand_cond_len_{unary,binop,op} api

This patch change expand_cond_len_{unary,binop}'s argument `rtx_code code`
to `unsigned icode` and use the icode directly to determine whether the
rounding_mode operand is required.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-protos.h (expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Ditto.
(expand_cond_len_op): Ditto.
(expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
(expand_cond_len_ternop): Ditto.

(cherry picked from commit 4d1c8b04ec8731b57ddbc80d76e40a61d8fa3324)

RISC-V: Enable VECT_COMPARE_COSTS by default

since we have added COST framework, we by default enable VECT_COMPARE_COSTS.

Also, add 16/32/64 to provide more choices for COST comparison.

This patch doesn't change any behavior from the current testsuite since we are using
default COST model.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (autovectorize_vector_modes): Enable
VECT_COMPARE_COSTS by default.

(cherry picked from commit 5f2098cce6c75117927fef317c714dd2088b0189)

RISC-V: Add vec_extract for BI -> QI.

This patch adds a vec_extract expander that extracts a QImode from a
vector mask mode.  In doing so, it helps recognize a "live
operation"/extract last idiom for mask modes.  It fixes the ICE in
tree-vect-live-6.c by circumventing the fallback code in
extract_bit_field_1.  The problem there is still latent, though, and
needs to be addressed separately.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract<mode>qi): New expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/live-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/live_run-2.c: New test.

(cherry picked from commit ffbb19c6afc016f6dc001ad0f567d3216ff601b1)

testsuite/vect: Make match patterns more accurate.

On some targets we fail to vectorize with the first type the vectorizer
tries but succeed with the second. This patch changes several regex
patterns to reflect that behavior.

Before we would look for a single occurrence of e.g.
"vect_recog_dot_prod_pattern" but would possible have two (one for each
attempted mode). The new pattern tries to match sequences where we
first have a "vect_recog_dot_prod_pattern" and a "succeeded" afterwards
while making sure there is no "failed" or "Re-trying" in between.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-outer-4c-big-array.c: Adjust regex pattern.
* gcc.dg/vect/vect-reduc-dot-s16a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8b.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Ditto.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Ditto.

(cherry picked from commit e40edf6499576993862801640227e076b868241b)

RISC-V: Add dynamic LMUL compile option

We are going to support dynamic LMUL support.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Add
dynamic enum.
* config/riscv/riscv.opt: Add dynamic compile option.

(cherry picked from commit ef4e916b526a65411a577126d34c3b0bb97b6111)

RISC-V: Support FP ADD/SUB/MUL/DIV autovec for VLS mode

This patch would like to allow the VLS mode autovec for the
floating-point binary operation ADD/SUB/MUL/DIV.

Given below code example:

test (float *out, float *in1, float *in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = in1[i] + in2[i];
}

Before this patch:
test:
  csrr a4,vlenb
  slli a4,a4,1
  li   a5,128
  bleu a5,a4,.L38
  mv   a5,a4
.L38:
  vsetvli  zero,a5,e32,m8,ta,ma
  vle32.v  v16,0(a1)
  vsetvli  a4,zero,e32,m8,ta,ma
  vmv.v.i  v8,0
  vsetvli  zero,a5,e32,m8,tu,ma
  vle32.v  v24,0(a2)
  vfadd.vv v8,v24,v16
  vse32.v  v8,0(a0)
  ret

After this patch:
test:
  li       a5,128
  vsetvli  zero,a5,e32,m1,ta,ma
  vle32.v  v1,0(a2)
  vle32.v  v2,0(a1)
  vfadd.vv v1,v1,v2
  vse32.v  v1,0(a0)
  ret

Please note this patch also fix the execution failure of below
vect test cases.

* vect-alias-check-10.c
* vect-alias-check-11.c
* vect-alias-check-12.c
* vect-alias-check-14.c

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
vls floating-point autovec.
* config/riscv/vector-iterators.md: New iterator for
floating-point V and VLS.
* config/riscv/vector.md: Add VLS to floating-point binop.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h:
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c: New test.

(cherry picked from commit ed60ffd814c86a225a4586da649f6e76718490db)

RISC-V: Support rounding mode for VFNMADD/VFNMACC autovec

There will be a case like below for intrinsic and autovec combination.

vfadd RTZ   <- intrinisc static rounding
vfnmadd     <- autovec/autovec-opt

The autovec generated vfnmadd should take DYN mode, and the
frm must be restored before the vfnmadd insn. This patch
would like to fix this issue by:

* Add the frm operand to the autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combine autovec and intrinsic should be.

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfnmadd     <- autovec/autovec-opt
| ...
+------------

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmadd/vfnmacc.
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-4.c: New test.

(cherry picked from commit af0c625f6085567522cf55b2ced05f07ec7be67a)

RISC-V: Support rounding mode for VFNMSAC/VFNMSUB autovec

There will be a case like below for intrinsic and autovec combination.

vfadd RTZ   <- intrinisc static rounding
vfnmsub     <- autovec/autovec-opt

The autovec generated vfnmsub should take DYN mode, and the
frm must be restored before the vfnmsub insn. This patch
would like to fix this issue by:

* Add the frm operand to the autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combine autovec and intrinsic should be.

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfnmsub     <- autovec/autovec-opt
| ...
+------------

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmsac/vfnmsub
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit a7cefeaead68e5d89f65ba3a558eddef9b0b0f75)

RISC-V: Support rounding mode for VFMSAC/VFMSUB autovec

There will be a case like below for intrinsic and autovec combination.

vfadd RTZ   <- intrinisc static rounding
vfmsub      <- autovec/autovec-opt

The autovec generated vfmsub should take DYN mode, and the
frm must be restored before the vfmsub insn. This patch
would like to fix this issue by:

* Add the frm operand to the autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combine autovec and intrinsic should be.

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfmsub      <- autovec/autovec-opt
| ...
+------------

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmsac/vfmsub
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-2.c: New test.

(cherry picked from commit 625962440ba5c737d6f35f7a1c9af1e9ef6bef3a)

RISC-V: Support rounding mode for VFMADD/VFMACC autovec

There will be a case like below for intrinsic and autovec combination

vfadd RTZ   <- intrinisc static rounding
vfmadd      <- autovec/autovec-opt

The autovec generated vfmadd should take DYN mode, and the
frm must be restored before the vfmadd insn. This patch
would like to fix this issue by:

* Add the frm operand to the vfmadd/vfmacc autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combine autovec and intrinsic should be.

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfmadd      <- autovec/autovec-opt
| ...
+------------

However, we leverage unspec instead of use to consume the FRM register
because there are some restrictions from the combine pass. Some code
path of try_combine may require the XVECLEN(pat, 0) == 2 for the
recog_for_combine, and add new use will make the XVECLEN(pat, 0) == 3
and result in the vfwmacc optimization failure. For example, in the
test  widen-complicate-5.c and widen-8.c

Finally, there will be other fma cases and they will be covered in
the underlying patches.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmadd/vfmacc.
* config/riscv/autovec.md: Ditto.
* config/riscv/vector-iterators.md: Add UNSPEC_VFFMA.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c: New test.

(cherry picked from commit 3e37e8231849ded7e214042f60f59fdcec75d7d3)

RISC-V: Add vector_scalar_shift_operand

The vector shift immediates happen to have the same constraints as some
of the CSR-related operands, but it's a different usage. This adds a
name for them, so I don't get confused again next time.

gcc/ChangeLog:

* config/riscv/autovec.md (shifts): Use
vector_scalar_shift_operand.
* config/riscv/predicates.md (vector_scalar_shift_operand): New
predicate.

(cherry picked from commit 0337555c7a2524bd334bafdc06dd801818eb34b6)

RISC-V: Add Vector cost model framework for RVV

Hi, currently RVV vectorization only support picking LMUL according to
compile option --param=riscv-autovec-lmul= which is no ideal.

Compiler should be able to pick optimal LMUL/vectorization factor to
vectorize the loop according to the loop_vec_info and SSA-based register
pressure analysis.

Now, I figure out current GCC cost model provide the approach that we
can choose LMUL/vectorization factor by adjusting the COST.

This patch is just add the minimum COST model framework which is still
applying the default cost model (No vector codes changed from before).

Regression all pased and no difference.

gcc/ChangeLog:

* config.gcc: Add vector cost model framework for RVV.
* config/riscv/riscv.cc (riscv_vectorize_create_costs): Ditto.
(TARGET_VECTORIZE_CREATE_COSTS): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-vector-costs.cc: New file.
* config/riscv/riscv-vector-costs.h: New file.

(cherry picked from commit 4da3065a6422062b029df9660a226297802455f4)

RISC-V: Change vsetvl tail and mask policy to default policy

This patch change the vsetvl policy to default policy
(returned by get_prefer_mask_policy and get_prefer_tail_policy) instead
fixed policy. Any policy is now returned, allowing change to agnostic
or undisturbed. In the future, users may be able to control the default
policy, such as keeping agnostic by compiler options.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (IS_AGNOSTIC): Move to here.
* config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx):
Change to default policy.
* config/riscv/riscv-vector-builtins-bases.cc: Change to default policy.
* config/riscv/riscv-vsetvl.h (IS_AGNOSTIC): Delete.
* config/riscv/riscv.cc (riscv_print_operand): Use IS_AGNOSTIC to test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Adjust.
* gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-24.c: New test.

(cherry picked from commit e69d050fd990f8e72e19e6dfb1bf7da2f09236f7)

RISC-V: Refactor and clean emit_{vlmax,nonvlmax}_xxx functions

This patch refactor the code of emit_{vlmax,nonvlmax}_xxx functions.
These functions are used to generate RVV insn. There are currently 31
such functions and a few duplicates. The reason so many functions are
needed is because there are more types of RVV instructions. There are
patterns that don't have mask operand, patterns that don't have merge
operand, and patterns that don't need a tail policy operand, etc.

Previously there was the insn_type enum, but it's value was just used
to indicate how many operands were passed in by caller. The rest of
the operands information is scattered throughout these functions.
For example, emit_vlmax_fp_insn indicates that a rounding mode operand
of FRM_DYN should also be passed, emit_vlmax_merge_insn means that
there is no mask operand or mask policy operand.

I introduced a new enum insn_flags to indicate some properties of these
RVV patterns. These insn_flags are then used to define insn_type enum.
For example for the defintion of WIDEN_TERNARY_OP:

  WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
                       | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P,

This flags mean the RVV pattern has no merge operand. This flags only apply
to vwmacc instructions. After defining the desired insn_type, all the
emit_{vlmax,nonvlmax}_xxx functions are unified into three functions:

  emit_vlmax_insn (icode, insn_flags, ops);
  emit_nonvlmax_insn (icode, insn_flags, ops, vl);
  emit_vlmax_insn_lra (icode, insn_flags, ops, vl);

Then user can select the appropriate insn_type and the appropriate emit_xxx
function for RVV patterns generation as needed.

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Adjust.
* config/riscv/autovec-vls.md: Ditto.
* config/riscv/autovec.md: Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add insn_type.
(enum insn_flags): Add insn flags.
(emit_vlmax_insn): Adjust.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_insn_lra): Add.
* config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): New.
(emit_vlmax_insn): Adjust.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_insn_lra): Add.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_nonvlmax_slide_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_insn): Delete.
(emit_nonvlmax_masked_insn): Delete.
(emit_vlmax_masked_store_insn): Delete.
(emit_nonvlmax_masked_store_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_vlmax_masked_fp_mu_insn): Delete.
(emit_nonvlmax_tu_insn): Delete.
(emit_nonvlmax_fp_tu_insn): Delete.
(emit_nonvlmax_tumu_insn): Delete.
(emit_nonvlmax_fp_tumu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_cpop_insn): Delete.
(emit_vlmax_integer_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_gather_insn): Delete.
(emit_vlmax_masked_gather_mu_insn): Delete.
(emit_vlmax_compress_insn): Delete.
(emit_nonvlmax_compress_insn): Delete.
(emit_vlmax_reduction_insn): Delete.
(emit_vlmax_fp_reduction_insn): Delete.
(emit_nonvlmax_fp_reduction_insn): Delete.
(expand_vec_series): Adjust.
(expand_const_vector): Adjust.
(legitimize_move): Adjust.
(sew64_scalar_helper): Adjust.
(expand_tuple_move): Adjust.
(expand_vector_init_insert_elems): Adjust.
(expand_vector_init_merge_repeating_sequence): Adjust.
(expand_vec_cmp): Adjust.
(expand_vec_cmp_float): Adjust.
(expand_vec_perm): Adjust.
(shuffle_merge_patterns): Adjust.
(shuffle_compress_patterns): Adjust.
(shuffle_decompress_patterns): Adjust.
(expand_load_store): Adjust.
(expand_cond_len_op): Adjust.
(expand_cond_len_unop): Adjust.
(expand_cond_len_binop): Adjust.
(expand_gather_scatter): Adjust.
(expand_cond_len_ternop): Adjust.
(expand_reduction): Adjust.
(expand_lanes_load_store): Adjust.
(expand_fold_extract_last): Adjust.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Adjust.
* config/riscv/vector.md: Adjust.

(cherry picked from commit 79ab19bcbae6e54c91bfca4ffa45cbc5eb0374bc)

RISC-V: Fix vsetvl pass ICE

This patch fix pr111234 (a vsetvl pass ICE) when fuse a mask any
vlmax vsetvl_vtype_change_only insn with a mu vsetvl insn.

PR target/111234

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.

(cherry picked from commit ac55f9710fe82a4ed8cb132f57303775ce60e5d1)

test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

Like ARM SVE, add RVV variable length xfail.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-7.c: Add RVV.

(cherry picked from commit 282c33c5f1c9b2965c18877aea8466701ab4e678)

test: Adapt slp-26.c check for RVV

Fix FAILs:
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0

Since RVV is able to vectorize it with VLS modes like amdgcn.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-26.c: Adapt for RVV.

(cherry picked from commit 5d34a42f3b64fde9bb8be74231d8d11590c8d1db)

RISC-V: Remove movmisalign pattern for VLA modes

This patch fixed this bunch of failures in "vect" testsuite:
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-1.c execution test
FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-2.c execution test

Spike report:
z  0000000000000000 ra 00000000000100f4 sp 0000003ffffffb30 gp 0000000000012cc8
tp 0000000000000000 t0 00000000000102d4 t1 000000000000000f t2 0000000000000000
s0 0000000000000000 s1 0000000000000000 a0 00000000000101a6 a1 0000000000000008
a2 0000000000000010 a3 0000000000012401 a4 0000000000012480 a5 0000000000000020
a6 000000000000001f a7 00000000000000d6 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 00000000000101ec va/inst 000000000206dc07 sr 8000000200006620
Load access fault!

(spike)
core   0: 0x0000000000010204 (0x02065087) vle16.v v1, (a2)
core   0: exception trap_load_address_misaligned, epc 0x0000000000010204
core   0:           tval 0x0000000000012c81
(spike) reg 0 a2
0x0000000000012c81

According to RVV ISA, we couldn't use "vle16.v" if the address is byte align.

Such issue is caused by this GIMPLE IR:

vect__1.15_17 = .MASK_LEN_LOAD (vectp_t.13_15, 8B, { -1, ... }, _24, 0);

For partial vectorization, the alignment is "8B" byte align here is incorrect here.

After this patch, the vectorization failed:

sll     a5,a4,0x1
add     a5,a5,a1
lhu     a3,64(a5)
lbu     a5,66(a5)
addw    a4,a4,1
srl     a3,a3,0x8
sll     a5,a5,0x8
or      a5,a5,a3
sh      a5,0(a2)
add     a2,a2,2
bne     a4,a0,101f8 <foo+0x14>

I will enable auto-vectorization in another approach in the next following patch.

gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign<mode>): Delete.

(cherry picked from commit f7bff24905a6959f85f866390db2fff1d6f95520)

test: Fix XPASS of RVV

XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1

Like ARM SVE, Fix these XPASS for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-double-reduc-5.c: Add riscv.
* gcc.dg/vect/vect-outer-4e.c: Ditto.
* gcc.dg/vect/vect-outer-4f.c: Ditto.
* gcc.dg/vect/vect-outer-4g.c: Ditto.
* gcc.dg/vect/vect-outer-4k.c: Ditto.
* gcc.dg/vect/vect-outer-4l.c: Ditto.

(cherry picked from commit ece3884b4b5d64dff1f112d0ec13c9b71dd0fc6a)

test: Add xfail for riscv_vector

Like ARM SVE, when we enable scalable vectorization for RVV,
we can't do constant fold for these yet for both ARM SVE and RVV.

Ok for trunk ?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr88598-1.c: Add riscv_vector.
* gcc.dg/vect/pr88598-2.c: Ditto.
* gcc.dg/vect/pr88598-3.c: Ditto.

(cherry picked from commit 586ca3db52228ac1c5f2b5ce754928ced4e8e434)

RISC-V: support cm.mva01s cm.mvsa01 in zcmp

Signed-off-by: Die Li <lidie@eswincomputing.com>
Co-Authored-By: Fei Gao <gaofei@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/peephole.md: New pattern.
* config/riscv/predicates.md (a0a1_reg_operand): New predicate.
(zcmp_mv_sreg_operand): New predicate.
* config/riscv/riscv.md: New predicate.
* config/riscv/zc.md (*mva01s<X:mode>): New pattern.
(*mvsa01<X:mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cm_mv_rv32.c: New test.

(cherry picked from commit 490bf0b9756368b34221348b0260e061634e497b)

RISC-V: support cm.popretz in zcmp

Generate cm.popretz instead of cm.popret if return value is 0.

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_zcmp_can_use_popretz): true if popretz can be used
(riscv_gen_multi_pop_insn): interface to generate cm.pop[ret][z]
(riscv_expand_epilogue): expand cm.pop[ret][z] in epilogue
* config/riscv/riscv.md: define A0_REGNUM
* config/riscv/zc.md
(@gpr_multi_popretz_up_to_ra_<mode>): md for popretz ra
(@gpr_multi_popretz_up_to_s0_<mode>): md for popretz ra, s0
(@gpr_multi_popretz_up_to_s1_<mode>): likewise
(@gpr_multi_popretz_up_to_s2_<mode>): likewise
(@gpr_multi_popretz_up_to_s3_<mode>): likewise
(@gpr_multi_popretz_up_to_s4_<mode>): likewise
(@gpr_multi_popretz_up_to_s5_<mode>): likewise
(@gpr_multi_popretz_up_to_s6_<mode>): likewise
(@gpr_multi_popretz_up_to_s7_<mode>): likewise
(@gpr_multi_popretz_up_to_s8_<mode>): likewise
(@gpr_multi_popretz_up_to_s9_<mode>): likewise
(@gpr_multi_popretz_up_to_s11_<mode>): likewise

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: add testcase for cm.popretz in rv32e
* gcc.target/riscv/rv32i_zcmp.c: add testcase for cm.popretz in rv32i

(cherry picked from commit b27d323a368033f0b37e93c57a57a35fd9997864)

RISC-V: support cm.push cm.pop cm.popret in zcmp

Zcmp can share the same logic as save-restore in stack allocation: pre-allocation
by cm.push, step 1 and step 2.

Pre-allocation not only saves callee saved GPRs, but also saves callee saved FPRs and
local variables if any.

Please be noted cm.push pushes ra, s0-s11 in reverse order than what save-restore does.
So adaption has been done in .cfi directives in my patch.

gcc/ChangeLog:

* config/riscv/iterators.md
(slot0_offset): slot 0 offset in stack GPRs area in bytes
(slot1_offset): slot 1 offset in stack GPRs area in bytes
(slot2_offset): likewise
(slot3_offset): likewise
(slot4_offset): likewise
(slot5_offset): likewise
(slot6_offset): likewise
(slot7_offset): likewise
(slot8_offset): likewise
(slot9_offset): likewise
(slot10_offset): likewise
(slot11_offset): likewise
(slot12_offset): likewise
* config/riscv/predicates.md
(stack_push_up_to_ra_operand): predicates of stack adjust pushing ra
(stack_push_up_to_s0_operand): predicates of stack adjust pushing ra, s0
(stack_push_up_to_s1_operand): likewise
(stack_push_up_to_s2_operand): likewise
(stack_push_up_to_s3_operand): likewise
(stack_push_up_to_s4_operand): likewise
(stack_push_up_to_s5_operand): likewise
(stack_push_up_to_s6_operand): likewise
(stack_push_up_to_s7_operand): likewise
(stack_push_up_to_s8_operand): likewise
(stack_push_up_to_s9_operand): likewise
(stack_push_up_to_s11_operand): likewise
(stack_pop_up_to_ra_operand): predicates of stack adjust poping ra
(stack_pop_up_to_s0_operand): predicates of stack adjust poping ra, s0
(stack_pop_up_to_s1_operand): likewise
(stack_pop_up_to_s2_operand): likewise
(stack_pop_up_to_s3_operand): likewise
(stack_pop_up_to_s4_operand): likewise
(stack_pop_up_to_s5_operand): likewise
(stack_pop_up_to_s6_operand): likewise
(stack_pop_up_to_s7_operand): likewise
(stack_pop_up_to_s8_operand): likewise
(stack_pop_up_to_s9_operand): likewise
(stack_pop_up_to_s11_operand): likewise
* config/riscv/riscv-protos.h
(riscv_zcmp_valid_stack_adj_bytes_p):declaration
* config/riscv/riscv.cc (struct riscv_frame_info): comment change
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_multi_push_regs_count): num of regs in multi-push
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place
(riscv_save_libcall_count): no functional change
(riscv_compute_frame_info): add zcmp frame info
(riscv_for_each_saved_reg): save or restore fprs in specified slot for zcmp
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_gen_multi_push_pop_insn): gen function for multi push and pop
(get_multi_push_fpr_mask): get mask for the fprs pushed by cm.push
(riscv_expand_prologue): allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): allocate stack by cm.pop[ret]
(zcmp_base_adj): calculate stack adjustment base size
(zcmp_additional_adj): calculate stack adjustment additional size
(riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment valid
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_MAX_GRP_SLOTS): max slots of pushing and poping in zcmp
(CALLEE_SAVED_FREG_NUMBER): get x of fsx(fs0 ~ fs11)
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file. machine description for zcmp

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_push_fpr.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.

(cherry picked from commit 3d1d3132b9d4dc8b6069ad95dad624371124f297)

middle-end: Apply MASK_LEN_LOAD_LANES/MASK_LEN_STORE_LANES to ivopts/alias

Like MASK_LOAD_LANES/MASK_STORE_LANES, add MASK_LEN_ variant.

Bootstrap and Regression on X86 passed.

Ok for trunk?

gcc/ChangeLog:

* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.

(cherry picked from commit 0394184cebc15e5e3f13d04d9ffbc787a16018bd)

RISC-V: Make arch-24.c to test "success" case

arch-24.c and arch-25.c are exactly the same and redundant. The author
suspects that the original author intended to test two base ISAs (RV32I and
RV64I) so this commit changes arch-24.c to test that RV32I+Zcf does not
cause any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-24.c: Test RV32I+Zcf instead.

(cherry picked from commit a248e1cc860821b96a42be96478257c4964a7c2a)

RISC-V: Make sure we get VL REG operand for VLMAX vsetvl

Fix ICE in "vect" testsuite:

FAIL: gcc.dg/vect/pr64495.c (internal compiler error: in df_uses_record, at df-scan.cc:2958)
FAIL: gcc.dg/vect/pr64495.c (test for excess errors

After this patch, all current found VSETVL PASS related bugs in "vect" are fixed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(vector_insn_info::get_avl_or_vl_reg): Fix bug.

(cherry picked from commit 7accc6208befae77699a56f67a94da1e247ed069)

RISC-V: Enable movmisalign for VLS modes

Prevous patch (which removed VLA modes movmisalign pattern) to fix run-time bug.
Such patch disable vectorization for misalign data movement.

After I check LLVM codes, LLVM supports misalign for VLS modes.

Before this patch:

sll     a5,a4,0x1
add     a5,a5,a1
lhu     a3,64(a5)
lbu     a5,66(a5)
addw    a4,a4,1
srl     a3,a3,0x8
sll     a5,a5,0x8
or      a5,a5,a3
sh      a5,0(a2)
add     a2,a2,2
bne     a4,a0,101f8 <foo+0x14>

After this patch:

foo:
lui a0,%hi(.LANCHOR0)
addi a0,a0,%lo(.LANCHOR0)
addi sp,sp,-16
addi a1,a0,1
li a2,64
sd ra,8(sp)
vsetvli zero,a2,e8,m4,ta,ma
addi a0,a0,128
vle8.v v4,0(a1)
vse8.v v4,0(a0)
call memcmp
bne a0,zero,.L6
ld ra,8(sp)
addi sp,sp,16
jr ra
.L6:
call abort

Note this patch has passed all testcases in "vect" which are related to alignment.

gcc/ChangeLog:

* config/riscv/autovec-vls.md (movmisalign<mode>): New pattern.
* config/riscv/riscv.cc (riscv_support_vector_misalignment): Support
VLS misalign.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: New test.

(cherry picked from commit 260f743aa476abce8f88cceaca12abcb8115b02f)

RISC-V: Use splitter to generate zicond in another case

So in analyzing Ventana's internal tree against the trunk it became apparent
that the current zicond code is missing a case that helps coremark's bitwise
CRC implementation.

Here's a minimized testcase:

long xor1(long crc, long poly)
{
  if (crc & 1)
    crc ^= poly;

  return crc;
}

ie, it's just a conditional xor.

We generate this:

        andi    a5,a0,1
        neg     a5,a5
        and     a5,a5,a1
        xor     a0,a5,a0
        ret

But we should instead generate:

        andi    a5,a0,1
        czero.eqz       a5,a1,a5
        xor     a0,a5,a0
        ret

Combine wants to generate:

Trying 7, 8 -> 9:
    7: r140:DI=r137:DI&0x1
    8: r141:DI=-r140:DI
      REG_DEAD r140:DI
    9: r142:DI=r141:DI&r144:DI
      REG_DEAD r144:DI
      REG_DEAD r141:DI
Failed to match this instruction:
(set (reg:DI 142)
    (and:DI (sign_extract:DI (reg/v:DI 137 [ crc ])
            (const_int 1 [0x1])
            (const_int 0 [0]))
        (reg:DI 144)))

A splitter can rewrite the above into a suitable if-then-else construct and
squeeze an instruction out of that pesky CRC loop.  Sadly it doesn't really
help anything else.

The patch includes two variants.  One that uses ZBS, the other uses an ANDI
logical to produce the input condition.

gcc/
* config/riscv/zicond.md: New splitters to rewrite single bit
sign extension as the condition to a czero in the desired form.

gcc/testsuite
* gcc.target/riscv/zicond-xor-01.c: New test.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
(cherry picked from commit 94b950df6f8c46925799f642e5c44f42638f2b5e)

RISC-V: Added zvfh support for zfa extensions.

This is a follow-up for the zfa extension, added according to the recommendations
for zvfh and patch of Tsukasa OI <research_trasio@irq.a4lg.com>. At the same time,
zfa-fli-5.c of which is also based on the patch.

Ref:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627284.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628492.html

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
zvfh can generate zfa extended instruction fli.h, just like zfh.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fli-7.c: Change fa0 to fa\[0-9\] to avoid
assigning register numbers that are non-zero.
* gcc.target/riscv/zfa-fli-8.c: Ditto.
* gcc.target/riscv/zfa-fli-5.c: New test.

(cherry picked from commit fce74ce2535aa3b7648ba82e7e61eb77d0175546)

RISC-V: generate builtin macro for compilation with strict alignment

Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false.

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Generate
__riscv_unaligned_avoid with value 1 or
__riscv_unaligned_slow with value 1 or
__riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override): Define
riscv_user_wants_strict_align. Set
riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for
__riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit 6e23440b5df4011bbe1dbee74d47641125dd7d16)

RISC-V: Add Types to Un-Typed Vector Instructions

Updates vector instructions to ensure that no instruction is left
without a type attribute. Create a placeholder type "vector" for
instructions where a type isn't clear

Tested for regressions using rv32/rv64 gc/gcv multilib with newlib/linux.

gcc/Changelog:

* config/riscv/autovec-vls.md: Update types
* config/riscv/riscv.md: Add vector placeholder type
* config/riscv/vector.md: Update types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit 4b70c7c849331d45c0d6a1a4e1cf96b103be9aa6)

RISC-V: Fix one ICE for vect test vect-multitypes-5

There will be one ICE when build vect-multitypes-5.c similar as below:

riscv64-unknown-elf-gcc -O3 \
  -march=rv64imafdcv -mabi=lp64d -mcmodel=medlow \
  -fdiagnostics-plain-output -flto -ffat-lto-objects \
  --param riscv-autovec-preference=scalable -Wno-psabi \
  -ftree-vectorize -fno-tree-loop-distribute-patterns \
  -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details \
  gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c -o test.elf -lm

The below RTL is not well handled in riscv_legitimize_const_move, and
then fall through to the default pass. Then the
default force_const_mem will NULL_RTX, and will have ICE when operating
one the NULL_RTX.

(const:DI
  (plus:DI
    (symbol_ref:DI ("ic") [flags 0x2] <var_decl 0x7fe57740be10 ic>)
    (const_poly_int:DI [16, 16])))

This patch would like to take care of this rtl in riscv_legitimize_const_move.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_poly_move): New declaration.
(riscv_legitimize_const_move): Handle ref plus const poly.

(cherry picked from commit d16af3ebea84749ac673db29a4124d2dc7cd369e)

RISC-V: Add stub support for existing extensions (unprivileged)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils support, even if the GCC itself does not touch a thing.

This commit adds stub supported standard unprivileged extensions to
riscv_ext_version_table and its implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c except not yet
merged 'Zce', 'Zcmp' and 'Zcmt' support).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.

(cherry picked from commit f30d6a48635b5b180e46c51138d0938d33abd942)

RISC-V: Add stub support for existing extensions (vendor)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils support, even if the GCC itself does not touch a thing.

This commit adds stub supported vendor extensions to
riscv_ext_version_table (no riscv_implied_info entries to add; all
information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add stub support for all vendor extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-30.c: New test for a stub
vendor extension 'XVentanaCondOps'.

(cherry picked from commit fea5442127daf8472966360279d402023dba3379)

RISC-V: Add stub support for existing extensions (privileged)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils support, even if the GCC itself does not touch a thing.

As a start, this commit adds stub supported *privileged* extensions to
riscv_ext_version_table and its implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from privileged extensions.
(riscv_ext_version_table): Add stub support for all privileged
extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-29.c: New test for a stub privileged
extension 'Smstateen' with some implications.

(cherry picked from commit 4053d295fdd81d3e05c4977e3cd9c647e8cc6bc2)

RISC-V: Make PR 102957 tests more comprehensive

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" test failures caused by that change (on pr102957.c,
by testing the error message after the first change).

However, the latter change will partially break the original intent of PR
102957 test case because we wanted to make sure that we can parse a valid
two-letter extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit adds pr102957-2.c to make sure that there will be no errors if
we parse a valid two-letter extension name.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957-2.c: New test case using the 'Zk'
extension to continue testing whether we can use valid two-letter
extensions.

(cherry picked from commit 8b0662254cdac3e0b670c1c54752e1d43113b0f4)

RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}

This patch refactors the codes of expand_cond_len_{unop,binop,ternop}.
Introduces a new unified function expand_cond_len_op to do the main thing.
The expand_cond_len_{unop,binop,ternop} functions only care about how
to pass the operands to the intrinsic patterns.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust
* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
(get_vlmax_rtx): Exported.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
(emit_vlmax_masked_gather_mu_insn): Adjust.
(get_vlmax_rtx): New func.
(expand_load_store): Adjust.
(expand_cond_len_unop): Call expand_cond_len_op.
(expand_cond_len_op): New subroutine.
(expand_cond_len_binop): Call expand_cond_len_op.
(expand_cond_len_ternop): Call expand_cond_len_op.
(expand_lanes_load_store): Adjust.

(cherry picked from commit b3176bdc86c04da6545a4bd8e2fb7f38d3f2db8d)

vect test: Remove xfail for riscv

We are planning to enable "vect" testsuite with scalable vector auto-vectorization.

This case XPASS:
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1

like ARM SVE.
gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-12.c: Add riscv xfail.

(cherry picked from commit 97aafa9cbb68ffa23aa9f018cc5cb30648a72427)

RISC-V: Fix ASM check of vlmax_switch_vtype-16.c

Notice there is a failure:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c -O2 scan-assembler-times vsetvli\\s+zero,\\s*zero 2

Fix "2" into "3", the assembly is correct and better.

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Fix ASM check.

(cherry picked from commit 58a48781efa31e08b570f035fbceaaa8018c3412)

RISC-V: Fix AVL/VL get ICE[VSETVL PASS]

Fix bunch of ICE in "vect" testsuite:
FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vector_insn_info::get_avl_or_vl_reg): New function.
(pass_vsetvl::compute_local_properties): Fix bug.
(pass_vsetvl::commit_vsetvls): Ditto.
* config/riscv/riscv-vsetvl.h: New function.

(cherry picked from commit 818cc9f2d2f3dbbd4004ff85d3125d92d1e430c9)

RISC-V: Fix error combine of pred_mov pattern

This patch fix PR110943 which will produce some error code. This is because
the error combine of some pred_mov pattern. Consider this code:

```

void foo9 (void *base, void *out, size_t vl)
{
    int64_t scalar = *(int64_t*)(base + 100);
    vint64m2_t v = __riscv_vmv_v_x_i64m2 (0, 1);
    *(vint64m2_t*)out = v;
}
```

RTL before combine pass:

```
(insn 11 10 12 2 (set (reg/v:RVVM2DI 134 [ v ])
        (if_then_else:RVVM2DI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM2DI repeat [
                    (const_int 0 [0])
                ])
            (unspec:RVVM2DI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/app/example.c":6:20 1089 {pred_movrvvm2di})
(insn 14 13 0 2 (set (mem:RVVM2DI (reg/v/f:DI 136 [ out ]) [1 MEM[(vint64m2_t *)out_4(D)]+0 S[32, 32] A128])
        (reg/v:RVVM2DI 134 [ v ])) "/app/example.c":7:23 717 {*movrvvm2di_whole})
```

RTL after combine pass:
```
(insn 14 13 0 2 (set (mem:RVVM2DI (reg:DI 138) [1 MEM[(vint64m2_t *)out_4(D)]+0 S[32, 32] A128])
        (if_then_else:RVVM2DI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM2DI repeat [
                    (const_int 0 [0])
                ])
            (unspec:RVVM2DI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/app/example.c":7:23 1089 {pred_movrvvm2di})
```

This combine change the semantics of insn 14. I split @pred_mov pattern and
restrict the conditon of @pred_mov.

PR target/110943

gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_int_or_double_0_operand):
New predicate.
* config/riscv/riscv-vector-builtins.cc (function_expander::function_expander):
force_reg mem target operand.
* config/riscv/vector.md (@pred_mov<mode>): Wrapper.
(*pred_mov<mode>): Remove imm -> reg pattern.
(*pred_broadcast<mode>_imm): Add imm -> reg pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Adjust.
* gcc.target/riscv/rvv/base/pr110943.c: New test.

(cherry picked from commit 973eb0deb467c79cc21f265a710a81054cfd3e8c)

RISC-V: Fix documentation of __builtin_riscv_pause

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.

(cherry picked from commit cf64ab18e3f820376ff20c663c7c7bf1af290f02)

RISC-V: __builtin_riscv_pause for all environment

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension (in
the assembler). However, GCC emits "pause" unconditionally, making an
assembler error while compiling code with __builtin_riscv_pause while the
'Zihintpause' extension disabled.

However, the "pause" instruction code (0x0100000f) is a HINT and emitting its
instruction code is safe in any environment.

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x0100000f" instead of "pause" only if the extension is disabled (making
the diagnostics better).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Implement the 'Zihintpause' extension, version 2.0.
(riscv_ext_flag_table) Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause".
(riscv_builtins) : Add "pause" extension.
* config/riscv/riscv-opts.h (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Adjust output based on
TARGET_ZIHINTPAUSE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c: New test when the 'Zihintpause'
extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
* gcc.target/riscv/zihintpause-noarch.c: New test when the 'Zihintpause'
extension is disabled.

(cherry picked from commit c2d04dd659c499d8df19f68d0602ad4c7d7065c2)

RISC-V: Fix uninitialized probability for GIMPLE IR tests

This patch fix unitialized probability in GIMPLE IR code tests:
FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10b.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10b.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10c.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10c.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10d.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10d.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10e.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10e.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Skip
never probability.
(pass_vsetvl::compute_probabilities): Fix unitialized probability.

(cherry picked from commit 421cf6109ad23ae0f5d3da9adb582eb464e8826c)

RISC-V: Disable user vsetvl fusion into EMPTY or DIRTY (Polluted EMPTY) block

This patch is fixing these bunch of ICE in "vect" testsuite:
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Fix bug.

(cherry picked from commit e7b585a468aa4980955ae25fa9f4b41a3dc2995e)

RISC-V: Fix VSETVL test failures

Committed.

Fix failures:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\\([a-x0-9]+\\) 3

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vxrm-8.c: Adapt tests.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.

(cherry picked from commit 1671ad9ecff9f361870aeb26d5c5c6d9808826d7)

RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

This patch refactors the Phase 3 (Demand fusion) and rename it into Earliest fusion.
I do the refactor for the following reasons:

  1. Current implementation of phase 3 is doing too many things which makes the code quality
     quite messy and not easy to maintain.
  2. The demand fusion I do previously is we explicitly make the fusion including how to fuse
     VSETVLs, where to make the VSETVL fusion happens, check the VSETVL fusion point (location)
     whether it is correct and optimal...etc.

     We are dong these things too much so I added these following functions:

        enum fusion_type get_backward_fusion_type (const bb_info *,
     const vector_insn_info &);
        bool hard_empty_block_p (const bb_info *, const vector_insn_info &) const;
        bool backward_demand_fusion (void);
        bool forward_demand_fusion (void);
        bool cleanup_illegal_dirty_blocks (void);

     to make sure the VSETV fusion is optimal and correct. I found in may downstream testing it is
     not the reliable and optimal approach.

     Instead, this patch is to use 'compute_earliest' which is the function of LCM to fuse multiple
     'compatible' VSETVL demand info if they are having same earliest edge.  We let LCM decide almost
     everything of demand fusion for us. The only thing we do (Not the LCM do) is just checking the
     VSETVLs demand info are compatible or not. That's all we need to do.
     I belive such approach is much more reliable and optimal than before (We have many testcases already to check this refactor patch).
  3. Using LCM approach to do the demand fusion is more reliable and better CFG than before.
  ...

Here is the basics of this patch approach:

Consider this following case:

for
  for
    for
      ...
         for
   if (...)
     VSETVL 1 demand: RATIO = 32 and TU policy.
   else if (...)
     VSETVL 2 demand: SEW = 16.
   else
     VSETVL 3 demand: MU policy.

   - 'compute_earliest' which output the earliest edge of VSETVL 1, VSETVL 2 and VSETVL 3.
     They are having same earliest edge which is outside the 1th inner-most loop.

   - Then, we check these 3 VSETVL demand info are compatible so fuse them into a single VSETVL info:
     demand SEW = 16, LMUL = MF2, TU, MU.

   - Then the later phase (phase 4) LCM PRE (partial reduandancy elimination) will hoist such VSETVL
     to the outer-most loop. So that we can get optimal codegen.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p):
New function.
(after_or_same_p): Ditto.
(find_reg_killed_by): Delete.
(has_vsetvl_killed_avl_p): Ditto.
(anticipatable_occurrence_p): Refactor.
(any_set_in_bb_p): Delete.
(count_regno_occurrences): Ditto.
(backward_propagate_worthwhile_p): Ditto.
(demands_can_be_fused_p): Ditto.
(earliest_pred_can_be_fused_p): New function.
(vsetvl_dominated_by_p): Ditto.
(vector_insn_info::parse_insn): Refactor.
(vector_insn_info::merge): Refactor.
(vector_insn_info::dump): Refactor.
(vector_infos_manager::vector_infos_manager): Refactor.
(vector_infos_manager::all_empty_predecessor_p): Delete.
(vector_infos_manager::all_same_avl_p): Ditto.
(vector_infos_manager::create_bitmap_vectors): Refactor.
(vector_infos_manager::free_bitmap_vectors): Refactor.
(vector_infos_manager::dump): Refactor.
(pass_vsetvl::update_block_info): New function.
(enum fusion_type): Ditto.
(pass_vsetvl::get_backward_fusion_type): Delete.
(pass_vsetvl::hard_empty_block_p): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.
(pass_vsetvl::forward_demand_fusion): Ditto.
(pass_vsetvl::demand_fusion): Ditto.
(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
(pass_vsetvl::compute_local_properties): Ditto.
(pass_vsetvl::earliest_fusion): New function.
(pass_vsetvl::vsetvl_fusion): Ditto.
(pass_vsetvl::commit_vsetvls): Refactor.
(get_first_vsetvl_before_rvv_insns): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_earliest_vsetvls): New function.
(pass_vsetvl::df_post_optimization): Refactor.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-27.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-28.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-29.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-30.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-35.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-36.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-50.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-51.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-66.c:
* gcc.target/riscv/rvv/vsetvl/avl_single-67.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-69.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-70.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-71.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-72.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-76.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-77.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-84.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-93.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-94.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-96.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/ffload-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-45.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-103.c: New test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-13.c: New test.

(cherry picked from commit e030af3e6f6d3ae555d6f70047ea3a2bf5744b7e)

RISC-V: Fix spill-11.c testsuite failure

Jivan's work also results in using a different save/restore function for the
spill-11 test. So the expected output needs minor adjusting

gcc/testsuite
* gcc.target/riscv/rvv/base/spill-11.c: Adjust expected output.

(cherry picked from commit 3745feb19ed072e0865b12a891d7dbf7ba12c337)

RISC-V: Fix spill-12 test

Jivan's recent work on IRA results in more efficient code for this test. This
adjusts the expected output for the removal of 5 instructions and conversion of
an addi into a simple mv.

gcc/testsuite
* gcc.target/riscv/rvv/base/spill-12.c: Update expected output.

(cherry picked from commit 6567837fd823a93f7f7948a73ff9dc1153592e8c)

RISC-V: Fix xtheadcondmov-indirect.c

The pressure sensitive scheduling change perturbs the output ever so slightly
for this test. Seemed easiest to just turn that off rather than generalize the
expected output enough to work across all the relevant optimization options.

gcc/testsuite/
* gcc.target/riscv/xtheadcondmov-indirect.c: Turn off pressure
sensitive scheduling.

RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

Consider this following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < 4; i++)
    if (a[i] < min_v)
      last = i;

  return last;
}

--param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8

condition_reduction:
vsetvli a4,zero,e32,m8,ta,ma
li a5,32
vmv.v.x v8,a1
vl8re32.v v0,0(a0)
vid.v v16
vmslt.vv v0,v0,v8
vsetvli zero,a5,e8,m2,ta,ma
vcpop.m a5,v0
beq a5,zero,.L2
addi a5,a5,-1
vsetvli a4,zero,e32,m8,ta,ma
vcompress.vm v8,v16,v0
vslidedown.vx v8,v8,a5
vmv.x.s a0,v8
ret
.L2:
li a0,66
ret

--param=riscv-autovec-preference=scalable

condition_reduction:
csrr a6,vlenb
mv a2,a0
li a3,32
li a0,66
srli a6,a6,2
vsetvli a4,zero,e32,m1,ta,ma
vmv.v.x v4,a1
vid.v v1
.L4:
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli zero,a5,e32,m1,ta,ma    ----> redundant vsetvl
vle32.v v0,0(a2)
vsetvli a4,zero,e32,m1,ta,ma
slli a1,a5,2
vmv.v.x v2,a6
vmslt.vv v0,v0,v4
sub a3,a3,a5
vmv1r.v v3,v1
vadd.vv v1,v1,v2
vsetvli zero,a5,e8,mf4,ta,ma
vcpop.m a5,v0
beq a5,zero,.L3
addi a5,a5,-1
vsetvli a4,zero,e32,m1,ta,ma
vcompress.vm v2,v3,v0
vslidedown.vx v2,v2,a5
vmv.x.s a0,v2
.L3:
sext.w a0,a0
add a2,a2,a1
bne a3,zero,.L4
ret

There is a redundant vsetvli instruction in VLA vectorized codes which is the VSETVL PASS issue.

vsetvl issue is not included in this patch but will be fixed soon.

gcc/ChangeLog:

* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.

(cherry picked from commit e7545cadbedfc167749d801bd574cf9fe22ed5c5)

RISC-V: Add Types to Un-Typed Sync Instructions:

Updates the sync instructions to ensure that no insn is left without
a type attribute. Updates a total of 9 insns to have type "atomic"
or type "multi" based on number of assembly instructions generated

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/Changelog:

* config/riscv/sync-rvwmo.md: updated types to "multi" or
"atomic" based on number of assembly lines generated
* config/riscv/sync-ztso.md: likewise
* config/riscv/sync.md: likewise

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit df177510665c4e1045bdaadf10d837f1bdc4ea06)

RISC-V: Make stack_save_restore tests more robust

Spurred by Jivan's patch and a desire for cleaner testresults, I went ahead and
make the stack_save_restore tests independent of the precise stack size by
using a regexp.

gcc/testsuite/
* gcc.target/riscv/stack_save_restore_1.c: Robustify.
* gcc.target/riscv/stack_save_restore_2.c: Robustify.

(cherry picked from commit e1f096a3cc96c71907cfbc7b8baf67a3d863cb6d)