gcc.gnu.org Git - gcc.git/log

RISC-V: Remove redundant vec_duplicate pattern

Currently, VLS and VLA patterns are different.
VLA is define_expand
VLS is define_insn_and_split

It makes no sense that they are different pattern format.
Merge them into same pattern (define_insn_and_split).
It can also be helpful for the future vv -> vx fwprop optimization.

gcc/ChangeLog:

* config/riscv/riscv-selftests.cc (run_broadcast_selftests): Adapt selftests.
* config/riscv/vector.md (@vec_duplicate<mode>): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr111313.c: Adapt test.

(cherry picked from commit 4260f4af4dde6dbf85c28da7e8aaf03985b3d171)

RISC-V: Fix bogus FAILs of vsetvl testcases

Due the the global codes change which change the CFG cause bogus vsetvl checking FAILs.

Adapt testcases for the global codes change.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-21.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/avl_single-26.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-39.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-41.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c: Ditto.

(cherry picked from commit bdb7d85dde56b69af378adcffe45accf792cf4fd)

RISC-V: Removed misleading comments in testcases

This patch removed the misleading comments in testcases since we
support fold min(int, poly) to constant by this patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629651.html).
Thereby the csrr will not appear inside the assembly code, even if there
is no support for some VLS vector patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/div-1.c: Removed comments.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.

(cherry picked from commit 1b03c73295266984378dd9da99a9458b591b964c)

RISC-V: Add fixed PR111255 testcase by other patch

This patch add the missed PR111255 testcase which is fixed by this
committed patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628922.html).

PR target/111255

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111255.c: New test.

(cherry picked from commit 4ab744ace2478c4b986ec4ac27c0e3467b7a6419)

RISC-V: Support VLS reduction

Notice previous VLS reduction patch is missing some codes which cause multiple ICE:
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-1.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredsum\\.vs 22
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-10.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmax\\.vs 9
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-10.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmaxu\\.vs 9
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-10.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmin\\.vs 9
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-10.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredminu\\.vs 9
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-11.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmax\\.vs 8
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-11.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmaxu\\.vs 8
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-11.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmin\\.vs 8
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-11.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredminu\\.vs 8
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-12.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredmax\\.vs 10
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-12.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredmin\\.vs 10
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-13.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredmax\\.vs 9
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-13.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredmin\\.vs 9
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-14.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredmax\\.vs 8
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-14.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredmin\\.vs 8
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-15.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredand\\.vs 22
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-15.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredor\\.vs 22
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-15.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredxor\\.vs 22
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-16.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredand\\.vs 20
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-16.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredor\\.vs 20
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-16.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredxor\\.vs 20
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-17.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredand\\.vs 18
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-17.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredor\\.vs 18
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-17.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredxor\\.vs 18
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-18.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredand\\.vs 16
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-18.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredor\\.vs 16
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-18.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredxor\\.vs 16
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-19.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-19.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-2.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredsum\\.vs 20
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-20.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-20.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-21.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (internal compiler error: in code_for_pred, at ./insn-opinit.h:1560)
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-21.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-3.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredsum\\.vs 18
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-4.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredsum\\.vs 16
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-5.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredusum\\.vs 10
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-6.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredusum\\.vs 9
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-7.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vfredusum\\.vs 8
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-8.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmax\\.vs 11
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-8.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmaxu\\.vs 11
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-8.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmin\\.vs 11
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-8.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredminu\\.vs 11
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-9.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmax\\.vs 10
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-9.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmaxu\\.vs 10
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-9.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredmin\\.vs 10
FAIL: gcc.target/riscv/rvv/autovec/vls/reduc-9.c -O3 -ftree-vectorize --param riscv-autovec-preference=scalable  scan-assembler-times vredminu\\.vs 10

Committed.

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS modes.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.

(cherry picked from commit 71e0f38dcb73e4cdfe61fc28821551b325320302)

RISC-V: Fix VSETVL PASS fusion bug

There is an obvious fusion bug that is exposed by more VLS patterns support.
After more VLS modes support, it cause following FAILs:
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c execution test

Demand 1: SEW = 64, LMUL = 1, RATIO = 64, demand SEW, demand GE_SEW.
Demand 2: SEW = 64, LMUL = 2, RATIO = 32, demand SEW, demand GE_SEW, demand RATIO.

Before this patch:
merge demand: SEW = 64, LMUL = 1, RATIO = 32, demand SEW, demand LMUL, demand GE_SEW.
It's obvious incorrect of merge LMUL which should be new LMUL = (demand 2 RATIO * greatest SEW) = M2

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vlmul_for_greatest_sew_second_ratio): New function.
* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Fix bug.

(cherry picked from commit 8fbc0871da26fac1668ba939f4492876794734ac)

RISC-V: Support VLS modes vec_init auto-vectorization

There are multiple SLP dump FAILs in vect testsuite.
After analysis, confirm we are missing vec_init for VLS modes.
This patch is not sufficient to fix those FAILs (We need more VLS patterns will send them soon).

This patch is the prerequsite patch for fixing those SLP FAILs.

Finish the whole regression.
Ok for trunk ?

gcc/ChangeLog:

* config/riscv/autovec.md: Extend VLS modes.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS vec_init tests.
* gcc.target/riscv/rvv/autovec/vls/init-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-9.c: New test.

(cherry picked from commit 1f9bf6f372da48c75d42f2669ba92f3fd4370fda)

RISC-V: Remove autovec-vls.md file and clean up VLS move modes[NFC]

We have largely supportted VLS modes. Only move patterns of VLS modes are
different from VLS patterns. The rest of them are the same.

We always extend the current VLA patterns with VLSmodes:

VI --> V_VLSI
VF --> V_VLSF

It makes no sense to have a separate file holding a very few VLS patterns
that can not be extended from the current VLA patterns.

So remove autovec-vls.md

gcc/ChangeLog:

* config/riscv/vector.md (mov<mode>): New pattern.
(*mov<mode>_mem_to_mem): Ditto.
(*mov<mode>): Ditto.
(@mov<VLS_AVL_REG:mode><P:mode>_lra): Ditto.
(*mov<VLS_AVL_REG:mode><P:mode>_lra): Ditto.
(*mov<mode>_vls): Ditto.
(movmisalign<mode>): Ditto.
(@vec_duplicate<mode>): Ditto.
* config/riscv/autovec-vls.md: Removed.

(cherry picked from commit 4e679b9ceac22cf369a57ebb4f9175c1d02b2466)

RISC-V: Support VLS modes reduction[PR111153]

This patch supports VLS reduction vectorization.

It can optimize the current reduction vectorization codegen with current COST model.

TYPE __attribute__ ((noinline, noclone)) \
reduc_plus_##TYPE (TYPE * __restrict a, int n) \
{ \
  TYPE r = 0; \
  for (int i = 0; i < n; ++i) \
    r += a[i]; \
  return r; \
}

  T (int32_t) \

TEST_PLUS (DEF_REDUC_PLUS)

Before this patch:

        vle32.v v2,0(a5)
        addi    a5,a5,16
        vadd.vv v1,v1,v2
        bne     a5,a4,.L4
        lui     a4,%hi(.LC0)
        lui     a5,%hi(.LC1)
        addi    a4,a4,%lo(.LC0)
        vlm.v   v0,0(a4)
        addi    a5,a5,%lo(.LC1)
        andi    a1,a1,-4
        vmv1r.v v2,v3
        vlm.v   v4,0(a5)
        vcompress.vm    v2,v1,v0
        vmv1r.v v0,v4
        vadd.vv v1,v2,v1
        vcompress.vm    v3,v1,v0
        vadd.vv v3,v3,v1
        vmv.x.s a0,v3
        sext.w  a0,a0
        beq     a3,a1,.L12

After this patch:

vle32.v v2,0(a5)
addi a5,a5,16
vadd.vv v1,v1,v2
bne a5,a4,.L4
li a5,0
andi a1,a1,-4
vmv.s.x v2,a5
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
beq a3,a1,.L12

PR target/111153

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS mode reduction case.
* gcc.target/riscv/rvv/autovec/vls/reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-15.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-16.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-17.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-18.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-19.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-20.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-21.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-9.c: New test.

(cherry picked from commit fafd2502c5416fe4f69daf13224ab1efbf256a1c)

RISC-V: Remove redundant codes of VLS patterns[NFC]

Consider those VLS patterns are the same VLA patterns.
Now extend VI -> V_VLSI and VF -> V_VLSF.
Then remove the redundant codes of VLS patterns.

gcc/ChangeLog:

* config/riscv/autovec-vls.md (<optab><mode>3): Deleted.
(copysign<mode>3): Ditto.
(xorsign<mode>3): Ditto.
(<optab><mode>2): Ditto.
* config/riscv/autovec.md: Extend VLS modes.

(cherry picked from commit 5761dce5d71e3dd013ce4db4c5e9b5e49c6cba23)

RISC-V: Expand VLS mode to scalar mode move[PR111391]

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391

PR target/111391

gcc/ChangeLog:

* config/riscv/autovec.md (@vec_extract<mode><vel>): Remove @.
(vec_extract<mode><vel>): Ditto.
* config/riscv/riscv-vsetvl.cc (emit_vsetvl_insn): Fix bug.
(pass_vsetvl::local_eliminate_vsetvl_insn): Fix bug.
* config/riscv/riscv.cc (riscv_legitimize_move): Expand VLS mode to scalar mode move.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr111391-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr111391-2.c: New test.

(cherry picked from commit 86451305d8b2a25e7c6ea6c2f1ee69c419cba3ef)

RISC-V: Make SHA-256, SM3 and SM4 builtins operate on uint32_t

This is in parity with the LLVM commit a64b3e92c7cb ("[RISCV] Re-define
sha256, Zksed, and Zksh intrinsics to use i32 types.").

SHA-256, SM3 and SM4 instructions operate on 32-bit integers and upper
32-bits have no effects on RV64 (the output is sign-extended from the
original 32-bit value).  In that sense, making those intrinsics only
operate on uint32_t is much more natural than XLEN-bits wide integers.

This commit reforms instructions and expansions based on 32-bit
instruction handling on RV64 (such as ADDW).

Before:
   riscv_<op>_si: For RV32, fully operate on uint32_t
   riscv_<op>_di: For RV64, fully operate on uint64_t
After:
  *riscv_<op>_si: For RV32, fully operate on uint32_t
   riscv_<op>_di_extended:
                  For RV64.  Input is uint32_t and output is int64_t,
                  sign-extended from the int32_t result
                  (represents a part of <op> behavior).
   riscv_<op>_si: Common (fully operate on uint32_t).
                  On RV32, "expands" to *riscv_<op>_si.
                  On RV64, initially expands to riscv_<op>_di_extended *and*
                  extracts lower 32-bits from the int64_t result.

It also refines definitions of SHA-256, SM3 and SM4 intrinsics.

gcc/ChangeLog:

* config/riscv/crypto.md (riscv_sha256sig0_<mode>,
riscv_sha256sig1_<mode>, riscv_sha256sum0_<mode>,
riscv_sha256sum1_<mode>, riscv_sm3p0_<mode>, riscv_sm3p1_<mode>,
riscv_sm4ed_<mode>, riscv_sm4ks_<mode>): Remove and replace with
new insn/expansions.
(SHA256_OP, SM3_OP, SM4_OP): New iterators.
(sha256_op, sm3_op, sm4_op): New attributes for iteration.
(*riscv_<sha256_op>_si): New raw instruction for RV32.
(*riscv_<sm3_op>_si): Ditto.
(*riscv_<sm4_op>_si): Ditto.
(riscv_<sha256_op>_di_extended): New base instruction for RV64.
(riscv_<sm3_op>_di_extended): Ditto.
(riscv_<sm4_op>_di_extended): Ditto.
(riscv_<sha256_op>_si): New common instruction expansion.
(riscv_<sm3_op>_si): Ditto.
(riscv_<sm4_op>_si): Ditto.
* config/riscv/riscv-builtins.cc: Add availability "crypto_zknh",
"crypto_zksh" and "crypto_zksed".  Remove availability
"crypto_zksh{32,64}" and "crypto_zksed{32,64}".
* config/riscv/riscv-ftypes.def: Remove unused function type.
* config/riscv/riscv-scalar-crypto.def: Make SHA-256, SM3 and SM4
intrinsics to operate on uint32_t.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zknh-sha256.c: Moved to...
* gcc.target/riscv/zknh-sha256-64.c: ...here.  Test RV64.
* gcc.target/riscv/zknh-sha256-32.c: New test for RV32.
* gcc.target/riscv/zksh64.c: Change the type.
* gcc.target/riscv/zksed64.c: Ditto.

(cherry picked from commit 9882b81410f247604fbfd5883894a96127f461ac)

RISC-V: Make bit manipulation value / round number and shift amount types for builtins unsigned

For bit manipulation operations, input(s) and the manipulated output are
better to be unsigned like other target-independent builtins like
__builtin_bswap32 and __builtin_popcount.

Although this is not completely compatible as before (as the type changes),
most code will run normally, even without warnings (with -Wall -Wextra).

To make consistent to the LLVM commit 599421ae36c3 ("[RISCV] Use unsigned
instead of signed types for Zk* and Zb* builtins."), round numbers and
shift amount on the scalar crypto instructions are also changed
to unsigned.

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc (RISCV_ATYPE_UQI): New for
uint8_t. (RISCV_ATYPE_UHI): New for uint16_t.
(RISCV_ATYPE_QI, RISCV_ATYPE_HI, RISCV_ATYPE_SI, RISCV_ATYPE_DI):
Removed as no longer used.
(RISCV_ATYPE_UDI): New for uint64_t.
* config/riscv/riscv-cmo.def: Make types unsigned for not working
"zicbop_cbo_prefetchi" and working bit manipulation clmul builtin
argument/return types.
* config/riscv/riscv-ftypes.def: Make bit manipulation, round
number and shift amount types unsigned.
* config/riscv/riscv-scalar-crypto.def: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: Make signed type to unsigned.
* gcc.target/riscv/zbc64.c: Ditto.
* gcc.target/riscv/zbkb32.c: Ditto.
* gcc.target/riscv/zbkb64.c: Ditto.
* gcc.target/riscv/zbkc32.c: Ditto.
* gcc.target/riscv/zbkc64.c: Ditto.
* gcc.target/riscv/zbkx32.c: Ditto.
* gcc.target/riscv/zbkx64.c: Ditto.
* gcc.target/riscv/zknd32.c: Ditto.
* gcc.target/riscv/zknd64.c: Ditto.
* gcc.target/riscv/zkne32.c: Ditto.
* gcc.target/riscv/zkne64.c: Ditto.
* gcc.target/riscv/zknh-sha256.c: Ditto.
* gcc.target/riscv/zknh-sha512-32.c: Ditto.
* gcc.target/riscv/zknh-sha512-64.c: Ditto.
* gcc.target/riscv/zksed32.c: Ditto.
* gcc.target/riscv/zksed64.c: Ditto.
* gcc.target/riscv/zksh32.c: Ditto.
* gcc.target/riscv/zksh64.c: Ditto.

(cherry picked from commit a1751681867a3ce760ea6924c3c632f1b81db97e)

RISC-V: Support FP SGNJX autovec for VLS mode

This patch would like to allow the VLS mode autovec for the
floating-point binary operation SGNJX.

Give sample code as below:

void
test (float * restrict out, float * restrict in1, float * restrict in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = in1[i] * copysignf (1.0, in2[i]);
}

Before this patch:
test:
  li      a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v2,0(a1)
  lui     a4,%hi(.LC0)
  flw     fa5,%lo(.LC0)(a4)
  vfmv.v.f        v1,fa5
  vle32.v v3,0(a2)
  vfsgnj.vv       v1,v1,v3
  vfmul.vv        v1,v1,v2
  vse32.v v1,0(a0)
  ret

After this patch:
test:
  li      a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vle32.v v2,0(a2)
  vfsgnjx.vv      v1,v1,v2
  vse32.v v1,0(a0)
  ret

This SGNJX autovec acts on function call copysignf/copysignf
in math.h too. And it depends on the option -ffast-math.

gcc/ChangeLog:

* config/riscv/autovec-vls.md (xorsign<mode>3): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnjx-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnjx-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit 23224f06c980533d474b3a29d2437e5537916fc0)

fix PR 111259 invalid zcmp mov predicate.

The code changes are from Palmer.

root cause:
In a gcc build with --enable-checking=yes, REGNO (op) checks
rtx code and expected code 'reg'. so a rtx with 'subreg' causes
an internal compiler error.

solution:
Restrict predicate to allow 'reg' only.

gcc/ChangeLog:

* config/riscv/predicates.md: Restrict predicate
to allow 'reg' only.
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
(cherry picked from commit d7b6020276a843e97f6135259b4ab3b53a5850e2)

RISC-V: Fix using wrong mode to get reduction insn vlmax

This patch fix using wrong mode when emit vlmax reduction insn. We should
use src operand instead dest operand (which always LMUL=m1) to get the vlmax
length. This patch alse remove dest_mode and mask_mode from insn_expander
constructor, which can be geted by insn_flags.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_flags): Change name.
(enum insn_type): Ditto.
* config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): Removed.
(emit_vlmax_insn): Adjust.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_insn_lra): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c: New test.

(cherry picked from commit dd6e5d29cbdbed25e4e52e5f06b1bfa835aab215)

test: Block SLP check of slp-35.c for vect_strided5

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-35.c: Block SLP check for vect_strided5 targets.

(cherry picked from commit b259284ee135a432e0097d923d0908350f74f468)

test: Block SLP check of slp-34.c for vect_strided5

Since RISC-V use vsseg5 which is the vect_store_lanes with stride 5
if failed on RISC-V.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-34.c: Block check for vect_strided5.

(cherry picked from commit 5c7c359c907852c4c374e85d4f8a392fd960e98e)

test: Block vect_strided5 for slp-34-big-array.c SLP check

If failed on RISC-V since it use vect_store_lanes with array 5.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-34-big-array.c: Block SLP check for vect_strided5.

(cherry picked from commit 16c5d0f0c4c6f72bdfc01c640f11a845530c4d3d)

test: Block slp-16.c check for target support vect_strided6

This testcase FAIL in RISC-V because RISC-V support vect_load_lanes with 6.
FAIL: gcc.dg/vect/slp-16.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-16.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

Since it use vlseg6 (vect_load_lanes with array size = 6)

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-16.c: Block vect_strided6.
* lib/target-supports.exp: Add strided type.

(cherry picked from commit 9b80311cdc685e6f27cf4f8625ac3d24dcc59d7f)

test: Isolate slp-1.c check of target supports vect_strided5

This test failed in RISC-V:
FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4

Because this loop:
  /* SLP with unrolling by 8.  */
  for (i = 0; i < N; i++)
    {
      out[i*5] = 8;
      out[i*5 + 1] = 7;
      out[i*5 + 2] = 81;
      out[i*5 + 3] = 28;
      out[i*5 + 4] = 18;
    }

is using vect_load_lanes with array size = 5.
instead of SLP.

When we adjust the COST of LANES load store, then it will use SLP.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-1.c: Add vect_stried5.

(cherry picked from commit 0854ebea63f59eb678ebf4440afe1d18ed5bb6d4)

test: Remove XPASS for RISCV

Like ARM SVE, this test cause FAILs of XPASS:
XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 72)
XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 77)
XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 note (test for warnings, line 68)

on RISC-V

gcc/testsuite/ChangeLog:

* gcc.dg/Wstringop-overflow-47.c: Add riscv.

(cherry picked from commit e1ec05b800e2ee9f2dfc8f99b1c5622103f52cd5)

RISC-V: Refactor expand_reduction and cleanup enum reduction_type

This patch refactors expand_reduction, remove the reduction_type argument
and add insn_flags argument to determine the passing of the operands.
ops has also been modified to restrict it to only two cases and to remove
operand that are not in use.

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Adjust.
* config/riscv/autovec.md: Ditto.
* config/riscv/riscv-protos.h (enum class): Delete enum reduction_type.
(expand_reduction): Adjust expand_reduction prototype.
* config/riscv/riscv-v.cc (need_mask_operand_p): New helper function.
(expand_reduction): Refactor expand_reduction.

(cherry picked from commit e6413b5dc5b786391802368207ec86945eef2ae0)

RISC-V: Support combine extend and reduce sum to widen reduce sum

This patch add combine pattern to combine extend and reduce sum
to widen reduce sum. The pattern in autovec.md was adjusted as
needed. Note that the current vectorization cannot generate reduce
operand which is LMUL=M8, because this means that we need an LMUL=M16
for the extended operand, which is currently not possible. So I've
added VI_QHS_NO_M8 and VF_HS_NO_M8 mode iterator, which exclude
mode which is LMUL=M8.

PR target/111381

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*reduc_plus_scal_<mode>):
New combine pattern.
(*fold_left_widen_plus_<mode>): Ditto.
(*mask_len_fold_left_widen_plus_<mode>): Ditto.
* config/riscv/autovec.md (reduc_plus_scal_<mode>):
Change from define_expand to define_insn_and_split.
(fold_left_plus_<mode>): Ditto.
(mask_len_fold_left_plus_<mode>): Ditto.
* config/riscv/riscv-v.cc (expand_reduction):
Support widen reduction.
* config/riscv/vector-iterators.md (UNSPEC_WREDUC_SUM):
Add new iterators and attrs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen_reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_order-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_order-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_order_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_order_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_run-1.c: New test.

(cherry picked from commit 68cb873fd360dbb64f2a6dfb28e79399ff99d07d)

[RA]: Improve cost calculation of pseudos with equivalences

RISCV target developers reported that RA can spill pseudo used in a
loop although there are enough registers to assign.  It happens when
the pseudo has an equivalence outside the loop and the equivalence is
not merged into insns using the pseudo.  IRA sets up that memory cost
to zero when the pseudo has an equivalence and it means that the
pseudo will be probably spilled.  This approach worked well for i686
(different approaches were benchmarked long time ago on spec2k).
Although common sense says that the code is wrong and this was
confirmed by RISCV developers.

I've tried the following patch on I7-9700k and it improved spec17 fp
by 1.5% (21.1 vs 20.8) although spec17 int is a bit worse by 0.45%
(8.54 vs 8.58).  The average generated code size is practically the
same (0.001% difference).

In the future we probably need to try more sophisticated cost
calculation which should take into account that the equiv can not be
combined in usage insns and the costs of reloads because of this.

gcc/ChangeLog:

* ira-costs.cc (find_costs_and_classes): Decrease memory cost
by equiv savings.

(cherry picked from commit 3c834d85f2ec42c60995c2b678196a06cb744959)

RISC-V: Refactor vector reduction patterns

This patch adjust reduction patterns struct, change it from:
           (any_reduc:VI
             (vec_duplicate:VI
               (vec_select:<VEL>
                 (match_operand:<V_LMUL1> 4 "register_operand"      "   vr,   vr")
                 (parallel [(const_int 0)])))
             (match_operand:VI           3 "register_operand"      "   vr,   vr"))
to:
           (unspec:<V_LMUL1> [
             (match_operand:VI            3 "register_operand"      "   vr,   vr")
             (match_operand:<V_LMUL1>     4 "register_operand"      "   vr,   vr")
           ] ANY_REDUC)

The reason for the change is that the semantics of the previous pattern is incorrect.
GCC does not have a standard rtx code to express the reduction calculation process.
It makes more sense to use UNSPEC.

Further, all reduction icode are geted by the UNSPEC and MODE (code_for_pred (unspec, mode)),
so that all reduction patterns can have a uniform icode name. After this adjust, widen_reducop
and widen_freducop are redundant.

gcc/ChangeLog:

* config/riscv/autovec.md: Change rtx code to unspec.
* config/riscv/riscv-protos.h (expand_reduction): Change prototype.
* config/riscv/riscv-v.cc (expand_reduction): Change prototype.
* config/riscv/riscv-vector-builtins-bases.cc (class widen_reducop):
Removed.
(class widen_freducop): Removed.
* config/riscv/vector-iterators.md (minu): Add reduc unspec, iterators, attrs.
* config/riscv/vector.md (@pred_reduc_<reduc><mode>): Change name.
(@pred_<reduc_op><mode>): New name.
(@pred_widen_reduc_plus<v_su><mode>): Change name.
(@pred_reduc_plus<order><mode>): Change name.
(@pred_widen_reduc_plus<order><mode>): Change name.

(cherry picked from commit 6223ea766daf7c9155106b9784302442e2ff98d3)

RISC-V: Cleanup redundant reduction patterns after refactor vector mode

This patch cleanups redundant reduction patterns after Juzhe change vector mode
from fixed-size to scalable-size. For example, whether it is zvl32b, zvl64b,
zvl128b, RVVM1SI indicates that it occupies a vector register. Therefore, it is
easy to map vector modes to LMUL1 vector modes with define_mode_attr without
creating a separate pattern for each LMUL1 Mode. For example, this patch can
combine four patterns (@pred_reduc_<reduc><VQI:mode><VQI_LMUL1:mode>,
@pred_reduc_<reduc><VHI:mode><VHI_LMUL1:mode>
@pred_reduc_<reduc><VSI:mode><VSI_LMUL1:mode>,
@pred_reduc_<reduc><VDI:mode><VDI_LMUL1:mode>) to a single pattern
@pred_reduc_<reduc><mode>.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_reduction): Adjust call.
* config/riscv/riscv-vector-builtins-bases.cc: Adjust call.
* config/riscv/vector-iterators.md: New iterators and attrs.
* config/riscv/vector.md (@pred_reduc_<reduc><VQI:mode><VQI_LMUL1:mode>):
Removed.
(@pred_reduc_<reduc><VHI:mode><VHI_LMUL1:mode>): Removed.
(@pred_reduc_<reduc><VSI:mode><VSI_LMUL1:mode>): Removed.
(@pred_reduc_<reduc><VDI:mode><VDI_LMUL1:mode>): Removed.
(@pred_reduc_<reduc><mode>): Added.
(@pred_widen_reduc_plus<v_su><VQI:mode><VHI_LMUL1:mode>): Removed.
(@pred_widen_reduc_plus<v_su><VHI:mode><VSI_LMUL1:mode>): Removed.
(@pred_widen_reduc_plus<v_su><mode>): Added.
(@pred_widen_reduc_plus<v_su><VSI:mode><VDI_LMUL1:mode>): Removed.
(@pred_reduc_<reduc><VHF:mode><VHF_LMUL1:mode>): Removed.
(@pred_reduc_<reduc><VSF:mode><VSF_LMUL1:mode>): Removed.
(@pred_reduc_<reduc><VDF:mode><VDF_LMUL1:mode>): Removed.
(@pred_reduc_plus<order><VHF:mode><VHF_LMUL1:mode>): Removed.
(@pred_reduc_plus<order><VSF:mode><VSF_LMUL1:mode>): Removed.
(@pred_reduc_plus<order><mode>): Added.
(@pred_reduc_plus<order><VDF:mode><VDF_LMUL1:mode>): Removed.
(@pred_widen_reduc_plus<order><VHF:mode><VSF_LMUL1:mode>): Removed.
(@pred_widen_reduc_plus<order><VSF:mode><VDF_LMUL1:mode>): Removed.
(@pred_widen_reduc_plus<order><mode>): Added.

(cherry picked from commit 14c481f7fc0a90de7e5b7aec109e12b7b5220d65)

RISC-V: Support VLS modes mask operations

This patch support mask operations (comparison and logical).

This patch reduce these FAILs of "vect" testsuite:
FAIL: gcc.dg/vect/vect-bic-bitmask-12.c -flto -ffat-lto-objects scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/vect/vect-bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/vect/vect-bic-bitmask-23.c -flto -ffat-lto-objects scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"
FAIL: gcc.dg/vect/vect-bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"

Full regression passed (with reducing 4 FAILs).

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add VLS mask modes.
* config/riscv/autovec.md (@vcond_mask_<mode><vm>): Remove @.
(vcond_mask_<mode><vm>): Add VLS mask modes.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS tests.
* gcc.target/riscv/rvv/autovec/vls/cmp-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/mask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/mask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/mask-3.c: New test.

(cherry picked from commit 8ebb02dd6c9d190c84bf40259201e8e7327291f8)

RISC-V: Fix ICE in get_avl_or_vl_reg

update v1 -> v2: Add available fortran compiler check in rvv-fortran.exp.

This patch fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111395 ICE

update v2 -> v3: Remove redundant format.

PR target/111395

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (avl_info::operator==): Fix ICE.
(vector_insn_info::global_merge): Ditto.
(vector_insn_info::get_avl_or_vl_reg): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fortran/pr111395.f90: New test.
* gcc.target/riscv/rvv/rvv-fortran.exp: New test.

(cherry picked from commit 53ad1bd520759580b9a5cc590a81a1a30b9e2e28)

RISC-V: Format VSETVL PASS code

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::global_eliminate_vsetvl_insn): Format it.

(cherry picked from commit 7c4f6ebe54f4da0097acd07f41782ff6cc39e9a4)

RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

This patch support VLS modes VEC_EXTRACT to fix PR111391:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391

I need VLS modes VEC_EXTRACT to fix this issue.

I have run the whole gcc testsuite, notice this patch increase these 4 FAILs:
FAIL: c-c++-common/vector-subscript-4.c  -std=gnu++14  scan-tree-dump-not optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c  -std=gnu++17  scan-tree-dump-not optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c  -std=gnu++20  scan-tree-dump-not optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c  -std=gnu++98  scan-tree-dump-not optimized "vector"

After analysis and comparing with LLVM:
https://godbolt.org/z/ozhfKhj5Y

with this patch, GCC generate similar codegen like LLVM (Previously it can not be vectorized).

This patch is the prerequisite patch to fix an ICE.

So let's ignore those increased 4 dump IR FAILs since ICE is un-acceptable wheras dump FAILs are acceptable (But we should remember and eventually fix dump IR FAILs too).

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract<mode><vel>): Add VLS modes.
(@vec_extract<mode><vel>): Ditto.
* config/riscv/vector.md: Ditto

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add more def.
* gcc.target/riscv/rvv/autovec/vls/extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/extract-2.c: New test.

(cherry picked from commit d03773c8efea216c67b3ac8870fcac2662c522fe)

RISC-V: Support cond vmulh.vv and vmulu.vv autovec patterns

This patch adds combine patterns to combine vmulh[u].vv + vcond_mask
to mask vmulh[u].vv. For vmulsu.vv, it can not be produced in midend
currently. We will send another patch to take this issue.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<mulh_table><mode>3_highpart):
New combine pattern.
* config/riscv/autovec.md (smul<mode>3_highpart): Mrege smul and umul.
(<mulh_table><mode>3_highpart): Merged pattern.
(umul<mode>3_highpart): Mrege smul and umul.
* config/riscv/vector-iterators.md (umul): New iterators.
(UNSPEC_VMULHU): New iterators.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_mulh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-2.c: New test.

(cherry picked from commit c0a70df6403397a69204cba1df82114a9ddf7076)

RISC-V: Support cond vnsrl/vnsra autovec patterns

This patch add combine patterns to combine vnsra.w[vxi] + vcond_mask
to a mask vnsra.w[vxi].

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_v<any_shiftrt:optab><any_extend:optab>trunc<mode>):
New combine pattern.
(*cond_<any_shiftrt:optab>trunc<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c: New test.

(cherry picked from commit 842e4d51c11ff5ac842d925e73f4094901f4a9be)

RISC-V: Support cond vfsgnj.vv autovec patterns

This patch add combine patterns to combine vfsgnj.vv + vcond_mask
to mask vfsgnj.vv. For vfsgnjx.vv, it can not be produced in midend
currently. We will send another patch to take this issue.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*copysign<mode>_neg): Move.
(*cond_copysign<mode>): New combine pattern.
* config/riscv/riscv-v.cc (needs_fp_rounding): Extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_copysign-run.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-template.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-zvfh-run.c: New test.

(cherry picked from commit 6737a51728881790b54e490494b468267f04a608)

RISC-V: Bugfix PR111362 for incorrect frm emit

When the mode switching from NONE to CALL, we will restore the
frm but lack some check if we have static frm insn in cfun.

This patch would like to fix this by adding static frm insn check.

PR target/111362

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_emit_frm_mode_set): Bugfix.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/no-honor-frm-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit feb23a37e6142016c3463aa3be3e900d45bc3ea5)

RISC-V: Remove redundant ABI test

We only support and report warning for RVV types.

We don't report warning for GNU vectors.
So this testcase checking is incorrect and the FAIL is bogus.

Remove it and commit it.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vector-abi-9.c: Removed.

(cherry picked from commit 20268ad194d3c00d1182af345875ce63a4a9c762)

RISC-V: Enable vec_int testsuite for RVV VLA vectorization

This patch is the final version of enabling vect_int test for RVV.

There are still 80+ FAILs and they can't be fixed by adjusting testcases or target-supports.exp

Here is the analysis of **ALL** FAILs:

1. REAL highest priority FAILs:

ICE:

FAIL: gcc.dg/vect/vect-live-6.c (internal compiler error: in force_align_down_and_div, at poly-int.h:1903)
FAIL: gcc.dg/vect/vect-live-6.c (test for excess errors)
FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (internal compiler error: in force_align_down_and_div, at poly-int.h:1903)
FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (test for excess errors)

Execution fails:
FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-reduc-7.c execution test
FAIL: gcc.dg/vect/vect-alias-check-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-10.c execution test
FAIL: gcc.dg/vect/vect-alias-check-11.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-11.c execution test
FAIL: gcc.dg/vect/vect-alias-check-12.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-12.c execution test
FAIL: gcc.dg/vect/vect-alias-check-14.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-14.c execution test
FAIL: gcc.dg/vect/vect-double-reduc-5.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-double-reduc-5.c execution test

These FAILs are REAL problem that we need to address first.

2. Missed optimizations due to lacking VLS modes patterns:

FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 0 loops in function" 2
FAIL: gcc.dg/vect/pr65518.c scan-tree-dump-times vect "vectorized 0 loops in function" 2
FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-16.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-16.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-34-big-array.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-34-big-array.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-34.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-34.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-35.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-35.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-43.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 13
FAIL: gcc.dg/vect/slp-43.c scan-tree-dump-times vect "vectorized 1 loops" 13
FAIL: gcc.dg/vect/slp-45.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 13
FAIL: gcc.dg/vect/slp-45.c scan-tree-dump-times vect "vectorized 1 loops" 13
FAIL: gcc.dg/vect/slp-47.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-47.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-48.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-48.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

These testcases need VLS modes vec_init patterns.

FAIL: gcc.dg/vect/vect-bic-bitmask-12.c -flto -ffat-lto-objects  scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/vect/vect-bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/vect/vect-bic-bitmask-23.c -flto -ffat-lto-objects  scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"
FAIL: gcc.dg/vect/vect-bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"

These testcases need VLS modes VCOND_MASK and vec_cmp patterns.

3. Maybe bogus dump check FAILs:

FAIL: gcc.dg/vect/vect-multitypes-11.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-outer-4c-big-array.c -flto -ffat-lto-objects  scan-tree-dump-times vect "zero step in outer loop." 1
FAIL: gcc.dg/vect/vect-outer-4c-big-array.c scan-tree-dump-times vect "zero step in outer loop." 1
FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-s8a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-s8a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-s8a.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-s8a.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-u16b.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-u16b.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-u8a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-u8a.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-u8b.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-dot-u8b.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-1a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-1a.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-1b-big-array.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-1b-big-array.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-2a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-2a.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-2b-big-array.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/vect-reduc-pattern-2b-big-array.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1

These testcases because we don't support widen_sum/vec_unpack....etc patterns.
Currently, we don't support them since we don't see the benefits.
May support those patterns if they are beneficial ? Or Fix testcases ?

Conclusion:

IMHO, I think we can merge this patch after we addressed all REAL highest priority issues (1).

The rest FAILs are not big issues then we can reduce them by supporting more features (For example VLS modes).

Feel free to give any comments.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Enable vect_int for RVV.

(cherry picked from commit fcf66bceb4670fcd6ed8efef7f64003354e609f1)

RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]

As this PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337

We support VECTOR BOOL vcond_mask to fix this following ICE:
0x1a9e309 gimple_expand_vec_cond_expr
../../../../gcc/gcc/gimple-isel.cc:283
0x1a9ea56 execute
../../../../gcc/gcc/gimple-isel.cc:390

gcc/ChangeLog:

PR target/111337
* config/riscv/autovec.md (vcond_mask_<mode><mode>): New pattern.

(cherry picked from commit 701b9309b687ed46188b9caeb7d88ad60b0212e5)

RISC-V: Finish Typing Un-Typed Instructions and Turn on Assert

Updates autovec instruction that was added after last patch and turns on the
assert statement to ensure all new instructions have a type.

* config/riscv/autovec-opt.md: Update type
* config/riscv/riscv.cc (riscv_sched_variable_issue): Enable assert

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit 360c8cad6a727d5afd43017ca1ce9a84c6db61c5)

RISC-V: Remove unused structure in cost model

The struct range is unused, remove it.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.h (struct range): Removed.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit 75f069a6403b5d4217fb5b654a9c656b4dca9dc1)

RISC-V: Support Dynamic LMUL Cost model

This patch support dynamic LMUL cost modeling with --param=riscv-autovec-lmul=dynamic.

Consider this following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
      int32_t *__restrict d,
      int32_t *__restrict d2,
      int32_t *__restrict d3,
      int32_t *__restrict d4,
      int32_t *__restrict d5,
      int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      d2[i] = a2[i] + c2[i];
      d3[i] = a3[i] + c3[i];
      d4[i] = a4[i] + c4[i];
      d5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i] + a[i];

      c2[i] = a[i] + c[i];
      c3[i] = b5[i] * a5[i];
      c4[i] = a2[i] * a3[i];
      c5[i] = b5[i] * a2[i];
      c[i] = a[i] + c3[i];
      c2[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i]
      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
    }
}

Demo: https://godbolt.org/z/x1acoMxGT

You can see it will produce register spilling if you specify LMUL >= 4

Now, with --param=riscv-autovec-lmul=dynamic.

GCC is able to pick LMUL = 2 to optimized this case.

This feature is supported by linear scan based local live ranges analysis and
compute maximum live V_REGS in specific program point of the function to determine the VF/LMUL.

Note that this patch can well handle both SLP and non-SLP loop.

Currenty approach didn't consider the later instruction scheduler which may improve the register pressure.
In this case, we are conservatively applying smaller VF/LMUL. (Not sure whether we should support live range shrink for such corner case since we don't known whether it can improve performance a lot.)

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (get_last_live_range): New function.
(compute_nregs_for_mode): Ditto.
(live_range_conflict_p): Ditto.
(max_number_of_live_regs): Ditto.
(compute_lmul): Ditto.
(costs::prefer_new_lmul_p): Ditto.
(costs::better_main_loop_than_p): Ditto.
* config/riscv/riscv-vector-costs.h (struct stmt_point): New struct.
(struct var_live_range): Ditto.
(struct autovec_info): Ditto.
* config/riscv/t-riscv: Update makefile for COST model.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: New test.

(cherry picked from commit af6d089b4ded96b529c7063bd7873e4630a3a64d)

riscv: Add support for str(n)cmp inline expansion

This patch implements expansions for the cmpstrsi and cmpstrnsi
builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
instructions are available.  The expansion basically emits a comparison
sequence which compares XLEN bits per step if possible.

This allows to inline calls to strcmp() and strncmp() if both strings
are xlen-aligned.  For strncmp() the length parameter needs to be known.
The benefits over calls to libc are:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment tests

The inlining mechanism is gated by a new switches ('-minline-strcmp' and
'-minline-strncmp') and by the variable 'optimize_size'.
The amount of emitted unrolled loop iterations can be controlled by the
parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.

The comparision sequence is inspired by the strcmp example
in the appendix of the Bitmanip specification (incl. the fast
result calculation in case the first word does not contain
a NULL byte).  Additional inspiration comes from rs6000-string.c.

The emitted sequence is not triggering any readahead pagefault issues,
because only aligned strings are accessed by aligned xlen-loads.

This patch has been tested using the glibc string tests on QEMU:
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
* rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/bitmanip.md (*<optab>_not<mode>): Export INSN name.
(<optab>_not<mode>3): Likewise.
* config/riscv/riscv-protos.h (riscv_expand_strcmp): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
macros.
(GEN_EMIT_HELPER2): Likewise.
(emit_strcmp_scalar_compare_byte): New function.
(emit_strcmp_scalar_compare_subword): Likewise.
(emit_strcmp_scalar_compare_word): Likewise.
(emit_strcmp_scalar_load_and_compare): Likewise.
(emit_strcmp_scalar_call_to_libc): Likewise.
(emit_strcmp_scalar_result_calculation_nonul): Likewise.
(emit_strcmp_scalar_result_calculation): Likewise.
(riscv_expand_strcmp_scalar): Likewise.
(riscv_expand_strcmp): Likewise.
* config/riscv/riscv.md (*slt<u>_<X:mode><GPR:mode>): Export
INSN name.
(@slt<u>_<X:mode><GPR:mode>3): Likewise.
(cmpstrnsi): Invoke expansion function for str(n)cmp.
(cmpstrsi): Likewise.
* config/riscv/riscv.opt: Add new parameter
'-mstring-compare-inline-limit'.
* doc/invoke.texi: Document new parameter
'-mstring-compare-inline-limit'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-strcmp.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled.c: New test.
* gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
* gcc.target/riscv/zbb-strcmp.c: New test.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
(cherry picked from commit 949f1ccf1ba9d1f33ca3809424e97429b717950a)

riscv: Add support for strlen inline expansion

This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned aligned strings if Zbb or XTheadBb instructions are available.
The inserted sequences are:

rv32gc_zbb (RV64 is similar):
      add     a3,a0,4
      li      a4,-1
.L1:  lw      a5,0(a0)
      add     a0,a0,4
      orc.b   a5,a5
      beq     a5,a4,.L1
      not     a5,a5
      ctz     a5,a5
      srl     a5,a5,0x3
      add     a0,a0,a5
      sub     a0,a0,a3

rv64gc_xtheadbb (RV32 is similar):
      add       a4,a0,8
.L2:  ld        a5,0(a0)
      add       a0,a0,8
      th.tstnbz a5,a5
      beqz      a5,.L2
      th.rev    a5,a5
      th.ff1    a5,a5
      srl       a5,a5,0x3
      add       a0,a0,a5
      sub       a0,a0,a4

This allows to inline calls to strlen(), with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test

The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.

Tested using the glibc string tests.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config.gcc: Add new object riscv-string.o.
riscv-string.cc.
* config/riscv/riscv-protos.h (riscv_expand_strlen):
New function.
* config/riscv/riscv.md (strlen<mode>): New expand INSN.
* config/riscv/riscv.opt: New flag 'minline-strlen'.
* config/riscv/t-riscv: Add new object riscv-string.o.
* config/riscv/thead.md (th_rev<mode>2): Export INSN name.
(th_rev<mode>2): Likewise.
(th_tstnbz<mode>2): New INSN.
* doc/invoke.texi: Document '-minline-strlen'.
* emit-rtl.cc (emit_likely_jump_insn): New helper function.
(emit_unlikely_jump_insn): Likewise.
* rtl.h (emit_likely_jump_insn): New prototype.
(emit_unlikely_jump_insn): Likewise.
* config/riscv/riscv-string.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
* gcc.target/riscv/xtheadbb-strlen.c: New test.
* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
* gcc.target/riscv/zbb-strlen-disabled.c: New test.
* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
* gcc.target/riscv/zbb-strlen.c: New test.

(cherry picked from commit df48285b2484eb4f8e0570c566677114eb0e553a)

RISC-V: Add missed cond autovec testcases

This patch adds all missed cond autovec testcases. For not support
cond patterns, the following patches will be sent to fix it.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: Add vrem op.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-1.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-2.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-3.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-4.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-4.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-5.c: Moved to...
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max_run-5.c: ...here.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-1.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-2.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-3.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-4.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-5.c: Removed.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_complicate-9.c: New test.

(cherry picked from commit 5e19f8991579f70aeccbe4003b7f8c914ce7f338)

RISC-V: Elimilate warning in class vcreate

The following is the content of class vcreate:
class vcreate : public function_base
{
public:
  gimple *fold (gimple_folder &f) const override
  {
    ....
  }

  rtx expand (function_expander &e) const override
  {
    return NULL_RTX;
  }
};

The warning caused is:
./riscv-gcc/gcc/config/riscv/riscv-vector-builtins-bases.cc:1719:34:
  warning: unused parameter 'e' [-Wunused-parameter]
  rtx expand (function_expander &e) const override
                                 ^

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: remove unused
parameter e and replace NULL_RTX with gcc_unreachable.

(cherry picked from commit b90a4c3dd502974f352084c23a6cdfd767e1340b)

RISC-V: Add vcreate intrinsics for RVV tuple types

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class vcreate): New class.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vcreate): Add vcreate support.
* config/riscv/riscv-vector-builtins-shapes.cc (struct vcreate_def): Ditto.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc: Add args type.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple_create.c: New test.

(cherry picked from commit c1e4efd8ae3488c5a2c11ac42d4670b67e1f7bf4)

RISC-V: enable muti push and pop for Zcmp when shrink-wrap-separate is ineffective

So that zcmp can be enabled in -Os where
shrink-wrap-separate is not effective.

To force enabling zcmp multi push/pop in speed perfered case,
fno-shrink-wrap-separate has to be explictly given.

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_avoid_shrink_wrapping_separate): wrap the condition check in
riscv_avoid_shrink_wrapping_separate.
(riscv_avoid_multi_push):avoid multi push if shrink_wrapping_separate
is active.
(riscv_get_separate_components):call riscv_avoid_shrink_wrapping_separate

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: remove -fno-shrink-wrap-separate
* gcc.target/riscv/rv32i_zcmp.c: likewise
* gcc.target/riscv/zcmp_push_fpr.c: likewise
* gcc.target/riscv/zcmp_stack_alignment.c: likewise
* gcc.target/riscv/zcmp_shrink_wrap_separate.c: New test.
* gcc.target/riscv/zcmp_shrink_wrap_separate2.c: New test.

(cherry picked from commit 721021a18e2ac004140ddd93113c11075ea890c6)

Allow targets to check shrink-wrap-separate enabled or not

No functional changes but restructure and expose use_shrink_wrapping_separate
to the TARGETs.

gcc/ChangeLog:

* shrink-wrap.cc (try_shrink_wrapping_separate):call
use_shrink_wrapping_separate.
(use_shrink_wrapping_separate): wrap the condition
check in use_shrink_wrapping_separate.
* shrink-wrap.h (use_shrink_wrapping_separate): add to extern

(cherry picked from commit 66d89a43a7b6bafca1d4675744808be53ef2736f)

RISC-V: Add Types to Un-Typed Thead Instructions

Updates the THEAD instructions to ensure that no insn is left
without a type attribute.

Tested for regressions using rv32/64 multilib for linux/newlib.

gcc/Changelog:

* config/riscv/thead.md: Update types

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit 316d57da5bb9205b946afc56d78582fee874e4b5)

RISC-V: Update Types for RISC-V Instructions

Adds types to riscv instructions that were added or were
missed by the original patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628996.html

gcc/ChangeLog:

* config/riscv/riscv.md: Update types

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit 25c30049f5896ef6312cf45a1c058ee3e3079e6a)

RISC-V: Add Types to Un-Typed Zicond Instructions

Creates a new "zicond" type and updates all zicond instructions
with that type.

gcc/ChangeLog:

* config/riscv/riscv.md: Add "zicond" type
* config/riscv/zicond.md: Update types

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit 4074aede45e3d8fbdb8fe28e1f084e869d3546f5)

RISC-V: Add Types for Un-Typed zc Instructions

Adds types to the untyped zc instructions. Creates a new
types "pushpop" and "mvpair" for now

gcc/ChangeLog:

* config/riscv/riscv.md: Add "pushpop" and "mvpair" types
* config/riscv/zc.md: Update types

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit d8751d9e32214380e6fdbb9e47f13192cc899469)

RISC-V: Update Types for Vector Instructions

Adds types to vector instructions that were added after or were
missed by the original patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628594.html

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Update types
* config/riscv/autovec.md: likewise

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit aa512cc0146d1be957ccc35a8f4a45ebad0de598)

RISC-V: Enable RVV scalable vectorization by default[PR111311]

This patch is not ready but they all will be fixed very soon.

gcc/ChangeLog:

PR target/111311
* config/riscv/riscv.opt: Set default as scalable vectorization.

(cherry picked from commit 88a0a883960910530bfefa750461168f539f4a00)

RISC-V: Remove redundant functions

I just finished V2 version of LMUL cost model.
Turns out we don't these redundant functions.

Remove them.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (get_all_predecessors): Remove.
(get_all_successors): Ditto.
* config/riscv/riscv-v.cc (get_all_predecessors): Ditto.
(get_all_successors): Ditto.

(cherry picked from commit 48d4ab698036de859e194edc037faed2ef9b58a5)

RISC-V: Use dominance analysis in global vsetvl elimination

I found that it's more reasonable to use existing dominance analysis.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::global_eliminate_vsetvl_insn):
Use dominance analysis.
(pass_vsetvl::init): Ditto.
(pass_vsetvl::done): Ditto.

(cherry picked from commit 7f9083ffe262cb14c49d042fc6363514badea6cb)

RISC-V: Add VLS modes VEC_PERM support[PR111311]

This patch add VLS modes VEC_PERM support which fix these following
FAILs in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311:

FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 1

These FAILs are fixed after this patch.

PR target/111311

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS modes.
* config/riscv/riscv-protos.h (cmp_lmul_le_one): New function.
(cmp_lmul_gt_one): Ditto.
* config/riscv/riscv-v.cc (cmp_lmul_le_one): Ditto.
(cmp_lmul_gt_one): Ditto.
* config/riscv/riscv.cc (riscv_print_operand): Add VLS modes.
(riscv_vectorize_vec_perm_const): Ditto.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-7.c: New test.

(cherry picked from commit d05aac047e0643d5c32b706c4c3b12e13f35e19a)

RISC-V: Add missing VLS mask bool mode reg -> reg patterns

Committed.

gcc/ChangeLog:

* config/riscv/autovec-vls.md (*mov<mode>_vls): New pattern.
* config/riscv/vector-iterators.md: New iterator

(cherry picked from commit 4ab2520ec424fa097ec839f2cde33522b220e93a)

RISC-V: Expand fixed-vlmax/vls vector permutation in targethook

When debugging FAIL: gcc.dg/pr92301.c execution test.
Realize a vls vector permutation situation failed to vectorize since early return false:

-  /* For constant size indices, we dont't need to handle it here.
-     Just leave it to vec_perm<mode>.  */
-  if (d->perm.length ().is_constant ())
-    return false;

To avoid more potential failed vectorization case. Now expand it in targethook.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_generic_patterns): Expand
fixed-vlmax/vls vector permutation.

(cherry picked from commit 108779056eb4b56e715a094fac48a699d2dc91b3)

RISC-V: Avoid unnecessary slideup in compress pattern of vec_perm

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_compress_patterns): Avoid unnecessary slideup.

(cherry picked from commit e390872aebcfebb7c9bc95d8fb7e44f2bec996d3)

RISC-V: Fix dump FILE of VSETVL PASS[PR111311]

To make the dump FILE not too big, add TDF_DETAILS.

This patch fix these following FAILs in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

FAIL: gcc.c-torture/unsorted/dump-noaddr.c.*r.vsetvl, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions comparison
FAIL: gcc.c-torture/unsorted/dump-noaddr.c.*r.vsetvl, -O3 -g comparison

gcc/ChangeLog:

PR target/111311
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::vsetvl_fusion): Add TDF_DETAILS.
(pass_vsetvl::pre_vsetvl): Ditto.
(pass_vsetvl::init): Ditto.
(pass_vsetvl::lazy_vsetvl): Ditto.

(cherry picked from commit 0d50facd937bda26e3083046dc5dec8fca47e1e6)

RISC-V: Fix VLS floating-point operations predicate

VLS vfadd should depend on ZVFH instead of ZVFHMIN.
Obvious fix and committed.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix floating-point operations predicate.

(cherry picked from commit df9a25384e6c484643b48b59b4e6e07504889b61)

Support folding min(poly,poly) to const

This patch adds support that tries to fold `MIN (poly, poly)` to
a constant. Consider the following C Code:

```
void foo2 (int* restrict a, int* restrict b, int n)
{
    for (int i = 0; i < 3; i += 1)
      a[i] += b[i];
}
```

Before this patch:

```
void foo2 (int * restrict a, int * restrict b, int n)
{
  vector([4,4]) int vect__7.27;
  vector([4,4]) int vect__6.26;
  vector([4,4]) int vect__4.23;
  unsigned long _32;

  <bb 2> [local count: 268435456]:
  _32 = MIN_EXPR <3, POLY_INT_CST [4, 4]>;
  vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, _32, 0);
  vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, _32, 0);
  vect__7.27_9 = vect__6.26_15 + vect__4.23_20;
  .MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, _32, 0, vect__7.27_9); [tail call]
  return;

}
```

After this patch:

```
void foo2 (int * restrict a, int * restrict b, int n)
{
  vector([4,4]) int vect__7.27;
  vector([4,4]) int vect__6.26;
  vector([4,4]) int vect__4.23;

  <bb 2> [local count: 268435456]:
  vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, 3, 0);
  vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, 3, 0);
  vect__7.27_9 = vect__6.26_15 + vect__4.23_20;
  .MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, 3, 0, vect__7.27_9); [tail call]
  return;

}
```

For RISC-V RVV, csrr and branch instructions can be reduced:

Before this patch:

```
foo2:
        csrr    a4,vlenb
        srli    a4,a4,2
        li      a5,3
        bleu    a5,a4,.L5
        mv      a5,a4
.L5:
        vsetvli zero,a5,e32,m1,ta,ma
        ...
```

After this patch.

```
foo2:
vsetivli zero,3,e32,m1,ta,ma
        ...
```

gcc/ChangeLog:

* fold-const.cc (can_min_p): New function.
(poly_int_binop): Try fold MIN_EXPR.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/div-1.c: Adjust.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/fold-min-poly.c: New test.

(cherry picked from commit 7547f65f60c0bbf8de704c569c92c7a0e31a6175)

riscv: xtheadbb: Fix extendqi<SUPERQI> insn

Recently three SPEC CPU 2017 benchmarks broke when using xtheadbb:
* 500.perlbench_r
* 525.x264_r
* 557.xz_r

Tracing the issue down revealed, that we emit a 'th.ext xN,xN,15,0'
for a extendqi<SUPERQI> insn, which is obviously wrong.
This patch splits the common 'extend<SHORT:mode><SUPERQI:mode>2_th_ext'
insn into two 'extendqi<SUPERQI>' and 'extendhi<SUPERQI>' insns,
which emit the right extension instruction.
Additionally, this patch adds test cases for these insns.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext):
Remove broken INSN.
(*extendhi<SUPERQI:mode>2_th_ext): New INSN.
(*extendqi<SUPERQI:mode>2_th_ext): New INSN.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-ext-2.c: New test.
* gcc.target/riscv/xtheadbb-ext-3.c: New test.

(cherry picked from commit d8bdc978dc9cd4a6210997edacedb954375af70d)

riscv: thead: Fix mode attribute for extension patterns

The mode attribute of an extension pattern is usually set to the target type.
Let's follow this convention consistently for xtheadbb.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/thead.md: Use more appropriate mode attributes
for extensions.

(cherry picked from commit 0e25761b373f075a41d43b9462366a653dbf1121)

riscv: bitmanip: Remove duplicate zero_extendhi<GPR:mode>2 pattern

We currently have two identical zero_extendhi<GPR:mode>2 patterns:
* '*zero_extendhi<GPR:mode>2_zbb'
* '*zero_extendhi<GPR:mode>2_bitmanip'

This patch removes the *_zbb pattern and ensures that all sign- and
zero-extensions use the postfix '_bitmanip'.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/bitmanip.md (*extend<SHORT:mode><SUPERQI:mode>2_zbb):
Rename postfix to _bitmanip.
(*extend<SHORT:mode><SUPERQI:mode>2_bitmanip): Renamed pattern.
(*zero_extendhi<GPR:mode>2_zbb): Remove duplicated pattern.

(cherry picked from commit 0c37fef39fa0a8f77ea4fc67d1bbf5067d4bddb9)

RISC-V: Suppress bogus warning for VLS types

This patch fixes over 100+ bogus FAILs due to experimental vector ABI warning.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_pass_in_vector_p): Only allow RVV type.

(cherry picked from commit a0e042d61dadc6bdcbeaa3b712b7a83415a12547)

RISC-V: Fix incorrect nregs calculation for VLS modes

This patch fixes obvious bug: TARGET_MIN_VLEN is bitsize.

All these following bugs are fixed with this patch:
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O0  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O1  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O3 -g  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O3 -g  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -Os  (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/mov-13.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/mov-13.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-1.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-1.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-2.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-2.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-3.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-3.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-4.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-4.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-5.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-5.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-6.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-6.c (test for excess errors)
FAIL: gcc.target/riscv/rvv/base/spill-sp-adjust.c (internal compiler error: in partial_subreg_p, at rtl.h:3186)
FAIL: gcc.target/riscv/rvv/base/spill-sp-adjust.c (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_hard_regno_nregs): Fix bug.

(cherry picked from commit f9cb357ae962ba2922b8507f4d96227780a063b9)

RISC-V: Add VLS mask modes mov patterns

This patterns fix these following ICE FAILs when running the whole GCC testsuite
with enabling scalable vector by default.

All of these FAILs are fixed:
FAIL: c-c++-common/opaque-vector.c  -std=c++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -std=c++14 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -std=c++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -std=c++17 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -std=c++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -std=c++20 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -std=c++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -std=c++98 (test for excess errors)
FAIL: c-c++-common/pr105998.c  -std=c++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -std=c++14 (test for excess errors)
FAIL: c-c++-common/pr105998.c  -std=c++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -std=c++17 (test for excess errors)
FAIL: c-c++-common/pr105998.c  -std=c++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -std=c++20 (test for excess errors)
FAIL: c-c++-common/pr105998.c  -std=c++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -std=c++98 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -std=c++14 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -std=c++17 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -std=c++20 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -std=c++98 (test for excess errors)
FAIL: g++.dg/ext/vector36.C  -std=gnu++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/ext/vector36.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/ext/vector36.C  -std=gnu++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/ext/vector36.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/ext/vector36.C  -std=gnu++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/ext/vector36.C  -std=gnu++20 (test for excess errors)
FAIL: g++.dg/ext/vector36.C  -std=gnu++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/ext/vector36.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/pr58950.C  -std=gnu++14 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/pr58950.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/pr58950.C  -std=gnu++17 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/pr58950.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/pr58950.C  -std=gnu++20 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/pr58950.C  -std=gnu++20 (test for excess errors)
FAIL: g++.dg/pr58950.C  -std=gnu++98 (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/pr58950.C  -std=gnu++98 (test for excess errors)
FAIL: c-c++-common/torture/builtin-shufflevector-2.c   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/torture/vector-compare-2.c   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/torture/vector-compare-2.c   -O0  (test for excess errors)
FAIL: g++.dg/torture/pr104450.C   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: g++.dg/torture/pr104450.C   -O0  (test for excess errors)

FAIL: gcc.dg/analyzer/pr96713.c (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/analyzer/pr96713.c (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -Wc++-compat  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/opaque-vector.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/pr105998.c  -Wc++-compat  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/pr105998.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -Wc++-compat  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: c-c++-common/vector-scalar.c  -Wc++-compat  (test for excess errors)
FAIL: gcc.dg/pr100239.c (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/pr100239.c (test for excess errors)
FAIL: gcc.dg/pr97238.c (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/pr97238.c (test for excess errors)
FAIL: c-c++-common/torture/builtin-shufflevector-2.c   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/torture/pr70310.c   -O0  (internal compiler error: in emit_move_multi_word, at expr.cc:4079)
FAIL: gcc.dg/torture/pr70310.c   -O0  (test for excess errors)

gcc/ChangeLog:

* config/riscv/autovec-vls.md: Add VLS mask modes mov patterns.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.

(cherry picked from commit 6aba1fa7a4ceaf66adc587da23834d1f317f871d)

RISC-V: Remove incorrect earliest vsetvl post optimization[PR111313]

This patch removes the incorrect earliest poset vsetvl optimization,
such bug was found in vect-double-reduc-5.c which is runtime(execution fail) and also in PR111313.

For VLMAX intrinsics, we always emit a bogus patter which is vlmax_avl (see vector.md) to
occupy a scalar register which is used by the following RVV instruction which is VLMAX AVL.

Then for O2, O3, Ofast, earliest LCM works so well.
However, for O1, the vlmax_avl is not well optimized in the before pass which confused LCM earliest
so that we will end up with some redundant vsetvli zero,zero instructions in O1. (Note that O2 O3 Ofast are all good).

To elide those redundant vsetvli zero,zero, I added cleanup_earliest_vsetvls to elide those redundant vsetvls.

Now, after I review the implementation of this post optimizaiton again, I found it is incorrect and it is hard to
do the post optimizations for vsetvls that earliest LCM failed to eliminate.

Besides, such performance issues only happen in O1 or O0, such issues may not be serious.
So remove it and we may will find another way (E.g. adjust vlmax_avl pattern COST)
to optimize it if we really need to care about performance for O1.

PR target/111313

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::cleanup_earliest_vsetvls): Remove.
(pass_vsetvl::df_post_optimization): Remove incorrect function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-13.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-17.c: Skip check for O1.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-18.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-19.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-20.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-13.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-16.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-17.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-18.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-19.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-20.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-21.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-22.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-23.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-24.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-25.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-26.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-27.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-28.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_phi-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111313.c: New test.

(cherry picked from commit 572abb52f5761a647035ee39d0e443c1c3622e75)

RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

'XVentanaCondOps' is a vendor extension from Ventana Micro Systems
containing two instructions for conditional move and will be supported on
their Veyron V1 CPU.

And most notably (for historical reasons), 'XVentanaCondOps' and the
standard 'Zicond' extension are functionally equivalent (only encodings and
instruction names are different).

*   czero.eqz == vt.maskc
*   czero.nez == vt.maskcn

This commit adds support for the 'XVentanaCondOps' extension by extending
'Zicond' extension support.  With this, we can now reuse the optimization
using the 'Zicond' extension for the 'XVentanaCondOps' extension.

The specification for the 'XVentanaCondOps' extension is based on:
<https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.1/ventana-custom-extensions-v1.0.1.pdf>

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_flag_table):
Parse 'XVentanaCondOps' extension.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): New.
(TARGET_XVENTANACONDOPS): Ditto.
(TARGET_ZICOND_LIKE): New to represent targets with conditional
moves like 'Zicond'.  It includes RV64 + 'XVentanaCondOps'.
* config/riscv/riscv.cc (riscv_rtx_costs): Replace TARGET_ZICOND
with TARGET_ZICOND_LIKE.
(riscv_expand_conditional_move): Ditto.
* config/riscv/riscv.md (mov<mode>cc): Replace TARGET_ZICOND with
TARGET_ZICOND_LIKE.
* config/riscv/riscv.opt: Add new riscv_xventana_subext.
* config/riscv/zicond.md: Modify description.
(eqz_ventana): New to match corresponding czero instructions.
(nez_ventana): Ditto.
(*czero.<eqz>.<GPR><X>): Emit a 'XVentanaCondOps' instruction if
'Zicond' is not available but 'XVentanaCondOps' + RV64 is.
(*czero.<eqz>.<GPR><X>): Ditto.
(*czero.eqz.<GPR><X>.opt1): Ditto.
(*czero.nez.<GPR><X>.opt2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-primitiveSemantics.c: New test,
* gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c: New
test to make sure that XVentanaCondOps instructions are disabled
on RV32.
* gcc.target/riscv/xventanacondops-xor-01.c: New test,

(cherry picked from commit af88776caa20342482b11ccb580742a46c621250)

RISC-V: Fix incorrect mode tieable which cause ICE in RA[PR111296]

This patch fix incorrect mode tieable between DI and V2SI which cause ICE
in RA.

gcc/ChangeLog:

PR target/111296
* config/riscv/riscv.cc (riscv_modes_tieable_p): Fix incorrect mode
tieable for RVV modes.

gcc/testsuite/ChangeLog:

PR target/111296
* g++.target/riscv/rvv/base/pr111296.C: New test.

(cherry picked from commit 6b96de22d6bcadb45530c1898b264e4738afa4fd)

RISC-V: Fix VSETVL PASS AVL/VL fetch bug[111295]

Fix bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111295

gcc/ChangeLog:

PR target/111295
* config/riscv/riscv-vsetvl.cc (insert_vsetvl): Bug fix.

gcc/testsuite/ChangeLog:

PR target/111295
* gcc.target/riscv/rvv/autovec/pr111295.c: New test.

(cherry picked from commit 1b4c70d4271a00514ae20970d483c3b78d9d66ef)

RISC-V: Remove unreasonable TARGET_64BIT for VLS modes with size = 64bit

Previously, I add TARGET_64BIT condtion to block VLS modes with size = 64bit in RV32 system
E.g. V8QI

Since I realized such modes may cause inferior codegen for some situations in RV32 system.

However, this is really quite ugly and it cause ICE for some cases in RV32:

FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (internal compiler error: in require, at machmode.h:313)
3937FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (test for excess errors)

For inferior codegen in RV32 system, we should try another reasonable approach to fix it.

Remove those TARGET_64BIT and fix ICE.

gcc/ChangeLog:

* config/riscv/riscv-vector-switch.def (VLS_ENTRY): Remove TARGET_64BIT

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl1024b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl2048b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl256b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl4096b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl512b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl1024b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl2048b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl256b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl4096b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl512b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: Ditto.

(cherry picked from commit ee21f79f72980732214156bae2eb5daf7e089bda)

RISC-V: Fix incorrect folder for VRGATHERI16 test case

Put the test file to the incorrect folder, this patch would like to
fix it.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/intrisinc-vrgatherei16.c: Moved to...
* gcc.target/riscv/rvv/base/intrisinc-vrgatherei16.c: ...here.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit 0574a19047fa66f26a38e79c1b9ae6a8207bba89)

riscv: xtheadbb: Fix xtheadbb-li-rotr test for rv32

The test was introduced recently and tests a RV64-only feature.
However, when testing an RV32 compiler, the test gets executed as well
and fails with "cc1: error: ABI requires '-march=rv32'".
This patch fixes this by adding '-mabi=lp64' (like it is done for
other RV64-only tests as well).

Retested with RV32 and RV64 to ensure this won't pop up again.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: Don't run for RV32.

(cherry picked from commit 57d1c9c1fe57a0de66e5c20538f77f49b1298071)

RISC-V: Keep vlmax vector operators in simple form until split1 pass

This patch keep vlmax vector pattern in simple before split1 pass which
will allow more optimization (e.g. combine) before split1 pass.
This patch changes the vlmax pattern in autovec.md to define_insn_and_split
as much as possible and clean up some combine patterns that are no longer needed.
This patch also fixed PR111232 bug which was caused by a combined failed.

PR target/111232

gcc/ChangeLog:

* config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>):
Delete.
(*pred_widen_mulsu<mode>): Delete.
(*pred_single_widen_mul<mode>): Delete.
(*dual_widen_<any_widen_binop:optab><any_extend:su><mode>):
Add new combine patterns.
(*single_widen_sub<any_extend:su><mode>): Ditto.
(*single_widen_add<any_extend:su><mode>): Ditto.
(*single_widen_mult<any_extend:su><mode>): Ditto.
(*dual_widen_mulsu<mode>): Ditto.
(*dual_widen_mulus<mode>): Ditto.
(*dual_widen_<optab><mode>): Ditto.
(*single_widen_add<mode>): Ditto.
(*single_widen_sub<mode>): Ditto.
(*single_widen_mult<mode>): Ditto.
* config/riscv/autovec.md (<optab><mode>3):
Change define_expand to define_insn_and_split.
(<optab><mode>2): Ditto.
(abs<mode>2): Ditto.
(smul<mode>3_highpart): Ditto.
(umul<mode>3_highpart): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-4.c: Add more testcases.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111232.c: New test.

(cherry picked from commit 9ee40b9a7bee83394fc7ba6fef71cb76d91b49c8)

RISC-V: Part-3: Output .variant_cc directive for vector function

Functions which follow vector calling convention variant need be annotated by
.variant_cc directive according the RISC-V Assembly Programmer's Manual[1] and
RISC-V ELF Specification[2].

[1] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#dynamic-linking

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_declare_function_name): Add protos.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.cc (riscv_asm_output_variant_cc):
Output .variant_cc directive for vector function.
(riscv_declare_function_name): Ditto.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.h (ASM_DECLARE_FUNCTION_NAME):
Implement ASM_DECLARE_FUNCTION_NAME.
(ASM_OUTPUT_DEF_FROM_DECLS): Implement ASM_OUTPUT_DEF_FROM_DECLS.
(ASM_OUTPUT_EXTERNAL): Implement ASM_OUTPUT_EXTERNAL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: New test.

(cherry picked from commit 4abcc5009c1ad852e235f368f535c0bf6bfa7697)

RISC-V: Part-2: Save/Restore vector registers which need to be preversed

Because functions which follow vector calling convention variant has
callee-saved vector reigsters but functions which follow standard calling
convention don't have. We need to distinguish which function callee is so that
we can tell GCC exactly which vector registers callee will clobber. So I encode
the callee's calling convention information into the calls rtx pattern like
AArch64. The old operand 2 and 3 of call pattern which copy from MIPS target are
useless and removed according to my analysis.

gcc/ChangeLog:

* config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): Pass riscv_cc.
* config/riscv/riscv.cc (struct riscv_frame_info): Add new fileds.
(riscv_frame_info::reset): Reset new fileds.
(riscv_call_tls_get_addr): Pass riscv_cc.
(riscv_function_arg): Return riscv_cc for call patterm.
(get_riscv_cc): New function return riscv_cc from rtl call_insn.
(riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
(riscv_save_reg_p): Add vector callee-saved check.
(riscv_stack_align): Add vector save area comment.
(riscv_compute_frame_info): Ditto.
(riscv_restore_reg): Update for type change.
(riscv_for_each_saved_v_reg): New function save vector registers.
(riscv_first_stack_step): Handle funciton with vector callee-saved registers.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_output_mi_thunk): Pass riscv_cc.
(TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
* config/riscv/riscv.h (get_riscv_cc): Export get_riscv_cc function.
* config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.

(cherry picked from commit fdd59c0f73e9e681cd5f4d0eee2dd58d60d8dbe1)

RISC-V: Part-1: Select suitable vector registers for vector type args and returns

I post the vector register calling convention rules from in the proposal[1]
directly here:

v0 is used to pass the first vector mask argument to a function, and to return
vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the rest vector mask arguments to a
function, and to return vector data and vector tuple results from a function.

Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL indicates the number of
vector registers in the vector register group and requires the first vector
register number in the vector register group must be a multiple of it. For
example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be
allocated to this type, but v9-v16 can not because the v9 register number is
not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a
vector mask type, its LMUL is 1.

Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.

The rules for passing vector arguments are as follows:

1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.

2. For vector data arguments or rest vector mask arguments, starting from the
v8 register, if a vector register group between v8-v23 that has not been
allocated can be found and the first register number is a multiple of LMUL,
then allocate this vector register group to the argument and mark these
registers as allocated. Otherwise, pass it by reference. The argument has now
been allocated.

3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive vector register groups between v8-v23 that have not been allocated
can be found and the first register number is a multiple of LMUL, then allocate
these vector register groups to the argument and mark these registers as
allocated. Otherwise, pass it by reference. The argument has now been allocated.

NOTE: It should be stressed that the search for the appropriate vector register
groups starts at v8 each time and does not start at the next register after the
registers are allocated for the previous vector argument. Therefore, it is
possible that the vector register number allocated to a vector argument can be
less than the vector register number allocated to previous vector arguments.
For example, for the function
`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
and v9 will be allocated to `c`. This approach allows more vector registers to
be allocated to arguments in some cases.

Vector values are returned in the same manner as the first named argument of
the same type would be passed.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389

gcc/ChangeLog:

* config/riscv/riscv-protos.h (builtin_type_p): New function for checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declare.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function for check vector arguments.
(riscv_arguments_is_vector_type_p): New function for check vector returns.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: New test.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return.c: New test.

(cherry picked from commit 94a4b93292f8ab19910c844bb9b63e4a68b55d33)

RISC-V: Add conditional sqrt autovec pattern

This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><mode>):
Add sqrt + vcond_mask combine pattern.
* config/riscv/autovec.md (<optab><mode>2):
Change define_expand to define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.

(cherry picked from commit c1597e7fb9f9ecb9d7c33b5afa48031f284375de)

RISC-V: typo: add closing paren to a comment

gcc/ChangeLog:

* config/riscv/zicond.md: Add closing parent to a comment.

(cherry picked from commit 254100a9a003a16255a58eec3fa24168e6dc7124)

RISC-V: Fix Zicond ICE on large constants

Large constant cons and/or alt will trigger ICEs building GCC target
libraries (libgomp and libatomic) when the 'Zicond' extension is enabled.

For instance, zicond-ice-2.c (new test case in this commit) will cause
an ICE when SOME_NUMBER is 0x1000 or larger.  While opposite numbers
corresponding cons/alt (two temp2 variables) are checked, cons/alt
themselves are not checked and causing 2 ICEs building
GCC target libraries as of this writing:

1.  gcc/libatomic/config/posix/lock.c
2.  gcc/libgomp/fortran.c

Coercing a large value into a register will fix the issue.

It also coerce a large cons into a register on "imm, imm" case (the author
could not reproduce but possible to cause an ICE).

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Force
large constant cons/alt into a register.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ice-2.c: New test.  This is based on
an ICE at libat_lock_n func on gcc/libatomic/config/posix/lock.c
but heavily minimized.

(cherry picked from commit ce65641354d98fc80912d5516b7fea87c344c2cc)

riscv: Synthesize all 11-bit-rotate constants with rori

Some constants can be built up using LI+RORI instructions.
The current implementation requires one of the upper 32-bits
to be a zero bit, which is not neccesary.
Let's drop this requirement in order to be able to synthesize
a constant like 0xffffffff00ffffffL.

The tests for LI+RORI are made more strict to detect regression
in the calculation of the LI constant and the rotation amount.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Don't
require one zero bit in the upper 32 bits for LI+RORI synthesis.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: New tests.
* gcc.target/riscv/zbb-li-rotr.c: Likewise.

(cherry picked from commit 102dd3e8067f12beee1b8b0bec6848733d107aee)

RISC-V: Expose bswapsi for TARGET_64BIT

Various bswapsi tests are failing for rv64.  More importantly, we're generating
crappy code.

Let's take the first test from bswapsi-1.c as an example.

> typedef unsigned int uint32_t;
>
> #define __const_swab32(x) ((uint32_t)(                                \
>         (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |            \
>         (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |            \
>         (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |            \
>         (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
>
> /* This byte swap implementation is used by the Linux kernel and the
>    GNU C library.  */
>
> uint32_t
> swap32_a (uint32_t in)
> {
>   return __const_swab32 (in);
> }
>
>
>

We generate this for rv64gc_zba_zbb_zbs:

>         srliw   a1,a0,24
>         slliw   a5,a0,24
>         slliw   a3,a0,8
>         li      a2,16711680
>         li      a4,65536
>         or      a5,a5,a1
>         and     a3,a3,a2
>         addi    a4,a4,-256
>         srliw   a0,a0,8
>         or      a5,a5,a3
>         and     a0,a0,a4
>         or      a0,a5,a0
>         retUrgh!

After this patch we generate:

>         rev8    a0,a0
>         srai    a0,a0,32
>         ret
Clearly better.

The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that the
RTL expanders already know how to widen a bswap, which is definitely true.  But
it's the case that failure to expose a bswapsi will cause the 32bit bswap
optimizations in gimple store merging to not trigger.  Thus we get crappy code.

To fix this we expose bswapsi on TARGET_64BIT.  gimple-store-merging then
detects the 32bit bswap idioms and generates suitable __builtin calls.  The
expander will "FAIL" expansion for TARGET_64BIT which forces the generic
expander code to synthesize the operation (we could synthesize in here, but
that'd result in duplicate code).

Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the testsuite
without any regressions.

gcc/
* config/riscv/bitmanip.md (bswapsi2): Expose for TARGET_64BIT.

(cherry picked from commit fbc01748ba46eb26074388a8fb7b44d25a414a72)

RISC-V: Add Types to Un-Typed Risc-v Instructions

Updates risc-v instructions to ensure that no instruction is left
without a type attribute. Added new types "trap" and "cbo" (for
cache related instructions)

Tested for regressions using rv32/64 multilib with newlib/linux and
rv32/64 gcv for linux.

gcc/Changelog:

* config/riscv/riscv.md: Update/Add types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit decbf9ec81f33052be12296b89cd86ea65ae10da)

RISC-V: Add Types to Un-Typed Pic Instructions

Updates pic instructions to ensure that no instruction is left
without a type attribute.

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/Changelog:

* config/riscv/pic.md: Update types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit c85db606d46774283ca4ec037dc3051719828f41)

riscv: xtheadbb: Enable constant synthesis with th.srri

Some constants can be built up using rotate-right instructions.
The code that enables this can be found in riscv_build_integer_1().
However, this functionality is only available for Zbb, which
includes the rori instruction. This patch enables this also for
XTheadBb, which includes the th.srri instruction.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Enable constant
synthesis with rotate-right for XTheadBb.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: New test.

(cherry picked from commit af5cb06ec17780736749ed51cfc6dfad9397156c)

RISC-V: zicond: Fix opt2 pattern

Fixes: 1d5bc3285e8a ("[committed][RISC-V] Fix 20010221-1.c with zicond")
This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since in
failing case, pattern semantics were not matching with asm czero.nez

We start with the following src code snippet:

      if (a == 0)
return 0;
      else
return x;
    }

which is equivalent to:  "x = (a != 0) ? x : a" where x is NOT 0.
                                                ^^^^^^^^^^^^^^^^

and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2"

| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
|        (if_then_else:DI (ne (reg/v:DI 134 [ a ])
|                (const_int 0 [0]))
|            (reg/v:DI 136 [ x ])
|            (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}

The corresponding asm pattern generates
    czero.nez x, x, a   ; %0, %2, %1

which implies
    "x = (a != 0) ? 0 : a"

clearly not what the pattern wants to do.

Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez if X
is not guaranteed to be 0.

However this can be fixed with a small tweak

"x = (a != 0) ? x : a"

   is same as

"x = (a == 0) ? a : x"

and since middle operand is 0 when a == 0, it is equivalent to

"x = (a == 0) ? 0 : x"

which can be expressed with CZERO.eqz

before fix after fix
----------------- -----------------
li        a5,1         li        a5,1
ld        a4,8(sp) ld        a4,8(sp)
czero.nez a0,a4,a5 czero.eqz a0,a4,a5

The issue only happens at -O1 as at higher optimization levels, the
whole conditional move gets optimized away.

This fixes 4 testsuite failues in a zicond build:

FAIL: gcc.c-torture/execute/pr60003.c   -O1  execution test
FAIL: gcc.dg/setjmp-3.c execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1  execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1 -fpic execution test

gcc/ChangeLog:
* config/riscv/zicond.md: Fix op2 pattern.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit e87212ead5e9f36945b5e2d290187e2adca34da5)

RISC-V: Emit .note.GNU-stack for non-linux target as well

We only emit that on linux target before, that not problem before,
however Qemu has fix a bug to make qemu user mode honor PT_GNU_STACK[1],
that will cause problem when we test baremetal with qemu.

So the straightforward is enable that as well for non-linux toolchian,
the price is that will increase few bytes for each binary.

[1] https://github.com/qemu/qemu/commit/872f3d046f2381e3f416519e82df96bd60818311

gcc/ChangeLog:

* config/riscv/linux.h (TARGET_ASM_FILE_END): Move ...
* config/riscv/riscv.cc (TARGET_ASM_FILE_END): to here.

(cherry picked from commit fba0f47e4617e164716d3bce587fc6948088e225)

RISC-V: Support FP SGNJ autovec for VLS mode

This patch would like to allow the VLS mode autovec for the
floating-point binary operation MAX/MIN.

Given below code example:

void test(float * restrict out, float * restrict in1, float * restrict in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = __builtin_copysignf (in1[i], in2[i]);
}

Before this patch:
test:
  csrr    a4,vlenb
  slli    a4,a4,1
  li      a5,128
  bleu    a5,a4,.L2
  mv      a5,a4
.L2:
  vsetvli zero,a5,e32,m8,ta,ma
  vle32.v v8,0(a1)
  vle32.v v16,0(a2)
  vsetvli a4,zero,e32,m8,ta,ma
  vfsgnj.vv       v8,v8,v16
  vsetvli zero,a5,e32,m8,ta,ma
  vse32.v v8,0(a0)
  ret

After this patch:
test:
  li      a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vle32.v v2,0(a2)
  vfsgnj.vv       v1,v1,v2
  vse32.v v1,0(a0)
  ret

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (copysign<mode>3): New pattern.
* config/riscv/vector.md: Extend iterator for VLS.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c: New test.

(cherry picked from commit a7b048c0f42198a0f8d4244f1bd25211cf48383f)

RISC-V: Export functions as global extern preparing for dynamic LMUL patch use

Notice those functions need to be use by COST model for dynamic LMUL use.
Extract as a single patch and committed.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (lookup_vector_type_attribute): Export global.
(get_all_predecessors): New function.
(get_all_successors): Ditto.
* config/riscv/riscv-v.cc (get_all_predecessors): Ditto.
(get_all_successors): Ditto.
* config/riscv/riscv-vector-builtins.cc (sizeless_type_p): Export global.
* config/riscv/riscv-vsetvl.cc (get_all_predecessors): Remove it.

(cherry picked from commit 509c10a62546b9b3430040e455b7258322a024e6)

riscv: xtheadcondmov: Don't run tests with -Oz

Recently, these xtheadcondmov tests regressed with -Oz:
* FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c
* FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c
* FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c
* FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c

As -Oz stands for "Optimize aggressively for size rather than speed.",
we need to inspect the generated code, which looks like this:

  -Oz
  0000000000000000 <not_int_int>:
     0:   e199                    bnez    a1,6 <.L2>
     2:   40100513                li      a0,1025
  0000000000000006 <.L2>:
     6:   8082                    ret

  -O2:
  0000000000000000 <not_int_int>:
     0:   40100793                li      a5,1025
     4:   40b7950b                th.mveqz        a0,a5,a1
     8:   8082                    ret

As the generated code with -Oz consumes less size, there is nothing
wrong in the code generation. Instead, let's not run the xtheadcondmov
tests with -Oz.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c: Disable for -Oz.
* gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-eqz.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-not.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-cond.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-nez.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit 8451fbd56871267e8c1cd781db6d8f02e826f66c)

RISC-V: Fix Dynamic LMUL compile option

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Fix Dynamic status.
* config/riscv/riscv-v.cc (preferred_simd_mode): Ditto.
(autovectorize_vector_modes): Ditto.
(vectorize_related_mode): Ditto.

(cherry picked from commit 6f94ef6c86074a8348ec21d8aade04ce67b4e292)

RISC-V: Support FP16 for RVV VRGATHEREI16 intrinsic

This patch would like to add FP16 support for the VRGATHEREI16
intrinsic. Aka:

* __riscv_vrgatherei16_vv_f16mf4
* __riscv_vrgatherei16_vv_f16mf4_m

As well as f16mf2 to f16m8 types.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add FP16 intrinsic def.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/intrisinc-vrgatherei16.c: New test.

(cherry picked from commit d99a868a9b100ab5a4b270a1acece60b5b6153a3)

RISC-V: Support FP MAX/MIN autovec for VLS mode

This patch would like to allow the VLS mode autovec for the
floating-point binary operation MAX/MIN.

Given below code example:

test (float *out, float *in1, float *in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = in1[i] > in2[i] ? in1[i] : in2[i];
    // Or out[i] = fmax (in1[i], in2[i]);
}

Before this patch:
test:
  csrr    a4,vlenb
  slli    a4,a4,1
  li      a5,128
  bleu    a5,a4,.L2
  mv      a5,a4
.L2:
  vsetvli zero,a5,e32,m8,ta,ma
  vle32.v v16,0(a1)
  vle32.v v8,0(a2)
  vsetvli a3,zero,e32,m8,ta,ma
  vmfgt.vv        v0,v16,v8
  vmerge.vvm      v8,v8,v16,v0
  vsetvli zero,a5,e32,m8,ta,ma
  vse32.v v8,0(a0)
  ret

After this patch:
test:
  li      a5,128
  vsetvli zero,a5,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vle32.v v2,0(a2)
  vfmax.vv        v1,v1,v2
  vse32.v v1,0(a0)
  ret

This MAX/MIN autovec acts on function call like fmaxf/fmax in math.h
too. And it depends on the option -ffast-math.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
fmax/fmin
* config/riscv/vector.md: Add VLS modes to vfmax/vfmin.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: New macros.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c: New test.

(cherry picked from commit a7d052b3200c7928d903a0242b8cfd75d131e374)

RISC-V: Add conditional autovec convert(INT<->FP) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><mode><vconvert>):
New combine pattern.
(*cond_<float_cvt><vconvert><mode>): Ditto.
(*cond_<optab><vnconvert><mode>): Ditto.
(*cond_<float_cvt><vnconvert><mode>): Ditto.
(*cond_<optab><mode><vnconvert>): Ditto.
(*cond_<float_cvt><mode><vnconvert>2): Ditto.
* config/riscv/autovec.md (<optab><mode><vconvert>2): Adjust.
(<float_cvt><vconvert><mode>2): Adjust.
(<optab><vnconvert><mode>2): Adjust.
(<float_cvt><vnconvert><mode>2): Adjust.
(<optab><mode><vnconvert>2): Adjust.
(<float_cvt><mode><vnconvert>2): Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add INT->FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c: New test.

(cherry picked from commit 258af9c7004cdc7963f783dd510404e79f0b5362)

RISC-V: Add conditional autovec convert(FP<->FP) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_extend<v_double_trunc><mode>):
New combine pattern.
(*cond_trunc<mode><v_double_trunc>): Ditto.
* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c: New test.

(cherry picked from commit 75a243c7c7c7efa9f12038480b46260ada739202)

RISC-V: Add conditional autovec convert(INT<->INT) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><v_double_trunc><mode>):
New combine pattern.
(*cond_<optab><v_quad_trunc><mode>): Ditto.
(*cond_<optab><v_oct_trunc><mode>): Ditto.
(*cond_trunc<mode><v_double_trunc>): Ditto.
* config/riscv/autovec.md (<optab><v_quad_trunc><mode>2): Adjust.
(<optab><v_oct_trunc><mode>2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New test.

(cherry picked from commit a1e5fd2c9adc35ef435dcc96991320d69453919a)

RISC-V: Adjust expand_cond_len_{unary,binop,op} api

This patch change expand_cond_len_{unary,binop}'s argument `rtx_code code`
to `unsigned icode` and use the icode directly to determine whether the
rounding_mode operand is required.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-protos.h (expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Ditto.
(expand_cond_len_op): Ditto.
(expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
(expand_cond_len_ternop): Ditto.

(cherry picked from commit 4d1c8b04ec8731b57ddbc80d76e40a61d8fa3324)