Currently, VLS and VLA patterns are different.
VLA is define_expand
VLS is define_insn_and_split
It makes no sense that they are different pattern format.
Merge them into same pattern (define_insn_and_split).
It can also be helpful for the future vv -> vx fwprop optimization.
This patch removed the misleading comments in testcases since we
support fold min(int, poly) to constant by this patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629651.html).
Thereby the csrr will not appear inside the assembly code, even if there
is no support for some VLS vector patterns.
RISC-V: Add fixed PR111255 testcase by other patch
This patch add the missed PR111255 testcase which is fixed by this
committed patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628922.html).
PR target/111255
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr111255.c: New test.
There is an obvious fusion bug that is exposed by more VLS patterns support.
After more VLS modes support, it cause following FAILs:
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c execution test
Demand 1: SEW = 64, LMUL = 1, RATIO = 64, demand SEW, demand GE_SEW.
Demand 2: SEW = 64, LMUL = 2, RATIO = 32, demand SEW, demand GE_SEW, demand RATIO.
Before this patch:
merge demand: SEW = 64, LMUL = 1, RATIO = 32, demand SEW, demand LMUL, demand GE_SEW.
It's obvious incorrect of merge LMUL which should be new LMUL = (demand 2 RATIO * greatest SEW) = M2
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (vlmul_for_greatest_sew_second_ratio): New function.
* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Fix bug.
RISC-V: Support VLS modes vec_init auto-vectorization
There are multiple SLP dump FAILs in vect testsuite.
After analysis, confirm we are missing vec_init for VLS modes.
This patch is not sufficient to fix those FAILs (We need more VLS patterns will send them soon).
This patch is the prerequsite patch for fixing those SLP FAILs.
* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS vec_init tests.
* gcc.target/riscv/rvv/autovec/vls/init-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/repeat-9.c: New test.
It can optimize the current reduction vectorization codegen with current COST model.
TYPE __attribute__ ((noinline, noclone)) \
reduc_plus_##TYPE (TYPE * __restrict a, int n) \
{ \
TYPE r = 0; \
for (int i = 0; i < n; ++i) \
r += a[i]; \
return r; \
}
* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS mode reduction case.
* gcc.target/riscv/rvv/autovec/vls/reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-15.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-16.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-17.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-18.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-19.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-20.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-21.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-9.c: New test.
Tsukasa OI [Thu, 7 Sep 2023 07:39:33 +0000 (07:39 +0000)]
RISC-V: Make SHA-256, SM3 and SM4 builtins operate on uint32_t
This is in parity with the LLVM commit a64b3e92c7cb ("[RISCV] Re-define
sha256, Zksed, and Zksh intrinsics to use i32 types.").
SHA-256, SM3 and SM4 instructions operate on 32-bit integers and upper
32-bits have no effects on RV64 (the output is sign-extended from the
original 32-bit value). In that sense, making those intrinsics only
operate on uint32_t is much more natural than XLEN-bits wide integers.
This commit reforms instructions and expansions based on 32-bit
instruction handling on RV64 (such as ADDW).
Before:
riscv_<op>_si: For RV32, fully operate on uint32_t
riscv_<op>_di: For RV64, fully operate on uint64_t
After:
*riscv_<op>_si: For RV32, fully operate on uint32_t
riscv_<op>_di_extended:
For RV64. Input is uint32_t and output is int64_t,
sign-extended from the int32_t result
(represents a part of <op> behavior).
riscv_<op>_si: Common (fully operate on uint32_t).
On RV32, "expands" to *riscv_<op>_si.
On RV64, initially expands to riscv_<op>_di_extended *and*
extracts lower 32-bits from the int64_t result.
It also refines definitions of SHA-256, SM3 and SM4 intrinsics.
gcc/ChangeLog:
* config/riscv/crypto.md (riscv_sha256sig0_<mode>,
riscv_sha256sig1_<mode>, riscv_sha256sum0_<mode>,
riscv_sha256sum1_<mode>, riscv_sm3p0_<mode>, riscv_sm3p1_<mode>,
riscv_sm4ed_<mode>, riscv_sm4ks_<mode>): Remove and replace with
new insn/expansions.
(SHA256_OP, SM3_OP, SM4_OP): New iterators.
(sha256_op, sm3_op, sm4_op): New attributes for iteration.
(*riscv_<sha256_op>_si): New raw instruction for RV32.
(*riscv_<sm3_op>_si): Ditto.
(*riscv_<sm4_op>_si): Ditto.
(riscv_<sha256_op>_di_extended): New base instruction for RV64.
(riscv_<sm3_op>_di_extended): Ditto.
(riscv_<sm4_op>_di_extended): Ditto.
(riscv_<sha256_op>_si): New common instruction expansion.
(riscv_<sm3_op>_si): Ditto.
(riscv_<sm4_op>_si): Ditto.
* config/riscv/riscv-builtins.cc: Add availability "crypto_zknh",
"crypto_zksh" and "crypto_zksed". Remove availability
"crypto_zksh{32,64}" and "crypto_zksed{32,64}".
* config/riscv/riscv-ftypes.def: Remove unused function type.
* config/riscv/riscv-scalar-crypto.def: Make SHA-256, SM3 and SM4
intrinsics to operate on uint32_t.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zknh-sha256.c: Moved to...
* gcc.target/riscv/zknh-sha256-64.c: ...here. Test RV64.
* gcc.target/riscv/zknh-sha256-32.c: New test for RV32.
* gcc.target/riscv/zksh64.c: Change the type.
* gcc.target/riscv/zksed64.c: Ditto.
Tsukasa OI [Wed, 6 Sep 2023 06:28:39 +0000 (06:28 +0000)]
RISC-V: Make bit manipulation value / round number and shift amount types for builtins unsigned
For bit manipulation operations, input(s) and the manipulated output are
better to be unsigned like other target-independent builtins like
__builtin_bswap32 and __builtin_popcount.
Although this is not completely compatible as before (as the type changes),
most code will run normally, even without warnings (with -Wall -Wextra).
To make consistent to the LLVM commit 599421ae36c3 ("[RISCV] Use unsigned
instead of signed types for Zk* and Zb* builtins."), round numbers and
shift amount on the scalar crypto instructions are also changed
to unsigned.
gcc/ChangeLog:
* config/riscv/riscv-builtins.cc (RISCV_ATYPE_UQI): New for
uint8_t. (RISCV_ATYPE_UHI): New for uint16_t.
(RISCV_ATYPE_QI, RISCV_ATYPE_HI, RISCV_ATYPE_SI, RISCV_ATYPE_DI):
Removed as no longer used.
(RISCV_ATYPE_UDI): New for uint64_t.
* config/riscv/riscv-cmo.def: Make types unsigned for not working
"zicbop_cbo_prefetchi" and working bit manipulation clmul builtin
argument/return types.
* config/riscv/riscv-ftypes.def: Make bit manipulation, round
number and shift amount types unsigned.
* config/riscv/riscv-scalar-crypto.def: Ditto.
Pan Li [Fri, 15 Sep 2023 12:57:20 +0000 (20:57 +0800)]
RISC-V: Support FP SGNJX autovec for VLS mode
This patch would like to allow the VLS mode autovec for the
floating-point binary operation SGNJX.
Give sample code as below:
void
test (float * restrict out, float * restrict in1, float * restrict in2)
{
for (int i = 0; i < 128; i++)
out[i] = in1[i] * copysignf (1.0, in2[i]);
}
Before this patch:
test:
li a5,128
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
lui a4,%hi(.LC0)
flw fa5,%lo(.LC0)(a4)
vfmv.v.f v1,fa5
vle32.v v3,0(a2)
vfsgnj.vv v1,v1,v3
vfmul.vv v1,v1,v2
vse32.v v1,0(a0)
ret
After this patch:
test:
li a5,128
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a1)
vle32.v v2,0(a2)
vfsgnjx.vv v1,v1,v2
vse32.v v1,0(a0)
ret
This SGNJX autovec acts on function call copysignf/copysignf
in math.h too. And it depends on the option -ffast-math.
gcc/ChangeLog:
* config/riscv/autovec-vls.md (xorsign<mode>3): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnjx-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnjx-2.c: New test.
root cause:
In a gcc build with --enable-checking=yes, REGNO (op) checks
rtx code and expected code 'reg'. so a rtx with 'subreg' causes
an internal compiler error.
RISC-V: Fix using wrong mode to get reduction insn vlmax
This patch fix using wrong mode when emit vlmax reduction insn. We should
use src operand instead dest operand (which always LMUL=m1) to get the vlmax
length. This patch alse remove dest_mode and mask_mode from insn_expander
constructor, which can be geted by insn_flags.
test: Block slp-16.c check for target support vect_strided6
This testcase FAIL in RISC-V because RISC-V support vect_load_lanes with 6.
FAIL: gcc.dg/vect/slp-16.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-16.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
Since it use vlseg6 (vect_load_lanes with array size = 6)
test: Isolate slp-1.c check of target supports vect_strided5
This test failed in RISC-V:
FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4
Because this loop:
/* SLP with unrolling by 8. */
for (i = 0; i < N; i++)
{
out[i*5] = 8;
out[i*5 + 1] = 7;
out[i*5 + 2] = 81;
out[i*5 + 3] = 28;
out[i*5 + 4] = 18;
}
is using vect_load_lanes with array size = 5.
instead of SLP.
When we adjust the COST of LANES load store, then it will use SLP.
Like ARM SVE, this test cause FAILs of XPASS:
XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 72)
XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 77)
XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 note (test for warnings, line 68)
RISC-V: Refactor expand_reduction and cleanup enum reduction_type
This patch refactors expand_reduction, remove the reduction_type argument
and add insn_flags argument to determine the passing of the operands.
ops has also been modified to restrict it to only two cases and to remove
operand that are not in use.
RISC-V: Support combine extend and reduce sum to widen reduce sum
This patch add combine pattern to combine extend and reduce sum
to widen reduce sum. The pattern in autovec.md was adjusted as
needed. Note that the current vectorization cannot generate reduce
operand which is LMUL=M8, because this means that we need an LMUL=M16
for the extended operand, which is currently not possible. So I've
added VI_QHS_NO_M8 and VF_HS_NO_M8 mode iterator, which exclude
mode which is LMUL=M8.
PR target/111381
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*reduc_plus_scal_<mode>):
New combine pattern.
(*fold_left_widen_plus_<mode>): Ditto.
(*mask_len_fold_left_widen_plus_<mode>): Ditto.
* config/riscv/autovec.md (reduc_plus_scal_<mode>):
Change from define_expand to define_insn_and_split.
(fold_left_plus_<mode>): Ditto.
(mask_len_fold_left_plus_<mode>): Ditto.
* config/riscv/riscv-v.cc (expand_reduction):
Support widen reduction.
* config/riscv/vector-iterators.md (UNSPEC_WREDUC_SUM):
Add new iterators and attrs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen_reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_order-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_order-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_order_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_order_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_reduc_run-1.c: New test.
[RA]: Improve cost calculation of pseudos with equivalences
RISCV target developers reported that RA can spill pseudo used in a
loop although there are enough registers to assign. It happens when
the pseudo has an equivalence outside the loop and the equivalence is
not merged into insns using the pseudo. IRA sets up that memory cost
to zero when the pseudo has an equivalence and it means that the
pseudo will be probably spilled. This approach worked well for i686
(different approaches were benchmarked long time ago on spec2k).
Although common sense says that the code is wrong and this was
confirmed by RISCV developers.
I've tried the following patch on I7-9700k and it improved spec17 fp
by 1.5% (21.1 vs 20.8) although spec17 int is a bit worse by 0.45%
(8.54 vs 8.58). The average generated code size is practically the
same (0.001% difference).
In the future we probably need to try more sophisticated cost
calculation which should take into account that the equiv can not be
combined in usage insns and the costs of reloads because of this.
gcc/ChangeLog:
* ira-costs.cc (find_costs_and_classes): Decrease memory cost
by equiv savings.
The reason for the change is that the semantics of the previous pattern is incorrect.
GCC does not have a standard rtx code to express the reduction calculation process.
It makes more sense to use UNSPEC.
Further, all reduction icode are geted by the UNSPEC and MODE (code_for_pred (unspec, mode)),
so that all reduction patterns can have a uniform icode name. After this adjust, widen_reducop
and widen_freducop are redundant.
RISC-V: Cleanup redundant reduction patterns after refactor vector mode
This patch cleanups redundant reduction patterns after Juzhe change vector mode
from fixed-size to scalable-size. For example, whether it is zvl32b, zvl64b,
zvl128b, RVVM1SI indicates that it occupies a vector register. Therefore, it is
easy to map vector modes to LMUL1 vector modes with define_mode_attr without
creating a separate pattern for each LMUL1 Mode. For example, this patch can
combine four patterns (@pred_reduc_<reduc><VQI:mode><VQI_LMUL1:mode>,
@pred_reduc_<reduc><VHI:mode><VHI_LMUL1:mode>
@pred_reduc_<reduc><VSI:mode><VSI_LMUL1:mode>,
@pred_reduc_<reduc><VDI:mode><VDI_LMUL1:mode>) to a single pattern
@pred_reduc_<reduc><mode>.
* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS tests.
* gcc.target/riscv/rvv/autovec/vls/cmp-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cmp-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/mask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/mask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/mask-3.c: New test.
RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization
This patch support VLS modes VEC_EXTRACT to fix PR111391:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
I need VLS modes VEC_EXTRACT to fix this issue.
I have run the whole gcc testsuite, notice this patch increase these 4 FAILs:
FAIL: c-c++-common/vector-subscript-4.c -std=gnu++14 scan-tree-dump-not optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c -std=gnu++17 scan-tree-dump-not optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c -std=gnu++20 scan-tree-dump-not optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c -std=gnu++98 scan-tree-dump-not optimized "vector"
After analysis and comparing with LLVM:
https://godbolt.org/z/ozhfKhj5Y
with this patch, GCC generate similar codegen like LLVM (Previously it can not be vectorized).
This patch is the prerequisite patch to fix an ICE.
So let's ignore those increased 4 dump IR FAILs since ICE is un-acceptable wheras dump FAILs are acceptable (But we should remember and eventually fix dump IR FAILs too).
* gcc.target/riscv/rvv/autovec/vls/def.h: Add more def.
* gcc.target/riscv/rvv/autovec/vls/extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/extract-2.c: New test.
RISC-V: Support cond vmulh.vv and vmulu.vv autovec patterns
This patch adds combine patterns to combine vmulh[u].vv + vcond_mask
to mask vmulh[u].vv. For vmulsu.vv, it can not be produced in midend
currently. We will send another patch to take this issue.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_<mulh_table><mode>3_highpart):
New combine pattern.
* config/riscv/autovec.md (smul<mode>3_highpart): Mrege smul and umul.
(<mulh_table><mode>3_highpart): Merged pattern.
(umul<mode>3_highpart): Mrege smul and umul.
* config/riscv/vector-iterators.md (umul): New iterators.
(UNSPEC_VMULHU): New iterators.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_mulh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-2.c: New test.
This patch add combine patterns to combine vnsra.w[vxi] + vcond_mask
to a mask vnsra.w[vxi].
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_v<any_shiftrt:optab><any_extend:optab>trunc<mode>):
New combine pattern.
(*cond_<any_shiftrt:optab>trunc<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c: New test.
This patch add combine patterns to combine vfsgnj.vv + vcond_mask
to mask vfsgnj.vv. For vfsgnjx.vv, it can not be produced in midend
currently. We will send another patch to take this issue.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-run.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-template.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-zvfh-run.c: New test.
These testcases because we don't support widen_sum/vec_unpack....etc patterns.
Currently, we don't support them since we don't see the benefits.
May support those patterns if they are beneficial ? Or Fix testcases ?
Conclusion:
IMHO, I think we can merge this patch after we addressed all REAL highest priority issues (1).
The rest FAILs are not big issues then we can reduce them by supporting more features (For example VLS modes).
Feel free to give any comments.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Enable vect_int for RVV.
RISC-V: Support VECTOR BOOL vcond_mask optab[PR111337]
As this PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337
We support VECTOR BOOL vcond_mask to fix this following ICE:
0x1a9e309 gimple_expand_vec_cond_expr
../../../../gcc/gcc/gimple-isel.cc:283
0x1a9ea56 execute
../../../../gcc/gcc/gimple-isel.cc:390
gcc/ChangeLog:
PR target/111337
* config/riscv/autovec.md (vcond_mask_<mode><mode>): New pattern.
You can see it will produce register spilling if you specify LMUL >= 4
Now, with --param=riscv-autovec-lmul=dynamic.
GCC is able to pick LMUL = 2 to optimized this case.
This feature is supported by linear scan based local live ranges analysis and
compute maximum live V_REGS in specific program point of the function to determine the VF/LMUL.
Note that this patch can well handle both SLP and non-SLP loop.
Currenty approach didn't consider the later instruction scheduler which may improve the register pressure.
In this case, we are conservatively applying smaller VF/LMUL. (Not sure whether we should support live range shrink for such corner case since we don't known whether it can improve performance a lot.)
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: New test.
This patch implements expansions for the cmpstrsi and cmpstrnsi
builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
instructions are available. The expansion basically emits a comparison
sequence which compares XLEN bits per step if possible.
This allows to inline calls to strcmp() and strncmp() if both strings
are xlen-aligned. For strncmp() the length parameter needs to be known.
The benefits over calls to libc are:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment tests
The inlining mechanism is gated by a new switches ('-minline-strcmp' and
'-minline-strncmp') and by the variable 'optimize_size'.
The amount of emitted unrolled loop iterations can be controlled by the
parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.
The comparision sequence is inspired by the strcmp example
in the appendix of the Bitmanip specification (incl. the fast
result calculation in case the first word does not contain
a NULL byte). Additional inspiration comes from rs6000-string.c.
The emitted sequence is not triggering any readahead pagefault issues,
because only aligned strings are accessed by aligned xlen-loads.
This patch has been tested using the glibc string tests on QEMU:
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
* rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/bitmanip.md (*<optab>_not<mode>): Export INSN name.
(<optab>_not<mode>3): Likewise.
* config/riscv/riscv-protos.h (riscv_expand_strcmp): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
macros.
(GEN_EMIT_HELPER2): Likewise.
(emit_strcmp_scalar_compare_byte): New function.
(emit_strcmp_scalar_compare_subword): Likewise.
(emit_strcmp_scalar_compare_word): Likewise.
(emit_strcmp_scalar_load_and_compare): Likewise.
(emit_strcmp_scalar_call_to_libc): Likewise.
(emit_strcmp_scalar_result_calculation_nonul): Likewise.
(emit_strcmp_scalar_result_calculation): Likewise.
(riscv_expand_strcmp_scalar): Likewise.
(riscv_expand_strcmp): Likewise.
* config/riscv/riscv.md (*slt<u>_<X:mode><GPR:mode>): Export
INSN name.
(@slt<u>_<X:mode><GPR:mode>3): Likewise.
(cmpstrnsi): Invoke expansion function for str(n)cmp.
(cmpstrsi): Likewise.
* config/riscv/riscv.opt: Add new parameter
'-mstring-compare-inline-limit'.
* doc/invoke.texi: Document new parameter
'-mstring-compare-inline-limit'.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-strcmp.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled.c: New test.
* gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
* gcc.target/riscv/zbb-strcmp.c: New test.
This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned aligned strings if Zbb or XTheadBb instructions are available.
The inserted sequences are:
rv32gc_zbb (RV64 is similar):
add a3,a0,4
li a4,-1
.L1: lw a5,0(a0)
add a0,a0,4
orc.b a5,a5
beq a5,a4,.L1
not a5,a5
ctz a5,a5
srl a5,a5,0x3
add a0,a0,a5
sub a0,a0,a3
This allows to inline calls to strlen(), with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test
The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.
Tested using the glibc string tests.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config.gcc: Add new object riscv-string.o.
riscv-string.cc.
* config/riscv/riscv-protos.h (riscv_expand_strlen):
New function.
* config/riscv/riscv.md (strlen<mode>): New expand INSN.
* config/riscv/riscv.opt: New flag 'minline-strlen'.
* config/riscv/t-riscv: Add new object riscv-string.o.
* config/riscv/thead.md (th_rev<mode>2): Export INSN name.
(th_rev<mode>2): Likewise.
(th_tstnbz<mode>2): New INSN.
* doc/invoke.texi: Document '-minline-strlen'.
* emit-rtl.cc (emit_likely_jump_insn): New helper function.
(emit_unlikely_jump_insn): Likewise.
* rtl.h (emit_likely_jump_insn): New prototype.
(emit_unlikely_jump_insn): Likewise.
* config/riscv/riscv-string.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
* gcc.target/riscv/xtheadbb-strlen.c: New test.
* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
* gcc.target/riscv/zbb-strlen-disabled.c: New test.
* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
* gcc.target/riscv/zbb-strlen.c: New test.
RISC-V: enable muti push and pop for Zcmp when shrink-wrap-separate is ineffective
So that zcmp can be enabled in -Os where
shrink-wrap-separate is not effective.
To force enabling zcmp multi push/pop in speed perfered case,
fno-shrink-wrap-separate has to be explictly given.
gcc/ChangeLog:
* config/riscv/riscv.cc
(riscv_avoid_shrink_wrapping_separate): wrap the condition check in
riscv_avoid_shrink_wrapping_separate.
(riscv_avoid_multi_push):avoid multi push if shrink_wrapping_separate
is active.
(riscv_get_separate_components):call riscv_avoid_shrink_wrapping_separate
Edwin Lu [Mon, 11 Sep 2023 16:57:37 +0000 (09:57 -0700)]
RISC-V: Update Types for RISC-V Instructions
Adds types to riscv instructions that were added or were
missed by the original patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628996.html
Edwin Lu [Mon, 11 Sep 2023 16:47:02 +0000 (09:47 -0700)]
RISC-V: Update Types for Vector Instructions
Adds types to vector instructions that were added after or were
missed by the original patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628594.html
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-7.c: New test.
RISC-V: Expand fixed-vlmax/vls vector permutation in targethook
When debugging FAIL: gcc.dg/pr92301.c execution test.
Realize a vls vector permutation situation failed to vectorize since early return false:
- /* For constant size indices, we dont't need to handle it here.
- Just leave it to vec_perm<mode>. */
- if (d->perm.length ().is_constant ())
- return false;
To avoid more potential failed vectorization case. Now expand it in targethook.
This patch adds support that tries to fold `MIN (poly, poly)` to
a constant. Consider the following C Code:
```
void foo2 (int* restrict a, int* restrict b, int n)
{
for (int i = 0; i < 3; i += 1)
a[i] += b[i];
}
```
Before this patch:
```
void foo2 (int * restrict a, int * restrict b, int n)
{
vector([4,4]) int vect__7.27;
vector([4,4]) int vect__6.26;
vector([4,4]) int vect__4.23;
unsigned long _32;
Recently three SPEC CPU 2017 benchmarks broke when using xtheadbb:
* 500.perlbench_r
* 525.x264_r
* 557.xz_r
Tracing the issue down revealed, that we emit a 'th.ext xN,xN,15,0'
for a extendqi<SUPERQI> insn, which is obviously wrong.
This patch splits the common 'extend<SHORT:mode><SUPERQI:mode>2_th_ext'
insn into two 'extendqi<SUPERQI>' and 'extendhi<SUPERQI>' insns,
which emit the right extension instruction.
Additionally, this patch adds test cases for these insns.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext):
Remove broken INSN.
(*extendhi<SUPERQI:mode>2_th_ext): New INSN.
(*extendqi<SUPERQI:mode>2_th_ext): New INSN.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-ext-2.c: New test.
* gcc.target/riscv/xtheadbb-ext-3.c: New test.
RISC-V: Remove incorrect earliest vsetvl post optimization[PR111313]
This patch removes the incorrect earliest poset vsetvl optimization,
such bug was found in vect-double-reduc-5.c which is runtime(execution fail) and also in PR111313.
For VLMAX intrinsics, we always emit a bogus patter which is vlmax_avl (see vector.md) to
occupy a scalar register which is used by the following RVV instruction which is VLMAX AVL.
Then for O2, O3, Ofast, earliest LCM works so well.
However, for O1, the vlmax_avl is not well optimized in the before pass which confused LCM earliest
so that we will end up with some redundant vsetvli zero,zero instructions in O1. (Note that O2 O3 Ofast are all good).
To elide those redundant vsetvli zero,zero, I added cleanup_earliest_vsetvls to elide those redundant vsetvls.
Now, after I review the implementation of this post optimizaiton again, I found it is incorrect and it is hard to
do the post optimizations for vsetvls that earliest LCM failed to eliminate.
Besides, such performance issues only happen in O1 or O0, such issues may not be serious.
So remove it and we may will find another way (E.g. adjust vlmax_avl pattern COST)
to optimize it if we really need to care about performance for O1.
Tsukasa OI [Wed, 30 Aug 2023 02:34:35 +0000 (02:34 +0000)]
RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support
'XVentanaCondOps' is a vendor extension from Ventana Micro Systems
containing two instructions for conditional move and will be supported on
their Veyron V1 CPU.
And most notably (for historical reasons), 'XVentanaCondOps' and the
standard 'Zicond' extension are functionally equivalent (only encodings and
instruction names are different).
* czero.eqz == vt.maskc
* czero.nez == vt.maskcn
This commit adds support for the 'XVentanaCondOps' extension by extending
'Zicond' extension support. With this, we can now reuse the optimization
using the 'Zicond' extension for the 'XVentanaCondOps' extension.
The specification for the 'XVentanaCondOps' extension is based on:
<https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.1/ventana-custom-extensions-v1.0.1.pdf>
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_ext_flag_table):
Parse 'XVentanaCondOps' extension.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): New.
(TARGET_XVENTANACONDOPS): Ditto.
(TARGET_ZICOND_LIKE): New to represent targets with conditional
moves like 'Zicond'. It includes RV64 + 'XVentanaCondOps'.
* config/riscv/riscv.cc (riscv_rtx_costs): Replace TARGET_ZICOND
with TARGET_ZICOND_LIKE.
(riscv_expand_conditional_move): Ditto.
* config/riscv/riscv.md (mov<mode>cc): Replace TARGET_ZICOND with
TARGET_ZICOND_LIKE.
* config/riscv/riscv.opt: Add new riscv_xventana_subext.
* config/riscv/zicond.md: Modify description.
(eqz_ventana): New to match corresponding czero instructions.
(nez_ventana): Ditto.
(*czero.<eqz>.<GPR><X>): Emit a 'XVentanaCondOps' instruction if
'Zicond' is not available but 'XVentanaCondOps' + RV64 is.
(*czero.<eqz>.<GPR><X>): Ditto.
(*czero.eqz.<GPR><X>.opt1): Ditto.
(*czero.nez.<GPR><X>.opt2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xventanacondops-primitiveSemantics.c: New test,
* gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c: New
test to make sure that XVentanaCondOps instructions are disabled
on RV32.
* gcc.target/riscv/xventanacondops-xor-01.c: New test,
RISC-V: Remove unreasonable TARGET_64BIT for VLS modes with size = 64bit
Previously, I add TARGET_64BIT condtion to block VLS modes with size = 64bit in RV32 system
E.g. V8QI
Since I realized such modes may cause inferior codegen for some situations in RV32 system.
However, this is really quite ugly and it cause ICE for some cases in RV32:
FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (internal compiler error: in require, at machmode.h:313)
3937FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (test for excess errors)
For inferior codegen in RV32 system, we should try another reasonable approach to fix it.
riscv: xtheadbb: Fix xtheadbb-li-rotr test for rv32
The test was introduced recently and tests a RV64-only feature.
However, when testing an RV32 compiler, the test gets executed as well
and fails with "cc1: error: ABI requires '-march=rv32'".
This patch fixes this by adding '-mabi=lp64' (like it is done for
other RV64-only tests as well).
Retested with RV32 and RV64 to ensure this won't pop up again.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-li-rotr.c: Don't run for RV32.
RISC-V: Keep vlmax vector operators in simple form until split1 pass
This patch keep vlmax vector pattern in simple before split1 pass which
will allow more optimization (e.g. combine) before split1 pass.
This patch changes the vlmax pattern in autovec.md to define_insn_and_split
as much as possible and clean up some combine patterns that are no longer needed.
This patch also fixed PR111232 bug which was caused by a combined failed.
RISC-V: Part-3: Output .variant_cc directive for vector function
Functions which follow vector calling convention variant need be annotated by
.variant_cc directive according the RISC-V Assembly Programmer's Manual[1] and
RISC-V ELF Specification[2].
RISC-V: Part-2: Save/Restore vector registers which need to be preversed
Because functions which follow vector calling convention variant has
callee-saved vector reigsters but functions which follow standard calling
convention don't have. We need to distinguish which function callee is so that
we can tell GCC exactly which vector registers callee will clobber. So I encode
the callee's calling convention information into the calls rtx pattern like
AArch64. The old operand 2 and 3 of call pattern which copy from MIPS target are
useless and removed according to my analysis.
gcc/ChangeLog:
* config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): Pass riscv_cc.
* config/riscv/riscv.cc (struct riscv_frame_info): Add new fileds.
(riscv_frame_info::reset): Reset new fileds.
(riscv_call_tls_get_addr): Pass riscv_cc.
(riscv_function_arg): Return riscv_cc for call patterm.
(get_riscv_cc): New function return riscv_cc from rtl call_insn.
(riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
(riscv_save_reg_p): Add vector callee-saved check.
(riscv_stack_align): Add vector save area comment.
(riscv_compute_frame_info): Ditto.
(riscv_restore_reg): Update for type change.
(riscv_for_each_saved_v_reg): New function save vector registers.
(riscv_first_stack_step): Handle funciton with vector callee-saved registers.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_output_mi_thunk): Pass riscv_cc.
(TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
* config/riscv/riscv.h (get_riscv_cc): Export get_riscv_cc function.
* config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.
RISC-V: Part-1: Select suitable vector registers for vector type args and returns
I post the vector register calling convention rules from in the proposal[1]
directly here:
v0 is used to pass the first vector mask argument to a function, and to return
vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the rest vector mask arguments to a
function, and to return vector data and vector tuple results from a function.
Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL indicates the number of
vector registers in the vector register group and requires the first vector
register number in the vector register group must be a multiple of it. For
example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be
allocated to this type, but v9-v16 can not because the v9 register number is
not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a
vector mask type, its LMUL is 1.
Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.
The rules for passing vector arguments are as follows:
1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.
2. For vector data arguments or rest vector mask arguments, starting from the
v8 register, if a vector register group between v8-v23 that has not been
allocated can be found and the first register number is a multiple of LMUL,
then allocate this vector register group to the argument and mark these
registers as allocated. Otherwise, pass it by reference. The argument has now
been allocated.
3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive vector register groups between v8-v23 that have not been allocated
can be found and the first register number is a multiple of LMUL, then allocate
these vector register groups to the argument and mark these registers as
allocated. Otherwise, pass it by reference. The argument has now been allocated.
NOTE: It should be stressed that the search for the appropriate vector register
groups starts at v8 each time and does not start at the next register after the
registers are allocated for the previous vector argument. Therefore, it is
possible that the vector register number allocated to a vector argument can be
less than the vector register number allocated to previous vector arguments.
For example, for the function
`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
and v9 will be allocated to `c`. This approach allows more vector registers to
be allocated to arguments in some cases.
Vector values are returned in the same manner as the first named argument of
the same type would be passed.
* config/riscv/riscv-protos.h (builtin_type_p): New function for checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declare.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function for check vector arguments.
(riscv_arguments_is_vector_type_p): New function for check vector returns.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: New test.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.
Tsukasa OI [Sun, 3 Sep 2023 12:39:47 +0000 (12:39 +0000)]
RISC-V: Fix Zicond ICE on large constants
Large constant cons and/or alt will trigger ICEs building GCC target
libraries (libgomp and libatomic) when the 'Zicond' extension is enabled.
For instance, zicond-ice-2.c (new test case in this commit) will cause
an ICE when SOME_NUMBER is 0x1000 or larger. While opposite numbers
corresponding cons/alt (two temp2 variables) are checked, cons/alt
themselves are not checked and causing 2 ICEs building
GCC target libraries as of this writing:
riscv: Synthesize all 11-bit-rotate constants with rori
Some constants can be built up using LI+RORI instructions.
The current implementation requires one of the upper 32-bits
to be a zero bit, which is not neccesary.
Let's drop this requirement in order to be able to synthesize
a constant like 0xffffffff00ffffffL.
The tests for LI+RORI are made more strict to detect regression
in the calculation of the LI constant and the rotation amount.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer_1): Don't
require one zero bit in the upper 32 bits for LI+RORI synthesis.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-li-rotr.c: New tests.
* gcc.target/riscv/zbb-li-rotr.c: Likewise.
Jeff Law [Tue, 5 Sep 2023 21:39:16 +0000 (15:39 -0600)]
RISC-V: Expose bswapsi for TARGET_64BIT
Various bswapsi tests are failing for rv64. More importantly, we're generating
crappy code.
Let's take the first test from bswapsi-1.c as an example.
> typedef unsigned int uint32_t;
>
> #define __const_swab32(x) ((uint32_t)( \
> (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) | \
> (((uint32_t)(x) & (uint32_t)0x0000ff00UL) << 8) | \
> (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >> 8) | \
> (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
>
> /* This byte swap implementation is used by the Linux kernel and the
> GNU C library. */
>
> uint32_t
> swap32_a (uint32_t in)
> {
> return __const_swab32 (in);
> }
>
>
>
We generate this for rv64gc_zba_zbb_zbs:
> srliw a1,a0,24
> slliw a5,a0,24
> slliw a3,a0,8
> li a2,16711680
> li a4,65536
> or a5,a5,a1
> and a3,a3,a2
> addi a4,a4,-256
> srliw a0,a0,8
> or a5,a5,a3
> and a0,a0,a4
> or a0,a5,a0
> retUrgh!
After this patch we generate:
> rev8 a0,a0
> srai a0,a0,32
> ret
Clearly better.
The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that the
RTL expanders already know how to widen a bswap, which is definitely true. But
it's the case that failure to expose a bswapsi will cause the 32bit bswap
optimizations in gimple store merging to not trigger. Thus we get crappy code.
To fix this we expose bswapsi on TARGET_64BIT. gimple-store-merging then
detects the 32bit bswap idioms and generates suitable __builtin calls. The
expander will "FAIL" expansion for TARGET_64BIT which forces the generic
expander code to synthesize the operation (we could synthesize in here, but
that'd result in duplicate code).
Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the testsuite
without any regressions.
gcc/
* config/riscv/bitmanip.md (bswapsi2): Expose for TARGET_64BIT.
Edwin Lu [Tue, 5 Sep 2023 17:09:40 +0000 (10:09 -0700)]
RISC-V: Add Types to Un-Typed Risc-v Instructions
Updates risc-v instructions to ensure that no instruction is left
without a type attribute. Added new types "trap" and "cbo" (for
cache related instructions)
Tested for regressions using rv32/64 multilib with newlib/linux and
rv32/64 gcv for linux.
riscv: xtheadbb: Enable constant synthesis with th.srri
Some constants can be built up using rotate-right instructions.
The code that enables this can be found in riscv_build_integer_1().
However, this functionality is only available for Zbb, which
includes the rori instruction. This patch enables this also for
XTheadBb, which includes the th.srri instruction.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer_1): Enable constant
synthesis with rotate-right for XTheadBb.
Fixes: 1d5bc3285e8a ("[committed][RISC-V] Fix 20010221-1.c with zicond")
This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since in
failing case, pattern semantics were not matching with asm czero.nez
We start with the following src code snippet:
if (a == 0)
return 0;
else
return x;
}
which is equivalent to: "x = (a != 0) ? x : a" where x is NOT 0.
^^^^^^^^^^^^^^^^
and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2"
| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
| (if_then_else:DI (ne (reg/v:DI 134 [ a ])
| (const_int 0 [0]))
| (reg/v:DI 136 [ x ])
| (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}
The corresponding asm pattern generates
czero.nez x, x, a ; %0, %2, %1
which implies
"x = (a != 0) ? 0 : a"
clearly not what the pattern wants to do.
Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez if X
is not guaranteed to be 0.
However this can be fixed with a small tweak
"x = (a != 0) ? x : a"
is same as
"x = (a == 0) ? a : x"
and since middle operand is 0 when a == 0, it is equivalent to
"x = (a == 0) ? 0 : x"
which can be expressed with CZERO.eqz
before fix after fix
----------------- -----------------
li a5,1 li a5,1
ld a4,8(sp) ld a4,8(sp)
czero.nez a0,a4,a5 czero.eqz a0,a4,a5
The issue only happens at -O1 as at higher optimization levels, the
whole conditional move gets optimized away.
This fixes 4 testsuite failues in a zicond build:
FAIL: gcc.c-torture/execute/pr60003.c -O1 execution test
FAIL: gcc.dg/setjmp-3.c execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 -fpic execution test
Kito Cheng [Wed, 30 Aug 2023 07:10:44 +0000 (15:10 +0800)]
RISC-V: Emit .note.GNU-stack for non-linux target as well
We only emit that on linux target before, that not problem before,
however Qemu has fix a bug to make qemu user mode honor PT_GNU_STACK[1],
that will cause problem when we test baremetal with qemu.
So the straightforward is enable that as well for non-linux toolchian,
the price is that will increase few bytes for each binary.
Pan Li [Tue, 5 Sep 2023 10:28:03 +0000 (18:28 +0800)]
RISC-V: Support FP SGNJ autovec for VLS mode
This patch would like to allow the VLS mode autovec for the
floating-point binary operation MAX/MIN.
Given below code example:
void test(float * restrict out, float * restrict in1, float * restrict in2)
{
for (int i = 0; i < 128; i++)
out[i] = __builtin_copysignf (in1[i], in2[i]);
}
Before this patch:
test:
csrr a4,vlenb
slli a4,a4,1
li a5,128
bleu a5,a4,.L2
mv a5,a4
.L2:
vsetvli zero,a5,e32,m8,ta,ma
vle32.v v8,0(a1)
vle32.v v16,0(a2)
vsetvli a4,zero,e32,m8,ta,ma
vfsgnj.vv v8,v8,v16
vsetvli zero,a5,e32,m8,ta,ma
vse32.v v8,0(a0)
ret
After this patch:
test:
li a5,128
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a1)
vle32.v v2,0(a2)
vfsgnj.vv v1,v1,v2
vse32.v v1,0(a0)
ret
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/autovec-vls.md (copysign<mode>3): New pattern.
* config/riscv/vector.md: Extend iterator for VLS.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c: New test.
As the generated code with -Oz consumes less size, there is nothing
wrong in the code generation. Instead, let's not run the xtheadcondmov
tests with -Oz.
Pan Li [Sat, 2 Sep 2023 08:42:27 +0000 (16:42 +0800)]
RISC-V: Support FP MAX/MIN autovec for VLS mode
This patch would like to allow the VLS mode autovec for the
floating-point binary operation MAX/MIN.
Given below code example:
test (float *out, float *in1, float *in2)
{
for (int i = 0; i < 128; i++)
out[i] = in1[i] > in2[i] ? in1[i] : in2[i];
// Or out[i] = fmax (in1[i], in2[i]);
}
Before this patch:
test:
csrr a4,vlenb
slli a4,a4,1
li a5,128
bleu a5,a4,.L2
mv a5,a4
.L2:
vsetvli zero,a5,e32,m8,ta,ma
vle32.v v16,0(a1)
vle32.v v8,0(a2)
vsetvli a3,zero,e32,m8,ta,ma
vmfgt.vv v0,v16,v8
vmerge.vvm v8,v8,v16,v0
vsetvli zero,a5,e32,m8,ta,ma
vse32.v v8,0(a0)
ret
After this patch:
test:
li a5,128
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a1)
vle32.v v2,0(a2)
vfmax.vv v1,v1,v2
vse32.v v1,0(a0)
ret
This MAX/MIN autovec acts on function call like fmaxf/fmax in math.h
too. And it depends on the option -ffast-math.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
fmax/fmin
* config/riscv/vector.md: Add VLS modes to vfmax/vfmin.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: New macros.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New test.
RISC-V: Adjust expand_cond_len_{unary,binop,op} api
This patch change expand_cond_len_{unary,binop}'s argument `rtx_code code`
to `unsigned icode` and use the icode directly to determine whether the
rounding_mode operand is required.