RISC-V: Remove unreasonable TARGET_64BIT for VLS modes with size = 64bit
Previously, I added a TARGET_64BIT condition to block VLS modes with size = 64bit
(e.g. V8QI) on RV32 systems, since I realized such modes may cause inferior
codegen in some situations on RV32.
However, this is really quite ugly and it causes an ICE in some cases on RV32:
FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c (test for excess errors)
For the inferior codegen on RV32, we should find another, more reasonable fix.
riscv: xtheadbb: Fix xtheadbb-li-rotr test for rv32
The test was introduced recently and tests a RV64-only feature.
However, when testing an RV32 compiler, the test gets executed as well
and fails with "cc1: error: ABI requires '-march=rv32'".
This patch fixes this by adding '-mabi=lp64' (like it is done for
other RV64-only tests as well).
Retested with RV32 and RV64 to ensure this won't pop up again.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-li-rotr.c: Don't run for RV32.
RISC-V: Keep vlmax vector operators in simple form until split1 pass
This patch keeps vlmax vector patterns in simple form until the split1 pass, which
allows more optimization (e.g. combine) before the split1 pass.
This patch changes the vlmax patterns in autovec.md to define_insn_and_split
as much as possible and cleans up some combine patterns that are no longer needed.
This patch also fixes the PR111232 bug, which was caused by a combine failure.
RISC-V: Part-3: Output .variant_cc directive for vector function
Functions which follow the vector calling convention variant need to be annotated with the
.variant_cc directive according to the RISC-V Assembly Programmer's Manual[1] and the
RISC-V ELF Specification[2].
RISC-V: Part-2: Save/Restore vector registers which need to be preserved
Because functions which follow the vector calling convention variant have
callee-saved vector registers but functions which follow the standard calling
convention don't, we need to know which convention the callee follows so that
we can tell GCC exactly which vector registers the callee will clobber. So I encode
the callee's calling convention information into the call rtx patterns, like
AArch64 does. The old operands 2 and 3 of the call patterns, which were copied from
the MIPS target, are useless according to my analysis and have been removed.
gcc/ChangeLog:
* config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): Pass riscv_cc.
* config/riscv/riscv.cc (struct riscv_frame_info): Add new fields.
(riscv_frame_info::reset): Reset new fields.
(riscv_call_tls_get_addr): Pass riscv_cc.
(riscv_function_arg): Return riscv_cc for call pattern.
(get_riscv_cc): New function return riscv_cc from rtl call_insn.
(riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
(riscv_save_reg_p): Add vector callee-saved check.
(riscv_stack_align): Add vector save area comment.
(riscv_compute_frame_info): Ditto.
(riscv_restore_reg): Update for type change.
(riscv_for_each_saved_v_reg): New function save vector registers.
(riscv_first_stack_step): Handle function with vector callee-saved registers.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_output_mi_thunk): Pass riscv_cc.
(TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
* config/riscv/riscv.h (get_riscv_cc): Export get_riscv_cc function.
* config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.
RISC-V: Part-1: Select suitable vector registers for vector type args and returns
I post the vector register calling convention rules from the proposal[1]
directly here:
v0 is used to pass the first vector mask argument to a function, and to return
vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the rest vector mask arguments to a
function, and to return vector data and vector tuple results from a function.
Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL indicates the number of
vector registers in the vector register group and requires the first vector
register number in the vector register group must be a multiple of it. For
example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be
allocated to this type, but v9-v16 can not because the v9 register number is
not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a
vector mask type, its LMUL is 1.
Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.
The rules for passing vector arguments are as follows:
1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.
2. For vector data arguments or rest vector mask arguments, starting from the
v8 register, if a vector register group between v8-v23 that has not been
allocated can be found and the first register number is a multiple of LMUL,
then allocate this vector register group to the argument and mark these
registers as allocated. Otherwise, pass it by reference. The argument has now
been allocated.
3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive vector register groups between v8-v23 that have not been allocated
can be found and the first register number is a multiple of LMUL, then allocate
these vector register groups to the argument and mark these registers as
allocated. Otherwise, pass it by reference. The argument has now been allocated.
NOTE: It should be stressed that the search for the appropriate vector register
groups starts at v8 each time and does not start at the next register after the
registers are allocated for the previous vector argument. Therefore, it is
possible that the vector register number allocated to a vector argument can be
less than the vector register number allocated to previous vector arguments.
For example, for the function
`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
and v9 will be allocated to `c`. This approach allows more vector registers to
be allocated to arguments in some cases.
Vector values are returned in the same manner as the first named argument of
the same type would be passed.
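The data and tuple rules above can be sketched in C as follows. This is only an illustration of the search order described in the proposal, not the code added to riscv.cc: the names `alloc_vector_arg`, `V_ARG_FIRST_REG` and `V_ARG_LAST_REG` are hypothetical, the first-mask-in-v0 rule is left out, and LMUL is assumed to be already rounded up to at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the v8-v23 argument window.  */
#define V_ARG_FIRST_REG 8
#define V_ARG_LAST_REG 23

/* Bit r of *ALLOCATED is set once register v<r> has been handed out.
   Try to allocate NFIELDS consecutive register groups of LMUL registers.
   Return the first register number, or -1 if the argument has to be
   passed by reference.  */
static int
alloc_vector_arg (uint32_t *allocated, int lmul, int nfields)
{
  assert (lmul >= 1 && nfields >= 1);
  int nregs = lmul * nfields;

  /* The search restarts at v8 for every argument.  */
  for (int first = V_ARG_FIRST_REG;
       first + nregs - 1 <= V_ARG_LAST_REG; first++)
    {
      /* The first register number must be a multiple of LMUL.  */
      if (first % lmul != 0)
        continue;

      uint32_t group = (uint32_t) (((UINT64_C (1) << nregs) - 1) << first);
      if (*allocated & group)
        continue;

      *allocated |= group;
      return first;
    }
  return -1;
}
```

Starting from an empty mask and calling this with (LMUL, NFIELDS) = (1, 1), (2, 1), (1, 1) reproduces the `foo` example: `a` gets v8, `b` gets v10-v11 and `c` goes back to fill v9.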
* config/riscv/riscv-protos.h (builtin_type_p): New function for checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declare.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function for checking vector returns.
(riscv_arguments_is_vector_type_p): New function for checking vector arguments.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: New test.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.
Tsukasa OI [Sun, 3 Sep 2023 12:39:47 +0000 (12:39 +0000)]
RISC-V: Fix Zicond ICE on large constants
Large constant cons and/or alt will trigger ICEs building GCC target
libraries (libgomp and libatomic) when the 'Zicond' extension is enabled.
For instance, zicond-ice-2.c (new test case in this commit) will cause
an ICE when SOME_NUMBER is 0x1000 or larger. While the opposite numbers
corresponding to cons/alt (the two temp2 variables) are checked, cons/alt
themselves are not, which causes 2 ICEs when building
GCC target libraries as of this writing:
riscv: Synthesize all 11-bit-rotate constants with rori
Some constants can be built up using LI+RORI instructions.
The current implementation requires one of the upper 32 bits
to be a zero bit, which is not necessary.
Let's drop this requirement in order to be able to synthesize
a constant like 0xffffffff00ffffffL.
The tests for LI+RORI are made more strict to detect regression
in the calculation of the LI constant and the rotation amount.
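As a rough illustration of the idea (not the logic in riscv_build_integer_1), the sketch below searches for a rotation amount under which a 64-bit constant becomes a sign-extended 12-bit LI immediate. The helper names `rotl64`, `fits_simm12` and `find_li_rori` are hypothetical, and the search is a plain brute-force loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate X left by K bits, for 1 <= K <= 63.  */
static uint64_t
rotl64 (uint64_t x, unsigned k)
{
  assert (k >= 1 && k <= 63);
  return (x << k) | (x >> (64 - k));
}

/* True if X is a sign-extended 12-bit value, i.e. bits 63..11 are
   either all zeros or all ones.  */
static bool
fits_simm12 (uint64_t x)
{
  uint64_t high = x >> 11;
  return high == 0 || high == (UINT64_MAX >> 11);
}

/* Look for IMM and ROT such that rotating IMM right by ROT gives VALUE,
   so that VALUE could be built as "li rd,IMM; rori rd,rd,ROT".  */
static bool
find_li_rori (uint64_t value, int64_t *imm, unsigned *rot)
{
  for (unsigned k = 1; k <= 63; k++)
    {
      /* Undo a right rotation by K with a left rotation by K.  */
      uint64_t cand = rotl64 (value, k);
      if (fits_simm12 (cand))
        {
          *imm = (int64_t) cand;
          *rot = k;
          return true;
        }
    }
  return false;
}
```

For 0xffffffff00ffffff the first hit is a rotation by 40 with an immediate of -256: every upper bit is set, yet the constant is still reachable with two instructions.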
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer_1): Don't
require one zero bit in the upper 32 bits for LI+RORI synthesis.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-li-rotr.c: New tests.
* gcc.target/riscv/zbb-li-rotr.c: Likewise.
Jeff Law [Tue, 5 Sep 2023 21:39:16 +0000 (15:39 -0600)]
RISC-V: Expose bswapsi for TARGET_64BIT
Various bswapsi tests are failing for rv64. More importantly, we're generating
crappy code.
Let's take the first test from bswapsi-1.c as an example.
> typedef unsigned int uint32_t;
>
> #define __const_swab32(x) ((uint32_t)( \
> (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) | \
> (((uint32_t)(x) & (uint32_t)0x0000ff00UL) << 8) | \
> (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >> 8) | \
> (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
>
> /* This byte swap implementation is used by the Linux kernel and the
> GNU C library. */
>
> uint32_t
> swap32_a (uint32_t in)
> {
> return __const_swab32 (in);
> }
>
>
>
We generate this for rv64gc_zba_zbb_zbs:
> srliw a1,a0,24
> slliw a5,a0,24
> slliw a3,a0,8
> li a2,16711680
> li a4,65536
> or a5,a5,a1
> and a3,a3,a2
> addi a4,a4,-256
> srliw a0,a0,8
> or a5,a5,a3
> and a0,a0,a4
> or a0,a5,a0
> ret
Urgh!
After this patch we generate:
> rev8 a0,a0
> srai a0,a0,32
> ret
Clearly better.
The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that the
RTL expanders already know how to widen a bswap, which is definitely true. But
it's the case that failure to expose a bswapsi will cause the 32bit bswap
optimizations in gimple store merging to not trigger. Thus we get crappy code.
To fix this we expose bswapsi on TARGET_64BIT. gimple-store-merging then
detects the 32bit bswap idioms and generates suitable __builtin calls. The
expander will "FAIL" expansion for TARGET_64BIT which forces the generic
expander code to synthesize the operation (we could synthesize in here, but
that'd result in duplicate code).
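To make the transformation concrete, here is a small sketch that puts the shift-and-mask idiom next to the builtin that store merging is expected to substitute for it. The function names are made up for illustration; `__builtin_bswap32` is the existing GCC builtin.

```c
#include <assert.h>
#include <stdint.h>

/* The open-coded 32-bit byte swap, as in the bswapsi-1.c example.  */
static uint32_t
swap32_idiom (uint32_t x)
{
  return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8)
         | ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/* What the idiom becomes once it is recognized as a byte swap.  */
static uint32_t
swap32_builtin (uint32_t x)
{
  return __builtin_bswap32 (x);
}

/* Both forms must agree for every input.  */
static void
check_swap32 (uint32_t x)
{
  assert (swap32_idiom (x) == swap32_builtin (x));
}
```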
Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the testsuite
without any regressions.
gcc/
* config/riscv/bitmanip.md (bswapsi2): Expose for TARGET_64BIT.
Edwin Lu [Tue, 5 Sep 2023 17:09:40 +0000 (10:09 -0700)]
RISC-V: Add Types to Un-Typed Risc-v Instructions
Updates risc-v instructions to ensure that no instruction is left
without a type attribute. Added new types "trap" and "cbo" (for
cache related instructions)
Tested for regressions using rv32/64 multilib with newlib/linux and
rv32/64 gcv for linux.
riscv: xtheadbb: Enable constant synthesis with th.srri
Some constants can be built up using rotate-right instructions.
The code that enables this can be found in riscv_build_integer_1().
However, this functionality is only available for Zbb, which
includes the rori instruction. This patch enables this also for
XTheadBb, which includes the th.srri instruction.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer_1): Enable constant
synthesis with rotate-right for XTheadBb.
Fixes: 1d5bc3285e8a ("[committed][RISC-V] Fix 20010221-1.c with zicond")
This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since in
failing case, pattern semantics were not matching with asm czero.nez
We start with the following src code snippet:
if (a == 0)
return 0;
else
return x;
}
which is equivalent to: "x = (a != 0) ? x : a" where x is NOT 0.
^^^^^^^^^^^^^^^^
and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2"
| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
| (if_then_else:DI (ne (reg/v:DI 134 [ a ])
| (const_int 0 [0]))
| (reg/v:DI 136 [ x ])
| (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}
The corresponding asm pattern generates
czero.nez x, x, a ; %0, %2, %1
which implies
"x = (a != 0) ? 0 : a"
clearly not what the pattern wants to do.
Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez if X
is not guaranteed to be 0.
However this can be fixed with a small tweak
"x = (a != 0) ? x : a"
is same as
"x = (a == 0) ? a : x"
and since middle operand is 0 when a == 0, it is equivalent to
"x = (a == 0) ? 0 : x"
which can be expressed with CZERO.eqz
before fix after fix
----------------- -----------------
li a5,1 li a5,1
ld a4,8(sp) ld a4,8(sp)
czero.nez a0,a4,a5 czero.eqz a0,a4,a5
The issue only happens at -O1 as at higher optimization levels, the
whole conditional move gets optimized away.
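The argument can be checked with a small C model of the two instructions. The semantics below follow my reading of the Zicond specification (czero.eqz zeroes the result when the condition register is zero, czero.nez when it is non-zero); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1  */
static int64_t
czero_eqz (int64_t rs1, int64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1  */
static int64_t
czero_nez (int64_t rs1, int64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

/* The operation the RTL pattern describes: x = (a != 0) ? x : a.  */
static int64_t
wanted (int64_t x, int64_t a)
{
  return a != 0 ? x : a;
}

/* The fixed pattern: czero.eqz with A as the condition matches.  */
static void
check_fix (int64_t x, int64_t a)
{
  assert (czero_eqz (x, a) == wanted (x, a));
}
```

With x = 5 and a = 3, `wanted` yields 5 while `czero_nez (5, 3)` yields 0, which is the kind of mismatch the failing tests exposed.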
This fixes 4 testsuite failures in a zicond build:
FAIL: gcc.c-torture/execute/pr60003.c -O1 execution test
FAIL: gcc.dg/setjmp-3.c execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 -fpic execution test
Kito Cheng [Wed, 30 Aug 2023 07:10:44 +0000 (15:10 +0800)]
RISC-V: Emit .note.GNU-stack for non-linux target as well
Previously we only emitted it for Linux targets, which was not a problem.
However, QEMU has fixed a bug so that QEMU user mode now honors PT_GNU_STACK[1],
which causes problems when we test bare-metal toolchains with QEMU.
So the straightforward fix is to emit it for non-Linux toolchains as well.
The price is a few extra bytes in each binary.
Pan Li [Tue, 5 Sep 2023 10:28:03 +0000 (18:28 +0800)]
RISC-V: Support FP SGNJ autovec for VLS mode
This patch would like to allow the VLS mode autovec for the
floating-point binary operation SGNJ (copysign).
Given below code example:
void test(float * restrict out, float * restrict in1, float * restrict in2)
{
for (int i = 0; i < 128; i++)
out[i] = __builtin_copysignf (in1[i], in2[i]);
}
Before this patch:
test:
csrr a4,vlenb
slli a4,a4,1
li a5,128
bleu a5,a4,.L2
mv a5,a4
.L2:
vsetvli zero,a5,e32,m8,ta,ma
vle32.v v8,0(a1)
vle32.v v16,0(a2)
vsetvli a4,zero,e32,m8,ta,ma
vfsgnj.vv v8,v8,v16
vsetvli zero,a5,e32,m8,ta,ma
vse32.v v8,0(a0)
ret
After this patch:
test:
li a5,128
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a1)
vle32.v v2,0(a2)
vfsgnj.vv v1,v1,v2
vse32.v v1,0(a0)
ret
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/autovec-vls.md (copysign<mode>3): New pattern.
* config/riscv/vector.md: Extend iterator for VLS.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: New macro.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnj-2.c: New test.
As the generated code with -Oz consumes less size, there is nothing
wrong in the code generation. Instead, let's not run the xtheadcondmov
tests with -Oz.
Pan Li [Sat, 2 Sep 2023 08:42:27 +0000 (16:42 +0800)]
RISC-V: Support FP MAX/MIN autovec for VLS mode
This patch would like to allow the VLS mode autovec for the
floating-point binary operation MAX/MIN.
Given below code example:
void test (float *out, float *in1, float *in2)
{
for (int i = 0; i < 128; i++)
out[i] = in1[i] > in2[i] ? in1[i] : in2[i];
// Or out[i] = fmax (in1[i], in2[i]);
}
Before this patch:
test:
csrr a4,vlenb
slli a4,a4,1
li a5,128
bleu a5,a4,.L2
mv a5,a4
.L2:
vsetvli zero,a5,e32,m8,ta,ma
vle32.v v16,0(a1)
vle32.v v8,0(a2)
vsetvli a3,zero,e32,m8,ta,ma
vmfgt.vv v0,v16,v8
vmerge.vvm v8,v8,v16,v0
vsetvli zero,a5,e32,m8,ta,ma
vse32.v v8,0(a0)
ret
After this patch:
test:
li a5,128
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a1)
vle32.v v2,0(a2)
vfmax.vv v1,v1,v2
vse32.v v1,0(a0)
ret
This MAX/MIN autovec acts on function call like fmaxf/fmax in math.h
too. And it depends on the option -ffast-math.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
fmax/fmin
* config/riscv/vector.md: Add VLS modes to vfmax/vfmin.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: New macros.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New test.
RISC-V: Adjust expand_cond_len_{unary,binop,op} api
This patch changes expand_cond_len_{unary,binop}'s argument `rtx_code code`
to `unsigned icode` and uses the icode directly to determine whether the
rounding_mode operand is required.
Robin Dapp [Thu, 31 Aug 2023 07:18:00 +0000 (09:18 +0200)]
RISC-V: Add vec_extract for BI -> QI.
This patch adds a vec_extract expander that extracts a QImode from a
vector mask mode. In doing so, it helps recognize a "live
operation"/extract last idiom for mask modes. It fixes the ICE in
tree-vect-live-6.c by circumventing the fallback code in
extract_bit_field_1. The problem there is still latent, though, and
needs to be addressed separately.
gcc/ChangeLog:
* config/riscv/autovec.md (vec_extract<mode>qi): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/live-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/live_run-2.c: New test.
Robin Dapp [Thu, 31 Aug 2023 07:16:35 +0000 (09:16 +0200)]
testsuite/vect: Make match patterns more accurate.
On some targets we fail to vectorize with the first type the vectorizer
tries but succeed with the second. This patch changes several regex
patterns to reflect that behavior.
Before we would look for a single occurrence of e.g.
"vect_recog_dot_prod_pattern" but would possibly have two (one for each
attempted mode). The new pattern tries to match sequences where we
first have a "vect_recog_dot_prod_pattern" and a "succeeded" afterwards
while making sure there is no "failed" or "Re-trying" in between.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
vls floating-point autovec.
* config/riscv/vector-iterators.md: New iterator for
floating-point V and VLS.
* config/riscv/vector.md: Add VLS to floating-point binop.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h:
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c: New test.
However, we leverage unspec instead of use to consume the FRM register
because there are some restrictions from the combine pass. Some code
path of try_combine may require the XVECLEN(pat, 0) == 2 for the
recog_for_combine, and add new use will make the XVECLEN(pat, 0) == 3
and result in the vfwmacc optimization failure. For example, in the
test widen-complicate-5.c and widen-8.c
Finally, there will be other fma cases and they will be covered in
the underlying patches.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
Palmer Dabbelt [Thu, 11 May 2023 22:28:49 +0000 (15:28 -0700)]
RISC-V: Add vector_scalar_shift_operand
The vector shift immediates happen to have the same constraints as some
of the CSR-related operands, but it's a different usage. This adds a
name for them, so I don't get confused again next time.
gcc/ChangeLog:
* config/riscv/autovec.md (shifts): Use
vector_scalar_shift_operand.
* config/riscv/predicates.md (vector_scalar_shift_operand): New
predicate.
Juzhe-Zhong [Thu, 31 Aug 2023 12:23:44 +0000 (20:23 +0800)]
RISC-V: Add Vector cost model framework for RVV
Hi, currently RVV vectorization only supports picking LMUL according to the
compile option --param=riscv-autovec-lmul=, which is not ideal.
The compiler should be able to pick the optimal LMUL/vectorization factor to
vectorize the loop according to the loop_vec_info and SSA-based register
pressure analysis.
Now, I have figured out that the current GCC cost model provides a way to
choose the LMUL/vectorization factor by adjusting the COST.
This patch just adds the minimum COST model framework, which still
applies the default cost model (no vector code changes from before).
Regression tests all passed with no difference.
gcc/ChangeLog:
* config.gcc: Add vector cost model framework for RVV.
* config/riscv/riscv.cc (riscv_vectorize_create_costs): Ditto.
(TARGET_VECTORIZE_CREATE_COSTS): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-vector-costs.cc: New file.
* config/riscv/riscv-vector-costs.h: New file.
Lehua Ding [Thu, 31 Aug 2023 07:22:57 +0000 (15:22 +0800)]
RISC-V: Change vsetvl tail and mask policy to default policy
This patch changes the vsetvl policy to the default policy
(returned by get_prefer_mask_policy and get_prefer_tail_policy) instead of a
fixed policy. Any policy is now returned, allowing change to agnostic
or undisturbed. In the future, users may be able to control the default
policy, such as keeping agnostic by compiler options.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (IS_AGNOSTIC): Move to here.
* config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx):
Change to default policy.
* config/riscv/riscv-vector-builtins-bases.cc: Change to default policy.
* config/riscv/riscv-vsetvl.h (IS_AGNOSTIC): Delete.
* config/riscv/riscv.cc (riscv_print_operand): Use IS_AGNOSTIC to test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Adjust.
* gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-24.c: New test.
Lehua Ding [Wed, 30 Aug 2023 10:03:20 +0000 (18:03 +0800)]
RISC-V: Refactor and clean emit_{vlmax,nonvlmax}_xxx functions
This patch refactors the code of the emit_{vlmax,nonvlmax}_xxx functions.
These functions are used to generate RVV insns. There are currently 31
such functions and a few duplicates. So many functions are
needed because there are many types of RVV instructions: there are
patterns that don't have a mask operand, patterns that don't have a merge
operand, patterns that don't need a tail policy operand, etc.
Previously there was the insn_type enum, but its value was just used
to indicate how many operands were passed in by the caller. The rest of
the operand information was scattered throughout these functions.
For example, emit_vlmax_fp_insn indicates that a rounding mode operand
of FRM_DYN should also be passed, emit_vlmax_merge_insn means that
there is no mask operand or mask policy operand.
I introduced a new enum insn_flags to indicate some properties of these
RVV patterns. These insn_flags are then used to define insn_type enum.
For example, for the definition of WIDEN_TERNARY_OP:
This flag means the RVV pattern has no merge operand. This flag only applies
to vwmacc instructions. After defining the desired insn_type, all the
emit_{vlmax,nonvlmax}_xxx functions are unified into three functions:
Juzhe-Zhong [Tue, 29 Aug 2023 09:39:33 +0000 (17:39 +0800)]
RISC-V: Remove movmisalign pattern for VLA modes
This patch fixes the following failures in the "vect" testsuite:
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-1.c execution test
FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-2.c execution test
Die Li [Tue, 29 Aug 2023 08:37:46 +0000 (08:37 +0000)]
RISC-V: support cm.mva01s cm.mvsa01 in zcmp
Signed-off-by: Die Li <lidie@eswincomputing.com>
Co-Authored-By: Fei Gao <gaofei@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/peephole.md: New pattern.
* config/riscv/predicates.md (a0a1_reg_operand): New predicate.
(zcmp_mv_sreg_operand): New predicate.
* config/riscv/riscv.md: New predicate.
* config/riscv/zc.md (*mva01s<X:mode>): New pattern.
(*mvsa01<X:mode>): New pattern.
Fei Gao [Tue, 29 Aug 2023 08:37:44 +0000 (08:37 +0000)]
RISC-V: support cm.push cm.pop cm.popret in zcmp
Zcmp can share the same logic as save-restore in stack allocation: pre-allocation
by cm.push, step 1 and step 2.
Pre-allocation not only saves callee saved GPRs, but also saves callee saved FPRs and
local variables if any.
Please note that cm.push pushes ra, s0-s11 in the reverse order of what save-restore does,
so the .cfi directives have been adapted accordingly in my patch.
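The pre-allocation done by cm.push can be sketched as a base adjustment plus a small number of 16-byte steps. The sketch below is an assumption-laden illustration, not the code in riscv.cc: the names `base_adj`, `valid_stack_adj`, `SP_INC_STEP` and `MAX_SPIMM` are hypothetical, the values 16 and 3 reflect my reading of the Zcmp specification, and register counts that Zcmp cannot encode (such as ra plus s0-s10) are not rejected here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed Zcmp parameters: each spimm step adds 16 bytes, and spimm
   is a 2-bit field, so at most 3 extra steps are available.  */
#define SP_INC_STEP 16
#define MAX_SPIMM 3

/* Round BYTES up to the next multiple of 16.  */
static int
align16 (int bytes)
{
  return (bytes + 15) & ~15;
}

/* Bytes that cm.push always allocates for N_REGS saved registers of
   REG_BYTES bytes each (4 on RV32, 8 on RV64).  */
static int
base_adj (int n_regs, int reg_bytes)
{
  assert (n_regs >= 1 && (reg_bytes == 4 || reg_bytes == 8));
  return align16 (n_regs * reg_bytes);
}

/* True if a total adjustment of TOTAL bytes can be done by a single
   cm.push or cm.pop for the given register list.  */
static bool
valid_stack_adj (int total, int n_regs, int reg_bytes)
{
  int extra = total - base_adj (n_regs, reg_bytes);
  return extra >= 0 && extra % SP_INC_STEP == 0
         && extra / SP_INC_STEP <= MAX_SPIMM;
}
```

Anything beyond what a single cm.push can cover would be left to the later stack allocation steps mentioned above.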
gcc/ChangeLog:
* config/riscv/iterators.md
(slot0_offset): slot 0 offset in stack GPRs area in bytes
(slot1_offset): slot 1 offset in stack GPRs area in bytes
(slot2_offset): likewise
(slot3_offset): likewise
(slot4_offset): likewise
(slot5_offset): likewise
(slot6_offset): likewise
(slot7_offset): likewise
(slot8_offset): likewise
(slot9_offset): likewise
(slot10_offset): likewise
(slot11_offset): likewise
(slot12_offset): likewise
* config/riscv/predicates.md
(stack_push_up_to_ra_operand): predicates of stack adjust pushing ra
(stack_push_up_to_s0_operand): predicates of stack adjust pushing ra, s0
(stack_push_up_to_s1_operand): likewise
(stack_push_up_to_s2_operand): likewise
(stack_push_up_to_s3_operand): likewise
(stack_push_up_to_s4_operand): likewise
(stack_push_up_to_s5_operand): likewise
(stack_push_up_to_s6_operand): likewise
(stack_push_up_to_s7_operand): likewise
(stack_push_up_to_s8_operand): likewise
(stack_push_up_to_s9_operand): likewise
(stack_push_up_to_s11_operand): likewise
(stack_pop_up_to_ra_operand): predicates of stack adjust popping ra
(stack_pop_up_to_s0_operand): predicates of stack adjust popping ra, s0
(stack_pop_up_to_s1_operand): likewise
(stack_pop_up_to_s2_operand): likewise
(stack_pop_up_to_s3_operand): likewise
(stack_pop_up_to_s4_operand): likewise
(stack_pop_up_to_s5_operand): likewise
(stack_pop_up_to_s6_operand): likewise
(stack_pop_up_to_s7_operand): likewise
(stack_pop_up_to_s8_operand): likewise
(stack_pop_up_to_s9_operand): likewise
(stack_pop_up_to_s11_operand): likewise
* config/riscv/riscv-protos.h
(riscv_zcmp_valid_stack_adj_bytes_p): declaration
* config/riscv/riscv.cc (struct riscv_frame_info): comment change
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_multi_push_regs_count): num of regs in multi-push
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place
(riscv_save_libcall_count): no functional change
(riscv_compute_frame_info): add zcmp frame info
(riscv_for_each_saved_reg): save or restore fprs in specified slot for zcmp
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_gen_multi_push_pop_insn): gen function for multi push and pop
(get_multi_push_fpr_mask): get mask for the fprs pushed by cm.push
(riscv_expand_prologue): allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): allocate stack by cm.pop[ret]
(zcmp_base_adj): calculate stack adjustment base size
(zcmp_additional_adj): calculate stack adjustment additional size
(riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment valid
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_MAX_GRP_SLOTS): max slots of pushing and popping in zcmp
(CALLEE_SAVED_FREG_NUMBER): get x of fsx(fs0 ~ fs11)
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file. machine description for zcmp
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_push_fpr.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.
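As a rough illustration of what helpers such as zcmp_base_adj and riscv_zcmp_valid_stack_adj_bytes_p have to decide, here is a minimal sketch. It assumes the Zcmp rule that cm.push/cm.pop adjust the stack by a base size (the pushed registers rounded up to 16 bytes) plus at most three extra 16-byte steps. All names and constants below are hypothetical and are not copied from riscv.cc or riscv.h.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the idea behind ZCMP_MAX_SPIMM and
   ZCMP_SP_INC_STEP; the values are assumptions taken from the Zcmp
   specification, not from the GCC sources.  */
enum { SKETCH_MAX_SPIMM = 3, SKETCH_SP_INC_STEP = 16 };

/* Base adjustment: bytes needed for REGS saved registers of XLEN_BYTES
   each, rounded up to a 16-byte boundary.  */
int zcmp_base_adj_sketch(int regs, int xlen_bytes)
{
    int bytes = regs * xlen_bytes;
    return (bytes + 15) & ~15;
}

/* A total adjustment is encodable if it is the base plus 0..3 extra
   steps of 16 bytes.  */
bool zcmp_adj_valid_sketch(int total, int regs, int xlen_bytes)
{
    int extra = total - zcmp_base_adj_sketch(regs, xlen_bytes);
    return extra >= 0
           && extra % SKETCH_SP_INC_STEP == 0
           && extra / SKETCH_SP_INC_STEP <= SKETCH_MAX_SPIMM;
}
```

Under these assumptions, pushing ra and s0-s11 (13 registers) gives a base of 64 bytes on RV32 and 112 bytes on RV64; any frame that needs more than base + 48 bytes requires a separate stack adjustment.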
Tsukasa OI [Tue, 29 Aug 2023 02:41:44 +0000 (02:41 +0000)]
RISC-V: Make arch-24.c to test "success" case
arch-24.c and arch-25.c are exactly the same and redundant. The author
suspects that the original author intended to test two base ISAs (RV32I and
RV64I) so this commit changes arch-24.c to test that RV32I+Zcf does not
cause any errors.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-24.c: Test RV32I+Zcf instead.
Philipp Tomsich [Tue, 29 Aug 2023 22:48:24 +0000 (16:48 -0600)]
RISC-V: Use splitter to generate zicond in another case
So in analyzing Ventana's internal tree against the trunk it became apparent
that the current zicond code is missing a case that helps coremark's bitwise
CRC implementation.
Here's a minimized testcase:
long xor1(long crc, long poly)
{
if (crc & 1)
crc ^= poly;
return crc;
}
i.e., it's just a conditional xor.
We generate this:
andi a5,a0,1
neg a5,a5
and a5,a5,a1
xor a0,a5,a0
ret
A splitter can rewrite the above into a suitable if-then-else construct and
squeeze an instruction out of that pesky CRC loop. Sadly it doesn't really
help anything else.
The patch includes two variants. One that uses ZBS, the other uses an ANDI
logical to produce the input condition.
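The rewrite described above can be modelled in plain C. This is only a sketch: czero_eqz below is a hypothetical helper that mimics the czero.eqz semantics (result is 0 when the condition is 0, otherwise the value), and the function names are made up for illustration.

```c
/* Original source form: a conditional xor.  */
long xor1_branch(long crc, long poly)
{
    if (crc & 1)
        crc ^= poly;
    return crc;
}

/* What the four-instruction sequence computes: andi, neg, and, xor.  */
long xor1_mask(long crc, long poly)
{
    long mask = -(crc & 1);      /* 0 or all ones */
    return crc ^ (mask & poly);
}

/* Model of czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.  */
static long czero_eqz(long value, long cond)
{
    return cond == 0 ? 0 : value;
}

/* Zicond form: andi, czero.eqz, xor (one instruction fewer).  */
long xor1_czero(long crc, long poly)
{
    return crc ^ czero_eqz(poly, crc & 1);
}
```

All three functions return the same value for every input; the last one corresponds to the if-then-else shape the splitter is meant to produce.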
gcc/
* config/riscv/zicond.md: New splitters to rewrite single bit
sign extension as the condition to a czero in the desired form.
gcc/testsuite
* gcc.target/riscv/zicond-xor-01.c: New test.
Jin Ma [Tue, 29 Aug 2023 17:01:55 +0000 (11:01 -0600)]
RISC-V: Added zvfh support for zfa extensions.
This is a follow-up for the zfa extension, added according to the recommendations
for zvfh and the patch from Tsukasa OI <research_trasio@irq.a4lg.com>. The new test
zfa-fli-5.c is also based on that patch.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
zvfh can generate zfa extended instruction fli.h, just like zfh.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zfa-fli-7.c: Change fa0 to fa\[0-9\] to avoid
assigning register numbers that are non-zero.
* gcc.target/riscv/zfa-fli-8.c: Ditto.
* gcc.target/riscv/zfa-fli-5.c: New test.
Edwin Lu [Tue, 29 Aug 2023 15:34:13 +0000 (08:34 -0700)]
RISC-V: generate builtin macro for compilation with strict alignment
Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false.
Tested for regressions using rv32/64 multilib with newlib/linux
gcc/ChangeLog:
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Generate
__riscv_unaligned_avoid with value 1 or
__riscv_unaligned_slow with value 1 or
__riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override): Define
riscv_user_wants_strict_align. Set
riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align
gcc/testsuite/ChangeLog:
* gcc.target/riscv/attribute-1.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for
__riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.
Reviewed-by: Jeff Law <jlaw@ventanamicro.com> Signed-off-by: Edwin Lu <ewlu@rivosinc.com> Co-authored-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit 6e23440b5df4011bbe1dbee74d47641125dd7d16)
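A minimal sketch of how user code might consume the macros named in the ChangeLog above. The function name is hypothetical, and the "unknown" fallback is an assumption so that the snippet also builds with compilers that define none of the three macros.

```c
/* Report which unaligned-access policy the compiler advertised.  */
const char *unaligned_policy(void)
{
#if defined(__riscv_unaligned_avoid)
    return "avoid";    /* strict alignment explicitly requested */
#elif defined(__riscv_unaligned_slow)
    return "slow";     /* unaligned access works but is slow */
#elif defined(__riscv_unaligned_fast)
    return "fast";     /* unaligned access is cheap */
#else
    return "unknown";  /* not RISC-V, or an older compiler */
#endif
}
```

Code that picks between a byte-wise and a word-wise copy loop could branch on the same macros at preprocessing time.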
Edwin Lu [Tue, 29 Aug 2023 15:30:10 +0000 (08:30 -0700)]
RISC-V: Add Types to Un-Typed Vector Instructions
Updates vector instructions to ensure that no instruction is left
without a type attribute. Create a placeholder type "vector" for
instructions where a type isn't clear
Tested for regressions using rv32/rv64 gc/gcv multilib with newlib/linux.
The RTL below is not handled well in riscv_legitimize_const_move and
then falls through to the default pass. The
default force_const_mem then returns NULL_RTX, and an ICE occurs when operating
on the NULL_RTX.
Tsukasa OI [Sat, 12 Aug 2023 00:38:18 +0000 (00:38 +0000)]
RISC-V: Add stub support for existing extensions (unprivileged)
After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).
To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not touch a thing.
This commit adds stub support for standard unprivileged extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c except the not yet
merged 'Zce', 'Zcmp' and 'Zcmt' support).
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.
Tsukasa OI [Sat, 12 Aug 2023 00:38:18 +0000 (00:38 +0000)]
RISC-V: Add stub support for existing extensions (vendor)
After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).
To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not touch a thing.
This commit adds stub support for vendor extensions to
riscv_ext_version_table (no riscv_implied_info entries to add; all
information is copied from Binutils' bfd/elfxx-riscv.c).
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add stub support for all vendor extensions supported by Binutils.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-30.c: New test for a stub
vendor extension 'XVentanaCondOps'.
Tsukasa OI [Sat, 12 Aug 2023 00:38:18 +0000 (00:38 +0000)]
RISC-V: Add stub support for existing extensions (privileged)
After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).
To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not touch a thing.
As a start, this commit adds stub support for *privileged* extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c).
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from privileged extensions.
(riscv_ext_version_table): Add stub support for all privileged
extensions supported by Binutils.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-29.c: New test for a stub privileged
extension 'Smstateen' with some implications.
Tsukasa OI [Fri, 11 Aug 2023 06:09:34 +0000 (06:09 +0000)]
RISC-V: Make PR 102957 tests more comprehensive
Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" test failures caused by that change (on pr102957.c,
by testing the error message after the first change).
However, the latter change will partially break the original intent of PR
102957 test case because we wanted to make sure that we can parse a valid
two-letter extension name.
Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).
This commit adds pr102957-2.c to make sure that there will be no errors if
we parse a valid two-letter extension name.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr102957-2.c: New test case using the 'Zk'
extension to continue testing whether we can use valid two-letter
extensions.
Lehua Ding [Fri, 25 Aug 2023 07:50:15 +0000 (15:50 +0800)]
RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}
This patch refactors the code of expand_cond_len_{unop,binop,ternop}.
It introduces a new unified function expand_cond_len_op to do the main work.
The expand_cond_len_{unop,binop,ternop} functions only care about how
to pass the operands to the intrinsic patterns.
Tsukasa OI [Mon, 28 Aug 2023 21:13:53 +0000 (15:13 -0600)]
RISC-V: Fix documentation of __builtin_riscv_pause
This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).
gcc/ChangeLog:
* doc/extend.texi: Fix the description of __builtin_riscv_pause.
Tsukasa OI [Mon, 28 Aug 2023 21:04:13 +0000 (15:04 -0600)]
RISC-V: __builtin_riscv_pause for all environment
The "pause" RISC-V hint instruction requires the 'Zihintpause' extension (in
the assembler). However, GCC emits "pause" unconditionally, causing an
assembler error when compiling code that uses __builtin_riscv_pause while the
'Zihintpause' extension is disabled.
However, the "pause" instruction encoding (0x0100000f) is a HINT, and emitting
that encoding is safe in any environment.
This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x0100000f" instead of "pause" only if the extension is disabled (making
the diagnostics better).
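To see why the raw encoding is harmless, it helps to decode 0x0100000f using the standard FENCE field layout from the base ISA. The sketch below is illustrative only; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a RISC-V FENCE-format instruction word.  */
typedef struct {
    unsigned opcode;  /* bits 6:0   */
    unsigned rd;      /* bits 11:7  */
    unsigned funct3;  /* bits 14:12 */
    unsigned rs1;     /* bits 19:15 */
    unsigned succ;    /* bits 23:20 */
    unsigned pred;    /* bits 27:24 */
    unsigned fm;      /* bits 31:28 */
} fence_fields;

fence_fields decode_fence(uint32_t insn)
{
    fence_fields f;
    f.opcode = insn & 0x7f;
    f.rd     = (insn >> 7) & 0x1f;
    f.funct3 = (insn >> 12) & 0x7;
    f.rs1    = (insn >> 15) & 0x1f;
    f.succ   = (insn >> 20) & 0xf;
    f.pred   = (insn >> 24) & 0xf;
    f.fm     = (insn >> 28) & 0xf;
    return f;
}
```

Decoding 0x0100000f yields the MISC-MEM opcode (0x0f) with pred = W, an empty successor set, and rd = rs1 = x0, i.e. a FENCE variant that implementations without 'Zihintpause' execute as an ordinary hint.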
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Implement the 'Zihintpause' extension, version 2.0.
(riscv_ext_flag_table): Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause".
(riscv_builtins): Add "pause" extension.
* config/riscv/riscv-opts.h (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Adjust output based on
TARGET_ZIHINTPAUSE.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c: New test when the 'Zihintpause'
extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
* gcc.target/riscv/zihintpause-noarch.c: New test when the 'Zihintpause'
extension is disabled.
Juzhe-Zhong [Fri, 25 Aug 2023 03:07:20 +0000 (11:07 +0800)]
RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
This patch refactors Phase 3 (Demand fusion) and renames it to Earliest fusion.
I did the refactoring for the following reasons:
1. The current implementation of Phase 3 does too many things, which makes the code
messy and hard to maintain.
2. The demand fusion I implemented previously is explicit: we decide how to fuse
VSETVLs, where the VSETVL fusion happens, whether the VSETVL fusion point (location)
is correct and optimal, etc.
We were doing too much of this ourselves, so I added the following functions:
to make sure the VSETVL fusion is optimal and correct. I found in my downstream testing that this is
not a reliable or optimal approach.
Instead, this patch uses 'compute_earliest', the LCM function, to fuse multiple
'compatible' VSETVL demand infos if they have the same earliest edge. We let LCM decide almost
everything about demand fusion for us. The only thing we do ourselves (rather than LCM) is check whether the
VSETVL demand infos are compatible. That's all we need to do.
I believe this approach is much more reliable and optimal than before (we already have many testcases to check this refactoring patch).
3. Using the LCM approach for demand fusion is more reliable and produces a better CFG than before.
...
Here are the basics of this patch's approach:
Consider the following case:
for
for
for
...
for
if (...)
VSETVL 1 demand: RATIO = 32 and TU policy.
else if (...)
VSETVL 2 demand: SEW = 16.
else
VSETVL 3 demand: MU policy.
- 'compute_earliest' outputs the earliest edge of VSETVL 1, VSETVL 2 and VSETVL 3.
They have the same earliest edge, which is outside the 1st inner-most loop.
- Then, we check that these 3 VSETVL demand infos are compatible and fuse them into a single VSETVL info:
demand SEW = 16, LMUL = MF2, TU, MU.
- Then the later phase (phase 4), LCM PRE (partial redundancy elimination), will hoist this VSETVL
to the outer-most loop, so that we get optimal codegen.
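The fusion of the three demands in the example can be sketched with a toy model. This is a heavily simplified illustration, not the data structures or compatibility rules of the real VSETVL pass: the struct, its fields and every function name are hypothetical, and only SEW, the SEW/LMUL ratio and the TU/MU policy bits are modelled.

```c
#include <stdbool.h>

typedef struct {
    int sew;     /* demanded SEW, 0 = no demand */
    int ratio;   /* demanded SEW/LMUL ratio, 0 = no demand */
    bool tu;     /* tail-undisturbed demanded */
    bool mu;     /* mask-undisturbed demanded */
} demand;

/* Two demands conflict only if both fix the same field differently.  */
bool demands_compatible(demand a, demand b)
{
    if (a.sew && b.sew && a.sew != b.sew)
        return false;
    if (a.ratio && b.ratio && a.ratio != b.ratio)
        return false;
    return true;
}

/* Union of two compatible demands.  */
demand fuse_demands(demand a, demand b)
{
    demand r;
    r.sew = a.sew ? a.sew : b.sew;
    r.ratio = a.ratio ? a.ratio : b.ratio;
    r.tu = a.tu || b.tu;
    r.mu = a.mu || b.mu;
    return r;
}

/* LMUL = SEW / RATIO; for a fractional LMUL return N such that LMUL = 1/N.  */
int lmul_fraction_denominator(demand d)
{
    return d.ratio / d.sew;
}

/* The three demands from the example: RATIO = 32 + TU, SEW = 16, MU.  */
demand fuse_example(void)
{
    demand v1 = { 0, 32, true, false };
    demand v2 = { 16, 0, false, false };
    demand v3 = { 0, 0, false, true };
    return fuse_demands(fuse_demands(v1, v2), v3);
}
```

With SEW = 16 and RATIO = 32 the fused demand implies LMUL = 16/32 = 1/2, which is the MF2 setting quoted in the example.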
Jeff Law [Sun, 27 Aug 2023 18:52:38 +0000 (12:52 -0600)]
RISC-V: Fix spill-12 test
Jivan's recent work on IRA results in more efficient code for this test. This
adjusts the expected output for the removal of 5 instructions and conversion of
an addi into a simple mv.
Jeff Law [Sun, 27 Aug 2023 18:38:30 +0000 (12:38 -0600)]
RISC-V: Fix xtheadcondmov-indirect.c
The pressure sensitive scheduling change perturbs the output ever so slightly
for this test. Seemed easiest to just turn that off rather than generalize the
expected output enough to work across all the relevant optimization options.
gcc/testsuite/
* gcc.target/riscv/xtheadcondmov-indirect.c: Turn off pressure
sensitive scheduling.
There is a redundant vsetvli instruction in the VLA vectorized code, which is a VSETVL pass issue.
The vsetvl issue is not addressed in this patch but will be fixed soon.
gcc/ChangeLog:
* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.
Edwin Lu [Fri, 25 Aug 2023 23:35:43 +0000 (16:35 -0700)]
RISC-V: Add Types to Un-Typed Sync Instructions:
Updates the sync instructions to ensure that no insn is left without
a type attribute. Updates a total of 9 insns to have type "atomic"
or type "multi" based on number of assembly instructions generated
Tested for regressions using rv32/64 multilib with newlib/linux.
gcc/ChangeLog:
* config/riscv/sync-rvwmo.md: updated types to "multi" or
"atomic" based on number of assembly lines generated
* config/riscv/sync-ztso.md: likewise
* config/riscv/sync.md: likewise
Jeff Law [Fri, 25 Aug 2023 22:34:17 +0000 (16:34 -0600)]
RISC-V: Make stack_save_restore tests more robust
Spurred by Jivan's patch and a desire for cleaner test results, I went ahead and
made the stack_save_restore tests independent of the precise stack size by
using a regexp.
Jin Ma [Fri, 25 Aug 2023 21:34:40 +0000 (15:34 -0600)]
[PATCH v10] RISC-V: Add support for the Zfa extension
This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb
The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html
What needs special explanation is:
1, According to riscv-spec, "The FCVTMOD.W.D instruction was added principally to
accelerate the processing of JavaScript Numbers.", so it seems that no implementation
is required.
2, The instructions FMINM and FMAXM correspond to the C23 library functions fminimum and fmaximum.
Therefore, this patch simply implements the patterns fminm<hf\sf\df>3 and
fmaxm<hf\sf\df>3 to prepare for later.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add zfa extension version, which depends on
the F extension.
* config/riscv/constraints.md (zfli): Constrain the floating point number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is
not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is
required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate
in assembly.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm<mode>3): New.
(fmaxm<mode>3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_zfa): New.
* config/riscv/riscv.opt: New.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fli-1.c: New test.
* gcc.target/riscv/zfa-fli-2.c: New test.
* gcc.target/riscv/zfa-fli-3.c: New test.
* gcc.target/riscv/zfa-fli-4.c: New test.
* gcc.target/riscv/zfa-fli-6.c: New test.
* gcc.target/riscv/zfa-fli-7.c: New test.
* gcc.target/riscv/zfa-fli-8.c: New test.
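For orientation, the kind of lookup that riscv_float_const_rtx_index_for_fli performs can be sketched in ordinary C for the single-precision case. The table below is written from the Zfa specification's list of FLI.S constants as I understand it and should be checked against the ratified text; the function name is hypothetical and the real GCC code works on RTL constants, not on C floats.

```c
#include <float.h>
#include <math.h>

/* Return the FLI.S immediate index (0..31) that loads V, or -1 if V is
   not one of the loadable constants.  Simplification: any NaN is
   treated as the canonical NaN (index 31).  */
int fli_s_index_sketch(float v)
{
    static const float table[30] = {
        -1.0f, FLT_MIN, 0x1p-16f, 0x1p-15f, 0x1p-8f, 0x1p-7f,
        0.0625f, 0.125f, 0.25f, 0.3125f, 0.375f, 0.4375f,
        0.5f, 0.625f, 0.75f, 0.875f, 1.0f, 1.25f, 1.5f, 1.75f,
        2.0f, 2.5f, 3.0f, 4.0f, 8.0f, 16.0f, 128.0f, 256.0f,
        0x1p15f, 0x1p16f
    };

    if (isnan(v))
        return 31;
    if (isinf(v))
        return v > 0 ? 30 : -1;   /* only +inf is loadable */
    for (int i = 0; i < 30; i++)
        if (table[i] == v)
            return i;
    return -1;
}
```

A constant that maps to a valid index can be materialized with a single fli instruction, which is why the ChangeLog entries above say memory or a split move is not needed in that case.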
Vineet Gupta [Mon, 7 Aug 2023 20:45:29 +0000 (13:45 -0700)]
RISC-V: Enable Hoist to GCSE simple constants
Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates. For a simple const (say 6, which needs a separate insn "LI 6"),
the backend currently returns 0, causing Hoist to bail and elide GCSE.
Note that constants requiring more than 1 insn to set up were working
fine since riscv_rtx_costs () was returning non-zero (although that
itself might need refining: see bugzilla 111139).
To keep testsuite parity, some V tests need updating which started failing
in the new costing regime.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_rtx_costs): Adjust const_int
cost. Add some comments about different constants handling.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Remove test
for Jump.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.
Patrick O'Neill [Thu, 24 Aug 2023 16:56:01 +0000 (09:56 -0700)]
RISC-V: Move vector-abi testcases into rvv/base folder
Resolves failures like this on rv32gcv linux:
compiler exited with status 1
output is:
In file included from /tc-baseline/build-linux-gcv/sysroot/usr/include/features.h:515,
from /tc-baseline/build-linux-gcv/sysroot/usr/include/bits/libc-header-start.h:33,
from /tc-baseline/build-linux-gcv/sysroot/usr/include/stdint.h:26,
from /tc-baseline/build-linux-gcv/lib/gcc/riscv32-unknown-linux-gnu/14.0.0/include/stdint.h:9,
from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/stdint.h:9,
from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/riscv_vector.h:28,
from /tc-baseline/gcc/gcc/testsuite/gcc.target/riscv/vector-abi-1.c:4:
/tc-baseline/build-linux-gcv/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.
Tested using:
rv{32/64}{gc/gcv} newlib
rv{32/64}gcv linux
gcc/testsuite/ChangeLog:
* gcc.target/riscv/vector-abi-1.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-1.c: ...here.
* gcc.target/riscv/vector-abi-2.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-2.c: ...here.
* gcc.target/riscv/vector-abi-3.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-3.c: ...here.
* gcc.target/riscv/vector-abi-4.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-4.c: ...here.
* gcc.target/riscv/vector-abi-5.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-5.c: ...here.
* gcc.target/riscv/vector-abi-6.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-6.c: ...here.
* gcc.target/riscv/vector-abi-7.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-7.c: ...here.
* gcc.target/riscv/vector-abi-8.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-8.c: ...here.
* gcc.target/riscv/vector-abi-9.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-9.c: ...here.
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
(cherry picked from commit 3ea624da71095cd480c31983d13db45bd9c5a738)
This patch depends on the middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html
We already had COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS patterns.
Remove TARGET_PREFERRED_ELSE_VALUE since it forbids the COND_LEN_FMS/COND_LEN_FNMS STMT fold.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_preferred_else_value): Remove it since
it forbids COND_LEN_FMS/COND_LEN_FNMS STMT fold.
(TARGET_PREFERRED_ELSE_VALUE): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adapt test.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-9.c: New test.
Robin Dapp [Fri, 18 Aug 2023 13:57:16 +0000 (15:57 +0200)]
RISC-V: Enable pressure-aware scheduling by default.
This patch enables pressure-aware scheduling for riscv. There have been
various requests for it so I figured I'd just go ahead and send
the patch.
There is a slight regression in code quality for a number of
vector tests where we spill more due to a different instruction order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.
As cost and scheduling models mature I expect the situation to improve
and for now I think it's generally favorable to enable pressure-aware
scheduling so we can work with it rather than trying to find every
possible problem in advance.
Robin Dapp [Tue, 15 Aug 2023 15:15:58 +0000 (17:15 +0200)]
RISC-V: Fix reduc_strict_run-1 test case.
This patch fixes the reduc_strict_run-1 testcase by introducing
a variable that holds the reference result. This is necessary
because in presence of _Float16 emulation an intermediate
result used in a comparison is computed in higher precision.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c:
Add variable to hold reference result.
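The effect being worked around can be reproduced on any host by using float and double as stand-ins for _Float16 and its wider emulation type. This is an analogy under that assumption, not the test itself; the function name is hypothetical.

```c
/* Returns 1 if the wide intermediate sum differs from the same sum
   after it has been stored in the narrow type.  */
int wide_differs_from_stored(float a, float b)
{
    double wide = (double)a + (double)b;  /* intermediate kept in higher precision */
    volatile float stored = (float)wide;  /* value rounded when stored */
    return wide != (double)stored;
}
```

For inputs such as 1.0f and 1e-8f the wide sum keeps bits that the stored value loses, so comparing a stored result against a freshly computed expression can fail. Holding the reference result in a variable of the narrow type, as the fix does, forces both sides of the comparison through the same rounding.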
Juzhe-Zhong [Tue, 22 Aug 2023 01:58:34 +0000 (09:58 +0800)]
gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold
Hi, Richard and Richi.
Currently, GCC supports COND_LEN_FMA for floating-point without -ffast-math.
It's supported in tree-ssa-math-opts.cc. However, GCC fails to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
Consider the following case:
__attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
TYPE *__restrict a, \
TYPE *__restrict b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] -= a[i] * b[i]; \
}
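For reference, an expansion of the macro body above for TYPE = float might look like the sketch below. The noipa attribute is left out so the snippet builds with any C99 compiler, and ternop_demo is a hypothetical helper added only to exercise the loop.

```c
/* dst[i] = dst[i] - a[i] * b[i]: a multiply-subtract that the
   vectorizer may fold into a negated fused multiply-add.  */
void ternop_float(float *restrict dst, float *restrict a,
                  float *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] -= a[i] * b[i];
}

/* Run the loop on a small fixed input and return element IDX.  */
float ternop_demo(int idx)
{
    float dst[3] = { 10.0f, 20.0f, 30.0f };
    float a[3] = { 1.0f, 2.0f, 3.0f };
    float b[3] = { 4.0f, 5.0f, 6.0f };
    ternop_float(dst, a, b, 3);
    return dst[idx];
}
```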
The following fixes placement of shift operand sanitization with
MIN when the original shift operand was external but the actual
one is not.
PR tree-optimization/111128
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Emit external shift operand inline if we promoted it with
another pattern stmt.
Of particular interest is the value in a0 when we call consume. We compute that
horribly inefficiently. If we back-substitute from the final assignment to a0
we get...
That's a pretty convoluted way to compute sp - 3990616.
Something like this would be notably better (not great, but we need both the
stack adjustment and the address of the object to pass to consume):
addi sp,sp,-16
sd ra,8(sp)
li t0,-4001792
addi t0,t0,1792
add sp,sp,t0
li a0,4096
addi a0,a0,-96
add a0,sp,a0
call consume
The problem is that LRA's elimination code does not handle the case where we have
(plus (reg1) (reg2)) where reg1 is an eliminable register and reg2 has a known
equivalency, particularly a constant.
If we can determine that reg2 is equivalent to a constant and treat (plus
(reg1) (reg2)) in the same way we'd treat (plus (reg1) (const_int)) then we can
get the desired code.
This eliminates about 19b instructions, or roughly 1% for deepsjeng on rv64.
There are improvements elsewhere, but they're relatively small. This may
ultimately lessen the value of Manolis's fold-mem-offsets patch. So we'll have
to evaluate that again once he posts a new version.
Bootstrapped and regression tested on x86_64 as well as bootstrapped on rv64.
Earlier versions have been tested against spec2017. Pre-approved by Vlad in a
private email conversation (thanks Vlad!).
Committed to the trunk.
gcc/
* lra-eliminations.cc (eliminate_regs_in_insn): Use equivalences to
help simplify code further.
Zhangjin Liao [Wed, 23 Aug 2023 14:02:47 +0000 (08:02 -0600)]
[PATCH] RISC-V:add a more appropriate type attribute
Due to the more accurate type attribute added to the clz, ctz, and pcnt
operations in https://github.com/gcc-mirror/gcc/commit/07e2576d6f3 the
same type attribute should be used here.
gcc/ChangeLog:
* config/riscv/bitmanip.md (*<bitmanip_optab>disi2_sext): Add a more
appropriate type attribute.
This patch adds conditional unary neg/abs/not autovec patterns to the RISC-V backend.
For this C code:
void
test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n)
{
for (int i = 0; i < n; i += 1)
{
a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i];
}
}
Before this patch:
...
vsetvli a7,zero,e32,m1,ta,ma
vfabs.v v2,v2
vmerge.vvm v1,v1,v2,v0
...
After this patch:
...
vsetvli a7,zero,e32,m1,ta,mu
vfabs.v v1,v2,v0.t
...
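The two instruction sequences above can be modelled in scalar C to show that they compute the same result. This is a sketch only: the function names are hypothetical, and the per-element loop merely stands in for the vector lanes.

```c
/* "Before": compute fabs for every element, then merge under the mask
   (vfabs.v followed by vmerge.vvm).  */
void test_3_merge(float *a, const float *b, const int *pred, int n)
{
    for (int i = 0; i < n; i++) {
        float t = __builtin_fabsf(b[i]);
        a[i] = pred[i] ? t : a[i];
    }
}

/* "After": a masked operation that leaves inactive elements untouched
   (vfabs.v with v0.t under the mask-undisturbed policy).  */
void test_3_masked(float *a, const float *b, const int *pred, int n)
{
    for (int i = 0; i < n; i++)
        if (pred[i])
            a[i] = __builtin_fabsf(b[i]);
}

/* Returns 1 if both forms agree on a small fixed input.  */
int cond_abs_forms_agree(void)
{
    const float b[4] = { -1.5f, 2.0f, -3.0f, -4.0f };
    const int pred[4] = { 1, 0, 1, 0 };
    float a1[4] = { 9.0f, 9.0f, 9.0f, 9.0f };
    float a2[4] = { 9.0f, 9.0f, 9.0f, 9.0f };

    test_3_merge(a1, b, pred, 4);
    test_3_masked(a2, b, pred, 4);
    for (int i = 0; i < 4; i++)
        if (a1[i] != a2[i])
            return 0;
    return a1[0] == 1.5f && a1[1] == 9.0f;
}
```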
For the int neg/not and FP neg patterns, defining the corresponding cond_xxx patterns
is enough.
For the FP abs pattern, we need to change the definitions of the `abs<mode>2` and
`@vcond_mask_<mode><vm>` patterns from define_expand to define_insn_and_split
in order to fuse them into a new pattern `*cond_abs<mode>` at the combine pass.
The fusion process is similar to the one below:
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.