Yang Yujie [Wed, 11 Oct 2023 09:59:53 +0000 (17:59 +0800)]
LoongArch: Adjust makefile dependency for loongarch headers.
gcc/ChangeLog:
* config.gcc: Add loongarch-driver.h to tm_files.
* config/loongarch/loongarch.h: Do not include loongarch-driver.h.
* config/loongarch/t-loongarch: Append loongarch-multilib.h to $(GTM_H)
instead of $(TM_H) for building generator programs.
Paul Thomas [Thu, 12 Oct 2023 06:26:59 +0000 (07:26 +0100)]
Fortran: Set hidden string length for pointer components [PR67740].
2023-10-11 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/67740
* trans-expr.cc (gfc_trans_pointer_assignment): Set the hidden
string length component for pointer assignment to character
pointer components.
gcc/testsuite/
PR fortran/67740
* gfortran.dg/pr67740.f90: New test
Kewen Lin [Thu, 12 Oct 2023 05:05:03 +0000 (00:05 -0500)]
rs6000: Make 32 bit stack_protect support prefixed insn [PR111367]
As PR111367 shows, with prefixed insn supported, some of
checkings consider it's able to leverage prefixed insn
for stack protect related load/store, but since we don't
actually change the emitted assembly for 32 bit, it can
cause the assembler error as exposed.
Mike's commit r10-4547-gce6a6c007e5a98 has already handled
the 64 bit case (DImode), this patch is to treat the 32
bit case (SImode) by making use of mode iterator P and
ptrload attribute iterator, also fixes the constraints
to match the emitted operand formats.
PR target/111367
gcc/ChangeLog:
* config/rs6000/rs6000.md (stack_protect_setsi): Support prefixed
instruction emission and incorporate to stack_protect_set<mode>.
(stack_protect_setdi): Rename to ...
(stack_protect_set<mode>): ... this, adjust constraint.
(stack_protect_testsi): Support prefixed instruction emission and
incorporate to stack_protect_test<mode>.
(stack_protect_testdi): Rename to ...
(stack_protect_test<mode>): ... this, adjust constraint.
Kewen Lin [Thu, 12 Oct 2023 05:04:57 +0000 (00:04 -0500)]
vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE
For VMAT_CONTIGUOUS_REVERSE, the transform code in function
vectorizable_store generates a VEC_PERM_EXPR stmt before
storing, but it's never considered in costing.
This patch is to make it consider vec_perm in costing, it
adjusts the order of transform code a bit to make it easy
to early return for costing_p.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Consider generated
VEC_PERM_EXPR stmt for VMAT_CONTIGUOUS_REVERSE in costing as
vec_perm.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c: New test.
Kewen Lin [Thu, 12 Oct 2023 05:04:57 +0000 (00:04 -0500)]
vect: Get rid of vect_model_store_cost
This patch is to eventually get rid of vect_model_store_cost,
it adjusts the costing for the remaining memory access types
VMAT_CONTIGUOUS{, _DOWN, _REVERSE} by moving costing close
to the transform code. Note that in vect_model_store_cost,
there is one special handling for vectorizing a store into
the function result, since it's extra penalty and the
transform part doesn't have it, this patch keep it alone.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_store_cost): Remove.
(vectorizable_store): Adjust the costing for the remaining memory
access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE}.
Kewen Lin [Thu, 12 Oct 2023 05:04:57 +0000 (00:04 -0500)]
vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE
This patch adjusts the cost handling on VMAT_CONTIGUOUS_PERMUTE
in function vectorizable_store. We don't call function
vect_model_store_cost for it any more. It's the case of
interleaving stores, so it skips all stmts excepting for
first_stmt_info, consider the whole group when costing
first_stmt_info. This patch shouldn't have any functional
changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE related
handlings.
(vectorizable_store): Adjust the cost handling on
VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost.
Kewen Lin [Thu, 12 Oct 2023 05:04:57 +0000 (00:04 -0500)]
vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES
This patch adjusts the cost handling on VMAT_LOAD_STORE_LANES
in function vectorizable_store. We don't call function
vect_model_store_cost for it any more. It's the case of
interleaving stores, so it skips all stmts excepting for
first_stmt_info, consider the whole group when costing
first_stmt_info. This patch shouldn't have any functional
changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
get VMAT_LOAD_STORE_LANES.
(vectorizable_store): Adjust the cost handling on VMAT_LOAD_STORE_LANES
without calling vect_model_store_cost. Factor out new lambda function
update_prologue_cost.
Kewen Lin [Thu, 12 Oct 2023 05:04:57 +0000 (00:04 -0500)]
vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
This patch adjusts the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP in function vectorizable_store. We
don't call function vect_model_store_cost for them any more.
Like what we improved for PR82255 on load side, this change
helps us to get rid of unnecessary vec_to_scalar costing
for some case with VMAT_STRIDED_SLP. One typical test case
gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c has been
associated. And it helps some cases with some inconsistent
costing too.
Besides, this also special-cases the interleaving stores
for these two affected memory access types, since for the
interleaving stores the whole chain is vectorized when the
last store in the chain is reached, the other stores in the
group would be skipped. To keep consistent with this and
follows the transforming handlings like iterating the whole
group, it only costs for the first store in the group.
Ideally we can only cost for the last one but it's not
trivial and using the first one is actually equivalent.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their
related handlings.
(vectorizable_store): Adjust the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP without calling vect_model_store_cost.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c: New test.
Kewen Lin [Thu, 12 Oct 2023 05:04:57 +0000 (00:04 -0500)]
vect: Simplify costing on vectorizable_scan_store
This patch is to simplify the costing on the case
vectorizable_scan_store without calling function
vect_model_store_cost any more.
I considered if moving the costing into function
vectorizable_scan_store is a good idea, for doing
that, we have to pass several variables down which
are only used for costing, and for now we just
want to keep the costing as the previous, haven't
tried to make this costing consistent with what the
transforming does, so I think we can leave it for now.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Adjust costing on
vectorizable_scan_store without calling vect_model_store_cost
any more.
Kewen Lin [Thu, 12 Oct 2023 05:04:56 +0000 (00:04 -0500)]
vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER
This patch adjusts the cost handling on VMAT_GATHER_SCATTER
in function vectorizable_store (all three cases), then we
won't depend on vect_model_load_store for its costing any
more. This patch shouldn't have any functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
VMAT_GATHER_SCATTER any more, remove VMAT_GATHER_SCATTER related
handlings and the related parameter gs_info.
(vect_build_scatter_store_calls): Add the handlings on costing with
one more argument cost_vec.
(vectorizable_store): Adjust the cost handling on VMAT_GATHER_SCATTER
without calling vect_model_store_cost any more.
Kewen Lin [Thu, 12 Oct 2023 05:04:56 +0000 (00:04 -0500)]
vect: Move vect_model_store_cost next to the transform in vectorizable_store
This patch is an initial patch to move costing next to the
transform, it still adopts vect_model_store_cost for costing
but moves and duplicates it down according to the handlings
of different vect_memory_access_types or some special
handling need, hope it can make the subsequent patches easy
to review. This patch should not have any functional
changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Move and duplicate the call
to vect_model_store_cost down to some different transform paths
according to the handlings of different vect_memory_access_types
or some special handling need.
Kewen Lin [Thu, 12 Oct 2023 05:04:56 +0000 (00:04 -0500)]
vect: Ensure vect store is supported for some VMAT_ELEMENTWISE case
When making/testing patches to move costing next to the
transform code for vectorizable_store, some ICEs got
exposed when I further refined the costing handlings on
VMAT_ELEMENTWISE. The apparent cause is triggering the
assertion in rs6000 specific function for costing
rs6000_builtin_vectorization_cost:
if (TARGET_ALTIVEC)
/* Misaligned stores are not supported. */
gcc_unreachable ();
I used vect_get_store_cost instead of the original way by
record_stmt_cost with scalar_store for costing, that is to
use one unaligned_store instead, it matches what we use in
transforming, it's a vector store as below:
So IMHO it's more consistent with vector store instead of
scalar store, with the given compilation option
-mno-allow-movmisalign, the misaligned vector store is
unexpected to be used in vectorizer, but why it's still
adopted? In the current implementation of function
get_group_load_store_type, we always set alignment support
scheme as dr_unaligned_supported for VMAT_ELEMENTWISE, it
is true if we always adopt scalar stores, but as the above
code shows, we could use vector stores for some cases, so
we should use the correct alignment support scheme for it.
This patch is to ensure the vector store is supported by
further checking with vect_supportable_dr_alignment. The
ICEs got exposed with patches moving costing next to the
transform but they haven't been landed, the test coverage
would be there once they get landed. The affected test
cases are:
- gcc.dg/vect/slp-45.c
- gcc.dg/vect/vect-alias-check-{10,11,12}.c
btw, I tried to make some correctness test case, but I
realized that -mno-allow-movmisalign is mainly for noting
movmisalign optab and it doesn't guard for the actual hw
vector memory access insns, so I failed to make it unless
I also altered some conditions for them as it.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Ensure the generated
vector store for some case of VMAT_ELEMENTWISE is supported.
Pan Li [Thu, 12 Oct 2023 03:20:36 +0000 (11:20 +0800)]
RISC-V: Support FP llrint auto vectorization
This patch would like to support the FP llrint auto vectorization.
* long long llrint (double)
This will be the CVT from DF => DI from the standard name's perpsective,
which has been covered in previous PATCH(es). Thus, this patch only add
some test cases.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add type int64_t.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c: New test.
Mo, Zewei [Mon, 6 Mar 2023 02:42:32 +0000 (10:42 +0800)]
[APX] Support Intel APX PUSH2POP2
This feature requires stack to be aligned at 16byte, therefore in
prologue/epilogue, a standalone push/pop will be emitted before any
push2/pop2 if the stack was not aligned to 16byte.
Also for current implementation we only support push2/pop2 usage in
function prologue/epilogue for those callee-saved registers.
gcc/ChangeLog:
* config/i386/i386.cc (gen_push2): New function to emit push2
and adjust cfa offset.
(ix86_pro_and_epilogue_can_use_push2_pop2): New function to
determine whether push2/pop2 can be used.
(ix86_compute_frame_layout): Adjust preferred stack boundary
and stack alignment needed for push2/pop2.
(ix86_emit_save_regs): Emit push2 when available.
(ix86_emit_restore_reg_using_pop2): New function to emit pop2
and adjust cfa info.
(ix86_emit_restore_regs_using_pop2): New function to loop
through the saved regs and call above.
(ix86_expand_epilogue): Call ix86_emit_restore_regs_using_pop2
when push2pop2 available.
* config/i386/i386.md (push2_di): New pattern for push2.
(pop2_di): Likewise for pop2.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-push2pop2-1.c: New test.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.
Co-authored-by: Hu Lin1 <lin1.hu@intel.com> Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Pan Li [Thu, 12 Oct 2023 01:43:02 +0000 (09:43 +0800)]
RISC-V: Support FP irintf auto vectorization
This patch would like to support the FP irintf auto vectorization.
* int irintf (float)
Due to the limitation that only the same size of data type are allowed
in the vectorier, the standard name lrintmn2 only act on SF => SI.
Given we have code like:
void
test_irintf (int *out, float *in, unsigned count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_irintf (in[i]);
}
Before this patch:
.L3:
...
flw fa5,0(a1)
fcvt.w.s a5,fa5,dyn
sw a5,-4(a0)
...
bne a1,a4,.L3
After this patch:
.L3:
...
vle32.v v1,0(a1)
vfcvt.x.f.v v1,v1
vse32.v v1,0(a0)
...
bne a2,zero,.L3
The rest part like DF => SI/HF => SI will be covered by the hook
TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
gcc/ChangeLog:
* config/riscv/autovec.md (lrint<mode><vlconvert>2): Rename from.
(lrint<mode><v_i_l_ll_convert>2): Rename to.
* config/riscv/vector-iterators.md: Rename and remove TARGET_64BIT.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-irint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-irint-0.c: New test.
Jeff Law [Wed, 11 Oct 2023 22:18:22 +0000 (16:18 -0600)]
RISC-V Adjust long unconditional branch sequence
Andrew and I independently noted the long unconditional branch sequence was
using the "call" pseudo op. Technically it works, but it's a bit odd. This
patch flips it to use the "jump" pseudo-op.
This was tested with a hacked-up local compiler which forced all branches/jumps
to be long jumps. Naturally it triggered some failures for scan-asm tests but
no execution regressions (which is mostly what I was testing for).
I've updated the long branch support item in the RISE wiki to indicate that we
eventually want a register scavenging approach with a fallback to $ra in the
future so that we don't muck up the return address predictors. It's not
super-high priority and shouldn't be terrible to implement given we've got the
$ra fallback when a suitable register can not be found.
gcc/
* config/riscv/riscv.md (jump): Adjust sequence to use a "jump"
pseudo op instead of a "call" pseudo op.
Kito Cheng [Mon, 2 Oct 2023 14:37:50 +0000 (22:37 +0800)]
RISC-V: Extend riscv_subset_list, preparatory for target attribute support
riscv_subset_list only accept a full arch string before, but we need to
parse single extension when supporting target attribute, also we may set
a riscv_subset_list directly rather than re-parsing the ISA string
again.
Kito Cheng [Sun, 1 Oct 2023 10:14:44 +0000 (18:14 +0800)]
RISC-V: Refactor riscv_option_override and riscv_convert_vector_bits. [NFC]
Allow those funciton apply from a local gcc_options rather than the
global options.
Preparatory for target attribute, sperate this change for eaiser reivew
since it's a NFC.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_convert_vector_bits): Get setting
from argument rather than get setting from global setting.
(riscv_override_options_internal): New, splited from
riscv_override_options, also take a gcc_options argument.
(riscv_option_override): Splited most part to
riscv_override_options_internal.
Kito Cheng [Sun, 1 Oct 2023 09:03:28 +0000 (17:03 +0800)]
options: Define TARGET_<NAME>_P and TARGET_<NAME>_OPTS_P macro for Mask and InverseMask
We TARGET_<NAME>_P marcro to test a Mask and InverseMask with user
specified target_variable, however we may want to test with specific
gcc_options variable rather than target_variable.
Like RISC-V has defined lots of Mask with TargetVariable, which is not
easy to use, because that means we need to known which Mask are associate with
which TargetVariable, so take a gcc_options variable is a better interface
for such use case.
gcc/ChangeLog:
* doc/options.texi (Mask): Document TARGET_<NAME>_P and
TARGET_<NAME>_OPTS_P.
(InverseMask): Ditto.
* opth-gen.awk (Mask): Generate TARGET_<NAME>_P and
TARGET_<NAME>_OPTS_P macro.
(InverseMask): Ditto.
While `a & (b ^ ~a)` is optimized to `a & b` on the rtl level,
it is always good to optimize this at the gimple level and allows
us to match a few extra things including where a is a comparison.
Note I had to update/change the testcase and-1.c to avoid matching
this case as we can match -2 and 1 as bitwise inversions.
PR tree-optimization/111282
gcc/ChangeLog:
* match.pd (`a & ~(a ^ b)`, `a & (a == b)`,
`a & ((~a) ^ b)`): New patterns.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/and-1.c: Update testcase to avoid
matching `~1 & (a ^ 1)` simplification.
* gcc.dg/tree-ssa/bitops-6.c: New test.
Gaius Mulley [Wed, 11 Oct 2023 16:44:35 +0000 (17:44 +0100)]
modula2: Narrow subranges to int or unsigned int if ZTYPE is the base type.
This patch narrows the subrange base type to INTEGER or CARDINAL
providing the range is satisfied. It only does this when the subrange
base type is the ZTYPE.
gcc/m2/ChangeLog:
* gm2-compiler/M2GCCDeclare.mod (DeclareSubrange): Check
the base type of the subrange against the ZTYPE and call
DeclareSubrangeNarrow if necessary.
(DeclareSubrangeNarrow): New procedure function.
* lib/target-supports.exp: Add proc for the XCValu extension.
* gcc.target/riscv/cv-alu-compile.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-addn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-addrn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-addun.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-addurn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-clip.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-clipu.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-subn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-subrn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-subun.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-suburn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile.c: New test.
* lib/target-supports.exp: Add new effective target check.
* gcc.target/riscv/cv-mac-compile.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mac.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-machhsn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-machhsrn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-machhun.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-machhurn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-macsn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-macsrn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-macun.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-macurn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-msu.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulhhsn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulhhsrn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulhhun.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulhhurn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulsn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulsrn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulun.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulurn.c: New test.
* gcc.target/riscv/cv-mac-test-autogeneration.c: New test.
Gaius Mulley [Wed, 11 Oct 2023 12:26:47 +0000 (13:26 +0100)]
PR modula2/111675 Incorrect packed record field value passed to a procedure
This patch allows a packed field to be extracted and passed to a
procedure. It ensures that the subrange type is the same for both the
procedure and record field. It also extends the <* bytealignment (0) *>
to cover packed subrange types.
gcc/m2/ChangeLog:
PR modula2/111675
* gm2-compiler/M2CaseList.mod (appendTree): Replace
InitStringCharStar with InitString.
* gm2-compiler/M2GCCDeclare.mod: Import AreConstantsEqual.
(DeclareSubrange): Add zero alignment test and call
BuildSmallestTypeRange if necessary.
(WalkSubrangeDependants): Walk the align expression.
(IsSubrangeDependants): Test the align expression.
* gm2-compiler/M2Quads.mod (BuildStringAdrParam): Correct end name.
* gm2-compiler/P2SymBuild.mod (BuildTypeAlignment): Allow subranges
to be zero aligned (packed).
* gm2-compiler/SymbolTable.mod (Subrange): Add Align field.
(MakeSubrange): Set Align to NulSym.
(PutAlignment): Assign Subrange.Align to align.
(GetAlignment): Return Subrange.Align.
* gm2-gcc/m2expr.cc (noBitsRequired): Rewrite.
(calcNbits): Rename ...
(m2expr_calcNbits): ... to this and test for negative values.
(m2expr_BuildTBitSize): Replace calcNBits with m2expr_calcNbits.
* gm2-gcc/m2expr.def (calcNbits): Export.
* gm2-gcc/m2expr.h (m2expr_calcNbits): New prototype.
* gm2-gcc/m2type.cc (noBitsRequired): Remove.
(m2type_BuildSmallestTypeRange): Call m2expr_calcNbits.
(m2type_BuildSubrangeType): Create range_type from
build_range_type (type, lowval, highval).
gcc/testsuite/ChangeLog:
PR modula2/111675
* gm2/extensions/run/pass/packedrecord3.mod: New test.
* config/riscv/autovec.md: Fix index bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug.
(gather_scatter_valid_offset_mode_p): New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.
Pan Li [Wed, 11 Oct 2023 07:51:33 +0000 (15:51 +0800)]
RISC-V: Support FP lrint/lrintf auto vectorization
This patch would like to support the FP lrint/lrintf auto vectorization.
* long lrint (double) for rv64
* long lrintf (float) for rv32
Due to the limitation that only the same size of data type are allowed
in the vectorier, the standard name lrintmn2 only act on DF => DI for
rv64, and SF => SI for rv32.
Given we have code like:
void
test_lrint (long *out, double *in, unsigned count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrint (in[i]);
}
Before this patch:
.L3:
...
fld fa5,0(a1)
fcvt.l.d a5,fa5,dyn
sd a5,-8(a0)
...
bne a1,a4,.L3
After this patch:
.L3:
...
vsetvli a3,zero,e64,m1,ta,ma
vfcvt.x.f.v v1,v1
vsetvli zero,a2,e64,m1,ta,ma
vse32.v v1,0(a0)
...
bne a2,zero,.L3
The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
gcc/ChangeLog:
* config/riscv/autovec.md (lrint<mode><vlconvert>2): New pattern
for lrint/lintf.
* config/riscv/riscv-protos.h (expand_vec_lrint): New func decl
for expanding lint.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl
for vfcvt.x.f.v.
(expand_vec_lrint): New function impl for expanding lint.
* config/riscv/vector-iterators.md: New mode attr and iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for
CVT like test case.
* gcc.target/riscv/rvv/autovec/vls/def.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test.
Jakub Jelinek [Wed, 11 Oct 2023 06:58:29 +0000 (08:58 +0200)]
tree-ssa-strlen: optimization skips clobbering store [PR111519]
The following testcase is miscompiled, because count_nonzero_bytes incorrectly
uses get_strinfo information on a pointer from which an earlier instruction
loads SSA_NAME stored at the current instruction. get_strinfo shows a state
right before the current store though, so if there are some stores in between
the current store and the load, the string length information might have
changed.
The patch passes around gimple_vuse from the store and punts instead of using
strinfo on loads from MEM_REF which have different gimple_vuse from that.
2023-10-11 Richard Biener <rguenther@suse.de>
Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/111519
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Add vuse
argument and pass it through to recursive calls and
count_nonzero_bytes_addr calls. Don't shadow the stmt argument, but
change stmt for gimple_assign_single_p statements for which we don't
immediately punt.
(strlen_pass::count_nonzero_bytes_addr): Add vuse argument and pass
it through to recursive calls and count_nonzero_bytes calls. Don't
use get_strinfo if gimple_vuse (stmt) is different from vuse. Don't
shadow the stmt argument.
Roger Sayle [Wed, 11 Oct 2023 07:08:04 +0000 (08:08 +0100)]
Optimize (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) as (and:SI x 1).
This patch is the middle-end piece of an improvement to PRs 101955 and
106245, that adds a missing simplification to the RTL optimizers.
This transformation is to simplify (char)(x << 7) != 0 as x & 1.
Technically, the cast can be any truncation, where shift is by one
less than the narrower type's precision, setting the most significant
(only) bit from the least significant bit.
This transformation applies to any target, but it's easy to see
(and add a new test case) on x86, where the following function:
Juzhe-Zhong [Wed, 11 Oct 2023 05:15:02 +0000 (13:15 +0800)]
RISC-V: Enable full coverage vect tests
I have analyzed all existing FAILs.
Except these following FAILs need to be addressed:
FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-reduc-7.c execution test
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_(LEN_)?SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_(LEN_)?SUB"
All other FAILs are dumple fail can be ignored (Confirm ARM SVE also has such FAILs and didn't fix them on either tests or implementation).
Now, It's time to enable full coverage vect tests including vec_unpack, vec_pack, vec_interleave, ... etc.
To see what we are still missing:
Before this patch:
=== gcc Summary ===
# of expected passes 182839
# of unexpected failures 79
# of unexpected successes 11
# of expected failures 1275
# of unresolved testcases 4
# of unsupported tests 4223
After this patch:
=== gcc Summary ===
# of expected passes 183411
# of unexpected failures 93
# of unexpected successes 7
# of expected failures 1285
# of unresolved testcases 4
# of unsupported tests 4157
There is an important issue increased that I have noticed after this patch:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
It has a related PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111721
I am gonna fix this first in the middle-end after commit this patch.
Juzhe-Zhong [Tue, 10 Oct 2023 14:57:46 +0000 (22:57 +0800)]
RISC-V Regression: Make pattern match more accurate of vect-live-2.c
Like previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632400.html
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac11b@gmail.com/
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-live-2.c: Make pattern match more accurate.
Andrew Waterman [Tue, 10 Oct 2023 18:34:04 +0000 (12:34 -0600)]
RISC-V: far-branch: Handle far jumps and branches for functions larger than 1MB
On RISC-V, branches further than +/-1MB require a longer instruction
sequence (3 instructions): we can reuse the jump-construction in the
assmbler (which clobbers $ra) and a temporary to set up the jump
destination.
gcc/ChangeLog:
* config/riscv/riscv.cc (struct machine_function): Track if a
far-branch/jump is used within a function (and $ra needs to be
saved).
(riscv_print_operand): Implement 'N' (inverse integer branch).
(riscv_far_jump_used_p): Implement.
(riscv_save_return_addr_reg_p): New function.
(riscv_save_reg_p): Use riscv_save_return_addr_reg_p.
* config/riscv/riscv.h (FIXED_REGISTERS): Update $ra.
(CALL_USED_REGISTERS): Update $ra.
* config/riscv/riscv.md: Add new types "ret" and "jalr".
(length attribute): Handle long conditional and unconditional
branches.
(conditional branch pattern): Handle case where jump can not
reach the intended target.
(indirect_jump, tablejump): Use new "jalr" type.
(simple_return): Use new "ret" type.
(simple_return_internal, eh_return_internal): Likewise.
(gpr_restore_return, riscv_mret): Likewise.
(riscv_uret, riscv_sret): Likewise.
* config/riscv/generic.md (generic_branch): Also recognize jalr & ret
types.
* config/riscv/sifive-7.md (sifive_7_jump): Likewise.
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
Andrew Pinski [Mon, 9 Oct 2023 18:07:08 +0000 (11:07 -0700)]
MATCH: [PR111679] Add alternative simplification of `a | ((~a) ^ b)`
So currently we have a simplification for `a | ~(a ^ b)` but
that does not match the case where we had originally `(~a) | (a ^ b)`
so we need to add a new pattern that matches that and uses bitwise_inverted_equal_p
that also catches comparisons too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Richard Biener [Tue, 10 Oct 2023 11:33:34 +0000 (13:33 +0200)]
tree-optimization/111751 - support 1024 bit vector constant reinterpretation
The following ups the limit in fold_view_convert_expr to handle
1024bit vectors as used by GCN and RVV. It also robustifies
the handling in visit_reference_op_load to properly give up when
constants cannot be re-interpreted.
PR tree-optimization/111751
* fold-const.cc (fold_view_convert_expr): Up the buffer size
to 128 bytes.
* tree-ssa-sccvn.cc (visit_reference_op_load): Special case
constants, giving up when re-interpretation to the target type
fails.
The purpose of this patch is to work around false-positive warnings
emitted by GNAT SAS (also known as CodePeer). It does not change
the behavior of the modified subprogram.
Eric Botcazou [Wed, 27 Sep 2023 18:42:41 +0000 (20:42 +0200)]
ada: Fix bad finalization of limited aggregate in conditional expression
This happens when the conditional expression is immediately returned, for
example in an expression function.
gcc/ada/
* exp_aggr.adb (Is_Build_In_Place_Aggregate_Return): Return true
if the aggregate is a dependent expression of a conditional
expression being returned from a build-in-place function.
Eric Botcazou [Tue, 26 Sep 2023 20:54:12 +0000 (22:54 +0200)]
ada: Fix infinite loop with multiple limited with clauses
This occurs when one of the types has an incomplete declaration in addition
to its full declaration in its package. In this case AI05-129 says that the
incomplete type is not part of the limited view of the package, i.e. only
the full view is. Now, in the GNAT implementation, it's the opposite in the
regular view of the package, i.e. the incomplete type is the visible one.
That's why the implementation needs to also swap the types on the visibility
chain while it is swapping the views when the clauses are either installed
or removed. This works correctly for the installation, but does not for the
removal, so this change rewrites the code doing the latter.
gcc/ada/
PR ada/111434
* sem_ch10.adb (Replace): New procedure to replace an entity with
another on the homonym chain.
(Install_Limited_With_Clause): Rename Non_Lim_View to Typ for the
sake of consistency. Call Replace to do the replacements and split
the code into the regular and the special cases. Add debuggging
output controlled by -gnatdi.
(Install_With_Clause): Print the Parent_With and Implicit_With flags
in the debugging output controlled by -gnatdi.
(Remove_Limited_With_Unit.Restore_Chain_For_Shadow (Shadow)): Rewrite
using a direct replacement of E4 by E2. Call Replace to do the
replacements. Add debuggging output controlled by -gnatdi.
This patch fixes the behavior of Ada.Directories.Search when being
requested to filter out regular files or directories. One of the
configurations in which that behavior was incorrect was that when the
caller requested only the regular and special files but not the
directories, the directories would still be returned.
The concept of extended nodes was retired at the same time Gen_IL
was introduced, but there was a reference to that concept left over
in a comment. This patch removes that reference.
Also, the description of the field Comes_From_Check_Or_Contract was
incorrectly placed in a section for fields present in all nodes in
sinfo.ads. This patch fixes this.
gcc/ada/
* atree.ads, nlists.ads, types.ads: Remove references to extended
nodes. Fix typo.
* sinfo.ads: Likewise and fix position of
Comes_From_Check_Or_Contract description.
Javier Miranda [Tue, 19 Sep 2023 13:54:28 +0000 (13:54 +0000)]
ada: Crash processing pragmas Compile_Time_Error and Compile_Time_Warning
gcc/ada/
* sem_attr.adb (Analyze_Attribute): Protect the frontend against
replacing 'Size by its static value if 'Size is not known at
compile time and we are processing pragmas Compile_Time_Warning or
Compile_Time_Errors.
Richard Biener [Tue, 10 Oct 2023 09:09:16 +0000 (11:09 +0200)]
Fix missed CSE with a BLKmode entity
The following fixes fallout of r10-7145-g1dc00a8ec9aeba which made
us cautionous about CSEing a load to an object that has padding bits.
The added check also triggers for BLKmode entities like STRING_CSTs
but by definition a BLKmode entity does not have padding bits.
PR tree-optimization/111751
* tree-ssa-sccvn.cc (visit_reference_op_load): Exempt
BLKmode result from the padding bits check.
Refurbish add compare patterns: use 'r' constraint, fix identation,
and fix pattern to match 'if (a+b) { ... }' constructions.
gcc/
* config/arc/arc.cc (arc_select_cc_mode): Match NEG code with
the first operand.
* config/arc/arc.md (addsi_compare): Make pattern canonical.
(addsi_compare_2): Fix identation, constraint letters.
(addsi_compare_3): Likewise.
Verifier checks have recently been strengthened to check that
all counts and probabilities are initialized. The checks fired
during autoprofiledbootstrap build and this patch fixes it.
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
* auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons
* tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count
when scaling loop profile
Robin Dapp [Fri, 7 Jul 2023 15:45:26 +0000 (17:45 +0200)]
RISC-V: Add initial pipeline description for an out-of-order core.
This adds a pipeline description for a generic out-of-order core.
Latency and units are not based on any real processor but more or less
educated guesses what such a processor would look like.
In order to account for latency scaling by LMUL != 1, sched_adjust_cost
is implemented. It will scale an instruction's latency by its LMUL
so an LMUL == 8 instruction will take 8 times the number of cycles
the same instruction with LMUL == 1 would take.
As this potentially causes very high latencies which, in turn, might
lead to scheduling anomalies and a higher number of vsetvls emitted
this feature is only enabled when specifying -madjust-lmul-cost.
Additionally, in order to easily recognize pre-RA vsetvls this patch
introduces an insn type vsetvl_pre which is used in sched_adjust_cost.
In the future we might also want a latency adjustment similar to lmul
for reductions, i.e. make the latency dependent on the type and its
number of units.
Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520
I was thinking that RVV doesn't allow misaligned at the beginning so I removed that pattern.
However, after deep investigation && reading RVV ISA again and experiment on SPIKE,
I realized I was wrong.
RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints
"If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
either the element is transferred successfully or an address misaligned exception is raised on that element."
It's obvious that RVV ISA does allow misaligned vector load/store.
We can see SPIKE can pass previous *FAILED* execution tests with specifying --misaligned to SPIKE.
So, to honor RVV ISA SPEC, we should add movmisalign pattern back base on the investigations I have done since
it can improve multiple vectorization tests and fix dumple FAILs.
This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether we support misalign pattern for VLA modes (By default it is enabled).
Xianmiao Qu [Mon, 9 Oct 2023 13:24:39 +0000 (07:24 -0600)]
THead: Fix missing CFI directives for th.sdd in prologue.
When generating CFI directives for the store-pair instruction,
if we add two parallel REG_FRAME_RELATED_EXPR expr_lists like
(expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (plus:DI (reg/f:DI 2 sp)
(const_int 8 [0x8])) [1 S8 A64])
(reg:DI 1 ra))
(expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (reg/f:DI 2 sp) [1 S8 A64])
(reg:DI 8 s0))
only the first expr_list will be recognized by dwarf2out_frame_debug
funciton. So, here we generate a SEQUENCE expression of REG_FRAME_RELATED_EXPR,
which includes two sub-expressions of RTX_FRAME_RELATED_P. Then the
dwarf2out_frame_debug_expr function will iterate through all the sub-expressions
and generate the corresponding CFI directives.
Richard Biener [Mon, 9 Oct 2023 11:05:10 +0000 (13:05 +0200)]
tree-optimization/111715 - improve TBAA for access paths with pun
The following improves basic TBAA for access paths formed by
C++ abstraction where we are able to combine a path from an
address-taking operation with a path based on that access using
a pun to avoid memory access semantics on the address-taking part.
The trick is to identify the point the semantic memory access path
starts which allows us to use the alias set of the outermost access
instead of only that of the base of this path.
PR tree-optimization/111715
* alias.cc (reference_alias_ptr_type_1): When we have
a type-punning ref at the base search for the access
path part that's still semantically valid.
Pan Li [Mon, 9 Oct 2023 08:12:15 +0000 (16:12 +0800)]
RISC-V: Refine bswap16 auto vectorization code gen
Update in v2
* Remove emit helper functions.
* Take expand_binop instead.
Original log:
This patch would like to refine the code gen for the bswap16.
We will have VEC_PERM_EXPR after rtl expand when invoking
__builtin_bswap. It will generate about 9 instructions in
loop as below, no matter it is bswap16, bswap32 or bswap64.
Unfortunately, this way will make the insn in loop will grow up to
13 and 24 for bswap32 and bswap64. Thus, we will refine the code
gen for the bswap16 only, and leave both the bswap32 and bswap64
as is.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl
for shuffle bswap.
(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.
Roger Sayle [Mon, 9 Oct 2023 11:02:07 +0000 (12:02 +0100)]
i386: Implement doubleword right shifts by 1 bit using s[ha]r+rcr.
This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr
functions to implement doubleword right shifts by 1 bit, using a shift
of the highpart that sets the carry flag followed by a rotate-carry-right
(RCR) instruction on the lowpart.
Conceptually this is similar to the recent left shift patch, but with two
complicating factors. The first is that although the RCR sequence is
shorter, and is a ~3x performance improvement on AMD, my microbenchmarking
shows it ~10% slower on Intel. Hence this patch also introduces a new
X86_TUNE_USE_RCR tuning parameter. The second is that I believe this is
the first time a "rotate-right-through-carry" and a right shift that sets
the carry flag from the least significant bit has been modelled in GCC RTL
(on a MODE_CC target). For this I've used the i386 back-end's UNSPEC_CC_NE
which seems appropriate. Finally rcrsi2 and rcrdi2 are separate
define_insns so that we can use their generator functions.
For the pair of functions:
unsigned __int128 foo(unsigned __int128 x) { return x >> 1; }
__int128 bar(__int128 x) { return x >> 1; }
with -O2 -march=znver4 we previously generated:
foo: movq %rdi, %rax
movq %rsi, %rdx
shrdq $1, %rsi, %rax
shrq %rdx
ret
bar: movq %rdi, %rax
movq %rsi, %rdx
shrdq $1, %rsi, %rax
sarq %rdx
ret
with this patch we now generate:
foo: movq %rsi, %rdx
movq %rdi, %rax
shrq %rdx
rcrq %rax
ret
bar: movq %rsi, %rdx
movq %rdi, %rax
sarq %rdx
rcrq %rax
ret
2023-10-09 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_split_ashr): Split shifts by
one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR
or -Oz.
(ix86_split_lshr): Likewise, split shifts by one bit into
lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz.
* config/i386/i386.h (TARGET_USE_RCR): New backend macro.
* config/i386/i386.md (rcrsi2): New define_insn for rcrl.
(rcrdi2): New define_insn for rcrq.
(<anyshiftrt><mode>3_carry): New define_insn for right shifts that
set the carry flag from the least significant bit, modelled using
UNSPEC_CC_NE.
* config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter
controlling use of rcr 1 vs. shrd, which is significantly faster on
AMD processors.
gcc/testsuite/ChangeLog
* gcc.target/i386/rcr-1.c: New 64-bit test case.
* gcc.target/i386/rcr-2.c: New 32-bit test case.
Haochen Jiang [Mon, 9 Oct 2023 08:09:49 +0000 (16:09 +0800)]
Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
Disable zmm broadcast for !TARGET_EVEX512.
* config/i386/i386-options.cc (ix86_option_override_internal):
Do not use PVW_512 when no-evex512.
(ix86_simd_clone_adjust): Add evex512 target into string.
* config/i386/i386.cc (type_natural_mode): Report ABI warning
when using zmm register w/o evex512.
(ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512.
(ix86_hard_regno_mode_ok): Ditto.
(ix86_set_reg_reg_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_vector_mode_supported_p): Ditto.
(ix86_preferred_simd_mode): Ditto.
(ix86_get_mask_mode): Ditto.
(ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit
libmvec call when !TARGET_EVEX512.
(ix86_simd_clone_usable): Ditto.
* config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment
when !TARGET_EVEX512
(MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512.
(STORE_MAX_PIECES): Ditto.
Haochen Jiang [Mon, 9 Oct 2023 08:09:23 +0000 (16:09 +0800)]
Initial support for -mevex512
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_EVEX512_SET): New.
(OPTION_MASK_ISA2_EVEX512_UNSET): Ditto.
(ix86_handle_option): Handle EVEX512.
* config/i386/i386-c.cc
(ix86_target_macros_internal): Handle EVEX512. Add __EVEX256__
when AVX512VL is set.
* config/i386/i386-options.cc: (isa2_opts): Handle EVEX512.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_option_override_internal): Set EVEX512 target if it is not
explicitly set when AVX512 is enabled. Disable
AVX512{PF,ER,4VNNIW,4FAMPS} for -mno-evex512.
* config/i386/i386.opt: Add mevex512. Temporaily RejectNegative.
Juzhe-Zhong [Sun, 8 Oct 2023 08:20:05 +0000 (16:20 +0800)]
TEST: Fix dump FAIL for RVV (RISCV-V vector)
As this showed: https://godbolt.org/z/3K9oK7fx3
ARM SVE 2 times for FOLD_EXTRACT_LAST wheras RVV 4 times.
This is because RISC-V doesn't enable vec_pack_trunc so we will failed conversion and fold_extract_last at the first time analysis.
Then we succeed at the second time.
So RVV has 4 times of showing "FOLD_EXTRACT_LAST:.