RISC-V/testsuite: Add branched cases for FP cond-move operations
Verify, for Ventana and Zicond targets and the ordered floating-point
conditional-move operations that already work as expected, that
if-conversion does *not* trigger at the `-mbranch-cost=2' setting, which
makes the original branched code sequences cheaper than the branchless
equivalents that if-conversion would emit. Cover all ordered floating-point
relational operations to make sure no corner case escapes.
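As an illustration only (file name, options and directives here are
assumptions, not the contents of the committed tests), one of these cases
has roughly this shape:

/* { dg-do compile } */
/* { dg-options "-march=rv64gc_zicond -mbranch-cost=2 -fdump-rtl-ce1" } */

long
movdibfgt (double w, double x, long y, long z)
{
  /* Expected to stay a branch at this setting; a scan-rtl-dump-not on the
     ce1 dump (or a scan-assembler for the branch) would verify it.  */
  return w > x ? y : z;
}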
gcc/testsuite/
* gcc.target/riscv/movdibfge-ventana.c: New test.
* gcc.target/riscv/movdibfge-zicond.c: New test.
* gcc.target/riscv/movdibfgt-ventana.c: New test.
* gcc.target/riscv/movdibfgt-zicond.c: New test.
* gcc.target/riscv/movdibfle-ventana.c: New test.
* gcc.target/riscv/movdibfle-zicond.c: New test.
* gcc.target/riscv/movdibflt-ventana.c: New test.
* gcc.target/riscv/movdibflt-zicond.c: New test.
* gcc.target/riscv/movdibfne-ventana.c: New test.
* gcc.target/riscv/movdibfne-zicond.c: New test.
* gcc.target/riscv/movsibfge-ventana.c: New test.
* gcc.target/riscv/movsibfge-zicond.c: New test.
* gcc.target/riscv/movsibfgt-ventana.c: New test.
* gcc.target/riscv/movsibfgt-zicond.c: New test.
* gcc.target/riscv/movsibfle-ventana.c: New test.
* gcc.target/riscv/movsibfle-zicond.c: New test.
* gcc.target/riscv/movsibflt-ventana.c: New test.
* gcc.target/riscv/movsibflt-zicond.c: New test.
* gcc.target/riscv/movsibfne-ventana.c: New test.
* gcc.target/riscv/movsibfne-zicond.c: New test.
RISC-V/testsuite: Add branchless cases for integer cond-move operations
Verify, for T-Head, Ventana and Zicond targets and the integer
conditional-move operations that already work as expected, that
if-conversion triggers via `noce_try_cmove' at the respective sufficiently
high `-mbranch-cost=' settings that make the branchless code sequences
produced by if-conversion cheaper than their original branched equivalents,
and that extraneous instructions such as SNEZ, etc. are not present in the
output. Cover all integer relational operations to make sure no corner
case escapes.
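For illustration (names, options and the expected output described here are
assumptions, not the committed tests), a Zicond case of this kind looks
roughly like:

/* { dg-do compile } */
/* { dg-options "-march=rv64gc_zicond -mbranch-cost=4 -fdump-rtl-ce1" } */

long
movdigt (long x, long y, long a, long b)
{
  /* Expected to if-convert via SLT feeding a czero.eqz/czero.nez pair,
     with no branch and no extraneous SNEZ in the output.  */
  return x > y ? a : b;
}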
gcc/testsuite/
* gcc.target/riscv/movdieq-thead.c: New test.
* gcc.target/riscv/movdige-ventana.c: New test.
* gcc.target/riscv/movdige-zicond.c: New test.
* gcc.target/riscv/movdigeu-ventana.c: New test.
* gcc.target/riscv/movdigeu-zicond.c: New test.
* gcc.target/riscv/movdigt-ventana.c: New test.
* gcc.target/riscv/movdigt-zicond.c: New test.
* gcc.target/riscv/movdile-ventana.c: New test.
* gcc.target/riscv/movdile-zicond.c: New test.
* gcc.target/riscv/movdileu-ventana.c: New test.
* gcc.target/riscv/movdileu-zicond.c: New test.
* gcc.target/riscv/movdilt-ventana.c: New test.
* gcc.target/riscv/movdilt-zicond.c: New test.
* gcc.target/riscv/movdine-thead.c: New test.
* gcc.target/riscv/movsieq-thead.c: New test.
* gcc.target/riscv/movsige-ventana.c: New test.
* gcc.target/riscv/movsige-zicond.c: New test.
* gcc.target/riscv/movsigeu-ventana.c: New test.
* gcc.target/riscv/movsigeu-zicond.c: New test.
* gcc.target/riscv/movsigt-ventana.c: New test.
* gcc.target/riscv/movsigt-zicond.c: New test.
* gcc.target/riscv/movsile-ventana.c: New test.
* gcc.target/riscv/movsile-zicond.c: New test.
* gcc.target/riscv/movsileu-ventana.c: New test.
* gcc.target/riscv/movsileu-zicond.c: New test.
* gcc.target/riscv/movsilt-ventana.c: New test.
* gcc.target/riscv/movsilt-zicond.c: New test.
* gcc.target/riscv/movsine-thead.c: New test.
RISC-V/testsuite: Add branched cases for integer cond-move operations
Verify, for T-Head, Ventana and Zicond targets and the integer
conditional-move operations that already work as expected, that
if-conversion does *not* trigger at the respective sufficiently low
`-mbranch-cost=' settings that make the original branched code sequences
cheaper than the branchless equivalents that if-conversion would emit.
Cover all integer relational operations to make sure no corner case
escapes.
The reason to XFAIL movdibne-thead.c and movsibne-thead.c is the
branchless T-Head sequence:
        sub a1,a0,a1
        th.mveqz a2,a3,a1
        mv a0,a2
        ret
produced rather than its original branched counterpart:
        beq a0,a1,.L3
        mv a0,a2
        ret
.L3:
        mv a0,a3
        ret
at `-mbranch-cost=1', even though under this setting the latter sequence
is obviously cheaper performance-wise. This is because the final move
instruction in the branchless sequence is not counted towards its cost
and consequently the cost of both sequences works out at 8 each, making
if-conversion prefer the branchless variant. Use the XFAIL mark to keep
track of these cases for future consideration.
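For reference, an illustrative source-level reconstruction of the function
involved (not the actual test file) is:

/* The branchless sequence above computes x - y and uses th.mveqz on the
   result, while the branched one compares x and y directly.  */
long
movdibne (long x, long y, long a, long b)
{
  return x != y ? a : b;
}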
gcc/testsuite/
* gcc.target/riscv/movdibeq-thead.c: New test.
* gcc.target/riscv/movdibge-ventana.c: New test.
* gcc.target/riscv/movdibge-zicond.c: New test.
* gcc.target/riscv/movdibgeu-ventana.c: New test.
* gcc.target/riscv/movdibgeu-zicond.c: New test.
* gcc.target/riscv/movdibgt-ventana.c: New test.
* gcc.target/riscv/movdibgt-zicond.c: New test.
* gcc.target/riscv/movdible-ventana.c: New test.
* gcc.target/riscv/movdible-zicond.c: New test.
* gcc.target/riscv/movdibleu-ventana.c: New test.
* gcc.target/riscv/movdibleu-zicond.c: New test.
* gcc.target/riscv/movdiblt-ventana.c: New test.
* gcc.target/riscv/movdiblt-zicond.c: New test.
* gcc.target/riscv/movdibne-thead.c: New test.
* gcc.target/riscv/movsibeq-thead.c: New test.
* gcc.target/riscv/movsibge-ventana.c: New test.
* gcc.target/riscv/movsibge-zicond.c: New test.
* gcc.target/riscv/movsibgeu-ventana.c: New test.
* gcc.target/riscv/movsibgeu-zicond.c: New test.
* gcc.target/riscv/movsibgt-ventana.c: New test.
* gcc.target/riscv/movsibgt-zicond.c: New test.
* gcc.target/riscv/movsible-ventana.c: New test.
* gcc.target/riscv/movsible-zicond.c: New test.
* gcc.target/riscv/movsibleu-ventana.c: New test.
* gcc.target/riscv/movsibleu-zicond.c: New test.
* gcc.target/riscv/movsiblt-ventana.c: New test.
* gcc.target/riscv/movsiblt-zicond.c: New test.
* gcc.target/riscv/movsibne-thead.c: New test.
RISC-V: Rework branch costing model for if-conversion
The generic branch costing model for if-conversion assumes a fixed cost
of COSTS_N_INSNS (2) for a conditional branch, and that one half of that
cost comes from a preceding condition-set instruction, such as with
MODE_CC targets, and then the other half of that cost is for the actual
branch instruction. This is hardcoded for `if_info.original_cost' in
`noce_find_if_block', regardless of the cost set for branches via
BRANCH_COST.
Then `default_max_noce_ifcvt_seq_cost' instructs if-conversion to prefer
a branchless sequence costing as much as triple the BRANCH_COST value
set. This is apparently to make up for the inability to accurately
guess the branch penalty.
Consequently for the BRANCH_COST of 3 we commonly set for tuning,
if-conversion will consider branchless sequences costing 3 * 3 - 2 = 7
instruction units more than a corresponding branch sequence. For the
BRANCH_COST of 4 such as with `sifive-7-series' tuning this is even
worse, at 3 * 4 - 2 = 10. Effectively it means a branchless sequence
will always be chosen if available, even a very inefficient one.
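A minimal standalone sketch of that arithmetic (the COSTS_N_INSNS scaling
matches GCC's definition; everything else here is illustrative and not
compiler code):

#include <stdio.h>

#define COSTS_N_INSNS(n) ((n) * 4)  /* GCC's per-instruction cost unit.  */

int
main (void)
{
  for (int bc = 1; bc <= 4; bc++)
    {
      int branch = COSTS_N_INSNS (2);    /* Fixed branched-sequence cost.  */
      int cap = COSTS_N_INSNS (3 * bc);  /* Triple BRANCH_COST allowance.  */
      printf ("BRANCH_COST %d: branchless sequences may cost %d insns more\n",
              bc, (cap - branch) / 4);
    }
  return 0;
}

For BRANCH_COST 3 and 4 this prints the 7 and 10 instruction units quoted
above.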
Rework the branch costing model to better match our architecture,
observing in particular that we have no preparatory instructions for
branches so that the cost of a branch is naked BRANCH_COST plus any
extra overhead the processing of a branch's source RTX might incur.
Provide TARGET_INSN_COST and TARGET_MAX_NOCE_IFCVT_SEQ_COST handlers
that return a suitable cost based on BRANCH_COST. The latter hook
usually returns a value that is lower than the cost of the corresponding
branched sequence. This is because we don't really want to produce a
branchless sequence that is more expensive than the original branched
sequence. If this turns out too conservative for some corner case, then
this choice might be revisited.
Then we don't want to fiddle with `noce_find_if_block' without a lot of
cross-target verification, so add TARGET_NOCE_CONVERSION_PROFITABLE_P
defined such that it subtracts the fixed COSTS_N_INSNS (2) cost from the
cost of the original branched sequence supplied and instead adds the actual
branch cost calculated from the conditional branch instruction used. It
is then further tweaked according to simple analysis of the replacement
branchless sequence produced so as to cancel the cost of an extraneous
zero extend operation produced by `noce_try_store_flag_mask' as observed
with gcc/testsuite/gcc.target/riscv/pr105314.c.
Tweak the testsuite accordingly and set `-mbranch-cost=' explicitly for
the relevant cases so that the expected if-conversion transformation is
made regardless of the default BRANCH_COST value of tuning in effect.
Some of these settings will be lowered later on, once deficiencies in
branchless sequence generation have been fixed that lower the cost
calculated by if-conversion.
gcc/
* config/riscv/riscv.cc (riscv_insn_cost): New function.
(riscv_max_noce_ifcvt_seq_cost): Likewise.
(riscv_noce_conversion_profitable_p): Likewise.
(TARGET_INSN_COST): New macro.
(TARGET_MAX_NOCE_IFCVT_SEQ_COST): New macro.
(TARGET_NOCE_CONVERSION_PROFITABLE_P): New macro.
RISC-V: Fix `mode' usage in `riscv_expand_conditional_move'
In `riscv_expand_conditional_move' `mode' is initialized right away from
`GET_MODE (dest)', so replace the remaining direct uses of
`GET_MODE (dest)' with the local variable.
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Use
`mode' for `GET_MODE (dest)' throughout.
RISC-V: Sanitise NEED_EQ_NE_P case with `riscv_emit_int_compare'
For the NEED_EQ_NE_P case `riscv_emit_int_compare' is documented to only
emit EQ or NE comparisons against zero, however it does not catch incorrect
use where a non-equality comparison has been requested and instead falls
through to the general case. Add a safety guard to catch such a case.
Arguably the NEED_EQ_NE_P case would best be moved into a function of
its own, but let's leave it for a separate cleanup.
gcc/
* config/riscv/riscv.cc (riscv_emit_int_compare): Bail out if
NEED_EQ_NE_P but the comparison is neither EQ nor NE.
RISC-V/testsuite: Add cases for integer SFB cond-move operations
Verify, for short forward branch targets and the conditional-move
operations that already work as expected, that if-conversion triggers
via `noce_try_cmove' already at `-mbranch-cost=1' and that extraneous
instructions such as SNEZ, etc. are not present in the output. Cover all
integer relational operations to make sure no corner case escapes.
gcc/testsuite/
* gcc.target/riscv/movdieq-sfb.c: New test.
* gcc.target/riscv/movdige-sfb.c: New test.
* gcc.target/riscv/movdigeu-sfb.c: New test.
* gcc.target/riscv/movdigt-sfb.c: New test.
* gcc.target/riscv/movdigtu-sfb.c: New test.
* gcc.target/riscv/movdile-sfb.c: New test.
* gcc.target/riscv/movdileu-sfb.c: New test.
* gcc.target/riscv/movdilt-sfb.c: New test.
* gcc.target/riscv/movdiltu-sfb.c: New test.
* gcc.target/riscv/movdine-sfb.c: New test.
* gcc.target/riscv/movsieq-sfb.c: New test.
* gcc.target/riscv/movsige-sfb.c: New test.
* gcc.target/riscv/movsigeu-sfb.c: New test.
* gcc.target/riscv/movsigt-sfb.c: New test.
* gcc.target/riscv/movsigtu-sfb.c: New test.
* gcc.target/riscv/movsile-sfb.c: New test.
* gcc.target/riscv/movsileu-sfb.c: New test.
* gcc.target/riscv/movsilt-sfb.c: New test.
* gcc.target/riscv/movsiltu-sfb.c: New test.
* gcc.target/riscv/movsine-sfb.c: New test.
testsuite: Add cases for conditional-move and conditional-add operations
Add generic execution tests for expressions that are expected to expand
to conditional-move and conditional-add operations where supported. To
ensure no corner case escapes, all relational operators are extensively
covered for integer comparisons and all ordered operators are covered
for floating-point comparisons. Unordered operators are not covered at
this point as they'd require a different input data set.
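A sketch of the shape of one such execution test (the function name follows
the naming scheme below, but the contents and values are illustrative
assumptions, not the committed file):

/* Conditional add keyed on an ordered floating-point greater-than
   comparison; a miscompiled conditional add trips the abort.  */
extern void abort (void);

__attribute__ ((noinline)) int
addifgt (double x, double y, int a, int b)
{
  return x > y ? a + b : a;
}

int
main (void)
{
  if (addifgt (2.0, 1.0, 5, 7) != 12 || addifgt (1.0, 2.0, 5, 7) != 5)
    abort ();
  return 0;
}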
gcc/testsuite/
* gcc.dg/torture/addieq.c: New test.
* gcc.dg/torture/addifeq.c: New test.
* gcc.dg/torture/addifge.c: New test.
* gcc.dg/torture/addifgt.c: New test.
* gcc.dg/torture/addifle.c: New test.
* gcc.dg/torture/addiflt.c: New test.
* gcc.dg/torture/addifne.c: New test.
* gcc.dg/torture/addige.c: New test.
* gcc.dg/torture/addigeu.c: New test.
* gcc.dg/torture/addigt.c: New test.
* gcc.dg/torture/addigtu.c: New test.
* gcc.dg/torture/addile.c: New test.
* gcc.dg/torture/addileu.c: New test.
* gcc.dg/torture/addilt.c: New test.
* gcc.dg/torture/addiltu.c: New test.
* gcc.dg/torture/addine.c: New test.
* gcc.dg/torture/addleq.c: New test.
* gcc.dg/torture/addlfeq.c: New test.
* gcc.dg/torture/addlfge.c: New test.
* gcc.dg/torture/addlfgt.c: New test.
* gcc.dg/torture/addlfle.c: New test.
* gcc.dg/torture/addlflt.c: New test.
* gcc.dg/torture/addlfne.c: New test.
* gcc.dg/torture/addlge.c: New test.
* gcc.dg/torture/addlgeu.c: New test.
* gcc.dg/torture/addlgt.c: New test.
* gcc.dg/torture/addlgtu.c: New test.
* gcc.dg/torture/addlle.c: New test.
* gcc.dg/torture/addlleu.c: New test.
* gcc.dg/torture/addllt.c: New test.
* gcc.dg/torture/addlltu.c: New test.
* gcc.dg/torture/addlne.c: New test.
* gcc.dg/torture/movieq.c: New test.
* gcc.dg/torture/movifeq.c: New test.
* gcc.dg/torture/movifge.c: New test.
* gcc.dg/torture/movifgt.c: New test.
* gcc.dg/torture/movifle.c: New test.
* gcc.dg/torture/moviflt.c: New test.
* gcc.dg/torture/movifne.c: New test.
* gcc.dg/torture/movige.c: New test.
* gcc.dg/torture/movigeu.c: New test.
* gcc.dg/torture/movigt.c: New test.
* gcc.dg/torture/movigtu.c: New test.
* gcc.dg/torture/movile.c: New test.
* gcc.dg/torture/movileu.c: New test.
* gcc.dg/torture/movilt.c: New test.
* gcc.dg/torture/moviltu.c: New test.
* gcc.dg/torture/movine.c: New test.
* gcc.dg/torture/movleq.c: New test.
* gcc.dg/torture/movlfeq.c: New test.
* gcc.dg/torture/movlfge.c: New test.
* gcc.dg/torture/movlfgt.c: New test.
* gcc.dg/torture/movlfle.c: New test.
* gcc.dg/torture/movlflt.c: New test.
* gcc.dg/torture/movlfne.c: New test.
* gcc.dg/torture/movlge.c: New test.
* gcc.dg/torture/movlgeu.c: New test.
* gcc.dg/torture/movlgt.c: New test.
* gcc.dg/torture/movlgtu.c: New test.
* gcc.dg/torture/movlle.c: New test.
* gcc.dg/torture/movlleu.c: New test.
* gcc.dg/torture/movllt.c: New test.
* gcc.dg/torture/movlltu.c: New test.
* gcc.dg/torture/movlne.c: New test.
Robin Dapp [Tue, 21 Nov 2023 11:51:12 +0000 (12:51 +0100)]
vect: Allow reduc_index != 1 for COND_OPs.
In PR112406 Tamar found another problem with COND_OP reductions.
I wrongly assumed that the reduction variable would always remain in
operand 1, just as we create the COND_OP in ifcvt. But of course,
addition being commutative, we are free to swap operands 1 and 2 and we
end up with e.g.
Robin Dapp [Mon, 20 Nov 2023 15:19:46 +0000 (16:19 +0100)]
RISC-V: testsuite: Add rv64 requirement for bug-9 and bug-14.
This adds an effective target requirement to compile the tests. Since
we disabled 64-bit indices on rv32 targets, those tests should be
unsupported on rv32.
Patrick O'Neill [Thu, 2 Nov 2023 17:20:43 +0000 (10:20 -0700)]
gfortran: Rely on dg-do-what-default to avoid running pr85853.f90, pr107254.f90 and vect-alias-check-1.F90 on non-vector targets
Testcases in gfortran.dg/vect/vect.exp rely on
check_vect_support_and_set_flags to set dg-do-what-default and avoid
running vector tests on non-vector targets. The three testcases in this
patch overwrite the default with dg-do run which causes issues
for non-vector targets.
Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).
Jonathan Wakely [Tue, 21 Nov 2023 11:49:22 +0000 (11:49 +0000)]
libstdc++: Do not declare strtok for C++26 freestanding (P2937R0)
This was recently approved for C++26.
We should define the __cpp_lib_freestanding_cstring macro in <string.h>
as well as <cstring>, but we do not currently install our own <string.h>
for most targets.
libstdc++-v3/ChangeLog:
* include/bits/version.def (freestanding_cstring): Add.
* include/bits/version.h: Regenerate.
* include/c_compatibility/string.h (strtok): Do not declare for
C++26 freestanding.
* include/c_global/cstring (strtok): Likewise.
* testsuite/21_strings/headers/cstring/version.cc: New test.
Jonathan Wakely [Mon, 20 Nov 2023 21:39:58 +0000 (21:39 +0000)]
libstdc++: Add freestanding feature test macros (P2407R5)
This C++26 change makes several classes "partially freestanding", but we
already fully supported them in freestanding mode. All we need to do is
define the new feature test macros and add tests for them.
libstdc++-v3/ChangeLog:
* include/bits/version.def (freestanding_algorithm)
(freestanding_array, freestanding_optional)
(freestanding_string_view, freestanding_variant): Add.
* include/bits/version.h: Regenerate.
* include/std/algorithm (__glibcxx_want_freestanding_algorithm):
Define.
* include/std/array (__glibcxx_want_freestanding_array):
Define.
* include/std/optional (__glibcxx_want_freestanding_optional):
Define.
* include/std/string_view
(__glibcxx_want_freestanding_string_view): Define.
* include/std/variant (__glibcxx_want_freestanding_variant):
Define.
* testsuite/20_util/optional/version.cc: Add checks for
__cpp_lib_freestanding_optional.
* testsuite/20_util/variant/version.cc: Add checks for
__cpp_lib_freestanding_variant.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.
* testsuite/21_strings/basic_string_view/requirements/version.cc:
New test.
* testsuite/23_containers/array/requirements/version.cc: New
test.
* testsuite/25_algorithms/fill_n/requirements/version.cc: New
test.
* testsuite/25_algorithms/swap_ranges/requirements/version.cc:
New test.
Jonathan Wakely [Sat, 18 Nov 2023 21:07:47 +0000 (21:07 +0000)]
libstdc++: Add std::span::at for C++26 (P2821R5)
Also define the new feature test macros from P2833R2, indicating that
std::span and std::expected are supported for freestanding mode.
libstdc++-v3/ChangeLog:
* include/bits/version.def (freestanding_expected): New macro.
(span): Add C++26 value.
* include/bits/version.h: Regenerate.
* include/std/expected (__glibcxx_want_freestanding_expected):
Define.
* include/std/span (span::at): New member function.
* testsuite/20_util/expected/version.cc: Add checks for
__cpp_lib_freestanding_expected.
* testsuite/23_containers/span/2.cc: Moved to...
* testsuite/23_containers/span/version.cc: ...here. Add checks
for __cpp_lib_span in <span> as well as in <version>.
* testsuite/23_containers/span/1.cc: Removed.
* testsuite/23_containers/span/at.cc: New test.
Jonathan Wakely [Sat, 18 Nov 2023 21:09:53 +0000 (21:09 +0000)]
libstdc++: Fix std::tr2::dynamic_bitset support for alternate characters
libstdc++-v3/ChangeLog:
* include/tr2/dynamic_bitset (dynamic_bitset): Pass zero and one
characters to _M_copy_from_string.
* testsuite/tr2/dynamic_bitset/string.cc: New test.
This patch adds a target-independent aligned_register_operand
predicate, for use with register constraints that use filters
to impose an alignment. The definition deliberately jettisons
some of the historical baggage in general_operand.
gcc/
* common.md (aligned_register_operand): New predicate.
This patch makes IRA apply register filters when picking hard registers.
All the new code should be optimised away on targets that don't use
register filters. On targets that do use them, the new register_filters
bitfield is expected to be only a handful of bits.
Information about register filters is recorded in process_bb_node_lives.
The information isn't really related to liveness, but it's a convenient
point because (a) we've already built the allocno structures and
(b) we've already extracted the insn and preprocessed the constraints.
gcc/
* ira-int.h (ira_allocno): Add a register_filters field.
(ALLOCNO_REGISTER_FILTERS): New macro.
(ALLOCNO_SET_REGISTER_FILTERS): Likewise.
* ira-build.cc (ira_create_allocno): Initialize register_filters.
(create_cap_allocno): Propagate register_filters.
(propagate_allocno_info): Likewise.
(propagate_some_info_from_allocno): Likewise.
* ira-lives.cc (process_register_constraint_filters): New function.
(process_bb_node_lives): Use it to record register filter
information.
* ira-color.cc (assign_hard_reg): Check register filters.
(improve_allocation, fast_allocation): Likewise.
This patch makes LRA apply register filters. This plus the recog
change is enough for correct code generation, but a follow-on IRA
patch improves the allocation.
All the new code should be optimised away on targets that don't
use register filters. That's because get_register_filter just
wraps "return nullptr" on those targets.
The main (but simplest) part of this patch makes constrain_operands
take register filters into account.
The rest of the patch adds register filter information to
operand_alternative. Generally, if two register constraints
have different register filters, it's better if they're in separate
alternatives. However, the syntax doesn't enforce that, and we can't
assert it due to inline asms. So it's a choice between (a) adding
code to enforce consistent filters or (b) dealing with mixes of filters
in a conservatively correct way (in the sense of not allowing invalid
operands). The latter seems much easier.
The patch therefore adds a mask of the filters that apply
to at least one constraint in a given operand alternative.
A register is OK if it passes all of the filters in the mask.
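A minimal standalone sketch of that rule (types and names here are
illustrative, not the recog.h/recog.cc implementation):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REGISTER_FILTERS 2

/* One bitmap of acceptable hard register numbers per filter,
   e.g. filter 0 below is "regno % 2 == 0".  */
static uint64_t filter_regs[NUM_REGISTER_FILTERS];

static bool
regno_ok_for_filters (unsigned regno, unsigned filter_mask)
{
  for (unsigned f = 0; f < NUM_REGISTER_FILTERS; f++)
    if (((filter_mask >> f) & 1) && !((filter_regs[f] >> regno) & 1))
      return false;  /* The register fails one of the required filters.  */
  return true;
}

int
main (void)
{
  for (unsigned r = 0; r < 8; r += 2)
    filter_regs[0] |= (uint64_t) 1 << r;
  printf ("r3 ok: %d, r4 ok: %d\n",
          regno_ok_for_filters (3, 1), regno_ok_for_filters (4, 1));
  return 0;
}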
gcc/
* recog.h (operand_alternative): Add a register_filters field.
(alternative_register_filters): New function.
* recog.cc (preprocess_constraints): Calculate the filters field.
(constrain_operands): Check register filters.
Add register filter operand to define_register_constraint
The main way of enforcing registers to be aligned is through
HARD_REGNO_MODE_OK. But this is a global property that applies
to all operands. A given (regno, mode) pair is either globally
valid or globally invalid.
This patch instead adds a way of specifying that individual operands
must be aligned. More generally, it allows constraints to specify
a C++ condition that the operand's REGNO must satisfy. The condition
must be invariant for a given set of target options, so that it can
be precomputed and cached as a HARD_REG_SET.
This information will be used in very compile-time-sensitive
parts of the compiler. A lot of the complication is in allowing
the information to be stored and tested without much memory cost,
and without impacting targets that don't use the feature.
Specifically:
- Constraints are encouraged to test the absolute REGNO rather than
an offset from the start of the containing class. For example,
all constraints for even registers should use the same condition,
such as "regno % 2 == 0". This requires the classes to start at
even register boundaries, but that's already an implicit
requirement due to things like the ira-costs.cc code that begins:
/* Some targets allow pseudos to be allocated to unaligned sequences
of hard registers. However, selecting an unaligned sequence can
unnecessarily restrict later allocations. So increase the cost of
unaligned hard regs to encourage the use of aligned hard regs. */
- Each unique condition is given a "filter identifier".
- The total number of filters is given by NUM_REGISTER_FILTERS,
defined automatically in insn-config.h. Structures can therefore use
a bitfield of NUM_REGISTER_FILTERS to represent a mask of filters.
- There is a new target global, target_constraints, that caches the
HARD_REG_SET for each filter.
- There is a function for looking up the HARD_REG_SET filter for a given
constraint and one for looking up the filter id. Both simply return
a constant on targets that don't use the feature.
- There are functions for testing a register against a specific filter,
or against a mask of filters.
This patch just adds the information. Later ones make use of it.
gcc/
* rtl.def (DEFINE_REGISTER_CONSTRAINT): Add an optional filter
operand.
* doc/md.texi (define_register_constraint): Document it.
* doc/tm.texi.in: Reference it in discussion about aligned registers.
* doc/tm.texi: Regenerate.
* gensupport.h (register_filters, get_register_filter_id): Declare.
* gensupport.cc (register_filter_map, register_filters): New variables.
(get_register_filter_id): New function.
(process_define_register_constraint): Likewise.
(process_rtx): Pass define_register_constraints to
process_define_register_constraint.
* genconfig.cc (main): Emit a definition of NUM_REGISTER_FILTERS.
* genpreds.cc (constraint_data): Add a filter field.
(add_constraint): Update accordingly.
(process_define_register_constraint): Pass the filter operand.
(write_init_reg_class_start_regs): New function.
(write_get_register_filter): Likewise.
(write_get_register_filter_id): Likewise.
(write_tm_preds_h): Write a definition of target_constraints,
plus helpers to test its contents. Write the get_register_filter*
functions.
(write_insn_preds_c): Write init_reg_class_start_regs.
* reginfo.cc (init_reg_class_start_regs): Declare.
(init_reg_sets): Call it.
* target-globals.h (this_target_constraints): Declare.
(target_globals): Add a constraints field.
(restore_target_globals): Update accordingly.
* target-globals.cc: Include tm_p.h.
(default_target_globals): Initialize the constraints field.
(save_target_globals): Handle the constraints field.
(target_globals::~target_globals): Likewise.
For vec_pack_trunc patterns there can be an ambiguity for the
source mode for BFmode vs HFmode. The vectorizer checks
the insn's operand mode for this; the following makes forwprop
do the same. That of course doesn't help if the target supports
both conversions.
PR tree-optimization/112623
* tree-ssa-forwprop.cc (simplify_vector_constructor):
Check the source mode of the insn for vector pack/unpacks.
Richard Biener [Tue, 21 Nov 2023 08:50:54 +0000 (09:50 +0100)]
Move VF based dependence check
The following moves the check whether the maximum vectorization
factor determined by data dependence analysis is in conflict with
the chosen vectorization factor to after the point where we applied
both the SLP and the unrolling adjustment to the vectorization
factor. We check the latter before applying unrolling, but the
SLP adjustment can result in both missed optimization and wrong-code.
* tree-vect-loop.cc (vect_analyze_loop_2): Move check
of VF against max_vf until VF is final.
Jan Hubicka [Tue, 21 Nov 2023 14:17:16 +0000 (15:17 +0100)]
optimize std::vector::push_back
This patch speeds up push_back at -O3 significantly by making the
reallocation be inlined by default. _M_realloc_insert is a general
insertion routine that takes an iterator pointing to the location where the
value should be inserted. As such it contains code to move other entries
around that is quite large.
Since appending to the end of an array is a common operation, I think we
should have specialized code for that. Sadly it is really hard to work this
out from IPA passes, since we basically care whether the iterator points to
the same place as the end pointer, which are both passed by reference.
This is inter-procedural value numbering that is quite out of reach.
I also added an extra check making it clear that the new length of the
vector is non-zero. This saves extra conditionals. Again it is quite a hard
case since _M_check_len seems to be able to return 0 if its parameter is 0.
This never happens here, but we are not able to propagate this early nor at
the IPA stage.
libstdc++-v3/ChangeLog:
PR libstdc++/110287
PR middle-end/109811
PR middle-end/109849
* include/bits/stl_vector.h (_M_realloc_append): New member function.
(push_back): Use it.
* include/bits/vector.tcc: (emplace_back): Use it.
(_M_realloc_insert): Let compiler know that new vector size is non-zero.
(_M_realloc_append): New member function.
Tamar Christina [Tue, 21 Nov 2023 13:21:39 +0000 (13:21 +0000)]
AArch64: only emit mismatch error when features would be disabled.
At the moment we emit a warning whenever you specify both -march and -mcpu
and their architectures differ. The idea originally was that the user may
not be aware of the mismatch.
However this has a few problems:
1. Architecture revisions are not an observable part of the architecture;
extensions are. Starting with GCC 14 we have therefore relaxed the rules so
that all extensions can be enabled at any architecture level. Therefore it's
incorrect, or at least not useful, to keep the check on the architecture.
2. It's problematic in Makefiles and other build systems, where you want to
enable CPU-specific builds for certain files. E.g. you may by default be
building for -march=armv8-a but build some file with -mcpu=neoverse-n1. Since
there's no easy way to remove the earlier options we end up warning, and
there's no way to disable just this warning. Build systems compiling with
-Werror face an issue in this case, and compiling with GCC becomes
needlessly hard.
3. It doesn't actually warn for cases that may lead to issues, so e.g.
-march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that SVE would
be disabled.
For this reason I have one of two proposals:
1. Just remove this warning all together.
2. Rework the warning based on extensions and only warn when features would be
disabled by the presence of the -mcpu. This is the approach this patch has
taken.
As examples:
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-a+sve’ switch and resulted in options +crc+sve+norcpc+nodotprod being added
.arch armv8.2-a+crc+sve
The one remaining issue here is that if both -march and -mcpu are specified we
pick the -march. This is not particularly obvious and for the use case to be
more useful I think it makes sense to pick the CPU's arch?
I did not make that change in the patch as it changes semantics.
Note that I can't write a test for this because dg-warning expects warnings to
be at a particular line and doesn't support warnings at the "global" level.
Tamar Christina [Tue, 21 Nov 2023 13:20:39 +0000 (13:20 +0000)]
AArch64: Add new generic-armv8-a CPU and make it the default.
This patch adds a new generic scheduling model "generic-armv8-a" and makes it
the default for all Armv8 architectures.
-mcpu=generic and -mtune=generic is kept around for those that really want the
previous cost model.
This shows on SPECCPU 2017 the following:
generic: SPECINT 1.0% improvement in geomean, SPECFP -0.6%. The SPECFP is due
to fotonik3d_r where we vectorize an FP calculation that only ever
needs one lane of the result. This I believe is a generic costing bug
but at the moment we can't change costs of FP and INT independently.
So I will defer updating that cost to stage3 after Richard's other
costing updates land.
generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.
gcc/ChangeLog:
PR target/111370
* config/aarch64/aarch64-arches.def (armv8-9, armv8-a, armv8.1-a,
armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a,
armv8.8-a): Update to generic_armv8_a.
* config/aarch64/aarch64-cores.def (generic-armv8-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv8_a.h
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to
TARGET_CPU_generic_armv8_a.
* config/aarch64/tuning_models/generic_armv8_a.h: New file.
Tamar Christina [Tue, 21 Nov 2023 13:19:36 +0000 (13:19 +0000)]
AArch64: Refactor costs models to different files.
This patch series attempts to move the generic cost model in AArch64 to a new
and modern generic standard. The current standard is quite old and generates
very suboptimal code out of the box for users of GCC.
The goal is for the new cost model to be beneficial on newer/current Arm
Microarchitectures while not being too negative for older ones.
It does not change any core specific optimization. The final changes reflect
both performance optimizations and size optimizations.
This first patch just re-organizes the cost structures to their own files.
The AArch64.cc file has gotten very big and it's hard to follow.
No functional changes are expected from this change. Note that since all the
structures have private visibility I've put them in header files instead.
gcc/ChangeLog:
PR target/111370
* config/aarch64/aarch64.cc (generic_addrcost_table,
exynosm1_addrcost_table,
xgene1_addrcost_table,
thunderx2t99_addrcost_table,
thunderx3t110_addrcost_table,
tsv110_addrcost_table,
qdf24xx_addrcost_table,
a64fx_addrcost_table,
neoversev1_addrcost_table,
neoversen2_addrcost_table,
neoversev2_addrcost_table,
generic_regmove_cost,
cortexa57_regmove_cost,
cortexa53_regmove_cost,
exynosm1_regmove_cost,
thunderx_regmove_cost,
xgene1_regmove_cost,
qdf24xx_regmove_cost,
thunderx2t99_regmove_cost,
thunderx3t110_regmove_cost,
tsv110_regmove_cost,
a64fx_regmove_cost,
neoversen2_regmove_cost,
neoversev1_regmove_cost,
neoversev2_regmove_cost,
generic_vector_cost,
a64fx_vector_cost,
qdf24xx_vector_cost,
thunderx_vector_cost,
tsv110_vector_cost,
cortexa57_vector_cost,
exynosm1_vector_cost,
xgene1_vector_cost,
thunderx2t99_vector_cost,
thunderx3t110_vector_cost,
ampere1_vector_cost,
generic_branch_cost,
generic_tunings,
cortexa35_tunings,
cortexa53_tunings,
cortexa57_tunings,
cortexa72_tunings,
cortexa73_tunings,
exynosm1_tunings,
thunderxt88_tunings,
thunderx_tunings,
tsv110_tunings,
xgene1_tunings,
emag_tunings,
qdf24xx_tunings,
saphira_tunings,
thunderx2t99_tunings,
thunderx3t110_tunings,
neoversen1_tunings,
ampere1_tunings,
ampere1a_tunings,
neoversev1_vector_cost,
neoversev1_tunings,
neoverse512tvb_vector_cost,
neoverse512tvb_tunings,
neoversen2_vector_cost,
neoversen2_tunings,
neoversev2_vector_cost,
neoversev2_tunings
a64fx_tunings): Split into own files.
* config/aarch64/tuning_models/a64fx.h: New file.
* config/aarch64/tuning_models/ampere1.h: New file.
* config/aarch64/tuning_models/ampere1a.h: New file.
* config/aarch64/tuning_models/cortexa35.h: New file.
* config/aarch64/tuning_models/cortexa53.h: New file.
* config/aarch64/tuning_models/cortexa57.h: New file.
* config/aarch64/tuning_models/cortexa72.h: New file.
* config/aarch64/tuning_models/cortexa73.h: New file.
* config/aarch64/tuning_models/emag.h: New file.
* config/aarch64/tuning_models/exynosm1.h: New file.
* config/aarch64/tuning_models/generic.h: New file.
* config/aarch64/tuning_models/neoverse512tvb.h: New file.
* config/aarch64/tuning_models/neoversen1.h: New file.
* config/aarch64/tuning_models/neoversen2.h: New file.
* config/aarch64/tuning_models/neoversev1.h: New file.
* config/aarch64/tuning_models/neoversev2.h: New file.
* config/aarch64/tuning_models/qdf24xx.h: New file.
* config/aarch64/tuning_models/saphira.h: New file.
* config/aarch64/tuning_models/thunderx.h: New file.
* config/aarch64/tuning_models/thunderx2t99.h: New file.
* config/aarch64/tuning_models/thunderx3t110.h: New file.
* config/aarch64/tuning_models/thunderxt88.h: New file.
* config/aarch64/tuning_models/tsv110.h: New file.
* config/aarch64/tuning_models/xgene1.h: New file.
Tamar Christina [Tue, 21 Nov 2023 13:17:17 +0000 (13:17 +0000)]
AArch64: Add pattern for unsigned widenings (uxtl) to zip{1,2}
This changes unpack instructions to use zip{1,2} when doing a zero-extending
widening operation. Permutes generally have a higher throughput than the
widening operations. Zeros are shuffled into the top half of the registers.
The testcase
void d2 (unsigned * restrict a, unsigned short *b, int n)
{
  for (int i = 0; i < (n & -8); i++)
    a[i] = b[i];
}
* gcc.target/aarch64/simd/vmovl_high_1.c: Update codegen.
* gcc.target/aarch64/uxtl-combine-1.c: New test.
* gcc.target/aarch64/uxtl-combine-2.c: New test.
* gcc.target/aarch64/uxtl-combine-3.c: New test.
* gcc.target/aarch64/uxtl-combine-4.c: New test.
* gcc.target/aarch64/uxtl-combine-5.c: New test.
* gcc.target/aarch64/uxtl-combine-6.c: New test.
This seems to be because the vectorizer thinks the vector transfers are free:
x1_4 + x2_6 1 times vector_stmt costs 0 in body
x1_4 + x2_6 1 times vec_to_scalar costs 0 in body
This happens because the stmt it's using to get the cost of register transfers
for the given type happens to be one feeding into a MUL. We incorrectly
discount the + for the register transfer.
This is fixed by guarding the check for aarch64_multiply_add_p with a kind
check and only doing it for scalar_stmt and vector_stmt.
I'm sending this separately from my patch series but it's required for it.
It also seems to fix overvectorization cases in fotonik3d_r in SPECCPU 2017.
Sebastian Huber [Mon, 20 Nov 2023 14:26:38 +0000 (15:26 +0100)]
gcov: Fix integer types in gen_counter_update()
This change fixes issues like this:
gcc.dg/gomp/pr27573.c: In function ‘main._omp_fn.0’:
gcc.dg/gomp/pr27573.c:19:1: error: non-trivial conversion in ‘ssa_name’
19 | }
| ^
long int
long unsigned int
# .MEM_19 = VDEF <.MEM_18>
__gcov7.main._omp_fn.0[0] = PROF_time_profile_12;
during IPA pass: profile
gcc.dg/gomp/pr27573.c:19:1: internal compiler error: verify_gimple failed
gcc/ChangeLog:
PR middle-end/112634
* tree-profile.cc (gen_assign_counter_update): Cast the unsigned result type of
__atomic_add_fetch() to the signed counter type.
(gen_counter_update): Fix formatting.
Eric Botcazou [Fri, 20 Oct 2023 08:26:23 +0000 (10:26 +0200)]
ada: Fix issue with indefinite vector of overaligned unconstrained array
The problem is that the aligning machinery is not consistently triggered,
depending on whether a constrained view or the nominal unconstrained view
of the element type is used to perform the allocations and deallocations.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Array_Subtype>: Put
the alignment directly on the type in the constrained case too.
* gcc-interface/utils.cc (maybe_pad_type): For an array type, take
the alignment of the element type as the original alignment.
Gary Dismukes [Thu, 2 Nov 2023 23:36:12 +0000 (23:36 +0000)]
ada: Compiler crash on container aggregate with loop_parameter_specifications
The compiler crashes on a container aggregate with more than one
iterated_element_association given by a loop_parameter_specification.
In such a case, the tree contains N_Iterated_Component_Association
nodes rather than N_Iterated_Element_Association nodes, and the code
for handling those needs to obtain the bounds from the Discrete_Choices
field of each N_Iterated_Component_Association rather than assuming
that the association has a normal list of choices.
gcc/ada/
* sem_aggr.adb (Resolve_Container_Aggregate): In the case where Comp
is an N_Iterated_Component_Association, pick up Discrete_Choices rather
than Choices.
Eric Botcazou [Thu, 2 Nov 2023 14:05:06 +0000 (15:05 +0100)]
ada: Another couple of cleanups in the finalization machinery
For package specs and bodies that need finalizers, Build_Finalizer is
invoked from the Standard scope so it needs to adjust the scope stack
before creating new objects; this changes it to do so only once.
For other kinds of scopes, it is invoked from Expand_Cleanup_Actions,
which assumes that the correct scope is already on the stack; that's
why Cleanup_Scopes adjusts the scope stack explicitly, but it should
use Pop_Scope instead of End_Scope to do it.
gcc/ada/
* exp_ch7.adb (Build_Finalizer): For package specs and bodies, push
and pop the specs onto the scope stack only once.
* inline.adb (Cleanup_Scopes): Call Pop_Scope instead of End_Scope.
Steve Baird [Wed, 1 Nov 2023 21:39:35 +0000 (14:39 -0700)]
ada: Deep delta aggregates in postconditions
Fix a bug in handling array-valued deep delta aggregates occurring in
postconditions. The bug could result in a spurious compilation failure.
gcc/ada/
* sem_aggr.adb (Resolve_Delta_Array_Aggregate): In the case of a
deep delta choice, the expected type for the expression will
typically not be the component type of the array type, so a call
to Analyze_And_Resolve that assumes otherwise would be an error.
It turns out that such a call, while wrong, is usually harmless
because the expression has already been marked as analyzed. This
doesn't work if the aggregate occurs in a postcondition and, in
any case, we don't want to rely on this. So do not perform the
call in the deep case.
Eric Botcazou [Tue, 31 Oct 2023 16:49:47 +0000 (17:49 +0100)]
ada: Small consistency fix for -gnatwv warning
The goal is to arrange for the warning to be issued consistently between
objects whose address is taken and objects whose address is not taken.
gcc/ada/
* sem_warn.adb (Check_References.Type_OK_For_No_Value_Assigned):
New predicate.
(Check_References): For Warn_On_No_Value_Assigned, use the same test
on the type in the address-not-taken and default cases.
Gary Dismukes [Tue, 31 Oct 2023 19:50:46 +0000 (19:50 +0000)]
ada: Compiler error reporting illegal prefix on legal loop iterator with "in"
During semantic analysis, the compiler fails to determine the cursor type
in the case of a generalized iterator loop with "in", in the case where the
iterator type has a parent type that is a controlled type (for example) and
its ancestor iterator interface type is given after as a progenitor. It also
improperly determines the ancestor interface type during expansion (within
Expand_Iterator_Loop_Over_Container), for both "in" and "of" iterator forms.
The FE was assuming that the iterator interface is simply the parent type
of the iterator type, but that type can occur later in the interface list,
or be inherited. A new function is added that properly locates a type's
iterator interface ancestor, if any, and is called for analysis and expansion.
gcc/ada/
* exp_ch5.adb (Expand_Iterator_Loop_Over_Container): Retrieve the
iteration type's iteration interface progenitor via
Iterator_Interface_Ancestor, in the case of both "in" and "of"
iterators. Narrow the scope of Pack, so it's declared and
initialized only within the code related to "of" iterators, and
change its name to Cont_Type_Pack. Adjust comments.
* sem_ch5.adb (Get_Cursor_Type): In the case of a derived type,
retrieve the iteration type's iterator interface progenitor (if it
exists) via Iterator_Interface_Ancestor rather than assuming that
the parent type is the interface progenitor.
* sem_util.ads (Iterator_Interface_Ancestor): New function.
* sem_util.adb (Iterator_Interface_Ancestor): New function
returning a type's associated iterator interface type, if any, by
collecting and traversing the type's interfaces.
Eric Botcazou [Tue, 31 Oct 2023 16:27:15 +0000 (17:27 +0100)]
ada: Fix internal error on 'Address of task component
This happens when the prefix of the selected component is of an access type,
i.e. there is an implicit dereference, because the prefix is not resolved.
gcc/ada/
* sem_attr.adb (Resolve_Attribute) <Attribute_Address>: Remove the
bypass for prefixes with task type.
Eric Botcazou [Fri, 27 Oct 2023 09:34:31 +0000 (11:34 +0200)]
ada: Further cleanup in finalization machinery
The bodies of generic units are instantiated separately by GNAT at the end
of the processing of the compilation unit. This requires the deferral of
the generation of cleanups and finalization actions in enclosing scopes,
except for instantiations in generic units where they are not generated.
The criterion used to detect this latter case is Inside_A_Generic, but this
global variable is not properly updated during the instantiation of generic
bodies, leading to problems with nested instantiations, so it is changed to
Expander_Active instead. As a matter of fact, the exact same idiom is used
a few lines above to clear the Needs_Body variable.
gcc/ada/
* sem_ch12.adb (Analyze_Package_Instantiation): Test Expander_Active
to detect generic contexts for the generation of cleanup actions.
Justin Squirek [Fri, 27 Oct 2023 00:08:07 +0000 (00:08 +0000)]
ada: Fix string indexing within GNAT.Calendar.Time_IO.Value
The patch fixes an issue in the compiler whereby calls to
GNAT.Calendar.Time_IO.Value in which the actual for the formal String Date
has indexing starting at any value besides one would result in a spurious
runtime exception.
gcc/ada/
* libgnat/g-catiio.adb (Value): Modify conditionals to use 'Last
instead of 'Length
Eric Botcazou [Tue, 24 Oct 2023 07:50:25 +0000 (09:50 +0200)]
ada: Further cleanup in finalization machinery
This removes the specific treatment of transient scopes in initialization
procedures, which is obsolete.
gcc/ada/
* exp_aggr.adb (Convert_To_Assignments): Do not treat initialization
procedures specially when it comes to creating a transient scope.
* exp_ch7.adb (Build_Finalizer.Process_Declarations): Likewise.
* exp_util.adb (Requires_Cleanup_Actions): Likewise.
Doug Rupp [Thu, 12 Oct 2023 19:31:46 +0000 (12:31 -0700)]
ada: Use CLOCK_MONOTONIC on VxWorks
The monotonic clock keeps track of the time that has elapsed since
system startup; that is, the value returned by clock_gettime() is the
amount of time (in seconds and nanoseconds) that has passed since the
system booted. The monotonic clock cannot be reset. As a result,
time interval measurements made relative to the monotonic clock are
not subject to errors resulting from the clock time being unexpectedly
adjusted between the interval start and end.
gcc/ada/
* s-oscons-tmplt.c: #define CLOCK_RT_Ada "CLOCK_MONOTONIC" for
__vxworks
Steve Baird [Fri, 22 Sep 2023 18:54:13 +0000 (11:54 -0700)]
ada: Deep delta aggregates
Add support for "deep" delta aggregates, a GNAT-defined language extension
conditionally enabled via the -gnatX0 switch. In a deep delta aggregate, a
delta choice may specify a subcomponent (as opposed to just a component).
gcc/ada/
* par.adb: Add new Boolean variable Inside_Delta_Aggregate.
* par-ch4.adb (P_Simple_Expression): Add support for a deep delta
aggregate choice. We turn a sequence of selectors into a peculiar
tree. We build a component (Indexed or Selected) whose prefix is
another such component, etc. The leftmost prefix at the bottom of
the tree has a "name" which is the first selector, without any
further prefix. For something like "with delta (1)(2) => 3" where
the type of the aggregate is an array of arrays of integers, we'll
build an N_Indexed_Component whose prefix is an integer literal 1.
This is consistent with the trees built for "regular"
(Ada-defined) delta aggregates.
* sem_aggr.adb (Is_Deep_Choice, Is_Root_Prefix_Of_Deep_Choice):
New queries.
(Resolve_Deep_Delta_Assoc): new procedure.
(Resolve_Delta_Array_Aggregate): call Resolve_Deep_Delta_Assoc in
deep case.
(Resolve_Delta_Record_Aggregate): call Resolve_Deep_Delta_Assoc in
deep case.
(Get_Component_Type): new function replaces old Get_Component
function.
* sem_aggr.ads (Is_Deep_Choice, Is_Root_Prefix_Of_Deep_Choice):
New queries.
* exp_aggr.adb (Expand_Delta_Array_Aggregate): add nested function
Make_Array_Delta_Assignment_LHS; call it instead of
Make_Indexed_Component.
(Expand_Delta_Record_Aggregate): add nested function
Make_Record_Delta_Assignment_LHS; call it instead of
Make_Selected_Component.
* exp_spark.adb (Expand_SPARK_Delta_Or_Update): Insert range
checks for indexes in deep delta aggregates.
ada: Fix Ada.Text_IO.Delete with "encoding=8bits" form
Before this patch, on Windows, files with non-ASCII Latin1 names could be created
with Ada.Text_IO.Create by passing "encoding=8bits" through the Form
parameter and a Latin1-encoded string through the Name parameter,
but calling Ada.Text_IO.Delete on them raised an illegitimate exception.
This patch fixes this by making the wrappers of the unlink system function
aware of the encoding value passed through the Form parameter. It also
removes an unnecessary curly-brace block.
gcc/ada/
* adaint.c (__gnat_unlink): Add new parameter and fix text
conversion on Windows. Remove unnecessary curly braces.
* adaint.h (__gnat_unlink): Add new parameter.
* libgnat/i-cstrea.ads (unlink): Adapt to __gnat_unlink signature
change.
* libgnat/i-cstrea.adb (unlink): New Subprogram definition.
* libgnat/s-crtl.ads (unlink): Adapt to __gnat_unlink signature
change.
* libgnat/s-fileio.adb (Delete): Pass encoding argument to unlink.
Eric Botcazou [Fri, 20 Oct 2023 15:22:07 +0000 (17:22 +0200)]
ada: Fix spurious error on call with default parameter in generic package
This occurs when the default value is a function call returning a private
type, and is caused by a bad interaction between two internal mechanisms.
gcc/ada/
* sem_ch12.adb (Save_Global_References.Set_Global_Type): Beef up
comment about the setting of the full view.
* sem_res.adb (Resolve_Actuals.Insert_Default): Add another bypass
for the case of a generic context.
ada: Fix SCOs generation for aspect specifications
The recent overhaul for the representation of aspect specifications in
the tree broke SCOs generation: decisions that appeared in aspects were
processed twice, leading to the emission of erroneous obligations. Tweak
SCOs generation to skip aspect specifications the second time to go back
to the previous behavior.
This patch makes it so -gnatg is always passed to the compiler when
rebuilding the run-time library with the dedicated GPR files. Before
this patch, if a user rebuilt the run-time with -XADAFLAGS=XXX where
XXX didn't include "-gnatg", the build would immediately fail. This
case occurs when following the instructions in libada.gpr, which
use '-XADAFLAGS="-gnatn"'.
Jakub Jelinek [Tue, 21 Nov 2023 09:03:26 +0000 (10:03 +0100)]
testsuite: Fix up pr111309-2.c on arm [PR111309]
ARM defaults to -fshort-enums and the following testcase FAILs there in 2
lines. The difference is that in C++, E0 has enum E type, which normally
has unsigned int underlying type, so it isn't int nor something that
promotes to int, which is why we diagnose it (in C it is promoted to int).
But with -fshort-enums, the underlying type is unsigned char in that case,
which promotes to int just fine.
The following patch adjusts the expectations, such that we don't expect
it on arm or when people manually test with -fshort-enums.
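For illustration only (this is not the testcase itself), the difference
boils down to the underlying type picked for the enumeration:

/* With -fshort-enums the underlying type of E is unsigned char, which
   promotes to int; without it the C++ underlying type is unsigned int,
   which does not promote to int, so passing E0 where a promoted-int
   argument is required is only diagnosed in the latter case.  */
enum E { E0 };

int
underlying_fits_in_int (void)
{
  return sizeof (enum E) < sizeof (int);  /* 1 with -fshort-enums.  */
}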
2023-11-21 Jakub Jelinek <jakub@redhat.com>
PR c/111309
* c-c++-common/pr111309-2.c (foo): Don't expect errors for C++ with
-fshort-enums if second argument is E0.
As the testcase shows, I've missed one spot where initially the code thinks
it could use the 2 argument IFN_CLZ/IFN_CTZ form, but then verifies it can't
because it doesn't have the right target value and turns it into the
arg0 ? arg1 : .C[LT]Z (arg0)
form. That form evaluates the argument twice though and so needs save_expr,
which I missed calling in that case. In other cases, where it is known from
the beginning that save_expr will be needed (e.g. the __builtin_clzg case
on types smaller than unsigned int where we'll need to add an addend to the
clz value), or in the unsigned __int128 expansion, save_expr was already
called before.
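An illustrative case (not the committed testcase) where the missing
save_expr matters is a side-effecting argument:

/* With the fallback form arg0 ? arg1 : .CLZ (arg0), the argument below
   would be evaluated twice without a save_expr, incrementing the
   pointer twice.  */
int
f (unsigned *p)
{
  return __builtin_clzg (*p++, 32);
}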
2023-11-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112639
* builtins.cc (fold_builtin_bit_query): If arg0 has side-effects, arg1
is specified but cleared, call save_expr on arg0.
Hongyu Wang [Thu, 16 Nov 2023 07:18:07 +0000 (15:18 +0800)]
[APX PPX] Support Intel APX PPX
PPX stands for Push-Pop Acceleration. PUSH/PUSH2 and its corresponding POP
can be marked with a 1-bit hint to indicate that the POP reads the
value written by the PUSH from the stack. The processor tracks these marked
instructions internally and fast-forwards register data between
matching PUSH and POP instructions, without going through memory or
through the training loop of the Fast Store Forwarding Predictor (FSFP).
This feature can also be applied to PUSH2/POP2.
For GCC, we emit an explicit suffix 'p' (paired) to indicate that the
push/pop pair is marked with the PPX hint. To separate them from the
original push/pop, we add an UNSPEC on top of those PUSH/POP patterns.
In the first implementation we only emit them under prologue/epilogue
when saving/restoring callee-saved registers to make sure push/pop are
paired. So an extra flag was added to check if PPX insns can be emitted
for those register save/restore interfaces.
The PPX hint is purely a performance hint. If the 'p' suffix is not
emitted for paired push/pop, the PPX optimization will be disabled,
while program semantics will not be affected at all.
gcc/ChangeLog:
* config/i386/i386-expand.h (gen_push): Add default bool
parameter.
(gen_pop): Likewise.
* config/i386/i386-opts.h (enum apx_features): Add apx_ppx, add
it to apx_all.
* config/i386/i386.cc (ix86_emit_restore_reg_using_pop): Add
ppx_p parameter for function declaration.
(gen_push2): Add ppx_p parameter, emit push2p if ppx_p is true.
(gen_push): Likewise.
(ix86_emit_restore_reg_using_pop2): Likewise for pop2p.
(ix86_emit_save_regs): Emit pushp/push2p under TARGET_APX_PPX.
(ix86_emit_restore_reg_using_pop): Add ppx_p, emit popp insn
and adjust cfi when ppx_p is ture.
(ix86_emit_restore_reg_using_pop2): Add ppx_p and parse to its
callee.
(ix86_emit_restore_regs_using_pop2): Likewise.
(ix86_expand_epilogue): Parse TARGET_APX_PPX to
ix86_emit_restore_reg_using_pop.
* config/i386/i386.h (TARGET_APX_PPX): New.
* config/i386/i386.md (UNSPEC_APX_PPX): New unspec.
(pushp_di): New define_insn.
(popp_di): Likewise.
(push2p_di): Likewise.
(pop2p_di): Likewise.
* config/i386/i386.opt: Add apx_ppx enum.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-interrupt-1.c: Adjust option to restrict them
under certain subfeatures.
* gcc.target/i386/apx-push2pop2-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.
* gcc.target/i386/apx-ppx-1.c: New test.
Richard Biener [Mon, 20 Nov 2023 14:16:44 +0000 (15:16 +0100)]
tree-optimization/111970 - fix issue with SLP of emulated gather/scatter
There's a missed index adjustment for the SLP vector number when
computing the index/data vectors for emulated gather/scatter with SLP.
The following fixes this.
PR tree-optimization/111970
* tree-vect-stmts.cc (vectorizable_load): Fix offset calculation
for SLP gather load.
(vectorizable_store): Likewise for SLP scatter store.
Xi Ruoyao [Tue, 21 Nov 2023 01:09:25 +0000 (09:09 +0800)]
LoongArch: Fix libgcc build failure when libc is not available
To use int64_t we included <stdint.h> in loongarch-def.h.
Unfortunately, loongarch-def.h is also used by libgcc etc., causing a
build failure when building a "stage1" cross compiler, at which point the
target libc is not built yet.
As int64_t is used as a C-compatible replacement for HOST_WIDE_INT, it's
not directly or indirectly referred to by the target libraries. So guard
everything requiring stdint.h with #if so that they won't block the target
libraries.
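A minimal sketch of the guarding pattern (the macro names used here to
detect a target-library build are assumptions, not necessarily those used
in the actual change):

#ifndef LOONGARCH_DEF_H
#define LOONGARCH_DEF_H

#if !defined (IN_LIBGCC2) && !defined (IN_TARGET_LIBS)
#include <stdint.h>
/* Host-only declarations that need int64_t as a HOST_WIDE_INT
   replacement go here.  */
#endif

#endif /* LOONGARCH_DEF_H */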
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (stdint.h): Guard with #if to
exclude it for target libraries.
(loongarch_isa_base_features): Likewise.
(loongarch_isa): Likewise.
(loongarch_abi): Likewise.
(loongarch_target): Likewise.
(loongarch_cpu_default_isa): Likewise.
Juzhe-Zhong [Tue, 21 Nov 2023 02:13:38 +0000 (10:13 +0800)]
RISC-V: Fix reduc_run-9.c test value check bug
The current test value check is incorrect, which is exposed with
-march=rv64gcv_zvl256b. Confirmed on x86, which also aborts:
[jzzhong@rios-cad121:/work/home/jzzhong/work/insn]$./a.out
------33.000000,4078.000000,45001776.000000,63369904.000000---
Aborted (core dumped)
Sebastian Huber [Mon, 20 Nov 2023 13:48:03 +0000 (14:48 +0100)]
gcov: Use unshare_expr() in gen_counter_update()
This fixes issues like this:
gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c: In function 'main':
gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c:19:1: error: incorrect sharing of tree nodes
__gcov0.main[0]
# .MEM_12 = VDEF <.MEM_9>
__gcov0.main[0] = PROF_edge_counter_4;
during IPA pass: profile
gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c:19:1: internal compiler error: verify_gimple failed
Unshare the counter expression in the second gimple_build_assign() in
gen_counter_update(). This is similar to the original
gimple_gen_edge_profiler() for "ref":
void
gimple_gen_edge_profiler (int edgeno, edge e)
{
  tree one;
Jan Hubicka [Mon, 20 Nov 2023 18:35:53 +0000 (19:35 +0100)]
inter-procedural value range propagation
Implement very basic propagation of return value ranges from the VRP
pass. This helps std::vector's push_back since we work out the value range
of the allocated block. This propagates only within a single translation
unit. I had hoped we would also do the propagation at the WPA stage, but
that needs more work on the ipa-cp side.
I also added code auto-detecting returns_nonnull and the corresponding
-Wsuggest-attribute.
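As an illustrative example (not taken from the patch itself) of what
intra-TU return-value range propagation enables:

/* f's VRP-computed return range is [1, 8]; once that range is recorded
   and consumed in the caller, the comparison in g folds to 1.  */
static int
f (int x)
{
  return (x & 7) + 1;
}

int
g (int x)
{
  return f (x) > 0;
}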
gcc/ChangeLog:
* cgraph.cc (add_detected_attribute_1): New function.
(cgraph_node::add_detected_attribute): Likewise.
* cgraph.h (cgraph_node::add_detected_attribute): Declare.
* common.opt: Add -Wsuggest-attribute=returns_nonnull.
* doc/invoke.texi: Document new flag.
* gimple-range-fold.cc (fold_using_range::range_of_call):
Use known return value ranges.
* ipa-prop.cc (struct ipa_return_value_summary): New type.
(class ipa_return_value_sum_t): New type.
(ipa_return_value_sum): New summary.
(ipa_record_return_value_range): New function.
(ipa_return_value_range): New function.
* ipa-prop.h (ipa_return_value_range): Declare.
(ipa_record_return_value_range): Declare.
* ipa-pure-const.cc (warn_function_returns_nonnull): New function.
* ipa-utils.h (warn_function_returns_nonnull): Declare.
* symbol-summary.h: Fix comment.
* tree-vrp.cc (execute_ranger_vrp): Record return values.
Richard Biener [Mon, 20 Nov 2023 10:12:43 +0000 (11:12 +0100)]
tree-optimization/112618 - unused .MASK_CALL
We have to make sure to remove unused .MASK_CALL internal function
calls after vectorization.
PR tree-optimization/112618
* tree-vect-loop.cc (vect_transform_loop_stmt): For not
relevant and unused .MASK_CALL make sure we remove the
scalar stmt.
Richard Biener [Mon, 20 Nov 2023 12:39:52 +0000 (13:39 +0100)]
tree-optimization/112281 - loop distribution and zero dependence distances
The following fixes an omission in dependence testing for loop
distribution. When the overall dependence distance is not zero but
the dependence direction in the innermost common loop is '=', there is
a conflict between the partitions and we have to merge them.
PR tree-optimization/112281
* tree-loop-distribution.cc
(loop_distribution::pg_add_dependence_edges): For = in the
innermost common loop record a partition conflict.
* gcc.dg/torture/pr112281-1.c: New testcase.
* gcc.dg/torture/pr112281-2.c: Likewise.
Richard Biener [Mon, 20 Nov 2023 10:29:59 +0000 (11:29 +0100)]
middle-end/112622 - convert and vector-to-float
The following avoids ICEing when trying to convert a vector to
a scalar float.
PR middle-end/112622
* convert.cc (convert_to_real_1): Use element_precision
where a vector type might appear. Provide specific
diagnostic for unexpected vector argument.
The root cause is that we don't enable V4SImode; instead, we already have RVVMF2SI, which is exactly the same as V4SI
on -march=rv32gcv_zvl256 + --param=riscv-autovec-preference=fixed-vlmax.
Mapping the VDEMODE attribute to V4SI is incorrect, so we remove the attribute and use get_vector_mode to get the
right mode.
Robin Dapp [Tue, 14 Nov 2023 13:11:09 +0000 (14:11 +0100)]
RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.
We currently allow 64-bit indices/offsets for vector indexed loads and
stores even on rv32 but we should not.
This patch adjusts the iterators as well as the insn conditions to
reflect the RVV spec.
It also fixes an oversight in the VLS modes of the demote iterator that
was found while testing the patch.
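For illustration, a loop of the kind affected (hypothetical, not one of
the testcases); with 64-bit offsets this must not become an indexed
(gather) load on rv32:
#include <stdint.h>
void
gather (int32_t *restrict dst, int32_t *restrict src,
        uint64_t *restrict idx, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[idx[i]];   /* 64-bit offsets: no indexed load on rv32.  */
}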
gcc/ChangeLog:
* config/riscv/riscv-v.cc (gather_scatter_valid_offset_mode_p):
Add check for XLEN == 32.
* config/riscv/vector-iterators.md: Change the VLS part of the
demote iterator to 2x-element modes.
* config/riscv/vector.md: Adjust iterators and insn conditions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-1.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-10.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-11.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-12.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-2.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-3.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-4.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-5.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-6.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-7.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-8.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_32-9.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c:
Adjust include.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-1.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-10.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-11.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-2.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-3.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-4.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-5.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-6.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-7.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-8.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-9.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c:
Adjust include.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-1.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-10.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-2.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-3.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-4.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-5.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-6.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-7.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-8.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-9.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c:
Adjust include.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-1.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-10.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-2.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-4.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-5.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-6.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-7.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-8.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_32-9.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-2.c: ...here.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c:
Adjust include.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-10.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-11.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-2.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-3.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-4.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-5.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-6.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-7.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-8.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-9.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-10.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-11.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-2.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-3.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-4.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-5.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-6.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-7.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-8.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_64-9.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-10.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-2.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-3.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-4.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-5.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-6.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-7.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-8.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_64-9.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-10.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-3.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-4.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-5.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-6.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-7.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-8.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_64-9.c: New test.
Christophe Lyon [Wed, 15 Nov 2023 08:12:35 +0000 (08:12 +0000)]
arm: [MVE intrinsics] Add support for contiguous loads and stores
This patch adds base support for load/store intrinsics to the
framework, starting with loads and stores for contiguous memory
elements, without extension or truncation.
Compared to the aarch64/SVE implementation, there's no support for
gather/scatter loads/stores yet. This will be added later as needed.
Christophe Lyon [Wed, 15 Nov 2023 07:50:57 +0000 (07:50 +0000)]
arm: Fix arm_simd_types and MVE scalar_types
So far we define arm_simd_types and scalar_types using type
definitions like intSI_type_node, etc...
This is causing problems with later patches which re-implement
load/store MVE intrinsics, leading to error messages such as:
error: passing argument 1 of 'vst1q_s32' from incompatible pointer type
note: expected 'int *' but argument is of type 'int32_t *' {aka 'long int *'}
This patch uses get_typenode_from_name (INT32_TYPE) instead, which
defines the types as appropriate for the target/C library.
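As a user-level illustration (not one of the new tests), this is the kind
of call the error message above comes from; with the scalar types derived
via get_typenode_from_name (INT32_TYPE) it compiles cleanly on targets
where int32_t is 'long int' (requires an MVE-enabled -march):
#include <arm_mve.h>
#include <stdint.h>
void
store_vec (int32_t *p, int32x4_t v)
{
  /* Previously diagnosed as passing 'int32_t *' where 'int *' was
     expected.  */
  vst1q_s32 (p, v);
}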
Juzhe-Zhong [Mon, 20 Nov 2023 10:39:10 +0000 (18:39 +0800)]
RISC-V Regression: Remove scalable compile option
Since we already enable scalable vectorization by default, this flag is redundant.
Also, we are starting full-coverage testing with different compile options,
e.g. --param=riscv-autovec-preference=fixed-vlmax.
Remove it to avoid compile-option confusion.
Jakub Jelinek [Mon, 20 Nov 2023 09:37:59 +0000 (10:37 +0100)]
c, c++: Add new value for vector types for __builtin_classify_type
While filing a clang request to return 18 on _BitInts for
__builtin_classify_type instead of the -1 they currently return, I've
noticed that we return -1 for vector types. Initially I wanted to change
the behavior just for the __builtin_classify_type (type) form, as that is new in
GCC 14 and we've returned -1 for 20+ years for __builtin_classify_type
on vector expressions, but I was convinced otherwise, so this changes
the behavior even for that form and now returns 19.
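A small C illustration of the new behavior (the value 19 is the one the
commit states for vectors; the type-name form is the one new in GCC 14):
typedef int v4si __attribute__ ((vector_size (16)));
int
classify (void)
{
  v4si v = {0, 0, 0, 0};
  /* Both forms now classify vectors as 19 instead of -1.  */
  return __builtin_classify_type (v) == 19
         && __builtin_classify_type (v4si) == 19;
}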
Robin Dapp [Fri, 17 Nov 2023 09:34:35 +0000 (10:34 +0100)]
vect: Add bool pattern handling for COND_OPs.
In order to handle masks properly for conditional operations, this patch
teaches vect_recog_mask_conversion_pattern to also handle them. Now we
convert e.g.
Jakub Jelinek [Mon, 20 Nov 2023 09:03:20 +0000 (10:03 +0100)]
tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693]
On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote:
> As a follow-up, I'm considering changing in this routine the popcount
> call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
Here is the follow-up which does the rtx costs testing.
2023-11-20 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with
result only used in equality comparison against 1 with direct optab
support as .POPCOUNT call with 2 arguments.
* internal-fn.h (expand_POPCOUNT): Declare.
* internal-fn.def (DEF_INTERNAL_INT_EXT_FN): New macro, document it,
undefine at the end.
(POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN.
* internal-fn.cc (DEF_INTERNAL_INT_EXT_FN): Define to nothing before
inclusion to define expanders.
(expand_POPCOUNT): New function.
Per the earlier discussions on this PR, the following patch folds
popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
if the corresponding popcount optab isn't implemented (I think any
double-word popcount or a library call will necessarily be slower than the
above cheap 3-op check, and even for -Os larger or the same size).
I've noticed e.g. C++ aligned new starts with std::has_single_bit
which does popcount (x) == 1.
As a follow-up, I'm considering changing in this routine the popcount
call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
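For illustration, the folded form written as plain C (unsigned arithmetic;
this is the source-level equivalent, not the GIMPLE the pass emits):
/* popcount (x) == 1 without a popcount instruction: for x == 0 both
   sides equal ~0u, so the test is false; for a power of two the XOR
   covers the single set bit and everything below it, exceeding x - 1;
   otherwise the XOR only covers the lowest set bit and the bits below
   it, so it stays below x - 1.  */
int
has_single_bit (unsigned int x)
{
  return (x ^ (x - 1)) > x - 1;
}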
2023-11-20 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): New function.
(math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR
and NE_EXPR assignments and GIMPLE_CONDs.