When generating PA 1.x code or code for GNU ld, floating-point
accesses only support 5-bit displacements but integer accesses
support 14-bit displacements. I mistakenly assumed reload
could fix an invalid 14-bit displacement in a floating-point
access but this is not the case.
2024-03-14 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/114288
* config/pa/pa.cc (pa_legitimate_address_p): Don't allow
14-bit displacements before reload for modes that may use
a floating-point load or store.
David Faust [Thu, 14 Mar 2024 16:05:38 +0000 (09:05 -0700)]
bpf: define INT8_TYPE as signed char
Change the BPF backend to define INT8_TYPE with an explicit sign, rather
than a plain char. This is in line with other targets and removes the
risk of int8_t being affected by the signedness of the plain char type
of the host system.
The motivation for this change is that even if `char' is defined to be
signed in BPF targets, some BPF programs use the (mal)practice of
including internal libc headers, either directly or indirectly via
kernel headers, which in turn may trigger compilation errors regarding
redefinitions of types.
gcc/
* config/bpf/bpf.h (INT8_TYPE): Change to signed char.
Max Filippov [Thu, 14 Mar 2024 11:20:36 +0000 (04:20 -0700)]
gcc: xtensa: reorder movsi_internal patterns for better code generation during LRA
After switching to LRA xtensa backend generates the following code for
saving/loading registers:
movi a9, 0x190
add a9, a9, sp
s32i.n a3, a9, 0
instead of the shorter and more efficient
s32i a3, a9, 0x190
E.g. the following code can be used to reproduce it:
int f1(int a, int b, int c, int d, int e, int f, int *p);
int f2(int a, int b, int c, int d, int e, int f, int *p);
int f3(int a, int b, int c, int d, int e, int f, int *p);
int foo(int a, int b, int c, int d, int e, int f)
{
int g[100];
return
f1(a, b, c, d, e, f, g) +
f2(a, b, c, d, e, f, g) +
f3(a, b, c, d, e, f, g);
}
This happens in the LRA pass because s32i.n and l32i.n are listed before
the s32i and l32i in the movsi_internal pattern and alternative
consideration loop stops early.
gcc/
* config/xtensa/xtensa.md (movsi_internal): Move l32i and s32i
patterns ahead of the l32i.n and s32i.n.
Jonathan Wakely [Mon, 26 Feb 2024 13:17:13 +0000 (13:17 +0000)]
libstdc++: Add nodiscard in <algorithm>
Add the [[nodiscard]] attribute to several functions in <algorithm>.
These all have no side effects and are only called for their return
value (e.g. std::count) or produce a result that must not be discarded
for correctness (e.g. std::remove).
I was intending to add the attribute to a number of other functions like
std::copy_if, std::unique_copy, std::set_union, and std::set_difference.
I stopped when I noticed that MSVC doesn't use it on those functions,
which I suspect is because they're often used with an insert iterator
(e.g. std::back_insert_iterator). In that case it doesn't matter if
you discard the result, because you have the container to tell you how
many elements were copied to the output range.
Jakub Jelinek [Thu, 14 Mar 2024 16:48:30 +0000 (17:48 +0100)]
icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]
AFAIK we have no code in LTO streaming to stream out or in
SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
and let vrp1 and alias analysis after IPA recompute that. There is
just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
we saved and propagated, but that is after streaming in bodies for the
post IPA optimizations.
Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
earlier in many cases (er.g. evrp and early alias analysis but other spots
too), but IPA ICF is ignoring the ranges and points-to details when
comparing the bodies. I think ignoring that is just fine, that is
effectively what we do for LTO where we throw that information away
before the analysis, and not ignoring it could lead to fewer ICF merging
possibilities.
So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
just isn't there on SSA_NAMEs in functions into which other functions have
been ICFed, and for non-LTO throws that information away (which matches the
LTO behavior).
Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
vector (just one of the 2) on successful sem_function::equals on the
sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
leader map to SSA_NAMEs in the other function) and use that vector
to union the ranges in sem_function::merge. I can implement that for
comparison, but wanted to post this first if there is an agreement on
doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
into account. I think we can compare SSA_NAME_RANGE_INFO, but have
no idea how to try to compare points to info. And I think it will result
in less effective ICF for non-LTO vs. LTO unnecessarily.
2024-03-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113907
* ipa-icf.cc (sem_item_optimizer::merge_classes): Reset
SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
functions.
Xi Ruoyao [Wed, 13 Mar 2024 12:44:38 +0000 (20:44 +0800)]
LoongArch: Remove unused and incorrect "sge<u>_<X:mode><GPR:mode>" define_insn
If this insn is really used, we'll have something like
slti $r4,$r0,$r5
in the code. The assembler will reject it because slti wants 2
register operands and 1 immediate operand. But we've not got any bug
report for this, indicating this define_insn is unused at all.
Note that do_store_flag (in expr.cc) is already converting x >= 1 to
x > 0 unconditionally, so this define_insn is indeed unused and we can
just remove it.
Jakub Jelinek [Thu, 14 Mar 2024 13:09:20 +0000 (14:09 +0100)]
aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]
The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap<mode> expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap<mode>_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.
So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.
2024-03-14 Jakub Jelinek <jakub@redhat.com>
PR target/114310
* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
TImode force newval into a register.
Juergen Christ [Wed, 25 Oct 2023 12:57:03 +0000 (14:57 +0200)]
s390: fix htm-builtins test cases
Transactional and non-transactional stores to the same cache line cause
transactions to abort on newer generations. Add sufficient padding to make
sure another cache line is used.
Lewis Hyatt [Tue, 12 Dec 2023 22:46:36 +0000 (17:46 -0500)]
libcpp: Fix macro expansion for argument of __has_include [PR110558]
When the file name for a #include directive is the result of stringifying a
macro argument, libcpp needs to take some care to get the whitespace
correct; in particular stringify_arg() needs to see a CPP_PADDING token
between macro tokens so that it can figure out when to output space between
tokens. The CPP_PADDING tokens are not normally generated when handling a
preprocessor directive, but for #include-like directives, libcpp sets the
state variable pfile->state.directive_wants_padding to TRUE so that the
CPP_PADDING tokens will be output, and then everything works fine for
computed includes.
As the PR points out, things do not work fine for __has_include. Fix that by
setting the state variable the same as is done for #include.
libcpp/ChangeLog:
PR preprocessor/110558
* macro.cc (builtin_has_include): Set
pfile->state.directive_wants_padding prior to lexing the
file name, in case it comes from macro expansion.
gcc/testsuite/ChangeLog:
PR preprocessor/110558
* c-c++-common/cpp/has-include-2.c: New test.
* c-c++-common/cpp/has-include-2.h: New test.
Lewis Hyatt [Wed, 20 Dec 2023 21:27:42 +0000 (16:27 -0500)]
libcpp: Fix __has_include_next ICE in the last directory of the path [PR80755]
In libcpp/files.cc, the function _cpp_has_header(), which implements
__has_include and __has_include_next, does not check for a NULL return value
from search_path_head(), leading to an ICE tripping an assert when
_cpp_find_file() tries to use it. Fix it by checking for that case and
silently returning false instead.
As suggested by the PR author, it is easiest to make a testcase by using
the -idirafter option. To enable that, also modify the dg-additional-options
testsuite procedure to make the global $srcdir available, since -idirafter
requires the full path.
libcpp/ChangeLog:
PR preprocessor/80755
* files.cc (search_path_head): Add SUPPRESS_DIAGNOSTIC argument
defaulting to false.
(_cpp_has_header): Silently return false if the search path has been
exhausted, rather than issuing a diagnostic and then hitting an
assert.
gcc/testsuite/ChangeLog:
* lib/gcc-defs.exp (dg-additional-options): Make $srcdir usable in a
dg-additional-options directive.
* c-c++-common/cpp/has-include-next-2-dir/has-include-next-2.h: New test.
* c-c++-common/cpp/has-include-next-2.c: New test.
Gaius Mulley [Thu, 14 Mar 2024 11:23:42 +0000 (11:23 +0000)]
PR modula2/114333 set type comparison against a cardinal should cause an error
The type checker M2Check.mod needs extending to detect if a set, array or
record is in either operand at the end of the cascaded test list.
gcc/m2/ChangeLog:
PR modula2/114333
* gm2-compiler/M2Check.mod (checkUnbounded): New procedure
function.
(checkArrayTypeEquivalence): Extend checking to cover unbounded
arrays, arrays and constants.
(IsTyped): Simplified the expression and corrected a test for
IsConstructor.
(checkTypeKindViolation): New procedure function.
(doCheckPair): Call checkTypeKindViolation.
* gm2-compiler/M2GenGCC.mod (CodeStatement): Remove parameters
to CodeEqu and CodeNotEqu.
(PerformCodeIfEqu): New procedure.
(CodeIfEqu): Rewrite.
(PerformCodeIfNotEqu): New procedure.
(CodeIfNotEqu): Rewrite.
* gm2-compiler/M2Quads.mod (BuildRelOpFromBoolean): Correct
comment.
gcc/testsuite/ChangeLog:
PR modula2/114333
* gm2/cse/pass/testcse54.mod: New test.
* gm2/iso/run/pass/array9.mod: New test.
* gm2/iso/run/pass/strcons3.mod: New test.
* gm2/iso/run/pass/strcons4.mod: New test.
* gm2/pim/fail/badset1.mod: New test.
* gm2/pim/fail/badset2.mod: New test.
* gm2/pim/fail/badset3.mod: New test.
* gm2/pim/fail/badset4.mod: New test.
Chung-Lin Tang [Thu, 14 Mar 2024 10:39:52 +0000 (10:39 +0000)]
OpenACC 2.7: front-end support for readonly modifier
This patch implements the front-end support for the 'readonly' modifier for the
OpenACC 'copyin' clause and 'cache' directive.
This currently only includes front-end parsing for C/C++/Fortran and setting of
new bits OMP_CLAUSE_MAP_READONLY, OMP_CLAUSE__CACHE__READONLY. Further linking
of these bits to points-to analysis and/or utilization of read-only memory in
accelerator target are for later patches.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_oacc_data_clause): Add parsing support for
'readonly' modifier, set OMP_CLAUSE_MAP_READONLY if readonly modifier
found, update comments.
(c_parser_oacc_cache): Add parsing support for 'readonly' modifier,
set OMP_CLAUSE__CACHE__READONLY if readonly modifier found, update
comments.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_oacc_data_clause): Add parsing support for
'readonly' modifier, set OMP_CLAUSE_MAP_READONLY if readonly modifier
found, update comments.
(cp_parser_oacc_cache): Add parsing support for 'readonly' modifier,
set OMP_CLAUSE__CACHE__READONLY if readonly modifier found, update
comments.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_namelist): Print "readonly," for
OMP_LIST_MAP and OMP_LIST_CACHE if n->u.map.readonly is set.
Adjust 'n->u.map_op' to 'n->u.map.op'.
* gfortran.h (typedef struct gfc_omp_namelist): Adjust map_op as
'ENUM_BITFIELD (gfc_omp_map_op) op:8', add 'bool readonly' field,
change to named struct field 'map'.
* openmp.cc (gfc_match_omp_map_clause): Adjust 'n->u.map_op' to
'n->u.map.op'.
(gfc_match_omp_clause_reduction): Likewise.
(gfc_match_omp_clauses): Add readonly modifier parsing for OpenACC
copyin clause, set 'n->u.map.op' and 'n->u.map.readonly' for parsed
clause. Adjust 'n->u.map_op' to 'n->u.map.op'.
(gfc_match_oacc_declare): Adjust 'n->u.map_op' to 'n->u.map.op'.
(gfc_match_oacc_cache): Add readonly modifier parsing for OpenACC
cache directive.
(resolve_omp_clauses): Adjust 'n->u.map_op' to 'n->u.map.op'.
* trans-decl.cc (add_clause): Adjust 'n->u.map_op' to 'n->u.map.op'.
(finish_oacc_declare): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_MAP_READONLY,
OMP_CLAUSE__CACHE__READONLY to 1 when readonly is set. Adjust
'n->u.map_op' to 'n->u.map.op'.
(gfc_add_clause_implicitly): Adjust 'n->u.map_op' to 'n->u.map.op'.
gcc/ChangeLog:
* tree.h (OMP_CLAUSE_MAP_READONLY): New macro.
(OMP_CLAUSE__CACHE__READONLY): New macro.
* tree-core.h (struct GTY(()) tree_base): Adjust comments for new
uses of readonly_flag bit in OMP_CLAUSE_MAP_READONLY and
OMP_CLAUSE__CACHE__READONLY.
* tree-pretty-print.cc (dump_omp_clause): Add support for printing
OMP_CLAUSE_MAP_READONLY and OMP_CLAUSE__CACHE__READONLY.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/readonly-1.c: New test.
* gfortran.dg/goacc/readonly-1.f90: New test.
Andreas Krebbel [Thu, 14 Mar 2024 08:54:31 +0000 (09:54 +0100)]
IBM Z: Fix -munaligned-symbols
With this fix we make sure that only symbols with a natural alignment
smaller than 2 are considered misaligned with
-munaligned-symbols. Background is that -munaligned-symbols is only
supposed to affect symbols whose natural alignment wouldn't be enough
to fulfill our ABI requirement of having all symbols at even
addresses. Because only these are the cases where we differ from other
architectures.
gcc/ChangeLog:
* config/s390/s390.cc (s390_encode_section_info): Adjust the check
for misaligned symbols.
* config/s390/s390.opt: Improve documentation.
gcc/testsuite/ChangeLog:
* gcc.target/s390/aligned-1.c: Add weak and void variables
incorporating the cases from unaligned-2.c.
* gcc.target/s390/unaligned-1.c: Likewise.
* gcc.target/s390/unaligned-2.c: Removed.
Jakub Jelinek [Thu, 14 Mar 2024 08:57:13 +0000 (09:57 +0100)]
gimple-iterator: Some gsi_safe_insert_*before fixes
When trying to use the gsi_safe_insert*before APIs in bitint lowering,
I've discovered 3 issues and the following patch addresses those:
1) both split_block and split_edge update CDI_DOMINATORS if they are
available, but because edge_before_returns_twice_call first splits
and then adds an extra EDGE_ABNORMAL edge and then removes another
one, the immediate dominators of both the new bb and the bb with
returns_twice call need to change
2) the new EDGE_ABNORMAL edge had uninitialized probability; this patch
copies the probability from the edge that is going to be removed
and similarly copies other flags (EDGE_EXECUTABLE, EDGE_DFS_BACK,
EDGE_IRREDUCIBLE_LOOP etc.)
3) if edge_before_returns_twice_call splits a block, then the bb with
returns_twice call changes, so the gimple_stmt_iterator for it is
no longer accurate, it points to the right statement, but gsi_bb
and gsi_seq are no longer correct; the patch updates it
2024-03-14 Jakub Jelinek <jakub@redhat.com>
* gimple-iterator.cc (edge_before_returns_twice_call): Copy all
flags and probability from ad_edge to e edge. If CDI_DOMINATORS
are computed, recompute immediate dominator of other_edge->src
and other_edge->dest.
(gsi_safe_insert_before, gsi_safe_insert_seq_before): Update *iter
for the returns_twice call case to the gsi_for_stmt (stmt) to deal
with update it for bb splitting.
we must copy the REG_EH_REGION note to the first insn and split the block
after the newly added insn. The REG_EH_REGION on the second insn will be
removed later since it no longer traps.
Jonathan Wakely [Wed, 13 Mar 2024 10:02:12 +0000 (10:02 +0000)]
libstdc++: Move test error_category to global scope
A recent GDB change causes this test to fail due to missing RTTI for the
custom_cast type. This is presumably because the custom_cat type was
defined as a local class, so has no linkage. Moving it to local scope
seems to fix the test regressions, and probably makes the test more
realistic as a local class with no linkage isn't practical to use as an
error category that almost certainly needs to be referred to in other
scopes.
libstdc++-v3/ChangeLog:
* testsuite/libstdc++-prettyprinters/cxx11.cc: Move custom_cat
to namespace scope.
Jonathan Wakely [Thu, 7 Mar 2024 15:19:12 +0000 (15:19 +0000)]
libstdc++: Improve documentation on debugging with libstdc++
libstdc++-v3/ChangeLog:
* doc/xml/manual/debug.xml: Improve docs on debug builds and
using ASan. Mention _GLIBCXX_ASSERTIONS. Reorder sections to put
the most relevant ones first.
* doc/xml/manual/using.xml: Add comma.
* doc/html/*: Regenerate.
Jonathan Wakely [Thu, 7 Mar 2024 12:00:55 +0000 (12:00 +0000)]
libstdc++: Document that _GLIBCXX_CONCEPT_CHECKS might be removed in future
The macro-based concept checks are unmaintained and do not support C++11
or later, so reject valid code. If nobody plans to update them we should
consider removing them. Alternatively, we could ignore the macro for
C++11 and later, so they have no effect and don't reject valid code.
libstdc++-v3/ChangeLog:
* doc/xml/manual/debug.xml: Document that concept checking might
be removed in future.
* doc/xml/manual/extensions.xml: Likewise.
Jakub Jelinek [Wed, 13 Mar 2024 14:34:59 +0000 (15:34 +0100)]
store-merging: Match bswap64 on 32-bit targets with bswapsi2 [PR114319]
gimple-ssa-store-merging.cc tests bswap_optab in 3 different places,
in 2 of them it has special exception for double-word bswap using pair
of word-mode bswap optabs, but in the last one it doesn't.
The following patch changes even the last spot.
We don't handle 128-bit bswaps in the passes at all, because currently we
just use uint64_t to represent the byte reshuffling (we'd need to use
offset_int or something like that instead) and we don't have
__builtin_bswap128 nor type-generic __builtin_bswap, so there is nothing
for 64-bit targets there.
2024-03-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114319
* gimple-ssa-store-merging.cc
(imm_store_chain_info::try_coalesce_bswap): For 32-bit targets
allow matching __builtin_bswap64 if there is bswapsi2 optab.
On arm-none-eabi, the test case fails with below warning on GCC13
.../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:63:65: warning: converting a packed 'enum obj_type' pointer (alignment 1) to a 'struct connection' pointer (alignment 4) may result in an unaligned pointer value [-Waddress-of-packed-member]
Add a dg-bogus to ensure that the warning is not reintroduced.
gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Added dg-bogus with target on offending line for short_enums.
Starting with r14-2047-gd0e891406b16dc two SI mode tests are optimized
into DI mode. Thus, the scan-assembler directives fail. For example
RTL expression
Fixed by moving operands into memory in order to enforce SI mode
computation.
Furthermore, in r9-6056-g290dfd9bc7bea2 the starting bit position of the
scan-assembler directive for rosbg was incorrectly set to 32 which
actually should be 32+SHIFT_AMOUNT, i.e., in this particular case 34.
gcc/testsuite/ChangeLog:
* gcc.target/s390/md/rXsbg_mode_sXl.c: Fix tests rosbg_si_srl
and rxsbg_si_srl.
Similar as to s390_lcbb, s390_vll, s390_vstl, et al. make use of a
signed vector type for vlbb. Furthermore, a const void pointer seems
more common and an integer for the mask.
For s390_vfi(s,d)b make use of integers for masks, too.
Use unsigned integers for all s390_vlbr/vstbr variants.
Make use of type UV16QI for the length operand of s390_vstrs(,z)(h,f).
Following the Principles of Operation, change from signed to unsigned
type for s390_va(c,cc,ccc)q and s390_vs(,c,bc)biq and s390_vmslg.
Make use of scalar type UINT128 instead of UV16QI for s390_vgfm(,a)g,
and s390_vsumq(f,g).
gcc/ChangeLog:
* config/s390/s390-builtin-types.def: Update to reflect latest
changes.
* config/s390/s390-builtins.def: Streamline vector builtins with
LLVM.
Jakub Jelinek [Wed, 13 Mar 2024 09:19:04 +0000 (10:19 +0100)]
bitint: Fix up lowering of bitfield loads/stores [PR114313]
The following testcase ICEs, because for large/huge _BitInt bitfield
loads/stores we use the DECL_BIT_FIELD_REPRESENTATIVE as the underlying
"var" and indexes into it can be larger than the precision of the
bitfield might normally allow.
The following patch fixes that by passing NULL_TREE type in that case
to limb_access, so that we always return m_limb_type type and don't
do the extra assertions, after all, the callers expect that too.
I had to add the first hunk to avoid ICE, it was using type in one place
even when it was NULL. But TYPE_SIZE (TREE_TYPE (var)) seems like the
right size to use anyway because the code uses VIEW_CONVERT_EXPR on it.
2024-03-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114313
* gimple-lower-bitint.cc (bitint_large_huge::limb_access): Use
TYPE_SIZE of TREE_TYPE (var) rather than TYPE_SIZE of type.
(bitint_large_huge::handle_load): Pass NULL_TREE rather than
rhs_type to limb_access for the bitfield load cases.
(bitint_large_huge::lower_mergeable_stmt): Pass NULL_TREE rather than
lhs_type to limb_access if nlhs is non-NULL.
Tobias Burnus [Wed, 13 Mar 2024 08:35:28 +0000 (09:35 +0100)]
OpenMP/Fortran: Fix defaultmap(none) issue with dummy procedures [PR114283]
Dummy procedures look similar to variables but aren't - neither in Fortran
nor in OpenMP. As the middle end sees PARM_DECLs, mark them as predetermined
firstprivate for mapping (as already done in gfc_omp_predetermined_sharing).
This does not address the isses related to procedure pointers, which are
still discussed on spec level [see PR].
PR fortran/114283
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_omp_predetermined_mapping): Map dummy
procedures as firstprivate.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/declare-target-indirect-4.f90: New test.
Jakub Jelinek [Wed, 13 Mar 2024 08:19:05 +0000 (09:19 +0100)]
asan: Fix ICE during instrumentation of returns_twice calls [PR112709]
The following patch on top of the previously posted ubsan/gimple-iterator
one handles asan the same. While the case of returning by hidden reference
is handled differently because of the first recently posted asan patch,
this deals with instrumentation of the aggregates returned in registers
case as well as instrumentation of loads from aggregate memory in the
function arguments of returns_twice calls.
2024-03-13 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/112709
* asan.cc (maybe_create_ssa_name, maybe_cast_to_ptrmode,
build_check_stmt, maybe_instrument_call, asan_expand_mark_ifn): Use
gsi_safe_insert_before instead of gsi_insert_before.
Jakub Jelinek [Wed, 13 Mar 2024 08:16:45 +0000 (09:16 +0100)]
gimple-iterator, ubsan: Fix ICE during instrumentation of returns_twice calls [PR112709]
ubsan, asan (both PR112709) and _BitInt lowering (PR113466) want to
insert some instrumentation or adjustment statements before some statement.
This unfortunately creates invalid IL if inserting before a returns_twice
call, because we require that such calls are the first statement in a basic
block and that we have an edge from the .ABNORMAL_DISPATCHER block to
the block containing the returns_twice call (in addition to other edge(s)).
The following patch adds helper functions for such insertions and uses it
for now in ubsan (I'll post a follow up which uses it in asan and will
work later on the _BitInt lowering PR).
In particular, if the bb with returns_twice call at the start has just
2 edges, one EDGE_ABNORMAL from .ABNORMAL_DISPATCHER and another
(non-EDGE_ABNORMAL/EDGE_EH) from some other bb, it just inserts the
statement or sequence on that other edge.
If the bb has more predecessor edges or the one not from
.ABNORMAL_DISPATCHER is e.g. an EH edge (this latter case likely shouldn't
happen, one would need labels or something like that), the patch splits the
block with returns_twice call such that there is just one edge next to
.ABNORMAL_DISPATCHER edge and adjusts PHIs as needed to make it happen.
The functions also replace uses of PHIs from the returns_twice bb with
the corresponding PHI arguments, because otherwise it would be invalid IL.
E.g. in ubsan/pr112709-2.c (qux) we have before the ubsan pass
<bb 10> :
# .MEM_5(ab) = PHI <.MEM_4(9), .MEM_25(ab)(11)>
# _7(ab) = PHI <_20(9), _8(ab)(11)>
# .MEM_21(ab) = VDEF <.MEM_5(ab)>
_22 = bar (*_7(ab));
where bar is returns_twice call and bb 11 has .ABNORMAL_DISPATCHER call,
this patch instruments it like:
<bb 9> :
# .MEM_4 = PHI <.MEM_17(ab)(4), .MEM_10(D)(5), .MEM_14(ab)(8)>
# DEBUG BEGIN_STMT
# VUSE <.MEM_4>
_20 = p;
# .MEM_27 = VDEF <.MEM_4>
.UBSAN_NULL (_20, 0B, 0);
# VUSE <.MEM_27>
_2 = __builtin_dynamic_object_size (_20, 0);
# .MEM_28 = VDEF <.MEM_27>
.UBSAN_OBJECT_SIZE (_20, 1024, _2, 0);
<bb 10> :
# .MEM_5(ab) = PHI <.MEM_28(9), .MEM_25(ab)(11)>
# _7(ab) = PHI <_20(9), _8(ab)(11)>
# .MEM_21(ab) = VDEF <.MEM_5(ab)>
_22 = bar (*_7(ab));
The edge from .ABNORMAL_DISPATCHER is there just to represent the
returning for 2nd and later times, the instrumentation can't be
done at that point as there is no code executed during that point.
The ubsan/pr112709-1.c testcase includes non-virtual PHIs to cover
the handling of those as well.
2024-03-13 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/112709
* gimple-iterator.h (gsi_safe_insert_before,
gsi_safe_insert_seq_before): Declare.
* gimple-iterator.cc: Include gimplify.h.
(edge_before_returns_twice_call, adjust_before_returns_twice_call,
gsi_safe_insert_before, gsi_safe_insert_seq_before): New functions.
* ubsan.cc (instrument_mem_ref, instrument_pointer_overflow,
instrument_nonnull_arg, instrument_nonnull_return): Use
gsi_safe_insert_before instead of gsi_insert_before.
(maybe_instrument_pointer_overflow): Use force_gimple_operand,
gimple_seq_add_seq_without_update and gsi_safe_insert_seq_before
instead of force_gimple_operand_gsi.
(instrument_object_size): Likewise. Use gsi_safe_insert_before
instead of gsi_insert_before.
* gcc.dg/ubsan/pr112709-1.c: New test.
* gcc.dg/ubsan/pr112709-2.c: New test.
Harald Anlauf [Mon, 11 Mar 2024 21:05:51 +0000 (22:05 +0100)]
Fortran: handle procedure pointer component in DT array [PR110826]
gcc/fortran/ChangeLog:
PR fortran/110826
* array.cc (gfc_array_dimen_size): When walking the ref chain of an
array and the ultimate component is a procedure pointer, do not try
to figure out its dimension even if it is a array-valued function.
gcc/testsuite/ChangeLog:
PR fortran/110826
* gfortran.dg/proc_ptr_comp_53.f90: New test.
Tobias Burnus [Tue, 12 Mar 2024 14:42:50 +0000 (15:42 +0100)]
libgomp/libgomp.texi: Fix @node order in @menu
While texinfo 7.0.3 does not warn, an older texinfo did complain about:
libgomp.texi:1964: warning: node next `omp_target_memcpy' in menu
`omp_target_memcpy_rect' and in sectioning `omp_target_memcpy_async' differ
libgomp/
* libgomp.texi (Device Memory Routines): Swap item order to match
the order of the '@node's of the '@subsection's.
Richard Biener [Tue, 12 Mar 2024 13:00:05 +0000 (14:00 +0100)]
tree-optimization/114121 - chrec_fold_{plus,multiply} and recursion
The following addresses endless recursion in the
chrec_fold_{plus,multiply} functions when handling sign-conversions.
We only need to apply tricks when we'd fail (there's a chrec in the
converted operand) and we need to make sure to not turn the other
operand into something worse (for the chrec-vs-chrec case).
Nathaniel Shead [Sun, 10 Mar 2024 11:06:18 +0000 (22:06 +1100)]
c++: Support target-specific nodes when streaming modules [PR111224]
Some targets make use of POLY_INT_CSTs and other custom builtin types,
which currently violate some assumptions when streaming. This patch adds
support for them, such as types like Aarch64 __fp16, PowerPC __ibm128,
and vector types thereof.
This patch doesn't provide "full" support of AArch64 SVE, however, since
for that we would need to support 'target' nodes (tracked in PR108080).
Adding the new builtin types means that on Aarch64 we now have 217
global trees created on initialisation (up from 191), so this patch also
slightly bumps the initial size of the fixed_trees allocation to 250.
PR c++/98645
PR c++/98688
PR c++/111224
gcc/cp/ChangeLog:
* module.cc (enum tree_tag): Add new tag for builtin types.
(trees_out::start): POLY_INT_CSTs can be emitted.
(trees_in::start): Likewise.
(trees_out::core_vals): Stream POLY_INT_CSTs.
(trees_in::core_vals): Likewise.
(trees_out::type_node): Handle vectors with multiple coeffs.
(trees_in::tree_node): Likewise.
(init_modules): Register target-specific builtin types. Bump
initial capacity slightly.
gcc/testsuite/ChangeLog:
* g++.dg/modules/target-aarch64-1_a.C: New test.
* g++.dg/modules/target-aarch64-1_b.C: New test.
* g++.dg/modules/target-powerpc-1_a.C: New test.
* g++.dg/modules/target-powerpc-1_b.C: New test.
* g++.dg/modules/target-powerpc-2_a.C: New test.
* g++.dg/modules/target-powerpc-2_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
Jakub Jelinek [Tue, 12 Mar 2024 10:34:50 +0000 (11:34 +0100)]
asan: Instrument <retval> stores in callees rather than callers [PR112709]
asan currently instruments since PR69276 r6-6758 fix calls which store
the return value into memory on the caller side, before the call it
verifies the memory is writable.
Now PR112709 where we ICE on trying to instrument such calls made me
think about whether that is what we want to do.
There are 3 different cases.
One is when a function returns an aggregate which is passed e.g. in
registers, say like struct S { int a[4]; }; returning on x86_64.
That would be ideally instrumented in between the actual call and
storing of the aggregate into memory, but asan currently mostly
works as a GIMPLE pass and arranging for the instrumentation to happen
at that spot would be really hard. We could diagnose after the call
but generally asan attempts to diagnose stuff before something is
overwritten rather than after, or keep the current behavior (that is
what this patch does, which has the disadvantage that it can complain
about UB even for functions which never return and so never actually store,
and doesn't check whether the memory wasn't e.g. poisoned during the call)
or could e.g. instrument both before and after the call (that would have
the disadvantage the current state has but at least would check post-factum
the store again afterwards).
Another case is when a function returns an aggregate through a hidden
reference, struct T { int a[128]; }; on x86_64 or even the above struct S
on ia32 as example. In the actual program such stores happen when storing
something to <retval> or its parts in the callee, because <retval> there
expands to *hidden_retval. So, IMHO we should instrument those in the
callee rather than caller, that is where the writes are and we can do that
easily. This is what the patch below does.
And the last case is for builtins/internal functions. Usually those don't
return aggregates, but in case they'd do and can be expanded inline, it is
better to instrument them in the caller (as before) rather than not
instrumenting the return stores at all.
I had to tweak the expected output on the PR69276 testcase, because
with the patch it keeps previous behavior on x86_64 (structure returned
in registers, stored in the caller, so reported as UB in A::A()),
while on i686 it changed the behavior and is reported as UB in the
vnull::operator vec which stores the structure, A::A() is then a frame
above it in the backtrace.
2024-03-12 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/112709
* asan.cc (has_stmt_been_instrumented_p): Don't instrument call
stores on the caller side unless it is a call to a builtin or
internal function or function doesn't return by hidden reference.
(maybe_instrument_call): Likewise.
(instrument_derefs): Instrument stores to RESULT_DECL if
returning by hidden reference.
* gcc.dg/asan/pr112709-1.c: New test.
* g++.dg/asan/pr69276.C: Adjust expected output for some targets.
Jakub Jelinek [Tue, 12 Mar 2024 09:23:19 +0000 (10:23 +0100)]
strlen: Fix another spot that can create invalid ranges [PR114293]
This PR is similar to PR110603 fixed with r14-8487, except in a different
spot. From the memset with -1 size of non-zero value we determine minimum
of (size_t) -1 and the code uses PTRDIFF_MAX - 2 (not really sure I
understand why it is - 2 and not - 1, e.g. heap allocated array
with PTRDIFF_MAX char elements which contain '\0' in the last element
should be fine, no? One can still represent arr[PTRDIFF_MAX] - arr[0]
and arr[0] - arr[PTRDIFF_MAX] in ptrdiff_t and
strlen (arr) == PTRDIFF_MAX - 1) as the maximum, so again invalid range.
As in the other case, it is just UB that can lead to that, and we have
choice to only keep the min and use +inf for max, or only keep max
and use 0 for min, or not set the range at all, or use [min, min] or
[max, max] etc. The following patch uses [min, +inf].
2024-03-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114293
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_strlen): If
max is smaller than min, set max to ~(size_t)0.
Richard Biener [Mon, 11 Mar 2024 13:58:57 +0000 (14:58 +0100)]
tree-optimization/114297 - SLP reduction with early break fix
The following makes sure to pass in the SLP node for the live stmts
we are generating the reduction epilogue for to
vect_create_epilog_for_reduction. This follows the previous fix for
the non-SLP path.
PR tree-optimization/114297
* tree-vect-loop.cc (vectorizable_live_operation): Pass in the
live stmts SLP node to vect_create_epilog_for_reduction.
* gcc.dg/vect/vect-early-break_123-pr114297.c: New testcase.
Andrew Pinski [Tue, 12 Mar 2024 00:40:08 +0000 (17:40 -0700)]
Reject -fno-multiflags [PR114314]
When -fmultiflags option support was added in r13-3693-g6b1a2474f9e422,
it accidently allowed -fno-multiflags which then would pass on to cc1.
This fixes that oversight.
Committed as obvious after bootstrap/test on x86_64-linux-gnu.
Gaius Mulley [Mon, 11 Mar 2024 15:21:42 +0000 (15:21 +0000)]
PR modula2/114295 Incorrect location if compiling implementation without definition
This patch fixes a bug which occurred if gm2 was asked to compile an
implementation module and could not find the definition module. The error
location would be set to the SYSTEM module. The bug occurred as the
module sym was created during the peep phase after which the few tokens are
destroyed and recreated during parsing. The bug fix is to call
PutDeclared when the module is encountered during parsing which updates
the tokenno associated with the module.
gcc/m2/ChangeLog:
PR modula2/114295
* gm2-compiler/M2Batch.mod (MakeProgramSource): Call PutDeclared
if the module is known.
(MakeDefinitionSource): Ditto.
(MakeImplementationSource): Ditto.
* gm2-compiler/M2Comp.mod (ExamineHeader): New procedure.
(ExamineCompilationUnit): Rewrite.
(PeepInto): Rewrite.
* gm2-compiler/M2Error.mod (NewError): Remove default call to
GetTokenNo.
* gm2-compiler/M2Quads.mod (callRequestDependant): Push tokno with
Adr.
(BuildStringAdrParam): Ditto.
(doBuildBinaryOp): Push OperatorPos on the bool stack.
(BuildRelOp): Ditto.
* gm2-compiler/P2Build.bnf (SetType): Pass set token pos to
BuildSetType.
(PointerType): Pass pointer token pos to BuildPointerType.
* gm2-compiler/P2SymBuild.def (BuildPointerType): Add parameter
pointerpos.
(BuildSetType): Add parameter setpos.
* gm2-compiler/P2SymBuild.mod (BuildPointerType): Add parameter
pointerpos. Build combined token and use it when creating a
pointer type.
(BuildSetType): Add parameter setpos. Build combined token and
use it when creating a set type.
* gm2-compiler/SymbolTable.mod (DebugUnknownToken): New constant.
(CheckTok): New procedure function.
(MakeProcedure): Call CheckTok.
(MakeRecord): Ditto.
(MakeVarient): Ditto.
(MakeEnumeration): Ditto.
(MakeHiddenType): Ditto.
(MakeConstant): Ditto.
(MakeConstStringCnul): Ditto.
(MakeSubrange): Ditto.
(MakeTemporary): Ditto.
(MakeVariableForParam): Ditto.
(MakeParameterHeapVar): Ditto.
(MakePointer): Ditto.
(MakeSet): Ditto.
(MakeUnbounded): Ditto.
(MakeProcType): Ditto.
Szabolcs Nagy [Mon, 19 Jun 2023 11:56:41 +0000 (12:56 +0100)]
aarch64,arm: Move branch-protection data to targets
The branch-protection types are target specific, not the same on arm
and aarch64. This currently affects pac-ret+b-key, but there will be
a new type on aarch64 that is not relevant for arm.
After the move, change aarch_ identifiers to aarch64_ or arm_ as
appropriate.
Refactor aarch_validate_mbranch_protection to take the target specific
branch-protection types as an argument.
In case of invalid input currently no hints are provided: the way
branch-protection types and subtypes can be mixed makes it difficult
without causing confusion.
Richard Biener [Mon, 11 Mar 2024 08:35:07 +0000 (09:35 +0100)]
middle-end/114299 - missing error recovery from gimplify failure
When internal_get_tmp_var fails to gimplify the value the temporary
SSA name is supposed to be initialized with we can leak SSA names
with a NULL SSA_NAME_DEF_STMT into the IL. That's bad, so recover
from this by instead returning a decl in that case.
PR middle-end/114299
* gimplify.cc (internal_get_tmp_var): When gimplification
of VAL failed, return a decl.
Jakub Jelinek [Mon, 11 Mar 2024 10:00:54 +0000 (11:00 +0100)]
bitint: Avoid rewriting large/huge _BitInt vars into SSA after bitint lowering [PR114278]
The following testcase ICEs, because update-address-taken subpass of
fre5 rewrites
_BitInt(128) b;
vector(16) unsigned char _3;
<bb 2> [local count: 1073741824]:
_3 = MEM <vector(16) unsigned char> [(char * {ref-all})p_2(D)];
MEM <vector(16) unsigned char> [(char * {ref-all})&b] = _3;
b ={v} {CLOBBER(eos)};
to
_BitInt(128) b;
vector(16) unsigned char _3;
<bb 2> [local count: 1073741824]:
_3 = MEM <vector(16) unsigned char> [(char * {ref-all})p_2(D)];
b_5 = VIEW_CONVERT_EXPR<_BitInt(128)>(_3);
but we can't have large/huge _BitInt vars in SSA form after the bitint
lowering except for function arguments loaded from memory, as expansion
isn't able to deal with those, it relies on bitint lowering to lower
those operations.
The following patch fixes that by setting DECL_NOT_GIMPLE_REG_P for
large/huge _BitInt vars after bitint lowering, such that we don't
rewrite them into SSA form.
2024-03-11 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114278
* tree-ssa.cc (maybe_optimize_var): If large/huge _BitInt vars are no
longer addressable, set DECL_NOT_GIMPLE_REG_P on them.
Eric Botcazou [Mon, 11 Mar 2024 08:24:50 +0000 (09:24 +0100)]
Fix placement of recently implemented DIE
It's the DIE added for enumeration types with reverse scalar storage order.
gcc/
PR debug/113519
PR debug/113777
* dwarf2out.cc (gen_enumeration_type_die): In the reverse case,
generate the DIE with the same parent as in the regular case.
gcc/testsuite/
* gcc.dg/sso-20.c: New test.
* gcc.dg/sso-21.c: Likewise.
Andrew Pinski [Sun, 10 Mar 2024 22:17:09 +0000 (22:17 +0000)]
Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]
The problem here is that merge_truthop_with_opposite_arm would
use the type of the result of the comparison rather than the operands
of the comparison to figure out if we are honoring NaNs.
This fixes that oversight and now we get the correct results in this
case.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR middle-end/95351
gcc/ChangeLog:
* fold-const.cc (merge_truthop_with_opposite_arm): Use
the type of the operands of the comparison and not the type
of the comparison.
gcc/testsuite/ChangeLog:
* gcc.dg/float_opposite_arm-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Iain Buclaw [Sun, 10 Mar 2024 16:49:06 +0000 (17:49 +0100)]
d: Fix -fpreview=in ICEs with forward referenced parameter [PR112285]
The way that the target hook preferPassByRef is implemented, it relied
on the GCC "back-end" tree type to determine whether or not to use `ref'
ABI for D `in' parameters; e.g: prefer by value if it is expected that
the target will pass the type around in registers.
Building the GCC tree type depends on the AST type being complete - all
semantic processing is finished - but as this hook is called from the
front-end, this will not be the case for forward referenced or
self-referencing types.
The consensus in upstream is that `in' parameters should always be
implicitly `ref', but as the front-end does not yet support all types
being rvalue references, limit this just static arrays and structs.
PR d/112285
PR d/112290
gcc/d/ChangeLog:
* d-target.cc (Target::preferPassByRef): Return true for all static
array and struct types.
gcc/testsuite/ChangeLog:
* gdc.dg/pr112285.d: New test.
* gdc.dg/pr112290.d: New test.
jlaw [Sun, 10 Mar 2024 17:58:00 +0000 (11:58 -0600)]
[committed] [PR tree-optimization/110199] Simplify MIN/MAX more often
So as I mentioned in the BZ, the case of
t = MIN_EXPR (A, B)
where we know something about the relationship between A and B can be trivially
handled by some existing code in DOM. That existing code would simplify when A
== B. But by testing GE and LE instead of EQ we can cover more cases with
minimal effort. When applicable the MIN/MAX turns into a simple copy.
I made one other change. We have other binary operations that we simplify when
we know something about the relationship between the operands. That code was
not canonicalizing the order of operands when building the expression to lookup
in the hash tables to discover that relationship. Since those paths are only
testing for equality, we can trivially reverse them and not have to worry about
changing codes or anything like that. So extremely safe and avoids having to
come back and fix that code to match the MIN_EXPR/MAX_EXPR case later.
Bootstrapped on x86 and also tested on the crosses. I briefly thought there
was an sh regression, but that was actually the recent fwprop changes twiddling
code generation for one test.
PR tree-optimization/110199
gcc/
* tree-ssa-scopedtables.cc
(avail_exprs_stack::simplify_binary_operation): Generalize handling
of MIN_EXPR/MAX_EXPR to allow additional simplifications. Canonicalize
comparison operands for other cases.
gcc/testsuite
* gcc.dg/tree-ssa/minmax-27.c: New test.
* gcc.dg/tree-ssa/minmax-28.c: New test.
Pan Li [Sun, 10 Mar 2024 03:02:35 +0000 (11:02 +0800)]
VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled
This patch would like to fix one ICE in vectorizable_store when both the
loop_masks and loop_lens are enabled. The ICE looks like below when build
with "-march=rv64gcv -O3".
There are two ways to reach vectorizer LD/ST, one is the analysis and
the other is transform. We cannot have both the lens and the masks
enabled during transform but it is valid during analysis. Given the
transform doesn't required cost_vec, we can only enable the assert
based on cost_vec is NULL or not.
Below testsuites are passed for this patch:
* The x86 bootstrap tests.
* The x86 fully regression tests.
* The aarch64 fully regression tests.
* The riscv fully regressison tests.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Enable the assert
during transform process.
(vectorizable_load): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr114195-1.c: New test.
jlaw [Sun, 10 Mar 2024 02:27:32 +0000 (19:27 -0700)]
[committed] [PR target/111362] Fix compare-debug issue with mode switching
The issue here is the code we emit for mode-switching can change when -g is
added to the command line. This is caused by processing debug notes occurring
after a call which is the last real statement in a basic block.
Without -g the CALL_INSN is literally the last insn in the block and the loop
exits. If mode switching after the call is needed, it'll be handled as we
process outgoing edges.
With -g the loop iterates again and in the processing of the node the backend
signals that a mode switch is necessary.
I pondered fixing this in the target, but the better fix is to ignore the debug
notes in the insn stream.
I did a cursory review of some of the other compare-debug failures, but did not
immediately see others which would likely be fixed by this change. Sigh.
Anyway, bootstrapped and regression tested on x86. Regression tested on rv64
as well.
PR target/111362
gcc/
* mode-switching.cc (optimize_mode_switching): Only process
NONDEBUG insns.
gcc/testsuite
* gcc.target/riscv/compare-debug-1.c: New test.
* gcc.target/riscv/compare-debug-2.c: New test.
Jakub Jelinek [Sat, 9 Mar 2024 12:04:26 +0000 (13:04 +0100)]
fwprop: Restore previous behavior for forward propagation of RTL with MEMs [PR114284]
Before the recent PR111267 r14-8319 fwprop changes, fwprop would never try
to propagate what was not considered PROFITABLE, where the profitable part
actually was partly about profitability, partly about very good reasons
not to actually propagate and partly for cases where propagation is
completely incorrect.
In particular, classify_result has:
/* Allow (subreg (mem)) -> (mem) simplifications with the following
exceptions:
1) Propagating (mem)s into multiple uses is not profitable.
2) Propagating (mem)s across EBBs may not be profitable if the source EBB
runs less frequently.
3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
4) Creating new (mem/v)s is not correct, since DCE will not remove the old
ones. */
if (single_use_p
&& single_ebb_p
&& SUBREG_P (old_rtx)
&& !paradoxical_subreg_p (old_rtx)
&& MEM_P (new_rtx)
&& !MEM_VOLATILE_P (new_rtx))
return PROFITABLE;
and didn't mark any other MEM_P (new_rtx) or rtxes which contain
a MEM in its subrtxes as PROFITABLE. Now, since r14-8319 profitable_p
method has been renamed to likely_profitable_p and has just a minor role.
Now, rule 4) above is something that isn't about profitability, but about
correct behavior, if you propagate mem/v, the code is miscompiled.
This particular case has been fixed elsewhere by Haochen in r14-9379.
But I think even the 1) and 2) and maybe 3) are a strong don't do it,
don't rely solely on rtx costs, increasing the number of loads of the same
memory, even when cached, is undesirable, canceling load hoisting can
be undesirable as well.
So, the following patch restores previous behavior of src contains any MEMs,
in that case likely_profitable_p () is taken as the old profitable_p ()
as a requirement rather than just a hint. For propagation of something
which doesn't load from memory this keeps the r14-8319 behavior.
Xi Ruoyao [Fri, 26 Jan 2024 10:28:32 +0000 (18:28 +0800)]
LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation
In Binutils we need to make IE to LE relaxation only allowed when there
is an R_LARCH_RELAX after R_LARCH_TLE_IE_PC_{HI20,LO12} so an invalid
"partial" relaxation won't happen with the extreme code model. So if we
are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an
R_LARCH_RELAX to allow the relaxation. The IE to LE relaxation does not
require the pcalau12i and the ld instruction to be adjacent, so we don't
need to limit ourselves to use the macro.
For the distro maintainers backporting changes: this change depends on
r14-8721, without r14-8721 R_LARCH_RELAX can be emitted mistakenly in
the extreme code model.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Support 'Q' for R_LARCH_RELAX for TLS IE.
(loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS
IE.
* config/loongarch/loongarch.md (ld_from_got<mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/tls-ie-relax.c: New test.
* gcc.target/loongarch/tls-ie-norelax.c: New test.
* gcc.target/loongarch/tls-ie-extreme.c: New test.
Jakub Jelinek [Sat, 9 Mar 2024 08:37:07 +0000 (09:37 +0100)]
i386: Regenerate i386.opt.urls
When I've added the -mnoreturn-no-callee-saved-registers option
to i386.opt, I forgot to regenerate i386.opt.urls and Mark's
CI kindly reminded me of that.
Lulu Cheng [Thu, 7 Mar 2024 01:44:03 +0000 (09:44 +0800)]
LoongArch: testsuite: Add compilation options to the regname-fp-s9.c.
When the value of the macro DEFAULT_CFLAGS is set to '-ansi -pedantic-errors',
regname-s9-fp.c will test to fail. To solve this problem, add the compilation
option '-Wno-pedantic -std=gnu90' to this test case.
Lulu Cheng [Tue, 5 Mar 2024 06:43:04 +0000 (14:43 +0800)]
LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.
If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". In the implementation of the instruction sequence,
it is necessary to determine whether the two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit, the two operand registers that need to be compared are symbolically
extended, and one of the operand registers is obtained from memory through the
"ll.w" instruction, which can ensure that the symbolic expansion is carried out.
However, the value of the other operand register is not guaranteed to be the
value of the sign extension.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
In loongarch64, a sign extension operation is added when
operands[2] is a register operand and the mode is SImode.
gcc/testsuite/ChangeLog:
* g++.target/loongarch/atomic-cas-int.C: New test.
Jonathan Wakely [Fri, 8 Mar 2024 16:15:57 +0000 (16:15 +0000)]
libstdc++: Do not require a time-of-day when parsing sys_days [PR114240]
When parsing a std::chrono::sys_days (or a sys_time with an even longer
period) we should not require a time-of-day to be present in the input,
because we can't represent that in the result type anyway.
Rather than trying to decide which specializations should require a
time-of-date and which should not, follow the direction of Howard
Hinnant's date library, which allows extracting a sys_time of any period
from input that only contains a date, defaulting the time-of-day part to
00:00:00. This seems consistent with the intent of the standard, which
says it's an error "If the parse fails to decode a valid date" (i.e., it
doesn't care about decoding a valid time, only a date).
libstdc++-v3/ChangeLog:
PR libstdc++/114240
* include/bits/chrono_io.h (_Parser::operator()): Assume
hours(0) for a time_point, so that a time is not required
to be present.
* testsuite/std/time/parse/114240.cc: New test.
Jonathan Wakely [Fri, 8 Mar 2024 14:41:47 +0000 (14:41 +0000)]
libstdc++: Fix parsing of leap seconds as chrono::utc_time [PR114279]
Implementing all chrono::from_stream overloads in terms of
chrono::sys_time meant that a leap second time like 23:59:60.001 cannot
be parsed, because that cannot be represented in a sys_time.
The fix to support parsing leap seconds as utc_time is to convert the
parsed date to utc_time<days> and then add the parsed time to that,
which allows the result to land in a leap second, rather than doing all
the arithmetic with sys_time which doesn't have leap seconds.
For local_time we also allow %S to parse a 60s value, because doing
otherwise might disallow some valid uses. We can't know all use cases
users have for treating times as local_time.
For all other clocks, we can reject times that have 60 or 60.nnn as the
seconds part, because that cannot occur in a valid UNIX, GPS, or TAI
time. Since our chrono::file_clock uses sys_time, it can't occur for
that clock either.
In order to support this a new _M_is_leap_second member is needed in the
_Parser type. This can be added at the end, where most targets currently
have padding bytes. Similar to what I did recently for formatter _Spec
structs, we can also reserve additional padding bits for future
expansion.
This also fixes bugs in the from_stream overloads for utc_time,
tai_time, gps_time, and file_time, which were not using time_point_cast
to explicitly convert to the result type. That's needed because the
result type might have lower precision than the value returned from
from_sys or from_utc, which has a precision no lower than seconds.
libstdc++-v3/ChangeLog:
PR libstdc++/114279
* include/bits/chrono_io.h (_Parser::_M_is_leap_second): New
data member.
(_Parser::_M_reserved): Reserve padding bits for future use.
(_Parser::operator()): Set _M_is_leap_second if %S reads 60s.
(from_stream): Only allow _M_is_leap_second for utc_time and
local_time. Adjust arithmetic for utc_time so that leap seconds
are preserved. Use time_point_cast to convert to a possibly
lower-precision result type.
* testsuite/std/time/parse.cc: Move to ...
* testsuite/std/time/parse/parse.cc: ... here.
* testsuite/std/time/parse/114279.cc: New test.
Martin Jambor [Fri, 8 Mar 2024 23:47:22 +0000 (00:47 +0100)]
ipa: Avoid excessive removing of SSAs (PR 113757)
PR 113757 shows that the code which was meant to debug-reset and
remove SSAs defined by LHSs of calls redirected to
__builtin_unreachable can trigger also when speculative
devirtualization creates a call to a noreturn function (and since it
is noreturn, it does not bother dealing with its return value).
What is more, it seems that the code handling this case is not really
necessary. I feel slightly idiotic about this because I have a
feeling that I added it because of a failing test-case but I can
neither find the testcase nor a reason why the code in
cgraph_edge::redirect_call_stmt_to_callee would not be sufficient (it
turns the SSA name into a default-def, a bit like IPA-SRA, but any
code dominated by a call to a noreturn is not dangerous when it comes
to its side-effects). So this patch just removes the handling.
LRA failed to consider all insn alternatives when non-reload pseudo
did not get a hard register. This resulted in failure to generate
code by LRA. The patch fixes this problem.
gcc/ChangeLog:
PR target/113790
* lra-assigns.cc (assign_by_spills): Set up all_spilled_pseudos
for non-reload pseudo too.
David Faust [Thu, 7 Mar 2024 17:29:32 +0000 (09:29 -0800)]
bpf: add size threshold for inlining mem builtins
BPF cannot fall back on library calls to implement memmove, memcpy and
memset, so we attempt to expand these inline always if possible.
However, this inline expansion was being attempted even for excessively
large operations, which could result in gcc consuming huge amounts of
memory and hanging.
Add a size threshold in the BPF backend below which to always expand
these operations inline, and introduce an option
-minline-memops-threshold= to control the threshold. Defaults to
1024 bytes.
gcc/
* config/bpf/bpf.cc (bpf_expand_cpymem, bpf_expand_setmem): Do
not attempt inline expansion if size is above threshold.
* config/bpf/bpf.opt (-minline-memops-threshold): New option.
* doc/invoke.texi (eBPF Options) <-minline-memops-threshold>:
Document.
gcc/testsuite/
* gcc.target/bpf/inline-memops-threshold-1.c: New test.
* gcc.target/bpf/inline-memops-threshold-2.c: New test.
This test was too simple, which meant that the compiler was sometimes
able to find a better optimization of the code than using a BICS
instruction. Fix this by changing the test slightly to produce a
sequence where BICS should always be the preferred solution.
gcc/testsuite:
PR target/113542
* gcc.target/arm/bics_3.c: Adjust code to something which should
always result in BICS.
David Faust [Thu, 7 Mar 2024 17:23:38 +0000 (09:23 -0800)]
bpf: testsuite: fix unresolved test in memset-1.c
The test was trying to do too much by both checking for an error, and
checking the resulting assembly. Of course, due to the error no asm was
produced, so the scan-asm went unresolved. Split it into two separate
tests to fix the issue.
gcc/testsuite/
* gcc.target/bpf/memset-1.c: Move error test case to...
* gcc.target/bpf/memset-2.c: ... here. New test.
Thomas Schwinge [Thu, 7 Mar 2024 14:51:54 +0000 (15:51 +0100)]
GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)
'GCN_SUPPRESS_HOST_FALLBACK' originated as 'HSA_SUPPRESS_HOST_FALLBACK' in the
libgomp HSA plugin, where the idea was -- in my understanding -- that you
wouldn't have device code available for all functions that may be called, and
in that case transparently (shared memory system!) do host-fallback execution.
Or, with 'HSA_SUPPRESS_HOST_FALLBACK' set, you'd get those diagnosed.
This has then been copied into the libgomp GCN plugin as
'GCN_SUPPRESS_HOST_FALLBACK'. However, the original meaning isn't applicable
for the libgomp GCN plugin anymore: we assume that we're generating device code
for all relevant functions, and we're implementing a non-shared memory system,
where we cannot transparently do host-fallback execution for individual
functions.
However, 'GCN_SUPPRESS_HOST_FALLBACK' has gained an additional meaning, to
enforce a fatal error in case that 'libhsa-runtime64.so.1' can't be dynamically
loaded; keep that meaning.
Thomas Schwinge [Thu, 7 Mar 2024 12:18:23 +0000 (13:18 +0100)]
nvptx: 'cuDeviceGetCount' failure is fatal
Per commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp", we're now using 'return -1'
from 'GOMP_OFFLOAD_get_num_devices' for 'omp_requires_mask' purposes. This
missed that via 'nvptx_get_num_devices', we could also 'return -1' for
'cuDeviceGetCount' failure. Before, this meant (in 'gomp_target_init') to
silently ignore the plugin/device -- which also has been doubtful behavior.
Let's instead turn 'cuDeviceGetCount' failure into a fatal error, similar to
other errors during device initialization.
libgomp/
* plugin/plugin-nvptx.c (nvptx_get_num_devices):
'cuDeviceGetCount' failure is fatal.
Thomas Schwinge [Thu, 7 Mar 2024 11:31:52 +0000 (12:31 +0100)]
GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1'
If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding
libgomp plugin/device gets disabled, as before. But if they are available,
report any inconsistencies such as missing symbols, similar to how we fail in
presence of other issues during device initialization.
Jakub Jelinek [Fri, 8 Mar 2024 14:18:56 +0000 (15:18 +0100)]
testsuite: Fix up pr113617 test for darwin [PR113617]
The test attempts to link a shared library, and apparently Darwin doesn't
allow by default for shared libraries to contain undefined symbols.
The following patch just adds dummy definitions for the symbols, so that
the library no longer has any undefined symbols at least in my linux
testing.
Furthermore, for target { !shared } targets (like darwin until the it is
fixed in target-supports.exp), because we then link a program rather than
shared library, the patch also adds a dummy main definition so that it
can link.
2024-03-08 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/113617
PR target/114233
* g++.dg/other/pr113617.C: Define -DSHARED when linking with -shared.
* g++.dg/other/pr113617-aux.cc: Add definitions for used methods and
templates not defined elsewhere.
Richard Biener [Fri, 8 Mar 2024 12:27:12 +0000 (13:27 +0100)]
tree-optimization/114269 - 434.zeusmp regression after SCEV analysis fix
The following addresses a performance regression caused by the recent
SCEV analysis fix with regard to folding multiplications and undefined
behavior on overflow. We do not handle (T) { a, +, b } * c but can
treat sign-conversions from unsigned by performing the multiplication
in the unsigned type. That's what we already do for additions (but
that misses one case that turns out important).
This fixes the 434.zeusmp regression for me.
PR tree-optimization/114269
PR tree-optimization/114074
* tree-chrec.cc (chrec_fold_plus_1): Handle sign-conversions
in the third CASE_CONVERT case as well.
(chrec_fold_multiply): Handle sign-conversions from unsigned
by performing the operation in the unsigned type.
Gaius Mulley [Fri, 8 Mar 2024 12:52:04 +0000 (12:52 +0000)]
modula2: Rebuild bootstrap tools with faster dynamic arrays
This patch configures the larger dynamic arrays to use a larger
growth factor and larger initial size. It also rebuilds mc and pge
using the improved default array sizes in Indexing.mod.
AVR: Add an insn combine pattern for offset computation.
Computing uint16_t += 2 * uint8_t can occur when an offset
into a 16-bit array is computed. Without this pattern is costs
six instructions: A move (1), a zero-extend (1), a shift (2) and
an addition (2). With this pattern it costs 4.
gcc/
* config/avr/avr.md (*addhi3_zero_extend.ashift1): New pattern.
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS]: Compute its cost.
Jakub Jelinek [Fri, 8 Mar 2024 11:49:43 +0000 (12:49 +0100)]
bb-reorder: Fix assertion
When touching bb-reorder yesterday, I've noticed the checking assert
doesn't actually check what it meant to.
Because asm_noperands returns >= 0 for inline asm patterns (in that case
number of input+output+label operands, so asm goto has at least one)
and -1 if it isn't inline asm.
The following patch fixes the assertion to actually check that it is
asm goto.
2024-03-08 Jakub Jelinek <jakub@redhat.com>
* bb-reorder.cc (fix_up_fall_thru_edges): Fix up checking assert,
asm_noperands < 0 means it is not asm goto too.
Jakub Jelinek [Fri, 8 Mar 2024 08:18:19 +0000 (09:18 +0100)]
i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]
The following patch hides the noreturn no_callee_saved_registers (except bp)
optimization with a not enabled by default option.
The reason is that most noreturn functions should be called just once in a
program (unless they are recursive or invoke longjmp or similar, for exceptions
we already punt), so it isn't that essential to save a few instructions in their
prologue, but more importantly because it interferes with debugging.
And unlike most other optimizations, doesn't actually make it harder to debug
the given function, which can be solved by recompiling the given function if
it is too hard to debug, but makes it harder to debug the callers of that
noreturn function. Those can be from a different translation unit, different
binary or even different package, so if e.g. glibc abort needs to use all
of the callee saved registers (%rbx, %rbp, %r12, %r13, %r14, %r15), debugging
any programs which abort will be harder because any DWARF expressions which
use those registers will be optimized out, not just in the immediate caller,
but in other callers as well until some frame restores a particular register
from some stack slot.
2024-03-08 Jakub Jelinek <jakub@redhat.com>
PR target/38534
* config/i386/i386.opt (mnoreturn-no-callee-saved-registers): New
option.
* config/i386/i386-options.cc (ix86_set_func_type): Don't use
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP unless
ix86_noreturn_no_callee_saved_registers is enabled.
* doc/invoke.texi (-mnoreturn-no-callee-saved-registers): Document.
Jakub Jelinek [Fri, 8 Mar 2024 08:15:39 +0000 (09:15 +0100)]
c-family, c++: Fix up handling of types which may have padding in __atomic_{compare_}exchange
On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
> Ah, although __atomic_compare_exchange only takes pointers, the
> compiler replaces that with a call to __atomic_compare_exchange_n
> which takes the newval by value, which presumably uses an 80-bit FP
> register and so the padding bits become indeterminate again.
The problem is that __atomic_{,compare_}exchange lowering if it has
a supported atomic 1/2/4/8/16 size emits code like:
_3 = *p2;
_4 = VIEW_CONVERT_EXPR<I_type> (_3);
so if long double or some small struct etc. has some carefully filled
padding bits, those bits can be lost on the assignment. The library call
for __atomic_{,compare_}exchange would actually work because it woiuld
load the value from memory using integral type or memcpy.
E.g. on
void
foo (long double *a, long double *b, long double *c)
{
__atomic_compare_exchange (a, b, c, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
we end up with -O0 with:
fldt (%rax)
fstpt -48(%rbp)
movq -48(%rbp), %rax
movq -40(%rbp), %rdx
i.e. load *c from memory into 387 register, store it back to uninitialized
stack slot (the padding bits are now random in there) and then load a
__uint128_t (pair of GPR regs). The problem is that we first load it using
whatever type the pointer points to and then VIEW_CONVERT_EXPR that value:
p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
The following patch fixes that by creating a MEM_REF instead, with the
I_type type, but with the original pointer type on the second argument for
aliasing purposes, so we actually preserve the padding bits that way.
With this patch instead of the above assembly we emit
movq 8(%rax), %rdx
movq (%rax), %rax
I had to add support for MEM_REF in pt.cc, though with the assumption
that it has been already originally created with non-dependent
types/operands (which is the case here for the __atomic*exchange lowering).
2024-03-08 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c-common.cc (resolve_overloaded_atomic_exchange): Instead of setting
p1 to VIEW_CONVERT_EXPR<I_type> (*p1), set it to MEM_REF with p1 and
(typeof (p1)) 0 operands and I_type type.
(resolve_overloaded_atomic_compare_exchange): Similarly for p2.
gcc/cp/
* pt.cc (tsubst_expr): Handle MEM_REF.
gcc/testsuite/
* g++.dg/ext/atomic-5.C: New test.
Jakub Jelinek [Fri, 8 Mar 2024 08:14:32 +0000 (09:14 +0100)]
dwarf2out: Emit DW_AT_export_symbols on anon unions/structs [PR113918]
DWARF5 added DW_AT_export_symbols both for use on inline namespaces (where
we emit it), but also on anonymous unions/structs (and we didn't emit that
attribute there).
The following patch fixes it.
2024-03-08 Jakub Jelinek <jakub@redhat.com>
PR debug/113918
gcc/
* dwarf2out.cc (gen_field_die): Emit DW_AT_export_symbols
on anonymous unions or structs for -gdwarf-5 or -gno-strict-dwarf.
gcc/c/
* c-tree.h (c_type_dwarf_attribute): Declare.
* c-objc-common.h (LANG_HOOKS_TYPE_DWARF_ATTRIBUTE): Redefine.
* c-objc-common.cc: Include dwarf2.h.
(c_type_dwarf_attribute): New function.
gcc/cp/
* cp-objcp-common.cc (cp_type_dwarf_attribute): Return 1
for DW_AT_export_symbols on anonymous structs or unions.
gcc/testsuite/
* c-c++-common/dwarf2/pr113918.c: New test.
Jakub Jelinek [Fri, 8 Mar 2024 08:11:57 +0000 (09:11 +0100)]
c++: Fix up parameter pack diagnostics on xobj vs. varargs functions [PR113802]
The simple presence of ellipsis as next token after the parameter
declaration doesn't imply it is a parameter pack, it sometimes is, e.g.
if its type is a pack, but sometimes is not and in that case it acts
the same as if the next tokens were , ... instead of just ...
The xobj param cannot be a function parameter pack though treats both
the declarator->parameter_pack_p and token->type == CPP_ELLIPSIS as
sufficient conditions for the error. The conditions for CPP_ELLIPSIS
are done a little bit later in the same function and complex enough that
IMHO shouldn't be repeated, on the other side for the
declarator->parameter_pack_p case we clear that flag for xobj params
for error recovery reasons.
This patch just moves the diagnostics later (after the CPP_ELLIPSIS handling)
and changes the error recovery behavior by pretending the this specifier
didn't appear if an error is reported.
2024-03-08 Jakub Jelinek <jakub@redhat.com>
PR c++/113802
* parser.cc (cp_parser_parameter_declaration): Move the xobj_param_p
pack diagnostics after ellipsis handling and if an error is reported,
pretend this specifier didn't appear. Formatting fix.
* g++.dg/cpp23/explicit-obj-diagnostics3.C (S0, S1, S2, S3, S4): Don't
expect any diagnostics on f and fd member function templates, add
similar templates with ...Selves instead of Selves as k and kd and
expect diagnostics for those. Expect extra diagnostics in error
recovery for g and gd member function templates.