Patrick O'Neill [Fri, 28 Apr 2023 23:40:27 +0000 (16:40 -0700)]
RISC-V: Name newly added flags in changelog
This patch fixes the changelog to explicitly name the added command line
flags introduced in this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html
2023-05-01 Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Yanzhang Wang [Wed, 26 Apr 2023 13:06:02 +0000 (21:06 +0800)]
RISC-V: ICE for vlmul_ext_v intrinsic API
PR target/109617
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Support VNx2HI and VNX4DI when MIN_VLEN >= 128.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/vlmul_ext-1.c: New test.
Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com> Co-authored-by: Pan Li <pan2.li@intel.com> Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
Romain Naour [Tue, 2 May 2023 12:21:55 +0000 (14:21 +0200)]
RISC-V: fix build issue with gcc 4.9.x
GCC should still build with GCC 4.8.3 or newer [1]
using C++03 by default. But a recent change in
RISC-V port introduced a C++11 feature "std::log2" [2].
Use log2 from the C header, without the namespace [3].
Richard Biener [Tue, 2 May 2023 08:34:48 +0000 (10:34 +0200)]
tree-optimization/109672 - properly check emulated plus during vect
The following refactors the check for emulated vector support for
the cases of plus, minus and negate. In the PR we end up with
a SImode plus, supported by the target but emulated and in this
context fail to verify we are dealing with exactly word_mode.
PR tree-optimization/109672
* tree-vect-stmts.cc (vectorizable_operation): For plus,
minus and negate always check the vector mode is word mode.
Richard Biener [Tue, 2 May 2023 09:51:51 +0000 (11:51 +0200)]
[i386] Fix testcases for emulated scatter
The following adjusts testcases where the pr88531 fail with -m32
because we do not consider MMX size vectorization there and the
pr89618 runs into load/store cost differences with -m32.
Jakub Jelinek [Tue, 2 May 2023 08:58:19 +0000 (10:58 +0200)]
ibstdc++: Shut up -Wattribute-alias warning [PR109694]
I've followed what other files do, using attribute alias with not really
matching function type (after all, it isn't really possible when it is a
constructor), but seems I've missed it warns:
../../../../../libstdc++-v3/src/c++98/ios_init.cc:203:8: warning: ‘void std::ios_base_library_init()’ alias between functions of incompatible types ‘void()’ and ‘void
+(std::ios_base::Init::)()’ [-Wattribute-alias=]
203 | void ios_base_library_init (void)
| ^~~~~~~~~~~~~~~~~~~~~
../../../../../libstdc++-v3/src/c++98/ios_init.cc:78:3: note: aliased declaration here
78 | ios_base::Init::Init()
| ^~~~~~~~
The PR talks about clang++ warning there (which I think isn't really
supported, libstdc++ sources ought to be built by GCC), but it warns
when built with GCC too.
The following patch fixes it by doing what other libstdc++ sources do in
those cases.
Marek Polacek [Thu, 9 Mar 2023 23:43:34 +0000 (18:43 -0500)]
ubsan: ubsan_maybe_instrument_array_ref tweak
In <https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613687.html>
we discussed that the copy_node in ubsan_maybe_instrument_array_ref
is redundant, but also that it'd be best to postpone the optimization
to GCC 14. So I'm making that change now.
Jason Merrill [Mon, 1 May 2023 14:57:20 +0000 (10:57 -0400)]
c++: array DMI and member fn [PR109666]
Here it turns out I also needed to adjust cfun when stepping out of the
member function to instantiate the DMI. But instead of adding that tweak,
let's unify with instantiate_body and just push_to_top_level instead of
trying to do the minimum subset of it. There was no measurable change in
compile time on stdc++.h.
This should also resolve 109506 without yet another tweak.
Andrew Pinski [Mon, 1 May 2023 16:21:59 +0000 (09:21 -0700)]
PHIOPT: Update comment about what the pass now does
I noticed I didn't update the comment about how the pass
works after I initially added match_simplify_replacement.
Anyways this updates the comment to be the current state
of the pass.
OK?
gcc/ChangeLog:
* tree-ssa-phiopt.cc: Update comment about
how the transformation are implemented.
Jeff Law [Mon, 1 May 2023 13:40:38 +0000 (07:40 -0600)]
Convert xstormy16 to LRA
This patch converts the xstormy16 patch to LRA. It introduces a code
quality regression in the shiftsi testcase, but it also fixes numerous
aborts/errors. IMHO it's a good tradeoff.
Jeff Law [Mon, 1 May 2023 13:14:50 +0000 (07:14 -0600)]
Enable LRA on several ports
Spurred by Segher's RFC, I went ahead and tested several ports with LRA
enabled. Not surprisingly, many failed, but a few built their full set
of libraries successful and of those a few even ran their testsuites
with no regressions. In fact, enabling LRA fixes a small number of
failures on the iq2000 port.
This patch converts the ports which built their libraries and have test
results that are as good as or better than without LRA. There may
be minor code quality regressions or there may be minor code quality
improvements -- I'm leaving that for the port maintainers to own going
forward.
Rasmus Villemoes [Mon, 13 Feb 2023 15:07:47 +0000 (16:07 +0100)]
apply debug-remap to file names in .su files
The .su files generated with -fstack-usage are arguably debug info. In
order to make builds more reproducible, apply the same remapping logic
to the recorded file names as for when producing the debug info
embedded in the object files.
To this end, teach print_decl_identifier() a new
PRINT_DECL_REMAP_DEBUG flag and use that from output_stack_usage_1().
gcc/ChangeLog:
* print-tree.h (PRINT_DECL_REMAP_DEBUG): New flag.
* print-tree.cc (print_decl_identifier): Implement it.
* toplev.cc (output_stack_usage_1): Use it.
Aldy Hernandez [Thu, 2 Mar 2023 22:37:20 +0000 (23:37 +0100)]
Cleanup irange::set.
Now that anti-ranges are no more and iranges contain wide_ints instead
of trees, various cleanups are possible. This is one of a handful of
patches improving the performance of irange::set() which is not on a
hot path, but quite sensitive because it is so pervasive.
gcc/ChangeLog:
* gimple-range-op.cc (cfn_ffs::fold_range): Use the correct
precision.
* gimple-ssa-warn-alloca.cc (alloca_call_type): Use <2> for
invalid_range, as it is an inverse range.
* tree-vrp.cc (find_case_label_range): Avoid trees.
* value-range.cc (irange::irange_set): Delete.
(irange::irange_set_1bit_anti_range): Delete.
(irange::irange_set_anti_range): Delete.
(irange::set): Cleanup.
* value-range.h (class irange): Remove irange_set,
irange_set_anti_range, irange_set_1bit_anti_range.
(irange::set_undefined): Remove set to m_type.
Aldy Hernandez [Fri, 17 Feb 2023 12:00:47 +0000 (13:00 +0100)]
Rewrite bounds_of_var_in_loop() to use ranges.
Little by little, bounds_of_var_in_loop() has grown into an
unmaintainable mess. This patch rewrites the code to use the relevant
APIs as well as refactor it to make it more readable.
Aldy Hernandez [Thu, 16 Feb 2023 13:25:52 +0000 (14:25 +0100)]
Replace vrp_val* with wide_ints.
This patch removes all uses of vrp_val_{min,max} in favor for a
irange_val_* which are wide_int based. This will leave only one use
of vrp_val_* which returns trees in range_of_ssa_name_with_loop_info()
because it needs to work with non-integers (floats, etc). In a
follow-up patch, this function will also be cleaned up such that
vrp_val_* can be deleted.
The functions min_limit and max_limit in range-op.cc are now useless
as they're basically irange_val*. I didn't rename them yet to avoid
churn. I'll do it in a later patch.
Aldy Hernandez [Thu, 2 Mar 2023 15:34:46 +0000 (16:34 +0100)]
Conversion to irange wide_int API.
This converts the irange API to use wide_ints exclusively, along with
its users.
This patch will slow down VRP, as there will be more useless
wide_int to tree conversions. However, this slowdown is only
temporary, as a follow-up patch will convert the internal
representation of iranges to wide_ints for a net overall gain
in performance.
Aldy Hernandez [Wed, 25 Jan 2023 10:23:33 +0000 (11:23 +0100)]
Various cleanups in vr-values.cc towards ranger API.
gcc/ChangeLog:
* vr-values.cc (check_for_binary_op_overflow): Tidy up by using
ranger API.
(compare_ranges): Delete.
(compare_range_with_value): Delete.
(bounds_of_var_in_loop): Tidy up by using ranger API.
(simplify_using_ranges::fold_cond_with_ops): Cleanup and rename
from vrp_evaluate_conditional_warnv_with_ops_using_ranges.
(simplify_using_ranges::legacy_fold_cond_overflow): Remove
strict_overflow_p and only_ranges.
(simplify_using_ranges::legacy_fold_cond): Adjust call to
legacy_fold_cond_overflow.
(simplify_using_ranges::simplify_abs_using_ranges): Adjust for
rename.
(range_fits_type_p): Rename value_range to irange.
* vr-values.h (range_fits_type_p): Adjust prototype.
Aldy Hernandez [Tue, 24 Jan 2023 12:01:39 +0000 (13:01 +0100)]
vrange_storage overhaul
[tl;dr: This is a rewrite of value-range-storage.* such that global
ranges and the internal ranger cache can use the same efficient
storage mechanism. It is optimized such that when wide_ints are
dropped into irange, the copying back and forth from storage will be
very fast, while being able to hold any number of sub-ranges
dynamically allocated at run-time. This replaces the global storage
mechanism which was limited to 6-subranges.]
Previously we had a vrange allocator for use in the ranger cache. It
worked with trees and could be used in place (fast), but it was not
memory efficient. With the upcoming switch to wide_ints for irange,
we can't afford to allocate ranges that can be used in place, because
an irange will be significantly larger, as it will hold full
wide_ints. We need a trailing_wide_int mechanism similar to what we
use for global ranges, but fast enough to use in the ranger's cache.
The global ranges had another allocation mechanism that was
trailing_wide_int based. It was memory efficient but slow given the
constant conversions from trees to wide_ints.
This patch gets us the best of both worlds by providing a storage
mechanism with a custom trailing wide int interface, while at the same
time being fast enough to use in the ranger cache.
We use a custom trailing wide_int mechanism but more flexible than
trailing_wide_int, since the latter has compile-time fixed-sized
wide_ints. The original TWI structure has the current length of each
wide_int in a static portion preceeding the variable length:
template <int N>
struct GTY((user)) trailing_wide_ints
{
...
...
/* The current length of each number.
that will, in turn, turn off TBAA on gimple, trees and RTL. */
struct {unsigned char len;} m_len[N];
/* The variable-length part of the structure, which always contains
at least one HWI. Element I starts at index I * M_MAX_LEN. */
HOST_WIDE_INT m_val[1];
};
We need both m_len[] and m_val[] to be variable-length at run-time.
In the previous incarnation of the storage mechanism the limitation of
m_len[] being static meant that we were limited to whatever [N] could
use up the unused bits in the TWI control world. In practice this
meant we were limited to 6 sub-ranges. This worked fine for global
ranges, but is a no go for our internal cache, where we must represent
things exactly (ranges for switches, etc).
The new implementation removes this restriction by making both m_len[]
and m_val[] variable length. Also, rolling our own allows future
optimization be using some of the leftover bits in the control world.
Also, in preparation for the wide_int conversion, vrange_storage is
now optimized to blast the bits directly into the ultimate irange
instead of going through the irange API. So ultimately copying back
and forth between the ranger cache and the storage mechanism is just a
matter of copying a few bits for the control word, and copying an
array of HOST_WIDE_INTs. These changes were heavily profiled, and
yielded a good chunk of the overall speedup for the wide_int
conversion.
Finally, vrange_storage is now a first class structure with GTY
markers and all, thus alleviating the void * hack in struct
tree_ssa_name and friends. This removes a few warts in the API and
looks cleaner overall.
Roger Sayle [Sun, 30 Apr 2023 22:47:13 +0000 (23:47 +0100)]
[Committed] Update xstormy16's neghi2 pattern to not clobber the carry flag.
When I converted xstormy's neghi2 pattern from a define_expand to a
define_insn, I forgot that define_expand implicitly produces a
sequence of instructions, but a define_insn is an implicit parallel,
thereby messing up the clobber (reg:BI CARRY_REG), which can then cause
an ICE in the auto-generated added_clobbers_hard_reg_p. Whilst stripping
the superfluous PARALLEL resolves this issue, an even better fix is to
use xstormy16's INC instruction, that (like NOT) doesn't affect the carry
flag, resulting in a neghi2 implementation that can more easily be CSE'd
and scheduled.
Many thanks (again) to Jeff Law for testing/reporting this issue.
2024-04-30 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/stormy16/stormy16.md (neghi2): Rewrite pattern using
inc to avoid clobbering the carry flag.
Andrew Pinski [Wed, 30 Nov 2022 04:02:07 +0000 (04:02 +0000)]
Improve error message for excess elements in array initializer from {"a"}
So char arrays are not the only type that be initialized from {"a"}.
We can have wchar_t (L"") and char16_t (u"") types too. So let's
print out the type of the array instead of just saying char.
Note in the testsuite I used regex . to match '[' and ']' as
I could not figure out how many '\' I needed.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/c/ChangeLog:
* c-typeck.cc (process_init_element): Print out array type
for excessive elements.
Andrew Pinski [Wed, 30 Nov 2022 02:54:57 +0000 (02:54 +0000)]
Fix C/107926: Wrong error message when initializing char array
The problem here is the code which handles {"a"} is supposed
to handle the case where the is something after the string but
it only handles the case where there is another string so
we go down the other path and error out saying "excess elements
in struct initializer" even though this was a character array.
To fix this, we need to move the ckeck if the initializer is
a string after the check for array and initializer.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Thanks,
Adnrew Pinski
gcc/c/ChangeLog:
PR c/107926
* c-typeck.cc (process_init_element): Move the check
for string cst until after the error message.
Andrew Pinski [Sun, 9 Apr 2023 01:46:37 +0000 (18:46 -0700)]
MATCH: add some of what phiopt's builtin_zero_pattern did
This adds the patterns for
POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
For "a != 0 ? FUNC(a) : CST".
CLRSB, CLRSBL, and CLRSBLL will be moved next.
Note this is not enough to remove
cond_removal_in_builtin_zero_pattern as we need to handle
the case where there is an NOP_CONVERT inside the conditional
to move out of the condition inside match_simplify_replacement.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd: Add patterns for "a != 0 ? FUNC(a) : CST"
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
Andrew Pinski [Sun, 9 Apr 2023 00:20:30 +0000 (00:20 +0000)]
PHIOPT: Allow moving of some builtin calls
While moving working on moving
cond_removal_in_builtin_zero_pattern to match, I noticed
that functions were not allowed to move as we reject all
non-assignments.
This changes to allowing a few calls which are known not
to throw/trap. Right now it is restricted to ones
which cond_removal_in_builtin_zero_pattern handles but
adding more is just adding it to the switch statement.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
Allow some builtin/internal function calls which
are known not to trap/throw.
(phiopt_worker::match_simplify_replacement):
Use name instead of getting the lhs again.
Longjun Luo [Sun, 30 Apr 2023 18:28:06 +0000 (12:28 -0600)]
[PATCH] libcpp: suppress builtin macro redefined warnings for __LINE__
From 0821df518b264e754d698d399f98be1a62945e32 Mon Sep 17 00:00:00 2001
From: Longjun Luo <luolongjuna@gmail.com>
Date: Thu, 12 Jan 2023 23:59:54 +0800
Subject: [PATCH] libcpp: suppress builtin macro redefined warnings for
__LINE__
As implied in
gcc.gnu.org/legacy-ml/gcc-patches/2008-09/msg00076.html,
gcc provides -Wno-builtin-macro-redefined to suppress warning when
redefining builtin macro. However, at that time, there was no
scenario for __LINE__ macro.
But, when we try to build a live-patch, we compare sections by using
-ffunction-sections. Some same functions are considered changed because
of __LINE__ macro.
At present, to detect such a changed caused by __LINE__ macro, we
have to analyse code and maintain a function list. For example,
in kpatch, check this commit
github.com/dynup/kpatch/commit/0e1b95edeafa36edb7bcf11da6d1c00f76d7e03d.
So, in this scenario, when we try to compared sections, it would
be better to support suppress builtin macro redefined warnings for
__LINE__ macro.
libcpp:
* init.cc (builtin_array): Do not always warn for a redefinition
of __LINE__.
gcc/testsuite
* gcc.dg/builtin-redefine.c: Test for redefintion warnings
for __LINE__.
* gcc.dg/builtin-redefine-1.c: New test.
gcc: Use ld -r when checking for HAVE_LD_RO_RW_SECTION_MIXING
Fall back to ld -r if ld -shared fails during configure. The check for
HAVE_LD_RO_RW_SECTION_MIXING can fail on targets where ld does not
support shared objects, even though the answer to the test should be
'read-write'. One such target is riscv64-unknown-elf. Failing this test
results in a libgcc crtbegin.o which has a writable .eh_frame section
leading to the default linker scripts placing the .eh_frame section in a
writable memory segment, or a linker warning when using ld scripts that
place .eh_frame unconditionally in ROM.
gcc/ChangeLog:
* configure: Regenerate.
* configure.ac: Use ld -r in the check for HAVE_LD_RO_RW_SECTION_MIXING
Gaius Mulley [Sun, 30 Apr 2023 01:53:23 +0000 (02:53 +0100)]
Remove duplicate constants created between passes
There is no need to re-create constant literals between passes.
This patch creates a constant pool and reuses a constant literal
providing it is created at the same location. This in turn avoids
generating duplicate overflow error messages when encountering an
out of range constant literal.
gcc/m2/ChangeLog:
* gm2-compiler/SymbolTable.mod (ConstLitPoolEntry): New
pointer to record.
(ConstLitSym): New field RangeError.
(ConstLitPoolTree): New SymbolTree representing name to
index.
(ConstLitArray): New dynamic array containing pointers
to a ConstLitPoolEntry.
(CreateConstLit): New procedure function.
(LookupConstLitPoolEntry): New procedure function.
(AddConstLitPoolEntry): New procedure function.
(MakeConstLit): Re-implemented to check the constant lit
pool before calling CreateConstLit.
* m2.flex: Add ability to decode binary constant literals.
reload: Handle generating reloads that also clobbers flags
* reload1.cc (emit_insn_if_valid_for_reload_1): Rename from
emit_insn_if_valid_for_reload.
(emit_insn_if_valid_for_reload): Call new helper, and if a SET fails
to be recognized, also try emitting a parallel that clobbers
TARGET_FLAGS_REGNUM, as applicable.
Roger Sayle [Sat, 29 Apr 2023 19:18:23 +0000 (20:18 +0100)]
[xstormy16] Efficient HImode rotate left by a single bit.
This patch contains some minor tweak to xstormy16's machine description
most significantly providing a pattern for HImode rotate left by a single
bit that requires only two instructions.
unsigned short foo(unsigned short x)
{
return (x << 1) | (x >> 15);
}
currently with -O2 generates:
foo: mov r7,r2
shr r7,#15
shl r2,#1
or r2,r7
ret
with this patch, GCC now generates:
foo: shl r2,#1 | adc r2,#0
ret
Additionally neghi2 is converted to a define_insn (so that the RTL
optimizers see the negation semantics), and HImode rotations by
8-bits can now be recognized and implemented using swpb.
2023-04-29 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/stormy16/stormy16.md (neghi2): Convert from a define_expand
to a define_insn.
(*rotatehi_1): New define_insn for efficient 2 insn sequence.
(*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.
gcc/testsuite/ChangeLog
* gcc.target/xstormy16/neghi2.c: New test case.
* gcc.target/xstormy16/rotatehi-1.c: Likewise.
GCC with -O2 currently generates the nine instruction sequence:
foo: mov r7,r2
asr r2,#4
and r2,#15
mov.w r6,#-256
and r6,r7
or r2,r6
shl r7,#4
and r7,#255
or r2,r7
ret
with this patch, we now generate:
foo: swpn r2
ret
To achieve this using combine's four instruction "combinations" requires
a little wizardry. Firstly, define_insn_and_split are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload. This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode. Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics. The naming of the new code iterators is taken from i386.md.
2023-04-29 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/stormy16/stormy16.md (any_lshift): New code iterator.
(any_or_plus): Likewise.
(any_rotate): Likewise.
(*<any_lshift>_and_internal): New define_insn_and_split to
recognize a logical shift followed by an AND, and split it
again after reload.
(*swpn): New define_insn matching xstormy16's swpn.
(*swpn_zext): New define_insn recognizing swpn followed by
zero_extendqihi2, i.e. with the high byte set to zero.
(*swpn_sext): Likewise, for swpn followed by cbw.
(*swpn_sext_2): Likewise, for an alternate RTL form.
(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
sequence is split in the correct place to recognize the *swpn_zext
followed by any_or_plus (ior, xor or plus) instruction.
gcc/testsuite/ChangeLog
* gcc.target/xstormy16/swpn-1.c: New QImode test case.
* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
* gcc.target/xstormy16/swpn-4.c: New HImode test case.
add glibc-stdint.h to vax and lm32 linux target (PR target/105525)
PR target/105525 is a build regression for the vax and lm32 linux
targets present in gcc-12/13/head, where the builds fail due to
unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__,
caused by these two targets failing to provide glibc-stdint.h.
Fixed thusly, tested by building crosses, which now succeeds.
Jeff Law [Sat, 29 Apr 2023 16:16:21 +0000 (10:16 -0600)]
Adjust mips test for recent ifcvt costing changes
MIPS ports have been failing a few tests since the change to add cost
checks in another path through the if-converter pass.
As with the other ports, these look like cases where we don't do good
costing in the MIPS port. Someone who cares about MIPS will need to
fix this properly.
In the mean time this patch adjusts the branch cost when running the
two affected tests and skips them at -Os. This is enough to verify
that if conversion can still happen if the costs are adjusted.
gcc/testsuite
* gcc.target/mips/mips-ps-type-2.c: Adjust branch cost to
encourage if-conversion. Skip for -Os.
* gcc.target/mips/movcc-3.c: Similarly.
RISC-V: decouple stack allocation for rv32e w/o save-restore
Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without save-restore,
less stack memory can be reserved. This patch decouples stack allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.
output of testcase rv32e_stack.c
before patch:
addi sp,sp,-16
sw ra,12(sp)
call getInt
sw a0,0(sp)
lw a0,0(sp)
call PrintInts
lw a5,0(sp)
mv a0,a5
lw ra,12(sp)
addi sp,sp,16
jr ra
after patch:
addi sp,sp,-8
sw ra,4(sp)
call getInt
sw a0,0(sp)
lw a0,0(sp)
call PrintInts
lw a5,0(sp)
mv a0,a5
lw ra,4(sp)
addi sp,sp,8
jr ra
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function
for riscv_use_save_libcall.
(riscv_use_save_libcall): call riscv_avoid_save_libcall.
(riscv_compute_frame_info): restructure to decouple stack allocation
for rv32e w/o save-restore.
testsuite: Handle empty assembly lines in check-function-bodies
I tried to make use of check-function-bodies for cris-elf and was a
bit surprised to see it failing. There's a deliberate empty line
after the filled delay slot of the return-function which was
mishandled. I thought "aha" and tried to add an empty line
(containing just a "**" prefix) to the match, but that didn't help.
While it was added as input from the function's assembly output
to-be-matched like any other line, it couldn't be matched: I had to
use "...", which works but is...distracting.
Some digging shows that an empty assembly line can't be deliberately
matched because all matcher lines (lines starting with the prefix,
the ubiquitous "**") are canonicalized by trimming leading
whitespace (the "string trim" in check-function-bodies) and instead
adding a leading TAB character, thus empty lines end up containing
just a TAB. For usability it's better to treat empty lines as fluff
than to uglifying the test-case and the code to properly match them.
Double-checking, no test-case tries to match an line containing just
TAB (by providing an a line containing just "**\s*", i.e. zero or
more whitespace characters).
* lib/scanasm.exp (parse_function_bodies): Set fluff to include
empty lines (besides optionally leading whitespace).
RISC-V: Added support clmul[r,h] instructions for Zbc extension.
clmul[h] instructions were added only for the ZBKC extension.
This patch includes them in the ZBC extension too.
Besides, added support of 'clmulr' instructions for ZBC extension.
gcc/ChangeLog:
* config/riscv/bitmanip.md: Added clmulr instruction.
* config/riscv/riscv-builtins.cc (AVAIL): Add new.
* config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type.
(type): Add clmul
* config/riscv/riscv-cmo.def: Added built-in function for clmulr.
* config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md.
* config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in
functions to riscv-cmo.def.
* config/riscv/generic.md: Add clmul to list of instructions
using the generic_imul reservation.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zbc32.c: New test.
* gcc.target/riscv/zbc64.c: New test.
Andrew Pinski [Wed, 26 Apr 2023 02:46:40 +0000 (19:46 -0700)]
PHIOPT: Move two_value_replacement to match.pd
This patch converts two_value_replacement function
into a match.pd pattern.
It is a direct translation with only one minor change,
does not check for the {0,+-1} case as that is handled
before in match.pd so there is no reason to do the extra
check for it.
OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.
gcc/ChangeLog:
PR tree-optimization/100958
* tree-ssa-phiopt.cc (two_value_replacement): Remove.
(pass_phiopt::execute): Don't call two_value_replacement.
* match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to
handle what two_value_replacement did.
Andrew Pinski [Sat, 1 Apr 2023 05:13:31 +0000 (05:13 +0000)]
MATCH: Add patterns from phiopt's minmax_replacement
This adds a few patterns from phiopt's minmax_replacement
for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C> .
It is progress to remove minmax_replacement from phiopt.
There are still some more cases dealing with constants on the
edges (0/INT_MAX) to handle in match.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd: Add patterns for
"(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>".
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly.
* gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert
as that now does the combining.
Andrew Pinski [Sun, 2 Apr 2023 17:42:58 +0000 (17:42 +0000)]
MATCH: Factor out code that for min max detection with constants
This factors out some of the code from the min/max detection
from match.pd into a function so it can be reused in other
places. This is mainly used to detect the conversions
of >= to > which causes the integer values to be changed by
one.
Changes since v1:
* factor out the checks for INTEGER_CSTs so it is more obvious.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd: Factor out the deciding the min/max from
the "(cond (cmp (convert1? x) c1) (convert2? x) c2)"
pattern to ...
* fold-const.cc (minmax_from_comparison): this new function.
* fold-const.h (minmax_from_comparison): New prototype.
Roger Sayle [Fri, 28 Apr 2023 13:21:53 +0000 (14:21 +0100)]
PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.
This patch fixes PR rtl-optimization/109476, which is a code quality
regression affecting AVR. The cause is that the lower-subreg pass is
sometimes overly aggressive, lowering the LSHIFTRT below:
but this idiom, SETs of SUBREGs, interferes with combine's ability
to associate/fuse instructions. The solution, on targets that
have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
is to split/lower LSHIFTRT to a ZERO_EXTEND.
To answer Richard's question in comment #10 of the bugzilla PR,
the function resolve_shift_zext is called with one of four RTX
codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
LSHIFTRT can the setting of low_part and high_part SUBREGs be
replaced by a ZERO_EXTEND. For ASHIFTRT, we require a sign
extension, so don't set the high_part to zero; if we're splitting
a ZERO_EXTEND then it doesn't make sense to replace it with a
ZERO_EXTEND, and for ASHIFT we've played games to swap the
high_part and low_part SUBREGs, so that we assign the low_part
to zero (for double word shifts by greater than word size bits).
2023-04-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/109476
* lower-subreg.cc: Include explow.h for force_reg.
(find_decomposable_shift_zext): Pass an additional SPEED_P argument.
If decomposing a suitable LSHIFTRT and we're not splitting
ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
instead of setting a high part SUBREG to zero, which helps combine.
(decompose_multiword_subregs): Update call to resolve_shift_zext.
gcc/testsuite/ChangeLog
PR rtl-optimization/109476
* gcc.target/avr/mmcu/pr109476.c: New test case.
Roger Sayle [Fri, 28 Apr 2023 13:20:24 +0000 (14:20 +0100)]
Synchronize include/ctf.h with upstream binutils/libctf.
This patch updates include/ctf.h to match the current libctf version in
binutils' include/. I recently attempted to build a uber tree (following
some notes that are so old they used CVS) and noticed that binutils won't
build with gcc's top-level include, due to CTF_F_IDXSORTED not being
defined in ctf.h.
2023-04-28 Roger Sayle <roger@nextmovesoftware.com>
include/ChangeLog
* ctf.h: Import latest version from binutils/libctf.
Richard Biener [Wed, 18 Jan 2023 09:59:52 +0000 (10:59 +0100)]
Add emulated scatter capability to the vectorizer
This adds a scatter vectorization capability to the vectorizer
without target support by decomposing the offset and data vectors
and then performing scalar stores in the order of vector lanes.
This is aimed at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the scatter.
The offset load is still vectorized and costed as such, but like
with emulated gather those will be turned back to scalar loads
by forwrpop.
* tree-vect-data-refs.cc (vect_analyze_data_refs): Always
consider scatters.
* tree-vect-stmts.cc (vect_model_store_cost): Pass in the
gather-scatter info and cost emulated scatters accordingly.
(get_load_store_type): Support emulated scatters.
(vectorizable_store): Likewise. Emulate them by extracting
scalar offsets and data, doing scalar stores.
Richard Biener [Wed, 18 Jan 2023 10:04:49 +0000 (11:04 +0100)]
Adjust costing of emulated vectorized gather/scatter
Emulated gather/scatter behave similar to strided elementwise
accesses in that they need to decompose the offset vector
and construct or decompose the data vector so handle them
the same way, pessimizing the cases with may elements.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Tame down element extracts and scalar loads for gather/scatter
similar to elementwise strided accesses.
* gcc.target/i386/pr89618-2.c: New testcase.
* gcc.target/i386/pr88531-2b.c: Adjust.
* gcc.target/i386/pr88531-2c.c: Likewise.
Jonathan Wakely [Fri, 28 Apr 2023 11:01:58 +0000 (12:01 +0100)]
libstdc++: Improve doxygen docs for <random>
Add @headerfile and @since tags. Add gamma_distribution to the correct
group (poisson distributions). Add a group for the sampling
distributions and add the missing definitions of their probability
functions. Add uniform_int_distribution back to the uniform
distributions group.
libstdc++-v3/ChangeLog:
* include/bits/random.h (gamma_distribution): Add to the right
doxygen group.
(discrete_distribution, piecewise_constant_distribution)
(piecewise_linear_distribution): Create a new doxygen group and
fix the incomplete doxygen comments.
* include/bits/uniform_int_dist.h (uniform_int_distribution):
Add to doxygen group.
Richard Biener [Fri, 28 Apr 2023 06:40:07 +0000 (08:40 +0200)]
ipa/109652 - ICE in modification phase of IPA SRA
There's another questionable IL transform by IPA SRA, replacing
foo (p_1(D)->x) with foo (VIEW_CONVERT <union type> (ISRA.PARM.1))
where ISRA.PARM.1 is a register. Conversion of a register to
an aggregate type is questionable but not entirely unreasonable
and not within the set of IL I am rejecting when fixing PR109644.
The following lets this slip through in IPA SRA transform by
restricting re-gimplification to the case of register type
results. To not break the previous testcase again we need to
optimize the BIT_FIELD_REF <VIEW_CONVERT <...>, ...> case
to elide the conversion.
PR ipa/109652
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::modify_expression): Allow
conversion of a register to a non-register type. Elide
conversions inside BIT_FIELD_REFs.
Julian Brown [Wed, 26 Apr 2023 14:31:53 +0000 (14:31 +0000)]
OpenACC: Stand-alone attach/detach clause fixes for Fortran [PR109622]
This patch fixes several cases where multiple attach or detach mapping
nodes were being created for stand-alone attach or detach clauses
in Fortran. After the introduction of stricter checking later during
compilation, these extra nodes could cause ICEs, as seen in the PR.
The patch also fixes cases that "happened to work" previously where
the user attaches/detaches a pointer to array using a descriptor, and
(I think!) the "_data" field has offset zero, hence the same address as
the descriptor as a whole.
libgomp/
* testsuite/libgomp.fortran/pr109622.f90: New test.
* testsuite/libgomp.fortran/pr109622-2.f90: New test.
* testsuite/libgomp.fortran/pr109622-3.f90: New test.
Richard Biener [Mon, 24 Apr 2023 11:20:25 +0000 (13:20 +0200)]
tree-optimization/109644 - missing IL checking
We fail to verify the constraints under which we allow handled
components to wrap registers. The gcc.dg/pr70022.c testcase shows
that we happily end up with
_2 = VIEW_CONVERT_EXPR<int[4]>(v_1(D))
as produced by SSA rewrite and update_address_taken. But the intent
was that we wrap registers with at most a single level of handled
components and specifically only allow __real, __imag, BIT_FIELD_REF
and VIEW_CONVERT_EXPR on them, but not ARRAY_REF or COMPONENT_REF.
The following makes IL verification stricter which catches the
problem.
PR tree-optimization/109644
* tree-cfg.cc (verify_types_in_gimple_reference): Check
register constraints on the outermost VIEW_CONVERT_EXPR
only. Do not allow register or invariant bases on
multi-level or possibly variable index handled components.
Richard Biener [Fri, 28 Apr 2023 07:25:16 +0000 (09:25 +0200)]
Avoid more invalid GIMPLE with register bases
The Ada frontend, for example with gnat.dg/inline2_pkg.adb, tends
to create VIEW_CONVERT expressions with aggregate type even of
non-aggregate entities. In this case for example
which is two operations on a register. While as seen with PR109652
we might not want to completely rule out register to aggregate type
VIEW_CONVERTs we definitely do not want to stack multiple ops here.
The solution is to make sure the gimplifier puts a non-register as
the base object. For the above this will add
number.1 = number;
and use number.1 in the compound reference. Code generation is
unchanged, FRE optimizes this to BIT_FIELD_REF <number_2(D), ...>.
I think BIT_FIELD_REF <VIEW_CONVERT (x), ...> could be always
rewritten into BIT_FIELD_REF <x, ...>, but that's a separate thing.
* gimplify.cc (gimplify_compound_lval): When there's a
non-register type produced by one of the handled component
operations make sure we get a non-register base.
Richard Biener [Fri, 10 Feb 2023 12:09:10 +0000 (13:09 +0100)]
tree-optimization/108752 - vectorize emulated vectors in lowered form
The following makes sure to emit operations lowered to bit operations
when vectorizing using emulated vectors. This avoids relying on
the vector lowering pass adhering to the exact same cost considerations
as the vectorizer.
PR tree-optimization/108752
* tree-vect-generic.cc (build_replicated_const): Rename
to build_replicated_int_cst and move to tree.{h,cc}.
(do_plus_minus): Adjust.
(do_negate): Likewise.
* tree-vect-stmts.cc (vectorizable_operation): Emit emulated
arithmetic vector operations in lowered form.
* tree.h (build_replicated_int_cst): Declare.
* tree.cc (build_replicated_int_cst): Moved from
tree-vect-generic.cc build_replicated_const.
Jakub Jelinek [Fri, 28 Apr 2023 08:49:40 +0000 (10:49 +0200)]
libstdc++: Another attempt to ensure g++ 13+ compiled programs enforce gcc 13.2+ libstdc++.so.6 [PR108969]
GCC used to emit an instance of an empty ios_base::Init class in
every TU which included <iostream> to ensure it is std::cout etc.
is initialized, but thanks to Patrick work on some targets (which have
init_priority attribute support) it is now initialized only inside of
libstdc++.so.6/libstdc++.a.
This causes a problem if people do something that has never been supported,
try to run GCC 13 compiled C++ code against GCC 12 or earlier
libstdc++.so.6 - std::cout etc. are then never initialized because code
including <iostream> expects the library to initialize it and the library
expects code including <iostream> to do that.
The following patch is second attempt to make this work cheaply as the
earlier attempt of aliasing the std::cout etc. symbols with another symbol
version didn't work out due to copy relocation breaking the aliases appart.
The patch forces just a _ZSt21ios_base_library_initv undefined symbol
into all *.o files which include <iostream> and while there is no runtime
relocation against that, it seems to enforce the right version of
libstdc++.so.6. /home/jakub/src/gcc/obj08i/usr/local/ is the install
directory of trunk patched with this patch, /home/jakub/src/gcc/obj06/
is builddir of trunk without this patch, system g++ is GCC 12.1.1.
$ cat /tmp/hw.C
#include <iostream>
int
main ()
{
std::cout << "Hello, world!" << std::endl;
}
$ cd /home/jakub/src/gcc/obj08i/usr/local/bin
$ ./g++ -o /tmp/hw /tmp/hw.C
$ readelf -Wa /tmp/hw 2>/dev/null | grep initv
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZSt21ios_base_library_initv@GLIBCXX_3.4.32 (4)
71: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZSt21ios_base_library_initv@GLIBCXX_3.4.32
$ /tmp/hw
/tmp/hw: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /tmp/hw)
$ LD_LIBRARY_PATH=/home/jakub/src/gcc/obj08i/usr/local/lib64/ /tmp/hw
Hello, world!
$ LD_LIBRARY_PATH=/home/jakub/src/gcc/obj06/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/ /tmp/hw
/tmp/hw: /home/jakub/src/gcc/obj06/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /tmp/hw)
$ g++ -o /tmp/hw /tmp/hw.C
$ /tmp/hw
Hello, world!
$ LD_LIBRARY_PATH=/home/jakub/src/gcc/obj06/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/ /tmp/hw
Hello, world!
$ LD_LIBRARY_PATH=/home/jakub/src/gcc/obj08i/usr/local/lib64/ /tmp/hw
Hello, world!
On sparc-sun-solaris2.11 one I've actually checked a version which had
defined(_GLIBCXX_SYMVER_SUN) next to defined(_GLIBCXX_SYMVER_GNU), but
init_priority attribute doesn't seem to be supported there and so I couldn't
actually test how this works there. Using gas and Sun ld, Rainer, does one
need to use gas + gld for init_priority or something else?
2023-04-28 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/108969
* config/abi/pre/gnu.ver (GLIBCXX_3.4.32): Export
_ZSt21ios_base_library_initv.
* testsuite/util/testsuite_abi.cc (check_version): Add GLIBCXX_3.4.32
symver and make it the latestp.
* src/c++98/ios_init.cc (ios_base_library_init): New alias.
* acinclude.m4 (libtool_VERSION): Change to 6:32:0.
* include/std/iostream: If init_priority attribute is supported
and _GLIBCXX_SYMVER_GNU, force undefined _ZSt21ios_base_library_initv
symbol into the object.
* configure: Regenerated.
aarch64: PR target/99195 annotate more integer unary patterns for vec-concat with zero
More of the straightforward cases to annotate plus tests, this time for simple integer unary ops.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Jakub Jelinek [Fri, 28 Apr 2023 07:10:37 +0000 (09:10 +0200)]
gimple-range-op: Handle sqrt (basic bounds only)
The following patch adds sqrt support (but similarly to sincos, only
dumb basic ranges only).
Will improve this incrementally and sin/cos as well.
2023-04-28 Jakub Jelinek <jakub@redhat.com>
* gimple-range-op.cc (class cfn_sqrt): New type.
(op_cfn_sqrt): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_CFN_SQRT{,_FN}.
* gcc.dg/tree-ssa/range-sqrt.c: New test.
* gfortran.dg/ieee/ieee_6.f90: Make x volatile to avoid
ranger optimizing sqrt (-1) call away because it is only used in
test for whether it returns NaN.
Jakub Jelinek [Fri, 28 Apr 2023 07:01:35 +0000 (09:01 +0200)]
Implement range-op entry for sin/cos
On Tue, Apr 18, 2023 at 03:12:50PM +0200, Aldy Hernandez wrote:
> [I don't know why I keep poking at floats. I must really like the pain.
>
> This is the range-op entry for sin/cos. It is meant to serve as an
> example of what we can do for glibc math functions. It is by no means
> exhaustive, just a stub to restrict the return range from sin/cos to
> [-1.0, 1.0] with appropriate smarts of NANs.
>
> As can be seen in the testcase, we see sin() as well as
> __builtin_sin() in the IL, and can resolve the resulting range
> accordingly.
Here is an updated version of the patch on top of the
Add targetm.libm_function_max_error
patch with all my comments incorporated into your patch (but still no
handling of sin/cos ranges shorter than 2*M_PI).
2023-04-28 Aldy Hernandez <aldyh@redhat.com>
Jakub Jelinek <jakub@redhat.com>
Jakub Jelinek [Fri, 28 Apr 2023 06:59:29 +0000 (08:59 +0200)]
Add targetm.libm_function_max_error
As has been discussed before, the following patch adds target hook
for math library function maximum errors measured in ulps.
The default is to return ~0U which is a magic maximum value which means
nothing is known about precision of the match function.
The first argument is unsigned int because enum combined_fn isn't available
everywhere where target hooks are included but is expected to be given
the enum combined_fn value, although it should be used solely to find out
which kind of match function (say sin vs. cos vs. sqrt vs. exp10) rather
than its variant (f suffix, no suffix, l suffix, f128 suffix, ...), for
which there is the machine_mode argument.
The last argument is a bool, if it is false, the function should return
maximum known error in ulps for a given function (taking -frounding-math
into account if enabled), with 0.5ulps being represented as 0.
If it is true, it is about whether the function can return values outside of
an intrinsic finite range for the function and by how many ulps.
E.g. sin/cos should return result in [-1.,1], if the function is expected
to never return values outside of that finite interval, the hook should
return 0. Similarly for sqrt such range is [-0.,+Inf].
The patch implements it for glibc only so far, I hope other maintainers
can submit details for Solaris, musl, perhaps BSDs, etc.
For glibc I've gathered data from:
1) https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
as latest published glibc data
2) https://www.gnu.org/software/libc/manual/2.22/html_node/Errors-in-Math-Functions.html
as a few years old glibc data
3) using attached libc-ulps.sh script from glibc git
4) using attached ulp-tester.c (how to invoke in file comment; tested
both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
modes, plus on x86_64 float/double sin/cos using libmvec - see
attached libmvec-wrapper.c as well)
5) using attached boundary-tester.c to test for whether sin/cos/sqrt return
values outside of the intrinsic ranges for those functions (again,
tested on x86_64, ppc64, ppc64le plus on x86_64 using libmvec as well;
libmvec with non-default rounding modes is pretty much random number
generator it seems)
The data is added to various hooks, the generic and generic glibc versions
being in targhooks.c so that the various targets can easily override it.
The intent is that the generic glibc version handles most of the stuff
and specific target arch overrides handle the outliers or special cases.
The patch has special case for x86_64 when __FAST_MATH__ is defined (as
one can use in that case either libm or libmvec and we don't know which
one will be used; so it uses maximum of what libm provides and libmvec),
rs6000 (had to add one because cosf has 3ulps on ppc* rather than 1-2ulps
on most other targets; MODE_COMPOSITE_P could be in theory handled in the
generic code too, but as we have rs6000-linux specific function, it can be
done just there), arc-linux (because DFmode sin has 7ulps there compared to
1ulps on other targets, both in default rounding mode and in others) and
or1k-linux (while DFmode sin has 1ulps there for default rounding mode,
for other rounding modes it has up to 7ulps).
Now, for -frounding-math I'm trying to add a few ulps more because I expect
it to be much less tested, except that for boundary_p I try to use
the numbers I got from the 5) tester.
2023-04-28 Jakub Jelinek <jakub@redhat.com>
* target.def (libm_function_max_error): New target hook.
* doc/tm.texi.in (TARGET_LIBM_FUNCTION_MAX_ERROR): Add.
* doc/tm.texi: Regenerated.
* targhooks.h (default_libm_function_max_error,
glibc_linux_libm_function_max_error): Declare.
* targhooks.cc: Include case-cfn-macros.h.
(default_libm_function_max_error,
glibc_linux_libm_function_max_error): New functions.
* config/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/linux-protos.h (linux_libm_function_max_error): Declare.
* config/linux.cc: Include target.h and targhooks.h.
(linux_libm_function_max_error): New function.
* config/arc/arc.cc: Include targhooks.h and case-cfn-macros.h.
(arc_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/i386/i386.cc (ix86_libc_has_fast_function): Formatting fix.
(ix86_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/rs6000-protos.h
(rs6000_linux_libm_function_max_error): Declare.
* config/rs6000/rs6000-linux.cc: Include target.h, targhooks.h, tree.h
and case-cfn-macros.h.
(rs6000_linux_libm_function_max_error): New function.
* config/rs6000/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/linux64.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/or1k/or1k.cc: Include targhooks.h and case-cfn-macros.h.
(or1k_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
Jan Beulich [Fri, 28 Apr 2023 06:53:00 +0000 (08:53 +0200)]
testsuite/C++: suppress filename canonicalization in module tests
The pathname underneath gcm.cache/ is determined from the effective name
used for the main input file of a particular module. When modules are
built, no canonicalization occurs for the main input file. Hence the
module file wouldn't be found if a different (the canonicalized) file
name was used when importing that same module. (This is an effect of
importing happening in the preprocessor, just like #include handling.)
Since it doesn't look easy to make module generation use libcpp's
maybe_shorter_path() (in fact I'd consider this a layering violation,
while cloning the logic would - at least in principle - be prone to both
going out of sync), simply suppress system header path canonicalization
for the respective tests.
harden-conditionals: detach values before compares
The optimization barriers inserted after compares enable GCC to derive
information about the values from e.g. the taken paths, or the absence
of exceptions. Move them before the original compares, so that the
reversed compares test copies of the original operands, without
further optimizations.
for gcc/ChangeLog
* gimple-harden-conditionals.cc (insert_edge_check_and_trap):
Move detach value calls...
(pass_harden_conditional_branches::execute): ... here.
(pass_harden_compares::execute): Detach values before
compares.
Andrew Stubbs [Wed, 26 Apr 2023 14:23:48 +0000 (15:23 +0100)]
amdgcn: Fix addsub bug
The vec_fmsubadd instuction actually had add twice, by mistake.
Also improve code-gen for all the complex patterns by using properly
undefined values. Mostly this just prevents the compiler reserving space
in the stack frame.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (cmul<conj_op><mode>3): Use gcn_gen_undef.
(cml<addsub_as><mode>4): Likewise.
(vec_addsub<mode>3): Likewise.
(cadd<rot><mode>3): Likewise.
(vec_fmaddsub<mode>4): Likewise.
(vec_fmsubadd<mode>4): Likewise, and use sub for the odd lanes.
Jason Merrill [Thu, 23 Mar 2023 14:43:58 +0000 (10:43 -0400)]
c++: print conversion error at candidate location
In testcases like this one, the printing of candidates in a diagnostic has
been longer than necessary because it jumps back and forth between the call
site and the candidate site. So here, we first say at the call site that no
match was found; then we note the candidate site, and then explain why it's
not suitable back at the call site, which means printing the call site line
with caret again. With this patch, the conversion diagnostic is at the same
location as the candidate, so we don't need to print any input line.
gcc/cp/ChangeLog:
* call.cc (print_conversion_rejection): Use iloc_sentinel.
Pan Li [Thu, 27 Apr 2023 03:31:42 +0000 (11:31 +0800)]
RISC-V: Add required tls to read thread pointer test
The read-thread-pointer test may require the gcc configured
with --enable-tls. If no, there x4 (aka tp) register will not
be presented in the assembly code.
This patch requires the tls for the dg checking. It will perform
the test checking if --enable-tls and mark the test as unsupported
if --disable-tls.
Configured with --enable-tls:
=== gcc Summary ===
of expected passes 16
Configured with --disable-tls:
=== gcc Summary ===
of unsupported tests 8
Andrew Pinski [Sat, 1 Apr 2023 04:59:11 +0000 (04:59 +0000)]
PHIOPT: Allow MIN/MAX to have up to 2 MIN/MAX expressions for early phiopt
In the early PHIOPT mode, the original minmax_replacement, would
replace a PHI node with up to 2 min/max expressions in some cases,
this allows for that too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (phiopt_early_allow): Allow for
up to 2 min/max expressions in the sequence/match code.
Andrew Pinski [Sat, 1 Apr 2023 22:50:27 +0000 (22:50 +0000)]
MIN/MAX should be treated similar as comparisons for trapping
While looking into moving optimizations from minmax_replacement
in phiopt to match.pd, I Noticed that min/max were considered
trapping even if -ffinite-math-only was being used. This changes
those expressions to be similar as comparisons so that they are
not considered trapping if -ffinite-math-only is on.
OK? Bootstrapped and tested with no regressions on x86_64-linux-gnu.
gcc/ChangeLog:
* rtlanal.cc (may_trap_p_1): Treat SMIN/SMAX similar as
COMPARISON.
* tree-eh.cc (operation_could_trap_helper_p): Treate
MIN_EXPR/MAX_EXPR similar as other comparisons.
Andrew Pinski [Mon, 24 Apr 2023 16:40:35 +0000 (09:40 -0700)]
PHIOPT: Rename tree_ssa_phiopt_worker to pass_phiopt::execute
Now that store elimination and phiopt does not
share outer code, we can move tree_ssa_phiopt_worker
directly into pass_phiopt::execute and remove
many declarations (prototypes) from the file.
Andrew Pinski [Mon, 24 Apr 2023 16:28:25 +0000 (09:28 -0700)]
PHIOPT: Split out store elimination from phiopt
Since the last cleanups, it made easier to see
that we should split out the store elimination
worker from tree_ssa_phiopt_worker function.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove
do_store_elim argument and split that part out to ...
(store_elim_worker): This new function.
(pass_cselim::execute): Call store_elim_worker.
(pass_phiopt::execute): Update call to tree_ssa_phiopt_worker.
Jan Hubicka [Thu, 27 Apr 2023 13:49:37 +0000 (15:49 +0200)]
Unloop loops that no longer loops in tree-ssa-loop-ch
I noticed this after adding sanity check that the upper bound on number
of iterations never drop to -1. It seems to be relatively common case
(happening few hundred times in testsuite and also during bootstrap)
that loop-ch duplicates enough so the loop itself no longer loops.
This is later detected in loop unrolling but since we test the number
of iterations anyway it seems better to do that earlier.
* cfgloopmanip.h (unloop_loops): Export.
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Unloop loops
that no longer loop.
* tree-ssa-loop-ivcanon.cc (unloop_loops): Export; do not free
vectors of loops to unloop.
(canonicalize_induction_variables): Free vectors here.
(tree_unroll_loops_completely): Free vectors here.
Richard Biener [Fri, 17 Mar 2023 12:14:49 +0000 (13:14 +0100)]
tree-optimization/109170 - bogus use-after-free with __builtin_expect
The following generalizes the range-op for __builtin_expect
by using the fnspec machinery.
PR tree-optimization/109170
* gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call):
Handle __builtin_expect and similar via cfn_pass_through_arg1
and inspecting the calls fnspec.
* builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
and BUILT_IN_EXPECT_WITH_PROBABILITY.
There are still shells on some systems that lack the ability to start
scripts when not using the shell name explicitly. Adjust genmultilib
to use ${CONFIG_SHELL-/bin/sh} the same way configure does.
for gcc/ChangeLog
* genmultilib: Use CONFIG_SHELL to run sub-scripts.
Normalize addresses in IPA before calling range_op_handler [PR109639]
The old legacy code would allow building ranges containing symbolics,
even though the entire ranger ecosystem does not handle them. These
were normalized into non-zero ranges by helper functions in VRP
(range_fold_*_expr) before calling the ranger. The only users of
these functions should have been legacy VRP, which is no more.
However, a handful of users crept into IPA, even though these
functions shouldn't never been called outside of VRP or vr-values.
The issue here is that IPA is building a range of [&foo, &foo] and
expecting range_fold_binary to normalize it to non-zero. Fixed by
adding a helper function before calling the range_op handler.
I think these covers the problematic ranges. If not, I'll come up
with something more generalized that does not involve polluting
irange::set with the normalization code. After all, this only
involves a handful of IPA places.
I've also added an assert in irange::set() making it easier to detect
any possible fallout without having to drill deep into the setter.
gcc/ChangeLog:
PR tree-optimization/109639
* ipa-cp.cc (ipa_value_range_from_jfunc): Normalize range.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
* ipa-prop.h (ipa_range_set_and_normalize): New.
* value-range.cc (irange::set): Assert min and max are INTEGER_CST.