H.J. Lu [Mon, 5 Feb 2024 11:46:50 +0000 (03:46 -0800)]
x86-64: Update gcc.target/i386/apx-ndd.c
Fix the following issues:
1. Replace long with int64_t to support x32.
2. Replace \\(%rdi\\) with \\(%(?:r|e)di\\) for memory operand since x32
uses (%edi).
3. Replace %(?:|r|e)al with %al in negb scan.
Xi Ruoyao [Fri, 2 Feb 2024 19:35:07 +0000 (03:35 +0800)]
MIPS: Fix wrong MSA FP vector negation
We expanded (neg x) to (minus const0 x) for MSA FP vectors, this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to
fail when Python is built with MSA enabled.
Use the bnegi.df instructions to simply reverse the sign bit instead.
gcc/ChangeLog:
* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
Jakub Jelinek [Mon, 5 Feb 2024 09:57:39 +0000 (10:57 +0100)]
lower-bitint: Remove single label _BitInt switches [PR113737]
The following testcase ICEs, because group_case_labels_stmt optimizes
switch (a.0_7) <default: <L6> [50.00%], case 0: <L7> [50.00%], case 2: <L7> [50.00%]>
where L7 block starts with __builtin_unreachable (); to
switch (a.0_7) <default: <L6> [50.00%]>
and single label GIMPLE_SWITCH is something the switch expansion refuses to
lower:
if (gimple_switch_num_labels (m_switch) == 1
|| range_check_type (index_type) == NULL_TREE)
return false;
(range_check_type never returns NULL for BITINT_TYPE), but the gimple
lowering pass relies on all large/huge _BitInt switches to be lowered
by that pass.
The following patch just removes those after making the single successor
edge EDGE_FALLTHRU. I've done it even if !optimize just in case in case
we'd end up with single case label from earlier passes.
2024-02-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113737
* gimple-lower-bitint.cc (gimple_lower_bitint): If GIMPLE_SWITCH
has just a single label, remove it and make single successor edge
EDGE_FALLTHRU.
Jakub Jelinek [Mon, 5 Feb 2024 08:33:26 +0000 (09:33 +0100)]
i386: Clear REG_UNUSED and REG_DEAD notes from the IL at the end of vzeroupper pass [PR113059]
The move of the vzeroupper pass from after reload pass to after
postreload_cse helped only partially, CSE-like passes can still invalidate
those notes (especially REG_UNUSED) if they use some earlier register
holding some value later on in the IL.
So, either we could try to move it one pass further after gcse2 and hope
no later pass invalidates the notes, or the following patch attempts to
restore the REG_DEAD/REG_UNUSED state from GCC 13 and earlier, where
the LRA or reload passes remove all REG_DEAD/REG_UNUSED notes and the notes
reappear only at the start of dse2 pass when it calls
df_note_add_problem ();
df_analyze ();
So, effectively
NEXT_PASS (pass_postreload_cse);
NEXT_PASS (pass_gcse2);
NEXT_PASS (pass_split_after_reload);
NEXT_PASS (pass_ree);
NEXT_PASS (pass_compare_elim_after_reload);
NEXT_PASS (pass_thread_prologue_and_epilogue);
passes operate without those notes in the IL.
While in GCC 14 mode switching computes the notes problem at the start of
vzeroupper, the patch below removes them at the end of the pass again, so
that the above passes continue to operate without them.
2024-02-05 Jakub Jelinek <jakub@redhat.com>
PR target/113059
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Remove REG_DEAD/REG_UNUSED notes at the end of the pass before
df_analyze call.
Richard Biener [Thu, 1 Feb 2024 12:54:11 +0000 (13:54 +0100)]
target/113255 - avoid REG_POINTER on a pointer difference
The following avoids re-using a register holding a pointer (and
thus might be REG_POINTER) for the result of a pointer difference
computation. That might confuse heuristics in (broken) RTL alias
analysis which relies on REG_POINTER indicating that we're
dealing with one.
This alone doesn't fix anything.
PR target/113255
* config/i386/i386-expand.cc
(expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves):
Use a new pseudo for the skipped number of bytes.
Jonathan Wakely [Thu, 1 Feb 2024 11:09:29 +0000 (11:09 +0000)]
libstdc++: Replace padding bits with bit-fields in __format::_Spec
This ensures that the unused bits will be zero-initialized reliably, and
so can be used later by assigning them values in formatter
specializations. For example, formatters for std::chrono will need to
use an extra bit for a boolean flag to optimize the conversions between
locale encodings and UTF-8.
Adding the 16-bit _M_reserved2 bit-field results in an increased size
for targets that use 1- or 2-byte alignment for all integral types, e.g.
cris-elf or m68k. Placing that member before the _M_width member
adjusts the layout for all targets, but keeps all the bit-fields
together. We can't make that change once C++20 support is ABI stable and
non-experimental, so do it now before GCC 14 is released. The _M_fill
data member already change from char to char32_t in r14-6991-g37a4c5c23a270c so _Spec is already incompatible with gcc-13
anyway.
libstdc++-v3/ChangeLog:
* include/std/format (__format::_Spec::_M_reserved): Define new
bit-field members to reserve padding bits for future extensions.
Jonathan Wakely [Fri, 2 Feb 2024 12:07:09 +0000 (12:07 +0000)]
libstdc++: Fix libstdc++exp.a so it really does contain Filesystem TS symbols
In r14-3812-gb96b554592c5cb I claimed that libstdc++exp.a now contains
all the symbols from libstdc++fs.a as well as libstdc++_libbacktrace.a,
but that wasn't true. Only the symbols from the latter were added to
libstdc++exp.a, the Filesystem TS ones weren't. This seems to be because
libtool won't combine static libs that are going to be installed
separately. Because libstdc++fs.a is still installed, libtool decides it
shouldn't be included in libstdc++exp.a.
The solution is similar to what we already do for libsupc++.a: build two
static libs, libstdc++fs.a and libstdc++fsconvenience.a, where the
former is installed and the latter isn't. If we then tell libtool to
include the latter in libstdc++exp.a it will do as it's told.
libstdc++-v3/ChangeLog:
* src/experimental/Makefile.am: Use libstdc++fsconvenience.a
instead of libstdc++fs.a.
* src/experimental/Makefile.in: Regenerate.
* src/filesystem/Makefile.am: Build libstdc++fsconvenience.a as
well.
* src/filesystem/Makefile.in: Regenerate.
Jonathan Wakely [Fri, 2 Feb 2024 13:20:11 +0000 (13:20 +0000)]
libstdc++: Add copyright and license text to new generated headers
contrib/ChangeLog:
* unicode/gen_libstdcxx_unicode_data.py: Add copyright and
license text to the output.
libstdc++-v3/ChangeLog:
* include/bits/text_encoding-data.h: Regenerate.
* include/bits/unicode-data.h: Regenerate.
* scripts/gen_text_encoding_data.py: Add copyright and license
text to the output.
Jeff Law [Sun, 4 Feb 2024 20:01:50 +0000 (13:01 -0700)]
[committed] Reasonably handle SUBREGs in risc-v cost modeling
This patch adjusts the costs so that we treat REG and SUBREG expressions the
same for costing.
This was motivated by bt_skip_func and bt_find_func in xz and results in nearly
a 5% improvement in the dynamic instruction count for input #2 and smaller, but
definitely visible improvements pretty much across the board. Exceptions would
be perlbench input #1 and exchange2 which showed very small regressions.
In the bt_find_func and bt_skip_func cases we have something like this:
Note the two uses of (reg 136). The best way to handle that in combine might be
a 3->2 split. But there's a much better approach if we look at fwprop...
(set (reg:DI 142 [ _1 ])
(plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
(reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)
So that should be the same cost as a regular DImode addition when the ZBA
extension is enabled. But it ends up costing more because the clause to cost
this variant isn't prepared to handle a SUBREG. That results in the RTL above
having too high a cost and fwprop gives up.
One approach would be to replace the REG_P with REG_P || SUBREG_P in the
costing code. I ultimately decided against that and instead check if the
operand in question passes register_operand.
By far the most important case to handle is the DImode PLUS. But for the sake
of consistency, I changed the other instances in riscv_rtx_costs as well. For
those other cases we're talking about improvements in the .000001% range.
While we are into stage4, this just hits cost modeling which we've generally
agreed is still appropriate (though we were mostly talking about vector). So
I'm going to extend that general agreement ever so slightly and include scalar
cost modeling :-)
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
similarly.
Xi Ruoyao [Fri, 2 Feb 2024 19:16:14 +0000 (03:16 +0800)]
LoongArch: Fix wrong LSX FP vector negation
We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to
fail when Python is built with LSX enabled.
Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead. We are already doing this for LASX and now we can unify them
into simd.md.
gcc/ChangeLog:
* config/loongarch/lsx.md (neg<mode:FLSX>2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg<mode:FVEC>2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE. OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bound array access (the length or the
mode_size array is NUM_MACHINE_MODES).
So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
This FAIL was introduced from r14-6908. The reason is that when merging
constant vector permutation implementations, the 128-bit matching situation
was not fully considered. In fact, the expansion of 128-bit vectors after
merging only supports value-based 4 elements set shuffle, so this time is a
complete implementation of the entire 128-bit vector constant permutation,
and some structural adjustments have also been made to the code.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_expand_vselect): Adjust.
(loongarch_expand_vselect_vconcat): Ditto.
(loongarch_try_expand_lsx_vshuf_const): New, use vshuf to implement
all 128-bit constant permutation situations.
(loongarch_expand_lsx_shuffle): Adjust and rename function name.
(loongarch_is_imm_set_shuffle): Renamed function name.
(loongarch_expand_vec_perm_even_odd): Function forward declaration.
(loongarch_expand_vec_perm_even_odd_1): Add implement for 128-bit
extract-even and extract-odd permutations.
(loongarch_is_odd_extraction): Delete.
(loongarch_is_even_extraction): Ditto.
(loongarch_expand_vec_perm_const): Adjust.
Jerry DeLisle [Sat, 3 Feb 2024 02:12:33 +0000 (18:12 -0800)]
libgfortran: EN0.0E0 and ES0.0E0 format editing.
F2018 and F2023 standards added zero width exponents. This required
additional special handing in the process of building formatted
floating point strings.
G formatting uses either F or E formatting as documented in
write_float.def comments. This logic changes the format token from FMT_G
to FMT_F or FMT_E. The new formatting requirements interfere with this
process when a FMT_G float string is being built. To avoid this, a new
component called 'pushed' is added to the fnode structure to save this
condition. The 'pushed' condition is then used to bypass portions of
the new ES,E,EN, and D formatting, falling through to the existing
default formatting which is retained.
libgfortran/ChangeLog:
PR libfortran/111022
* io/format.c (get_fnode): Update initialization of fnode.
(parse_format_list): Initialization.
* io/format.h (struct fnode): Added the new 'pushed' component.
* io/write.c (select_buffer): Whitespace.
(write_real): Whitespace.
(write_real_w0): Adjust logic for the d == 0 condition.
* io/write_float.def (determine_precision): Whitespace.
(build_float_string): Calculate width of ..E0 exponents and
adjust logic accordingly.
(build_infnan_string): Whitespace.
(CALCULATE_EXP): Whitespace.
(quadmath_snprintf): Whitespace.
(determine_en_precision): Whitespace.
gcc/testsuite/ChangeLog:
PR libfortran/111022
* gfortran.dg/fmt_error_10.f: Show D+0 exponent.
* gfortran.dg/pr96436_4.f90: Show E+0 exponent.
* gfortran.dg/pr96436_5.f90: Show E+0 exponent.
* gfortran.dg/pr111022.f90: New test.
libatomic: Provide FPU exception defines for __hppa__
The exception defines in <fenv.h> do not match the exception bits
in the FPU status register on hppa-linux and hppa64-hpux11.11. On
linux, they match the trap enable bits. On 64-bit hpux, they match
the exception bits for IA64. The IA64 bits are in a different
order and location than HPPA. HP uses table look ups to reorder
the bits in code to test and raise exceptions.
All the architectures that I looked at just pass the FPU status
register to __atomic_feraiseexcept(). The simplest approach for
hppa is to define FE_INEXACT, etc, to match the status register
and not include <fenv.h>..
2024-02-03 John David Anglin <danglin@gcc.gnu.org>
libatomic/ChangeLog:
PR target/59778
* configure.tgt (hppa*): Set ARCH.
* config/pa/fenv.c: New file.
Jakub Jelinek [Sat, 3 Feb 2024 13:38:27 +0000 (14:38 +0100)]
wide-int: Fix up wi::bswap_large [PR113722]
Since bswap has been converted from a method to a function we miscompile
the following testcase. The problem is the assumption that the passed in
len argument (number of limbs in the xval array) is the upper bound for the
bswap result, which is true only if precision is <= 64. If precision is
larger than that, e.g. 128 as in the testcase, if the argument has only
one limb (i.e. 0 to ~(unsigned HOST_WIDE_INT) 0), the result can still
need 2 limbs for that precision, or generally BLOCKS_NEEDED (precision)
limbs, it all depends on how many least significant limbs of the operand
are zero. bswap_large as implemented only cleared len limbs of result,
then swapped the bytes (invoking UB when oring something in all the limbs
above it) and finally passed len to canonize, saying that more limbs
aren't needed.
The following patch fixes it by renaming len to xlen (so that it is clear
it is X's length), using it solely for safe_uhwi argument when we attempt
to read from X, and using new len = BLOCKS_NEEDED (precision) instead in
the other two spots (i.e. when clearing the val array, turned it also
into memset, and in canonize argument). wi::bswap asserts it isn't invoked
on widest_int, so we are always invoked on wide_int or similar and those
have preallocated result sized for the corresponding precision (i.e.
BLOCKS_NEEDED (precision)).
2024-02-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113722
* wide-int.cc (wi::bswap_large): Rename third argument from
len to xlen and adjust use in safe_uhwi. Add len variable, set
it to BLOCKS_NEEDED (precision) and use it for clearing of val
and as canonize argument. Clear val using memset instead of
a loop.
Jakub Jelinek [Sat, 3 Feb 2024 13:37:19 +0000 (14:37 +0100)]
ggc-common: Fix save PCH assertion
We are getting a gnuradio PCH ICE
/usr/include/pybind11/stl.h:447:1: internal compiler error: in gt_pch_save, at ggc-common.cc:693
0x1304e7d gt_pch_save(_IO_FILE*)
../../gcc/ggc-common.cc:693
0x12a45fb c_common_write_pch()
../../gcc/c-family/c-pch.cc:175
0x18ad711 c_parse_final_cleanups()
../../gcc/cp/decl2.cc:5062
0x213988b c_common_parse_file()
../../gcc/c-family/c-opts.cc:1319
(unfortunately it isn't reproduceable always, but often needs
up to 100 attempts, isn't reproduceable in a cross etc.).
The bug is in the assertion I've added in gt_pch_save when adding
relocation support for the PCH files in case they happen not to be
mmapped at the selected address.
addr is a relocated address which points to a location in the PCH
blob (starting at mmi.preferred_base, with mmi.size bytes) which contains
a pointer that needs to be relocated. So the assertion is meant to
verify the address is within the PCH blob, obviously it needs to be
equal or above mmi.preferred_base, but I got the other comparison wrong
and when one is very unlucky and the last sizeof (void *) bytes of the
blob happen to be a pointer which needs to be relocated, such as on the
s390x host addr 0x8008a04ff8, mmi.preferred_base 0x8000000000 and
mmi.size 0x8a05000, addr + sizeof (void *) is equal to mmi.preferred_base +
mmi.size and that is still fine, both addresses are end of something.
2024-02-03 Jakub Jelinek <jakub@redhat.com>
* ggc-common.cc (gt_pch_save): Allow addr to be equal to
mmi.preferred_base + mmi.size - sizeof (void *).
- Import latest fixes from dmd v2.107.0-beta.1.
- Hex strings can now be cast to integer arrays.
- Add support for Interpolated Expression Sequences.
D runtime changes:
- Import latest fixes from druntime v2.107.0-beta.1.
- New core.interpolation module to provide run-time support for D
interpolated expression sequence literals.
Phobos changes:
- Import latest fixes from phobos v2.107.0-beta.1.
- `std.range.primitives.isBidirectionalRange', and
`std.range.primitives.isRandomAccessRange' now take an optional
element type.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd e770945277.
* Make-lang.in (D_FRONTEND_OBJS): Add d/basicmangle.o, d/enumsem.o,
d/funcsem.o, d/templatesem.o.
* d-builtins.cc (build_frontend_type): Update for new front-end
interface.
* d-codegen.cc (declaration_type): Likewise.
(parameter_type): Likewise.
* d-incpath.cc (add_globalpaths): Likewise.
(add_filepaths): Likewise.
(add_import_paths): Likewise.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_parse_file): Likewise.
* decl.cc (DeclVisitor::finish_vtable): Likewise.
(DeclVisitor::visit (FuncDeclaration *)): Likewise.
(get_symbol_decl): Likewise.
* expr.cc (ExprVisitor::visit (StringExp *)): Likewise.
Implement support for 8-byte hexadecimal strings.
* typeinfo.cc (create_tinfo_types): Update internal TypeInfo
representation.
(TypeInfoVisitor::visit (TypeInfoConstDeclaration *)): Update for new
front-end interface.
(TypeInfoVisitor::visit (TypeInfoInvariantDeclaration *)): Likewise.
(TypeInfoVisitor::visit (TypeInfoSharedDeclaration *)): Likewise.
(TypeInfoVisitor::visit (TypeInfoWildDeclaration *)): Likewise.
(TypeInfoVisitor::visit (TypeInfoClassDeclaration *)): Move data for
TypeInfo_Class.nameSig to the end of the object.
(create_typeinfo): Update for new front-end interface.
Xi Ruoyao [Thu, 1 Feb 2024 21:37:38 +0000 (05:37 +0800)]
LoongArch: Fix an ODR violation
When bootstrapping GCC 14 with --with-build-config=bootstrap-lto, an ODR
violation is detected:
../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
../../gcc/config/loongarch/loongarch-def.cc:186: note:
'abi_minimal_isa' was previously declared here
186 | abi_minimal_isa = array<array<loongarch_isa, N_ABI_EXT_TYPES>,
../../gcc/config/loongarch/loongarch-def.cc:186: note:
code may be misoptimized unless '-fno-strict-aliasing' is used
Fix it by adding a proper declaration of abi_minimal_isa into
loongarch-def.h and remove the ODR-violating local declaration in
loongarch-opts.cc.
Patrick Palka [Sat, 3 Feb 2024 00:07:08 +0000 (19:07 -0500)]
c++: requires-exprs and partial constraint subst [PR110006]
In r11-3261-gb28b621ac67bee we made tsubst_requires_expr never partially
substitute into a requires-expression so as to avoid checking its
requirements out of order during e.g. generic lambda regeneration.
These PRs however illustrate that we still sometimes do need to
partially substitute into a requires-expression, in particular when it
appears in associated constraints that we're directly substituting for
sake of declaration matching or dguide constraint rewriting. In these
cases we're being called from tsubst_constraint during which
processing_constraint_expression_p is true, so this patch checks this
predicate to control whether we defer substitution or partially
substitute.
In turn, we now need to propagate semantic tsubst flags through
tsubst_requires_expr rather than just using tf_none, notably for sake of
dguide constraint rewriting which sets tf_dguide.
PR c++/110006
PR c++/112769
gcc/cp/ChangeLog:
* constraint.cc (subst_info::quiet): Accomodate non-diagnostic
tsubst flags.
(tsubst_valid_expression_requirement): Likewise.
(tsubst_simple_requirement): Return a substituted _REQ node when
processing_template_decl.
(tsubst_type_requirement_1): Accomodate non-diagnostic tsubst
flags.
(tsubst_type_requirement): Return a substituted _REQ node when
processing_template_decl.
(tsubst_compound_requirement): Likewise. Accomodate non-diagnostic
tsubst flags.
(tsubst_nested_requirement): Likewise.
(tsubst_requires_expr): Don't defer partial substitution when
processing_constraint_expression_p is true, in which case return
a substituted REQUIRES_EXPR.
* pt.cc (tsubst_expr) <case REQUIRES_EXPR>: Accomodate
non-diagnostic tsubst flags.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-alias18.C: New test.
* g++.dg/cpp2a/concepts-friend16.C: New test.
Gaius Mulley [Sat, 3 Feb 2024 00:03:39 +0000 (00:03 +0000)]
PR modula2/113730 Unexpected handling of mixed-precision integer arithmetic
This patch fixes a bug which occurs when an expression is created with
a ZType and an integer or cardinal. The resulting subexpression is
incorrecly given a ZType which allows compatibility with another
integer/cardinal type. The solution was to fix the subexpression
type. In turn this required a minor change to SArgs.mod.
gcc/m2/ChangeLog:
PR modula2/113730
* gm2-compiler/M2Base.mod (IsUserType): New procedure function.
(MixTypes): Use IsUserType instead of IsType before calling MixTypes.
* gm2-compiler/M2GenGCC.mod (GetTypeMode): Remove and import from
SymbolTable.
(CodeBinaryCheck): Replace call to MixTypes with MixTypesBinary.
(CodeBinary): Replace call to MixTypes with MixTypesBinary.
(CodeIfLess): Replace MixTypes with ComparisonMixTypes.
(CodeIfGre): Replace MixTypes with ComparisonMixTypes.
(CodeIfLessEqu): Replace MixTypes with ComparisonMixTypes.
(CodeIfGreEqu): Replace MixTypes with ComparisonMixTypes.
(CodeIfEqu): Replace MixTypes with ComparisonMixTypes.
(CodeIfNotEqu): Replace MixTypes with ComparisonMixTypes.
(ComparisonMixTypes): New procedure function.
* gm2-compiler/M2Quads.mod (BuildEndFor): Replace GenQuadO
with GenQuadOtok and pass tokenpos for the operands to the AddOp
and XIndrOp.
(CheckRangeIncDec): Check etype against NulSym and dtype for a
pointer and set etype to Address.
(BuildAddAdrFunction): New variable opa. Convert operand to an
address and save result in opa. Replace GenQuad with GenQuadOtok.
(BuildSubAdrFunction): New variable opa. Convert operand to an
address and save result in opa. Replace GenQuad with GenQuadOtok.
(BuildDiffAdrFunction): New variable opa. Convert operand to an
address and save result in opa. Replace GenQuad with GenQuadOtok.
(calculateMultipicand): Replace GenQuadO with GenQuadOtok.
(ConvertToAddress): New procedure function.
(BuildDynamicArray): Convert index to address before adding to
the base.
* gm2-compiler/SymbolTable.def (GetTypeMode): New procedure function.
* gm2-compiler/SymbolTable.mod (GetTypeMode): New procedure
function implemented (moved from M2GenGCC.mod).
* gm2-libs/SArgs.mod (GetArg): Replace cast to PtrToChar with ADDRESS.
gcc/testsuite/ChangeLog:
PR modula2/113730
* gm2/extensions/fail/arith1.mod: New test.
* gm2/extensions/fail/arith2.mod: New test.
* gm2/extensions/fail/arith3.mod: New test.
* gm2/extensions/fail/arith4.mod: New test.
* gm2/extensions/fail/arithpromote.mod: New test.
* gm2/extensions/fail/extensions-fail.exp: New test.
* gm2/linking/fail/badimp.def: New test.
* gm2/linking/fail/badimp.mod: New test.
* gm2/linking/fail/linking-fail.exp: New test.
* gm2/linking/fail/testbadimp.mod: New test.
Tamar Christina [Fri, 2 Feb 2024 23:52:27 +0000 (23:52 +0000)]
middle-end: check memory accesses in the destination block [PR113588].
When analyzing loads for early break it was always the intention that for the
exit where things get moved to we only check the loads that can be reached from
the condition.
However the main loop checks all loads and we skip the destination BB. As such
we never actually check the loads reachable from the COND in the last BB unless
this BB was also the exit chosen by the vectorizer.
This leads us to incorrectly vectorize the loop in the PR and in doing so access
out of bounds.
gcc/ChangeLog:
PR tree-optimization/113588
PR tree-optimization/113467
* tree-vect-data-refs.cc
(vect_analyze_data_ref_dependence): Choose correct dest and fix checks.
(vect_analyze_early_break_dependences): Update comments.
gcc/testsuite/ChangeLog:
PR tree-optimization/113588
PR tree-optimization/113467
* gcc.dg/vect/vect-early-break_108-pr113588.c: New test.
* gcc.dg/vect/vect-early-break_109-pr113588.c: New test.
Andrew Pinski [Tue, 23 Jan 2024 00:27:49 +0000 (16:27 -0800)]
Fix some of vect-avg-*.c testcases
The vect-avg-*.c testcases are trying to make sure
the AVG internal function are used and not
doing promotion to `vector unsigned short`
but when V4QI is implemented, `vector(2) unsigned short`
shows up in the detail dump file and causes the failure.
To fix this checking the optimized dump instead of the vect dump
for `vector unsigned short` to make sure the vectorizer does not
do the promotion.
Ian Lance Taylor [Fri, 20 Oct 2023 17:11:48 +0000 (10:11 -0700)]
compiler: export the type "any" as a builtin
Otherwise we can't tell the difference between builtin type "any" and
a locally defined type "any".
This will require updates to the gccgo export data parsers in the main
Go repo and the x/tools repo. These updates are https://go.dev/cl/537195
and https://go.dev/cl/537215.
Jakub Jelinek [Fri, 2 Feb 2024 21:14:33 +0000 (22:14 +0100)]
libgcc: Fix up _BitInt division [PR113604]
The following testcase ends up with SIGFPE in __divmodbitint4.
The problem is a thinko in my attempt to implement Knuth's algorithm.
The algorithm does (where b is 65536, i.e. one larger than what
fits in their unsigned short word):
// Compute estimate qhat of q[j].
qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
again:
if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
{ qhat = qhat - 1;
rhat = rhat + vn[n-1];
if (rhat < b) goto again;
}
The problem is that it uses a double-word / word -> double-word
division (and modulo), while all we have is udiv_qrnnd unless
we'd want to do further library calls, and udiv_qrnnd is a
double-word / word -> word division and modulo.
Now, as the algorithm description says, it can produce at most
word bits + 1 bit quotient. And I believe that actually the
highest qhat the original algorithm can produce is
(1 << word_bits) + 1. The algorithm performs earlier canonicalization
where both the divisor and dividend are shifted left such that divisor
has msb set. If it has msb set already before, no shifting occurs but
we start with added 0 limb, so in the first uv1:uv0 double-word uv1
is 0 and so we can't get too high qhat, if shifting occurs, the first
limb of dividend is shifted right by UWtype bits - shift count into
a new limb, so again in the first iteration in the uv1:uv0 double-word
uv1 doesn't have msb set while vv1 does and qhat has to fit into word.
In the following iterations, previous iteration should guarantee that
the previous quotient digit is correct. Even if the divisor was the
maximal possible vv1:all_ones_in_all_lower_limbs, if the old uv0:lower_limbs
would be larger or equal to the divisor, the previous quotient digit
would increase and another divisor would be subtracted, which I think
implies that in the next iteration in uv1:uv0 double-word uv1 <= vv1,
but uv0 could be up to all ones, e.g. in case of all lower limbs
of divisor being all ones and at least one dividend limb below uv0
being not all ones. So, we can e.g. for 64-bit UWtype see
uv1:uv0 / vv1 0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL
or 0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL
In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is
0x10000000000000001UL, i.e. 2 more than fits into UWtype result,
if uv1 == vv1 && uv0 < uv1 it would be 0x10000000000000000UL, i.e.
1 more than fits into UWtype result.
Because we only have udiv_qrnnd which can't deal with those too large
cases (SIGFPEs or otherwise invokes undefined behavior on those), I've
tried to handle the uv1 >= vv1 case separately, but for one thing
I thought it would be at most 1 larger than what fits, and for two
have actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting
0:vv1 from uv1:uv0.
For the uv1 < vv1 case, the implementation already performs roughly
what the algorithm does.
Now, let's see what happens with the two possible extra cases in
the original algorithm.
If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take
if (qhat >= b, decrement qhat by 1 (it becomes b - 1), add
vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because
qhat already fits we can goto to the again label in the uv1 < vv1
code). rhat in this case is uv0 and rhat + vv1 can but doesn't
have to overflow, say for uv0 42UL and vv1 0x8000000000000000UL
it will not (and so we should goto again), while for uv0
0x8000000000000000UL and vv1 0x8000000000000001UL it will (and
we shouldn't goto again).
If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we
take if (qhat >= b, decrement qhat by 1 (it becomes b), add
vn[n-1] aka vv1 to rhat. But because vv1 has msb set and
rhat in this case is uv0 - vv1, the rhat + vv1 addition
certainly doesn't overflow, because (uv0 - vv1) + vv1 is uv0,
so in the algorithm we goto again, again take if (qhat >= b and
decrement qhat so it finally becomes b - 1, and add vn[n-1]
aka vv1 to rhat again. But this time I believe it must always
overflow, simply because we added (uv0 - vv1) + vv1 + vv1 and
vv1 has msb set, so already vv1 + vv1 must overflow. And
because it overflowed, it will not goto again.
So, I believe the following patch implements this correctly, by
subtracting vv1 from uv1:uv0 double-word once, then comparing
again if uv1 >= vv1. If that is true, subtract vv1 from uv1:uv0
again and add 2 * vv1 to rhat, no __builtin_add_overflow is needed
as we know it always overflowed and so won't goto again.
If after the first subtraction uv1 < vv1, use __builtin_add_overflow
when adding vv1 to rhat, because it can but doesn't have to overflow.
I've added an extra testcase which tests the behavior of all the changed
cases, so it has a case where uv1:uv0 / vv1 is 1:1, where it is
1:0 and rhat + vv1 overflows and where it is 1:0 and rhat + vv1 does not
overflow, and includes tests also from Zdenek's other failing tests.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113604
* libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract
vv1 from uv1:uv0 once or twice as needed, rather than
subtracting vv1:vv1.
* gcc.dg/torture/bitint-53.c: New test.
* gcc.dg/torture/bitint-55.c: New test.
This change implements __builtin_get_fpsr() and __builtin_set_fpsr(x)
to get and set the floating-point status register. They are used to
implement pa_atomic_assign_expand_fenv().
2024-02-02 John David Anglin <danglin@gcc.gnu.org>
Such situation can only happen on auto-vectorization, never happen on intrinsic codes.
Since the reduction is passed VLMAX AVL, it should be more natural to pass VLMAX to the scalar move which initial the value of the reduction.
Iain Sandoe [Mon, 29 Jan 2024 10:09:25 +0000 (10:09 +0000)]
testsuite, Darwin: Allow for undefined symbols in shared test.
Darwin's linker defaults to error on undefined (which makes it look as
if we do not support shared, leading to tests being marked incorrectly
as unsupported).
This fixes the issue by allowing the symbols used in the target
supports test to be undefined.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_shared):
Allow the external symbols referenced in the test to be undefined.
Iain Sandoe [Fri, 26 Jan 2024 15:23:19 +0000 (15:23 +0000)]
testsuite, ubsan: Add libstdc++ deps where required.
We use the ubsan tests from both C, C++, D and Fortran.
thee sanitizer libraries link to libstdc++.
When we are using the C/gdc/gfortran driver, and the target might
require a path to the libstdc++ (e.g. for handing -static-xxxx or
for embedded runpaths), we need to add a suitable option (or we get
fails at execution time because of the missing paths).
Conversely, we do not want to add multiple instances of these
paths (since that leads to failures on tools that report warnings
for duplicate runpaths).
This patch modifies the _init function to allow a sigle parameter
that determines whether the *asan_init should add a path for
libstdc++ (yes for C driver, no for C++ driver).
gcc/testsuite/ChangeLog:
* g++.dg/ubsan/ubsan.exp:Add a parameter to init to say that
we expect the C++ driver to provide paths for libstdc++.
* gcc.dg/ubsan/ubsan.exp: Add a parameter to init to say that
we need a path added for libstdc++.
* gdc.dg/ubsan/ubsan.exp: Likewise.
* gfortran.dg/ubsan/ubsan.exp: Likewise.
* lib/ubsan-dg.exp: Handle a single parameter to init that
requests addition of a path to libstdc++ to link flags.
Iain Sandoe [Fri, 26 Jan 2024 15:22:44 +0000 (15:22 +0000)]
testsuite, asan, hwsan: Add libstdc++ deps where required.
We use the shared asan/hwasan rom both C,C++,D and Fortran.
The sanitizer libraries link to libstdc++.
When we are using the C/gdc/gfortran driver, and the target might
require a path to the libstdc++ (e.g. for handing -static-xxxx or
for embedded runpaths), we need to add a suitable option (or we get
fails at execution time because of the missing paths).
Conversely, we do not want to add multiple instances of these
paths (since that leads to failures on tools that report warnings
for duplicate runpaths).
This patch modifies the _init function to allow a single parameter
that determines whether the *asan_init should add a path for
libstdc++ (yes for C driver, no for C++ driver).
gcc/testsuite/ChangeLog:
* g++.dg/asan/asan.exp: Add a parameter to init to say that
we expect the C++ driver to provide paths for libstdc++.
* g++.dg/hwasan/hwasan.exp: Likewise
* gcc.dg/asan/asan.exp: Add a parameter to init to say that
we need a path added for libstdc++.
* gcc.dg/hwasan/hwasan.exp: Likewise.
* gdc.dg/asan/asan.exp: Likewise.
* gfortran.dg/asan/asan.exp: Likewise.
* lib/asan-dg.exp: Handle a single parameter to init that
requests addition of a path to libstdc++ to link flags.
* lib/hwasan-dg.exp: Likewise.
Jonathan Wakely [Fri, 2 Feb 2024 10:03:12 +0000 (10:03 +0000)]
libstdc++: Make std::function deduction guide support explicit object functions [PR113335]
This makes the deduction guides for std::function and std::packaged_task
work for explicit object member functions, i.e. "deducing this", as per
LWG 3617.
libstdc++-v3/ChangeLog:
PR libstdc++/113335
* include/bits/std_function.h (__function_guide_helper): Add
partial specialization for explicit object member functions, as
per LWG 3617.
* testsuite/20_util/function/cons/deduction_c++23.cc: Check
explicit object member functions.
* testsuite/30_threads/packaged_task/cons/deduction_c++23.cc:
Likewise.
Jakub Jelinek [Fri, 2 Feb 2024 10:46:34 +0000 (11:46 +0100)]
libgcc: Export XF, TF, HF and BFmode specific _BitInt symbols from libgcc_s.so.1 [PR113700]
Rainer pointed out that __PFX__ and __FIXPTPFX__ prefix replacement is done
solely for libgcc-std.ver.in and not for the *.ver files in config.
I've used the __PFX__ prefix even in config/i386/libgcc-glibc.ver because it
was used for similar symbols in libgcc-std.ver.in, and that results in those
symbols being STB_LOCAL in libgcc_s.so.1. Tests still work because gcc by
default uses -static-libgcc when linking (unlike g++ etc.), but would
have failed when using -shared-libgcc (but I see nothing in the testsuite
actually testing with -shared-libgcc, so am not adding tests).
With the patch, libgcc_s.so.1 now exports
__fixtfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__fixxfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintbf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinthf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinttf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintxf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
on x86_64-linux which it wasn't before.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR target/113700
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Remove __PFX prefixes
from symbol names.
Jakub Jelinek [Fri, 2 Feb 2024 10:28:31 +0000 (11:28 +0100)]
lower-bitint: Handle casts from large/huge _BitInt to pointer/reference types [PR113692]
I thought one needs to cast first to pointer-sized integer before casting to
pointer, but apparently that is not the case.
So the following patch arranges for the large/huge _BitInt to
pointer/reference conversions to use the same code as for conversions
of them to small integral types.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113692
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Handle casts
from large/huge BITINT_TYPEs to POINTER_TYPE/REFERENCE_TYPE as
final_cast_p.
Jakub Jelinek [Fri, 2 Feb 2024 10:27:37 +0000 (11:27 +0100)]
lower-bitint: Handle uninitialized large/huge SSA_NAMEs as inline asm inputs [PR113699]
Similar problem to calls with uninitialized large/huge _BitInt SSA_NAME
arguments, var_to_partition will not work for those, but unlike calls
where we just create a new uninitialized SSA_NAME here we need to change
the inline asm input to be an uninitialized VAR_DECL.
Jonathan Wakely [Thu, 1 Feb 2024 10:08:05 +0000 (10:08 +0000)]
libstdc++: Implement some missing functions for net::ip::network_v6
libstdc++-v3/ChangeLog:
* include/experimental/internet (network_v6::network): Define.
(network_v6::hosts): Finish implementing.
(network_v6::to_string): Do not concatenate std::string to
arbitrary std::basic_string specialization.
* testsuite/experimental/net/internet/network/v6/cons.cc: New
test.
Jonathan Wakely [Wed, 31 Jan 2024 10:41:49 +0000 (10:41 +0000)]
libstdc++: Avoid reusing moved-from iterators in PSTL tests [PR90276]
The reverse_invoker utility for PSTL tests uses forwarding references for
all parameters, but some of those parameters get forwarded to move
constructors which then leave the objects in a moved-from state. When
the parameters are forwarded a second time that results in making new
copies of moved-from iterators. For libstdc++ debug mode iterators, the
moved-from state is singular, which means copying them will abort at
runtime.
The fix is to make copies of iterator arguments instead of forwarding
them.
The callers of reverse_invoker::operator() also forward the iterators
multiple times, but that's OK because reverse_invoker accepts them by
forwarding reference but then breaks the chain of forwarding and copies
them as lvalues.
libstdc++-v3/ChangeLog:
PR libstdc++/90276
* testsuite/util/pstl/test_utils.h (reverse_invoker): Do not use
perfect forwarding for iterator arguments.
This change broke bootstrap on aarch64-linux.
The problem can be seen even on the reduced testcase.
The IL on the unreduced testcase before widening_mul has:
# val_583 = PHI <val_26(13), val_164(40)>
...
pretmp_266 = MEM[(const struct wide_int_storage *)&D.160657].len;
_264 = pretmp_266 & 65535;
...
_176 = (sizetype) val_583;
_439 = (sizetype) _264;
_284 = _439 * 8;
_115 = _176 + _284;
where 583/266/264 have unsigned int type and 176/439/284/115 have sizetype.
widening_mul first turns that into:
# val_583 = PHI <val_26(13), val_164(40)>
...
pretmp_266 = MEM[(const struct wide_int_storage *)&D.160657].len;
_264 = pretmp_266 & 65535;
...
_176 = (sizetype) val_583;
_439 = (sizetype) _264;
_284 = _264 w* 8;
_115 = _176 + _284;
and then is_widening_mult_rhs_p is called, with type sizetype (64-bit),
rhs _264, hprec 32 and prec 64. Now tree_nonzero_bits (rhs) is
65535, so bits is 64-bit wide_int 65535, stmt is BIT_AND_EXPR,
but we ICE on the
wi::to_wide (gimple_assign_rhs2 (stmt)) == wi::mask (hprec, false, prec)
comparison because wi::to_wide on gimple_assign_rhs2 (stmt) - unsigned int
65535 gives 32-bit wide_int 65535, while wi::mask (hprec, false, prec)
gives 64-bit wide_int 0xffffffff and comparison between different precision
wide_ints is forbidden.
The following patch fixes it the same way how bits is computed earlier,
by calling wide_int::from on the wi::to_wide (gimple_assign_rhs2 (stmt)),
so we compare 64-bit 65535 with 64-bit 0xffffffff.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113705
* tree-ssa-math-opts.cc (is_widening_mult_rhs_p): Use wide_int_from
around wi::to_wide in order to compare value in prec precision.
The problem is that /bin/as doesn't fully support cfi directives, so the
.eh_frame section is specified explicitly, which includes ".long 0".
The regular expression above includes ".*", which does multiline
matches. AFAICS those aren't needed here.
This patch changes the RE not to use multiline patches.
Tested on i386-pc-solaris2.11 (as and gas) and i686-pc-linux-gnu.
Jonathan Wakely [Thu, 1 Feb 2024 21:40:33 +0000 (21:40 +0000)]
libstdc++: Allow explicit conversion of string views with different traits
This was changed by LWG 3857.
libstdc++-v3/ChangeLog:
* include/std/string_view (basic_string_view(R&&)): Remove
constraint that traits_type must be the same, as per LWG 3857.
* testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc:
Explicit conversion between different specializations should be
allowed.
* testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc:
Likewise.
Jonathan Wakely [Thu, 1 Feb 2024 21:15:20 +0000 (21:15 +0000)]
libstdc++: Remove noexcept from std::generator::promise_type::yield_value
This overload of std::generator::promise_type::yield_value calls things
which might throw, so should not be noexcept. The noexcept was remove by
LWG 3894.
libstdc++-v3/ChangeLog:
* include/std/generator (promise_type::yield_value): Remove
noexcept from fourth overload, as per LWG 3894.
Iain Sandoe [Wed, 24 Jan 2024 08:05:41 +0000 (08:05 +0000)]
testsuite, Objective-C++: Update link flags [PR112863].
These regressions are caused by missing or duplicate runpaths which
now fire linker warnings.
We need to add options to locate libobjc (and on Darwin libobjc-gnu)
along with libstdc++.
Usually '-L' options are added to point to the relevant directories for
the uninstalled libraries.
In cases where libraries are available as both shared and convenience
some additional checks are made.
For some targets -static-xxxx options are handled by specs substitution
and need a '-B' option rather than '-L'. For Darwin, when embedded
runpaths are in use (the default for all versions after macOS 10.11),
'-B' is also needed to provide the runpath.
When '-B' is used, this results in a '-L' for each path that exists (so
that appending a '-L' as well is a needless duplicate). There are also
cases where tools warn for duplicates, leading to spurious fails.
PR target/112863
gcc/testsuite/ChangeLog:
* lib/obj-c++.exp: Decide on whether to present -B or -L to
reference the paths to uninstalled libobjc/libobjc-gnu and
libstdc++ and use that to generate the link flags.
Iain Sandoe [Wed, 24 Jan 2024 08:05:30 +0000 (08:05 +0000)]
testsuite, gfortran: Update link flags [PR112862].
The regressions here are caused by two issues:
1. In some cases there is no generated runpath for libatomic
2. In other cases there are duplicate paths.
This patch simplifies the addition of the options in the main
gfortran exp and removes the duplicates elewhere.
We need to add options to locate libgfortran and the dependent libs
libquadmath (supporting REAL*16) and libatomic (supporting operations
used by coarrays). Usually '-L' options are added to point to the
relevant directories for the uninstalled libraries.
In cases where libraries are available as both shared and convenience
some additional checks are made.
For some targets -static-xxxx options are handled by specs substitution
and need a '-B' option rather than '-L'. For Darwin, when embedded
runpaths are in use (the default for all versions after macOS 10.11),
'-B' is also needed to provide the runpath.
When '-B' is used, this results in a '-L' for each path that exists (so
that appending a '-L' as well is a needless duplicate). There are also
cases where tools warn for duplicates, leading to spurious fails.
PR target/112862
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/caf.exp: Remove duplicate additions of
libatomic handling.
* gfortran.dg/dg.exp: Likewise.
* lib/gfortran.exp: Decide on whether to present -B or -L to
reference the paths to uninstalled libgfortran, libqadmath and
libatomic and use that to generate the link flags.
Realize in recent benchmark evaluation (coremark-pro zip-test):
vid.v v2
vmv.v.i v5,0
.L9:
vle16.v v3,0(a4)
vrsub.vx v4,v2,a6 ---> LICM failed to hoist it outside the loop.
The root cause is:
(insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
(reg:DI 223)) "rvv.c":11:9 208 {*movdi_64bit} -> Its result used by the following vrsub.vx then supress the hoist of the vrsub.vx
(nil))
Jason Merrill [Thu, 1 Feb 2024 22:23:53 +0000 (17:23 -0500)]
c++: no_unique_address and constexpr [PR112439]
Here, because we don't build a CONSTRUCTOR for an empty base, we were
wrongly marking the Foo CONSTRUCTOR as complete after initializing the Empty
member. Fixed by checking empty_base here as well.
PR c++/112439
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_store_expression): Check empty_base
before marking a CONSTRUCTOR readonly.
Li Wei [Fri, 26 Jan 2024 08:41:11 +0000 (16:41 +0800)]
LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit
vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.
Xi Ruoyao [Mon, 29 Jan 2024 07:20:07 +0000 (15:20 +0800)]
LoongArch: Don't split the instructions containing relocs for extreme code model.
The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for
addressing a symbol to be adjacent. So model them as "one large
instruction", i.e. define_insn, with two output registers. The real
address is the sum of these two registers.
The advantage of this approach is the RTL passes can still use ldx/stx
instructions to skip an addi.d instruction.
gcc/ChangeLog:
* config/loongarch/loongarch.md (unspec): Add
UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
(la_pcrel64_two_parts): New define_insn.
* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
typo in the comment.
(loongarch_call_tls_get_addr): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
addressing the TLS symbol and __tls_get_addr. Emit an REG_EQUAL
note to allow CSE addressing __tls_get_addr.
(loongarch_legitimize_tls_address): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address TLS IE symbols with
la_pcrel64_two_parts.
(loongarch_split_symbol): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address symbols with
la_pcrel64_two_parts.
(loongarch_output_mi_thunk): Clean up unreachable code. If
-mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI
thunks with la_pcrel64_two_parts.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
instruction sequences are not reordered by the compiler.
(NOIPA): Disallow interprocedural optimizations.
* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
duplicated from func-call-extreme-1.c, include it instead.
(dg-options): Likewise.
* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
Likewise.
* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
Likewise.
* gcc.target/loongarch/cmodel-extreme-1.c: New test.
* gcc.target/loongarch/cmodel-extreme-2.c: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.
Lulu Cheng [Fri, 26 Jan 2024 02:46:51 +0000 (10:46 +0800)]
LoongArch: Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.
Binutils does not support relaxation using four instructions to obtain
symbol addresses
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
When the code model of the symbol is extreme and -mexplicit-relocs=auto,
the macro instruction loading symbol address is not applicable.
(loongarch_call_tls_get_addr): Adjust code.
(loongarch_legitimize_tls_address): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New test.
* gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New test.
Lulu Cheng [Thu, 25 Jan 2024 11:10:46 +0000 (19:10 +0800)]
LoongArch: Add the macro implementation of mcmodel=extreme.
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p):
Add function declaration.
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For SYMBOL_PCREL64, non-zero addend of "la.local $rd,$rt,sym+addend"
is not allowed
(loongarch_load_tls): Added macro support in extreme mode.
(loongarch_call_tls_get_addr): Likewise.
(loongarch_legitimize_tls_address): Likewise.
(loongarch_force_address): Likewise.
(loongarch_legitimize_move): Likewise.
(loongarch_output_mi_thunk): Likewise.
(loongarch_option_override_internal): Remove the code that detects
explicit relocs status.
(loongarch_handle_model_attribute): Likewise.
* config/loongarch/loongarch.md (movdi_symbolic_off64): New template.
* config/loongarch/predicates.md (symbolic_off64_operand): New predicate.
(symbolic_off64_or_reg_operand): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/attr-model-5.c: New test.
* gcc.target/loongarch/func-call-extreme-5.c: New test.
* gcc.target/loongarch/func-call-extreme-6.c: New test.
* gcc.target/loongarch/tls-extreme-macro.c: New test.
Lulu Cheng [Tue, 30 Jan 2024 07:02:32 +0000 (15:02 +0800)]
LoongArch: Modify the address calculation logic for obtaining array element values through fp.
Modify address calculation logic from (((a x C) + fp) + offset) to ((fp + offset) + a x C).
Thereby modifying the register dependencies and optimizing the code.
The value of C is 2 4 or 8.
The following is the assembly code before and after a loop modification in spec2006 401.bzip:
Marek Polacek [Thu, 1 Feb 2024 21:11:43 +0000 (16:11 -0500)]
c++: -Wdangling-reference tweak to unbreak aarch64
My recent -Wdangling-reference change to not warn on std::span-like classes
unfortunately caused a new warning: extending reference_like_class_p also
opens the door to new warnings since we use reference_like_class_p for
checking the return type of the function: either it must be a reference
or a reference_like_class_p.
We can consider even non-templates as std::span-like to get rid of the
warning here.
gcc/cp/ChangeLog:
* call.cc (reference_like_class_p): Consider even non-templates for
std::span-like classes.
The fix for PR70321 introduced a splitter that split a doubleword
comparison into a pair of XORs followed by an IOR to set the (zero)
flags register. To help the reload, splitter forced SUBREG pieces of
double-word input values to a pseudo, but this regressed
gcc.target/i386/pr82580.c:
To mitigate the regression, remove this legacy heuristic (workaround?).
There have been many incremental changes and improvements to x86 TImode
and register allocation, so this legacy workaround is not only no longer
useful, but it actually hurts register allocation. The patched compiler
now produces:
Jakub Jelinek [Thu, 1 Feb 2024 20:07:01 +0000 (21:07 +0100)]
libgcc: Avoid warnings on __gcc_nested_func_ptr_created [PR113402]
I'm seeing hundreds of
In file included from ../../../libgcc/libgcc2.c:56:
../../../libgcc/libgcc2.h:32:13: warning: conflicting types for built-in function ‘__gcc_nested_func_ptr_created’; expected ‘void(void *, void *, void *)’
+[-Wbuiltin-declaration-mismatch]
32 | extern void __gcc_nested_func_ptr_created (void *, void *, void **);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warnings.
Either we need to add like in r14-6218
#pragma GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
(but in that case because of the libgcc2.h prototype (why is it there?)
it would need to be also with #pragma GCC diagnostic push/pop around),
or we could go with just following how the builtins are prototyped on the
compiler side and only cast to void ** when dereferencing (which is in
a single spot in each TU).
2024-02-01 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113402
* libgcc2.h (__gcc_nested_func_ptr_created): Change type of last
argument from void ** to void *.
* config/i386/heap-trampoline.c (__gcc_nested_func_ptr_created):
Change type of dst from void ** to void * and cast dst to void **
before dereferencing it.
* config/aarch64/heap-trampoline.c (__gcc_nested_func_ptr_created):
Likewise.
Patrick Palka [Thu, 1 Feb 2024 19:59:46 +0000 (14:59 -0500)]
libstdc++: Implement P2165R4 changes to std::pair/tuple/etc [PR113309]
This implements the C++23 paper P2165R4 Compatibility between tuple,
pair and tuple-like objects, which builds upon many changes from the
earlier C++23 paper P2321R2 zip.
Some declarations had to be moved around so that they're visible from
<bits/stl_pair.h> without introducing new includes and bloating the
header. In the end, the only new include is for <bits/utility.h> from
<bits/stl_iterator.h>, for tuple_element_t.
PR libstdc++/113309
PR libstdc++/109203
libstdc++-v3/ChangeLog:
* include/bits/ranges_util.h (__detail::__pair_like): Don't
define in C++23 mode.
(__detail::__pair_like_convertible_from): Adjust as per P2165R4.
(__detail::__is_subrange<subrange>): Moved from <ranges>.
(__detail::__is_tuple_like_v<subrange>): Likewise.
* include/bits/stl_iterator.h: Include <bits/utility.h> for
C++23.
(__different_from): Move to <concepts>.
(__iter_key_t): Adjust for C++23 as per P2165R4.
(__iter_val_t): Likewise.
* include/bits/stl_pair.h (pair, array): Forward declare.
(get): Forward declare all overloads relevant to P2165R4
tuple-like constructors.
(__is_tuple_v): Define for C++23.
(__is_tuple_like_v): Define for C++23.
(__tuple_like): Define for C++23 as per P2165R4.
(__pair_like): Define for C++23 as per P2165R4.
(__eligibile_tuple_like): Define for C++23.
(__eligibile_pair_like): Define for C++23.
(pair::_S_constructible_from_pair_like): Define for C++23.
(pair::_S_convertible_from_pair_like): Define for C++23.
(pair::_S_dangles_from_pair_like): Define for C++23.
(pair::pair): Define overloads taking a tuple-like type for
C++23 as per P2165R4.
(pair::_S_assignable_from_tuple_like): Define for C++23.
(pair::_S_const_assignable_from_tuple_like): Define for C++23.
(pair::operator=): Define overloads taking a tuple-like type for
C++23 as per P2165R4.
* include/bits/utility.h (ranges::__detail::__is_subrange):
Moved from <ranges>.
* include/bits/version.def (tuple_like): Define for C++23.
* include/bits/version.h: Regenerate.
* include/std/concepts (__different_from): Moved from
<bits/stl_iterator.h>.
(ranges::__swap::__adl_swap): Clarify which __detail namespace.
* include/std/map (__cpp_lib_tuple_like): Define C++23.
* include/std/ranges (__detail::__is_subrange): Moved to
<bits/utility.h>.
(__detail::__is_subrange<subrange>): Moved to <bits/ranges_util.h>
(__detail::__has_tuple_element): Adjust for C++23 as per P2165R4.
(__detail::__tuple_or_pair): Remove as per P2165R4. Replace all
uses with plain tuple as per P2165R4.
* include/std/tuple (__cpp_lib_tuple_like): Define for C++23.
(__tuple_like_tag_t): Define for C++23.
(__tuple_cmp): Forward declare for C++23.
(_Tuple_impl::_Tuple_impl): Define overloads taking
__tuple_like_tag_t and a tuple-like type for C++23.
(_Tuple_impl::_M_assign): Likewise.
(tuple::__constructible_from_tuple_like): Define for C++23.
(tuple::__convertible_from_tuple_like): Define for C++23.
(tuple::__dangles_from_tuple_like): Define for C++23.
(tuple::tuple): Define overloads taking a tuple-like type for
C++23 as per P2165R4.
(tuple::__assignable_from_tuple_like): Define for C++23.
(tuple::__const_assignable_from_tuple_like): Define for C++23.
(tuple::operator=): Define overloads taking a tuple-like type
for C++23 as per P2165R4.
(tuple::__tuple_like_common_comparison_category): Define for C++23.
(tuple::operator<=>): Define overload taking a tuple-like type
for C++23 as per P2165R4.
(array, get): Forward declarations moved to <bits/stl_pair.h>.
(tuple_cat): Constrain with __tuple_like for C++23 as per P2165R4.
(apply): Likewise.
(make_from_tuple): Likewise.
(__tuple_like_common_reference): Define for C++23.
(basic_common_reference): Adjust as per P2165R4.
(__tuple_like_common_type): Define for C++23.
(common_type): Adjust as per P2165R4.
* include/std/unordered_map (__cpp_lib_tuple_like): Define for
C++23.
* include/std/utility (__cpp_lib_tuple_like): Define for C++23.
* testsuite/std/ranges/zip/1.cc (test01): Adjust to handle pair
and 2-tuple interchangeably.
(test05): New test.
* testsuite/20_util/pair/p2165r4.cc: New test.
* testsuite/20_util/tuple/p2165r4.cc: New test.
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.