gcc.gnu.org Git - gcc.git/log

Fix fallout of peeling for gap improvements

The following hopefully addresses an observed bootstrap issue on aarch64
where maybe-uninit diagnostics occur. It also fixes bogus napkin math
from myself when I was confusing rounded up size of a single access
with rounded up size of the group accessed in a single scalar iteration.
So the following puts in a correctness check, leaving a set of peeling
for gaps as insufficient. This could be rectified by splitting the
last load into multiple ones but I'm leaving this for a followup, better
quickly fix the reported wrong-code.

* tree-vect-stmts.cc (get_group_load_store_type): Do not
re-use poly-int remain but re-compute with non-poly values.
Verify the shortened load is good enough to be covered with
a single scalar gap iteration before accepting it.

* gcc.dg/vect/pr115385.c: Enable AVX2 if available.

Adjust ix86_rtx_costs for pternlog_operand_p.

r15-1100-gec985bc97a0157 improves handling of ternlog instructions,
now GCC can recognize lots of pternlog_operand with different
variants.

The patch adjusts rtx_costs for that, so pass_combine can
reasonably generate more optimal vpternlog instructions.

i.e.
for avx512f-vpternlogd-3.c, with the patch, 2 vpternlog are combined into one.

1532,1533c1526
<       vpternlogd      $168, %zmm1, %zmm0, %zmm2
<       vpternlogd      $0x55, %zmm2, %zmm2, %zmm2
---
>       vpternlogd      $87, %zmm1, %zmm0, %zmm2
1732,1733c1725,1726
<       vpand   %xmm0, %xmm1, %xmm0
<       vpternlogd      $0x55, %zmm0, %zmm0, %zmm0
---
>       vpternlogd      $63, %zmm1, %zmm0, %zmm1
>       vmovdqa %xmm1, %xmm0
1804,1805c1797
<       vpternlogd      $188, %zmm2, %zmm0, %zmm1
<       vpternlogd      $0x55, %zmm1, %zmm1, %zmm1
---
>       vpternlogd      $37, %zmm0, %zmm2, %zmm1
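
The immediates in the diff above can be checked by hand. The sketch below is
not taken from the patch; it only models the vpternlog truth table, assuming
the usual convention that bit ((a << 2) | (b << 1) | c) of the immediate gives
the result, with a the destination operand, b the second source and c the
third source. All function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A three-input bitwise function evaluated on single bits. */
typedef unsigned (*tern_fn)(unsigned a, unsigned b, unsigned c);

/* Build the 8-bit truth table (the vpternlog immediate) for f. */
static uint8_t ternlog_imm(tern_fn f)
{
    uint8_t imm = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        unsigned a = (idx >> 2) & 1, b = (idx >> 1) & 1, c = idx & 1;
        if (f(a, b, c) & 1)
            imm |= (uint8_t)(1u << idx);
    }
    return imm;
}

/* Folding a following bitwise NOT into the table inverts every entry. */
static uint8_t ternlog_not(uint8_t imm)
{
    return (uint8_t)~imm;
}

/* Table for the same function with the second and third sources exchanged. */
static uint8_t ternlog_swap_bc(uint8_t imm)
{
    uint8_t out = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        unsigned a = (idx >> 2) & 1, b = (idx >> 1) & 1, c = idx & 1;
        unsigned swapped = (a << 2) | (c << 1) | b;
        if (imm & (1u << idx))
            out |= (uint8_t)(1u << swapped);
    }
    return out;
}

static unsigned not_c(unsigned a, unsigned b, unsigned c)
{
    (void)a; (void)b;
    return !c;
}

static unsigned nand_ab(unsigned a, unsigned b, unsigned c)
{
    (void)c;
    return !(a & b);
}

/* Reproduces the immediates seen in the three hunks. */
static void ternlog_self_check(void)
{
    assert(ternlog_imm(not_c) == 0x55);              /* the separate NOT */
    assert(ternlog_not(168) == 87);                  /* first hunk */
    assert(ternlog_imm(nand_ab) == 63);              /* vpand + NOT */
    assert(ternlog_swap_bc(ternlog_not(188)) == 37); /* third hunk */
}
```

Calling ternlog_self_check() from a test driver exercises all four cases. The
third hunk needs the operand swap because the combined instruction lists
%zmm0 and %zmm2 in the opposite order.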

gcc/ChangeLog:

* config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for
pternlog_operand under AVX512, also adjust VEC_DUPLICATE
accordingly since vec_dup:mem can't be that cheap.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx2-pr98461.c: Scan either notl or
vpternlog.
* gcc.target/i386/avx512f-pr96891-3.c: Also scan for inverted
condition.
* gcc.target/i386/avx512f-vpternlogd-3.c: Adjust vpternlog
number to 673.
* gcc.target/i386/avx512f-vpternlogd-4.c: Ditto.
* gcc.target/i386/avx512f-vpternlogd-5.c: Ditto.
* gcc.target/i386/sse2-v1ti-vne.c: Add -mno-avx512f.

Remove one_if_conv for latest Intel processors.

The tune was added by PR79390 for SciMark2 on Broadwell.
With the latest GCC, SciMark2 compiles to the same binary with and
without -mtune-ctrl=^one_if_conv_insn. For SPEC2017, there is no big
impact on SKX/CLX/ICX, and there are small improvements on SPR and
later.

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_ONE_IF_CONV_INSN): Remove
latest Intel processors.

Co-authored-by: Lingling Kong <lingling.kong@intel.com>

i386: More use of m{32,64}bcst addressing modes with ternlog.

This patch makes more use of m32bcst and m64bcst addressing modes in
ix86_expand_ternlog.  Previously, the i386 backend would only consider
using a m32bcst if the inner mode of the vector was 32-bits, or using
m64bcst if the inner mode was 64-bits.  For ternlog (and other logic
operations) this is a strange restriction, as how the same constant
is materialized is dependent upon the mode it is used/operated on.
Hence, the V16QI constant {2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2} wouldn't
use m??bcst, but (V4SI){0x02020202,0x02020202,0x02020202,0x02020202}
which has the same bit pattern would.  This can be optimized by (re)checking
whether a CONST_VECTOR can be broadcast from memory after casting it
to VxSI (or for m64bcst to VxDI) where x has the appropriate vector size.
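
The idea of re-checking the constant in a wider element mode can be sketched
outside the compiler. The helper below is hypothetical and is not the code in
ix86_expand_ternlog; it only tests whether a byte vector is one 32-bit pattern
repeated, which is the property that makes a 32-bit broadcast possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: if the n-byte constant is a single 32-bit pattern
   repeated, store that pattern in *elt and return true. */
static bool broadcastable_as_u32(const uint8_t *bytes, size_t n, uint32_t *elt)
{
    if (n == 0 || n % 4 != 0)
        return false;
    /* Every byte must equal the byte four positions earlier. */
    for (size_t i = 4; i < n; i++)
        if (bytes[i] != bytes[i - 4])
            return false;
    memcpy(elt, bytes, sizeof *elt);
    return true;
}

static void broadcast_self_check(void)
{
    uint8_t twos[16], masks[64];
    uint8_t mixed[8] = { 1, 2, 3, 4, 1, 2, 3, 5 };
    uint32_t elt = 0;

    /* The V16QI constant of all 2s is the V4SI constant 0x02020202. */
    memset(twos, 2, sizeof twos);
    assert(broadcastable_as_u32(twos, sizeof twos, &elt));
    assert(elt == 0x02020202u);

    /* 64 bytes of 0x80 form the 32-bit value 0x80808080. */
    memset(masks, 0x80, sizeof masks);
    assert(broadcastable_as_u32(masks, sizeof masks, &elt));
    assert(elt == 0x80808080u);

    assert(!broadcastable_as_u32(mixed, sizeof mixed, &elt));
}
```

A 64-bit variant would compare against the byte eight positions earlier
instead of four.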

Taking the test case from pr115407:

__attribute__((__vector_size__(64))) char v;
void foo() {
  v = v | v << 7;
}

Compiled with -O2 -mcmodel=large -mavx512bw, GCC 14 generates a 64-byte
(512-bit) load from the constant pool:

foo: movabsq $v, %rax // 10
        movabsq $.LC0, %rdx // 10
        vpsllw  $7, (%rax), %zmm1 // 7
        vmovdqa64       (%rax), %zmm0 // 6
        vpternlogd      $248, (%rdx), %zmm1, %zmm0 // 7
        vmovdqa64       %zmm0, (%rax) // 6
        vzeroupper // 3
        ret // 1
.LC0: .byte   -12 // 64 = 114 bytes
.byte -128
;; repeated another 62 times

mainline currently generates two instructions, using interunit broadcast:

foo: movabsq $v, %rdx // 10
        movl    $-2139062144, %eax // 5
        vmovdqa64       (%rdx), %zmm2 // 6
        vpbroadcastd    %eax, %zmm0 // 6
        vpsllw  $7, %zmm2, %zmm1 // 7
        vpternlogd      $236, %zmm0, %zmm2, %zmm1 // 7
        vmovdqa64       %zmm1, (%rdx) // 6
        vzeroupper // 3
        ret // 1 = 51 bytes

With this patch, we now generate a broadcast addressing mode:

foo: movabsq $v, %rax    // 10
        movabsq $.LC1, %rdx    // 10
        vmovdqa64       (%rax), %zmm1    // 6
        vpsllw  $7, %zmm1, %zmm0    // 7
        vpternlogd      $236, (%rdx){1to16}, %zmm1, %zmm0  // 7
        vmovdqa64       %zmm0, (%rax)    // 6
        vzeroupper    // 3
        ret    // 1 = 50 total

Without -mcmodel=large, the benefit is two instructions:

foo: vmovdqa64       v(%rip), %zmm1        // 10
        vpsllw  $7, %zmm1, %zmm0        // 7
        vpternlogd      $236, .LC2(%rip){1to16}, %zmm1, %zmm0  // 11
        vmovdqa64       %zmm0, v(%rip)        // 10
        vzeroupper        // 3
        ret        // 1 = 42 total

2024-06-14  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_ternlog): Try performing
logic operation in a different vector mode if that enables use of
a 32-bit or 64-bit broadcast addressing mode.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr115407.c: New test case.

expand: constify sepops operand to expand_expr_real_2 and expand_widen_pattern_expr [PR113212]

While working on an expand patch back in January I noticed that
the first argument (of sepops type) of expand_expr_real_2 could be
constified as it was not to be touched by the function (nor should it be).
There is code in internal-fn.cc that depends on expand_expr_real_2 not touching
the ops argument so constification makes this more obvious.
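
As an illustration only, the effect of the constification can be shown with a
cut-down stand-in for the operand structure. The struct layout and the
function name below are hypothetical; only the typedef pattern mirrors what
the ChangeLog describes.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the real operand structure. */
struct separate_ops {
    int code;
    int op0;
    int op1;
};

typedef struct separate_ops *sepops;
typedef const struct separate_ops *const_sepops;

/* Reads *ops only.  With a const_sepops parameter the compiler rejects
   an accidental write such as "ops->code = 0;". */
static int expand_sketch(const_sepops ops)
{
    return ops->code * 100 + ops->op0 * 10 + ops->op1;
}

static void sepops_self_check(void)
{
    struct separate_ops ops = { 1, 2, 3 };
    sepops p = &ops;

    /* A non-const sepops converts implicitly to const_sepops. */
    assert(expand_sketch(p) == 123);

    /* The caller can rely on its operands being untouched. */
    assert(ops.code == 1 && ops.op0 == 2 && ops.op1 == 3);
}
```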

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR middle-end/113212
* expr.h (const_sepops): New typedef.
(expand_expr_real_2): Constify the first argument.
* optabs.cc (expand_widen_pattern_expr): Likewise.
* optabs.h (expand_widen_pattern_expr): Likewise.
* expr.cc (expand_expr_real_2): Likewise.
(do_store_flag): Likewise. Remove incorrect store to ops->code.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Daily bump.

Revert "map packed field type to unpacked for debug info"

This reverts commit ea5c9f25241ae0658180afbcad7f4e298352f561.

RISC-V: Add support for subword atomic loads/stores

Andrea Parri recently pointed out that we were emitting overly conservative
fences for seq_cst atomic loads/stores. This adds support for the optimized
fences specified in the PSABI:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/2092568f7896ceaa1ec0f02569b19eaa42cd51c9/riscv-atomic.adoc
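
For reference, the kind of source affected is ordinary C11 code doing seq_cst
loads and stores on objects narrower than a word. The sketch below is
target-independent and makes no claim about the exact instruction sequence
emitted; the variable and function names are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Subword atomic objects: one byte, one halfword, one bool. */
static _Atomic unsigned char  byte_val;
static _Atomic unsigned short half_val;
static atomic_bool            ready;

/* Sequentially consistent subword stores. */
static void publish(unsigned char b, unsigned short h)
{
    atomic_store_explicit(&byte_val, b, memory_order_seq_cst);
    atomic_store_explicit(&half_val, h, memory_order_seq_cst);
    atomic_store_explicit(&ready, true, memory_order_seq_cst);
}

/* Sequentially consistent subword loads. */
static bool try_consume(unsigned *sum)
{
    if (!atomic_load_explicit(&ready, memory_order_seq_cst))
        return false;
    *sum = (unsigned)atomic_load_explicit(&byte_val, memory_order_seq_cst)
         + (unsigned)atomic_load_explicit(&half_val, memory_order_seq_cst);
    return true;
}

static void subword_atomics_self_check(void)
{
    unsigned sum = 0;

    assert(!try_consume(&sum));
    publish(3, 500);
    assert(try_consume(&sum));
    assert(sum == 503);
}
```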

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md: Add support for subword fenced
loads/stores.
* config/riscv/sync-ztso.md: Ditto.
* config/riscv/sync.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-load-1.c: Increase test coverage to
include longs, shorts, chars, and bools.
* gcc.target/riscv/amo/amo-table-a-6-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-3.c: Ditto.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Tested-by: Andrea Parri <andrea@rivosinc.com>

[libstdc++] [testsuite] require cmath for [PR114359]

When !_GLIBCXX_USE_C99_MATH_TR1, binomial_distribution doesn't use the
optimized algorithm that was fixed in response to PR114359.  Without
that optimized algorithm, operator() ends up looping for so long
on this test that it would time out by several orders of
magnitude, without even exercising the optimized algorithm that we're
testing for regressions.  Arrange for the test to be skipped if that
bit won't be exercised.

for  libstdc++-v3/ChangeLog

PR libstdc++/114359
* testsuite/26_numerics/random/binomial_distribution/114359.cc:
Require cmath.

c: Implement C2Y complex increment/decrement support

Support for complex increment and decrement (previously supported as
an extension) was voted into C2Y today (paper N3259). Thus, change
the pedwarn to a pedwarn_c23 and add associated tests.

Note: the type of the 1 to be added / subtracted is underspecified (to
be addressed in a subsequent paper), but understood to be intended to
be a real type (so the sign of a zero imaginary part is never changed)
and this is what is implemented; the tests added include verifying
that there is no undesired change to the sign of a zero imaginary
part.
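
A small check of the behaviour described in the note, written as a sketch
rather than copied from the new tests. If the 1 were complex (1 + 0i), adding
it to a value with imaginary part -0.0 would give +0.0; with a real 1 the
imaginary part is left alone. The helper names are made up, and compilers
without C2Y support may warn about the increment under -pedantic.

```c
#include <assert.h>
#include <string.h>

struct parts {
    double re;
    double im;
};

/* Build a complex value from two doubles without relying on CMPLX;
   a complex type has the same layout as an array of two reals. */
static double _Complex make_complex(double re, double im)
{
    double in[2] = { re, im };
    double _Complex z;
    memcpy(&z, in, sizeof z);
    return z;
}

static struct parts split(double _Complex z)
{
    double out[2];
    struct parts p;
    memcpy(out, &z, sizeof out);
    p.re = out[0];
    p.im = out[1];
    return p;
}

static struct parts incremented(double re, double im)
{
    double _Complex z = make_complex(re, im);
    z++;                          /* complex increment */
    return split(z);
}

static struct parts decremented(double re, double im)
{
    double _Complex z = make_complex(re, im);
    z--;                          /* complex decrement */
    return split(z);
}

/* Distinguish -0.0 from +0.0 without needing <math.h>. */
static int is_negative_zero(double x)
{
    return x == 0.0 && 1.0 / x < 0.0;
}

static void complex_incdec_self_check(void)
{
    struct parts p = incremented(1.0, -0.0);
    assert(p.re == 2.0);
    assert(is_negative_zero(p.im));

    p = decremented(1.0, -0.0);
    assert(p.re == 0.0);
    assert(is_negative_zero(p.im));
}
```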

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

gcc/c/
* c-typeck.cc (build_unary_op): Use pedwarn_c23 for complex
increment and decrement.

gcc/testsuite/
* gcc.dg/c23-complex-1.c, gcc.dg/c23-complex-2.c,
gcc.dg/c23-complex-3.c, gcc.dg/c23-complex-4.c,
gcc.dg/c2y-complex-1.c, gcc.dg/c2y-complex-2.c: New tests.

rs6000, altivec-2-runnable.c should be a runnable test

The test case is marked "dg-do compile" rather than "dg-do run" even
though it is a runnable test. This patch changes the dg-do argument to run.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
argument to run.

doc: Spell "command-line option" with a hyphen

gcc:
* doc/extend.texi (AArch64 Function Attributes): Add
(AVR Variable Attributes): Ditto.
(Common Type Attributes): Ditto.

c++/modules: export using across namespace [PR114683]

Currently we represent a non-function using-declaration by inserting the
named declaration into the target scope. In general this works fine, but in
the case of an exported using-declaration we have nowhere to mark the
using-declaration as exported, so we mark the original declaration as
exported instead, and then treat all using-declarations that name it as
exported as well. We were doing this only if there was also a previous
non-exported using, so for this testcase the export got lost; this patch
broadens the workaround to also apply to the using that first brings the
declaration into the current scope.

This does not fully resolve 114683, but replaces a missing exports bug with
an extra exports bug, which should be a significant usability improvement.
The testcase has xfails for extra exports.

I imagine a complete fix should involve inserting a USING_DECL.

PR c++/114683

gcc/cp/ChangeLog:

* name-lookup.cc (do_nonmember_using_decl): Allow exporting
a newly inserted decl.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-22_a.C: New test.
* g++.dg/modules/using-22_b.C: New test.

c++/modules: multiple usings of the same decl [PR115194]

add_binding_entity creates an OVERLOAD to represent a using-declaration in
module purview of a declaration in the global module, even for
non-functions, and we were failing to merge that with the original
declaration in name lookup.

It's not clear to me that building the OVERLOAD is what should be happening,
but let's work around it for now pending an overhaul of using-decl handling
for c++/114683.

PR c++/115194

gcc/cp/ChangeLog:

* name-lookup.cc (name_lookup::process_module_binding): Strip an
OVERLOAD from a non-function.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-23_a.C: New test.
* g++.dg/modules/using-23_b.C: New test.

c++: adjust comment

Adjusting the comment I added in r15-1223 to clarify that this is a
workaround for a bug elsewhere.

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_binding_entity): Adjust comment.

c++: undeclared identifier in requires-clause [PR99678]

Since the terms of a requires-clause are grammatically primary-expressions
and not e.g. postfix-expressions, it seems we need to explicitly handle
and diagnose the case where a term parses to a bare unresolved identifier,
like cp_parser_postfix_expression does, since cp_parser_primary_expression
leaves that up to its callers. Otherwise we incorrectly accept the first
three requires-clauses below.

Note that the only occurrences of primary-expression in the grammar are
postfix-expression and constraint-logical-and-expression, so it's not too
surprising that we need this special handling here.

PR c++/99678

gcc/cp/ChangeLog:

* parser.cc (cp_parser_constraint_primary_expression): Diagnose
a bare unresolved unqualified-id.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires38.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

[APX CCMP] Add targetm.have_ccmp hook [PR115370]

In cfgexpand, there is an optimization for branches which tests
targetm.gen_ccmp_first == NULL. However, for targets like x86-64, the
hook is implemented but that does not indicate that ccmp is enabled.
Add a new target hook TARGET_HAVE_CCMP and use it in place of the middle-end
check for the existence of gen_ccmp_first to avoid misoptimization.
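
The difference between the two checks can be modelled with a toy hook table.
Everything below is a hypothetical sketch: the struct, the hook signatures and
the feature flag are invented, and the default shown (ccmp is available
exactly when gen_ccmp_first is implemented) is an assumption about what a
sensible default would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a target hook table. */
struct hooks_sketch {
    int  (*gen_ccmp_first)(int);
    bool (*have_ccmp)(const struct hooks_sketch *);
};

static int dummy_gen_ccmp_first(int x)
{
    return x;
}

/* Assumed default: ccmp is available when gen_ccmp_first exists. */
static bool default_have_ccmp_sketch(const struct hooks_sketch *t)
{
    return t->gen_ccmp_first != NULL;
}

/* Stands in for a target option that can be off even though the
   target implements gen_ccmp_first. */
static bool feature_enabled;

static bool x86_like_have_ccmp(const struct hooks_sketch *t)
{
    (void)t;
    return feature_enabled;
}

/* The old middle-end test and its replacement. */
static bool old_check(const struct hooks_sketch *t)
{
    return t->gen_ccmp_first != NULL;
}

static bool new_check(const struct hooks_sketch *t)
{
    return t->have_ccmp(t);
}

static void have_ccmp_self_check(void)
{
    struct hooks_sketch generic  = { dummy_gen_ccmp_first, default_have_ccmp_sketch };
    struct hooks_sketch x86_like = { dummy_gen_ccmp_first, x86_like_have_ccmp };

    /* For a target using the default, both checks agree. */
    assert(old_check(&generic) && new_check(&generic));

    /* With the feature off, the old check wrongly reports ccmp. */
    feature_enabled = false;
    assert(old_check(&x86_like));
    assert(!new_check(&x86_like));

    feature_enabled = true;
    assert(new_check(&x86_like));
}
```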

gcc/ChangeLog:

PR target/115370
PR target/115463
* target.def (have_ccmp): New target hook.
* targhooks.cc (default_have_ccmp): New function.
* targhooks.h (default_have_ccmp): New prototype.
* doc/tm.texi.in: Add TARGET_HAVE_CCMP.
* doc/tm.texi: Regenerate.
* cfgexpand.cc (expand_gimple_cond): Call targetm.have_ccmp
instead of checking if targetm.gen_ccmp_first exists.
* expr.cc (expand_expr_real_gassign): Likewise.
* config/i386/i386.cc (ix86_have_ccmp): New target hook to
check if APX_CCMP is enabled.
(TARGET_HAVE_CCMP): Define.

c++: ICE w/ ambig and non-strictly-viable cands [PR115239]

Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we hold on to ever since r14-6522. These latter candidates have
an empty second arg conversion since the first arg conversion was deemed
bad, and this trips up joust when called on #3 and #4 which assumes all
arg conversions are there.

We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate. To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates (taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable).

PR c++/115239

gcc/cp/ChangeLog:

* call.cc (tourney): Don't consider a non-strictly viable
candidate as the champ if there was ambiguity between two
strictly viable candidates.

gcc/testsuite/ChangeLog:

* g++.dg/overload/error7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

libstdc++: Optimize std::add_rvalue_reference compilation performance

This patch optimizes the compilation performance of
std::add_rvalue_reference by dispatching to the new
__add_rvalue_reference built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (__add_rval_ref_t): Use
__add_rvalue_reference built-in trait.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::add_lvalue_reference compilation performance

This patch optimizes the compilation performance of
std::add_lvalue_reference by dispatching to the new
__add_lvalue_reference built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (__add_lval_ref_t): Use
__add_lvalue_reference built-in trait.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::is_pointer compilation performance

This patch optimizes the compilation performance of std::is_pointer
by dispatching to the new __is_pointer built-in trait.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__is_pointer): Use
__is_pointer built-in trait.
* include/std/type_traits (is_pointer): Likewise. Optimize its
implementation.
(is_pointer_v): Likewise.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

ada: Compiler goes into loop

In some cases that are difficult to characterize, the compiler fails an
assertion check (if the compiler is built with assertions enabled) or
loops forever (if assertions are not enabled). One way this can happen is if
Exp_Util.Insert_Actions is called with an N_Itype_Reference node as its first
parameter. This, in turn, can happen when an instance of
Exp_Attr.Expand_N_Attribute_Reference.Build_And_Insert_Type_Attr_Subp
calls Insert_Action (which will call Insert_Actions).

gcc/ada/

* exp_util.adb
(Insert_Actions): Code was relying on an incorrect assumption that an
N_Itype_Reference cannot occur in a declaration list or a statement
list. Fix the code to handle this case.

ada: Remove -gnatdJ switch

Using -gnatdJ with various other switches was error-prone.
Remove this switch since the primary users of this mode,
GNATCheck and Codepeer, no longer need it.

gcc/ada/

* debug.adb: Remove mentions of -gnatdJ.
* errout.adb: Remove printing subprogram names to JSON.
* erroutc.adb: Remove printing subprogram names in messages.
* erroutc.ads: Remove Node and Subprogram_Name_Ptr used for -gnatdJ.
* errutil.adb: Remove Node used for -gnatdJ.
* gnat1drv.adb: Remove references to -gnatdJ and
Include_Subprogram_In_Messages.
* opt.ads: Remove Include_Subprogram_In_Messages.
* par-util.adb: Remove behavior related to
Include_Subprogram_In_Messages.
* sem_util.adb: Remove Subprogram_Name used for -gnatdJ.

ada: Fix segmentation fault on slice of array with Unbounded_String component

This fixes a regression introduced by the overhaul of the implementation
of finalization. When the first subtype of an array type is declared as
constrained, the Finalize_Address primitive of the base type synthesized
by the compiler is tailored to this first subtype, which means that this
primitive cannot be used for other subtypes of the array type, which may
for example be generated when an aggregate is assigned to a slice of an
object of the first subtype.

The straightforward solution would be to synthesize the Finalize_Address
primitive for the base type instead, but its clean implementation would
require changing the way allocators are implemented to always allocate
the bounds alongside the data, which may turn out to be delicate.

This instead changes the compiler to synthesize a local Finalize_Address
primitive in the problematic cases, which should be rare in practice, and
also contains a fixlet for Find_Last_Init, which fails to get to the base
type again in the indirect case and, therefore, mishandles array subtypes.

gcc/ada/

* exp_ch7.adb (Attach_Object_To_Master_Node): Fix formatting.
(Build_Finalizer.Process_Object_Declaration): Synthesize a local
Finalize_Address primitive if the object's subtype is an array
that has a constrained first subtype and is not this first subtype.
* exp_util.adb (Find_Last_Init): Get again to the base type in the
indirect case.

ada: Remove Iterable from list of GNAT-specific attributes

The attribute is rejected except in attribute definition clauses, where it
is silently ignored (it's a by-product of the processing of the aspect).

gcc/ada/

* doc/gnat_rm/implementation_defined_attributes.rst (Iterable):
Delete entry.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: Fix test for giving hint on ambiguous aggregate

In the case the type of an aggregate cannot be determined due to
an ambiguity, caused by the existence of container aggregates,
a hint can be given by GNAT. The test for giving this hint should
be the Ada language version, not the fact that extensions are allowed.
Now fixed.

There is no impact on code generation.

gcc/ada/

* sem_util.adb (Check_Ambiguous_Aggregate): Fix test.

ada: Missing postcondition runtime check in inherited primitive

When a derived tagged type implements more interface types
than its parent type, and a primitive inherited from its parent type
covers a primitive of these additional interface types that has
classwide postconditions, the code generated by the compiler does not
check the classwide postconditions inherited from the interface primitive.

gcc/ada/

* freeze.ads (Check_Condition_Entities): Complete documentation.
* freeze.adb (Check_Inherited_Conditions): Extend its functionality to
build two kinds of wrappers: the existing LSP wrappers, and wrappers
required to handle postconditions of interface primitives implemented
by inherited primitives.
(Build_Inherited_Condition_Pragmas): Rename formal.
(Freeze_Record_Type): For derived tagged types, move call to
Check_Inherited_Conditions to subprogram Freeze_Entity_Checks;
done to improve the performance of Check_Inherited_Conditions since it
can rely on the internal entities that link interface primitives with
tagged type primitives that implement them.
(Check_Interface_Primitives_Strub_Mode): New subprogram.
* sem_ch13.adb (Freeze_Entity_Checks): Call Check_Inherited_Conditions.
Call Check_Inherited_Conditions with derived interface types to check
strub mode compatibility of their primitives.
* sem_disp.adb (Check_Dispatching_Operation): Adjust assertion to accept
wrappers of interface primitives that have classwide postconditions.
* exp_disp.adb (Write_DT): Add text to identify wrappers.

ada: Revert changing a GNATProve mode message to a non-warning

GNATProve compiles the program multiple times. During the
first run the warnings are suppressed. These messages need
to be suppressed during that run in order to avoid having
them duplicated in the following runs. Revert the previous
changes as there currently is not a way to simply suppress
info messages.

gcc/ada/

* sem_res.adb (Resolve_Call): Add warning insertion
character into the info message.

ada: Deep copy of an expression sometimes fails to copy entities

An entity can be defined within an expression (the best example is probably a
declare expression, but a quantified expression is another; there are others).
When making a deep copy of an expression, the Entity nodes for such entities
were sometimes not copied, apparently for performance reasons. This caused
correctness problems in some cases, so do not perform that "optimization".

gcc/ada/

* sem_util.adb
(New_Copy_Tree.Visit_Entity): Delete code that prevented copying some entities.

ada: Minor cleanups in generic formal matching

Minor rewording of a warning.
Disallow positional notation for <> (but disable this check),
and fix resulting errors.
Copy use clauses.

gcc/ada/

* sem_ch12.adb (Check_Fixed_Point_Actual): Minor rewording; it seems
more proper to say "operator" rather than "operation".
(Matching_Actual): Give an error for <> in positional notation.
This is a syntax error. Disable this for now.
(Analyze_Associations): Copy the use clause in all cases.
The "mustn't recopy" comment seems wrong, because New_Copy_Tree
preserves Slocs.
* libgnat/a-ticoau.ads: Fix violation of new position-box error.
* libgnat/a-wtcoau.ads: Likewise.
* libgnat/a-ztcoau.ads: Likewise.

ada: Remove message about goto rewritten as a loop

This message only exposes internal details of how the compiler
handles this kind of construct and does not provide meaningful
information that the user can act on.

gcc/ada/

* par-labl.adb (Rewrite_As_Loop): Remove info message

ada: Remove warning insertion characters from info messages

Remove warning insertion characters without switch characters
from info messages.

gcc/ada/

* par-ch7.adb: Remove warning characters from info message
* par-endh.adb: Remove warning characters from info message
* sem_res.adb: Remove warning characters from info message

ada: Convert an info message to a continuation

The info message about the freeze point should be considered
a continuation of the error message about the change of visibility
after the freeze point. This improves the error layout for formatted
error messages with the -gnatdF switch.

gcc/ada/

* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): Change the
info message to a continuation message.

ada: Simplify code in Cannot_Inline

gcc/ada/

* inline.adb (Cannot_Inline): Simplify string handling logic.

ada: List subprogram body entities in scopes

Add entities of kind E_Subprogram_Body to the list of entities associated
with a given scope. This ensures that representation information is
correctly output for object and type declarations inside these subprogram
bodies. This is useful for outputting that information from the compiler
with the switch -gnatR, as well as for getting precise representation
information inside GNATprove.

Remove ad-hoc code inside repinfo.adb that retrieved this information
in only some cases.

gcc/ada/

* exp_ch5.adb (Expand_Iterator_Loop_Over_Container): Skip entities
of kind E_Subprogram_Body.
* repinfo.adb (List_Entities): Remove special case for subprogram
bodies.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): List subprogram
body entities in the enclosing scope.

ada: Interfaces order disables class-wide prefix notation calls

When the first formal parameter of a subprogram is a class-wide
interface type (or an access to a class-wide interface type),
changing the order of the interface types implemented by a
type declaration T enables or disables the ability to use the
prefix notation to call it with objects of type T. When the
call is disabled the compiler rejects it reporting an error.

gcc/ada/

* sem_ch4.adb (Traverse_Interfaces): Add missing support
for climbing to parents of interface types.

ada: Fix Super attribute documentation

The GNAT-defined Super attribute was formerly disallowed for an object of a
derived tagged type having an abstract parent type. This rule has been relaxed;
an abstract parent type is now permitted as long as it is not an interface type.
Update the GNAT RM accordingly.

gcc/ada/

* doc/gnat_rm/implementation_defined_attributes.rst:
Update Super attribute documentation.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: Fix expansion of protected subprogram bodies

System.Tasking.Protected_Objects.Lock can raise exceptions, but that
wasn't taken into account by the expansion of protected subprogram
bodies before this patch. More precisely, there were cases where
calls to System.Tasking.Initialization.Abort_Undefer were
incorrectly omitted. This patch fixes this.

gcc/ada/

* exp_ch7.adb (Build_Cleanup_Statements): Adapt to changes
made to Build_Protected_Subprogram_Call_Cleanup.
* exp_ch9.adb (Make_Unlock_Statement, Wrap_Unprotected_Call):
New functions.
(Build_Protected_Subprogram_Body): Fix resource management in
generated code.
(Build_Protected_Subprogram_Call_Cleanup): Make use of newly
introduced Make_Unlock_Statement.

ada: Fix oversight in latest finalization fix

The Defining_Identifier of a renaming may be an E_Constant in the context.

gcc/ada/

PR ada/114710
* exp_util.adb (Find_Renamed_Object): Recurse for any renaming.

ada: Check global mode restriction on encapsulating abstract states

We already checked that a global item of mode Output is not an Input of
the enclosing subprograms. With this change we also check that if this
global item is a constituent, then none of its encapsulating abstract
states is an Input of the enclosing subprograms.

gcc/ada/

* sem_prag.adb (Check_Mode_Restriction_In_Enclosing_Context):
Iterate over encapsulating abstract states.

ada: Streamline elaboration of local tagged types

This set of changes is aimed at streamlining the code generated for the
elaboration of local tagged types.  The dispatch tables and other related
data structures are built dynamically on the stack for them and a few of
the patterns used for this turn out to be problematic for the optimizer:

  1. the array of primitives in the dispatch table is default-initialized to
     null values by calling the initialization routine of an unconstrained
     array type, and then immediately assigned an aggregate made up of the
     same null values.

  2. the external tag is initialized by means of a dynamic concatenation
     involving the secondary stack, but all the elements have a fixed size.

  3. the _size primitive is saved in the TSD by means of the dereference of
     the address of the TSD that was previously saved in the dispatch table.

gcc/ada/

* Makefile.rtl (GNATRTL_NONTASKING_OBJS): Add s-imad32$(objext),
s-imad64$(objext) and s-imagea$(objext).
* exp_atag.ads (Build_Set_Size_Function): Replace Tag_Node parameter
with Typ parameter.
* exp_atag.adb: Add clauses for Sinfo.Utils.
(Build_Set_Size_Function): Retrieve the TSD object statically.
* exp_disp.adb: Add clauses for Ttypes.
(Make_DT): Call Address_Image{32,64} instead of Address_Image.
(Register_Primitive): Pass Tag_Typ to Build_Set_Size_Function.
* rtsfind.ads (RTU_Id): Remove System_Address_Image and add
System_Img_Address_{32,64}.
(RE_Id): Remove entry for RE_Address_Image and add entries for
RE_Address_Image{32,64}.
* rtsfind.adb (System_Descendant): Adjust to above changes.
* libgnat/a-tags.ads (Address_Array): Suppress initialization.
* libgnat/s-addima.adb (System.Address_Image): Call the appropriate
routine based on the address size.
* libgnat/s-imad32.ads: New file.
* libgnat/s-imad64.ads: Likewise.
* libgnat/s-imagea.ads: Likewise.
* libgnat/s-imagea.adb: Likewise.
* gcc-interface/Make-lang.in (GNAT_ADA_OBJS) [$(STAGE1)=False]: Add
ada/libgnat/s-imad32.o and ada/libgnat/s-imad64.o.

ada: Do not inline subprogram which could cause SPARK violation

Inlining in GNATprove a subprogram containing a constant declaration with
an address clause/aspect might lead to a spurious error if the address
expression is based on a constant view of a mutable object at call site.
Do not allow such inlining in GNATprove.

gcc/ada/

* inline.adb (Can_Be_Inlined_In_GNATprove_Mode): Do not inline
when constant with address clause is found.

ada: Fix incorrect String lower bound in gnatlink

This patch fixes code in gnatlink that incorrectly assumed that the
lower bound of a particular string was always 1.

gcc/ada/

* gnatlink.adb (Gnatlink): Fix incorrect lower bound assumption.
(Is_Prefix): New function.

ada: Reject too-strict alignment specifications.

In some cases the compiler incorrectly concludes that a package body is
required for a package specification that includes the implicit declaration
of one or more inherited subprograms for an explicitly declared derived type.
Spurious error messages (e.g., "cannot generate code for file") may result.

gcc/ada/

* sem_ch7.adb
(Requires_Completion_In_Body): Modify the Comes_From_Source test so that
the implicit declaration of an inherited subprogram does not cause
an incorrect result of True.

ada: Inline if -gnatn in CCG mode even if -O0

gcc/ada/

* exp_ch6.adb (Expand_Ctrl_Function_Call): Inline if -gnatn in
CCG mode even if -O0.

ada: Fix fallout of previous finalization change

Now that Is_Finalizable_Transient only looks at the renamings coming from
nontransient objects serviced by transient scopes, it must find the object
ultimately renamed by them through a chain of renamings.

gcc/ada/

PR ada/114710
* exp_util.adb (Find_Renamed_Object): Recurse if the renamed object
is itself a renaming.

ada: Missing support for 'Old with overloaded function

The compiler reports an error when the prefix of 'Old is
a call to an overloaded function that has no parameters.

gcc/ada/

* sem_attr.adb (Analyze_Attribute): Enhance support for
using 'Old with a prefix that references an overloaded
function that has no parameters; add missing support
for the use of 'Old within qualified expressions.

* sem_util.ads (Preanalyze_And_Resolve_Without_Errors):
New subprogram.

* sem_util.adb (Preanalyze_And_Resolve_Without_Errors):
New subprogram.

ada: Simplify checks for Address and Object_Size clauses

Where possible, we can use high-level wrapper routines instead of the
low-level Get_Attribute_Definition_Clause.

Code cleanup; semantics is unaffected.

gcc/ada/

* layout.adb (Layout_Type): Use high-level wrapper routine.
* sem_ch13.adb (Inherit_Delayed_Rep_Aspects): Likewise.
* sem_ch3.adb (Analyze_Object_Declaration): Likewise.

ada: Add support for symbolic backtraces with DLLs on Windows

This puts Windows on par with Linux as far as backtraces are concerned.

gcc/ada/

* libgnat/s-tsmona__linux.adb (Get): Move down descriptive comment.
* libgnat/s-tsmona__mingw.adb: Add with clause and use clause for
System.Storage_Elements.
(Get): Pass GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT in the call
to GetModuleHandleEx and remove the subsequent call to FreeLibrary.
Upon success, set Load_Addr to the base address of the module.
* libgnat/s-win32.ads (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS): Use
shorter literal.
(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT): New constant.

ada: Fix too late finalization of temporary object

The problem is that Is_Finalizable_Transient returns false when a transient
object is subject to a renaming by another transient object present in the
same transient scope, thus forcing its finalization to be deferred to the
enclosing scope. That's not necessary, as only renamings by nontransient
objects serviced by transient scopes need to be rejected by the predicate.

The change also removes now dead code in the finalization machinery.

gcc/ada/

PR ada/114710
* exp_ch7.adb (Build_Finalizer.Process_Declarations): Remove dead
code dealing with renamings.
* exp_util.ads (Is_Finalizable_Transient): Rename Rel_Node to N.
* exp_util.adb (Is_Finalizable_Transient): Likewise.
(Is_Aliased): Remove obsolete code dealing with EWA nodes and only
consider renamings present in N itself.
(Requires_Cleanup_Actions): Remove dead code dealing with renamings.

ada: Missing dynamic predicate checks

The compiler does not generate dynamic predicate checks when
they are enabled for one type declaration and ignored for
other type declarations defined in the same scope.

gcc/ada/

* sem_ch13.adb (Analyze_One_Aspect): Set the applicable policy
of a type declaration when its aspect Dynamic_Predicate is
analyzed.

* sem_prag.adb (Handle_Dynamic_Predicate_Check): New subprogram
that enables or ignores dynamic predicate checks depending on
whether dynamic checks are enabled in the context where the
associated type declaration is defined; used in the analysis
of pragma check. In addition, for pragma Predicate, do not
disable it when the aspect was internally built as part of
processing a dynamic predicate aspect.

libstdc++: Add ranges::range_common_reference_t for C++20 (LWG 3860)

LWG 3860 added this alias template. Both libc++ and MSVC treat this as a
DR for C++20, so this change does so too.

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (range_common_reference_t): New
alias template, as per LWG 3860.
* testsuite/std/ranges/range.cc: Check it.

libstdc++: Use __glibcxx_ranges_as_const to guard P2278R4 changes

The P2278R4 additions for C++23 are currently guarded by a check for
__cplusplus > 202002L but can use __glibcxx_ranges_as_const instead.

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (const_iterator_t): Change
preprocessor condition to use __glibcxx_ranges_as_const.
(const_sentinel_t, range_const_reference_t): Likewise.
(__access::__possibly_const_range, cbegin, cend, crbegin)
(crend, cdata): Likewise.
* include/bits/stl_iterator.h (iter_const_reference_t)
(basic_const_iterator, const_iterator, const_sentinel)
(make_const_iterator): Likewise.

libstdc++: Improve diagnostics for invalid std::hash specializations [PR115420]

When using a key type without a valid std::hash specialization the
unordered containers give confusing diagnostics about the default
constructor being deleted. Add a static_assert that will fail for
disabled std::hash specializations (and for a subset of custom hash
functions).

libstdc++-v3/ChangeLog:

PR libstdc++/115420
* include/bits/hashtable.h (_Hashtable): Add static_assert to
check that hash function is copy constructible.
* testsuite/23_containers/unordered_map/115420.cc: New test.
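
The kind of early check described above can be sketched as follows. This is
only an illustration, not the code added to hashtable.h: the names Key,
usable_hash and checked_table are hypothetical, and the sketch relies on the
standard guarantee that a disabled std::hash specialization is not copy
constructible.

```cpp
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical key type for which no std::hash specialization exists.
struct Key { int id; };

// Hypothetical helper: true when Hash can be copy constructed, which a
// disabled std::hash specialization cannot.
template <typename Hash>
struct usable_hash : std::is_copy_constructible<Hash> {};

// Hypothetical container skeleton: the static_assert fires with a readable
// message as soon as the class is instantiated with an unusable hash.
template <typename K, typename Hash = std::hash<K>>
struct checked_table
{
  static_assert(usable_hash<Hash>::value,
                "hash function must be copy constructible");
};

static_assert(usable_hash<std::hash<std::string>>::value,
              "std::hash<std::string> is an enabled specialization");
static_assert(!usable_hash<std::hash<Key>>::value,
              "std::hash<Key> is a disabled specialization");
```

Instantiating checked_table<Key> would then fail on the static_assert rather
than on a deleted default constructor somewhere deeper in the container.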

libstdc++: Fix unwanted #pragma messages from PSTL headers [PR113376]

When we rebased the PSTL on upstream, in r14-2109-g3162ca09dbdc2e, a
change to how _PSTL_USAGE_WARNINGS is set was missed out, but the change
to how it's tested was included. This means that the macro is always
defined, so testing it with #ifdef (instead of using #if to test its
value) doesn't work as intended.

Revert the test to use #if again, since that part of the upstream change
was unnecessary in the first place (the macro is always defined, so
there's no need to use #ifdef to avoid -Wundef warnings).

libstdc++-v3/ChangeLog:

PR libstdc++/113376
* include/pstl/pstl_config.h: Use #if instead of #ifdef to test
the _PSTL_USAGE_WARNINGS macro.
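
The difference between the two tests can be seen in a small sketch. The macro
DEMO_USAGE_WARNINGS is hypothetical and merely stands in for a macro that is
always defined, to either 0 or 1:

```cpp
// Hypothetical macro that is always defined; here its value is 0.
#define DEMO_USAGE_WARNINGS 0

#ifdef DEMO_USAGE_WARNINGS
constexpr bool seen_by_ifdef = true;   // taken even though the value is 0
#else
constexpr bool seen_by_ifdef = false;
#endif

#if DEMO_USAGE_WARNINGS
constexpr bool seen_by_if = true;
#else
constexpr bool seen_by_if = false;     // taken because the value is 0
#endif

static_assert(seen_by_ifdef, "#ifdef only checks that the macro is defined");
static_assert(!seen_by_if, "#if checks the macro's value");
```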

libstdc++: Optimize std::is_nothrow_invocable compilation performance

This patch optimizes the compilation performance of
std::is_nothrow_invocable by dispatching to the new
__is_nothrow_invocable built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_nothrow_invocable): Use
__is_nothrow_invocable built-in trait.
* testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc:
Handle the new error from __is_nothrow_invocable.
* testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc:
Likewise.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::is_invocable compilation performance

This patch optimizes the compilation performance of std::is_invocable
by dispatching to the new __is_invocable built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_invocable): Use __is_invocable
built-in trait.
* testsuite/20_util/is_invocable/incomplete_args_neg.cc: Handle
the new error from __is_invocable.
* testsuite/20_util/is_invocable/incomplete_neg.cc: Likewise.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::rank compilation performance

This patch optimizes the compilation performance of std::rank
by dispatching to the new __array_rank built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (rank): Use __array_rank built-in
trait.
(rank_v): Likewise.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
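
For context, a trait such as std::rank can be written purely in library code
with partial specializations, roughly as in the sketch below. This is an
illustration under the name my_rank (hypothetical), not the code that was in
libstdc++; it shows that each array dimension costs one more class template
instantiation, which is the sort of work a compiler built-in can avoid.

```cpp
#include <cstddef>
#include <type_traits>

// Non-array types have rank 0.
template <typename T>
struct my_rank : std::integral_constant<std::size_t, 0> {};

// Each bounded array dimension adds one to the rank of the element type.
template <typename T, std::size_t N>
struct my_rank<T[N]>
  : std::integral_constant<std::size_t, 1 + my_rank<T>::value> {};

// An unbounded leading dimension counts as well.
template <typename T>
struct my_rank<T[]>
  : std::integral_constant<std::size_t, 1 + my_rank<T>::value> {};

static_assert(my_rank<int>::value == 0, "scalar");
static_assert(my_rank<int[2][3]>::value == 2, "two dimensions");
static_assert(my_rank<int[][4]>::value == std::rank<int[][4]>::value,
              "agrees with std::rank");
```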

libstdc++: Optimize std::decay compilation performance

This patch optimizes the compilation performance of std::decay
by dispatching to the new __decay built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (decay): Use __decay built-in trait.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::remove_all_extents compilation performance

This patch optimizes the compilation performance of
std::remove_all_extents by dispatching to the new
__remove_all_extents built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_all_extents): Use
__remove_all_extents built-in trait.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::remove_extent compilation performance

This patch optimizes the compilation performance of std::remove_extent
by dispatching to the new __remove_extent built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_extent): Use __remove_extent
built-in trait.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::add_pointer compilation performance

This patch optimizes the compilation performance of std::add_pointer
by dispatching to the new __add_pointer built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_pointer): Use __add_pointer
built-in trait.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::is_unbounded_array compilation performance

This patch optimizes the compilation performance of
std::is_unbounded_array by dispatching to the new
__is_unbounded_array built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_unbounded_array_v): Use
__is_unbounded_array built-in trait.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::is_volatile compilation performance

This patch optimizes the compilation performance of std::is_volatile
by dispatching to the new __is_volatile built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_volatile): Use __is_volatile
built-in trait.
(is_volatile_v): Likewise.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Optimize std::is_const compilation performance

This patch optimizes the compilation performance of std::is_const
by dispatching to the new __is_const built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_const): Use __is_const built-in
trait.
(is_const_v): Likewise.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

aarch64: Fix invalid nested subregs [PR115464]

The testcase extracts one arm_neon.h vector from a pair (one subreg)
and then reinterprets the result as an SVE vector (another subreg).
Each subreg makes sense individually, but we can't fold them together
into a single subreg: it's 32 bytes -> 16 bytes -> 16*N bytes,
but the interpretation of 32 bytes -> 16*N bytes depends on
whether N==1 or N>1.

Since the second subreg makes sense individually, simplify_subreg
should bail out rather than ICE on it. simplify_gen_subreg will
then do the same (because it already checks validate_subreg).
This leaves simplify_gen_subreg returning null, requiring the
caller to take appropriate action.

I think this is relatively likely to occur elsewhere, so the patch
adds a helper for forcing a subreg, allowing a temporary pseudo to
be created where necessary.

I'll follow up by using force_subreg in more places. This patch
is intended to be a minimal backportable fix for the PR.

gcc/
PR target/115464
* simplify-rtx.cc (simplify_context::simplify_subreg): Don't try
to fold two subregs together if their relationship isn't known
at compile time.
* explow.h (force_subreg): Declare.
* explow.cc (force_subreg): New function.
* config/aarch64/aarch64-sve-builtins-base.cc
(svset_neonq_impl::expand): Use it instead of simplify_gen_subreg.

gcc/testsuite/
PR target/115464
* gcc.target/aarch64/sve/acle/general/pr115464.c: New test.

RISC-V: Bugfix vec_extract vls mode iterator restriction mismatch

We have a vec_extract pattern which takes ZVFHMIN as the mode
iterator of the VLS mode, aka V_VLS.  But it expands to the
pred_extract_first pattern, which takes ZVFH as the mode
iterator of the VLS mode, aka V_VLSF.  The mismatch results
in an ICE similar to the one below:

error: unrecognizable insn:
   27 | }
      | ^
(insn 19 18 20 2 (set (reg:HF 150 [ _13 ])
        (unspec:HF [
                (vec_select:HF (reg:V4HF 134 [ _1 ])
                    (parallel [
                            (const_int 0 [0])
                        ]))
                (reg:SI 67 vtype)
            ] UNSPEC_VPREDICATE)) "compress_run-2.c":24:5 -1
     (nil))
during RTL pass: vregs
compress_run-2.c:27:1: internal compiler error: in extract_insn, at
recog.cc:2812
0x1a627ef _fatal_insn(char const*, rtx_def const*, char const*, int,
char const*)
        ../../../gcc/gcc/rtl-error.cc:108
0x1a62834 _fatal_insn_not_found(rtx_def const*, char const*, int, char
const*)
        ../../../gcc/gcc/rtl-error.cc:116
0x1a0f356 extract_insn(rtx_insn*)
        ../../../gcc/gcc/recog.cc:2812
0x159ee61 instantiate_virtual_regs_in_insn
        ../../../gcc/gcc/function.cc:1612
0x15a04aa instantiate_virtual_regs
        ../../../gcc/gcc/function.cc:1995
0x15a058e execute
        ../../../gcc/gcc/function.cc:2042

This patch fixes the issue by aligning the mode
iterator restriction to ZVFH.

The following test suites pass with this patch.
1. The rv64gcv full regression test.
2. The rv64gcv build with glibc.

PR target/115456

gcc/ChangeLog:

* config/riscv/autovec.md: Take ZVFH mode iterator instead of
the ZVFHMIN for the alignment.
* config/riscv/vector-iterators.md: Add 2 new iterator
V_VLS_ZVFH and VLS_ZVFH.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr115456-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

[APX CCMP] Use ctestcc when comparing to const 0

For CTEST, we don't have conditional AND so there's no optimization
opportunity to write a new ctest pattern. Emit ctest when ccmp
compares against const 0, to save bytes.

gcc/ChangeLog:

* config/i386/i386.md (@ccmp<mode>): Add new alternative
<r>,C and adjust output templates. Also adjust UNSPEC mode
to CCmode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
* gcc.target/i386/apx-ccmp-2.c: Adjust some condition to
compare with 0.

doc: Streamline requirements on the build compiler

No need to talk about potential implementation bugs in older versions
than what we require. And no need to talk about building GCC 3.3 and
earlier at this point.

gcc:
PR other/69374
* doc/install.texi (Prerequisites): Simplify note on the C++
compiler required. Drop requirements for versions of GCC prior
to 3.4. Fix grammar.

Improve code generation of strided SLP loads

This avoids falling back to elementwise accesses for strided SLP
loads when the group size is not a multiple of the vector element
size. Instead we can use a smaller vector or integer type for the load.

For stores we can do the same, though restrictions on the stores we handle
and the fact that store-merging covers this up mean the change is mostly
effective for cost modeling. This shows for gcc.target/i386/vect-strided-3.c,
which we now vectorize with V4SI vectors rather than just V2SI ones.

For all of this there's still the opportunity to use non-uniform
accesses, say for a 6-element group with a VF of two do
V4SI, { V2SI, V2SI }, V4SI. But that's for a possible followup.

* tree-vect-stmts.cc (get_group_load_store_type): Consistently
use VMAT_STRIDED_SLP for strided SLP accesses and not
VMAT_ELEMENTWISE.
(vectorizable_store): Adjust VMAT_STRIDED_SLP handling to
allow not only half-size but also smaller accesses.
(vectorizable_load): Likewise.

* gcc.target/i386/vect-strided-1.c: New testcase.
* gcc.target/i386/vect-strided-2.c: Likewise.
* gcc.target/i386/vect-strided-3.c: Likewise.
* gcc.target/i386/vect-strided-4.c: Likewise.

tree-optimization/115385 - handle more gaps with peeling of a single iteration

The following makes peeling of a single scalar iteration handle more
gaps, including non-power-of-two cases. This can be done by rounding
up the remaining access to the next power-of-two which ensures that
the next scalar iteration will pick at least the number of excess
elements we access.

I've added a correctness testcase and one x86 specific scanning for
the optimization.

PR tree-optimization/115385
* tree-vect-stmts.cc (get_group_load_store_type): Peeling
of a single scalar iteration is sufficient if we can narrow
the access to the next power of two of the bits in the last
access.
(vectorizable_load): Ensure that the last access is narrowed.

* gcc.dg/vect/pr115385.c: New testcase.
* gcc.target/i386/vect-pr115385.c: Likewise.

tree-optimization/114107 - avoid peeling for gaps in more cases

The following refactors the code to detect necessary peeling for
gaps, in particular the PR103116 case when there is no gap but
the group size is smaller than the vector size.  The testcase in
PR114107 shows we fail to SLP

  for (int i=0; i<n; i++)
    for (int k=0; k<4; k++)
      data[4*i+k] *= factor[i];

because peeling one scalar iteration isn't enough to cover a gap
of 3 elements of factor[i].  But the code detecting this is placed
after the logic that detects cases we handle properly already as
we'd code generate { factor[i], 0., 0., 0. } for V4DFmode vectorization
already.  In fact the check to detect when peeling a single iteration
isn't enough seems improperly guarded as it should apply to all cases.

I'm not sure we correctly handle VMAT_CONTIGUOUS_REVERSE but I
checked that VMAT_STRIDED_SLP and VMAT_ELEMENTWISE correctly avoid
touching excess elements.

With this change we can use SLP for the above testcase and the
PR103116 testcases no longer require an epilogue on x86-64.  It
might be different on other targets, so I made those testcases
runtime FAIL only, instead of relying on dump scanning that there's
currently no easy way to properly constrain.

PR tree-optimization/114107
PR tree-optimization/110445
* tree-vect-stmts.cc (get_group_load_store_type): Refactor
contiguous access case.  Make sure peeling for gap constraints
are always tested and consistently relax when we know we can
avoid touching excess elements during code generation.  But
rewrite the check poly-int aware.

* gcc.dg/vect/pr114107.c: New testcase.
* gcc.dg/vect/pr103116-1.c: Adjust.
* gcc.dg/vect/pr103116-2.c: Likewise.
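
To make the gap concrete: each outer iteration of the quoted loop reads one
element of factor but four elements of data, so a four-element vector load
starting at factor[i] would touch three elements beyond the one needed. A
minimal compilable sketch of that loop follows; the function name, the
signature and the double element type (assumed from the V4DFmode mention) are
hypothetical, and whether the loop is vectorized depends on target and flags.

```cpp
#include <cassert>
#include <vector>

// Scalar reference for the loop from PR114107 (hypothetical wrapper).
// factor[i] is a group of size one that is reused for four lanes of data.
void scale_rows(double *data, const double *factor, int n)
{
  for (int i = 0; i < n; i++)
    for (int k = 0; k < 4; k++)
      data[4 * i + k] *= factor[i];
}

// Small self-check: two rows of four elements, scaled by 2 and 3.
void demo()
{
  std::vector<double> data(8, 1.0);
  std::vector<double> factor = {2.0, 3.0};
  scale_rows(data.data(), factor.data(), 2);
  assert(data[0] == 2.0 && data[3] == 2.0);
  assert(data[4] == 3.0 && data[7] == 3.0);
}
```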

Fix error message

gcc/cp/ChangeLog:

* parser.cc (cp_parser_asm_string_expression): Use correct error
message.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-asm-3.C: Adjust for new message.

Parse close paren even when constexpr extraction fails

To get better error recovery.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_asm_string_expression): Parse close
paren when constexpr extraction fails.

Remove const char * support for asm constexpr

asm constexpr now only accepts the same string types as C++26 assert,
e.g. string_view and string. Adjust test suite and documentation.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_asm_string_expression): Remove support
for const char * for asm constexpr.

gcc/ChangeLog:

* doc/extend.texi: Use std::string_view in asm constexpr
example.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-asm-1.C: Use std::string_view.
* g++.dg/cpp1z/constexpr-asm-3.C: Ditto.

Fix ICE due to REGNO of a SUBREG.

Use reg_or_subregno instead.

gcc/ChangeLog:

PR target/115452
* config/i386/i386-features.cc (scalar_chain::convert_op): Use
reg_or_subregno instead of REGNO to avoid ICE.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115452.c: New test.

Test: Move target independent test cases to gcc.dg/torture

The test cases of pr115387 are target independent; at least x86
and riscv can reproduce the issue. Thus, move these cases to
gcc.dg/torture.

The following test suites pass.
1. The rv64gcv full regression test.
2. The x86 full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: Move to...
* gcc.dg/torture/pr115387-1.c: ...here.
* gcc.target/riscv/pr115387-2.c: Move to...
* gcc.dg/torture/pr115387-2.c: ...here.

Signed-off-by: Pan Li <pan2.li@intel.com>

rs6000: Fix pr66144-3.c test to accept multiple equivalent insns. [PR115262]

Jeff's commit r15-831-g05daf617ea22e1 changed the instruction we expected
for this test case into an equivalent instruction.  Modify the test case
so it will accept any of three instructions we could get depending on the
options used.

2024-06-12  Peter Bergner  <bergner@linux.ibm.com>

gcc/testsuite/
PR testsuite/115262
* gcc.target/powerpc/pr66144-3.c (dg-do): Compile for all targets.
(dg-options): Add -fno-unroll-loops and remove -mvsx.
(scan-assembler): Change from this...
(scan-assembler-times): ...to this.  Tweak regex to accept multiple
allowable instructions.

MIPS: Use FPU-enabled tune for mips32/mips64/mips64r2/mips64r3/mips64r5

Currently, the default tune value of mips32 is PROCESSOR_4KC, and
the default tune value of mips64/mips64r2/mips64r3/mips64r5 is
PROCESSOR_5KC. PROCESSOR_4KC and PROCESSOR_5KC are both FPU-less.

Let's use PROCESSOR_24KF1_1 for mips32, and PROCESSOR_5KF for mips64/
mips64r2/mips64r3/mips64r5.

We found this problem while trying to fix gcc.target/mips/movcc-3.c.

gcc:
* config/mips/mips-cpus.def: Use PROCESSOR_24KF1_1 for mips32;
Use PROCESSOR_5KF for mips64/mips64r2/mips64r3/mips64r5.

MIPS: Use signaling fcmp instructions for LT/LE/LTGT

LT/LE: c.lt.fmt/c.le.fmt on pre-R6 and cmp.lt.fmt/cmp.le.fmt have
different semantics:
   c.lt.fmt signals for all NaNs, including qNaN;
   cmp.lt.fmt signals only for sNaN, not for qNaN;
   cmp.slt.fmt has the same semantics as c.lt.fmt;
   lt/le of RTL signal for qNaN.

In `s<code>_<SCALARF:mode>_using_<FPCC:mode>`, the RTL operations
`lt`/`le` are converted to c/cmp's lt/le, which is correct for C.cond.fmt
but not for CMP.cond.fmt. Let's convert them to slt/sle if ISA_HAS_CCF.

For LTGT, which signals for qNaN, `sne` of R6 has the same semantics, while
pre-R6 has only the inverse, `ngl`.  Thus for RTL we have to use `uneq` as the
operator, and introduce a new CC mode, CCEmode, to mark it as signaling.

This patch can fix
   gcc.dg/torture/pr91323.c for pre-R6;
   gcc.dg/torture/builtin-iseqsig-* for R6.

gcc:
* config/mips/mips-modes.def: New CC_MODE CCE.
* config/mips/mips-protos.h(mips_output_compare): New function.
* config/mips/mips.cc(mips_allocate_fcc): Set CCEmode count=1.
(mips_emit_compare): Use CCEmode for LTGT/LT/LE for pre-R6.
(mips_output_compare): New function. Convert lt/le to slt/sle
for R6; convert ueq to ngl for CCEmode.
(mips_hard_regno_mode_ok_uncached): Mention CCEmode.
* config/mips/mips.h: Mention CCEmode for LOAD_EXTEND_OP.
* config/mips/mips.md(FPCC): Add CCE.
(define_mode_iterator MOVECC): Mention CCE.
(define_mode_attr reg): Add CCE with "z".
(define_mode_attr fpcmp): Add CCE with "c".
(define_code_attr fcond): ltgt should use sne instead of ne.
(s<code>_<SCALARF:mode>_using_<FPCC:mode>): call mips_output_compare.
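
The signaling versus quiet distinction also exists at the source level: under
IEEE 754 rules the built-in `<` is expected to raise the invalid exception for
a quiet NaN operand, while std::isless is not. The sketch below is target
independent and says nothing about MIPS code generation; the helper names are
hypothetical, and compilers without full floating-point environment support
may not preserve the flag behaviour, so the flag is reported rather than
assumed.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <limits>

// Result of one comparison plus whether FE_INVALID was set afterwards.
struct cmp_result { bool value; bool raised_invalid; };

// Relational operator: the signaling form of "less than".
cmp_result signaling_less(double a, double b)
{
  volatile double va = a, vb = b;      // discourage constant folding
  std::feclearexcept(FE_ALL_EXCEPT);
  double x = va, y = vb;
  bool r = x < y;
  return { r, std::fetestexcept(FE_INVALID) != 0 };
}

// std::isless: the quiet form of "less than".
cmp_result quiet_less(double a, double b)
{
  volatile double va = a, vb = b;      // discourage constant folding
  std::feclearexcept(FE_ALL_EXCEPT);
  double x = va, y = vb;
  bool r = std::isless(x, y);
  return { r, std::fetestexcept(FE_INVALID) != 0 };
}

// Both forms agree on the comparison result; only the flags may differ.
void demo()
{
  double qnan = std::numeric_limits<double>::quiet_NaN();
  assert(signaling_less(qnan, 1.0).value == quiet_less(qnan, 1.0).value);
}
```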

[APX ZU] Support APX zero-upper

Enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc.

gcc/ChangeLog:

* config/i386/i386-opts.h (enum apx_features): Add apx_zu.
* config/i386/i386.h (TARGET_APX_ZU): Define.
* config/i386/i386.md (*imulhi<mode>zu): New define_insn.
(*setcc_<mode>_zu): Ditto.
* config/i386/i386.opt: Add enum value for zu.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-zu-1.c: New test.
* gcc.target/i386/apx-zu-2.c: New test.

Daily bump.

c++: visibility wrt concept-id as targ [PR115283]

Like with alias templates, it seems we don't maintain visibility flags
for concepts either, so min_vis_expr_r should ignore them for now.
Otherwise after r14-6789 we may incorrectly give a function template that
uses a concept-id in its signature internal linkage.

PR c++/115283

gcc/cp/ChangeLog:

* decl2.cc (min_vis_expr_r) <case TEMPLATE_DECL>: Ignore
concepts.

gcc/testsuite/ChangeLog:

* g++.dg/template/linkage5.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

[libstdc++] [testsuite] require cmath for c++23 cmath tests

Some c++23 tests fail on targets that don't satisfy dg-require-cmath,
because referenced math functions don't get declared in std. Add the
missing requirement.

for libstdc++-v3/ChangeLog

* testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc:
Require cmath.
* testsuite/26_numerics/headers/cmath/functions_std_c++23.cc:
Likewise.
* testsuite/26_numerics/headers/cmath/nextafter_c++23.cc:
Likewise.

[libstdc++] [testsuite] xfail double-prec from_chars for float128_t

Tests involving float128_t were xfailed or otherwise worked around for
vxworks on aarch64. The same issue came up on rtems. This patch
adjusts them similarly.

for libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/8.cc: Skip float128_t testing
on aarch64-rtems*.
* testsuite/20_util/to_chars/float128_c++23.cc: Xfail run on
aarch64-rtems*.

c++: repeated export using

A sample implementation of module std was breaking because the exports
included 'using std::operator&' twice.  Since Nathaniel's r15-964 for
PR114867, the first using added an extra instance of each function that was
revealed/exported by that using, resulting in duplicates for
lookup_maybe_add to dedup.  But if the duplicate is the first thing in the
list, lookup_add doesn't make an OVERLOAD, so trying to set OVL_USING_P
crashes.  Fixed by using ovl_make in the case where we want to set the flag.

gcc/cp/ChangeLog:

* tree.cc (lookup_maybe_add): Use ovl_make when setting OVL_USING_P.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-21_a.C: New test.

c++: module std and exception_ptr

exception_ptr.h contains

  namespace __exception_ptr
  {
    class exception_ptr;
  }
  using __exception_ptr::exception_ptr;

so when module std tries to 'export using std::exception_ptr', it names
another using-directive rather than the class directly, so __exception_ptr
is never explicitly opened in module purview.

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_binding_entity): Set
DECL_MODULE_PURVIEW_P instead of asserting.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-20_a.C: New test.

c++: fix testcase diagnostics

The r15-1180 adjustments to this testcase broke a couple of tests in C++26
mode.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/static_assert1.C: Fix diagnostic typos.

Whitespace cleanup for target-supports.exp

This patch removes trailing whitespace and replaces leading groups of 8-16
spaces with tabs.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Cleanup whitespace.

pretty_printer: unbreak build on aarch64 [PR115465]

I missed this target-specific usage of pretty_printer::buffer when
making the fields private in r15-1209-gc5e3be456888aa; sorry.

gcc/ChangeLog:
PR bootstrap/115465
* config/aarch64/aarch64-early-ra.cc (early_ra::process_block):
Update for fields of pretty_printer becoming private in
r15-1209-gc5e3be456888aa.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

RISC-V: Allow any temp register to be used in amo tests

We artificially restrict the temp registers to be a[0-9]+ when other
registers like t[0-9]+ are valid too. Update to make the regex
accept any register for the temp value.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-load-1.c: Update temp register regex.
* gcc.target/riscv/amo/amo-table-a-6-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-store-3.c: Ditto.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Fix amoadd call arguments

Update __atomic_add_fetch arguments to be a pointer and value rather
than two pointers.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c: Update
__atomic_add_fetch args.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/amo/amo-zaamo-preferred-over-zalrsc.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c: Ditto.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
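
For reference, the argument shape described above (a pointer and a value)
looks like this with the GCC/Clang __atomic_add_fetch built-in. The wrapper
add_and_fetch is a hypothetical helper and is not taken from the testsuite.

```cpp
#include <cassert>

// Hypothetical wrapper around the built-in.
int add_and_fetch(int *counter, int delta)
{
  // First argument: pointer to the object to update.
  // Second argument: the value to add, passed by value, not by pointer.
  // Third argument: the memory order.
  return __atomic_add_fetch(counter, delta, __ATOMIC_SEQ_CST);
}

// Small self-check: the built-in returns the updated value.
void demo()
{
  int counter = 40;
  int result = add_and_fetch(&counter, 2);
  assert(result == 42);
  assert(counter == 42);
}
```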

RISC-V: Move amo tests into subfolder

There's a large number of atomic related testcases in the riscv folder.
Move them into a subfolder similar to what was done for rvv testcases.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c: ...here.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c: ...here.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c: ...here.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c: ...here.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c: ...here.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-1.c: ...here.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-2.c: ...here.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-3.c: ...here.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-4.c: ...here.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-5.c: ...here.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-6.c: ...here.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-7.c: ...here.
* gcc.target/riscv/amo-table-a-6-fence-1.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-fence-1.c: ...here.
* gcc.target/riscv/amo-table-a-6-fence-2.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-fence-2.c: ...here.
* gcc.target/riscv/amo-table-a-6-fence-3.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-fence-3.c: ...here.
* gcc.target/riscv/amo-table-a-6-fence-4.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-fence-4.c: ...here.
* gcc.target/riscv/amo-table-a-6-fence-5.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-fence-5.c: ...here.
* gcc.target/riscv/amo-table-a-6-load-1.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-load-1.c: ...here.
* gcc.target/riscv/amo-table-a-6-load-2.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-load-2.c: ...here.
* gcc.target/riscv/amo-table-a-6-load-3.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: ...here.
* gcc.target/riscv/amo-table-a-6-store-1.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: ...here.
* gcc.target/riscv/amo-table-a-6-store-2.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: ...here.
* gcc.target/riscv/amo-table-a-6-store-compat-3.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: ...here.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-1.c: ...here.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-2.c: ...here.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-3.c: ...here.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-4.c: ...here.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Move to...
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-5.c: ...here.
* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-amo-add-1.c: ...here.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-amo-add-2.c: ...here.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-amo-add-3.c: ...here.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-amo-add-4.c: ...here.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-amo-add-5.c: ...here.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-1.c: ...here.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-2.c: ...here.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-3.c: ...here.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-4.c: ...here.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-5.c: ...here.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-6.c: ...here.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-7.c: ...here.
* gcc.target/riscv/amo-table-ztso-fence-1.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-fence-1.c: ...here.
* gcc.target/riscv/amo-table-ztso-fence-2.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-fence-2.c: ...here.
* gcc.target/riscv/amo-table-ztso-fence-3.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-fence-3.c: ...here.
* gcc.target/riscv/amo-table-ztso-fence-4.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-fence-4.c: ...here.
* gcc.target/riscv/amo-table-ztso-fence-5.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-fence-5.c: ...here.
* gcc.target/riscv/amo-table-ztso-load-1.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-load-1.c: ...here.
* gcc.target/riscv/amo-table-ztso-load-2.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-load-2.c: ...here.
* gcc.target/riscv/amo-table-ztso-load-3.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-load-3.c: ...here.
* gcc.target/riscv/amo-table-ztso-store-1.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-store-1.c: ...here.
* gcc.target/riscv/amo-table-ztso-store-2.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-store-2.c: ...here.
* gcc.target/riscv/amo-table-ztso-store-3.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-store-3.c: ...here.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-1.c: ...here.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-2.c: ...here.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-3.c: ...here.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-4.c: ...here.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Move to...
* gcc.target/riscv/amo/amo-table-ztso-subword-amo-add-5.c: ...here.
* gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c: Move to...
* gcc.target/riscv/amo/amo-zaamo-preferred-over-zalrsc.c: ...here.
* gcc.target/riscv/amo-zalrsc-amo-add-1.c: Move to...
* gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c: ...here.
* gcc.target/riscv/amo-zalrsc-amo-add-2.c: Move to...
* gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c: ...here.
* gcc.target/riscv/amo-zalrsc-amo-add-3.c: Move to...
* gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c: ...here.
* gcc.target/riscv/amo-zalrsc-amo-add-4.c: Move to...
* gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c: ...here.
* gcc.target/riscv/amo-zalrsc-amo-add-5.c: Move to...
* gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c: ...here.
* gcc.target/riscv/inline-atomics-1.c: Move to...
* gcc.target/riscv/amo/inline-atomics-1.c: ...here.
* gcc.target/riscv/inline-atomics-2.c: Move to...
* gcc.target/riscv/amo/inline-atomics-2.c: ...here.
* gcc.target/riscv/inline-atomics-3.c: Move to...
* gcc.target/riscv/amo/inline-atomics-3.c: ...here.
* gcc.target/riscv/inline-atomics-4.c: Move to...
* gcc.target/riscv/amo/inline-atomics-4.c: ...here.
* gcc.target/riscv/inline-atomics-5.c: Move to...
* gcc.target/riscv/amo/inline-atomics-5.c: ...here.
* gcc.target/riscv/inline-atomics-6.c: Move to...
* gcc.target/riscv/amo/inline-atomics-6.c: ...here.
* gcc.target/riscv/inline-atomics-7.c: Move to...
* gcc.target/riscv/amo/inline-atomics-7.c: ...here.
* gcc.target/riscv/inline-atomics-8.c: Move to...
* gcc.target/riscv/amo/inline-atomics-8.c: ...here.
* gcc.target/riscv/pr114130.c: Move to...
* gcc.target/riscv/amo/pr114130.c: ...here.
* gcc.target/riscv/pr89835.c: Move to...
* gcc.target/riscv/amo/pr89835.c: ...here.
* gcc.target/riscv/amo/amo.exp: New file.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

aarch64: Use bitreverse rtl code instead of unspec [PR115176]

The bitreverse rtl code was added with r14-1586-g6160572f8d243c, so let's
use it instead of an unspec. This is mostly a small cleanup, but it also
fixes the rtx costs, which did not handle vector modes correctly for the
UNSPEC and now do.
This is a first step towards adding the __builtin_bitreverse builtins, but
it is independent of them.
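
For readers unfamiliar with the operation, the following is a minimal C
sketch of what a 32-bit bit reversal computes. The function name
bitreverse32 is hypothetical: it is neither a GCC builtin nor part of this
patch, and it only illustrates the semantics that the bitreverse rtl code
describes.

```c
#include <stdint.h>

/* Hypothetical helper, not a GCC builtin: reverse the order of the
   32 bits in X by swapping progressively larger groups of bits.  */
uint32_t
bitreverse32 (uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* single bits */
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* bit pairs */
  x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);  /* nibbles */
  x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);  /* bytes */
  return (x >> 16) | (x << 16);                             /* half-words */
}
```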

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR target/115176
* config/aarch64/aarch64-simd.md (aarch64_rbit<mode><vczle><vczbe>): Use
bitreverse instead of unspec.
* config/aarch64/aarch64-sve-builtins-base.cc (svrbit): Convert over to using
rtx_code_function instead of unspec_based_function.
* config/aarch64/aarch64-sve.md: Update comment where RBIT is included.
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle BITREVERSE like BSWAP.
Remove UNSPEC_RBIT support.
* config/aarch64/aarch64.md (unspec): Remove UNSPEC_RBIT.
(aarch64_rbit<mode>): Use bitreverse instead of unspec.
* config/aarch64/iterators.md (SVE_INT_UNARY): Add bitreverse.
(optab): Likewise.
(sve_int_op): Likewise.
(SVE_INT_UNARY): Remove UNSPEC_RBIT.
(optab): Likewise.
(sve_int_op): Likewise.
(min_elem_bits): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

match: Improve gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p for truncating casts [PR115449]

As mentioned by Jeff in r15-831-g05daf617ea22e1d818295ed2d037456937e23530, we don't handle
`(X | Y) & ~Y` -> `X & ~Y` on the gimple level when matching `~Y` with the `Y` part
involves types of different signedness (but the same precision). This
improves both gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p to
be able to say `(truncate)a` and `(truncate)a` are bitwise_equal and
that `~(truncate)a` and `(truncate)a` are bitwise_invert_equal.
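
As a rough illustration of the identity involved, the sketch below writes
the unsimplified and simplified forms as C functions, with Y truncated once
to a signed and once to an unsigned 8-bit type. All names are hypothetical
and this is not the new bitops-10.c test; it only assumes GCC's
implementation-defined wrap-around for the out-of-range conversion to
int8_t.

```c
#include <stdint.h>

/* Hypothetical illustration only.  Unsimplified form: Y is truncated
   to a signed type for the OR and to an unsigned type for the NOT, so
   the two truncations differ only in signedness.  */
uint8_t
before (uint8_t x, int y)
{
  int8_t ys = (int8_t) y;    /* signed truncation (wraps with GCC) */
  uint8_t yu = (uint8_t) y;  /* unsigned truncation */
  return (uint8_t) ((x | (uint8_t) ys) & (uint8_t) ~yu);
}

/* Simplified form: X & ~Y.  */
uint8_t
after (uint8_t x, int y)
{
  return (uint8_t) (x & (uint8_t) ~(uint8_t) y);
}

/* Count the inputs for which the two forms disagree.  */
unsigned
count_mismatches (void)
{
  unsigned bad = 0;
  for (int x = 0; x < 256; x++)
    for (int y = -256; y < 256; y++)
      if (before ((uint8_t) x, y) != after ((uint8_t) x, y))
        bad++;
  return bad;
}
```

Both truncations keep the same low 8 bits of Y, which is why the two forms
agree for every input.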

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115449

gcc/ChangeLog:

* gimple-match-head.cc (gimple_maybe_truncate): New declaration.
(gimple_bitwise_equal_p): Match truncations that differ only
in types with the same precision.
(gimple_bitwise_inverted_equal_p): For matching after bit_not_with_nop
call gimple_bitwise_equal_p.
* match.pd (maybe_truncate): New match pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-10.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Move cexpr_stree tree string build into utility function

No semantic changes.

gcc/cp/ChangeLog:

* cp-tree.h (extract): Add new overload to return tree.
* parser.cc (cp_parser_asm_string_expression): Use tree extract.
* semantics.cc (cexpr_str::extract): Add new overload to return
tree.

libstdc++: Fix std::tr2::dynamic_bitset shift operations [PR115399]

The shift operations for dynamic_bitset fail to zero out words where the
non-zero bits were shifted to a completely different word.

For a right shift we don't need to sanitize the unused bits in the high
word, because we know they were already clear and a right shift doesn't
change that.
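
The word-zeroing requirement can be sketched in plain C as a multi-word
left shift. This is not the libstdc++ implementation: the function name,
the fixed 64-bit word size, and the assumption that every bit of every word
is in use (so no sanitizing of a partial top word is shown) are all
illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64

/* Hypothetical sketch: shift the NWORDS-word bit array W (least
   significant word first) left by SHIFT bits, in place.  */
void
left_shift_words (uint64_t *w, size_t nwords, size_t shift)
{
  size_t wshift = shift / WORD_BITS;  /* whole words to move */
  size_t offset = shift % WORD_BITS;  /* remaining bit offset */

  if (wshift >= nwords)
    {
      /* Everything is shifted out.  */
      for (size_t i = 0; i < nwords; i++)
        w[i] = 0;
      return;
    }

  /* Work from the high word down so sources are read before they
     are overwritten.  */
  for (size_t n = nwords; n-- > wshift; )
    {
      uint64_t v = w[n - wshift] << offset;
      if (offset != 0 && n > wshift)
        v |= w[n - wshift - 1] >> (WORD_BITS - offset);
      w[n] = v;
    }

  /* The low WSHIFT words received no bits and must be cleared;
     skipping this step leaves stale bits behind.  */
  for (size_t i = 0; i < wshift; i++)
    w[i] = 0;
}
```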

libstdc++-v3/ChangeLog:

PR libstdc++/115399
* include/tr2/dynamic_bitset (operator>>=): Remove redundant
call to _M_do_sanitize.
* include/tr2/dynamic_bitset.tcc (_M_do_left_shift): Zero out
low bits in words that should no longer be populated.
(_M_do_right_shift): Likewise for high bits.
* testsuite/tr2/dynamic_bitset/pr115399.cc: New test.

libstdc++: Do not use memset in _Hashtable::clear()

Using memset is incorrect if the __bucket_ptr type is non-trivial, or
does not use an all-zero bit pattern for its null value.

Replace the three uses of memset with std::fill_n to set the pointers to
nullptr.
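
A similar concern exists in plain C, where ISO C does not guarantee that a
null pointer is represented by all-zero bits. The sketch below is a C
analogue only, not the libstdc++ change (which uses std::fill_n); the
struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical node type standing in for a hash table node.  */
struct node
{
  struct node *next;
};

/* Reset every bucket by assigning a null pointer value, instead of
   writing zero bytes over the array with memset.  */
void
clear_buckets (struct node **buckets, size_t count)
{
  for (size_t i = 0; i < count; i++)
    buckets[i] = NULL;
}
```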

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::clear): Do not use
memset to zero out bucket pointers.
(_Hashtable::_M_assign_elements): Likewise.

middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'.  A simple example of such
loop is given below:

void foo(double * restrict a, double * restrict b, int n){
  int i;
  for(i=0; i<n; ++i){
    a[i] = a[i] + b[i];
    __builtin_prefetch(&(b[i+8]));
  }
}

The failure stems from two issues:

1. Given that it is typically not possible to fully reason about a
   function call due to the possibility of side effects, the
   autovectorizer does not attempt to vectorize loops which make such
   calls.

   Given the memory reference passed to `__builtin_prefetch', in the
   absence of assurances about its effect on the passed memory
   location the compiler deems the function unsafe to vectorize,
   marking it as clobbering memory in `vect_find_stmt_data_reference'.
   This leads to the failure in autovectorization.

2. Notwithstanding the above issue, though the prefetch statement
   would be classed as `vect_unused_in_scope', the induction variable
   used in the address of the prefetch is the scalar loop's and not
   the vector loop's IV. That is, it still uses `i' and not `vec_iv'
   because the instruction wasn't vectorized, causing DCE to think the
   value is live, such that we now have both the vector and scalar
   induction variables actively used in the loop.

This patch addresses both of these:

1. About the issue regarding the memory clobber, data prefetch does
   not generate faults if its address argument is invalid and does not
   write to memory.  Therefore, it does not alter the internal state
   of the program or its control flow under any circumstance.  As
   such, it is reasonable that the function be marked as not affecting
   memory contents.

   To achieve this, we add the necessary logic to
   `get_references_in_stmt' to ensure that builtin functions are
   given the same treatment as internal functions.  If the gimple call
   is to a builtin function and its function code is
   `BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.

2. Finding precedent in the way clobber statements are handled,
   whereby the vectorizer drops these from both the scalar and
   vectorized versions of a given loop, we choose to drop prefetch
   hints in a similar fashion.  This seems appropriate given how
   software prefetch hints are typically ignored by processors across
   architectures, as they seldom lead to performance gains over their
   hardware counterparts.
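
The reasoning that a prefetch hint does not affect program state can be
sketched by comparing the loop with and without the hint. The function
names below are hypothetical, and the sketch assumes a compiler that
provides `__builtin_prefetch' (GCC or Clang). The caller is assumed to
pass a `b' with at least n + 8 elements, so that the address expression
itself stays valid.

```c
/* Hypothetical sketch: the loop from the example above, with the
   prefetch hint.  B must have at least N + 8 elements so that
   &b[i + 8] is a valid address expression.  */
void
add_with_prefetch (double *restrict a, const double *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i] = a[i] + b[i];
      __builtin_prefetch (&b[i + 8]);  /* hint only; writes nothing */
    }
}

/* The same loop with the prefetch dropped.  */
void
add_plain (double *restrict a, const double *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = a[i] + b[i];
}
```

Both functions leave `a' with identical contents, which is the property
that makes dropping the call safe.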

gcc/ChangeLog:

PR tree-optimization/114061
* tree-data-ref.cc (get_references_in_stmt): Set
`clobbers_memory' to false for __builtin_prefetch.
* tree-vect-loop.cc (vect_transform_loop): Drop all
__builtin_prefetch calls from loops.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-prefetch-drop.c: New test.
* gcc.target/aarch64/vect-prefetch-drop.c: Likewise.

pretty_printer: convert chunk_info into a class

No functional change intended.

gcc/cp/ChangeLog:
* error.cc (append_formatted_chunk): Move part of body into
chunk_info::append_formatted_chunk.

gcc/ChangeLog:
* dumpfile.cc (dump_pretty_printer::emit_items): Update for
changes to chunk_info.
* pretty-print.cc (chunk_info::append_formatted_chunk): New, based
on code in cp/error.cc's append_formatted_chunk.
(chunk_info::pop_from_output_buffer): New, based on code in
pp_output_formatted_text and dump_pretty_printer::emit_items.
(on_begin_quote): Convert to...
(chunk_info::on_begin_quote): ...this.
(on_end_quote): Convert to...
(chunk_info::on_end_quote): ...this.
(pretty_printer::format): Update for chunk_info becoming a class
and its fields gaining "m_" prefixes.  Update for on_begin_quote
and on_end_quote moving to chunk_info.
(quoting_info::handle_phase_3): Update for changes to chunk_info.
(pp_output_formatted_text): Likewise.  Move cleanup code to
chunk_info::pop_from_output_buffer.
* pretty-print.h (class output_buffer): New forward decl.
(class urlifier): New forward decl.
(struct chunk_info): Convert to...
(class chunk_info): ...this.  Add friend class pretty_printer.
(chunk_info::get_args): New accessor.
(chunk_info::get_quoting_info): New accessor.
(chunk_info::append_formatted_chunk): New decl.
(chunk_info::pop_from_output_buffer): New decl.
(chunk_info::on_begin_quote): New decl.
(chunk_info::on_end_quote): New decl.
(chunk_info::prev): Rename to...
(chunk_info::m_prev): ...this.
(chunk_info::args): Rename to...
(chunk_info::m_args): ...this.
(output_buffer::cur_chunk_array): Drop "struct" from decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>