Richard Biener [Fri, 14 Jun 2024 05:54:15 +0000 (07:54 +0200)]
Fix fallout of peeling for gap improvements
The following hopefully addresses an observed bootstrap issue on aarch64
where maybe-uninit diagnostics occur. It also fixes bogus napkin math
from myself when I was confusing rounded up size of a single access
with rounded up size of the group accessed in a single scalar iteration.
So the following puts in a correctness check, leaving a set of peeling
for gaps as insufficient. This could be rectified by splitting the
last load into multiple ones but I'm leaving this for a followup, better
quickly fix the reported wrong-code.
* tree-vect-stmts.cc (get_group_load_store_type): Do not
re-use poly-int remain but re-compute with non-poly values.
Verify the shortened load is good enough to be covered with
a single scalar gap iteration before accepting it.
* gcc.dg/vect/pr115385.c: Enable AVX2 if available.
* config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for
pternlog_operand under AVX512, also adjust VEC_DUPLICATE
according since vec_dup:mem can't be that cheap.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-pr98461.c: Scan either notl or
vpternlog.
* gcc.target/i386/avx512f-pr96891-3.c: Also scan for inversed
condition.
* gcc.target/i386/avx512f-vpternlogd-3.c: Adjust vpternlog
number to 673.
* gcc.target/i386/avx512f-vpternlogd-4.c: Ditto.
* gcc.target/i386/avx512f-vpternlogd-5.c: Ditto.
* gcc.target/i386/sse2-v1ti-vne.c: Add -mno-avx512f.
liuhongt [Mon, 3 Jun 2024 02:38:19 +0000 (10:38 +0800)]
Remove one_if_conv for latest Intel processors.
The tune is added by PR79390 for SciMark2 on Broadwell.
For latest GCC, with and without the -mtune-ctrl=^one_if_conv_insn.
GCC will generate the same binary for SciMark2. And for SPEC2017,
there's no big impact for SKX/CLX/ICX, and small improvements on SPR
and later.
Roger Sayle [Fri, 14 Jun 2024 05:29:27 +0000 (06:29 +0100)]
i386: More use of m{32,64}bcst addressing modes with ternlog.
This patch makes more use of m32bcst and m64bcst addressing modes in
ix86_expand_ternlog. Previously, the i386 backend would only consider
using a m32bcst if the inner mode of the vector was 32-bits, or using
m64bcst if the inner mode was 64-bits. For ternlog (and other logic
operations) this is a strange restriction, as how the same constant
is materialized is dependent upon the mode it is used/operated on.
Hence, the V16QI constant {2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2} wouldn't
use m??bcst, but (V4SI){0x02020202,0x02020202,0x02020202,0x02020202}
which has the same bit pattern would. This can optimized by (re)checking
whether a CONST_VECTOR can be broadcast from memory after casting it
to VxSI (or for m64bst to VxDI) where x has the appropriate vector size.
Taking the test case from pr115407:
__attribute__((__vector_size__(64))) char v;
void foo() {
v = v | v << 7;
}
Compiled with -O2 -mcmodel=large -mavx512bw
GCC 14 generates a 64-byte (512-bit) load from the constant pool:
2024-06-14 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_ternlog): Try performing
logic operation in a different vector mode if that enables use of
a 32-bit or 64-bit broadcast addressing mode.
gcc/testsuite/ChangeLog
* gcc.target/i386/pr115407.c: New test case.
Andrew Pinski [Thu, 13 Jun 2024 20:07:10 +0000 (13:07 -0700)]
expand: constify sepops operand to expand_expr_real_2 and expand_widen_pattern_expr [PR113212]
While working on an expand patch back in January I noticed that
the first argument (of sepops type) of expand_expr_real_2 could be
constified as it was not to be touched by the function (nor should it be).
There is code in internal-fn.cc that depends on expand_expr_real_2 not touching
the ops argument so constification makes this more obvious.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR middle-end/113212
* expr.h (const_seqpops): New typedef.
(expand_expr_real_2): Constify the first argument.
* optabs.cc (expand_widen_pattern_expr): Likewise.
* optabs.h (expand_widen_pattern_expr): Likewise.
* expr.cc (expand_expr_real_2): Likewise
(do_store_flag): Likewise. Remove incorrect store to ops->code.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Patrick O'Neill [Thu, 13 Jun 2024 00:10:13 +0000 (17:10 -0700)]
RISC-V: Add support for subword atomic loads/stores
Andrea Parri recently pointed out that we were emitting overly conservative
fences for seq_cst atomic loads/stores. This adds support for the optimized
fences specified in the PSABI:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/2092568f7896ceaa1ec0f02569b19eaa42cd51c9/riscv-atomic.adoc
gcc/ChangeLog:
* config/riscv/sync-rvwmo.md: Add support for subword fenced
loads/stores.
* config/riscv/sync-ztso.md: Ditto.
* config/riscv/sync.md: Ditto.
Alexandre Oliva [Thu, 13 Jun 2024 23:10:55 +0000 (20:10 -0300)]
[libstdc++] [testsuite] require cmath for [PR114359]
When !_GLIBCXX_USE_C99_MATH_TR1, binomial_distribution doesn't use the
optimized algorithm that was fixed in response to PR114359. Without
that optimized algorithm, operator() ends up looping very very long
for the test, to the point that it would time out by several orders of
magnitude, without even exercising the optimized algorithm that we're
testing for regressions. Arrange for the test to be skipped if that
bit won't be exercised.
Joseph Myers [Thu, 13 Jun 2024 22:41:02 +0000 (22:41 +0000)]
c: Implement C2Y complex increment/decrement support
Support for complex increment and decrement (previously supported as
an extension) was voted into C2Y today (paper N3259). Thus, change
the pedwarn to a pedwarn_c23 and add associated tests.
Note: the type of the 1 to be added / subtracted is underspecified (to
be addressed in a subsequent paper), but understood to be intended to
be a real type (so the sign of a zero imaginary part is never changed)
and this is what is implemented; the tests added include verifying
that there is no undesired change to the sign of a zero imaginary
part.
Bootstrapped with no regressions on x86_64-pc-linux-gnu.
gcc/c/
* c-typeck.cc (build_unary_op): Use pedwarn_c23 for complex
increment and decrement.
gcc/testsuite/
* gcc.dg/c23-complex-1.c, gcc.dg/c23-complex-2.c,
gcc.dg/c23-complex-3.c, gcc.dg/c23-complex-4.c,
gcc.dg/c2y-complex-1.c, gcc.dg/c2y-complex-2.c: New tests.
Jason Merrill [Wed, 12 Jun 2024 22:24:35 +0000 (18:24 -0400)]
c++/modules: export using across namespace [PR114683]
Currently we represent a non-function using-declaration by inserting the
named declaration into the target scope. In general this works fine, but in
the case of an exported using-declaration we have nowhere to mark the
using-declaration as exported, so we mark the original declaration as
exported instead, and then treat all using-declarations that name it as
exported as well. We were doing this only if there was also a previous
non-exported using, so for this testcase the export got lost; this patch
broadens the workaround to also apply to the using that first brings the
declaration into the current scope.
This does not fully resolve 114683, but replaces a missing exports bug with
an extra exports bug, which should be a significant usability improvement.
The testcase has xfails for extra exports.
I imagine a complete fix should involve inserting a USING_DECL.
PR c++/114683
gcc/cp/ChangeLog:
* name-lookup.cc (do_nonmember_using_decl): Allow exporting
a newly inserted decl.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-22_a.C: New test.
* g++.dg/modules/using-22_b.C: New test.
Jason Merrill [Thu, 13 Jun 2024 01:44:10 +0000 (21:44 -0400)]
c++/modules: multiple usings of the same decl [PR115194]
add_binding_entity creates an OVERLOAD to represent a using-declaration in
module purview of a declaration in the global module, even for
non-functions, and we were failing to merge that with the original
declaration in name lookup.
It's not clear to me that building the OVERLOAD is what should be happening,
but let's work around it for now pending an overhaul of using-decl handling
for c++/114683.
PR c++/115194
gcc/cp/ChangeLog:
* name-lookup.cc (name_lookup::process_module_binding): Strip an
OVERLOAD from a non-function.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-23_a.C: New test.
* g++.dg/modules/using-23_b.C: New test.
Patrick Palka [Thu, 13 Jun 2024 14:16:10 +0000 (10:16 -0400)]
c++: undeclared identifier in requires-clause [PR99678]
Since the terms of a requires-clause are grammatically primary-expressions
and not e.g. postfix-expressions, it seems we need to explicitly handle
and diagnose the case where a term parses to a bare unresolved identifier,
like cp_parser_postfix_expression does, since cp_parser_primary_expression
leaves that up to its callers. Otherwise we incorrectly accept the first
three requires-clauses below.
Note that the only occurrences of primary-expression in the grammar are
postfix-expression and constraint-logical-and-expression, so it's not too
surprising that we need this special handling here.
PR c++/99678
gcc/cp/ChangeLog:
* parser.cc (cp_parser_constraint_primary_expression): Diagnose
a bare unresolved unqualified-id.
Hongyu Wang [Wed, 12 Jun 2024 16:18:32 +0000 (00:18 +0800)]
[APX CCMP] Add targetm.have_ccmp hook [PR115370]
In cfgexpand, there is an optimization for branch which tests
targetm.gen_ccmp_first == NULL. However for target like x86-64, the
hook was implemented but it does not indicate that ccmp was enabled.
Add a new target hook TARGET_HAVE_CCMP and replace the middle-end
check for the existance of gen_ccmp_first to avoid misoptimization.
gcc/ChangeLog:
PR target/115370
PR target/115463
* target.def (have_ccmp): New target hook.
* targhooks.cc (default_have_ccmp): New function.
* targhooks.h (default_have_ccmp): New prototype.
* doc/tm.texi.in: Add TARGET_HAVE_CCMP.
* doc/tm.texi: Regenerate.
* cfgexpand.cc (expand_gimple_cond): Call targetm.have_ccmp
instead of checking if targetm.gen_ccmp_first exists.
* expr.cc (expand_expr_real_gassign): Likewise.
* config/i386/i386.cc (ix86_have_ccmp): New target hook to
check if APX_CCMP enabled.
(TARGET_HAVE_CCMP): Define.
Patrick Palka [Thu, 13 Jun 2024 14:02:43 +0000 (10:02 -0400)]
c++: ICE w/ ambig and non-strictly-viable cands [PR115239]
Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we hold on to ever since r14-6522. These latter candidates have
an empty second arg conversion since the first arg conversion was deemed
bad, and this trips up joust when called on #3 and #4 which assumes all
arg conversions are there.
We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate. To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates (taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable).
PR c++/115239
gcc/cp/ChangeLog:
* call.cc (tourney): Don't consider a non-strictly viable
candidate as the champ if there was ambiguity between two
strictly viable candidates.
This patch optimizes the compilation performance of std::is_pointer
by dispatching to the new __is_pointer built-in trait.
libstdc++-v3/ChangeLog:
* include/bits/cpp_type_traits.h (__is_pointer): Use
__is_pointer built-in trait.
* include/std/type_traits (is_pointer): Likewise. Optimize its
implementation.
(is_pointer_v): Likewise.
Co-authored-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org> Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Steve Baird [Fri, 3 May 2024 23:50:00 +0000 (16:50 -0700)]
ada: Compiler goes into loop
In some cases that are difficult to characterize, the compiler fails an
assertion check (if the compiler is built with assertions enabled) or
loops forever (if assertions are not enabled). One way this can happen is if
Exp_Util.Insert_Actions is called with an N_Itype_Reference node as its first
parameter. This, in turn, can happen when an instance of
Exp_Attr.Expand_N_Attribute_Reference.Built_And_Insert_Type_Attr_Subp
calls Insert_Action (which will call Insert_Actions).
gcc/ada/
* exp_util.adb
(Insert_Actions): Code was relying on an incorrect assumption that an
N_Itype_Reference cannot occur in declaration list or a statement
list. Fix the code to handle this case.
Using -gnatdJ with various other switches was error prone.
Remove this switch since the primary users of this mode
GNATCheck and Codepeer no longer need it.
gcc/ada/
* debug.adb: Remove mentions of -gnatdJ.
* errout.adb: Remove printing subprogram names to JSON.
* erroutc.adb: Remove printing subprogram names in messages.
* erroutc.ads: Remove Node and Subprogram_Name_Ptr used for -gnatdJ.
* errutil.adb: Remove Node used for -gnatdJ
* gnat1drv.adb: Remove references of -gnatdJ and
Include_Subprgram_In_Messages.
* opt.ads: Remove Include_Subprgram_In_Messages
* par-util.adb: Remove behavior related to
Include_Subprgram_In_Messages.
* sem_util.adb: Remove Subprogram_Name used for -gnatdJ
Eric Botcazou [Fri, 3 May 2024 23:30:40 +0000 (01:30 +0200)]
ada: Fix segmentation fault on slice of array with Unbounded_String component
This fixes a regression introduced by the overhaul of the implementation
of finalization. When the first subtype of an array type is declared as
constrained, the Finalize_Address primitive of the base type synthesized
by the compiler is tailored to this first subtype, which means that this
primitive cannot be used for other subtypes of the array type, which may
for example be generated when an aggregate is assigned to a slice of an
object of the first subtype.
The straightforward solution would be to synthesize the Finalize_Address
primitive for the base type instead, but its clean implementation would
require changing the way allocators are implemented to always allocate
the bounds alongside the data, which may turn out to be delicate.
This instead changes the compiler to synthesize a local Finalize_Address
primitive in the problematic cases, which should be rare in practice, and
also contains a fixlet for Find_Last_Init, which fails to get to the base
type again in the indirect case and, therefore, mishandles array subtypes.
gcc/ada/
* exp_ch7.adb (Attach_Object_To_Master_Node): Fix formatting.
(Build_Finalizer.Process_Object_Declaration): Synthesize a local
Finalize_Address primitive if the object's subtype is an array
that has a constrained first subtype and is not this first subtype.
* exp_util.adb (Find_Last_Init): Get again to the base type in the
indirect case.
Yannick Moy [Fri, 3 May 2024 13:02:39 +0000 (15:02 +0200)]
ada: Fix test for giving hint on ambiguous aggregate
In the case the type of an aggregate cannot be determined due to
an ambiguity, caused by the existence of container aggregates,
a hint can be given by GNAT. The test for giving this hint should
be the Ada language version, not the fact that extensions are allowed.
Now fixed.
Javier Miranda [Fri, 3 May 2024 17:52:20 +0000 (17:52 +0000)]
ada: Missing postcondition runtime check in inherited primitive
When a derived tagged type implements more interface interface types
than its parent type, and a primitive inherited from its parent type
covers a primitive of these additional interface types that has
classwide postconditions, the code generated by the compiler does not
check the classwide postconditions inherited from the interface primitive.
gcc/ada/
* freeze.ads (Check_Condition_Entities): Complete documentation.
* freeze.adb (Check_Inherited_Conditions): Extend its functionality to
build two kind of wrappers: the existing LSP wrappers, and wrappers
required to handle postconditions of interface primitives implemented
by inherited primitives.
(Build_Inherited_Condition_Pragmas): Rename formal.
(Freeze_Record_Type): For derived tagged types, move call to
Check_Inherited_Conditions to subprogram Freeze_Entity_Checks;
done to improve the performance of Check_Inherited_Conditions since it
can rely on the internal entities that link interface primitives with
tagged type primitives that implement them.
(Check_Interface_Primitives_Strub_Mode): New subprogram.
* sem_ch13.adb (Freeze_Entity_Checks): Call Check_Inherited_Conditions.
Call Check_Inherited_Conditions with derived interface types to check
strub mode compatibility of their primitives.
* sem_disp.adb (Check_Dispatching_Operation): Adjust assertion to accept
wrappers of interface primitives that have classwide postconditions.
* exp_disp.adb (Write_DT): Adding text to identify wrappers.
Viljar Indus [Thu, 2 May 2024 18:04:28 +0000 (21:04 +0300)]
ada: Revert changing a GNATProve mode message to a non-warning
GNATProve compiles the program multiple times. During the
first run the warnings are suppressed. These messages need
to be suppressed during that run in order to avoid having
them duplicated in the following runs. Revert the previous
changes as there currently is not a way to simply suppress
info messages.
gcc/ada/
* sem_res.adb (Resolve_Call): add warning insertion
character into the info message.
Steve Baird [Tue, 30 Apr 2024 20:44:25 +0000 (13:44 -0700)]
ada: Deep copy of an expression sometimes fails to copy entities
An entity can be defined within an expression (the best example is probably a
declare expression, but a quantified expression is another; there are others).
When making a deep copy of an expression, the Entity nodes for such entities
were sometimes not copied, apparently for performance reasons. This caused
correctness problems in some cases, so do not perform that "optimization".
gcc/ada/
* sem_util.adb
(New_Copy_Tree.Visit_Entity): Delete code that prevented copying some entities.
Bob Duff [Mon, 29 Apr 2024 12:35:43 +0000 (08:35 -0400)]
ada: Minor cleanups in generic formal matching
Minor rewording of a warning.
Disallow positional notation for <> (but disable this check),
and fix resulting errors.
Copy use clauses.
gcc/ada/
* sem_ch12.adb (Check_Fixed_Point_Actual): Minor rewording; it seems
more proper to say "operator" rather than "operation".
(Matching_Actual): Give an error for <> in positional notation.
This is a syntax error. Disable this for now.
(Analyze_Associations): Copy the use clause in all cases.
The "mustn't recopy" comment seems wrong, because New_Copy_Tree
preserves Slocs.
* libgnat/a-ticoau.ads: Fix violation of new postion-box error.
* libgnat/a-wtcoau.ads: Likewise.
* libgnat/a-ztcoau.ads: Likewise.
ada: Remove message about goto rewritten as a loop
This message provides only inner details of how the compiler
handles this kind of construct and does not provide meaningful
information that the user can interact on.
gcc/ada/
* par-labl.adb (Rewrite_As_Loop): Remove info message
ada: Remove warning insertion characters from info messages
Remove warning insertion characters without switch characters
from info messages.
gcc/ada/
* par-ch7.adb: Remove warning characters from info message
* par-endh.adb: Remove warning characters from info message
* sem_res.adb: Remove warning characters from info message
The info message about the freeze point should be considered
a continuation of the error message about the change of visibility
after the freeze point. This improves the error layout for formatted
error messages with the -gnatdF switch.
gcc/ada/
* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): change the
info message to a continuation message.
Add entities of kind E_Subprogram_Body to the list of entities associated
to a given scope. This ensures that representation information is
correctly output for object and type declarations inside these subprogram
bodies. This is useful for outputing that information fron the compiler
with the switch -gnatR, as well as for getting precise representation
information inside GNATprove.
Remove ad-hoc code inside repinfo.adb that retrieved this information
in only some cases.
gcc/ada/
* exp_ch5.adb (Expand_Iterator_Loop_Over_Container): Skip entities
of kind E_Subprogram_Body.
* repinfo.adb (List_Entities): Remove special case for subprogram
bodies.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): List subprogram
body entities in the enclosing scope.
Javier Miranda [Fri, 26 Apr 2024 18:22:19 +0000 (18:22 +0000)]
ada: Interfaces order disables class-wide prefix notation calls
When the first formal parameter of a subprogram is a class-wide
interface type (or an access to a class-wide interface type),
changing the order of the interface types implemented by a
type declaration T enables or disables the ability to use the
prefix notation to call it with objects of type T. When the
call is disabled the compiler rejects it reporting an error.
gcc/ada/
* sem_ch4.adb (Traverse_Interfaces): Add missing support
for climbing to parents of interface types.
Steve Baird [Fri, 19 Apr 2024 23:08:39 +0000 (16:08 -0700)]
ada: Fix Super attribute documentation
The GNAT-defined Super attribute was formerly disallowed for an object of a
derived tagged type having an abstract parent type. This rule has been relaxed;
an abstract parent type is now permitted as long as it is not an interface type.
Update the GNAT RM accordingly.
System.Tasking.Protected_Objects.Lock can raise exceptions, but that
wasn't taken into account by the expansion of protected subprogram
bodies before this patch. More precisely, there were cases where
calls to System.Tasking.Initialization.Abort_Undefer were
incorrectly omitted. This patch fixes this.
gcc/ada/
* exp_ch7.adb (Build_Cleanup_Statements): Adapt to changes
made to Build_Protected_Subprogram_Call_Cleanup.
* exp_ch9.adb (Make_Unlock_Statement, Wrap_Unprotected_Call):
New functions.
(Build_Protected_Subprogram_Body): Fix resource management in
generated code.
(Build_Protected_Subprogram_Call_Cleanup): Make use of newly
introduced Make_Unlock_Statement.
Piotr Trojanek [Tue, 9 Apr 2024 12:39:25 +0000 (14:39 +0200)]
ada: Check global mode restriction on encapsulating abstract states
We already checked that a global item of mode Output is not an Input of
the enclosing subprograms. With this change we also check that if this
global item is a constituent, then none of its encapsulating abstract
states is an Input of the enclosing subprograms.
gcc/ada/
* sem_prag.adb (Check_Mode_Restriction_In_Enclosing_Context):
Iterate over encapsulating abstract states.
Eric Botcazou [Wed, 24 Apr 2024 15:13:34 +0000 (17:13 +0200)]
ada: Streamline elaboration of local tagged types
This set of changes is aimed at streamlining the code generated for the
elaboration of local tagged types. The dispatch tables and other related
data structures are built dynamically on the stack for them and a few of
the patterns used for this turn out to be problematic for the optimizer:
1. the array of primitives in the dispatch table is default-initialized to
null values by calling the initialization routine of an unconstrained
array type, and then immediately assigned an aggregate made up of the
same null values.
2. the external tag is initialized by means of a dynamic concatenation
involving the secondary stack, but all the elements have a fixed size.
3. the _size primitive is saved in the TSD by means of the dereference of
the address of the TSD that was previously saved in the dispatch table.
gcc/ada/
* Makefile.rtl (GNATRTL_NONTASKING_OBJS): Add s-imad32$(objext),
s-imad64$(objext) and s-imagea$(objext).
* exp_atag.ads (Build_Set_Size_Function): Replace Tag_Node parameter
with Typ parameter.
* exp_atag.adb: Add clauses for Sinfo.Utils.
(Build_Set_Size_Function): Retrieve the TSD object statically.
* exp_disp.adb: Add clauses for Ttypes.
(Make_DT): Call Address_Image{32,64] instead of Address_Image.
(Register_Primitive): Pass Tag_Typ to Build_Set_Size_Function.
* rtsfind.ads (RTU_Id): Remove System_Address_Image and add
System_Img_Address_{32;64}.
(RE_Id): Remove entry for RE_Address_Image and add entries for
RE_Address_Image{32,64}.
* rtsfind.adb (System_Descendant): Adjust to above changes.
* libgnat/a-tags.ads (Address_Array): Suppress initialization.
* libgnat/s-addima.adb (System.Address_Image): Call the appropriate
routine based on the address size.
* libgnat/s-imad32.ads: New file.
* libgnat/s-imad64.ads: Likewise.
* libgnat/s-imagea.ads: Likewise.
* libgnat/s-imagea.adb: Likewise.
* gcc-interface/Make-lang.in (GNAT_ADA_OBJS) [$(STAGE1)=False]: Add
ada/libgnat/s-imad32.o and ada/libgnat/s-imad64.o.
ada: Do not inline subprogram which could cause SPARK violation
Inlining in GNATprove a subprogram containing a constant declaration with
an address clause/aspect might lead to a spurious error if the address
expression is based on a constant view of a mutable object at call site.
Do not allow such inlining in GNATprove.
gcc/ada/
* inline.adb (Can_Be_Inlined_In_GNATprove_Mode): Do not inline
when constant with address clause is found.
Steve Baird [Wed, 24 Apr 2024 02:10:34 +0000 (19:10 -0700)]
ada: Reject too-strict alignment specifications.
In some cases the compiler incorrectly concludes that a package body is
required for a package specification that includes the implicit declaration
of one or more inherited subprograms for an explicitly declared derived type.
Spurious error messages (e.g., "cannot generate code for file") may result.
gcc/ada/
* sem_ch7.adb
(Requires_Completion_In_Body): Modify the Comes_From_Source test so that
the implicit declaration of an inherited subprogram does not cause
an incorrect result of True.
Eric Botcazou [Tue, 23 Apr 2024 17:54:32 +0000 (19:54 +0200)]
ada: Fix fallout of previous finalization change
Now that Is_Finalizable_Transient only looks at the renamings coming from
nontransient objects serviced by transient scopes, it must find the object
ultimately renamed by them through a chain of renamings.
gcc/ada/
PR ada/114710
* exp_util.adb (Find_Renamed_Object): Recurse if the renamed object
is itself a renaming.
Javier Miranda [Tue, 23 Apr 2024 17:30:23 +0000 (17:30 +0000)]
ada: Missing support for 'Old with overloaded function
The compiler reports an error when the prefix of 'Old is
a call to an overloaded function that has no parameters.
gcc/ada/
* sem_attr.adb (Analyze_Attribute): Enhance support for
using 'Old with a prefix that references an overloaded
function that has no parameters; add missing support
for the use of 'Old within qualified expressions.
* sem_util.ads (Preanalyze_And_Resolve_Without_Errors):
New subprogram.
* sem_util.adb (Preanalyze_And_Resolve_Without_Errors):
New subprogram.
Eric Botcazou [Mon, 22 Apr 2024 14:52:14 +0000 (16:52 +0200)]
ada: Add support for symbolic backtraces with DLLs on Windows
This puts Windows on par with Linux as far as backtraces are concerned.
gcc/ada/
* libgnat/s-tsmona__linux.adb (Get): Move down descriptive comment.
* libgnat/s-tsmona__mingw.adb: Add with clause and use clause for
System.Storage_Elements.
(Get): Pass GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT in the call
to GetModuleHandleEx and remove the subsequent call to FreeLibrary.
Upon success, set Load_Addr to the base address of the module.
* libgnat/s-win32.ads (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS): Use
shorter literal.
(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT): New constant.
Eric Botcazou [Mon, 22 Apr 2024 07:35:44 +0000 (09:35 +0200)]
ada: Fix too late finalization of temporary object
The problem is that Is_Finalizable_Transient returns false when a transient
object is subject to a renaming by another transient object present in the
same transient scope, thus forcing its finalization to be deferred to the
enclosing scope. That's not necessary, as only renamings by nontransient
objects serviced by transient scopes need to be rejected by the predicate.
The change also removes now dead code in the finalization machinery.
gcc/ada/
PR ada/114710
* exp_ch7.adb (Build_Finalizer.Process_Declarations): Remove dead
code dealing with renamings.
* exp_util.ads (Is_Finalizable_Transient): Rename Rel_Node to N.
* exp_util.adb (Is_Finalizable_Transient): Likewise.
(Is_Aliased): Remove obsolete code dealing wih EWA nodes and only
consider renamings present in N itself.
(Requires_Cleanup_Actions): Remove dead code dealing with renamings.
Javier Miranda [Mon, 22 Apr 2024 16:36:58 +0000 (16:36 +0000)]
ada: Missing dynamic predicate checks
The compiler does not generate dynamic predicate checks when
they are enabled for one type declaration and ignored for
other type declarations defined in the same scope.
gcc/ada/
* sem_ch13.adb (Analyze_One_Aspect): Set the applicable policy
of a type declaration when its aspect Dynamic_Predicate is
analyzed.
* sem_prag.adb (Handle_Dynamic_Predicate_Check): New subprogram
that enables or ignores dynamic predicate checks depending on
whether dynamic checks are enabled in the context where the
associated type declaration is defined; used in the analysis
of pragma check. In addition, for pragma Predicate, do not
disable it when the aspect was internally build as part of
processing a dynamic predicate aspect.
Jonathan Wakely [Tue, 11 Jun 2024 10:08:12 +0000 (11:08 +0100)]
libstdc++: Improve diagnostics for invalid std::hash specializations [PR115420]
When using a key type without a valid std::hash specialization the
unordered containers give confusing diagnostics about the default
constructor being deleted. Add a static_assert that will fail for
disabled std::hash specializations (and for a subset of custom hash
functions).
libstdc++-v3/ChangeLog:
PR libstdc++/115420
* include/bits/hashtable.h (_Hashtable): Add static_assert to
check that hash function is copy constructible.
* testsuite/23_containers/unordered_map/115420.cc: New test.
Jonathan Wakely [Wed, 12 Jun 2024 15:47:17 +0000 (16:47 +0100)]
libstdc++: Fix unwanted #pragma messages from PSTL headers [PR113376]
When we rebased the PSTL on upstream, in r14-2109-g3162ca09dbdc2e, a
change to how _PSTL_USAGE_WARNINGS is set was missed out, but the change
to how it's tested was included. This means that the macro is always
defined, so testing it with #ifdef (instead of using #if to test its
value) doesn't work as intended.
Revert the test to use #if again, since that part of the upstream change
was unnecessary in the first place (the macro is always defined, so
there's no need to use #ifdef to avoid -Wundef warnings).
libstdc++-v3/ChangeLog:
PR libstdc++/113376
* include/pstl/pstl_config.h: Use #if instead of #ifdef to test
the _PSTL_USAGE_WARNINGS macro.
This patch optimizes the compilation performance of
std::is_nothrow_invocable by dispatching to the new
__is_nothrow_invocable built-in trait.
libstdc++-v3/ChangeLog:
* include/std/type_traits (is_nothrow_invocable): Use
__is_nothrow_invocable built-in trait.
* testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc:
Handle the new error from __is_nothrow_invocable.
* testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc:
Likewise.
Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org> Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
This patch optimizes the compilation performance of std::is_invocable
by dispatching to the new __is_invocable built-in trait.
libstdc++-v3/ChangeLog:
* include/std/type_traits (is_invocable): Use __is_invocable
built-in trait.
* testsuite/20_util/is_invocable/incomplete_args_neg.cc: Handle
the new error from __is_invocable.
* testsuite/20_util/is_invocable/incomplete_neg.cc: Likewise.
Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org> Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
The testcase extracts one arm_neon.h vector from a pair (one subreg)
and then reinterprets the result as an SVE vector (another subreg).
Each subreg makes sense individually, but we can't fold them together
into a single subreg: it's 32 bytes -> 16 bytes -> 16*N bytes,
but the interpretation of 32 bytes -> 16*N bytes depends on
whether N==1 or N>1.
Since the second subreg makes sense individually, simplify_subreg
should bail out rather than ICE on it. simplify_gen_subreg will
then do the same (because it already checks validate_subreg).
This leaves simplify_gen_subreg returning null, requiring the
caller to take appropriate action.
I think this is relatively likely to occur elsewhere, so the patch
adds a helper for forcing a subreg, allowing a temporary pseudo to
be created where necessary.
I'll follow up by using force_subreg in more places. This patch
is intended to be a minimal backportable fix for the PR.
gcc/
PR target/115464
* simplify-rtx.cc (simplify_context::simplify_subreg): Don't try
to fold two subregs together if their relationship isn't known
at compile time.
* explow.h (force_subreg): Declare.
* explow.cc (force_subreg): New function.
* config/aarch64/aarch64-sve-builtins-base.cc
(svset_neonq_impl::expand): Use it instead of simplify_gen_subreg.
gcc/testsuite/
PR target/115464
* gcc.target/aarch64/sve/acle/general/pr115464.c: New test.
We have vec_extract pattern which takes ZVFHMIN as the mode
iterator of the VLS mode. Aka V_VLS. But it will expand to
pred_extract_first pattern which takes the ZVFH as the mode
iterator of the VLS mode. AKa V_VLSF. The mismatch will
result in one ICE similar as below:
This patch would like to fix this issue by align the mode
iterator restriction to ZVFH.
The below test suites are passed for this patch.
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc.
PR target/115456
gcc/ChangeLog:
* config/riscv/autovec.md: Take ZVFH mode iterator instead of
the ZVFHMIN for the alignment.
* config/riscv/vector-iterators.md: Add 2 new iterator
V_VLS_ZVFH and VLS_ZVFH.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr115456-1.c: New test.
Hongyu Wang [Thu, 9 May 2024 02:12:16 +0000 (10:12 +0800)]
[APX CCMP] Use ctestcc when comparing to const 0
For CTEST, we don't have conditional AND so there's no optimization
opportunity to write a new ctest pattern. Emit ctest when ccmp did
comparison to const 0 to save bytes.
gcc/ChangeLog:
* config/i386/i386.md (@ccmp<mode>): Add new alternative
<r>,C and adjust output templates. Also adjust UNSPEC mode
to CCmode.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
* gcc.target/i386/apx-ccmp-2.c: Adjust some condition to
compare with 0.
Gerald Pfeifer [Thu, 13 Jun 2024 07:49:29 +0000 (09:49 +0200)]
doc: Streamline requirements on the build compiler
No need to talk about potential implementation bugs in older versions
than what we require. And no need to talk about building GCC 3.3 and
earlier at this point.
gcc:
PR other/69374
* doc/install.texi (Prerequisites): Simplify note on the C++
compiler required. Drop requirements for versions of GCC prior
to 3.4. Fix grammar.
Richard Biener [Mon, 10 Jun 2024 13:31:35 +0000 (15:31 +0200)]
Improve code generation of strided SLP loads
This avoids falling back to elementwise accesses for strided SLP
loads when the group size is not a multiple of the vector element
size. Instead we can use a smaller vector or integer type for the load.
For stores we can do the same though restrictions on stores we handle
and the fact that store-merging covers up makes this mostly effective
for cost modeling which shows for gcc.target/i386/vect-strided-3.c
which we now vectorize with V4SI vectors rather than just V2SI ones.
For all of this there's still the opportunity to use non-uniform
accesses, say for a 6-element group with a VF of two do
V4SI, { V2SI, V2SI }, V4SI. But that's for a possible followup.
* tree-vect-stmts.cc (get_group_load_store_type): Consistently
use VMAT_STRIDED_SLP for strided SLP accesses and not
VMAT_ELEMENTWISE.
(vectorizable_store): Adjust VMAT_STRIDED_SLP handling to
allow not only half-size but also smaller accesses.
(vectorizable_load): Likewise.
Richard Biener [Fri, 7 Jun 2024 12:47:12 +0000 (14:47 +0200)]
tree-optimization/115385 - handle more gaps with peeling of a single iteration
The following makes peeling of a single scalar iteration handle more
gaps, including non-power-of-two cases. This can be done by rounding
up the remaining access to the next power-of-two which ensures that
the next scalar iteration will pick at least the number of excess
elements we access.
I've added a correctness testcase and one x86 specific scanning for
the optimization.
PR tree-optimization/115385
* tree-vect-stmts.cc (get_group_load_store_type): Peeling
of a single scalar iteration is sufficient if we can narrow
the access to the next power of two of the bits in the last
access.
(vectorizable_load): Ensure that the last access is narrowed.
* gcc.dg/vect/pr115385.c: New testcase.
* gcc.target/i386/vect-pr115385.c: Likewise.
Richard Biener [Fri, 7 Jun 2024 09:29:05 +0000 (11:29 +0200)]
tree-optimization/114107 - avoid peeling for gaps in more cases
The following refactors the code to detect necessary peeling for
gaps, in particular the PR103116 case when there is no gap but
the group size is smaller than the vector size. The testcase in
PR114107 shows we fail to SLP
for (int i=0; i<n; i++)
for (int k=0; k<4; k++)
data[4*i+k] *= factor[i];
because peeling one scalar iteration isn't enough to cover a gap
of 3 elements of factor[i]. But the code detecting this is placed
after the logic that detects cases we handle properly already as
we'd code generate { factor[i], 0., 0., 0. } for V4DFmode vectorization
already. In fact the check to detect when peeling a single iteration
isn't enough seems improperly guarded as it should apply to all cases.
I'm not sure we correctly handle VMAT_CONTIGUOUS_REVERSE but I
checked that VMAT_STRIDED_SLP and VMAT_ELEMENTWISE correctly avoid
touching excess elements.
With this change we can use SLP for the above testcase and the
PR103116 testcases no longer require an epilogue on x86-64. It
might be different on other targets so I made those testcases
runtime FAIL only instead of relying on dump scanning there's
currently no easy way to properly constrain.
PR tree-optimization/114107
PR tree-optimization/110445
* tree-vect-stmts.cc (get_group_load_store_type): Refactor
contiguous access case. Make sure peeling for gap constraints
are always tested and consistently relax when we know we can
avoid touching excess elements during code generation. But
rewrite the check poly-int aware.
* gcc.dg/vect/pr114107.c: New testcase.
* gcc.dg/vect/pr103116-1.c: Adjust.
* gcc.dg/vect/pr103116-2.c: Likewise.
Peter Bergner [Thu, 13 Jun 2024 02:05:34 +0000 (21:05 -0500)]
rs6000: Fix pr66144-3.c test to accept multiple equivalent insns. [PR115262]
Jeff's commit r15-831-g05daf617ea22e1 changed the instruction we expected
for this test case into an equivalent instruction. Modify the test case
so it will accept any of three instructions we could get depending on the
options used.
2024-06-12 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR testsuite/115262
* gcc.target/powerpc/pr66144-3.c (dg-do): Compile for all targets.
(dg-options): Add -fno-unroll-loops and remove -mvsx.
(scan-assembler): Change from this...
(scan-assembler-times): ...to this. Tweak regex to accept multiple
allowable instructions.
YunQiang Su [Mon, 10 Jun 2024 06:31:12 +0000 (14:31 +0800)]
MIPS: Use FPU-enabled tune for mips32/mips64/mips64r2/mips64r3/mips64r5
Currently, the default tune value of mips32 is PROCESSOR_4KC, and
the default tune value of mips64/mips64r2/mips64r3/mips64r5 is
PROCESSOR_5KC. PROCESSOR_4KC and PROCESSOR_5KC are both FPU-less.
Let's use PROCESSOR_24KF1_1 for mips32, and PROCESSOR_5KF for mips64/
mips64r2/mips64r3/mips64r5.
We find this problem when we try to fix gcc.target/mips/movcc-3.c.
gcc:
* config/mips/mips-cpus.def: Use PROCESSOR_24KF1_1 for mips32;
Use PROCESSOR_5KF for mips64/mips64r2/mips64r3/mips64r5.
YunQiang Su [Sat, 8 Jun 2024 03:31:19 +0000 (11:31 +0800)]
MIPS: Use signaling fcmp instructions for LT/LE/LTGT
LT/LE: c.lt.fmt/c.le.fmt on pre-R6 and cmp.lt.fmt/cmp.le.fmt have
different semantic:
c.lt.fmt will signal for all NaN, including qNaN;
cmp.lt.fmt will only signal sNaN, while not qNaN;
cmp.slt.fmt has the same semantic as c.lt.fmt;
lt/le of RTL will signaling qNaN.
while in `s<code>_<SCALARF:mode>_using_<FPCC:mode>`, RTL operation
`lt`/`le` are convert to c/cmp's lt/le, which is correct for C.cond.fmt,
while not for CMP.cond.fmt. Let's convert them to slt/sle if ISA_HAS_CCF.
For LTGT, which signals qNaN, `sne` of r6 has same semantic, while pre-R6
has only inverse one `ngl`. Thus for RTL we have to use the `uneq` as the
operator, and introduce a new CC mode: CCEmode to mark it as signaling.
This patch can fix
gcc.dg/torture/pr91323.c for pre-R6;
gcc.dg/torture/builtin-iseqsig-* for R6.
gcc:
* config/mips/mips-modes.def: New CC_MODE CCE.
* config/mips/mips-protos.h(mips_output_compare): New function.
* config/mips/mips.cc(mips_allocate_fcc): Set CCEmode count=1.
(mips_emit_compare): Use CCEmode for LTGT/LT/LE for pre-R6.
(mips_output_compare): New function. Convert lt/le to slt/sle
for R6; convert ueq to ngl for CCEmode.
(mips_hard_regno_mode_ok_uncached): Mention CCEmode.
* config/mips/mips.h: Mention CCEmode for LOAD_EXTEND_OP.
* config/mips/mips.md(FPCC): Add CCE.
(define_mode_iterator MOVECC): Mention CCE.
(define_mode_attr reg): Add CCE with "z".
(define_mode_attr fpcmp): Add CCE with "c".
(define_code_attr fcond): ltgt should use sne instead of ne.
(s<code>_<SCALARF:mode>_using_<FPCC:mode>): call mips_output_compare.
Patrick Palka [Thu, 13 Jun 2024 00:05:05 +0000 (20:05 -0400)]
c++: visibility wrt concept-id as targ [PR115283]
Like with alias templates, it seems we don't maintain visibility flags
for concepts either, so min_vis_expr_r should ignore them for now.
Otherwise after r14-6789 we may incorrectly give a function template that
uses a concept-id in its signature internal linkage.
Alexandre Oliva [Wed, 12 Jun 2024 22:48:06 +0000 (19:48 -0300)]
[libstdc++] [testsuite] require cmath for c++23 cmath tests
Some c++23 tests fail on targets that don't satisfy dg-require-cmath,
because referenced math functions don't get declared in std. Add the
missing requirement.
Alexandre Oliva [Wed, 12 Jun 2024 22:48:04 +0000 (19:48 -0300)]
[libstdc++] [testsuite] xfail double-prec from_chars for float128_t
Tests involving float128_t were xfailed or otherwise worked around for
vxworks on aarch64. The same issue came up on rtems. This patch
adjusts them similarly.
for libstdc++-v3/ChangeLog
* testsuite/20_util/from_chars/8.cc: Skip float128_t testing
on aarch64-rtems*.
* testsuite/20_util/to_chars/float128_c++23.cc: Xfail run on
aarch64-rtems*.
Jason Merrill [Wed, 12 Jun 2024 12:06:47 +0000 (08:06 -0400)]
c++: repeated export using
A sample implementation of module std was breaking because the exports
included 'using std::operator&' twice. Since Nathaniel's r15-964 for
PR114867, the first using added an extra instance of each function that was
revealed/exported by that using, resulting in duplicates for
lookup_maybe_add to dedup. But if the duplicate is the first thing in the
list, lookup_add doesn't make an OVERLOAD, so trying to set OVL_USING_P
crashes. Fixed by using ovl_make in the case where we want to set the flag.
gcc/cp/ChangeLog:
* tree.cc (lookup_maybe_add): Use ovl_make when setting OVL_USING_P.
Jason Merrill [Wed, 12 Jun 2024 04:13:45 +0000 (00:13 -0400)]
c++: module std and exception_ptr
exception_ptr.h contains
namespace __exception_ptr
{
class exception_ptr;
}
using __exception_ptr::exception_ptr;
so when module std tries to 'export using std::exception_ptr', it names
another using-directive rather than the class directly, so __exception_ptr
is never explicitly opened in module purview.
gcc/cp/ChangeLog:
* module.cc (depset::hash::add_binding_entity): Set
DECL_MODULE_PURVIEW_P instead of asserting.
David Malcolm [Wed, 12 Jun 2024 18:24:47 +0000 (14:24 -0400)]
pretty_printer: unbreak build on aarch64 [PR115465]
I missed this target-specific usage of pretty_printer::buffer when
making the fields private in r15-1209-gc5e3be456888aa; sorry.
gcc/ChangeLog:
PR bootstrap/115465
* config/aarch64/aarch64-early-ra.cc (early_ra::process_block):
Update for fields of pretty_printer becoming private in r15-1209-gc5e3be456888aa.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Patrick O'Neill [Tue, 11 Jun 2024 00:00:38 +0000 (17:00 -0700)]
RISC-V: Allow any temp register to be used in amo tests
We artifically restrict the temp registers to be a[0-9]+ when other
registers like t[0-9]+ are valid too. Update to make the regex
accept any register for the temp value.
Andrew Pinski [Tue, 11 Jun 2024 20:36:34 +0000 (20:36 +0000)]
aarch64: Use bitreverse rtl code instead of unspec [PR115176]
Bitreverse rtl code was added with r14-1586-g6160572f8d243c. So let's
use it instead of an unspec. This is just a small cleanup but it does
have one small fix with respect to rtx costs which didn't handle vector modes
correctly for the UNSPEC and now it does.
This is part of the first step in adding __builtin_bitreverse's builtins
but it is independent of it though.
Bootstrapped and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
PR target/115176
* config/aarch64/aarch64-simd.md (aarch64_rbit<mode><vczle><vczbe>): Use
bitreverse instead of unspec.
* config/aarch64/aarch64-sve-builtins-base.cc (svrbit): Convert over to using
rtx_code_function instead of unspec_based_function.
* config/aarch64/aarch64-sve.md: Update comment where RBIT is included.
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle BITREVERSE like BSWAP.
Remove UNSPEC_RBIT support.
* config/aarch64/aarch64.md (unspec): Remove UNSPEC_RBIT.
(aarch64_rbit<mode>): Use bitreverse instead of unspec.
* config/aarch64/iterators.md (SVE_INT_UNARY): Add bitreverse.
(optab): Likewise.
(sve_int_op): Likewise.
(SVE_INT_UNARY): Remove UNSPEC_RBIT.
(optab): Likewise.
(sve_int_op): Likewise.
(min_elem_bits): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Wed, 12 Jun 2024 00:16:42 +0000 (17:16 -0700)]
match: Improve gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p for truncating casts [PR115449]
As mentioned by Jeff in r15-831-g05daf617ea22e1d818295ed2d037456937e23530, we don't handle
`(X | Y) & ~Y` -> `X & ~Y` on the gimple level when there are some different signed
(but same precision) types dealing with matching `~Y` with the `Y` part. This
improves both gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p to
be able to say `(truncate)a` and `(truncate)a` are bitwise_equal and
that `~(truncate)a` and `(truncate)a` are bitwise_invert_equal.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/115449
gcc/ChangeLog:
* gimple-match-head.cc (gimple_maybe_truncate): New declaration.
(gimple_bitwise_equal_p): Match truncations that differ only
in types with the same precision.
(gimple_bitwise_inverted_equal_p): For matching after bit_not_with_nop
call gimple_bitwise_equal_p.
* match.pd (maybe_truncate): New match pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bitops-10.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andi Kleen [Wed, 12 Jun 2024 13:59:37 +0000 (06:59 -0700)]
Move cexpr_stree tree string build into utility function
No semantics changes.
gcc/cp/ChangeLog:
* cp-tree.h (extract): Add new overload to return tree.
* parser.cc (cp_parser_asm_string_expression): Use tree extract.
* semantics.cc (cexpr_str::extract): Add new overload to return
tree.
The shift operations for dynamic_bitset fail to zero out words where the
non-zero bits were shifted to a completely different word.
For a right shift we don't need to sanitize the unused bits in the high
word, because we know they were already clear and a right shift doesn't
change that.
libstdc++-v3/ChangeLog:
PR libstdc++/115399
* include/tr2/dynamic_bitset (operator>>=): Remove redundant
call to _M_do_sanitize.
* include/tr2/dynamic_bitset.tcc (_M_do_left_shift): Zero out
low bits in words that should no longer be populated.
(_M_do_right_shift): Likewise for high bits.
* testsuite/tr2/dynamic_bitset/pr115399.cc: New test.
middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]
At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'. A simple example of such
loop is given below:
void foo(double * restrict a, double * restrict b, int n){
int i;
for(i=0; i<n; ++i){
a[i] = a[i] + b[i];
__builtin_prefetch(&(b[i+8]));
}
}
The failure stems from two issues:
1. Given that it is typically not possible to fully reason about a
function call due to the possibility of side effects, the
autovectorizer does not attempt to vectorize loops which make such
calls.
Given the memory reference passed to `__builtin_prefetch', in the
absence of assurances about its effect on the passed memory
location the compiler deems the function unsafe to vectorize,
marking it as clobbering memory in `vect_find_stmt_data_reference'.
This leads to the failure in autovectorization.
2. Notwithstanding the above issue, though the prefetch statement
would be classed as `vect_unused_in_scope', the loop invariant that
is used in the address of the prefetch is the scalar loop's and not
the vector loop's IV. That is, it still uses `i' and not `vec_iv'
because the instruction wasn't vectorized, causing DCE to think the
value is live, such that we now have both the vector and scalar loop
invariant actively used in the loop.
This patch addresses both of these:
1. About the issue regarding the memory clobber, data prefetch does
not generate faults if its address argument is invalid and does not
write to memory. Therefore, it does not alter the internal state
of the program or its control flow under any circumstance. As
such, it is reasonable that the function be marked as not affecting
memory contents.
To achieve this, we add the necessary logic to
`get_references_in_stmt' to ensure that builtin functions are given
given the same treatment as internal functions. If the gimple call
is to a builtin function and its function code is
`BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
2. Finding precedence in the way clobber statements are handled,
whereby the vectorizer drops these from both the scalar and
vectorized versions of a given loop, we choose to drop prefetch
hints in a similar fashion. This seems appropriate given how
software prefetch hints are typically ignored by processors across
architectures, as they seldom lead to performance gain over their
hardware counterparts.
gcc/ChangeLog:
PR tree-optimization/114061
* tree-data-ref.cc (get_references_in_stmt): set
`clobbers_memory' to false for __builtin_prefetch.
* tree-vect-loop.cc (vect_transform_loop): Drop all
__builtin_prefetch calls from loops.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-prefetch-drop.c: New test.
* gcc.target/aarch64/vect-prefetch-drop.c: Likewise.
David Malcolm [Wed, 12 Jun 2024 13:15:09 +0000 (09:15 -0400)]
pretty_printer: convert chunk_info into a class
No functional change intended.
gcc/cp/ChangeLog:
* error.cc (append_formatted_chunk): Move part of body into
chunk_info::append_formatted_chunk.
gcc/ChangeLog:
* dumpfile.cc (dump_pretty_printer::emit_items): Update for
changes to chunk_info.
* pretty-print.cc (chunk_info::append_formatted_chunk): New, based
on code in cp/error.cc's append_formatted_chunk.
(chunk_info::pop_from_output_buffer): New, based on code in
pp_output_formatted_text and dump_pretty_printer::emit_items.
(on_begin_quote): Convert to...
(chunk_info::on_begin_quote): ...this.
(on_end_quote): Convert to...
(chunk_info::on_end_quote): ...this.
(pretty_printer::format): Update for chunk_info becoming a class
and its fields gaining "m_" prefixes. Update for on_begin_quote
and on_end_quote moving to chunk_info.
(quoting_info::handle_phase_3): Update for changes to chunk_info.
(pp_output_formatted_text): Likewise. Move cleanup code to
chunk_info::pop_from_output_buffer.
* pretty-print.h (class output_buffer): New forward decl.
(class urlifier): New forward decl.
(struct chunk_info): Convert to...
(class chunk_info): ...this. Add friend class pretty_printer.
(chunk_info::get_args): New accessor.
(chunk_info::get_quoting_info): New accessor.
(chunk_info::append_formatted_chunk): New decl.
(chunk_info::pop_from_output_buffer): New decl.
(chunk_info::on_begin_quote): New decl.
(chunk_info::on_end_quote): New decl.
(chunk_info::prev): Rename to...
(chunk_info::m_prev): ...this.
(chunk_info::args): Rename to...
(chunk_info::m_args): ...this.
(output_buffer::cur_chunk_array): Drop "struct" from decl.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>