Skip -fdelete-null-pointer-check tests if target keeps_null_pointer_checks
A bunch of tests explicitly pass in -fdelete-null-pointer-checks and
fail if the target keeps null pointer checks. Skip such tests by
adding a dg-skip-if for keeps_null_pointer_checks.
Andrew Pinski [Mon, 15 May 2023 21:44:27 +0000 (21:44 +0000)]
MATCH: [PR109424] Simplify min/max of boolean arguments
This is version 2 of https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html
which does not depend on adding gimple_truth_valued_p at this point.
Instead will use zero_one_valued_p which is already used for mult simplifications
to make sure that we only have [0,1] rather having the mistake of maybe having [-1,0]
as the range for signed bools.
This shows up in a few places in GCC itself but only at -O1, we miss the min/max conversion
because of PR 107888 (which I will be testing seperately).
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Thanks,
Andrew Pinski
PR tree-optimization/109424
gcc/ChangeLog:
* match.pd: Add patterns for min/max of zero_one_valued
values to `&`/`|`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bool-12.c: New test.
* gcc.dg/tree-ssa/bool-13.c: New test.
* gcc.dg/tree-ssa/minmax-20.c: New test.
* gcc.dg/tree-ssa/minmax-21.c: New test.
Joseph Myers [Mon, 15 May 2023 23:17:48 +0000 (23:17 +0000)]
c: Ignore _Atomic on function return type for C2x
For C2x it was decided that _Atomic would be completely ignored on
function return types (just as was done for qualifiers in C11 DR#423),
to eliminate the potential for an rvalue returned by a function having
_Atomic-qualified type when an rvalue resulting from lvalue-to-rvalue
conversion could not have such a type. Implement this for GCC.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
* c-decl.cc (grokdeclarator): Ignore _Atomic on function return
type for C2x.
gcc/testsuite/
* gcc.dg/qual-return-9.c, gcc.dg/qual-return-10.c: New tests.
Joseph Myers [Mon, 15 May 2023 21:27:33 +0000 (21:27 +0000)]
c: Update __has_c_attribute values for C2x
WG14 decided that __has_c_attribute should return the same value
(equal to the intended __STDC_VERSION__ value) for all standard
attributes in C2x, with values associated with when an attribute was
added to the working draft (or had semantics added or changed in the
working draft) only being used in earlier stages of development of
that draft. The intent is that the values for existing attributes
increase in future standard versions only if there are new features /
semantic changes for those attributes. Implement this change for GCC.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c-family/
* c-lex.cc (c_common_has_attribute): Use 202311 as
__has_c_attribute return for all C2x attributes.
gcc/testsuite/
* gcc.dg/c2x-has-c-attribute-2.c: Expect 202311L return value from
__has_c_attribute for all C2x attributes.
Harald Anlauf [Sun, 14 May 2023 19:53:51 +0000 (21:53 +0200)]
Fortran: CLASS pointer function result in variable definition context [PR109846]
gcc/fortran/ChangeLog:
PR fortran/109846
* expr.cc (gfc_check_vardef_context): Check appropriate pointer
attribute for CLASS vs. non-CLASS function result in variable
definition context.
gcc/testsuite/ChangeLog:
PR fortran/109846
* gfortran.dg/ptr-func-5.f90: New test.
Aldy Hernandez [Mon, 15 May 2023 10:25:58 +0000 (12:25 +0200)]
Add auto-resizing capability to irange's [PR109695]
<tldr>
We can now have int_range<N, RESIZABLE=false> for automatically
resizable ranges. int_range_max is now int_range<3, true>
for a 69X reduction in size from current trunk, and 6.9X reduction from
GCC12. This incurs a 5% performance penalty for VRP that is more than
covered by our > 13% improvements recently.
</tldr>
int_range_max is the temporary range object we use in the ranger for
integers. With the conversion to wide_int, this structure bloated up
significantly because wide_ints are huge (80 bytes a piece) and are
about 10 times as big as a plain tree. Since the temporary object
requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
This means the structure grew from 4112 bytes to 40912 bytes.
This patch adds the ability to resize ranges as needed, defaulting to
no resizing, while int_range_max now defaults to 3 sub-ranges (instead
of 255) and grows to 255 when the range being calculated does not fit.
For example:
int_range<1> foo; // 1 sub-range with no resizing.
int_range<5> foo; // 5 sub-ranges with no resizing.
int_range<5, true> foo; // 5 sub-ranges with resizing.
I ran some tests and found that 3 sub-ranges cover 99% of cases, so
I've set the int_range_max default to that:
We don't bother growing incrementally, since the default covers most
cases and we have a 255 hard-limit. This hard limit could be reduced
to 128, since my tests never saw a range needing more than 124, but we
could do that as a follow-up if needed.
With 3-subranges, int_range_max is now 592 bytes versus 40912 for
trunk, and versus 4112 bytes for GCC12! The penalty is 5.04% for VRP
and 3.02% for threading, with no noticeable change in overall
compilation (0.27%). This is more than covered by our 13.26%
improvements for the legacy removal + wide_int conversion.
I think this approach is a good alternative, while providing us with
flexibility going forward. For example, we could try defaulting to a
8 sub-ranges for a noticeable improvement in VRP. We could also use
large sub-ranges for switch analysis to avoid resizing.
Another approach I tried was always resizing. With this, we could
drop the whole int_range<N> nonsense, and have irange just hold a
resizable range. This simplified things, but incurred a 7% penalty on
ipa_cp. This was hard to pinpoint, and I'm not entirely convinced
this wasn't some artifact of valgrind. However, until we're sure,
let's avoid massive changes, especially since IPA changes are coming
up.
For the curious, a particular hot spot for IPA in this area was:
The problem isn't the resizing (since we do that at most once) but the
fact that for some functions with lots of callers we end up a huge
range that gets copied and compared for every meet operation. Maybe
the IPA algorithm could be adjusted somehow??.
Anywhooo... for now there is nothing to worry about, since value_range
still has 2 subranges and is not resizable. But we should probably
think what if anything we want to do here, as I envision IPA using
infinite ranges here (well, int_range_max) and handling frange's, etc.
gcc/ChangeLog:
PR tree-optimization/109695
* value-range.cc (irange::operator=): Resize range.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(int_range_max): Default to 3 sub-ranges and resize as needed.
* value-range.h (irange::maybe_resize): New.
(~int_range): New.
(int_range::int_range): Adjust for resizing.
(int_range::operator=): Same.
Aldy Hernandez [Mon, 15 May 2023 13:10:11 +0000 (15:10 +0200)]
Only return changed=true in union_nonzero when appropriate.
irange::union_ was being overly pessimistic in its return value. It
was returning false when the nonzero mask was possibly the same.
The reason for this is because the nonzero mask is not entirely kept
up to date. We avoid setting it up when a new range is set (from a
set, intersect, union, etc), because calculating a mask from a range
is measurably expensive. However, irange::get_nonzero_bits() will
always return the correct mask because it will calculate the nonzero
mask inherit in the mask on the fly and bitwise or it with the saved
mask. This was an optimization because last release it was a big
penalty to keep the mask up to date. This may not necessarily be the
case with the conversion to wide_int's. We should investigate.
Just to be clear, the result from get_nonzero_bits() is always correct
as seen by the user, but the wide_int in the irange does not contain
all the information, since part of the nonzero bits can be determined
by the range itself, on the fly.
The fix here is to make sure that the result the user sees (callers of
get_nonzero_bits()) changed when unioning bits. This allows
ipcp_vr_lattice::meet_with_1 to avoid unnecessary copies when
determining if a range changed.
This patch yields an 6.89% improvement to the ipa_cp pass. I'm
including the IPA changes in this patch, as it's a testcase of sorts for
the change.
gcc/ChangeLog:
* ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Avoid unnecessary
range copying
* value-range.cc (irange::union_nonzero_bits): Return TRUE only
when range changed.
Juzhe-Zhong [Mon, 15 May 2023 14:23:45 +0000 (22:23 +0800)]
RISC-V: Add rounding mode operand for fixed-point patterns
Since we are going to have fixed-point intrinsics that are modeling
rounding mode
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
We should have operand to specify rounding mode in fixed-point instructions.
We don't support these modeling rounding mode intrinsics yet but we will
definetely support them later.
This is the preparing patch for new coming intrinsics.
Pan Li [Mon, 15 May 2023 14:05:44 +0000 (22:05 +0800)]
OPTABS: Extend the number of expanding instructions pattern
We (RVV) is going to add a rounding mode operand into floating-point
instructions which have 11 operands.
Since we are going have intrinsic that is adding rounding mode argument:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
This is the patch that is adding rounding mode operand in RISC-V port:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618573.html
You can see there are 11 operands in these patterns.
gcc/ChangeLog:
* optabs.cc (maybe_gen_insn): Add case to generate instruction
that has 11 operands.
Kyrylo Tkachov [Mon, 15 May 2023 11:05:35 +0000 (12:05 +0100)]
aarch64: Cost vector comparisons more accurately
We are missing cases for combining of FACGE/FACGT instructions. In the testcase of the patch we generate:
foo:
fabs v3.4s, v0.4s
fabs v0.4s, v1.4s
fabs v1.4s, v2.4s
fcmgt v0.4s, v3.4s, v0.4s
fcmgt v1.4s, v3.4s, v1.4s
b g
This is because combine is rejecting the pattern due to costs:
Successfully matched this instruction:
(set (reg:V4SI 106)
(neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113))
(abs:V4SF (reg:V4SF 111)))))
rejecting combination of insns 8, 9 and 10
original costs 8 + 8 + 12 = 28
replacement costs 8 + 28 = 36
It is obviously recursing in the various arms of the RTX and such.
This patch teaches the aarch64 rtx costs routine that our vector comparisons are represented as a NEG of
compare operators, with the FACGE/FAGT operations in particular having ABS on each arm. With this patch we get
the much more reasonable dump:
original costs 8 + 8 + 8 = 24
replacement costs 8 + 8 = 16
and generate the optimal assembly:
foo:
mov v31.16b, v0.16b
facgt v0.4s, v0.4s, v1.4s
facgt v1.4s, v31.4s, v2.4s
b g
Bootstrapped and tested on aarch64-none-linux-gnu.
Thomas Schwinge [Tue, 25 Apr 2023 21:53:12 +0000 (23:53 +0200)]
Support parallel testing in libgomp, part II [PR66005]
..., and enable if 'flock' is available for serializing execution testing.
Regarding the default of 19 parallel slots, this turned out to be a local
minimum for wall time when testing this on:
$ uname -srvi
Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
32 model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
... in two configurations: case (a) standard configuration, no offloading
configured, case (b) offloading for GCN and nvptx configured but no devices
available. For both cases, default plus '-m32' variant.
$ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"
$ uname -srvi
Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
$ nvidia-smi -L
GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
... in two configurations: case (c) standard configuration, no offloading
configured, case (d) offloading for nvptx configured and device available.
For both cases, only default variant, no '-m32'.
$ \time make check-target-libgomp
Case (c), baseline; roughly half of case (a) (just one variant):
Worth noting is that with nvptx offloading, there is one execution test case
that times out ('libgomp.fortran/reverse-offload-5.f90'). This effectively
stalls progress for almost 5 min: quickly other executions test cases queue up
on the lock for all parallel slots. That's working as expected; just noting
this as it accordingly does skew the wall time numbers.
Thomas Schwinge [Wed, 10 May 2023 13:01:55 +0000 (15:01 +0200)]
libgomp testsuite: As appropriate, use the 'gcc', 'g++', 'gfortran' driver [PR91884]
..., that is, 'GCC_UNDER_TEST', 'GXX_UNDER_TEST', 'GFORTRAN_UNDER_TEST' instead
of 'GCC_UNDER_TEST' for all of them. No need anymore for 'gcc -lstdc++ -x c++'
for C++ code, or 'gcc -lgfortran' plus conditional '-lquadmath' for Fortran
code. (Getting rid of explicit '-foffload=-lgfortran' is for another day.)
Sören Tempel [Sun, 14 May 2023 17:30:21 +0000 (19:30 +0200)]
fix assert in __deregister_frame_info_bases
The assertion in __deregister_frame_info_bases assumes that for every
frame something was inserted into the lookup data structure by
__register_frame_info_bases. Unfortunately, this does not necessarily
hold true as the btree_insert call in __register_frame_info_bases will
not insert anything for empty ranges. Therefore, we need to explicitly
account for such empty ranges in the assertion as `ob` will be a null
pointer for such ranges, hence causing the assertion to fail.
Yannick Moy [Mon, 16 Jan 2023 11:33:03 +0000 (11:33 +0000)]
ada: Add annotations for proof of termination of runtime units
String-manipulating functions should always terminate. Add justification
for the termination of Mapping function parameter, and loop variants
where needed. This is needed for GNATprove to prove termination.
gcc/ada/
* libgnat/a-strbou.ads: Add justifications for Mapping.
* libgnat/a-strfix.adb: Same.
* libgnat/a-strfix.ads: Same.
* libgnat/a-strsea.adb: Same.
* libgnat/a-strsea.ads: Same.
* libgnat/a-strsup.adb: Same and add loop variants.
* libgnat/a-strsup.ads: Same and add specification of termination.
Bob Duff [Fri, 13 Jan 2023 19:48:46 +0000 (14:48 -0500)]
ada: Use Inline aspect instead of pragma in Einfo.Utils
This package was using the Ada 83 renaming idiom for inlining
Next_Component and other Next_... procedures without inlining the
same-named functions. Using the Inline aspect avoids that sort
of horsing around.
We change all the other pragmas Inline in this package to aspects
as well, which is a more-minor improvement. Fix too-long lines
without wrapping lines.
gcc/ada/
* einfo-utils.ads, einfo-utils.adb: Get rid of the Proc_Next_...
procedures. Use Inline aspect instead of pragma Inline.
Is_Discrete_Or_Fixed_Point_Type did not have pragma Inline, but
now has the aspect; this was probably an oversight
(which illustrates why aspects are better).
Bob Duff [Sun, 8 Jan 2023 23:22:17 +0000 (18:22 -0500)]
ada: Clean up vanishing entity fields
Fix all the failures caused by enabling Check_Vanishing_Fields on
entities in all cases except the case of converting to or from E_Void.
But leave Check_Vanishing_Fields disabled by default (controlled by
-gnatd_v flag), because it might be too slow even for assertions-on
mode, and we should deal with the E_Void cases eventually.
The failures are fixed either by adding calls to Reinit_Field_To_Zero,
or by changing which entities have which fields.
Note that in a series of Reinit_Field_To_Zero calls, the optional
Old_Ekind parameter is only useful on the first such call.
gcc/ada/
* atree.adb
(Check_Vanishing_Fields): Disable the check for "root/base type
only" fields. This is a bug fix -- if we're checking some subtype
S, we don't want to reach over to the root or base type and
Reinit_Field_To_Zero of that, thus modifying the field for lots of
subtypes other than S. Disable in the to/from E_Void cases. Misc
cleanup.
* gen_il-gen-gen_entities.adb: Define First_Entity, Last_Entity,
and Stored_Constraint for all type entities, because there are too
many cases where Reinit_Field_To_Zero would otherwise be needed.
In any case, it seems cleaner to have First_Entity and Last_Entity
defined in the same entity kinds.
* einfo.ads:
(First_Entity, Last_Entity, Stored_Constraint): Update comments to
reflect gen_il-gen-gen_entities.adb changes.
(Lit_Hash): Add missing "[root type only]" comment.
* exp_ch5.adb: Add Reinit_Field_To_Zero calls for vanishing
fields.
* sem_ch10.adb: Likewise.
* sem_ch6.adb: Likewise.
* sem_ch7.adb: Likewise.
* sem_ch8.adb: Likewise.
* sem_ch3.adb: Likewise. Also remove now-unnecessary
Reinit_Field_To_Zero calls.
Eric Botcazou [Thu, 12 Jan 2023 14:51:40 +0000 (15:51 +0100)]
ada: Fix internal error on instance in package body with -gnatn
This plugs a small loophole in the procedure responsible for attempting to
hide entities that have been previously made public by the semantic analyzer
in package bodies.
gcc/ada/
* sem_ch7.adb (Hide_Public_Entities): Use the same condition for
subprogram bodies without specification as for those with one.
Piotr Trojanek [Tue, 10 Jan 2023 23:22:03 +0000 (00:22 +0100)]
ada: Accept aggregates with OTHERS clause in unchecked type conversions
When inlining subprogram calls in GNATprove mode, the actual parameter
is wrapped in an unchecked conversion. If this actual parameter is an
aggregate OTHERS clause, then the type of unchecked conversion allows us
to resolve this clause (just like for aggregates wrapped in a qualified
expression).
Previously such aggregates were rejected, which caused spurious and
cryptic errors; now they are accepted.
Steve Baird [Fri, 16 Dec 2022 00:50:05 +0000 (16:50 -0800)]
ada: Emit warnings for (some) ineffective static predicate tests
Generate a warning if a static predicate tests for a value that
does not belong to the parent subtype. For example, in
subtype S is Positive with Static_Predicate => S not in 0 | 11 | 222;
the 0 is ineffective because Positive already excludes that value.
Generation of this new warning is controlled by the -gnatw_s switch,
which can also be enabled via -gnatwa.
gcc/ada/
* warnsw.ads: Add a new element,
Warn_On_Ineffective_Predicate_Test, to the Opt_Warnings_Enum
enumeration type.
* warnsw.adb: Bind "-gnatw_s" to the new
Warn_On_Ineffective_Predicate_Test switch. Add the new switch to
the set of switches enabled by -gnata .
* sem_ch13.adb
(Build_Discrete_Static_Predicate): Declare new local procedure,
Warn_If_Test_Ineffective, which conditionally generates new
warning. Call this new procedure when building a new element of an
RList.
* doc/gnat_ugn/building_executable_programs_with_gnat.rst:
Document the -gnatw_s switch (and the corresponding -gnatw_S
switch).
* gnat_ugn.texi: Regenerate.
Before this patch, the front end failed to catch many illegal uses
of access attributes of task types.
This patch makes referring to the access attributes of a task type
raise an error, except in the current instance case defined in
clause 8.6 of the reference manual.
gcc/ada/
* sem_attr.adb: sem_attr.adb (Analyze_Access_Attribute): Tighten
validity check for task types.
Bob Duff [Fri, 6 Jan 2023 18:23:36 +0000 (13:23 -0500)]
ada: Optimize 2**N to avoid explicit 'if' in modular case
The compiler usually turns 2**N into Shift_Left(1,N).
This patch removes the check for "shift amount too big" in the
modular case, because Shift_Left works properly in that case
(i.e. if N is very large, it returns 0).
This removes a redundant check on most hardware; Shift_Left
takes care of large shirt amounts as necessary, even though
most hardware does not.
gcc/ada/
* exp_ch4.adb
(Expand_N_Op_Expon): Remove the too-big check. Simplify. Signed
and modular cases are combined, etc. Remove code with comment "We
only handle cases where the right type is a[sic] integer", because
the right operand must always be an integer at this point.
Bob Duff [Fri, 6 Jan 2023 01:21:15 +0000 (20:21 -0500)]
ada: Add Check_Error_Detected before "raise Bad_Attribute"
We shouldn't raise Bad_Attribute if there is no error.
This patch adds a call to Check_Error_Detected to make sure that's true.
(There are other cases where we raise Bad_Attribute;
this patch doesn't try to fix them all.)
gcc/ada/
* sem_attr.adb
(Analyze_Attribute): Add a call to Check_Error_Detected.
Yannick Moy [Fri, 6 Jan 2023 10:10:53 +0000 (11:10 +0100)]
ada: Fix handling of pragma Warnings (Toolname, Off/On)
Pragma Warnings On/Off with a preceding toolname (which could be GNAT
or GNATprove) was ignored due an error in accessing the expression of
a pragma association in the parser. Now fixed.
gcc/ada/
* par-prag.adb (First_Arg_Is_Matching_Tool_Name): Fix access to
expression in pragma association.
Eric Botcazou [Wed, 4 Jan 2023 15:41:47 +0000 (16:41 +0100)]
ada: Fix invalid JSON for extended variant record with -gnatRj
This fixes the output of -gnatRj for an extension of a tagged type which has
a variant part and also deals with the case where the parent type is private
with unknown discriminants.
gcc/ada/
* repinfo.ads (JSON output format): Document special case of
Present member of a Variant object.
* repinfo.adb (List_Structural_Record_Layout): Change the type of
Ext_Level parameter to Integer. Restrict the first recursion with
increasing levels to the fixed part and implement a second
recursion with decreasing levels for the variant part. Deal with
an extension of a type with unknown discriminants.
Joel Brobecker [Tue, 6 Dec 2022 14:25:38 +0000 (18:25 +0400)]
ada: GNAT UGN: Add section documenting PIE being enabled by default on Linux
This commit updates the Linux-specific chapter to add a new section
documenting the fact that PIE is enabled by default, and provides
some information about the impact that this might have on some
projects, as well as recommendations on how to handle issues.
Javier Miranda [Mon, 2 Jan 2023 14:03:11 +0000 (14:03 +0000)]
ada: Skip dynamic interface conversion under native runtime
gcc/ada/
* exp_disp.adb
(Has_Dispatching_Constructor_Call): New subprogram.
(Expand_Interface_Conversion): No need to perform dynamic
interface conversion when the operand and the target type are
interface types and the target interface type is an ancestor of
the operand type. The unique exception to this rule is when the
operand has a dispatching constructor call (as documented in the
sources).
Piotr Trojanek [Thu, 22 Dec 2022 11:14:08 +0000 (12:14 +0100)]
ada: Reject attribute Initialize on unchecked unions
Attribute Initialized is expanded into Valid_Scalars, which can't work
on unchecked unions, so Initialized on unchecked unions needs to be
rejected before expansion.
gcc/ada/
* sem_attr.adb (Analyze_Attribute): Reject attribute Initialized
on unchecked unions; fix grammar in comment.
Before this patch, Set_Can_Use_Internal_Rep was called on access
to subprogram subtypes when instantiating Unchecked_Conversion
from System.Address to an access to subprogram subtype (or the
reverse). This was incorrect and caused an assertion failure.
This patch fixes that by modifying the Can_Use_Internal_Rep
attribute of the base type of the subtype instead.
gcc/ada/
* sem_ch13.adb (Validate_Unchecked_Conversion): Fix behavior on
System.Address to access to subprogram subtype conversion.
Piotr Trojanek [Thu, 22 Dec 2022 22:36:47 +0000 (23:36 +0100)]
ada: Fix link to parent when copying with Copy_Separate_Tree
When flag More_Ids is set on a node, then syntactic children will have
their Parent link set to the last node in the chain of Mode_Ids.
For example, parameter associations in declaration like:
procedure P (X, Y : T);
will have More_Ids set for "X", Prev_Ids set on "Y" and both will have
the same node of "T" as their child. However, "T" will have only one
parent, i.e. "Y".
This anomaly was taken into account in New_Copy_Tree, but not in
Copy_Separate_Tree. This was leading to spurious errors in check for
ghost-correctness applied to copied specs.
gcc/ada/
* atree.ads
(Is_Syntactic_Node): Refactored from New_Copy_Tree.
* atree.adb
(Is_Syntactic_Node): Likewise.
(Copy_Separate_Tree): Use Is_Syntactic_Node.
* sem_util.adb
(Has_More_Ids): Move to Atree.
(Is_Syntactic_Node): Likewise.
Kyrylo Tkachov [Mon, 15 May 2023 08:55:44 +0000 (09:55 +0100)]
aarch64: PR target/99195 annotate vector compare patterns for vec-concat-zero
This instalment of the series goes through the vector comparison patterns in the backend.
One wart are the int64x1_t comparisons that this patch doesn't touch.
Those are a bit trickier because they have define_insn_and_split mechanisms for falling back to
GP reg comparisons after reload and I don't think a simple annotation will catch those cases correctly.
Those will need more custom thinking.
As said, this patch doesn't touch those and is a decent straightforward improvement on its own.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
__attribute__ ((noipa)) void
f_vnx2qi (int8_t a, int8_t b, int8_t *out)
{
vnx2qi v = {a, b};
*(vnx2qi *) out = v;
}
Before this patch:
f_vnx2qi:
vsetvli a5,zero,e8,mf8,ta,ma
vmv.v.x v1,a0
vslide1down.vx v1,v1,a1
vse8.v v1,0(a2)
ret
After this patch:
f_vnx2qi:
vsetivli zero,2,e8,mf8,ta,ma
vmv.v.x v1,a0
vslide1down.vx v1,v1,a1
vse8.v v1,0(a2)
ret
Signed-off-by: Pan Li <pan2.li@intel.com> Co-authored-by: Juzhe-Zhong <juzhe.zhong@rivai.ai> Co-authored-by: kito-cheng <kito.cheng@sifive.com>
gcc/ChangeLog:
* config/riscv/riscv-v.cc (const_vlmax_p): New function for
deciding the mode is constant or not.
(set_len_and_policy): Optimize VLS-VLMAX code gen to vsetivli.
Richard Biener [Mon, 15 May 2023 07:10:08 +0000 (09:10 +0200)]
tree-optimization/109848 - fix TARGET_MEM_REF store from CTOR simplification
I've put the preparation stmt in the wrong place.
PR tree-optimization/109848
* tree-ssa-forwprop.cc (pass_forwprop::execute): Put the
TARGET_MEM_REF address preparation before the store, not
before the CTOR.
Juzhe-Zhong [Mon, 15 May 2023 06:00:59 +0000 (14:00 +0800)]
RISC-V: Support TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT to optimize codegen of both VLA && VLS auto-vectorization
This patch optimizes both RVV VLA && VLS vectorization.
Consider this following case:
void __attribute__((noinline, noclone))
f (int * __restrict dst, int * __restrict op1, int * __restrict op2, int
count)
{
for (int i = 0; i < count; ++i)
dst[i] = op1[i] + op2[i];
}
Andrew Pinski [Sat, 13 May 2023 22:25:21 +0000 (22:25 +0000)]
MATCH: Add pattern for `signbit(x) ? x : -x` into abs (and swapped)
This adds a simple pattern to match.pd for `signbit(x) ? x : -x`
into abs<x>. This can be done for all types even ones that honor
signed zeros and NaNs because both signbit and - are considered
only looking at/touching the sign bit of those types and does
not trap either.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/109829
gcc/ChangeLog:
* match.pd: Add pattern for `signbit(x) !=/== 0 ? x : -x`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/abs-3.c: New test.
* gcc.dg/tree-ssa/abs-4.c: New test.
Uros Bizjak [Sun, 14 May 2023 19:53:17 +0000 (21:53 +0200)]
i386: Handle unsupported modes from ix86_widen_mult_cost [PR109807]
Revert my previous change that faked handling of V4HI and V2SImodes
in ix86_widen_mult_cost and rather return arbitrary high value
for unsupported modes. This should prevent cost estimator from
selecting non-existent vector widen multiply operation.
gcc/ChangeLog:
PR target/109807
* config/i386/i386.cc: Revert the 2023-05-11 change.
(ix86_widen_mult_cost): Return high value instead of
ICEing for unsupported modes.
gcc/testsuite/ChangeLog:
PR target/109807
* gcc.target/i386/pr109825.c: New test.
Ard Biesheuvel [Sun, 14 May 2023 16:18:38 +0000 (18:18 +0200)]
i386: Honour -mdirect-extern-access when calling __fentry__
The small and medium PIC code models generate profiling calls that
always load the address of __fentry__() via the GOT, even if
-mdirect-extern-access is in effect.
This deviates from the behavior with respect to other external
references, and results in a longer opcode that relies on linker
relaxation to eliminate the GOT load. In this particular case, the
transformation replaces an indirect 'CALL *__fentry__@GOTPCREL(%rip)'
with either 'CALL __fentry__; NOP' or 'NOP; CALL __fentry__', where the
NOP is a 1 byte NOP that preserves the 6 byte length of the sequence.
This is problematic for the Linux kernel, which generally relies on
-mdirect-extern-access and hidden visibility to eliminate GOT based
symbol references in code generated with -fpie/-fpic, without having to
depend on linker relaxation.
The Linux kernel relies on code patching to replace these opcodes with
NOPs at runtime, and this is complicated code that we'd prefer not to
complicate even more by adding support for patching both 5 and 6 byte
sequences as well as parsing the instruction stream to decide which
variant of CALL+NOP we are dealing with.
So let's honour -mdirect-extern-access, and only load the address of
__fentry__ via the GOT if direct references to external symbols are not
permitted.
Note that the GOT reference in question is in fact a data reference: we
explicitly load the address of __fentry__ from the GOT, which amounts to
eager binding, rather than emitting a PLT call that could bind eagerly,
lazily or directly at link time.
Andrew Pinski [Fri, 12 May 2023 23:33:44 +0000 (16:33 -0700)]
MATCH: Fix PR 109834, ICE with popcount combined with bswap
After r14-673-gc0dd80e4c4c3, there was a check in the match
patterns which was checking the type is unsigned but
instead of using the type, the patch used the expression.
This adds the needed TREE_TYPE so get the correct answer and don't ICE.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR tree-optimization/109834
gcc/ChangeLog:
* match.pd (popcount(bswap(x))->popcount(x)): Fix up unsigned type checking.
(popcount(rotate(x,y))->popcount(x)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109834-1.c: New test.
* gcc.dg/tree-ssa/pr109834-1.c: New test.
Uros Bizjak [Fri, 12 May 2023 17:50:06 +0000 (19:50 +0200)]
i386: Cleanup ix86_expand_vecop_qihi{,2}
Some cleanups while looking at these two functions.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also
reject ymm instructions for TARGET_PREFER_AVX128. Use generic
gen_extend_insn to generate zero/sign extension instructions.
Fix comments.
(ix86_expand_vecop_qihi): Initialize interleave functions
for MULT code only. Fix comments.
Jonathan Wakely [Fri, 12 May 2023 13:25:50 +0000 (14:25 +0100)]
libstdc++: Remove redundant dependencies on _GLIBCXX_USE_C99_STDINT_TR1
We never need to use std::make_unsigned in std::char_traits<char16_t>
and std::char_traits<char32_t> because <cstdint> guarantees to provide
the types we need, since r9-2028-g8ba7f29e3dd064.
Similarly, experimental::source_location can just assume uint_least32_t
is defined by <cstdint>.
libstdc++-v3/ChangeLog:
* include/bits/char_traits.h (char_traits<char16_t>): Do not
depend on _GLIBCXX_USE_C99_STDINT_TR1.
(char_traits<char32_t>): Likewise.
* include/experimental/source_location: Likewise.
Jonathan Wakely [Fri, 12 May 2023 13:04:04 +0000 (14:04 +0100)]
libstdc++: Reduce <atomic> dependency on _GLIBCXX_USE_C99_STDINT_TR1
Since r9-2028-g8ba7f29e3dd064 we've defined most of <cstdint>
unconditionally, so we can do the same for most of the std::atomic
aliases such as std::atomic_int_least32_t.
The only aliases that need to depend on _GLIBCXX_USE_C99_STDINT_TR1 are
the ones for the integer types that are not guaranteed to be defined,
e.g. std::atomic_int32_t.
Jonathan Wakely [Fri, 12 May 2023 12:55:17 +0000 (13:55 +0100)]
libstdc++: Remove <random> dependency on _GLIBCXX_USE_C99_STDINT_TR1
Since r9-2028-g8ba7f29e3dd064 we've defined most of <cstdint>
unconditionally, including uint_least32_t. This means that all of
<random> can be defined unconditionally, which means that std::shuffle
and std::ranges::shuffle can be too.
Gaius Mulley [Fri, 12 May 2023 16:44:29 +0000 (17:44 +0100)]
PR modula2/109830 m2iso library SeqFile.mod appending to a file overwrites content
This patch is for the m2iso library SeqFile.mod to fix a bug when a
file is opened using OpenAppend. The patch checks to see if the file
exists and it uses FIO.OpenForRandom to ensure the file is not
overwritten.
gcc/m2/ChangeLog:
PR modula2/109830
* gm2-libs-iso/SeqFile.mod (newCid): New parameter toAppend
used to select FIO.OpenForRandom.
(OpenRead): Pass extra parameter to newCid.
(OpenWrite): Pass extra parameter to newCid.
(OpenAppend): Pass extra parameter to newCid.
gcc/testsuite/ChangeLog:
PR modula2/109830
* gm2/isolib/run/pass/seqappend.mod: New test.
Uros Bizjak [Fri, 12 May 2023 16:37:13 +0000 (18:37 +0200)]
i386: Remove mulv2si emulated sequence for TARGET_SSE2 [PR109797]
Remove mulv2si emulated sequence for TARGET_SSE2 and enable
only native PMULLD instruction for TARGET_SSE4_1. Ideally, the
vectorization for TARGET_SSE2 should depend on more precise cost
estimation (the PR contains patch for ix86_multiplication_cost),
but even with patched cost function the runtime regression
was not fixed.
Tobias Burnus [Fri, 12 May 2023 14:27:40 +0000 (16:27 +0200)]
LTO: Fix writing of toplevel asm with offloading [PR109816]
When offloading was enabled, top-level 'asm' were added to the offloading
section, confusing assemblers which did not support the syntax. Additionally,
with offloading and -flto, the top-level assembler code did not end up
in the host files.
As r14-321-g9a41d2cdbcd added top-level 'asm' to one libstdc++ header file,
the issue became more apparent, causing fails with nvptx for some
C++ testcases.
PR libstdc++/109816
gcc/ChangeLog:
* lto-cgraph.cc (output_symtab): Guard lto_output_toplevel_asms by
'!lto_stream_offload_p'.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-map-class-1.C: New test.
* testsuite/libgomp.c++/target-map-class-2.C: New test.
Jonathan Wakely [Fri, 12 May 2023 12:44:21 +0000 (13:44 +0100)]
libstdc++: Remove test dependency on _GLIBCXX_USE_C99_STDINT_TR1
This should have been done in r9-2028-g8ba7f29e3dd064 when
std::shared_mutex was changed to be defined without depending on
_GLIBCXX_USE_C99_STDINT_TR1.
libstdc++-v3/ChangeLog:
* testsuite/experimental/feat-cxx14.cc: Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1.
Jonathan Wakely [Fri, 12 May 2023 12:38:50 +0000 (13:38 +0100)]
libstdc++: Remove test dependency on _GLIBCXX_USE_C99_STDINT_TR1
This should have been removed in r9-2029-g612c9c702e2c9e when the
char16_t and char32_t specializations of std::codecvt were changed to be
defined unconditionally.
libstdc++-v3/ChangeLog:
* testsuite/22_locale/locale/cons/unicode.cc: Remove dependency
on _GLIBCXX_USE_C99_STDINT_TR1.
Jonathan Wakely [Fri, 12 May 2023 12:34:37 +0000 (13:34 +0100)]
libstdc++: Remove test dependencies on _GLIBCXX_USE_C99_STDINT_TR1
These #ifdef checks should have been removed in r9-2029-g612c9c702e2c9e
when the u16string_view and u32string_view aliases were changed to be
defined unconditionally.
libstdc++-v3/ChangeLog:
* testsuite/21_strings/basic_string_view/typedefs.cc: Remove
dependency on _GLIBCXX_USE_C99_STDINT_TR1.
* testsuite/experimental/string_view/typedefs.cc: Likewise.
LCM will hoist vsetvl of bb 2 into bb 1.
We don't do AVL propagation for this situation since it's complicated that
we should analyze the code sequence between vsetvli in bb 1 and RVV insn in bb 2.
They are not necessary the consecutive blocks.
This patch is doing the optimizations after LCM, we will check and eliminate the vsetvli
in LCM inserted edge if such vsetvli is redundant. Such approach is much simplier and safe.
code:
void
foo2 (int32_t *a, int32_t *b, int n)
{
if (n <= 0)
return;
int i = n;
size_t vl = __riscv_vsetvl_e32m1 (i);
for (; i >= 0; i--)
{
vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl);
__riscv_vse32_v_i32m1 (b, v, vl);
if (i >= vl)
continue;
if (i == 0)
return;
vl = __riscv_vsetvl_e32m1 (i);
}
}
Before this patch:
foo2:
.LFB2:
.cfi_startproc
ble a2,zero,.L1
mv a4,a2
li a3,-1
vsetvli a5,a2,e32,m1,ta,mu
vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated.
.L5:
vle32.v v1,0(a0)
vse32.v v1,0(a1)
bgeu a4,a5,.L3
.L10:
beq a2,zero,.L1
vsetvli a5,a4,e32,m1,ta,mu
addi a4,a4,-1
vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated.
vle32.v v1,0(a0)
vse32.v v1,0(a1)
addiw a2,a2,-1
bltu a4,a5,.L10
.L3:
addiw a2,a2,-1
addi a4,a4,-1
bne a2,a3,.L5
.L1:
ret
After this patch:
f:
ble a2,zero,.L1
mv a4,a2
li a3,-1
vsetvli a5,a2,e32,m1,ta,ma
.L5:
vle32.v v1,0(a0)
vse32.v v1,0(a1)
bgeu a4,a5,.L3
.L10:
beq a2,zero,.L1
vsetvli a5,a4,e32,m1,ta,ma
addi a4,a4,-1
vle32.v v1,0(a0)
vse32.v v1,0(a1)
addiw a2,a2,-1
bltu a4,a5,.L10
.L3:
addiw a2,a2,-1
addi a4,a4,-1
bne a2,a3,.L5
.L1:
ret
PR target/109743
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vsetvl_at_end): New.
(local_avl_compatible_p): New.
(pass_vsetvl::local_eliminate_vsetvl_insn): Enhance local optimizations
for LCM, rewrite as a backward algorithm.
(pass_vsetvl::cleanup_insns): Use new local_eliminate_vsetvl_insn
interface, handle a BB at once.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test.
Richard Biener [Fri, 12 May 2023 11:43:27 +0000 (13:43 +0200)]
tree-optimization/64731 - extend store-from CTOR lowering to TARGET_MEM_REF
The following also covers TARGET_MEM_REF when decomposing stores from
CTORs to supported elementwise operations. This avoids spilling
and cleans up after vector lowering which doesn't touch loads or
stores. It also mimics what we already do for loads.
PR tree-optimization/64731
* tree-ssa-forwprop.cc (pass_forwprop::execute): Also
handle TARGET_MEM_REF destinations of stores from vector
CTORs.
Patrick Palka [Fri, 12 May 2023 12:37:54 +0000 (08:37 -0400)]
c++: remove redundant testcase [PR83258]
I noticed only after the fact that the new testcase template/function2.C
(from r14-708-gc3afdb8ba8f183) is just a subset of ext/visibility/anon8.C,
so let's get rid of it.
The following adds another variant of address difference simplification.
The utility ptr_difference_const only handles constant differences
(we also cannot code generate anything else), so exposing a possible
POINTER_PLUS_EXPR in the match and computing the difference on the
base only makes it possible to handle one case of a variable offset.
This simplifies
down to (1 - (unsigned long) _69) during niter analysis, allowing
ranger to eliminate a condition later and avoiding a bogus
-Wstringop-overflow diagnostic for the testcase in the PR.