gcc.gnu.org Git - gcc.git/log

]> gcc.gnu.org Git - gcc.git/log

git://gcc.gnu.org / gcc.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

David Malcolm [Fri, 7 Jun 2024 20:14:28 +0000 (16:14 -0400)]

analyzer: new warning: -Wanalyzer-undefined-behavior-ptrdiff (PR analyzer/105892)

Add a new warning to complain about pointer subtraction involving
different chunks of memory.

For example, given:

  #include <stddef.h>

  int arr[42];
  int sentinel;

  ptrdiff_t
  test_invalid_calc_of_array_size (void)
  {
    return &sentinel - arr;
  }

this emits:

demo.c: In function ‘test_invalid_calc_of_array_size’:
demo.c:9:20: warning: undefined behavior when subtracting pointers [CWE-469] [-Wanalyzer-undefined-behavior-ptrdiff]
    9 |   return &sentinel - arr;
      |                    ^
  events 1-2
    │
    │    3 | int arr[42];
    │      |     ~~~
    │      |     |
    │      |     (2) underlying object for right-hand side of subtraction created here
    │    4 | int sentinel;
    │      |     ^~~~~~~~
    │      |     |
    │      |     (1) underlying object for left-hand side of subtraction created here
    │
    └──> ‘test_invalid_calc_of_array_size’: event 3
           │
           │    9 |   return &sentinel - arr;
           │      |                    ^
           │      |                    |
           │      |                    (3) ⚠️  subtraction of pointers has undefined behavior if they do not point into the same array object
           │

gcc/analyzer/ChangeLog:
PR analyzer/105892
* analyzer.opt (Wanalyzer-undefined-behavior-ptrdiff): New option.
* analyzer.opt.urls: Regenerate.
* region-model.cc (class undefined_ptrdiff_diagnostic): New.
(check_for_invalid_ptrdiff): New.
(region_model::get_gassign_result): Call it for POINTER_DIFF_EXPR.

gcc/ChangeLog:
* doc/invoke.texi: Add -Wanalyzer-undefined-behavior-ptrdiff.

gcc/testsuite/ChangeLog:
PR analyzer/105892
* c-c++-common/analyzer/out-of-bounds-pr110387.c: Add
expected warnings about pointer subtraction.
* c-c++-common/analyzer/ptr-subtraction-1.c: New test.
* c-c++-common/analyzer/ptr-subtraction-CWE-469-example.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

commit | commitdiff | tree

Jonathan Wakely [Fri, 7 Jun 2024 08:49:06 +0000 (09:49 +0100)]

libstdc++: Add missing header to <bits/ranges_algobase.h> for std::__memcmp

As noticed by Michael Levine.

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h: Include <bits/stl_algobase.h>.

commit | commitdiff | tree

Simon Martin [Tue, 4 Jun 2024 19:20:23 +0000 (21:20 +0200)]

c++: Handle erroneous DECL_LOCAL_DECL_ALIAS in duplicate_decls [PR107575]

We currently ICE upon the following because we don't properly handle local
functions with an error_mark_node as DECL_LOCAL_DECL_ALIAS in duplicate_decls.

=== cut here ===
void f (void) {
virtual int f (void) const;
virtual int f (void);
}
=== cut here ===

This patch fixes this by checking for error_mark_node.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/107575

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Check for error_mark_node
DECL_LOCAL_DECL_ALIAS.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash74.C: New test.

commit | commitdiff | tree

Jason Merrill [Wed, 5 Jun 2024 02:27:56 +0000 (22:27 -0400)]

c++: -include and header unit translation

Within a source file, #include is translated to import if a suitable header
unit is available, but this wasn't working with -include. This turned out
to be because we suppressed the translation before the beginning of the
main file. After removing that, I had to tweak libcpp file handling to
accommodate the way it moves from an -include to the main file.

gcc/ChangeLog:

* doc/invoke.texi (C++ Modules): Mention -include.

gcc/cp/ChangeLog:

* module.cc (maybe_translate_include): Allow before the main file.

libcpp/ChangeLog:

* files.cc (_cpp_stack_file): LC_ENTER for -include header unit.

gcc/testsuite/ChangeLog:

* g++.dg/modules/dashinclude-1_b.C: New test.
* g++.dg/modules/dashinclude-1_a.H: New test.

commit | commitdiff | tree

Patrick Palka [Fri, 7 Jun 2024 16:12:30 +0000 (12:12 -0400)]

c++: lambda in pack expansion [PR115378]

Here find_parameter_packs_r is incorrectly treating the 'auto' return
type of a lambda as a parameter pack due to Concepts-TS specific logic
added in r6-4517, leading to confusion later when expanding the pattern.

Since we intend on removing Concepts TS support soon anyway, this patch
fixes this by restricting the problematic logic with flag_concepts_ts.
Doing so revealed that add_capture was relying on this logic to set
TEMPLATE_TYPE_PARAMETER_PACK for the 'auto' type of an pack expansion
init-capture, which we now need to do explicitly.

PR c++/115378

gcc/cp/ChangeLog:

* lambda.cc (lambda_capture_field_type): Set
TEMPLATE_TYPE_PARAMETER_PACK on the auto type of an init-capture
pack expansion.
* pt.cc (find_parameter_packs_r) <case TEMPLATE_TYPE_PARM>:
Restrict TEMPLATE_TYPE_PARAMETER_PACK promotion with
flag_concepts_ts.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto-103497.C: Adjust expected diagnostic.
* g++.dg/template/pr95672.C: Likewise.
* g++.dg/cpp2a/lambda-targ5.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

commit | commitdiff | tree

Simon Martin [Fri, 7 Jun 2024 14:14:58 +0000 (16:14 +0200)]

lto: Fix build on MacOS

The build fails on x86_64-apple-darwin19.6.0 starting with 5b6d5a886ee because
vector is included after system.h and runs into poisoned identifiers.

This patch fixes this by defining INCLUDE_VECTOR before including system.h.

Validated by doing a full build on x86_64-apple-darwin19.6.0.

gcc/lto/ChangeLog:

* lto-partition.cc: Define INCLUDE_VECTOR to avoid running into
poisoned identifiers.

commit | commitdiff | tree

Roger Sayle [Fri, 7 Jun 2024 13:03:20 +0000 (14:03 +0100)]

i386: PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

This patch addresses PR target/115351, which is a code quality regression
on x86 when passing floating point complex numbers.  The ABI considers
these arguments to have TImode, requiring interunit moves to place the
FP values (which are actually passed in SSE registers) into the upper
and lower parts of a TImode pseudo, and then similar moves back again
before they can be used.

The cause of the regression is that changes in how TImode initialization
is represented in RTL now prevents the RTL optimizers from eliminating
these redundant moves.  The specific cause is that the *concatditi3
pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
rtx_cost, preventing fwprop1 from propagating it.  This pattern just
sets the hipart and lopart of a double-word register, typically two
instructions (less if reload can allocate things appropriately) but
the current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52.

propagating insn 5 into insn 6, replacing:
(set (reg:TI 110)
    (ior:TI (and:TI (reg:TI 110)
            (const_wide_int 0x0ffffffffffffffff))
        (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
            (const_int 64 [0x40]))))
successfully matched this instruction to *concatditi3_3:
(set (reg:TI 110)
    (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
            (const_int 64 [0x40]))
        (zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0))))
change not profitable (cost 50 -> cost 52)

This issue is resolved by having ix86_rtx_costs return more reasonable
values for these (place-holder) patterns.

2024-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/115351
* config/i386/i386.cc (ix86_rtx_costs): Provide estimates for
the *concatditi3 and *insvti_highpart patterns, about two insns.

gcc/testsuite/ChangeLog
PR target/115351
* g++.target/i386/pr115351.C: New test case.

commit | commitdiff | tree

Roger Sayle [Fri, 7 Jun 2024 12:57:23 +0000 (13:57 +0100)]

i386: Improve handling of ternlog instructions in i386/sse.md

This patch improves the way that the x86 backend recognizes and
expands AVX512's bitwise ternary logic (vpternlog) instructions.

As a motivating example consider the following code which calculates
the carry out from a (binary) full adder:

typedef unsigned long long v4di __attribute((vector_size(32)));

v4di foo(v4di a, v4di b, v4di c)
{
    return (a & b) | ((a ^ b) & c);
}

with -O2 -march=cascadelake current mainline produces:

foo:    vpternlogq      $96, %ymm0, %ymm1, %ymm2
        vmovdqa %ymm0, %ymm3
        vmovdqa %ymm2, %ymm0
        vpternlogq      $248, %ymm3, %ymm1, %ymm0
        ret

with the patch below, we now generate a single instruction:

foo:    vpternlogq      $232, %ymm2, %ymm1, %ymm0
        ret

The AVX512 vpternlog[qd] instructions are a very cool addition to the
x86 instruction set, that can calculate any Boolean function of three
inputs in a single fast instruction.  As the truth table for any
three-input function has 8 rows, any specific function can be represented
by specifying those bits, i.e. by a 8-bit byte, an immediate integer
between 0 and 256.

Examples of ternary functions and their indices are given below:

0x01   1:  ~((b|a)|c)
0x02   2:  (~(b|a))&c
0x03   3:  ~(b|a)
0x04   4:  (~(c|a))&b
0x05   5:  ~(c|a)
0x06   6:  (c^b)&~a
0x07   7:  ~((c&b)|a)
0x08   8:  (~a&c)&b (~a&b)&c (c&b)&~a
0x09   9:  ~((c^b)|a)
0x0a  10:  ~a&c
0x0b  11:  ~((~c&b)|a) (~b|c)&~a
0x0c  12:  ~a&b
0x0d  13:  ~((~b&c)|a) (~c|b)&~a
0x0e  14:  (c|b)&~a
0x0f  15:  ~a
0x10  16:  (~(c|b))&a
0x11  17:  ~(c|b)
...
0xf4 244:  (~c&b)|a
0xf5 245:  ~c|a
0xf6 246:  (c^b)|a
0xf7 247:  (~(c&b))|a
0xf8 248:  (c&b)|a
0xf9 249:  (~(c^b))|a
0xfa 250:  c|a
0xfb 251:  (c|a)|~b (~b|a)|c (~b|c)|a
0xfc 252:  b|a
0xfd 253:  (b|a)|~c (~c|a)|b (~c|b)|a
0xfe 254:  (b|a)|c (c|a)|b (c|b)|a

A naive implementation (in many compilers) might be add define_insn
patterns for all 256 different functions.  The situation is even
worse as many of these Boolean functions don't have a "canonical form"
(as produced by simplify_rtx) and would each need multiple patterns.
See the space-separated equivalent expressions in the table above.

This need to provide instruction "templates" might explain why GCC,
LLVM and ICC all exhibit similar coverage problems in their ability
to recognize x86 ternlog ternary functions.

Perhaps a unique feature of GCC's design is that in addition to regular
define_insn templates, machine descriptions can also perform pattern
matching via a match_operator (and its corresponding predicate).
This patch introduces a ternlog_operand predicate that matches a
(possibly infinite) set of expression trees, identifying those that
have at most three unique operands.  This then allows a
define_insn_and_split to recognize suitable expressions and then
transform them into the appropriate UNSPEC_VTERNLOG as a pre-reload
splitter.  This design allows combine to smash together arbitrarily
complex Boolean expressions, then transform them into an UNSPEC
before register allocation.  As an "optimization", where possible
ix86_expand_ternlog generates a simpler binary operation, using
AND, XOR, IOR or ANDN where possible, and in a few cases attempts
to "canonicalize" the ternlog, by reordering or duplicating operands,
so that later CSE passes have a hope of spotting equivalent values.

This patch leaves the existing ternlog patterns in sse.md (for now),
many of which are made obsolete by these changes.  In theory we now
only need one define_insn for UNSPEC_VTERNLOG.  One complication from
these previous variants was that they inconsistently used decimal vs.
hexadecimal to specify the immediate constant operand in assembly
language, making the list of tweaks to the testsuite with this patch
larger than it might have been.  I propose to remove the vestigial
patterns in a follow-up patch, once this approach has baked (proven
to be stable) on mainline.

2024-06-07  Roger Sayle  <roger@nextmovesoftware.com>
    Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Call
fixup_modeless_constant before testing predicates.  Only call
copy_to_mode_reg on memory operands (after the first one).
(ix86_gen_bcst_mem): Helper function to convert a CONST_VECTOR
into a VEC_DUPLICATE if possible.
(ix86_ternlog_idx):  Convert an RTX expression into a ternlog
index between 0 and 255, recording the operands in ARGS, if
possible or return -1 if this is not possible/valid.
(ix86_ternlog_leaf_p): Helper function to identify "leaves"
of a ternlog expression, e.g. REG_P, MEM_P, CONST_VECTOR, etc.
(ix86_ternlog_operand_p): Test whether a expression is suitable
for and prefered as an UNSPEC_TERNLOG.
(ix86_expand_ternlog_binop): Helper function to construct the
binary operation corresponding to a sufficiently simple ternlog.
(ix86_expand_ternlog_andnot): Helper function to construct a
ANDN operation corresponding to a sufficiently simple ternlog.
(ix86_expand_ternlog): Expand a 3-operand ternary logic
expression, constructing either an UNSPEC_TERNLOG or simpler
rtx expression.  Called from builtin expanders and pre-reload
splitters.
* config/i386/i386-protos.h (ix86_ternlog_idx): Prototype here.
(ix86_ternlog_operand_p): Likewise.
(ix86_expand_ternlog): Likewise.
* config/i386/predicates.md (ternlog_operand): New predicate
that calls xi86_ternlog_operand_p.
* config/i386/sse.md (<avx512>_vpternlog<mode>_0): New
define_insn_and_split that recognizes a SET_SRC of ternlog_operand
and expands it via ix86_expand_ternlog pre-reload.
(<avx512>_vternlog<mode>_mask): Convert from define_insn to
define_expand.  Use ix86_expand_ternlog if the mask operand is
~0 (or 255 or -1).
(*<avx512>_vternlog<mode>_mask): define_insn renamed from above.

gcc/testsuite/ChangeLog
* gcc.target/i386/avx512f-vpternlogd-1.c: Update test case.
* gcc.target/i386/avx512f-vpternlogq-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise.
* gcc.target/i386/pr100711-4.c: Likewise.
* gcc.target/i386/pr100711-5.c: Likewise.

* gcc.target/i386/avx512f-vpternlogd-3.c: New 128-bit test case.
* gcc.target/i386/avx512f-vpternlogd-4.c: New 256-bit test case.
* gcc.target/i386/avx512f-vpternlogd-5.c: New 512-bit test case.
* gcc.target/i386/avx512f-vpternlogq-3.c: New test case.

commit | commitdiff | tree

Michal Jires [Fri, 17 Nov 2023 20:17:18 +0000 (21:17 +0100)]

lto: Implement cache partitioning

This patch implements new cache partitioning. It tries to keep symbols
from single source file together to minimize propagation of divergence.

It starts with symbols already grouped by source files. If reasonably
possible it only either combines several files into one final partition,
or, if a file is large, split the file into several final partitions.

Intermediate representation is partition_set which contains set of
groups of symbols (each group corresponding to original source file) and
number of final partitions this partition_set should split into.

First partition_fixed_split splits partition_set into constant number of
partition_sets with equal number of symbols groups. If for example there
are 39 source files, the resulting partition_sets will contain 10, 10,
10, and 9 source files. This splitting intentionally ignores estimated
instruction counts to minimize propagation of divergence.

Second partition_over_target_split separates too large files and splits
them into individual symbols to be combined back into several smaller
files in next step.

Third partition_binary_split splits partition_set into two halves until
it should be split into only one final partition, at which point the
remaining symbols are joined into one final partition.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* common.opt: Add cache partitioning.
* flag-types.h (enum lto_partition_model): Likewise.

gcc/lto/ChangeLog:

* lto-partition.cc (new_partition): Use new_partition_no_push.
(new_partition_no_push): New.
(free_ltrans_partition): New.
(free_ltrans_partitions): Use free_ltrans_partition.
(join_partitions): New.
(split_partition_into_nodes): New.
(is_partition_reorder): New.
(class partition_set): New.
(distribute_n_partitions): New.
(partition_over_target_split): New.
(partition_binary_split): New.
(partition_fixed_split): New.
(class partitioner_base): New.
(class partitioner_default): New.
(lto_cache_map): New.
* lto-partition.h (lto_cache_map): New.
* lto.cc (do_whole_program_analysis): Use lto_cache_map.

gcc/testsuite/ChangeLog:

* gcc.dg/completion-2.c: Add -flto-partition=cache.

commit | commitdiff | tree

Alexandre Oliva [Fri, 7 Jun 2024 10:00:11 +0000 (07:00 -0300)]

[libstdc++] drop workaround for clang<=7

In response to a request in the review of the patch that introduced
_GLIBCXX_CLANG, this patch removes from std/variant an obsolete
workaround for clang 7-.

for libstdc++-v3/ChangeLog

* include/std/variant: Drop obsolete workaround.

commit | commitdiff | tree

Richard Biener [Fri, 7 Jun 2024 07:41:11 +0000 (09:41 +0200)]

Fix fold-left reduction vectorization with multiple stmt copies

There's a typo when code generating the mask operand for conditional
fold-left reductions in the case we have multiple stmt copies. The
latter is now allowed for SLP and possibly disabled for non-SLP by
accident.

This fixes the observed run-FAIL for
gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c with AVX512
and 256bit sized vectors.

* tree-vect-loop.cc (vectorize_fold_left_reduction): Fix
mask vector operand indexing.

commit | commitdiff | tree

Jonathan Wakely [Mon, 18 Mar 2024 16:58:23 +0000 (16:58 +0000)]

libstdc++: Optimize std::to_address

We can use if-constexpr and variable templates to simplify and optimize
std::to_address. This should compile faster (and run faster for -O0)
than dispatching to the pre-C++20 std::__to_address overloads.

libstdc++-v3/ChangeLog:

* include/bits/ptr_traits.h (to_address): Optimize.
* testsuite/20_util/to_address/1_neg.cc: Adjust dg-error text.

commit | commitdiff | tree

Francois-Xavier Coudert [Sun, 2 Jun 2024 19:07:23 +0000 (21:07 +0200)]

fixincludes: bypass some fixes for recent darwin headers

fixincludes/ChangeLog:

* fixincl.x: Regenerate.
* inclhack.def (darwin_stdint_7, darwin_dispatch_object_1,
darwin_os_trace_2, darwin_os_base_1): Include bypasses
for recent headers, fixed by Apple.

commit | commitdiff | tree

Andre Vehreschild [Thu, 27 Jul 2023 12:51:34 +0000 (14:51 +0200)]

Add finalizer creation to array constructor for functions of derived type.

PR fortran/90068

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_trans_array_ctor_element): Eval non-
variable expressions once only.
(gfc_trans_array_constructor_value): Add statements of
final block.
(trans_array_constructor): Detect when final block is required.

gcc/testsuite/ChangeLog:

* gfortran.dg/finalize_57.f90: New test.

commit | commitdiff | tree

Jakub Jelinek [Fri, 7 Jun 2024 08:32:08 +0000 (10:32 +0200)]

bitint: Fix up lower_addsub_overflow [PR115352]

The following testcase is miscompiled because of a flawed optimization.
If one changes the 65 in the testcase to e.g. 66, one gets:
...
  _25 = .USUBC (0, _24, _14);
  _12 = IMAGPART_EXPR <_25>;
  _26 = REALPART_EXPR <_25>;
  if (_23 >= 1)
    goto <bb 8>; [80.00%]
  else
    goto <bb 11>; [20.00%]

  <bb 8> :
  if (_23 != 1)
    goto <bb 10>; [80.00%]
  else
    goto <bb 9>; [20.00%]

  <bb 9> :
  _27 = (signed long) _26;
  _28 = _27 >> 1;
  _29 = (unsigned long) _28;
  _31 = _29 + 1;
  _30 = _31 > 1;
  goto <bb 11>; [100.00%]

  <bb 10> :
  _32 = _26 != _18;
  _33 = _22 | _32;

  <bb 11> :
  # _17 = PHI <_30(9), _22(7), _33(10)>
  # _19 = PHI <_29(9), _18(7), _18(10)>
...
so there is one path for limbs below the boundary (in this case there are
actually no limbs there, maybe we could consider optimizing that further,
say with simply folding that _23 >= 1 condition to 1 == 1 and letting
cfg cleanup handle it), another case where it is exactly the limb on the
boundary (that is the bb 9 handling where it extracts the interesting
bits (the first 3 statements) and then checks if it is zero or all ones and
finally the case of limbs above that where it compares the current result
limb against the previously recorded 0 or all ones and ors differences into
accumulated result.

Now, the optimization which the first hunk removes was based on the idea
that for that case the extraction of the interesting bits from the limb
don't need anything special, so the _27/_28/_29 statements above aren't
needed, the whole limb is interesting bits, so it handled the >= 1
case like the bb 9 above without the first 3 statements and bb 10 wasn't
there at all.  There are 2 problems with that, for the higher limbs it
only checks if the the result limb bits are all zeros or all ones, but
doesn't check if they are the same as the other extension bits, and
it forgets the previous flag whether there was an overflow.
First I wanted to fix it just by adding the _33 = _22 | _30; statement
to the end of bb 9 above, which fixed the originally filed huge testcase
and the first 2 foo calls in the testcase included in the patch, it no
longer forgets about previously checked differences from 0/1.
But as the last 2 foo calls show, it still didn't check whether each
even (or each odd depending on the exact position) result limb is
equal to the first one, so every second limb it could choose some other
0 vs. all ones value and as long as it repeated in another limb above it
it would be ok.

So, the optimization just can't work properly and the following patch
removes it.

2024-06-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/115352
* gimple-lower-bitint.cc (lower_addsub_overflow): Don't disable
single_comparison if cmp_code is GE_EXPR.

* gcc.dg/torture/bitint-71.c: New test.

commit | commitdiff | tree

Rainer Orth [Fri, 7 Jun 2024 08:14:23 +0000 (10:14 +0200)]

go: Fix gccgo -v on Solaris with ld

The Go testsuite's go.sum file ends in

Couldn't determine version of /var/gcc/regression/master/11.4-gcc-64/build/gcc/gccgo

on Solaris.  It turns out this happens because gccgo -v is confused:

[...]
gcc version 15.0.0 20240531 (experimental) [master a0d60660f2aae2d79685f73d568facb2397582d8] (GCC)
COMPILER_PATH=./:/usr/ccs/bin/
LIBRARY_PATH=./:/lib/amd64/:/usr/lib/amd64/:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-g1' '-B' './' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a.'
./collect2 -V -M ./libgcc-unwind.map -Qy /usr/lib/amd64/crt1.o ./crtp.o /usr/lib/amd64/crti.o /usr/lib/amd64/values-Xa.o /usr/lib/amd64/values-xpg6.o ./crtbegin.o -L. -L/lib/amd64 -L/usr/lib/amd64 -t -lgcc_s -lgcc -lc -lgcc_s -lgcc ./crtend.o /usr/lib/amd64/crtn.o
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.3297
Undefined first referenced
symbol       in file
main                                /usr/lib/amd64/crt1.o
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status

trying to invoke the linker without adding any object file.  This only
happens when Solaris ld is in use.  gccgo passes -t to the linker in
that case, but does it unconditionally, even with -v.

When configured to use GNU ld, gccgo -v is fine instead.

This patch avoids this by restricting the -t to actually linking.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (ld and gld).

2024-06-05  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/go:
* gospec.cc (lang_specific_driver) [TARGET_SOLARIS !USE_GLD]: Only
add -t if linking.

commit | commitdiff | tree

Rainer Orth [Fri, 7 Jun 2024 08:12:09 +0000 (10:12 +0200)]

testsuite: go: Require split-stack support for go.test/test/index0.go [PR87589]

The index0-out.go test FAILs on Solaris (SPARC and x86, 32 and 64-bit),
as well as several others:

FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments

The test SEGVs because it tries a stack acess way beyond the stack
area.  As Ian analyzed in the PR, the testcase currently requires
split-stack support, so this patch requires just that.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

2024-06-05  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
PR go/87589
* go.test/go-test.exp (go-gc-tests): Require split-stack support
for index0.go.

commit | commitdiff | tree

Andre Vehreschild [Wed, 19 Jul 2023 09:57:43 +0000 (11:57 +0200)]

Fix returned type to be allocatable for user-functions.

The returned type of user-defined function returning a
class object was not detected and handled correctly, which
lead to memory leaks.

PR fortran/90072

gcc/fortran/ChangeLog:

* expr.cc (gfc_is_alloc_class_scalar_function): Detect
allocatable class return types also for user-defined
functions.
* trans-expr.cc (gfc_conv_procedure_call): Same.
(trans_class_vptr_len_assignment): Compute vptr len
assignment correctly for user-defined functions.

gcc/testsuite/ChangeLog:

* gfortran.dg/class_77.f90: New test.

commit | commitdiff | tree

Alexandre Oliva [Wed, 29 May 2024 05:52:07 +0000 (02:52 -0300)]

enable adjustment of return_pc debug attrs

This patch introduces infrastructure for targets to add an offset to
the label issued after the call_insn to set the call_return_pc
attribute.  This will be used on rs6000, that sometimes issues another
instruction after the call proper as part of a call insn.

for  gcc/ChangeLog

* target.def (call_offset_return_label): New hook.
* doc/tm.texi.in (TARGET_CALL_OFFSET_RETURN_LABEL): Add
placeholder.
* doc/tm.texi: Rebuild.
* dwarf2out.cc (struct call_arg_loc_node): Record call_insn
instead of call_arg_loc_note.
(add_AT_lbl_id): Add optional offset argument.
(gen_call_site_die): Compute and pass on a return pc offset.
(gen_subprogram_die): Move call_arg_loc_note computation...
(dwarf2out_var_location): ... from here.  Set call_insn.

commit | commitdiff | tree

liuhongt [Fri, 7 Jun 2024 01:29:24 +0000 (09:29 +0800)]

Add additional option --param max-completely-peeled-insns=200 for power64*-*-*

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr112325.c:Add additional option --param
max-completely-peeled-insns=200 for power64*-*-*.

commit | commitdiff | tree

Pan Li [Mon, 3 Jun 2024 02:43:10 +0000 (10:43 +0800)]

RISC-V: Add testcases for scalar unsigned SAT_ADD form 5

After the middle-end support the form 5 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 5 of unsigned .SAT_ADD.

Form 5:
  #define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T(T x, T y) \
  { \
    return (T)(x + y) < x ? -1 : (x + y); \
  }

Passed the riscv fully regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for form 5.
* gcc.target/riscv/sat_u_add-21.c: New test.
* gcc.target/riscv/sat_u_add-22.c: New test.
* gcc.target/riscv/sat_u_add-23.c: New test.
* gcc.target/riscv/sat_u_add-24.c: New test.
* gcc.target/riscv/sat_u_add-run-21.c: New test.
* gcc.target/riscv/sat_u_add-run-22.c: New test.
* gcc.target/riscv/sat_u_add-run-23.c: New test.
* gcc.target/riscv/sat_u_add-run-24.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Mon, 3 Jun 2024 02:33:15 +0000 (10:33 +0800)]

RISC-V: Add testcases for scalar unsigned SAT_ADD form 4

After the middle-end support the form 4 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 4 of unsigned .SAT_ADD.

Form 4:
  #define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Passed the rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for form 4.
* gcc.target/riscv/sat_u_add-17.c: New test.
* gcc.target/riscv/sat_u_add-18.c: New test.
* gcc.target/riscv/sat_u_add-19.c: New test.
* gcc.target/riscv/sat_u_add-20.c: New test.
* gcc.target/riscv/sat_u_add-run-17.c: New test.
* gcc.target/riscv/sat_u_add-run-18.c: New test.
* gcc.target/riscv/sat_u_add-run-19.c: New test.
* gcc.target/riscv/sat_u_add-run-20.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Mon, 3 Jun 2024 02:24:47 +0000 (10:24 +0800)]

RISC-V: Add testcases for scalar unsigned SAT_ADD form 3

After the middle-end support the form 3 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 3 of unsigned .SAT_ADD.

Form 3:
  #define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Passed the rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for form 3.
* gcc.target/riscv/sat_u_add-13.c: New test.
* gcc.target/riscv/sat_u_add-14.c: New test.
* gcc.target/riscv/sat_u_add-15.c: New test.
* gcc.target/riscv/sat_u_add-16.c: New test.
* gcc.target/riscv/sat_u_add-run-13.c: New test.
* gcc.target/riscv/sat_u_add-run-14.c: New test.
* gcc.target/riscv/sat_u_add-run-15.c: New test.
* gcc.target/riscv/sat_u_add-run-16.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Mon, 3 Jun 2024 01:35:49 +0000 (09:35 +0800)]

RISC-V: Add testcases for scalar unsigned SAT_ADD form 2

After the middle-end support the form 2 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 2 of unsigned .SAT_ADD.

Form 2:

  #define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T(T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_add_overflow (x, y, &ret); \
    return (T)(-overflow) | ret; \
  }

Passed the rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for form 2.
* gcc.target/riscv/sat_u_add-10.c: New test.
* gcc.target/riscv/sat_u_add-11.c: New test.
* gcc.target/riscv/sat_u_add-12.c: New test.
* gcc.target/riscv/sat_u_add-9.c: New test.
* gcc.target/riscv/sat_u_add-run-10.c: New test.
* gcc.target/riscv/sat_u_add-run-11.c: New test.
* gcc.target/riscv/sat_u_add-run-12.c: New test.
* gcc.target/riscv/sat_u_add-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Pan Li [Wed, 29 May 2024 06:15:45 +0000 (14:15 +0800)]

RISC-V: Add testcases for scalar unsigned SAT_ADD form 1

After the middle-end support the form 1 of unsigned SAT_ADD and
the RISC-V backend implement the scalar .SAT_ADD, add more test
case to cover the form 1 of unsigned .SAT_ADD.

Form 1:

  #define SAT_ADD_U_1(T)                   \
  T sat_add_u_1_##T(T x, T y)              \
  {                                        \
    return (T)(x + y) >= x ? (x + y) : -1; \
  }

Passed the riscv fully regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for form 1.
* gcc.target/riscv/sat_u_add-5.c: New test.
* gcc.target/riscv/sat_u_add-6.c: New test.
* gcc.target/riscv/sat_u_add-7.c: New test.
* gcc.target/riscv/sat_u_add-8.c: New test.
* gcc.target/riscv/sat_u_add-run-5.c: New test.
* gcc.target/riscv/sat_u_add-run-6.c: New test.
* gcc.target/riscv/sat_u_add-run-7.c: New test.
* gcc.target/riscv/sat_u_add-run-8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

GCC Administrator [Fri, 7 Jun 2024 00:16:38 +0000 (00:16 +0000)]

Daily bump.

commit | commitdiff | tree

Pan Li [Thu, 6 Jun 2024 01:19:53 +0000 (09:19 +0800)]

Match: Support more form for scalar unsigned SAT_ADD

After we support one gassign form of the unsigned .SAT_ADD,  we
would like to support more forms including both the branch and
branchless.  There are 5 other forms of .SAT_ADD,  list as below:

Form 1:
  #define SAT_ADD_U_1(T) \
  T sat_add_u_1_##T(T x, T y) \
  { \
    return (T)(x + y) >= x ? (x + y) : -1; \
  }

Form 2:
  #define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T(T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_add_overflow (x, y, &ret); \
    return (T)(-overflow) | ret; \
  }

Form 3:
  #define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Form 4:
  #define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Form 5:
  #define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T(T x, T y) \
  { \
    return (T)(x + y) < x ? -1 : (x + y); \
  }

Take the forms 3 of above as example:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

Before this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  long unsigned int _2;
  uint64_t _3;
  __complex__ long unsigned int _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
    goto <bb 4>; [35.00%]
  else
    goto <bb 3>; [65.00%]
;;    succ:       4
;;                3

;;   basic block 3, loop depth 0
;;    pred:       2
  _1 = REALPART_EXPR <_6>;
;;    succ:       4

;;   basic block 4, loop depth 0
;;    pred:       3
;;                2
  # _3 = PHI <_1(3), 18446744073709551615(2)>
  return _3;
;;    succ:       EXIT
}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _12;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _12;
;;    succ:       EXIT
}

The flag '^' acts on cond_expr will generate matching code similar as below:

else if (gphi *_a1 = dyn_cast <gphi *> (_d1))
  {
    basic_block _b1 = gimple_bb (_a1);
    if (gimple_phi_num_args (_a1) == 2)
      {
        basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
        basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
        basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1))
                            ? _pb_0_1 : _pb_1_1;
        basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1))
                                  ? _pb_1_1 : _pb_0_1;
        gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
        if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
          && EDGE_COUNT (_other_db_1->succs) == 1
          && EDGE_PRED (_other_db_1, 0)->src == _db_1)
          {
            tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
            tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
            tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node,
                               _cond_lhs_1, _cond_rhs_1);
            bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
            tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
            tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
            ....

The below test suites are passed for this patch.
* The x86 bootstrap test.
* The x86 fully regression test.
* The riscv fully regression test.

gcc/ChangeLog:

* doc/match-and-simplify.texi: Add doc for the matching flag '^'.
* genmatch.cc (cmp_operand): Add match_phi comparation.
(dt_node::gen_kids_1): Add cond_expr bool flag for phi match.
(dt_operand::gen_phi_on_cond): Add new func to gen phi matching
on cond_expr.
(parser::parse_expr): Add handling for the expr flag '^'.
* match.pd: Add more form for unsigned .SAT_ADD.
* tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
new func impl to build call for phi gimple.
(match_unsigned_saturation_add): Add new func impl to match the
.SAT_ADD for phi gimple.
(math_opts_dom_walker::after_dom_children): Add phi matching
try for all gimple phi stmt.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Jakub Jelinek [Thu, 6 Jun 2024 20:12:11 +0000 (22:12 +0200)]

c: Fix up pointer types to may_alias structures [PR114493]

The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
          gcc_assert (TYPE_CANONICAL (t2) != t2
                      && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on on the struct definition may_alias attribute was used.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes) but pointers created for it
are because of the "may_alias" attribute TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL, because while that is created with !can_alias_all
argument, we later set it because of the "may_alias" attribute on the
to_type.

This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias is added.

The following patch does that in the C FE as well.

2024-06-06  Jakub Jelinek  <jakub@redhat.com>

PR c/114493
* c-decl.cc (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.

* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.

commit | commitdiff | tree

Pengxuan Zheng [Fri, 31 May 2024 00:53:23 +0000 (17:53 -0700)]

aarch64: Add vector floating point extend pattern [PR113880, PR113869]

This patch adds vector floating point extend pattern for V2SF->V2DF and
V4HF->V4SF conversions by renaming the existing aarch64_float_extend_lo_<Vwide>
pattern to the standard optab one, i.e., extend<mode><Vwide>2. This allows the
vectorizer to vectorize certain floating point widening operations for the
aarch64 target.

PR target/113880
PR target/113869

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (VAR1): Remap float_extend_lo_
builtin codes to standard optab ones.
* config/aarch64/aarch64-simd.md (aarch64_float_extend_lo_<Vwide>): Rename
to...
(extend<mode><Vwide>2): ... This.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/extend-vec.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>

commit | commitdiff | tree

Gaius Mulley [Thu, 6 Jun 2024 18:27:56 +0000 (19:27 +0100)]

modula2: Simplify REAL/LONGREAL/SHORTREAL node creation.

This patch simplifies the real type build functions by using
the default float_type_node, double_type_node rather than create
new nodes. It also uses the default GCC long_double_type_node
or float128_type_nodes for longreal.

gcc/m2/ChangeLog:

* gm2-gcc/m2type.cc (build_m2_short_real_node): Rewrite
to use the default float_type_node.
(build_m2_real_node): Rewrite to use the default
double_type_node.
(build_m2_long_real_node): Rewrite to use the default
long_double_type_node or float128_type_node.

Co-Authored-By: Kewen.Lin <linkw@linux.ibm.com>
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

commit | commitdiff | tree

Uros Bizjak [Thu, 6 Jun 2024 17:18:41 +0000 (19:18 +0200)]

testsuite/i386: Add vector sat_sub testcases [PR112600]

PR middle-end/112600

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-2a.c: New test.
* gcc.target/i386/pr112600-2b.c: New test.

commit | commitdiff | tree

Andrew Pinski [Thu, 30 May 2024 14:59:00 +0000 (07:59 -0700)]

Plugins: Add label-text.h to CPPLIB_H so it will be installed [PR115288]

After r15-874-g9bda2c4c81b668, out of tree plugins won't compile
as the new libcpp header file label-text.h is not installed.

This adds the new header file to CPPLIB_H which is used for
the plugin headers to install.

Committed as obvious after a build and install and make sure
the new header file is installed.

gcc/ChangeLog:

PR plugins/115288
* Makefile.in (CPPLIB_H): Add label-text.h.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

commit | commitdiff | tree

Richard Ball [Thu, 6 Jun 2024 15:28:00 +0000 (16:28 +0100)]

aarch64: Add missing ACLE macro for NEON-SVE Bridge

__ARM_NEON_SVE_BRIDGE was missed in the original patch and is
added by this patch.

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Add missing __ARM_NEON_SVE_BRIDGE.

commit | commitdiff | tree

Richard Ball [Thu, 6 Jun 2024 15:10:14 +0000 (16:10 +0100)]

arm: Fix CASE_VECTOR_SHORTEN_MODE for thumb2.

The CASE_VECTOR_SHORTEN_MODE query is missing some equals signs
which causes suboptimal codegen due to missed optimisation
opportunities. This patch also adds a test for thumb2
switch statements as none exist currently.

gcc/ChangeLog:
PR target/115353
* config/arm/arm.h (enum arm_auto_incmodes):
Correct CASE_VECTOR_SHORTEN_MODE query.

gcc/testsuite/ChangeLog:

* gcc.target/arm/thumb2-switchstatement.c: New test.

commit | commitdiff | tree

Andre Vieira [Thu, 6 Jun 2024 15:02:50 +0000 (16:02 +0100)]

arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]

This patch adds missing assembly directives to the CMSE library wrapper to call
functions with attribute cmse_nonsecure_call. Without the .type directive the
linker will fail to produce the correct veneer if a call to this wrapper
function is to far from the wrapper itself. The .size was added for
completeness, though we don't necessarily have a usecase for it.

libgcc/ChangeLog:

PR target/115360
* config/arm/cmse_nonsecure_call.S: Add .type and .size directives.

commit | commitdiff | tree

Tobias Burnus [Thu, 6 Jun 2024 14:37:55 +0000 (16:37 +0200)]

libgomp.texi (nvptx): Add missing preposition

libgomp/
* libgomp.texi (nvptx): Add missing preposition.

commit | commitdiff | tree

Tamar Christina [Thu, 6 Jun 2024 13:35:48 +0000 (14:35 +0100)]

AArch64: correct constraint on Upl early clobber alternatives

I made an oversight in the previous patch, where I added a ?Upa
alternative to the Upl cases. This causes it to create the tie
between the larger register file rather than the constrained one.

This fixes the affected patterns.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>,
*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
@aarch64_pred_cmp<cmp_op><mode>_wide,
*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Fix Upl tie alternative.
* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Fix
Upl tie alternative.

commit | commitdiff | tree

Thomas Schwinge [Wed, 5 Jun 2024 11:13:24 +0000 (13:13 +0200)]

nvptx, libgfortran: Switch out of "minimal" mode

..., in order to enable (portions of) Fortran I/O, for example.

libgfortran/
* configure.ac: No longer set 'LIBGFOR_MINIMAL' for nvptx.
* configure: Regenerate.
libgomp/
* libgomp.texi (nvptx): Update.
* testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
* testsuite/libgomp.fortran/target-print-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/error_stop-2-nvptx.f: New.
* testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
* testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Adjust.
* testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/stop-2-nvptx.f: New.
* testsuite/libgomp.oacc-fortran/stop-2.f: Adjust.

Co-authored-by: Andrew Stubbs <ams@gcc.gnu.org>

commit | commitdiff | tree

Thomas Schwinge [Fri, 31 May 2024 15:04:39 +0000 (17:04 +0200)]

nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable [PR97384, PR105274]

... as a means to manually set the "native" GPU thread stack size.

PR libgomp/97384
PR libgomp/105274
libgomp/
* plugin/cuda-lib.def (cuCtxSetLimit): Add.
* plugin/plugin-nvptx.c (nvptx_open_device): Handle
'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable.

commit | commitdiff | tree

Thomas Schwinge [Wed, 5 Jun 2024 11:11:04 +0000 (13:11 +0200)]

nvptx, libgcc: Stub unwinding implementation

Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.

The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.

libgcc/ChangeLog:

* config/nvptx/t-nvptx: Add unwind-nvptx.c.
* config/nvptx/unwind-nvptx.c: New file.

Co-authored-by: Andrew Stubbs <ams@gcc.gnu.org>

commit | commitdiff | tree

Thomas Schwinge [Wed, 5 Jun 2024 10:40:50 +0000 (12:40 +0200)]

nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld'

This extends commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'"
for offloading.

libgcc/
* config/nvptx/gbl-ctors.c ["mgomp"]
(__do_global_ctors__entry__mgomp)
(__do_global_dtors__entry__mgomp): New.
[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
(nvptx_close_device, GOMP_OFFLOAD_load_image)
(GOMP_OFFLOAD_unload_image): Call it.

commit | commitdiff | tree

Thomas Schwinge [Fri, 10 May 2024 10:50:23 +0000 (12:50 +0200)]

nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred'

For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'): the '0xffffffff'
bitmask isn't correct if not all 32 threads of a warp are active.  The same
issue/fix, I suppose but have not verified, would apply if we were to allow for
OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.

We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which evidently appears to do the right thing.  (I've tested '-muniform-simt'
code executing single-threaded.)

The change that I proposed on 2022-12-15 was to emit PTX code to calculate
'(1 << %ntid.x) - 1' as the actual bitmask to use instead of '0xffffffff'.
This works, but the PTX JIT generates SASS code to do this computation.

In turn, this change now uses PTX 'vote.all.pred' -- which even simplifies upon
the original code a little bit, see the following examplary SASS 'diff' before
vs. after this change:

    [...]
              /*[...]*/                   SYNC                                                        (*"BRANCH_TARGETS .L_x_332"*)        }
      .L_x_332:
    -         /*[...]*/                   VOTE.ANY R9, PT, PT ;
    +         /*[...]*/                   VOTE.ALL P1, PT ;
    -         /*[...]*/                   ISETP.NE.U32.AND P1, PT, R9, -0x1, PT ;
    -         /*[...]*/              @!P1 BRA `(.L_x_333) ;
    +         /*[...]*/               @P1 BRA `(.L_x_333) ;
              /*[...]*/                   BPT.TRAP 0x1 ;
      .L_x_333:
    -         /*[...]*/               @P1 EXIT ;
    +         /*[...]*/              @!P1 EXIT ;
    [...]

gcc/
* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
non-full-warp execution, via 'vote.all.pred'.
gcc/testsuite/
* gcc.target/nvptx/nvptx.exp
(check_effective_target_default_ptx_isa_version_at_least_6_0):
New.
* gcc.target/nvptx/uniform-simt-2.c: Adjust.
* gcc.target/nvptx/uniform-simt-5.c: New.

commit | commitdiff | tree

Thomas Schwinge [Wed, 5 Jun 2024 12:34:06 +0000 (14:34 +0200)]

Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]"

PR target/85463
libgfortran/
* runtime/minimal.c [__nvptx__] (exit): Don't override.
libgomp/
* config/nvptx/error.c (exit): Don't override.
* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.

commit | commitdiff | tree

Pan Li [Wed, 29 May 2024 08:18:31 +0000 (16:18 +0800)]

Vect: Support IFN SAT_SUB for unsigned vector int

This patch would like to support the .SAT_SUB for the unsigned
vector int.  Given we have below example code:

void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
}

Before this patch:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
  ivtmp_56 = _77 * 8;
  vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
  vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);

  mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
  _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });

  .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
  vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
  vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
  vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
  ivtmp_76 = ivtmp_75 - _77;
  ...
}

After this patch:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
  ivtmp_60 = _76 * 8;
  vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
  vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);

  vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);

  .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, vect_patt_37.11_68);
  vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
  vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
  vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
  ivtmp_75 = ivtmp_74 - _76;
  ...
}

The below test suites are passed for this patch
* The x86 bootstrap test.
* The x86 fully regression test.
* The riscv fully regression tests.

gcc/ChangeLog:

* match.pd: Add new form for vector mode recog.
* tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
new match func decl;
(vect_recog_build_binary_gimple_call): Extract helper func to
build gcall with given internal_fn.
(vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Michal Jires [Tue, 9 Jan 2024 16:49:34 +0000 (17:49 +0100)]

lto: Remove random_seed from section name.

This patch removes suffixes from section names during LTO linking.

These suffixes were originally added for ld -r to work (PR lto/44992).
They were added to all LTO object files, but are only useful before WPA.
After that they waste space, and if kept random, make LTO caching impossible.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.

gcc/lto/ChangeLog:

* lto-common.cc (lto_section_with_id): Dont load suffix during LTRANS.

commit | commitdiff | tree

Michal Jires [Fri, 17 Nov 2023 20:16:37 +0000 (21:16 +0100)]

lto: Skip flag OPT_fltrans_output_list_.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.

commit | commitdiff | tree

Robin Dapp [Thu, 6 Jun 2024 07:32:28 +0000 (09:32 +0200)]

RISC-V: Regenerate opt urls.

I wasn't aware that I needed to regenerate the opt urls when
adding an option. This patch does that.

gcc/ChangeLog:

* config/riscv/riscv.opt.urls: Regenerate.

commit | commitdiff | tree

Hongyu Wang [Wed, 8 May 2024 03:08:42 +0000 (11:08 +0800)]

[APX CCMP] Support ccmp for float compare

The ccmp insn itself doesn't support fp compare, but x86 has fp comi
insn that changes EFLAG which can be the scc input to ccmp. Allow
scalar fp compare in ix86_gen_ccmp_first except ORDERED/UNORDERD
compare which can not be identified in ccmp.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_gen_ccmp_first):
Add fp compare and check the allowed fp compare type.
(ix86_gen_ccmp_next): Adjust compare_code input to ccmp for
fp compare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ccmp-1.c: Add test for fp compare.
* gcc.target/i386/apx-ccmp-2.c: Likewise.

commit | commitdiff | tree

Hongyu Wang [Tue, 9 Apr 2024 08:05:26 +0000 (16:05 +0800)]

[APX CCMP] Adjust startegy for selecting ccmp candidates

For general ccmp scenario, the tree sequence is like

_1 = (a < b)
_2 = (c < d)
_3 = _1 & _2

current ccmp expanding will try to swap compare order for _1 and _2,
compare the expansion cost/cost2 for expanding _1 or _2 first, then
return the sequence with lower cost.

It is possible that one expansion succeeds and the other fails.
For example, x86 has int ccmp but not fp ccmp, so a combined fp and
int comparison must be ordered such that the fp comparison happens
first. The costs are not meaningful for failed expansions.

Check the expand_ccmp_next result ret and ret2, returns the valid one
before cost comparison.

gcc/ChangeLog:

* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
expand_ccmp_next, returns the valid one first instead of
comparing cost.

commit | commitdiff | tree

Hongyu Wang [Wed, 27 Mar 2024 02:13:06 +0000 (10:13 +0800)]

[APX CCMP] Support APX CCMP

APX CCMP feature implements conditional compare which executes compare
when EFLAGS matches certain condition.

CCMP introduces default flags value (dfv), when conditional compare does
not execute, it will directly set the flags according to dfv.

The instruction goes like

ccmpeq {dfv=sf,of,cf,zf} %rax, %r16

For this instruction, it will test EFLAGS regs if it matches conditional
code EQ, if yes, compare %rax and %r16 like legacy cmp. If no, the
EFLAGS will be updated according to dfv, which means SF,OF,CF,ZF are
set. PF will be set according to CF in dfv, and AF will always be
cleared.

The dfv part can be a combination of sf,of,cf,zf, like {dfv=cf,zf} which
sets CF and ZF only and clear others, or {dfv=} which clears all EFLAGS.

To enable CCMP, we implemented the target hook TARGET_GEN_CCMP_FIRST and
TARGET_GEN_CCMP_NEXT to reuse the current ccmp infrastructure. Also we
extended the cstorem4 optab to support storing different CCmode to fit
current ccmp infrasturcture.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_gen_ccmp_first): New function
that test if the first compare can be generated.
(ix86_gen_ccmp_next): New function to emit a simgle compare and ccmp
sequence.
* config/i386/i386-opts.h (enum apx_features): Add apx_ccmp.
* config/i386/i386-protos.h (ix86_gen_ccmp_first): New proto
declare.
(ix86_gen_ccmp_next): Likewise.
(ix86_get_flags_cc): Likewise.
* config/i386/i386.cc (ix86_flags_cc): New enum.
(ix86_ccmp_dfv_mapping): New string array to map conditional
code to dfv.
(ix86_print_operand): Handle special dfv flag for CCMP.
(ix86_get_flags_cc): New function to return x86 CC enum.
(TARGET_GEN_CCMP_FIRST): Define.
(TARGET_GEN_CCMP_NEXT): Likewise.
* config/i386/i386.h (TARGET_APX_CCMP): Define.
* config/i386/i386.md (@ccmp<mode>): New define_insn to support
ccmp.
(UNSPEC_APX_DFV): New unspec for ccmp dfv.
(ALL_CC): New mode iterator.
(cstorecc4): Change to ...
(cstore<mode>4) ... this, use ALL_CC to loop through all
available CCmodes.
* config/i386/i386.opt (apx_ccmp): Add enum value for ccmp.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ccmp-1.c: New compile test.
* gcc.target/i386/apx-ccmp-2.c: New runtime test.

commit | commitdiff | tree

Hongyu Wang [Thu, 6 Jun 2024 05:00:26 +0000 (13:00 +0800)]

[APX] Adjust target-support check [PR 115341]

Current target apxf check does not specify sub-features that assembler
supports, so the check with older binutils will fail at assemble stage
for new apx features like NF,CCMP or CFCMOV. Adjust the assembler check
for all apx subfeatures.

gcc/testsuite/ChangeLog:

PR target/115341
* lib/target-supports.exp (check_effective_target_apxf):
Check for all apx sub-features.

commit | commitdiff | tree

Richard Biener [Tue, 5 Mar 2024 14:46:24 +0000 (15:46 +0100)]

Allow single-lane SLP in-order reductions

The single-lane case isn't different from non-SLP, no re-association
implied. But the transform stage cannot handle a conditional reduction
op which isn't checked during analysis - this makes it work, exercised
with a single-lane non-reduction-chain by gcc.target/i386/pr112464.c

* tree-vect-loop.cc (vectorizable_reduction): Allow
single-lane SLP in-order reductions.
(vectorize_fold_left_reduction): Handle SLP reduction with
conditional reduction op.

commit | commitdiff | tree

Richard Biener [Tue, 5 Mar 2024 14:28:58 +0000 (15:28 +0100)]

Add double reduction support for SLP vectorization

The following makes double reduction vectorization work when
using (single-lane) SLP vectorization.

* tree-vect-loop.cc (vect_analyze_scalar_cycles_1): Queue
double reductions in LOOP_VINFO_REDUCTIONS.
(vect_create_epilog_for_reduction): Remove asserts disabling
SLP for double reductions.
(vectorizable_reduction): Analyze SLP double reductions
only once and start off the correct places.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Allow
vect_double_reduction_def.
(vect_build_slp_tree_2): Fix condition for the ignored
reduction initial values.
* tree-vect-stmts.cc (vect_analyze_stmt): Allow
vect_double_reduction_def.

commit | commitdiff | tree

Richard Biener [Fri, 1 Mar 2024 13:39:08 +0000 (14:39 +0100)]

Allow single-lane COND_REDUCTION vectorization

The following enables single-lane COND_REDUCTION vectorization.

* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Adjust for single-lane COND_REDUCTION SLP vectorization.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.

commit | commitdiff | tree

Richard Biener [Fri, 23 Feb 2024 15:16:38 +0000 (16:16 +0100)]

Relax COND_EXPR reduction vectorization SLP restriction

Allow one-lane SLP but for the case where we need to swap the arms.

* tree-vect-stmts.cc (vectorizable_condition): Allow
single-lane SLP, but not when we need to swap then and
else clause.

commit | commitdiff | tree

Jakub Jelinek [Thu, 6 Jun 2024 06:30:42 +0000 (08:30 +0200)]

libgomp: Mark Loop transformation constructs as implemented in the implementation status

The implementation has been committed in r15-1037.

2024-06-06 Jakub Jelinek <jakub@redhat.com>

* libgomp.texi (OpenMP 5.1 status): Mark Loop transformation constructs
as implemented.

commit | commitdiff | tree

YunQiang Su [Thu, 6 Jun 2024 04:28:31 +0000 (12:28 +0800)]

MIPS: Need COSTS_N_INSNS in mips_insn_cost

In mips_insn_cost, COSTS_N_INSNS is missing when we return the cost
if count * ratio > 0.

gcc
* config/mips/mips.cc(mips_insn_cost): Add missing COSTS_N_INSNS
to count.

commit | commitdiff | tree

liuhongt [Thu, 6 Jun 2024 03:27:53 +0000 (11:27 +0800)]

Refine testcase for power10.

For power10, there're extra 3 REG_EQUIV notes with (fix:SI. to avoid
the failure. Check (fix:SI is from the pattern not NOTE.

gcc/testsuite/ChangeLog:

PR target/115365
* gcc.dg/pr100927.c: Don't scan fix:SI from the note.

commit | commitdiff | tree

Alexandre Oliva [Thu, 6 Jun 2024 01:43:54 +0000 (22:43 -0300)]

[libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

A proprietary embedded operating system that uses clang as its primary
compiler ships headers that require __clang__ to be defined.  Defining
that macro causes libstdc++ to adopt workarounds that work for clang
but that break for GCC.

So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
rather than for __clang__, so that a GCC variant that adds -D__clang__
to satisfy system headers can also -D_GLIBCXX_CLANG=0 to avoid
workarounds that are not meant for GCC.

I've left fast_float and ryu files alone, their tests for __clang__
don't seem to be harmful for GCC, they don't include bits/c++config,
and patching such third-party files would just make trouble for
updating them without visible benefit.  pstl_config.h, though also
imported, required adjustment.

for  libstdc++-v3/ChangeLog

* include/bits/c++config (_GLIBCXX_CLANG): Define or undefine.
* include/bits/locale_facets_nonio.tcc: Test for it.
* include/bits/stl_bvector.h: Likewise.
* include/c_compatibility/stdatomic.h: Likewise.
* include/experimental/bits/simd.h: Likewise.
* include/experimental/bits/simd_builtin.h: Likewise.
* include/experimental/bits/simd_detail.h: Likewise.
* include/experimental/bits/simd_x86.h: Likewise.
* include/experimental/simd: Likewise.
* include/std/complex: Likewise.
* include/std/ranges: Likewise.
* include/std/variant: Likewise.
* include/pstl/pstl_config.h: Likewise.

commit | commitdiff | tree

liuhongt [Fri, 19 Apr 2024 02:39:53 +0000 (10:39 +0800)]

Adjust rtx_cost for MEM to enable more simplication

For CONST_VECTOR_DUPLICATE_P in constant_pool, it is just broadcast or
variants in ix86_vector_duplicate_simode_const.
Adjust the cost to COSTS_N_INSNS (2) + speed which should be a little
bit larger than broadcast.

gcc/ChangeLog:
PR target/114428
* config/i386/i386.cc (ix86_rtx_costs): Adjust cost for
CONST_VECTOR_DUPLICATE_P in constant_pool.
* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
Remove static.
* config/i386/i386-protos.h (ix86_broadcast_from_constant):
Declare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114428.c: New test.

commit | commitdiff | tree

liuhongt [Fri, 19 Apr 2024 02:29:34 +0000 (10:29 +0800)]

Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

When mask is (1 << (prec - imm) - 1) which is used to clear upper bits
of A, then it can be simplified to LSHIFTRT.

i.e Simplify
(and:v8hi
(ashifrt:v8hi A 8)
(const_vector 0xff x8))
to
(lshifrt:v8hi A 8)

gcc/ChangeLog:

PR target/114428
* simplify-rtx.cc
(simplify_context::simplify_binary_operation_1):
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
specific mask.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114428-1.c: New test.

commit | commitdiff | tree

GCC Administrator [Thu, 6 Jun 2024 00:16:43 +0000 (00:16 +0000)]

Daily bump.

commit | commitdiff | tree

Jonathan Wakely [Wed, 5 Jun 2024 19:46:19 +0000 (20:46 +0100)]

contrib: Fix spelling and capitalization in header-tools

contrib/header-tools/ChangeLog:

* README: Fix spelling and capitalization typos.
* gcc-order-headers: Fix spelling typo.

commit | commitdiff | tree

Sundeep KOKKONDA [Fri, 29 Mar 2024 10:22:11 +0000 (03:22 -0700)]

contrib: header-tools scripts updated to python3

The scripts in contrib/header-tools/ are incompatible with python3.
This updates them to use python3.

contrib/header-tools/ChangeLog:

* count-headers: Adapt to Python 3.
* gcc-order-headers: Likewise.
* graph-header-logs: Likewise.
* graph-include-web: Likewise.
* headerutils.py: Likewise.
* included-by: Likewise.
* reduce-headers: Likewise.
* replace-header: Likewise.
* show-headers: Likewise.

Signed-off-by: Sundeep KOKKONDA <sundeep.kokkonda@windriver.com>

commit | commitdiff | tree

Robin Dapp [Mon, 13 May 2024 20:05:57 +0000 (22:05 +0200)]

check_GNU_style: Use raw strings.

This silences some warnings when using check_GNU_style.

contrib/ChangeLog:

* check_GNU_style_lib.py: Use raw strings for regexps.

commit | commitdiff | tree

Robin Dapp [Tue, 28 May 2024 19:19:26 +0000 (21:19 +0200)]

RISC-V: Introduce -mvector-strict-align.

this patch disables movmisalign by default and introduces
the -mno-vector-strict-align option to override it and re-enable
movmisalign. For now, generic-ooo is the only uarch that supports
misaligned vector access.

The patch also adds a check_effective_target_riscv_v_misalign_ok to
the testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

Changes from v3:
- Adressed Kito's comments.
- Made -mscalar-strict-align a real alias.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and map to riscv_vector_unaligned_access_p.
* config/riscv/riscv.opt: Add -mvector-strict-align.
* config/riscv/riscv.cc (struct riscv_tune_param): Add
vector_unaligned_access.
(riscv_override_options_internal): Set
riscv_vector_unaligned_access_p.
* doc/invoke.texi: Document -mvector-strict-align.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
-mno-vector-strict-align.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.

commit | commitdiff | tree

Tamar Christina [Wed, 5 Jun 2024 18:32:16 +0000 (19:32 +0100)]

AArch64: enable new predicate tuning for Neoverse cores.

This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2.
It is kept off for generic codegen.

Note the reason for the +sve even though they are in aarch64-sve.exp is if the
testsuite is ran with a forced SVE off option, e.g. -march=armv8-a+nosve then
the intrinsics end up being disabled because the -march is preferred over the
-mcpu even though the -mcpu comes later.

This prevents the tests from failing in such runs.

gcc/ChangeLog:

* config/aarch64/tuning_models/neoversen2.h (neoversen2_tunings): Add
AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
* config/aarch64/tuning_models/neoversev1.h (neoversev1_tunings): Add
AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
* config/aarch64/tuning_models/neoversev2.h (neoversev2_tunings): Add
AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pred_clobber_1.c: New test.
* gcc.target/aarch64/sve/pred_clobber_2.c: New test.
* gcc.target/aarch64/sve/pred_clobber_3.c: New test.
* gcc.target/aarch64/sve/pred_clobber_4.c: New test.

commit | commitdiff | tree

Tamar Christina [Wed, 5 Jun 2024 18:31:39 +0000 (19:31 +0100)]

AArch64: add new alternative with early clobber to patterns

This patch adds new alternatives to the patterns which are affected. The new
alternatives with the conditional early clobbers are added before the normal
ones in order for LRA to prefer them in the event that we have enough free
registers to accommodate them.

In case register pressure is too high the normal alternatives will be preferred
before a reload is considered as we rather have the tie than a spill.

Tests are in the next patch.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (and<mode>3,
@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
*<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>,
*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
@aarch64_pred_cmp<cmp_op><mode>_wide,
*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>,
*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
@aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc,
*aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add
new early clobber
alternative.
* config/aarch64/aarch64-sve2.md
(@aarch64_pred_<sve_int_op><mode>): Likewise.

commit | commitdiff | tree

Tamar Christina [Wed, 5 Jun 2024 18:31:11 +0000 (19:31 +0100)]

AArch64: add new tuning param and attribute for enabling conditional early clobber

This adds a new tuning parameter AARCH64_EXTRA_TUNE_AVOID_PRED_RMW for AArch64 to
allow us to conditionally enable the early clobber alternatives based on the
tuning models.

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def
(AVOID_PRED_RMW): New.
* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
* config/aarch64/aarch64.md (pred_clobber): New.
(arch_enabled): Use it.

commit | commitdiff | tree

Tamar Christina [Wed, 5 Jun 2024 18:30:39 +0000 (19:30 +0100)]

AArch64: convert several predicate patterns to new compact syntax

This converts the single alternative patterns to the new compact syntax such
that when I add the new alternatives it's clearer what's being changed.

Note that this will spew out a bunch of warnings from geninsn as it'll warn that
@ is useless for a single alternative pattern. These are not fatal so won't
break the build and are only temporary.

No change in functionality is expected with this patch.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (and<mode>3,
@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
*<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
@aarch64_pred_cmp<cmp_op><mode>_wide,
*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc,
*aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>,
*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z,
*aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc,
*aarch64_rdffr_cc): Convert to compact syntax.
* config/aarch64/aarch64-sve2.md
(@aarch64_pred_<sve_int_op><mode>): Likewise.

commit | commitdiff | tree

Jakub Jelinek [Wed, 5 Jun 2024 17:10:26 +0000 (19:10 +0200)]

openmp: OpenMP loop transformation support

This patch is largely rewritten version of the
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631764.html
patch set which I've promissed to adjust the way I'd like it but didn't
get to it until now.
The previous series together in diffstat was
176 files changed, 12107 insertions(+), 298 deletions(-)
This patch is
197 files changed, 10843 insertions(+), 212 deletions(-)
and diff between the old series and new patch is
268 files changed, 8053 insertions(+), 9231 deletions(-)

Only the 5.1/5.2 tile/unroll constructs are supported, in various
places some preparations for the other 6.0 loop transformations
constructs (interchange/reverse/fuse) are done, but certainly
not complete and not everywhere.  The important difference is that
because tile/unroll partial map 1:1 the original loops to generated
canonical loops and add another set of generated loops without canonical
form inside of it, the tile/unroll partial constructs are terminal
for the generated loop, one can't have some loops from the tile or
unroll partial and some further loops from inside the body of that
construct.
The GENERIC representation attempts to match what the standard specifies,
so there are separate OMP_TILE and OMP_UNROLL trees.  If for a particular
loop in a loop nest of some OpenMP loop it awaits a generated loop from a
nested loop, or if in OMP_LOOPXFORM_LOWERED OMP_TILE/UNROLL construct
a generated loop has been moved to some surrounding construct, that
particular loop is represented by all NULL_TREEs in the
OMP_FOR_{INIT,COND,INCR,ORIG_DECLS} vector.
The lowering of the loop transforming constructs is done at gimplification
time, at the start of gimplify_omp_for.
I think this way it is more maintainable over magic clauses with various
loop depths on the other looping constructs or the magic OMP_LOOP_TRANS
construct.
Though, I admit I'm still undecided how to represent the OpenMP 6.0
loop transformation case of say:
  #pragma omp for collapse (4)
  for (int i = 0; i < 32; ++i)
  #pragma omp interchange permutation (2, 1)
  #pragma omp reverse
  for (int j = 0; j < 32; ++j)
  #pragma omp reverse
  for (int k = 0; k < 32; ++k)
  for (int l = 0; l < 32; ++l)
    ;
Surely the i loop would go to first vector elements of OMP_FOR_*
of the work-sharing loop, then 2 loops are expecting generated loops
from interchange which would be inside of the body.  But the innermost
l loop isn't part of the interchange, so the question is where to
put it.  One possibility is to have it in the 4th loop of the OMP_FOR,
another possibility would be to add some artificial construct inside
of the OMP_INTERCHANGE and 2 OMP_REVERSE bodies which would contain
the inner loop(s), e.g. it could be OMP_INTERCHANGE without permutation
clause or some artificial ones or whatever.

I've recently raised various unclear things in the 5.1/5.2/TRs versions
regarding loop transformations, in particular
https://github.com/OpenMP/spec/issues/3908
https://github.com/OpenMP/spec/issues/3909
(sorry, private links unless you have OpenMP membership).  Until those
are resolved, I have a sorry on trying to mix generated loops with
non-rectangular loops (way too many questions need to be answered before
that can be done) and similarly for mixing non-perfectly nested loops
with generated loops (again, it can be implemented somehow, but is way
too unclear).  The second issue is mostly about data sharing, which is
ambiguous, the patch makes the artificial iterators of the loops effectively
private in the associated constructs (more like local), but for user
iterators doesn't do anything in particular, so for now one needs to use
explicit data sharing clauses on the non-loop transformation OpenMP looping
constructs or surrounding parallel/task/target etc.

2024-06-05  Jakub Jelinek  <jakub@redhat.com>
    Frederik Harwath  <frederik@codesourcery.com>
    Sandra Loosemore  <sandra@codesourcery.com>

gcc/
* tree.def (OMP_TILE, OMP_UNROLL): New tree codes.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_PARTIAL,
OMP_CLAUSE_FULL and OMP_CLAUSE_SIZES.
* tree.h (OMP_LOOPXFORM_CHECK): Define.
(OMP_LOOPXFORM_LOWERED): Define.
(OMP_CLAUSE_PARTIAL_EXPR): Define.
(OMP_CLAUSE_SIZES_LIST): Define.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Add entries
for OMP_CLAUSE_{PARTIAL,FULL,SIZES}.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_{PARTIAL,FULL,SIZES}.
(dump_generic_node): Handle OMP_TILE and OMP_UNROLL.  Skip printing
loops with NULL OMP_FOR_INIT (node) vector element.
* gimplify.cc (is_gimple_stmt): Handle OMP_TILE and OMP_UNROLL.
(gimplify_omp_taskloop_expr): For SAVE_EXPR use gimplify_save_expr.
(gimplify_omp_loop_xform): New function.
(gimplify_omp_for): Call omp_maybe_apply_loop_xforms and if that
reshuffles what the passed pointer points to, retry or return GS_OK.
Handle OMP_TILE and OMP_UNROLL.
(gimplify_omp_loop): Call omp_maybe_apply_loop_xforms and if that
reshuffles what the passed pointer points to, return GS_OK.
(gimplify_expr): Handle OMP_TILE and OMP_UNROLL.
* omp-general.h (omp_loop_number_of_iterations,
omp_maybe_apply_loop_xforms): Declare.
* omp-general.cc (omp_adjust_for_condition): For LE_EXPR and GE_EXPR
with pointers, don't add/subtract one, but the size of what the
pointer points to.
(omp_loop_number_of_iterations, omp_apply_tile,
find_nested_loop_xform, omp_maybe_apply_loop_xforms): New functions.
gcc/c-family/
* c-common.h (c_omp_find_generated_loop): Declare.
* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_TILE and
OMP_UNROLL.
* c-omp.cc (c_finish_omp_for): Handle generated loops.
(c_omp_is_loop_iterator): Likewise.
(c_find_nested_loop_xform_r, c_omp_find_generated_loop): New
functions.
(c_omp_check_loop_iv): Handle generated loops.  For now sorry
on mixing non-rectangular loop with generated loops.
(c_omp_check_loop_binding_exprs): For now sorry on mixing
imperfect loops with generated loops.
(c_omp_directives): Uncomment tile and unroll entries.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE and
PRAGMA_OMP_UNROLL, change PRAGMA_OMP__LAST_ to the latter.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.
* c-pragma.cc (omp_pragmas_simd): Add tile and unroll omp pragmas.
gcc/c/
* c-parser.cc (c_parser_skip_std_attribute_spec_seq): New function.
(check_omp_intervening_code): Reject imperfectly nested tile.
(c_parser_compound_statement_nostart): If want_nested_loop, use
c_parser_omp_next_tokens_can_be_canon_loop instead of just checking
for RID_FOR keyword.
(c_parser_omp_clause_name): Handle full and partial clause names.
(c_parser_omp_clause_allocate): Remove spurious semicolon.
(c_parser_omp_clause_full, c_parser_omp_clause_partial): New
functions.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.
(c_parser_omp_next_tokens_can_be_canon_loop): New function.
(c_parser_omp_loop_nest): Parse C23 attributes.  Handle tile/unroll
constructs.  Use c_parser_omp_next_tokens_can_be_canon_loop instead
of just checking for RID_FOR keyword.  Only add_stmt (body) if it is
non-NULL.
(c_parser_omp_for_loop): Rename tiling variable to oacc_tiling.  For
OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST.
Use c_parser_omp_next_tokens_can_be_canon_loop instead of just
checking for RID_FOR keyword.  Remove spurious semicolon.  Don't call
c_omp_check_loop_binding_exprs if stmt is NULL.  Skip generated loops.
(c_parser_omp_tile_sizes, c_parser_omp_tile): New functions.
(OMP_UNROLL_CLAUSE_MASK): Define.
(c_parser_omp_unroll): New function.
(c_parser_omp_construct): Handle PRAGMA_OMP_TILE and
PRAGMA_OMP_UNROLL.
* c-typeck.cc (c_finish_omp_clauses): Adjust wording of some of the
conflicting clause diagnostic messages to include word clause.
Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES} and diagnose full vs. partial
conflict.
gcc/cp/
* cp-tree.h (dependent_omp_for_p): Add another tree argument.
* parser.cc (check_omp_intervening_code): Reject imperfectly nested
tile.
(cp_parser_statement_seq_opt): If want_nested_loop, use
cp_parser_next_tokens_can_be_canon_loop instead of just checking
for RID_FOR keyword.
(cp_parser_omp_clause_name): Handle full and partial clause names.
(cp_parser_omp_clause_full, cp_parser_omp_clause_partial): New
functions.
(cp_parser_omp_all_clauses): Formatting fix.  Handle
PRAGMA_OMP_CLAUSE_PARTIAL and PRAGMA_OMP_CLAUSE_FULL.
(cp_parser_next_tokens_can_be_canon_loop): New function.
(cp_parser_omp_loop_nest): Parse C++11 attributes.  Handle tile/unroll
constructs.  Use cp_parser_next_tokens_can_be_canon_loop instead
of just checking for RID_FOR keyword.  Only add_stmt
cp_parser_omp_loop_nest result if it is non-NULL.
(cp_parser_omp_for_loop): Rename tiling variable to oacc_tiling.  For
OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST.
Use cp_parser_next_tokens_can_be_canon_loop instead of just
checking for RID_FOR keyword.  Remove spurious semicolon.  Don't call
c_omp_check_loop_binding_exprs if stmt is NULL.  Skip and/or handle
generated loops.  Remove spurious ()s around & operands.
(cp_parser_omp_tile_sizes, cp_parser_omp_tile): New functions.
(OMP_UNROLL_CLAUSE_MASK): Define.
(cp_parser_omp_unroll): New function.
(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE and
PRAGMA_OMP_UNROLL.
(cp_parser_pragma): Likewise.
* semantics.cc (finish_omp_clauses): Don't call
fold_build_cleanup_point_expr for cases which obviously won't need it,
like checked INTEGER_CSTs.  Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}
and diagnose full vs. partial conflict.  Adjust wording of some of the
conflicting clause diagnostic messages to include word clause.
(finish_omp_for): Use decl equal to global_namespace as a marker for
generated loop.  Pass also body to dependent_omp_for_p.  Skip
generated loops.
(finish_omp_for_block): Skip generated loops.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}.
(tsubst_stmt): Handle OMP_TILE and OMP_UNROLL.  Handle or skip
generated loops.
(dependent_omp_for_p): Add body argument.  If declv vector element
is NULL, find generated loop.
* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_TILE and OMP_UNROLL.
(cp_fold_r): Likewise.
(cp_genericize_r): Likewise.  Skip generated loops.
gcc/fortran/
* gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL,
ST_OMP_END_UNROLL, ST_OMP_TILE and ST_OMP_END_TILE.
(struct gfc_omp_clauses): Add sizes_list, partial, full and erroneous
members.
(enum gfc_exec_op): Add EXEC_OMP_UNROLL and EXEC_OMP_TILE.
(gfc_expr_list_len): Declare.
* match.h (gfc_match_omp_tile, gfc_match_omp_unroll): Declare.
* openmp.cc (gfc_get_location): Declare.
(gfc_free_omp_clauses): Free sizes_list.
(match_oacc_expr_list): Rename to ...
(match_omp_oacc_expr_list): ... this.  Add is_omp argument and
change diagnostic wording if it is true.
(enum omp_mask2): Add OMP_CLAUSE_{FULL,PARTIAL,SIZES}.
(gfc_match_omp_clauses): Parse full, partial and sizes clauses.
(gfc_match_oacc_wait): Use match_omp_oacc_expr_list instead of
match_oacc_expr_list.
(OMP_UNROLL_CLAUSES, OMP_TILE_CLAUSES): Define.
(gfc_match_omp_tile, gfc_match_omp_unroll): New functions.
(resolve_omp_clauses): Diagnose full vs. partial clause conflict.
Resolve sizes clause arguments.
(find_nested_loop_in_chain): Use switch instead of series of ifs.
Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse to
list length of sizes_list if present.
(gfc_resolve_do_iterator): Return for EXEC_OMP_TILE or
EXEC_OMP_UNROLL.
(restructure_intervening_code): Remove spurious ()s around & operands.
(is_outer_iteration_variable): Handle EXEC_OMP_TILE and
EXEC_OMP_UNROLL.
(check_nested_loop_in_chain): Likewise.
(expr_is_invariant): Likewise.
(resolve_omp_do): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.  Diagnose
tile without sizes clause.  Use sizes_list length for count if
non-NULL.  Set code->ext.omp_clauses->erroneous on loops where we've
reported diagnostics.  Sorry for mixing non-rectangular loops with
generated loops.
(omp_code_to_statement): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Parse end tile, end unroll, tile
and unroll.  Move nothing entry alphabetically.
(case_exec_markers): Add ST_OMP_TILE and ST_OMP_UNROLL.
(gfc_ascii_statement): Handle ST_OMP_END_TILE, ST_OMP_END_UNROLL,
ST_OMP_TILE and ST_OMP_UNROLL.
(parse_omp_do): Add nested argument.  Handle ST_OMP_TILE and
ST_OMP_UNROLL.
(parse_omp_structured_block): Adjust parse_omp_do caller.
(parse_executable): Likewise.  Handle ST_OMP_TILE and ST_OMP_UNROLL.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE and
EXEC_OMP_UNROLL.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans.cc (trans_code): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle full, partial and
sizes clauses.  Use tree_cons + nreverse instead of
temporary vector and build_tree_list_vec for tile_list handling.
(gfc_expr_list_len): New function.
(gfc_trans_omp_do): Rename tile to oacc_tile.  Handle sizes clause.
Don't assert code->op is EXEC_DO.  Handle EXEC_OMP_TILE and
EXEC_OMP_UNROLL.
(gfc_trans_omp_directive): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
* dump-parse-tree.cc (show_omp_clauses): Dump full, partial and
sizes clauses.
(show_omp_node): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
(show_code_node): Likewise.
gcc/testsuite/
* c-c++-common/gomp/attrs-tile-1.c: New test.
* c-c++-common/gomp/attrs-tile-2.c: New test.
* c-c++-common/gomp/attrs-tile-3.c: New test.
* c-c++-common/gomp/attrs-tile-4.c: New test.
* c-c++-common/gomp/attrs-tile-5.c: New test.
* c-c++-common/gomp/attrs-tile-6.c: New test.
* c-c++-common/gomp/attrs-unroll-1.c: New test.
* c-c++-common/gomp/attrs-unroll-2.c: New test.
* c-c++-common/gomp/attrs-unroll-3.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-1.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-2.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-3.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-4.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-5.c: New test.
* c-c++-common/gomp/imperfect-attributes.c: Adjust expected
diagnostics.
* c-c++-common/gomp/imperfect-loop-nest.c: New test.
* c-c++-common/gomp/ordered-5.c: New test.
* c-c++-common/gomp/scan-7.c: New test.
* c-c++-common/gomp/tile-1.c: New test.
* c-c++-common/gomp/tile-2.c: New test.
* c-c++-common/gomp/tile-3.c: New test.
* c-c++-common/gomp/tile-4.c: New test.
* c-c++-common/gomp/tile-5.c: New test.
* c-c++-common/gomp/tile-6.c: New test.
* c-c++-common/gomp/tile-7.c: New test.
* c-c++-common/gomp/tile-8.c: New test.
* c-c++-common/gomp/tile-9.c: New test.
* c-c++-common/gomp/tile-10.c: New test.
* c-c++-common/gomp/tile-11.c: New test.
* c-c++-common/gomp/tile-12.c: New test.
* c-c++-common/gomp/tile-13.c: New test.
* c-c++-common/gomp/tile-14.c: New test.
* c-c++-common/gomp/tile-15.c: New test.
* c-c++-common/gomp/unroll-1.c: New test.
* c-c++-common/gomp/unroll-2.c: New test.
* c-c++-common/gomp/unroll-3.c: New test.
* c-c++-common/gomp/unroll-4.c: New test.
* c-c++-common/gomp/unroll-5.c: New test.
* c-c++-common/gomp/unroll-6.c: New test.
* c-c++-common/gomp/unroll-7.c: New test.
* c-c++-common/gomp/unroll-8.c: New test.
* c-c++-common/gomp/unroll-9.c: New test.
* c-c++-common/gomp/unroll-inner-1.c: New test.
* c-c++-common/gomp/unroll-inner-2.c: New test.
* c-c++-common/gomp/unroll-inner-3.c: New test.
* c-c++-common/gomp/unroll-non-rect-1.c: New test.
* c-c++-common/gomp/unroll-non-rect-2.c: New test.
* c-c++-common/gomp/unroll-non-rect-3.c: New test.
* c-c++-common/gomp/unroll-simd-1.c: New test.
* gcc.dg/gomp/attrs-4.c: Adjust expected diagnostics.
* gcc.dg/gomp/for-1.c: Likewise.
* gcc.dg/gomp/for-11.c: Likewise.
* g++.dg/gomp/attrs-4.C: Likewise.
* g++.dg/gomp/for-1.C: Likewise.
* g++.dg/gomp/pr94512.C: Likewise.
* g++.dg/gomp/tile-1.C: New test.
* g++.dg/gomp/tile-2.C: New test.
* g++.dg/gomp/unroll-1.C: New test.
* g++.dg/gomp/unroll-2.C: New test.
* g++.dg/gomp/unroll-3.C: New test.
* gfortran.dg/gomp/inner-loops-1.f90: New test.
* gfortran.dg/gomp/inner-loops-2.f90: New test.
* gfortran.dg/gomp/pure-1.f90: Add tests for !$omp unroll
and !$omp tile.
* gfortran.dg/gomp/pure-2.f90: Remove those tests from here.
* gfortran.dg/gomp/scan-9.f90: New test.
* gfortran.dg/gomp/tile-1.f90: New test.
* gfortran.dg/gomp/tile-2.f90: New test.
* gfortran.dg/gomp/tile-3.f90: New test.
* gfortran.dg/gomp/tile-4.f90: New test.
* gfortran.dg/gomp/tile-5.f90: New test.
* gfortran.dg/gomp/tile-6.f90: New test.
* gfortran.dg/gomp/tile-7.f90: New test.
* gfortran.dg/gomp/tile-8.f90: New test.
* gfortran.dg/gomp/tile-9.f90: New test.
* gfortran.dg/gomp/tile-10.f90: New test.
* gfortran.dg/gomp/tile-imperfect-nest-1.f90: New test.
* gfortran.dg/gomp/tile-imperfect-nest-2.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-1.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-2.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-3.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-4.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-5.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-6.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-7.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-8.f90: New test.
* gfortran.dg/gomp/tile-non-rectangular-1.f90: New test.
* gfortran.dg/gomp/tile-non-rectangular-2.f90: New test.
* gfortran.dg/gomp/tile-non-rectangular-3.f90: New test.
* gfortran.dg/gomp/tile-unroll-1.f90: New test.
* gfortran.dg/gomp/tile-unroll-2.f90: New test.
* gfortran.dg/gomp/unroll-1.f90: New test.
* gfortran.dg/gomp/unroll-2.f90: New test.
* gfortran.dg/gomp/unroll-3.f90: New test.
* gfortran.dg/gomp/unroll-4.f90: New test.
* gfortran.dg/gomp/unroll-5.f90: New test.
* gfortran.dg/gomp/unroll-6.f90: New test.
* gfortran.dg/gomp/unroll-7.f90: New test.
* gfortran.dg/gomp/unroll-8.f90: New test.
* gfortran.dg/gomp/unroll-9.f90: New test.
* gfortran.dg/gomp/unroll-10.f90: New test.
* gfortran.dg/gomp/unroll-11.f90: New test.
* gfortran.dg/gomp/unroll-12.f90: New test.
* gfortran.dg/gomp/unroll-13.f90: New test.
* gfortran.dg/gomp/unroll-inner-loop-1.f90: New test.
* gfortran.dg/gomp/unroll-inner-loop-2.f90: New test.
* gfortran.dg/gomp/unroll-no-clause-1.f90: New test.
* gfortran.dg/gomp/unroll-non-rect-1.f90: New test.
* gfortran.dg/gomp/unroll-non-rect-2.f90: New test.
* gfortran.dg/gomp/unroll-simd-1.f90: New test.
* gfortran.dg/gomp/unroll-simd-2.f90: New test.
* gfortran.dg/gomp/unroll-simd-3.f90: New test.
* gfortran.dg/gomp/unroll-tile-1.f90: New test.
* gfortran.dg/gomp/unroll-tile-2.f90: New test.
* gfortran.dg/gomp/unroll-tile-inner-1.f90: New test.
libgomp/
* testsuite/libgomp.c-c++-common/imperfect-transform-1.c: New test.
* testsuite/libgomp.c-c++-common/imperfect-transform-2.c: New test.
* testsuite/libgomp.c-c++-common/matrix-1.h: New test.
* testsuite/libgomp.c-c++-common/matrix-constant-iter.h: New test.
* testsuite/libgomp.c-c++-common/matrix-helper.h: New test.
* testsuite/libgomp.c-c++-common/matrix-no-directive-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-for-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-omp-parallel-for-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-target-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-target-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-taskloop-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-omp-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-simd-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-transform-variants-1.h:
New test.
* testsuite/libgomp.c-c++-common/target-imperfect-transform-1.c:
New test.
* testsuite/libgomp.c-c++-common/target-imperfect-transform-2.c:
New test.
* testsuite/libgomp.c-c++-common/unroll-1.c: New test.
* testsuite/libgomp.c-c++-common/unroll-non-rect-1.c: New test.
* testsuite/libgomp.c++/matrix-no-directive-unroll-full-1.C: New test.
* testsuite/libgomp.c++/tile-2.C: New test.
* testsuite/libgomp.c++/tile-3.C: New test.
* testsuite/libgomp.c++/unroll-1.C: New test.
* testsuite/libgomp.c++/unroll-2.C: New test.
* testsuite/libgomp.c++/unroll-full-tile.C: New test.
* testsuite/libgomp.fortran/imperfect-transform-1.f90: New test.
* testsuite/libgomp.fortran/imperfect-transform-2.f90: New test.
* testsuite/libgomp.fortran/inner-1.f90: New test.
* testsuite/libgomp.fortran/nested-fn.f90: New test.
* testsuite/libgomp.fortran/target-imperfect-transform-1.f90: New test.
* testsuite/libgomp.fortran/target-imperfect-transform-2.f90: New test.
* testsuite/libgomp.fortran/tile-1.f90: New test.
* testsuite/libgomp.fortran/tile-2.f90: New test.
* testsuite/libgomp.fortran/tile-unroll-1.f90: New test.
* testsuite/libgomp.fortran/tile-unroll-2.f90: New test.
* testsuite/libgomp.fortran/tile-unroll-3.f90: New test.
* testsuite/libgomp.fortran/tile-unroll-4.f90: New test.
* testsuite/libgomp.fortran/unroll-1.f90: New test.
* testsuite/libgomp.fortran/unroll-2.f90: New test.
* testsuite/libgomp.fortran/unroll-3.f90: New test.
* testsuite/libgomp.fortran/unroll-4.f90: New test.
* testsuite/libgomp.fortran/unroll-5.f90: New test.
* testsuite/libgomp.fortran/unroll-6.f90: New test.
* testsuite/libgomp.fortran/unroll-7a.f90: New test.
* testsuite/libgomp.fortran/unroll-7b.f90: New test.
* testsuite/libgomp.fortran/unroll-7c.f90: New test.
* testsuite/libgomp.fortran/unroll-7.f90: New test.
* testsuite/libgomp.fortran/unroll-8.f90: New test.
* testsuite/libgomp.fortran/unroll-simd-1.f90: New test.
* testsuite/libgomp.fortran/unroll-tile-1.f90: New test.
* testsuite/libgomp.fortran/unroll-tile-2.f90: New test.

commit | commitdiff | tree

Wilco Dijkstra [Wed, 5 Jun 2024 13:04:33 +0000 (14:04 +0100)]

AArch64: Fix cpu features initialization [PR115342]

The CPU features initialization code uses CPUID registers (rather than
HWCAP).  The equality comparisons it uses are incorrect: for example FEAT_SVE
is not set if SVE2 is available.  Using HWCAPs for these is both simpler and
correct.  The initialization must also be done atomically to avoid multiple
threads causing corruption due to non-atomic RMW accesses to the global.

libgcc:
PR target/115342
* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
Use HWCAP where possible.  Use atomic write for initialization.
Fix FEAT_PREDRES comparison.
(__init_cpu_features_resolver): Use atomic load for correct
initialization.
(__init_cpu_features): Likewise.

commit | commitdiff | tree

Wilco Dijkstra [Wed, 5 Jun 2024 13:05:59 +0000 (14:05 +0100)]

testsuite: Improve check-function-bodies

Improve check-function-bodies by allowing single-character function names.

gcc/testsuite:
* lib/scanasm.exp (configure_check-function-bodies): Allow single-char
function names.

commit | commitdiff | tree

Kewen Lin [Wed, 5 Jun 2024 09:23:04 +0000 (04:23 -0500)]

darwin: Replace use of LONG_DOUBLE_TYPE_SIZE

Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type. To be prepared for that, this
patch is to replace use of LONG_DOUBLE_TYPE_SIZE in darwin
with TYPE_PRECISION of long_double_type_node.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

gcc/ChangeLog:

* config/darwin.cc (darwin_patch_builtins): Use TYPE_PRECISION of
long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.

commit | commitdiff | tree

Kewen Lin [Wed, 5 Jun 2024 09:22:25 +0000 (04:22 -0500)]

fortran: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type. To be prepared for that, this
patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
in fortran with TYPE_PRECISION of
{float,{,long_}double}_type_node.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (build_round_expr): Use TYPE_PRECISION of
long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
* trans-types.cc (gfc_build_real_type): Use TYPE_PRECISION of
{float,double,long_double}_type_node to replace
{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.

commit | commitdiff | tree

Kewen Lin [Wed, 5 Jun 2024 09:22:25 +0000 (04:22 -0500)]

d: Replace use of LONG_DOUBLE_TYPE_SIZE

Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type. To be prepared for that, this
patch is to remove the only one use of LONG_DOUBLE_TYPE_SIZE
in d. Iain found that LONG_DOUBLE_TYPE_SIZE is poorly named
and used incorrectly before, so this patch follows his advice
with int_size_in_bytes.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

Co-authored-by: Iain Buclaw <ibuclaw@gdcproject.org>
gcc/d/ChangeLog:

* d-target.cc (Target::_init): Use int_size_in_bytes of
long_double_type_node to replace the expression with
LONG_DOUBLE_TYPE_SIZE for c.long_doublesize assignment.

commit | commitdiff | tree

Kewen Lin [Wed, 5 Jun 2024 09:22:25 +0000 (04:22 -0500)]

ada: Replace use of LONG_DOUBLE_TYPE_SIZE

Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type. To be prepared for that, this
patch is to replace use of LONG_DOUBLE_TYPE_SIZE in ada
with TYPE_PRECISION of long_double_type_node.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_entity): Use TYPE_PRECISION of
long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.

commit | commitdiff | tree

Pan Li [Tue, 28 May 2024 07:37:44 +0000 (15:37 +0800)]

Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

This patch would like to add the middle-end presentation for the
saturation sub.  Aka set the result of add to the min when downflow.
It will take the pattern similar as below.

SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example for uint8_t, we have

* SAT_SUB (255, 0)   => 255
* SAT_SUB (1, 2)     => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255)   => 0

Given below SAT_SUB for uint64

uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (-(TYPE)(x >= y));
}

Before this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  _Bool _1;
  long unsigned int _3;
  uint64_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = x_4(D) >= y_5(D);
  _3 = x_4(D) - y_5(D);
  _6 = _1 ? _3 : 0;
  return _6;
;;    succ:       EXIT
}

After this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
  return _6;
;;    succ:       EXIT
}

The below tests are running for this patch:
*. The riscv fully regression tests.
*. The x86 bootstrap tests.
*. The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
* match.pd: Add new match for SAT_SUB.
* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
new decl for generated in match.pd.
(build_saturation_binary_arith_call): Add new helper function
to build the gimple call to binary SAT alu.
(match_saturation_arith): Rename from.
(match_unsigned_saturation_add): Rename to.
(match_unsigned_saturation_sub): Add new func to match the
unsigned sat sub.
(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
try when COND_EXPR.

Signed-off-by: Pan Li <pan2.li@intel.com>

commit | commitdiff | tree

Gerald Pfeifer [Wed, 5 Jun 2024 07:26:58 +0000 (09:26 +0200)]

doc: Streamline recommendation of GNU awk

GNU awk 3.1.5 was released in August 2005; no need to specify this in
the context of "recent version".

gcc:
PR other/69374
* doc/install.texi (Prerequisites): Drop reference to GNU awk
version 3.1.5. Remove fluff.

commit | commitdiff | tree

Thomas Schwinge [Wed, 24 Apr 2024 09:51:54 +0000 (11:51 +0200)]

Add 'c-c++-common/initpri1{,-lto,-split}-static.c' as internal linkage variants

gcc/testsuite/
* c-c++-common/initpri1_part_c1.c: Consider 'CDTOR_LINKAGE'.
* c-c++-common/initpri1_part_c2.c: Likewise.
* c-c++-common/initpri1_part_c3.c: Likewise.
* c-c++-common/initpri1_part_cd4.c: Likewise.
* c-c++-common/initpri1_part_d1.c: Likewise.
* c-c++-common/initpri1_part_d2.c: Likewise.
* c-c++-common/initpri1_part_d3.c: Likewise.
* c-c++-common/initpri1.c: Specify it.
* c-c++-common/initpri1-lto.c: Likewise.
* c-c++-common/initpri1-split.c: Likewise.
* c-c++-common/initpri1-static.c: New.
* c-c++-common/initpri1-lto-static.c: Likewise.
* c-c++-common/initpri1-split-static.c: Likewise.

commit | commitdiff | tree

Thomas Schwinge [Wed, 24 Apr 2024 09:24:39 +0000 (11:24 +0200)]

Add 'c-c++-common/initpri1-split.c': 'c-c++-common/initpri1.c' split into separate translation units

gcc/testsuite/
* c-c++-common/initpri1.c: Split into...
* c-c++-common/initpri1_part_c1.c: ... this, and...
* c-c++-common/initpri1_part_c2.c: ... this, and...
* c-c++-common/initpri1_part_c3.c: ... this, and...
* c-c++-common/initpri1_part_cd4.c: ... this, and...
* c-c++-common/initpri1_part_d1.c: ... this, and...
* c-c++-common/initpri1_part_d2.c: ... this, and...
* c-c++-common/initpri1_part_d3.c: ... this, and...
* c-c++-common/initpri1_part_main.c: ... this part.
* c-c++-common/initpri1-split.c: New.

commit | commitdiff | tree

Thomas Schwinge [Wed, 24 Apr 2024 07:26:39 +0000 (09:26 +0200)]

Add C++ testing for 'gcc.dg/initpri1-lto.c': 'c-c++-common/initpri1-lto.c'

Similar to commit a7d75773adadfcd536a5ded48ba215f18e8c5b3d
"Consolidate similar C/C++ test cases for 'constructor', 'destructor' function attributes with priority".

gcc/testsuite/
* gcc.dg/initpri1-lto.c: Integrate this...
* c-c++-common/initpri1-lto.c: ... here.

commit | commitdiff | tree

Thomas Schwinge [Wed, 24 Apr 2024 07:26:39 +0000 (09:26 +0200)]

Consolidate similar C/C++ test cases for 'constructor', 'destructor' function attributes with priority

gcc/testsuite/
* gcc.dg/initpri1.c: Integrate this...
* g++.dg/special/initpri1.C: ..., and this...
* c-c++-common/initpri1.c: ... here.
* gcc.dg/initpri1-lto.c: Adjust.
* gcc.dg/initpri2.c: Integrate this...
* g++.dg/special/initpri2.C: ..., and this...
* c-c++-common/initpri2.c: ... here.

commit | commitdiff | tree

Thomas Schwinge [Wed, 24 Apr 2024 08:11:02 +0000 (10:11 +0200)]

Clarify that 'gcc.dg/initpri3.c' is a LTO variant of 'gcc.dg/initpri1.c': 'gcc.dg/initpri1-lto.c' [PR46083]

Added in commit 06c9eb5136fe0e778cc3a643131eba2a3dfb77a8 (Subversion r168642)
"re PR lto/46083 (gcc.dg/initpri1.c FAILs with -flto/-fwhopr (attribute constructor/destructor doesn't work))".

PR lto/46083
gcc/testsuite/
* gcc.dg/initpri3.c: Remove.
* gcc.dg/initpri1-lto.c: New.

commit | commitdiff | tree

Gerald Pfeifer [Wed, 5 Jun 2024 05:59:47 +0000 (07:59 +0200)]

libstdc++: Update gcc.gnu.org links in FAQ to https

libstdc++-v3:
* doc/xml/faq.xml: Move gcc.gnu.org to https.
* doc/html/faq.html: Regenerate.

commit | commitdiff | tree

liuhongt [Tue, 21 May 2024 08:57:17 +0000 (16:57 +0800)]

Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

According to IEEE standard, for conversions from floating point to
integer. When a NaN or infinite operand cannot be represented in the
destination format and this cannot otherwise be indicated, the invalid
operation exception shall be signaled. When a numeric operand would
convert to an integer outside the range of the destination format, the
invalid operation exception shall be signaled if this situation cannot
otherwise be indicated.

The patch prevent simplication of the conversion from floating point
to integer for NAN/INF/out-of-range constant when flag_trapping_math.

gcc/ChangeLog:

PR rtl-optimization/100927
PR rtl-optimization/115161
PR rtl-optimization/115115
* simplify-rtx.cc (simplify_const_unary_operation): Prevent
simplication of FIX/UNSIGNED_FIX for NAN/INF/out-of-range
constant when flag_trapping_math.
* fold-const.cc (fold_convert_const_int_from_real): Don't fold
for overflow value when_trapping_math.

gcc/testsuite/ChangeLog:

* gcc.dg/pr100927.c: New test.
* c-c++-common/Wconversion-1.c: Add -fno-trapping-math.
* c-c++-common/dfp/convert-int-saturate.c: Ditto.
* g++.dg/ubsan/pr63956.C: Ditto.
* g++.dg/warn/Wconversion-real-integer.C: Ditto.
* gcc.c-torture/execute/20031003-1.c: Ditto.
* gcc.dg/Wconversion-complex-c99.c: Ditto.
* gcc.dg/Wconversion-real-integer.c: Ditto.
* gcc.dg/c90-const-expr-11.c: Ditto.
* gcc.dg/overflow-warn-8.c: Ditto.

commit | commitdiff | tree

Xiao Zeng [Wed, 15 May 2024 05:56:42 +0000 (13:56 +0800)]

RISC-V: Add Zfbfmin extension

1 In the previous patch, the libcall for BF16 was implemented:
<https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8c7cee80eb50792e57d514be1418c453ddd1073e>

2 Riscv provides Zfbfmin extension, which completes the "Scalar BF16 Converts":
<https://github.com/riscv/riscv-bfloat16/blob/main/doc/riscv-bfloat16-zfbfmin.adoc>

3 Implemented replacing libcall with Zfbfmin extension instruction.

4 Reused previous testcases in:
<https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8c7cee80eb50792e57d514be1418c453ddd1073e>
gcc/ChangeLog:

* config/riscv/iterators.md: Add mode_iterator between
floating-point modes and BFmode.
* config/riscv/riscv.cc (riscv_output_move): Handle BFmode move
for zfbfmin.
* config/riscv/riscv.md (trunc<mode>bf2): New pattern for BFmode.
(extendbfsf2): Dotto.
(*movhf_hardfloat): Add BFmode.
(*mov<mode>_hardfloat): Dotto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfbfmin-bf16_arithmetic.c: New test.
* gcc.target/riscv/zfbfmin-bf16_comparison.c: New test.
* gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c: New test.

commit | commitdiff | tree

GCC Administrator [Wed, 5 Jun 2024 00:16:55 +0000 (00:16 +0000)]

Daily bump.

commit | commitdiff | tree

Simon Martin [Tue, 4 Jun 2024 09:59:31 +0000 (11:59 +0200)]

c++: Add testcase for PR103338

The case in that PR used to ICE until commit f04dc89. This patch simply adds
the case to the testsuite.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/103388

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash73.C: New test.

commit | commitdiff | tree

Harald Anlauf [Mon, 3 Jun 2024 20:02:06 +0000 (22:02 +0200)]

Fortran: fix ALLOCATE with SOURCE=, zero-length character [PR83865]

gcc/fortran/ChangeLog:

PR fortran/83865
* trans-stmt.cc (gfc_trans_allocate): Restrict special case for
source-expression with zero-length character to rank 0, so that
the array shape is not discarded.

gcc/testsuite/ChangeLog:

PR fortran/83865
* gfortran.dg/allocate_with_source_32.f90: New test.

commit | commitdiff | tree

Simon Martin [Tue, 4 Jun 2024 15:27:25 +0000 (17:27 +0200)]

Add missing space after seen_error in gcc/cp/pt.cc

I realized that I committed a change with a missing space after seen_error.
This fixes it, as well as another occurrence in the same file.

Apologies for the mistake - I'll commit this as obvious.

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr): Add missing space after seen_error.
(dependent_type_p): Likewise.

commit | commitdiff | tree

Simon Martin [Fri, 24 May 2024 15:00:17 +0000 (17:00 +0200)]

Fix PR c++/111106: missing ; causes internal compiler error

We currently fail upon the following because an assert in dependent_type_p
fails for f's parameter

=== cut here ===
consteval int id (int i) { return i; }
constexpr int
f (auto i) requires requires { id (i) } { return i; }
void g () { f (42); }
=== cut here ===

This patch fixes this by relaxing the assert to pass during error recovery.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/111106

gcc/cp/ChangeLog:

* pt.cc (dependent_type_p): Don't fail assert during error recovery.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval37.C: New test.

commit | commitdiff | tree

Jonathan Wakely [Tue, 4 Jun 2024 14:06:44 +0000 (15:06 +0100)]

libstdc++: Only define std::span::at for C++26 [PR115335]

In r14-5689-g1fa85dcf656e2f I added std::span::at and made the correct
changes to the __cpp_lib_span macro (with tests for the correct value in
C++20/23/26). But I didn't make the declaration of std::span::at
actually depend on the macro, so it was defined for C++20 and C++23, not
only for C++26. This fixes that oversight.

libstdc++-v3/ChangeLog:

PR libstdc++/115335
* include/std/span (span::at): Guard with feature test macro.

commit | commitdiff | tree

Jakub Jelinek [Tue, 4 Jun 2024 14:16:49 +0000 (16:16 +0200)]

ranger: Improve CLZ fold_range [PR115337]

cfn_ctz::fold_range includes special cases for the case where .CTZ has
two arguments and so is well defined at zero, and the second argument is
equal to prec or -1, but cfn_clz::fold_range does that only for the prec
case.  -1 is fairly common as well though, because the <stdbit.h> builtins
do use it now, so I think it is worth special casing that.
If we don't know anything about the argument, the difference for
.CLZ (arg, -1) is that previously the result was varying, now it will be
[-1, prec-1].  If we knew arg can't be zero, it used to be optimized before
as well into e.g. [0, prec-1] or similar.

2024-06-04  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/115337
* gimple-range-op.cc (cfn_clz::fold_range): For
m_gimple_call_internal_p handle as a special case also second argument
of -1 next to prec.

commit | commitdiff | tree

Jakub Jelinek [Tue, 4 Jun 2024 13:52:09 +0000 (15:52 +0200)]

fold-const: Handle CTZ like CLZ in tree_call_nonnegative_warnv_p [PR115337]

I think we can handle CTZ exactly like CLZ in tree_call_nonnegative_warnv_p.
Like CLZ, if it is UB at zero, the result range is [0, prec-1] and if it is
well defined at zero, the second argument provides the value at zero.

2024-06-04 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p): Handle
CASE_CFN_CTZ like CASE_CFN_CLZ.

commit | commitdiff | tree

Jakub Jelinek [Tue, 4 Jun 2024 13:51:31 +0000 (15:51 +0200)]

fold-const, gimple-fold: Some formatting cleanups

While looking into PR115337, I've spotted some badly formatted code,
which the following patch fixes.

2024-06-04 Jakub Jelinek <jakub@redhat.com>

* fold-const.cc (tree_call_nonnegative_warnv_p): Formatting fixes.
(tree_invalid_nonnegative_warnv_p): Likewise.
* gimple-fold.cc (gimple_call_nonnegative_warnv_p): Likewise.

commit | commitdiff | tree

Jakub Jelinek [Tue, 4 Jun 2024 13:49:41 +0000 (15:49 +0200)]

fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result. That is the case of the former which is UB
on zero and has [0, prec-1] return value otherwise, and is the case of the
single argument .CLZ as well (again, UB on zero), but for two argument
.CLZ is the case only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).

The following patch does that.

2024-06-04 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If arg1 is non-NULL, RECURSE on it, otherwise return true.

* gcc.dg/bitint-106.c: New test.

commit | commitdiff | tree

Rainer Orth [Tue, 4 Jun 2024 11:33:46 +0000 (13:33 +0200)]

testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

Two new AVX10.1 tests FAIL on Solaris/x86:

FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors)
FAIL: gcc.target/i386/avx10_1-26.c (test for excess errors)

Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/avx10_1-25.c:6:9: error: the call requires 'ifunc', which is not supported by this target

Fixed by requiring ifunc support.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2024-06-04 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/avx10_1-25.c: Require ifunc support.
* gcc.target/i386/avx10_1-26.c: Likewise.

commit | commitdiff | tree

Jakub Jelinek [Tue, 4 Jun 2024 10:28:01 +0000 (12:28 +0200)]

builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow and __builtin{add,sub}c [PR108789]

The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hope it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third arguments points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR. Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p case which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).

2024-06-04 Jakub Jelinek <jakub@redhat.com>

PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
(fold_builtin_addc_subc): Set TREE_SIDE_EFFECTS on call before
calling save_expr.

* gcc.c-torture/execute/pr108789.c: New test.

commit | commitdiff | tree

Jakub Jelinek [Tue, 4 Jun 2024 10:20:13 +0000 (12:20 +0200)]

invoke.texi: Clarify -march=lujiazui

I was recently searching which exact CPUs are affected by the PR114576
wrong-code issue and went from the PTA_* bitmasks in GCC, so arrived
at the goldmont, goldmont-plus, tremont and lujiazui CPUs (as -march=
cases which do enable -maes and don't enable -mavx).
But when double-checking that against the invoke.texi documentation,
that was true for the first 3, but lujiazui said it supported AVX.
I was really confused by that, until I found the
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604407.html
explanation. So, seems the CPUs do have AVX and F16C but -march=lujiazui
doesn't enable those and even activelly attempts to filter those out from
the announced CPUID features, in glibc as well as e.g. in libgcc.

Thus, I think we should document what actually happens, otherwise
users could assume that
gcc -march=lujiazui predefines __AVX__ and __F16C__, which it doesn't.

2024-06-04 Jakub Jelinek <jakub@redhat.com>

* doc/invoke.texi (lujiazui): Clarify that while the CPUs do support
AVX and F16C, -march=lujiazui actually doesn't enable those.

GNU Compiler Collection

This page took 0.130321 seconds and 5 git commands to generate.