gcc.gnu.org Git - gcc.git/log

RISC-V: Make full-vec-move1.c test robust for optimization

During investigate the support of early break autovec, we notice
the test full-vec-move1.c will be optimized to 'return 0;' in main
function body.  Because somehow the value of V type is compiler
time constant,  and then the second loop will be considered as
assert (true).

Thus,  the ccp4 pass will eliminate these stmt and just return 0.

typedef int16_t V __attribute__((vector_size (128)));

int main ()
{
  V v;
  for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    (v)[i] = i;

  V res = v;
  for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    assert (res[i] == i); // will be optimized to assert (true)
}

This patch would like to introduce a extern function to use the res[i]
that get rid of the ccp4 optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c:
Introduce extern func use to get rid of ccp4 optimization.

Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit b1520d2260c5e0cfcd7a4354fab70f66e2912ff2)

RISC-V: Add tests for cpymemsi expansion

cpymemsi expansion was available for RISC-V since the initial port.
However, there are not tests to detect regression.
This patch adds such tests.

Three of the tests target the expansion requirements (known length and
alignment). One test reuses an existing memcpy test from the by-pieces
framework (gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymemsi-1.c: New test.
* gcc.target/riscv/cpymemsi-2.c: New test.
* gcc.target/riscv/cpymemsi-3.c: New test.
* gcc.target/riscv/cpymemsi.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit 4d38e88227ea48e559a2f354c0e62d372e181b82)

[PATCH v1 1/1] RISC-V: Nan-box the result of movbf on soft-bf16

1 This patch implements the Nan-box of bf16.

2 Please refer to the Nan-box implementation of hf16 in:
<https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=057dc349021660c40699fb5c98fd9cac8e168653>

3 The discussion about Nan-box can be found on the website:
<https://www.mail-archive.com/search?q=Nan-box+the+result+of+movhf+on+soft-fp16&l=gcc-patches%40gcc.gnu.org>

4 Below test are passed for this patch
* The riscv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movbf_softfloat_boxing): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Bfloat16-nanboxing.c: New test.

(cherry picked from commit ce51e6727c9d69bbab0e766c449e60fd41f5f2f9)

[RISC-V][V2] Fix incorrect if-then-else nesting of Zbs usage in constant synthesis

Reposting without the patch that ignores whitespace.  The CI system doesn't
like including both patches, that'll generate a failure to apply and none of
the tests actually get run.

So I managed to goof the if-then-else level of the bseti bits last week.  They
were supposed to be a last ditch effort to improve the result, but ended up
inside a conditional where they don't really belong.  I almost always use Zba,
Zbb and Zbs together, so it slipped by.

So it's NFC if you always test with Zbb and Zbs enabled together.  But if you
enabled Zbs without Zbb you'd see a failure to use bseti.

gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Fix incorrect
if-then-else nesting of Zbs code.

(cherry picked from commit 1c234097487927a4388ddcc690b63597bb3a90dc)

RISC-V: Cover sign-extensions in lshr<GPR:mode>3_zero_extend_4

The lshr<GPR:mode>3_zero_extend_4 pattern targets bit extraction
with zero-extension. This pattern represents the canonical form
of zero-extensions of a logical right shift.

The same optimization can be applied to sign-extensions.
Given the two optimizations are so similar, this patch converts
the existing one to also cover the sign-extension case as well.

gcc/ChangeLog:

* config/riscv/iterators.md (ashiftrt): New code attribute
'extract_shift' and adding extractions to optab.
* config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): Rename to...
(*<any_extract:optab><GPR:mode>3):...this and add support for
sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: Add helpers for
sign-extension.
* gcc.target/riscv/sign-extend-rshift-32.c: New test.
* gcc.target/riscv/sign-extend-rshift-64.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit 3ee30d7981987b86bd6a9a2675e26fadec48e5cd)

RISC-V: Add zero_extract support for rv64gc

The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instructions. Let's add a insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.

Tested with SPEC CPU 2017 (rv64gc).

PR target/111501

gcc/ChangeLog:

* config/riscv/riscv.md (*lshr<GPR:mode>3_zero_extend_4): New
pattern for zero-extraction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: New test.
* gcc.target/riscv/pr111501.c: New test.
* gcc.target/riscv/zero-extend-rshift-32.c: New test.
* gcc.target/riscv/zero-extend-rshift-64.c: New test.
* gcc.target/riscv/zero-extend-rshift.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit 3b9c760072c7792cbae6f38894756d2b96c2fd8c)

RISC-V: Cover sign-extensions in lshrsi3_zero_extend_2

The pattern lshrsi3_zero_extend_2 extracts the MSB bits of the lower
32-bit word and zero-extends it back to DImode.
This is realized using srliw, which operates on 32-bit registers.

The same optimziation can be applied to sign-extensions when emitting
a sraiw instead of the srliw.

Given these two optimizations are so similar, this patch simply
converts the existing one to also cover the sign-extension case as well.

gcc/ChangeLog:

* config/riscv/iterators.md (sraiw): New code iterator 'any_extract'.
New code attribute 'extract_sidi_shift'.
* config/riscv/riscv.md (*lshrsi3_zero_extend_2): Rename to...
(*lshrsi3_extend_2):...this and add support for sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: Test sraiw 24 and sraiw 16.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit 4e46a3537ff57938a0d98fa524ac2fff8b08ae6d)

RISC-V: Add test for sraiw-31 special case

We already optimize a sign-extension of a right-shift by 31 in
<optab>si3_extend. Let's add a test for that (similar to
zero-extend-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit dd388198b8be52ab378c935fc517a269e0ba741c)

[committed][RISC-V] Turn on overlap_op_by_pieces for generic-ooo tuning

Per quick email exchange with Palmer. Given the triviality, I'm just pushing
it.

gcc/
* config/riscv/riscv.cc (generic_ooo_tune_info): Turn on
overlap_op_by_pieces.

(cherry picked from commit 9f14f1978260148d4d6208dfd73df1858e623758)

[committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

This is almost exclusively work from the VRULL team.

As we've discussed in the Tuesday meeting in the past, we'd like to have a knob
in the tuning structure to indicate that overlapped stores during
move_by_pieces expansion of memcpy & friends are acceptable.

This patch adds the that capability in our tuning structure. It's off for all
the uarchs upstream, but we have been using it inside Ventana for our uarch
with success. So technically it's NFC upstream, but puts in the infrastructure
multiple organizations likely need.

gcc/

* config/riscv/riscv.cc (struct riscv_tune_param): Add new
"overlap_op_by_pieces" field.
(rocket_tune_info, sifive_7_tune_info): Set it.
(sifive_p400_tune_info, sifive_p600_tune_info): Likewise.
(thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise.
(generic_ooo_tune_info, optimize_size_tune_info): Likewise.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): define.

gcc/testsuite/

* gcc.target/riscv/memcpy-nonoverlapping.c: New test.
* gcc.target/riscv/memset-nonoverlapping.c: New test.

(cherry picked from commit 300393484dbfa9fd3891174ea47aa3fb41915abc)

[RISC-V] [PATCH v2] Enable inlining str* by default

So with Chrstoph's patches from late 2022 we've had the ability to inline
strlen, and str[n]cmp (scalar).  However, we never actually turned this
capability on by default!

This patch flips the those default to allow inlinining by default.  It also
fixes one bug exposed by our internal testing when NBYTES is zero for strncmp.
I don't think that case happens enough to try and optimize it, we just disable
inline expansion for that instance.

This has been bootstrapped and regression tested on rv64gc at various times as
well as cross tested on rv64gc more times than I can probably count (we've have
this patch internally for a while).  More importantly, I just successfully
tested it on rv64gc and rv32gcv elf configurations with the trunk

gcc/

* config/riscv/riscv-string.cc (riscv_expand_strcmp): Do not inline
strncmp with zero size.
(emit_strcmp_scalar_compare_subword): Adjust rotation for rv32 vs rv64.
* config/riscv/riscv.opt (var_inline_strcmp): Enable by default.
(vriscv_inline_strncmp, riscv_inline_strlen): Likewise.

gcc/testsuite

* gcc.target/riscv/zbb-strlen-disabled-2.c: Turn off inlining.

(cherry picked from commit 1139f38e798181572121657e5b267a9698edb62f)

[PATCH 1/1] RISC-V: Add Zfbfmin extension to the -march= option

This patch would like to add new sub extension (aka Zfbfmin) to the
-march= option. It introduces a new data type BF16.

1 The Zfbfmin extension depend on 'F', and the FLH, FSH, FMV.X.H, and
FMV.H.X instructions as defined in the Zfh extension.

2 The Zfhmin extension includes the following instructions from the
Zfh extension: FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S.

3 Zfhmin extension depend on 'F'.

4 Simply put, just make Zfbfmin dependent on Zfhmin.

Perhaps in the future, we could propose making the FLH, FSH, FMV.X.H, and
FMV.H.X instructions an independent extension to achieve precise dependency
relationships for the Zfbfmin.

You can locate more information about Zfbfmin from below spec doc.

<https://github.com/riscv/riscv-bfloat16/releases/download/v59042fc71c31a9bcb2f1957621c960ed36fac401/riscv-bfloat16.pdf>

gcc/
* common/config/riscv/riscv-common.cc (riscv_implied_info): zfbfmin
implies zfhmin.
(riscv_ext_version_table, riscv_ext_flag_table): Add zfbfmin.
* config/riscv/riscv.opt (ZFBFMIN): Add optoion.

gcc/testsuite/
* gcc.target/riscv/arch-35.c: New test.
* gcc.target/riscv/arch-36.c: New test.
* gcc.target/riscv/predef-34.c: New test.
* gcc.target/riscv/predef-35.c: New test.

(cherry picked from commit 35224ead63732a3550ba4b1332c06e9dc7999c31)

RISC-V: Add testcase for PR114749.

this adds a test case for PR114749.
Going to commit as obvious unless somebody complains.

Regards
Robin

gcc/testsuite/ChangeLog:

PR tree-optimization/114749

* gcc.target/riscv/rvv/autovec/pr114749.c: New test.

[RISC-V] Add support for _Bfloat16

1 At point <https://github.com/riscv/riscv-bfloat16>,
  BF16 has already been completed "post public review".

2 LLVM has also added support for RISCV BF16 in
  <https://reviews.llvm.org/D151313> and
  <https://reviews.llvm.org/D150929>.

3 According to the discussion <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/367>,
  this use __bf16 and use DF16b in riscv_mangle_type like x86.

Below test are passed for this patch
    * The riscv fully regression test.

gcc/ChangeLog:

* config/riscv/iterators.md: New mode iterator HFBF.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type _Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_init_libfuncs): Set the conversion method for BFmode and
HFmode.
(riscv_block_arith_comp_libfuncs_for_mode): Set the arithmetic
and comparison libfuncs for the mode.
* config/riscv/riscv.md (mode" ): Add BF.
(movhf): Support for BFmode.
(mov<mode>): Ditto.
(*movhf_softfloat): Ditto.
(*mov<mode>_softfloat): Ditto.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): Ditto.
* config/riscv/t-softfp32: Add support for BF16 libfuncs.
* config/riscv/t-softfp64: Ditto.
* soft-fp/floatsibf.c: For si -> bf16.
* soft-fp/floatunsibf.c: For unsi -> bf16.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparison.c: New test.
* gcc.target/riscv/bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/bf16_integer_libcall_convert.c: New test.

Co-authored-by: Jin Ma <jinma@linux.alibaba.com>
(cherry picked from commit 8c7cee80eb50792e57d514be1418c453ddd1073e)

RISC-V: Document -mcmodel=large

This slipped through the cracks. Probably also NEWS-worthy.

gcc/ChangeLog:

* doc/invoke.texi (RISC-V): Add -mcmodel=large.

(cherry picked from commit 6ffea3e37380860507cce08af42a997fbdb5d754)

So another constant synthesis improvement.

In this patch we're looking at cases where we'd like to be able to use
lui+slli, but can't because of the sign extending nature of lui on
TARGET_64BIT. For example: 0x8001100000020UL. The trunk currently generates 4
instructions for that constant, when it can be done with 3 (lui+slli.uw+addi).

When Zba is enabled, we can use lui+slli.uw as the slli.uw masks off the bits
32..63 before shifting, giving us the precise semantics we want.

I strongly suspect we'll want to do the same for a set of constants with
lui+add.uw, lui+shNadd.uw, so you'll see the beginnings of generalizing support
for lui followed by a "uw" instruction.

The new test just tests the set of cases that showed up while exploring a
particular space of the constant synthesis problem. It's not meant to be
exhaustive (failure to use shadd when profitable).

gcc/

* config/riscv/riscv.cc (riscv_integer_op): Add field tracking if we
want to use a "uw" instruction variant.
(riscv_build_integer_1): Initialize the new field in various places.
Use lui+slli.uw for some constants.
(riscv_move_integer): Handle slli.uw.

gcc/testsuite/

* gcc.target/riscv/synthesis-2.c: New test.

(cherry picked from commit 975bb17e2f6bc90d366237ab1c5dc9b8df2dee69)

RISC-V: miscll comment fixes [NFC]

gcc/ChangeLog:
* config/riscv/riscv.cc: Comment updates.
* config/riscv/riscv.h: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
(cherry picked from commit 467ca4a195e26dba77e7f62cc1a3d45a4e541c72)

[committed][RISC-V] Fix nearbyint failure on rv32 and formatting nits

The CI system tripped an execution failure for rv32 with the ceil/round patch.

The fundamental problem is the FP->INT step in these sequences requires the
input size to match the output size.  The output size was based on rv32/rv64.
Meaning that we'd try to do DF->SI->DF.

That doesn't preserve the semantics we want in at least two ways.

The net is we can't use this trick for DF values on rv32.  While inside the
code I realized we had a similar problem for HF modes.  HF modes we can support
only for Zfa.  So I fixed that proactively.

The CI system also pointed out various formatting nits.  I think this fixes all
but one overly long line.

Note I could have factored the TARGET_ZFA test.  But I think as-written it's
clearer what the desired cases to transform are.

gcc/
* config/riscv/riscv.md (<round_pattern><ANYF:mode>2): Adjust
condition to match what can be properly implemented.  Fix various
formatting issues.
(l<round_pattern><ANYF:mode>si2_sext): Fix formatting

(cherry picked from commit 8367c996e55b2c54aeee25e446357a1015a1d11d)

[RFA][RISC-V] Improve constant synthesis for constants with 2 bits set

In doing some preparation work for using zbkb's pack instructions for constant
synthesis I figured it would be wise to get a sense of how well our constant
synthesis is actually working and address any clear issues.

So the first glaring inefficiency is in our handling of constants with a small
number of bits set.  Let's start with just two bits set.   There are 2016
distinct constants in that space (rv64).  With Zbs enabled the absolute worst
we should ever do is two instructions (bseti+bseti).  Yet we have 503 cases
where we're generating 3+ instructions when there's just two bits set in the
constant.  A constant like 0x8000000000001000 generates 4 instructions!

This patch adds bseti (and indirectly binvi if we needed it) as a first class
citizen for constant synthesis.  There's two components to this change.

First, we can't generate an IOR with a constant like (1 << 45) as an operand.
The IOR/XOR define_insn is in riscv.md.  The constant argument for those
patterns must match an arith_operand which means its not really usable for
generating bseti directly in the cases we care about (at least one of the bits
will be in the 32..63 range and thus won't match arith_operand).

We have a few things we could do.  One would be to extend the existing pattern
to incorporate bseti cases.  But I suspect folks like the separation of the
base architecture (riscv.md) from the Zb* extensions (bitmanip.md).  We could
also try to generate the RTL for bseti
directly, bypassing gen_fmt_ee (which forces undesirable constants into registers based on the predicate of the appropriate define_insn). Neither of these seemed particularly appealing to me.

So what I've done instead is to make ior/xor a define_expand and have the
expander allow a wider set of constant operands when Zbs is enabled.  That
allows us to keep the bulk of Zb* support inside bitmanip.md and continue to
use gen_fmt_ee in the constant synthesis paths.

Note the code generation in this case is designed to first set as many bits as
we can with lui, then with addi since those can both set multiple bits at a
time.  If there are any residual bits left to set we can emit bseti
instructions up to the current cost ceiling.

This results in fixing all of the 503 2-bit set cases where we emitted too many
instructions.  It also significantly helps other scenarios with more bits set.

The testcase I'm including verifies the number of instructions we generate for
the full set of 2016 possible cases.  Obviously this won't be possible as we
increase the number of bits (there are something like 48k cases with just 3
bits set).

gcc/

* config/riscv/predicates.md (arith_or_zbs_operand): New predicate.
* config/riscv/riscv.cc (riscv_build_integer_one): Use bseti to set
single bits when profitable.
* config/riscv/riscv.md (*<optab><mode>3): Renamed with '*' prefix.
(<optab><mode>3): New expander for IOR/XOR.

gcc/testsuite
* gcc.target/riscv/synthesis-1.c: New test.

[committed] [RISC-V] Don't run new rounding tests on newlib risc-v targets

The new round_32.c and round_64.c tests depend on the optimizers to recognize
the conversions feeding the floor/ceil calls and convert them into ceilf,
floorf and the like.

Those transformations only occur when the target indicates the C library has
the appropriate routines (fnclass == function_c99_misc). While newlib has
these routines, they are not exposed as available to the compiler and thus the
transformation the tests depend on do not happen. Naturally the scan-tests then
fail.

gcc/testsuite
* gcc.target/riscv/round_32.c: Add require-effective-target glibc.
* gcc.target/riscv/round_64.c: Likewise.

(cherry picked from commit 1e29da0b6508b23a7a6b14a7fb643b917a195003)

[committed] [RISC-V] Trivial pattern cleanup

As I was reviewing and cleaning up some internal work, I noticed a particular
idiom being used elsewhere in the RISC-V backend.

Specifically the use of explicit subregs when an adjustment to the
match_operand would be sufficient.

Let's take this example from the and-not splitter:

>  (define_split
>    [(set (match_operand:X 0 "register_operand")
>         (and:X (not:X (lshiftrt:X (match_operand:X 1 "register_operand")
>                                   (subreg:QI (match_operand:X 2 "register_operand") 0)))
>                (const_int 1)))]

Note the explicit subreg.  We can instead use a match_operand with QImode.
This ever-so-slightly simplifies the machine description.

It also means that if we have a QImode object lying around (say we loaded it
from memory in QImode), we can use it directly rather than first extending it
to X, then truncing to QI.  So we end up with simpler RTL and in rare cases
improve the code we generate.

When used in a define_split or define_insn_and_split we need to make suitable
adjustments to the split RTL.

Bootstrapped a while back.  Just re-tested with a cross.

gcc/
* config/riscv/bitmanip.md (splitter to use w-form division): Remove
explicit subregs.
(zero extended bitfield extraction): Similarly.
* config/riscv/thead.md (*th_memidx_operand): Similarly.

(cherry picked from commit 76ca6e1f8b1524b82a871ce29cf58c79e5e77e2b)

[committed] [RISC-V] Fix detection of store pair fusion cases

We've got the ability to count the number of store pair fusions happening in
the front-end of the pipeline.  When comparing some code from last year vs the
current trunk we saw a fairly dramatic drop.

The problem is the store pair fusion detection code was actively harmful due to
a minor bug in checking offsets.   So instead of pairing up 8 byte stores such
as sp+0 with sp+8, it tried to pair up sp+8 and sp+16.

Given uarch sensitivity I didn't try to pull together a testcase.  But we could
certainly see the undesirable behavior in benchmarks as simplistic as dhrystone
up through spec2017.

Anyway, bootstrapped a while back.  Also verified through our performance
counters that store pair fusion rates are back up.  Regression tested with
crosses a few minutes ago.

gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Break out
tests for easier debugging in store pair fusion case.  Fix offset
check in same.

(cherry picked from commit fad93e7617ce1aafb006983a71b6edc9ae1eb2d1)

This is almost exclusively Jivan's work.  His original post:

> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg336483.html

This patch is primarily meant to improve the code we generate for FP rounding
such as ceil/floor.  It also addresses some unnecessary sign extensions in the
same areas.

RISC-V's FP conversions have a bit of undesirable behavior that make them
non-suitable as-is for ceil/floor and other related functions. These
deficiencies are addressed in the Zfa extension, but there's no reason not to
pick up a nice improvement when we can.

Basically we can still use the basic FP conversions for floor/ceil and friends
when we don't care about inexact exceptions by checking for the special cases
first, then emitting the conversion when the special cases don't apply.  That's
still much faster than calling into glibc.

The redundant sign extensions are eliminated using the same trick Jivan added
last year, just in a few more places ;-)

This eliminates roughly 10% of the dynamic instruction count for imagick.  But
more importantly it's about a 17% performance improvement for that workload
within spec.

This has been bootstrapped as well as regression tested in a cross environment.
It's also successfully built & run specint/specfp correctly.

Pushing to the trunk and the coordination branch momentarily.

gcc/
* config/riscv/iterators.md (fix_ops, fix_uns): New iterators.
(RINT, rint_pattern, rint_rm): Remove unused iterators.
* config/riscv/riscv-protos.h (get_fp_rounding_coefficient): Prototype.
* config/riscv/riscv-v.cc (get_fp_rounding_coefficient): Externalize.
external linkage.
* config/riscv/riscv.md (UNSPEC_LROUND): Remove.
(fix_trunc<ANYF:mode><GPR:mode>2): Replace with ...
(<fix_uns>_trunc<ANYF:mode>si2): New expander & associated insn.
(<fix_uns>_trunc<ANYF:mode>si2_ext): New insn.
(<fix_uns>_trunc<ANYF:mode>di2): Likewise.
(l<rint_pattern><ANYF:mode><GPR:mode>2): Replace with ...
(lrint<ANYF:mode>si2): New expander and associated insn.
(lrint<ANYF:mode>si2_ext, lrint<ANYF:mode>di2): New insns.
(<round_pattern><ANYF:mode>2): Replace with....
(l<round_pattern><ANYF:mode>si2): New expander and associated insn.
(l<round_pattern><ANYF:mode>si2_sext): New insn.
(l<round_pattern><ANYF:mode>di2): Likewise.
(<round_pattern><ANYF:mode>2): New expander.

gcc/testsuite/
* gcc.target/riscv/fix.c: New test.
* gcc.target/riscv/round.c: New test.
* gcc.target/riscv/round_32.c: New test.
* gcc.target/riscv/round_64.c: New test.

(cherry picked from commit f652a35877e32d470d649d1aee5d94fa0169a478)

RISC-V: Refine the condition for add additional vars in RVV cost model

The adjacent_dr_p is sufficient and unnecessary condition for contiguous access.
So unnecessary live-ranges are added and result in smaller LMUL.

This patch uses MEMORY_ACCESS_TYPE as condition and constrains segment
load/store.

Tested on RV64 and no regression.

PR target/114506

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (non_contiguous_memory_access_p): Rename
(need_additional_vector_vars_p): Rename and refine condition

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr114506.c: New test.

Signed-off-by: demin.han <demin.han@starfivetech.com>
(cherry picked from commit ca2f531cc5db4f1020d4329976610356033e0246)

RISC-V: Fix parsing of Zic* extensions

The extension parsing table entries for a range of Zic* extensions
does not match the mask definition in riscv.opt.
This results in broken TARGET_ZIC* macros, because the values of
riscv_zi_subext and riscv_zicmo_subext are set wrong.

This patch fixes this by moving Zic64b into riscv_zicmo_subext
and all other affected Zic* extensions to riscv_zi_subext.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Move ziccamoa, ziccif,
zicclsm, and ziccrse into riscv_zi_subext.
* config/riscv/riscv.opt: Define MASK_ZIC64B for
riscv_ziccmo_subext.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
(cherry picked from commit 285300eb928b171236e895f28c960ad02dcb0d67)

RISC-V: Add -X to link spec

--discard-locals (-X) instructs the linker to remove local .L* symbols,
which occur a lot due to label differences for linker relaxation. The
arm port has a similar need and passes -X to ld.

In contrast, the RISC-V port does not pass -X to ld and rely on the
default --discard-locals in GNU ld's riscv port. The arm way is more
conventional (compiler driver instead of the linker customizes the
default linker behavior) and works with lld.

gcc/ChangeLog:

* config/riscv/elf.h (LINK_SPEC): Add -X.
* config/riscv/freebsd.h (LINK_SPEC): Add -X.
* config/riscv/linux.h (LINK_SPEC): Add -X.

Daily bump.

Fortran: fix bounds check for assignment, class component [PR86100]

gcc/fortran/ChangeLog:

PR fortran/86100
* trans-array.cc (gfc_conv_ss_startstride): Use abridged_ref_name
to generate a more user-friendly name for bounds-check messages.
* trans-expr.cc (gfc_copy_class_to_class): Fix bounds check for
rank>1 by looping over the dimensions.

gcc/testsuite/ChangeLog:

PR fortran/86100
* gfortran.dg/bounds_check_25.f90: New test.

(cherry picked from commit 93765736815a049e14d985b758a213cfe60c1e1c)

Daily bump.

c++: deleting array temporary [PR115187]

Decaying the array temporary to a pointer and then deleting that crashes in
verify_gimple_stmt, because the TARGET_EXPR is first evaluated inside the
TRY_FINALLY_EXPR, but the cleanup point is outside. Fixed by using
get_target_expr instead of save_expr.

I also adjust the stabilize_expr comment to prevent me from again thinking
it's a suitable replacement.

PR c++/115187

gcc/cp/ChangeLog:

* init.cc (build_delete): Use get_target_expr instead of save_expr.
* tree.cc (stabilize_expr): Update comment.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/array-prvalue3.C: New test.

c++: Propagate using decls from partitions [PR114868]

The modules code currently neglects to set OVL_USING_P on the dependency
created for a using-decl, which causes it not to remember that the
OVL_EXPORT_P flag had been set on it when emitted from the primary
interface unit. This patch ensures that it occurs.

PR c++/114868

gcc/cp/ChangeLog:

* module.cc (depset::hash::add_binding_entity): Propagate
OVL_USING_P for using-declarations.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-15_a.C: New test.
* g++.dg/modules/using-15_b.C: New test.
* g++.dg/modules/using-15_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
(cherry picked from commit 0d0215b10dbbe39d655ceda4af283f288ec7680c)

c++: Fix instantiation of imported temploid friends [PR114275]

This patch fixes a number of issues with the handling of temploid friend
declarations.

The primary issue is that instantiations of friend declarations should
attach the declaration to the same module as the befriending class, by
[module.unit] p7.1 and [temp.friend] p2; this could be a different
module from the current TU, and so needs special handling.

The other main issue here is that we can't assume that just because name
lookup didn't find a definition for a hidden class template, that it
doesn't exist at all: it could be a non-exported entity that we've
nevertheless streamed in from an imported module.  We need to ensure
that when instantiating template friend classes that we return the same
TEMPLATE_DECL that we got from our imports, otherwise we will get later
issues with 'duplicate_decls' (rightfully) complaining that they're
different when trying to merge.

This doesn't appear necessary for function templates due to the existing
name lookup handling already finding these hidden declarations.

(cherry-picked from commits b5f6a56940e70838a07e885de03a92e2bd64674a and
ec2365e07537e8b17745d75c28a2b45bf33be119)

PR c++/105320
PR c++/114275

gcc/cp/ChangeLog:

* cp-tree.h (propagate_defining_module): Declare.
(remove_defining_module): Declare.
(lookup_imported_hidden_friend): Declare.
* decl.cc (duplicate_decls): Also check if hidden decls can be
redeclared in this module. Call remove_defining_module on
to-be-freed newdecl.
* module.cc (imported_temploid_friends): New.
(init_modules): Initialize it.
(trees_out::decl_value): Write it; don't consider imported
temploid friends as attached to a module.
(trees_in::decl_value): Read it for non-discarded decls.
(get_originating_module_decl): Follow the owning decl for an
imported temploid friend.
(propagate_defining_module): New.
(remove_defining_module): New.
* name-lookup.cc (get_mergeable_namespace_binding): New.
(lookup_imported_hidden_friend): New.
* pt.cc (tsubst_friend_function): Propagate defining module for
new friend functions.
(tsubst_friend_class): Lookup imported hidden friends.  Check
for valid module attachment of existing names.  Propagate
defining module for new classes.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-10_a.C: New test.
* g++.dg/modules/tpl-friend-10_b.C: New test.
* g++.dg/modules/tpl-friend-10_c.C: New test.
* g++.dg/modules/tpl-friend-10_d.C: New test.
* g++.dg/modules/tpl-friend-11_a.C: New test.
* g++.dg/modules/tpl-friend-11_b.C: New test.
* g++.dg/modules/tpl-friend-12_a.C: New test.
* g++.dg/modules/tpl-friend-12_b.C: New test.
* g++.dg/modules/tpl-friend-12_c.C: New test.
* g++.dg/modules/tpl-friend-12_d.C: New test.
* g++.dg/modules/tpl-friend-12_e.C: New test.
* g++.dg/modules/tpl-friend-12_f.C: New test.
* g++.dg/modules/tpl-friend-13_a.C: New test.
* g++.dg/modules/tpl-friend-13_b.C: New test.
* g++.dg/modules/tpl-friend-13_c.C: New test.
* g++.dg/modules/tpl-friend-13_d.C: New test.
* g++.dg/modules/tpl-friend-13_e.C: New test.
* g++.dg/modules/tpl-friend-13_f.C: New test.
* g++.dg/modules/tpl-friend-13_g.C: New test.
* g++.dg/modules/tpl-friend-14_a.C: New test.
* g++.dg/modules/tpl-friend-14_b.C: New test.
* g++.dg/modules/tpl-friend-14_c.C: New test.
* g++.dg/modules/tpl-friend-14_d.C: New test.
* g++.dg/modules/tpl-friend-9.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

c++: Standardise errors for module_may_redeclare

Currently different places calling 'module_may_redeclare' all emit very
similar but slightly different error messages, and handle different
kinds of declarations differently. This patch makes the function
perform its own error messages so that they're all in one place, and
prepares it for use with temploid friends.

gcc/cp/ChangeLog:

* cp-tree.h (module_may_redeclare): Add default parameter.
* decl.cc (duplicate_decls): Don't emit errors for failed
module_may_redeclare.
(xref_tag): Likewise.
(start_enum): Likewise.
* semantics.cc (begin_class_definition): Likewise.
* module.cc (module_may_redeclare): Clean up logic. Emit error
messages on failure.

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-12.C: Update error message.
* g++.dg/modules/friend-5_b.C: Likewise.
* g++.dg/modules/shadow-1_b.C: Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 2faf040335f9b49c33ba6d40cf317920f27ce431)

Daily bump.

sra: Do not leave work for DSE (that it can sometimes not perform)

When looking again at the g++.dg/tree-ssa/pr109849.C testcase we
discovered that it generates terrible store-to-load forwarding stalls
because SRA was leaving behind aggregate loads but all the stores were
by scalar parts and DSE failed to remove the useless load.  SRA has
all the knowledge to remove the statement even now, so this small
patch makes it do so.

With this patch, the g++.dg/tree-ssa/pr109849.C micro-benchmark runs 9
times faster (on an AMD EPYC 75F3 machine).

gcc/ChangeLog:

2024-04-18  Martin Jambor  <mjambor@suse.cz>

* tree-sra.cc (sra_modify_assign): Remove the original statement
also when dealing with a store to a fully covered aggregate from a
non-candidate.

gcc/testsuite/ChangeLog:

2024-04-23  Martin Jambor  <mjambor@suse.cz>

* g++.dg/tree-ssa/pr109849.C: Also check that the aggeegate store
to cur disappears.
* gcc.dg/tree-ssa/ssa-dse-26.c: Instead of relying on DSE,
check that the unwanted stores were removed at early SRA time.

(cherry picked from commit f6743695b4d2bd4da96e56a19157372f93b800bd)

Daily bump.

c++: failure to suppress -Wsizeof-array-div in template [PR114983]

-Wsizeof-array-div offers a way to suppress the warning by wrapping
the second operand of the division in parens:

  sizeof (samplesBuffer) / (sizeof(unsigned char))

but this doesn't work in a template, because we fail to propagate
the suppression bits.  Do it, then.

The finish_parenthesized_expr hunk is not needed because suppress_warning
isn't very fine-grained.  But I think it makes sense to be explicit and
not rely on OPT_Wparentheses also suppressing OPT_Wsizeof_array_div.

PR c++/114983

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <case SIZEOF_EXPR>: Use copy_warning.
* semantics.cc (finish_parenthesized_expr): Also suppress
-Wsizeof-array-div.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wsizeof-array-div3.C: New test.

(cherry picked from commit 646db3d30bd071a1b671b4f91c9ea2ab7f2be21c)

testsuite: Verify r0-r3 are extended with CMSE

Add regression test to the existing zero/sign extend tests for CMSE to
verify that r0, r1, r2 and r3 are properly extended, not just r0.

boolCharShortEnumSecureFunc test is done using -O0 to ensure the
instructions are in a predictable order.

gcc/testsuite/ChangeLog:

* gcc.target/arm/cmse/extend-param.c: Add regression test. Add
-fshort-enums.
* gcc.target/arm/cmse/extend-return.c: Add -fshort-enums option.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 9ddad76e98ac8f257f90b3814ed3c6ba78d0f3c7)

Fix internal error in seh_cfa_offset with -O2 -fno-omit-frame-pointer

The problem directly comes from the -ffold-mem-offsets pass messing up with
the prologue and the frame-related instructions, which is a no-no with SEH,
so the fix simply disconnects the pass in these circumstances.

gcc/
PR rtl-optimization/115038
* fold-mem-offsets.cc (fold_offsets): Return 0 if the defining
instruction of the register is frame related.

gcc/testsuite/
* g++.dg/opt/fmo1.C: New test.

libstdc++: Implement std::formatter<std::thread::id> without <sstream> [PR115099]

The std::thread::id formatter uses std::basic_ostringstream without
including <sstream>, which went unnoticed because the test for it uses
a stringstream to check the output is correct.

The fix implemented here is to stop using basic_ostringstream for
formatting thread::id and just use std::format instead.

As a drive-by fix, the formatter specialization is constrained to
require that the thread::id::native_handle_type can be formatted, to
avoid making the formatter ill-formed if the pthread_t type is not a
pointer or integer. Since non-void pointers can't be formatted, ensure
that we convert pointers to const void* for formatting. Make a similar
change to the existing operator<< overload so that in the unlikely case
that pthread_t is a typedef for char* we don't treat it as a
null-terminated string when inserting into a stream.

libstdc++-v3/ChangeLog:

PR libstdc++/115099
* include/bits/std_thread.h: Declare formatter as friend of
thread::id.
* include/std/thread (operator<<): Convert non-void pointers to
void pointers for output.
(formatter): Add constraint that thread::native_handle_type is a
pointer or integer.
(formatter::format): Reimplement without basic_ostringstream.
* testsuite/30_threads/thread/id/output.cc: Check output
compiles before <sstream> has been included.

(cherry picked from commit 1a5e4dd83788ea4c049d354d83ad58a6a3d747e6)

strlen: Fix up !si->full_string_p handling in count_nonzero_bytes_addr [PR115152]

The following testcase is miscompiled because
strlen_pass::count_nonzero_bytes_addr doesn't handle correctly
the !si->full_string_p case.
If si->full_string_p, it correctly computes minlen and maxlen as
minimum and maximum length of the '\0' terminated stgring and
clears *nulterm (ie. makes sure !full_string_p in the ultimate
caller) if minlen is equal or larger than nbytes and so
'\0' isn't guaranteed to be among those bytes.
But in the !si->full_string_p case, all we know is that there
are [minlen,maxlen] non-zero bytes followed by unknown bytes,
so effectively the maxlen is infinite (but caller cares about only
the first nbytes bytes) and furthermore, we never know if there is
any '\0' char among those, so *nulterm needs to be always cleared.

2024-05-22 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/115152
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes_addr): If
!si->full_string_p, clear *nulterm and set maxlen to nbytes.

* gcc.dg/pr115152.c: New test.

(cherry picked from commit dbc9b45a3c2468ad134b3a9bd4961f7ae6bc1e67)

ubsan: Use right address space for MEM_REF created for bool/enum sanitization [PR115172]

The following testcase is miscompiled, because -fsanitize=bool,enum
creates a MEM_REF without propagating there address space qualifiers,
so what should be normally loaded using say %gs:/%fs: segment prefix
isn't.  Together with asan it then causes that load to be sanitized.

2024-05-22  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/115172
* ubsan.cc (instrument_bool_enum_load): If rhs is not in generic
address space, use qualified version of utype with the right
address space.  Formatting fix.

* gcc.dg/asan/pr115172.c: New test.

(cherry picked from commit d3c506eff54fcbac389a529c2e98da108a410b7f)

i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

Since vpermq is really slow, we should avoid using it for permutation
when vpmovwb is not available (needs AVX512BW) for ix86_expand_vecop_qihi2
and fall back to ix86_expand_vecop_qihi.

gcc/ChangeLog:

PR target/115069
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Do not enable the optimization when AVX512BW is not enabled.

gcc/testsuite/ChangeLog:

PR target/115069
* gcc.target/i386/pr115069.c: New.

Daily bump.

c++: Fix std dialect hint for std::to_address [PR107800]

The correct dialect for std::to_address is cxx20 not cxx11.

gcc/cp/ChangeLog:

PR libstdc++/107800
* cxxapi-data.csv <to_address>: Change dialect to cxx20.
* std-name-hint.gperf: Regenerate.
* std-name-hint.h: Regenerate.

(cherry picked from commit 826a7d3d19d3ebf04e21d6f1c89eb341a36fb5d1)

c++: folding non-dep enumerator from current inst [PR115139]

After the tsubst_copy removal r14-4796-g3e3d73ed5e85e7 GCC 14 ICEs during
fold_non_dependent_expr for 'e1 | e2' below ultimately because we no
longer exit early when substituting the CONST_DECLs for e1 and e2 with
args=NULL_TREE and instead proceed to substitute the class context A<Ts...>
(also with args=NULL_TREE) which ends up ICEing from tsubst_pack_expansion
(due to processing_template_decl being cleared).

Incidentally, the ICE went away on trunk ever since the tsubst_aggr_type
removal r15-123-gf04dc89a991ddc since it changed the CONST_DECL case of
tsubst_expr to use tsubst to substitute the context, which short circuits
for empty args and so avoids the ICE.

This patch fixes this ICE for GCC 14 by narrowly restoring the early exit
for empty args that would've happened in tsubst_copy when substituting an
enumerator CONST_DECL. We might as well apply this to trunk too, as a
small optimization.

PR c++/115139

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <case CONST_DECL>: Exit early if args
is empty.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent33.C: New test.

Reviewed-by: Marek Polacek <mpolacek@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit f0c0bced62b9c728ed1e672747aa234d918da22c)

Fortran: fix dependency checks for inquiry refs [PR115039]

gcc/fortran/ChangeLog:

PR fortran/115039
* expr.cc (gfc_traverse_expr): An inquiry ref does not constitute
a dependency and cannot collide with a symbol.

gcc/testsuite/ChangeLog:

PR fortran/115039
* gfortran.dg/statement_function_5.f90: New test.

(cherry picked from commit d4974fd22730014e337fd7ec2471945ba8afb00e)

match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

The problem here is the pattern added in r13-1162-g9991d84d2a8435
assumes that it is well defined to multiply zero_one_valuep by the truncated
converted integer constant. It is well defined for all types except for signed 1bit types.
Where `a * -1` is produced which is undefined/
So disable this pattern for 1bit signed types.

Note the pattern added in r14-3432-gddd64a6ec3b38e is able to workaround the undefinedness except when
`-fsanitize=undefined` is turned on, this is why I added a testcase for that.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115154

gcc/ChangeLog:

* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 49c87d22535ac4f8aacf088b3f462861c26cacb4)

Daily bump.

PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

The problem here is even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines a ssa name
which is used in that statement so we need to add a check to make sure
that the phi nodes are empty for the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases.

Bootstrapped and tested on x86_64_linux-gnu with no regressions.

PR tree-optimization/115143

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 9ff8f041331ef8b56007fb3c4d41d76f9850010d)

c++: aggregate CTAD w/ paren init and bases [PR115114]

During aggregate CTAD with paren init, we're accidentally overlooking
base classes since TYPE_FIELDS of a template type doesn't contain
corresponding base fields. So we need to consider them separately.

PR c++/115114

gcc/cp/ChangeLog:

* pt.cc (maybe_aggr_guide): Consider bases in the paren init case.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr15.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 5aaf47cb1987bbc5508c4b9b7dad5ea7d69af2c2)

c++: lvalueness of non-dependent assignment expr [PR114994]

r14-4111-g6e92a6a2a72d3b made us check non-dependent simple assignment
expressions ahead of time and give them a type, as was already done for
compound assignments. Unlike for compound assignments however, if a
simple assignment resolves to an operator overload we represent it as a
(typed) MODOP_EXPR instead of a CALL_EXPR to the selected overload.
(I reckoned this was at worst a pessimization -- we'll just have to repeat
overload resolution at instantiatiation time.)

But this turns out to break the below testcase ultimately because
MODOP_EXPR (of non-reference type) is always treated as an lvalue
according to lvalue_kind, which is incorrect for the MODOP_EXPR
representing x=42.

We can fix this by representing such class assignment expressions as
CALL_EXPRs as well, but this turns out to require some tweaking of our
-Wparentheses warning logic and may introduce other fallout making it
unsuitable for backporting.

So this patch instead fixes lvalue_kind to consider the type of a
MODOP_EXPR representing a class assignment.

PR c++/114994

gcc/cp/ChangeLog:

* tree.cc (lvalue_kind) <case MODOP_EXPR>: For a class
assignment, consider the result type.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent32.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit c6cc6d4741a880109c4e0e64d5a189687fb526f6)

Daily bump.

AVR: target/115065 - Tweak __clzhi2.

The libgcc implementation of __clzhi2 can be tweaked by
one cycle in some situations by re-arranging the instructions.
It also reduces the WCET by 1 cycle.

libgcc/
PR target/115065
* config/avr/lib1funcs.S (__clzhi2): Tweak.

(cherry picked from commit 988838da722dea09bd81ee9d49800a6f24980372)

Fortran: Fix select type regression due to r14-9489 [PR114874]

2024-05-17 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/114874
* gfortran.h: Add 'assoc_name_inferred' to gfc_namespace.
* match.cc (gfc_match_select_type): Set 'assoc_name_inferred'
in select type namespace if the selector has inferred type.
* primary.cc (gfc_match_varspec): If a select type temporary
is apparently scalar and a left parenthesis has been detected,
check the current namespace has 'assoc_name_inferred' set. If
so, set inferred_type.
* resolve.cc (resolve_variable): If the namespace of a select
type temporary is marked with 'assoc_name_inferred' call
gfc_fixup_inferred_type_refs to ensure references are OK.
(gfc_fixup_inferred_type_refs): Catch invalid array refs..

gcc/testsuite/
PR fortran/114874
* gfortran.dg/pr114874_1.f90: New test for valid code.
* gfortran.dg/pr114874_2.f90: New test for invalid code.

(cherry picked from commit 5f5074fe7aaf9524defb265299a985eecba7f914)

libstdc++: Fix typo in _Grapheme_cluster_view::_Iterator [PR115119]

libstdc++-v3/ChangeLog:

PR libstdc++/115119
* include/bits/unicode.h (_Iterator::operator++(int)): Fix typo
in increment expression.
* testsuite/ext/unicode/grapheme_view.cc: Check post-increment
on view's iterator.

(cherry picked from commit c9e05b03c18e898be604ab90401476e9c473cc52)

tree-optimization/114998 - use-after-free with loop distribution

When loop distribution releases a PHI node of the original IL it
can end up clobbering memory that's re-used when it upon releasing
its RDG resets all stmt UIDs back to -1, even those that got released.

The fix is to avoid resetting UIDs based on stmts in the RDG but
instead reset only those still present in the loop.

PR tree-optimization/114998
* tree-loop-distribution.cc (free_rdg): Take loop argument.
Reset UIDs of stmts still in the IL rather than all stmts
referenced from the RDG.
(loop_distribution::build_rdg): Pass loop to free_rdg.
(loop_distribution::distribute_loop): Likewise.
(loop_distribution::transform_reduction_loop): Likewise.

* gcc.dg/torture/pr114998.c: New testcase.

(cherry picked from commit 34d15a4d630a0d54eddb99bdab086c506e10dac5)

Update gcc sv.po

* sv.po: Update.

Daily bump.

middle-end/114931 - type_hash_canon and structual equality types

TYPE_STRUCTURAL_EQUALITY_P is part of our type system so we have
to make sure to include that into the type unification done via
type_hash_canon. This requires the flag to be set before querying
the hash which is the biggest part of the patch.

PR middle-end/114931
gcc/
* tree.cc (type_hash_canon_hash): Hash TYPE_STRUCTURAL_EQUALITY_P.
(type_cache_hasher::equal): Compare TYPE_STRUCTURAL_EQUALITY_P.
(build_array_type_1): Set TYPE_STRUCTURAL_EQUALITY_P before
probing with type_hash_canon.
(build_function_type): Likewise.
(build_method_type_directly): Likewise.
(build_offset_type): Likewise.
(build_complex_type): Likewise.
* attribs.cc (build_type_attribute_qual_variant): Likewise.

gcc/c-family/
* c-common.cc (complete_array_type): Set TYPE_STRUCTURAL_EQUALITY_P
before probing with type_hash_canon.

gcc/testsuite/
* gcc.dg/pr114931.c: New testcase.

(cherry picked from commit b09c2e9560648b0cf993c2ca9ad972c34e6bddfa)

Avoid changing type in the type_hash_canon hash

When building a type and type_hash_canon returns an existing type
avoid changing it, in particular its TYPE_CANONICAL.

PR middle-end/114931
* tree.cc (build_array_type_1): Return early when type_hash_canon
returned an older existing type.
(build_function_type): Likewise.
(build_method_type_directly): Likewise.
(build_offset_type): Likewise.

(cherry picked from commit 7a212ac678e13e0df5da2d090144b246a1262b64)

Daily bump.

libstdc++: Guard dynamic_cast use in src/c++23/print.cc [PR115015]

Do not use dynamic_cast unconditionally, in case libstdc++ is built with
-fno-rtti.

libstdc++-v3/ChangeLog:

PR libstdc++/115015
* src/c++23/print.cc (__open_terminal(streambuf*)) [!__cpp_rtti]:
Do not use dynamic_cast.

libstdc++: Fix typo in std::stacktrace::max_size [PR115063]

libstdc++-v3/ChangeLog:

PR libstdc++/115063
* include/std/stacktrace (basic_stacktrace::max_size): Fix typo
in reference to _M_alloc member.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Check
max_size() compiles.

(cherry picked from commit dd9677f3343ca2a4b4aab9428b8129774accac29)

libstdc++: Guard uses of is_pointer_interconvertible_v [PR114891]

This type trait isn't supported by Clang 18. It's only used in static
assertions, so they can just be omitted if the trait isn't available.

libstdc++-v3/ChangeLog:

PR libstdc++/114891
* include/std/generator: Check feature test macro before using
is_pointer_interconvertible_v.

(cherry picked from commit 1fbe1a50d86df11f434351cf62461a32747f9710)

libstdc++: Update ABI test to disallow adding to released symbol versions

If we update the list of "active" symbols versions now, rather than when
adding a new symbol version, we will notice if new symbols get added to
the wrong version (as in PR 114692).

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_abi.cc: Update latest versions to
new versions that should be used in future.

(cherry picked from commit 6e25ca387fbbb412a2e498e28ea5db28e033a318)

libstdc++: Fix handling of incomplete UTF-8 sequences in _Unicode_view

Eddie Nolan reported to me that _Unicode_view was not correctly
implementing the substitution of ill-formed subsequences with U+FFFD,
due to failing to increment the counter when the iterator reaches the
end of the sequence before a multibyte sequence is complete. As a
result, the incomplete sequence was not completely consumed, and then
the remaining character was treated as another ill-formed sequence,
giving two U+FFFD characters instead of one.

To avoid similar mistakes in future, this change introduces a lambda
that increments the iterator and the counter together. This ensures the
counter is always incremented when the iterator is incremented, so that
we always know how many characters have been consumed.

libstdc++-v3/ChangeLog:

* include/bits/unicode.h (_Unicode_view::_M_read_utf8): Ensure
count of characters consumed is correct when the end of the
input is reached unexpectedly.
* testsuite/ext/unicode/view.cc: Test incomplete UTF-8
sequences.

(cherry picked from commit 3f04f3939ea0ac8fdd766a60655d29de2ffb44e5)

libstdc++: Fix <memory> for -std=c++23 -ffreestanding [PR114866]

std::shared_ptr isn't declared for freestanding, so guard uses of it
with #if _GLIBCXX_HOSTED in <bits/out_ptr.h>.

libstdc++-v3/ChangeLog:

PR libstdc++/114866
* include/bits/out_ptr.h [!_GLIBCXX_HOSTED]: Don't refer to
shared_ptr, __shared_ptr or __is_shred_ptr.
* testsuite/20_util/headers/memory/114866.cc: New test.

(cherry picked from commit 9927059bb88e966e0a45f09e4fd1193f93df708f)

Daily bump.

c++: nested aggregate/alias CTAD fixes [PR114974, PR114901, PR114903]

During maybe_aggr_guide with a nested class template and paren init,
like with list init we need to consider the generic template type rather
than the partially instantiated type since partial instantiations don't
have (partially instantiated) TYPE_FIELDS.  In turn we need to partially
substitute PARMs in the paren init case as well.  As a drive-by improvement
it seems better to use outer_template_args instead of DECL_TI_ARGS during
this partial substitution so that we lower instead of substitute the
innermost template parameters, which is generally more robust.

And during alias_ctad_tweaks with a nested class template, even though
the guides may be already partially instantiated we still need to
substitute the outermost arguments into its constraints.

PR c++/114974
PR c++/114901
PR c++/114903

gcc/cp/ChangeLog:

* pt.cc (maybe_aggr_guide): Fix obtaining TYPE_FIELDS in
the paren init case.  Hoist out partial substitution logic
to apply to the paren init case as well.
(alias_ctad_tweaks): Substitute outer template arguments into
a guide's constraints.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr14.C: New test.
* g++.dg/cpp2a/class-deduction-alias20.C: New test.
* g++.dg/cpp2a/class-deduction-alias21.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 6d31a370e26eeb950c326332633b3e8e84b6630b)

Update gcc .po files

* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po,
ja.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po,
zh_TW.po: Update.

Daily bump.

doc: Describe limitations re Ada, D, and Go on FreeBSD

gcc:
PR target/69374
PR target/112959
* doc/install.texi (Specific) <*-*-freebsd*>: The Ada and D
run-time libraries are broken on i386 which also can affect
64-bit builds. Go is broken.

(cherry picked from commit ff98aab108a6a4e50a831e7cfc011c2131f8d19c)

doc: FreeBSD no longer has a GNU toolchain in base

gcc:
PR target/69374
PR target/112959
* doc/install.texi (Specific) <*-*-freebsd*>: No longer refer
to GCC or binutils in base. Recommend bootstrap using binutils.

(cherry picked from commit 0695aba3e987f4bb06c95f29ff90a8a3234e1507)

doc: Remove old details on libunwind for ia64-*-hpux*

gcc:
PR target/69374
* doc/install.texi (Specific) <ia64-*-hpux*>: Remove details
on libunwind for GCC 3.4 and earlier.

(cherry picked from commit 81f7965e63028c1289ae3b1836750da74b01bc4a)

doc: Remove references to FreeBSD 7 and older

FreeBSD 7 has been end of life for years and current GCC most likely
would not build there anymore anyways.

gcc:
PR target/69374
PR target/112959
* doc/install.texi (Specific) <*-*-freebsd*>: Remove references to
FreeBSD 7 and older.

(cherry picked from commit 507f3ce34c55e78b23eeaf57bc4d927a1f25f8fb)

AVR: target/114981 - Tweak __builtin_powif / __powisf2

Implement __powisf2 in assembly.

PR target/114981
libgcc/
* config/avr/t-avr (LIB2FUNCS_EXCLUDE): Add _powisf2.
(LIB1ASMFUNCS) [!avrtiny]: Add _powif.
* config/avr/lib1funcs.S (mov4): New .macro.
(L_powif, __powisf2) [!avrtiny]: New module and function.

gcc/testsuite/
* gcc.target/avr/pr114981-powif.c: New test.

(cherry picked from commit af64af69c3cc85dbe00c520651a54850bf5cadc1)

c++, mingw: Fix up types of dtor hooks to __cxa_{,thread_}atexit/__cxa_throw on mingw ia32 [PR114968]

__cxa_atexit/__cxa_thread_atexit/__cxa_throw functions accept function
pointers to usually directly destructors rather than wrappers around
them.
Now, mingw ia32 uses implicitly __attribute__((thiscall)) calling
conventions for METHOD_TYPE (where the this pointer is passed in %ecx
register, the rest on the stack), so these functions use:
in config/os/mingw32/os_defines.h:
#if defined (__i386__)
#define _GLIBCXX_CDTOR_CALLABI __thiscall
#endif
in libsupc++/cxxabi.h
__cxa_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void*) _GLIBCXX_NOTHROW;
__cxa_thread_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void *) _GLIBCXX_NOTHROW;
__cxa_throw(void*, std::type_info*, void (_GLIBCXX_CDTOR_CALLABI *) (void *))
__attribute__((__noreturn__));

Now, mingw for some weird reason uses
#define TARGET_CXX_USE_ATEXIT_FOR_CXA_ATEXIT hook_bool_void_true
so it never actually uses __cxa_atexit, but does use __cxa_thread_atexit
and __cxa_throw.  Recent changes for modules result in more detailed
__cxa_*atexit/__cxa_throw prototypes precreated by the compiler, and if
that happens and one also includes <cxxabi.h>, the compiler complains about
mismatches in the prototypes.

One thing is the missing thiscall attribute on the FUNCTION_TYPE, the
other problem is that all of atexit/__cxa_atexit/__cxa_thread_atexit
get function pointer types created by a single function,
get_atexit_fn_ptr_type (), which creates it depending on if atexit
or __cxa_atexit will be used as either void(*)(void) or void(*)(void *),
but when using atexit and __cxa_thread_atexit it uses the wrong function
type for __cxa_thread_atexit.

The following patch adds a target hook to add the thiscall attribute to the
function pointers, and splits the get_atexit_fn_ptr_type () function into
get_atexit_fn_ptr_type () and get_cxa_atexit_fn_ptr_type (), the former always
creates shared void(*)(void) type, the latter creates either
void(*)(void*) (on most targets) or void(__attribute__((thiscall))*)(void*)
(on mingw ia32).  So that we don't waiste another GTY global tree for it,
because cleanup_type used for the same purpose for __cxa_throw should be
the same, the code changes it to use that type too.

In register_dtor_fn then based on the decision whether to use atexit,
__cxa_atexit or __cxa_thread_atexit it picks the right function pointer
type, and also if it decides to emit a __tcf_* wrapper for the cleanup,
uses that type for that wrapper so that it agrees on calling convention.

2024-05-10  Jakub Jelinek  <jakub@redhat.com>

PR target/114968
gcc/
* target.def (use_atexit_for_cxa_atexit): Remove spurious space
from comment.
(adjust_cdtor_callabi_fntype): New cxx target hook.
* targhooks.h (default_cxx_adjust_cdtor_callabi_fntype): Declare.
* targhooks.cc (default_cxx_adjust_cdtor_callabi_fntype): New
function.
* doc/tm.texi.in (TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE): Add.
* doc/tm.texi: Regenerate.
* config/i386/i386.cc (ix86_cxx_adjust_cdtor_callabi_fntype): New
function.
(TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE): Redefine.
gcc/cp/
* cp-tree.h (atexit_fn_ptr_type_node, cleanup_type): Adjust macro
comments.
(get_cxa_atexit_fn_ptr_type): Declare.
* decl.cc (get_atexit_fn_ptr_type): Adjust function comment, only
build type for atexit argument.
(get_cxa_atexit_fn_ptr_type): New function.
(get_atexit_node): Call get_cxa_atexit_fn_ptr_type rather than
get_atexit_fn_ptr_type when using __cxa_atexit.
(get_thread_atexit_node): Call get_cxa_atexit_fn_ptr_type
rather than get_atexit_fn_ptr_type.
(start_cleanup_fn): Add ob_parm argument, call
get_cxa_atexit_fn_ptr_type or get_atexit_fn_ptr_type depending
on it and create PARM_DECL also based on that argument.
(register_dtor_fn): Adjust start_cleanup_fn caller, use
get_cxa_atexit_fn_ptr_type rather than get_atexit_fn_ptr_type
for use_dtor casts.
* except.cc (build_throw): Use get_cxa_atexit_fn_ptr_type ().

(cherry picked from commit e5d8fd9ce05611093191d500ebc39f150d0ece2b)

driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

In GCC 14 we started to emit URLs for "command-line option <option> is
valid for <language> but not <another language>" and "-Werror= argument
'-Werror=<option>' is not valid for <language>" warnings. So we should
have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
-fdiagnostics-urls= wouldn't be able to control URLs in these warnings.

No test cases are added because with TERM=xterm-256colors PR114980
already triggers some test failures.

gcc/ChangeLog:

PR driver/114980
* opts-common.cc (prune_options): Move -fdiagnostics-urls=
early like -fdiagnostics-color=.

(cherry picked from commit f75806ec63ec1af2d76a194e5fa73e114b2b8857)

Fortran: fix issues with class(*) assignment [PR114827]

gcc/fortran/ChangeLog:

PR fortran/114827
* trans-array.cc (gfc_alloc_allocatable_for_assignment): Take into
account _len of unlimited polymorphic entities when calculating
the effective element size for allocation size and array span.
Set _len of lhs to _len of rhs.
* trans-expr.cc (trans_class_assignment): Take into account _len
of unlimited polymorphic entities for allocation size.

gcc/testsuite/ChangeLog:

PR fortran/114827
* gfortran.dg/asan/unlimited_polymorphic_34.f90: New test.

(cherry picked from commit 21e7aa5f3ea44ca2fef8deb8788edffc04901b5c)

Daily bump.

testsuite: Fix up vector-subaccess-1.C test for ia32 [PR89224]

The test FAILs on i686-linux due to
.../gcc/testsuite/g++.dg/torture/vector-subaccess-1.C:16:6: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi]
excess warnings.

This fixes it by adding -Wno-psabi, like commonly done in other tests.

2024-05-09 Jakub Jelinek <jakub@redhat.com>

PR c++/89224
* g++.dg/torture/vector-subaccess-1.C: Add -Wno-psabi as additional
options.

(cherry picked from commit 8fb65ec816ff8f0d529b6d30821abace4328c9a2)

AVR: target/114975 - Add combine-pattern for __parityqi2.

PR target/114975
gcc/
* config/avr/avr.md: Add combine pattern for
8-bit parity detection.

gcc/testsuite/
* gcc.target/avr/pr114975-parity.c: New test.

(cherry picked from commit 41bc359c322d45ec1adfb51f7a45c7ef02ce6ca9)

AVR: target/114975 - Add combine-pattern for __popcountqi2.

PR target/114975
gcc/
* config/avr/avr.md: Add combine pattern for
8-bit popcount detection.

gcc/testsuite/
* gcc.target/avr/pr114975-popcount.c: New test.

(cherry picked from commit c8f4bbb824fafecf021a802324cd79e64b03b947)

AVR: target/114981 - Support __builtin_powi[l] / __powidf2.

This supports __powidf2 by means of a double wrapper for already
existing f7_powi (renamed to __f7_powi by f7-renames.h).
It tweaks the implementation so that it does not perform trivial
multiplications with 1.0 any more, but instead uses a move.
It also fixes the last statement of f7_powi, which was wrong.
Notice that f7_powi was unused until now.

PR target/114981
libgcc/config/avr/libf7/
* libf7-common.mk (F7_ASM_PARTS): Add D_powi
* libf7-asm.sx (F7MOD_D_powi_, __powidf2): New module and function.
* libf7.c (f7_powi): Fix last (wrong) statement.
Tweak trivial multiplications with 1.0.

gcc/testsuite/
* gcc.target/avr/pr114981-powil.c: New test.

(cherry picked from commit de4eea7d7ea86e54843507c68d6672eca9d8c7bb)

Objective-C, NeXT, v2: Correct a regression in code-gen.

There have been several changes in the ABI of Objective-C which
depend on the OS version targetted. In this case Protocols and
LabelProtocols should be made weak/hidden/extern from macOS 10.7
however there was a mistake in the code causing this to occur
from macOS 10.6. Fixed thus.

gcc/objc/ChangeLog:

* objc-next-runtime-abi-02.cc (WEAK_PROTOCOLS_AFTER): New.
(next_runtime_abi_02_protocol_decl): Use WEAK_PROTOCOLS_AFTER
to determine this ABI change.
(build_v2_protocol_list_address_table): Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit 9b5c0be59d0f94df0517820f00b4520b5abddd8c)

reassoc: Fix up optimize_range_tests_to_bit_test [PR114965]

The optimize_range_tests_to_bit_test optimization normally emits a range
test first:
          if (entry_test_needed)
            {
              tem = build_range_check (loc, optype, unshare_expr (exp),
                                       false, lowi, high);
              if (tem == NULL_TREE || is_gimple_val (tem))
                continue;
            }
so during the bit test we already know that exp is in the [lowi, high]
range, but skips it if we have range info which tells us this isn't
necessary.
Also, normally it emits shifts by exp - lowi counter, but has an
optimization to use just exp counter if the mask isn't a more expensive
constant in that case and lowi is > 0 and high is smaller than prec.

The following testcase is miscompiled because the two abnormal cases
are triggered.  The range of exp is [43, 43][48, 48][95, 95], so we on
64-bit arch decide we don't need the entry test, because 95 - 43 < 64.
And we also decide to use just exp as counter, because the range test
tests just for exp == 43 || exp == 48, so high is smaller than 64 too.
Because 95 is in the exp range, we can't do that, we'd either need to
do a range test first, i.e.
if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1))
or need to subtract lowi from the shift counter, i.e.
if ((1UL << (exp - 43)) & mask2)
but can't do both unless r.upper_bound () is < prec.

The following patch ensures that.

2024-05-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/114965
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to
optimize away exp - lowi subtraction from shift count unless entry
test is emitted or unless r.upper_bound () is smaller than prec.

* gcc.c-torture/execute/pr114965.c: New test.

(cherry picked from commit 9adec2d91e62a479474ae79df5b455fd4b8463ba)

c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]

After r7-987-gf17a223de829cb, the access for the elements of a vector type would lose the qualifiers.
So if we had `constvector[0]`, the type of the element of the array would not have const on it.
This was due to a missing build_qualified_type for the inner type of the vector when building the array type.
We need to add back the call to build_qualified_type and now the access has the correct qualifiers. So the
overloads and even if it is a lvalue or rvalue is correctly done.

Note we correctly now reject the testcase gcc.dg/pr83415.c which was incorrectly accepted after r7-987-gf17a223de829cb.

Built and tested for aarch64-linux-gnu.

PR c++/89224

gcc/c-family/ChangeLog:

* c-common.cc (convert_vector_to_array_for_subscript): Call build_qualified_type
for the inner type.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Compare main variants
for the vector/array types instead of the types directly.

gcc/testsuite/ChangeLog:

* g++.dg/torture/vector-subaccess-1.C: New test.
* gcc.dg/pr83415.c: Change warning to error.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 4421d35167b3083e0f2e4c84c91fded09a30cf22)

c++/modules: Stream unmergeable temporaries by value again [PR114856]

In r14-9266-g2823b4d96d9ec4 I gave all temporary vars a DECL_CONTEXT,
including those at namespace or global scope, so that they could be
properly merged across importers. However, not all of these temporary
vars are actually supposed to be mergeable.

For instance, in the attached testcase we have an unnamed temporary var
used in the NSDMI of a class member, which cannot properly merged -- but
it also doesn't need to be, as it'll be thrown away when the class type
itself is merged anyway.

This patch reverts the change made above and instead makes a weaker
adjustment that only causes temporary vars with linkage have a
DECL_CONTEXT to merge from. This way these unnamed, "unmergeable"
temporaries are properly streamed by value again.

PR c++/114856

gcc/cp/ChangeLog:

* call.cc (make_temporary_var_for_ref_to_temp): Set context for
temporaries with linkage.
* init.cc (create_temporary_var): Revert to only set context
when in a function decl.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr114856.h: New test.
* g++.dg/modules/pr114856_a.H: New test.
* g++.dg/modules/pr114856_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
(cherry picked from commit e60032b382364897a58e67994baac896bcd03327)

expansion: Use __trunchfbf2 calls rather than __extendhfbf2 [PR114907]

The HF and BF modes have the same size/precision and neither is
a subset nor superset of the other.
So, using either __extendhfbf2 or __trunchfbf2 is weird.
The expansion apparently emits __extendhfbf2, but on the libgcc side
we apparently have __trunchfbf2 implemented.

I think it is easier to switch to using what is available rather than
adding new entrypoints to libgcc, even alias, because this is backportable.

2024-05-07 Jakub Jelinek <jakub@redhat.com>

PR middle-end/114907
* expr.cc (convert_mode_scalar): Use trunc_optab rather than
sext_optab for HF->BF conversions.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.

* gcc.dg/pr114907.c: New test.

(cherry picked from commit 28ee13db2e9d995bd3728c4ff3a3545e24b39cd2)

tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]

In r9-5742 we've started allowing to inline always_inline functions into
functions which have disabled e.g. address sanitization even when the
always_inline function is implicitly from command line options sanitized.

This mostly works fine because most of the asan instrumentation is done only
late after ipa, but as the following testcase the .ASAN_MARK ifn calls
gimplifier adds can result in ICEs.

Fixed by dropping those during inlining, similarly to how we drop
.TSAN_FUNC_EXIT calls.

2024-05-07 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/114956
* tree-inline.cc: Include asan.h.
(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
sanitization disabled.

* gcc.dg/asan/pr114956.c: New test.

(cherry picked from commit d4e25cf4f7c1f51a8824cc62bbb85a81a41b829a)

[PR modula2/113768][PR modula2/114133] bugfix constants must be cast prior to vararg call

This bug fix corrects the test codes below by converting the constant
literals to the type required by C. In the testcases below the values, 1
etc were converted into the INTEGER type before being passed to a C
vararg function. By default in modula2 constant literal ordinals are
represented as the ZTYPE (the largest GCC integer type node).

gcc/testsuite/ChangeLog:

PR modula2/113768
PR modula2/114133
* gm2/extensions/run/pass/callingc10.mod: Convert constant literal
numbers into INTEGER.
* gm2/extensions/run/pass/callingc11.mod: Ditto.
* gm2/extensions/run/pass/vararg2.mod: Ditto.
* gm2/iso/run/pass/packed.mod: Emit a printf as a runtime
diagnostic.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

libgomp: Add gfx90c, 1036 and 1103 declare variant tests

Recently -march=gfx{90c,1036,1103} support has been added, but corresponding
changes weren't done in the testsuite.

The following patch adds that.

Tested on x86_64-linux (with fiji and gfx1103 devices; had to use
OMP_DEFAULT_DEVICE=1 there, fiji doesn't really work due to LLVM dropping
support, but we still list those as offloading devices).

2024-05-02 Jakub Jelinek <jakub@redhat.com>

* testsuite/libgomp.c/declare-variant-4.h (gfx90c, gfx1036, gfx1103):
New functions.
(f): Add #pragma omp declare variant directives for those.
* testsuite/libgomp.c/declare-variant-4-gfx90c.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx1036.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx1103.c: New test.

(cherry picked from commit 5eb25d1561dd22316331feee92164f97ca79d1c3)

gimple-ssa-sprintf: Use [0, 1] range for %lc with (wint_t) 0 argument [PR114876]

Seems when Martin S. implemented this, he coded there strict reading
of the standard, which said that %lc with (wint_t) 0 argument is handled
as wchar_t[2] temp = { arg, 0 }; %ls with temp arg and so shouldn't print
any values.  But, most of the libc implementations actually handled that
case like %c with '\0' argument, adding a single NUL character, the only
known exception is musl.
Recently, C23 changed this in response to GB-141 and POSIX in
https://austingroupbugs.net/view.php?id=1647
so that it should have the same behavior as %c with '\0'.

Because there is implementation divergence, the following patch uses
a range rather than hardcoding it to all 1s (i.e. the %c behavior),
though the likely case is still 1 (forward looking plus most of
implementations).
The res.knownrange = true; assignment removed is redundant due to
the same assignment done unconditionally before the if statement,
rest is formatting fixes.

I don't think the min >= 0 && min < 128 case is right either, I'd think
it should be min >= 0 && max < 128, otherwise it is just some possible
inputs are (maybe) ASCII and there can be others, but this code is a total
mess anyway, with the min, max, likely (somewhere in [min, max]?) and then
unlikely possibly larger than max, dunno, perhaps for at least some chars
in the ASCII range the likely case could be for the ascii case; so perhaps
just the one_2_one_ascii shouldn't set max to 1 and mayfail should be true
for max >= 128.  Anyway, didn't feel I should touch that right now.

2024-04-30  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/114876
* gimple-ssa-sprintf.cc (format_character): For min == 0 && max == 0,
set max, likely and unlikely members to 1 rather than 0.  Remove
useless res.knownrange = true;.  Formatting fixes.

* gcc.dg/pr114876.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Adjust expected
diagnostics.

(cherry picked from commit 6c6b70f07208ca14ba783933988c04c6fc2fff42)

c++/modules: imported spec befriending class tmpl [PR114889]

When adding to CLASSTYPE_BEFRIENDING_CLASSES as part of installing an
imported class definition, we need to look through TEMPLATE_DECL like
make_friend_class does.

Otherwise in the below testcase we won't add _Hashtable<int, int> to
CLASSTYPE_BEFRIENDING_CLASSES of _Map_base, which leads to a bogus
access check failure for _M_hash_code.

PR c++/114889

gcc/cp/ChangeLog:

* module.cc (trees_in::read_class_def): Look through
TEMPLATE_DECL when adding to CLASSTYPE_BEFRIENDING_CLASSES.

gcc/testsuite/ChangeLog:

* g++.dg/modules/friend-8_a.H: New test.
* g++.dg/modules/friend-8_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 22b20ac6c6aead2d3f36c413a77dd0b80adfec39)

AVR: ipa/92606 - Don't optimize PROGMEM data against non-PROGMEM.

ipa/92606: Inter-procedural analysis optimizes data across
address-spaces and PROGMEM. As of v14, the PROGMEM part is
still not fixed (and there is still no target hook as proposed
in PR92932). Just disable respective bogus optimization.

PR ipa/92606
gcc/
* config/avr/avr.cc (avr_option_override): Set
flag_ipa_icf_variables = 0.
gcc/testsuite/
* gcc.target/avr/torture/pr92606.c: New test.

(cherry picked from commit 08e752e72363ae7fd5a5fcb70913a0f7b240387b)

Bump BASE-VER

* BASE-VER: Set to 14.1.1.

Update ChangeLog and version files for release

Update gennews for GCC 14.

2024-05-07 Jakub Jelinek <jakub@redhat.com>

* gennews (files): Add files for GCC 14.

(cherry picked from commit 7ee3f769529f8d418bf361eb821aab17a33567e3)