gcc.gnu.org Git - gcc.git/log

tree-optimization/114297 - SLP reduction with early break fix

The following makes sure to pass in the SLP node for the live stmts
we are generating the reduction epilogue for to
vect_create_epilog_for_reduction. This follows the previous fix for
the non-SLP path.

PR tree-optimization/114297
* tree-vect-loop.cc (vectorizable_live_operation): Pass in the
live stmts SLP node to vect_create_epilog_for_reduction.

* gcc.dg/vect/vect-early-break_123-pr114297.c: New testcase.

Reject -fno-multiflags [PR114314]

When -fmultiflags option support was added in r13-3693-g6b1a2474f9e422,
it accidently allowed -fno-multiflags which then would pass on to cc1.
This fixes that oversight.

Committed as obvious after bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

PR driver/114314
* common.opt (fmultiflags): Add RejectNegative.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Daily bump.

libgfortran: [PR114304] Revert portion of PR105347 change.

PR libfortran/105437
PR libfortran/114304

libgfortran/ChangeLog:

* io/list_read.c (eat_separator): Remove check for decimal
point mode and semicolon used as a seprator. Removes
the regression.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105473.f90: Add additional checks to address
the case of semicolon at the end of a line.

Update gcc sv.po

* sv.po: Update.

gomp: testsuite: improve compatibility of bad-array-section-3.c [PR113428]

This test generates different warnings on ilp32 targets because the size
of an integer matches the size of a pointer. Avoid this by using
signed char.

gcc/testsuite:

PR testsuite/113428
* gcc.dg/gomp/bad-array-section-c-3.c: Use signed char instead
of int.

PR modula2/114295 Incorrect location if compiling implementation without definition

This patch fixes a bug which occurred if gm2 was asked to compile an
implementation module and could not find the definition module.  The error
location would be set to the SYSTEM module.  The bug occurred as the
module sym was created during the peep phase after which the few tokens are
destroyed and recreated during parsing.  The bug fix is to call
PutDeclared when the module is encountered during parsing which updates
the tokenno associated with the module.

gcc/m2/ChangeLog:

PR modula2/114295
* gm2-compiler/M2Batch.mod (MakeProgramSource): Call PutDeclared
if the module is known.
(MakeDefinitionSource): Ditto.
(MakeImplementationSource): Ditto.
* gm2-compiler/M2Comp.mod (ExamineHeader): New procedure.
(ExamineCompilationUnit): Rewrite.
(PeepInto): Rewrite.
* gm2-compiler/M2Error.mod (NewError): Remove default call to
GetTokenNo.
* gm2-compiler/M2Quads.mod (callRequestDependant): Push tokno with
Adr.
(BuildStringAdrParam): Ditto.
(doBuildBinaryOp): Push OperatorPos on the bool stack.
(BuildRelOp): Ditto.
* gm2-compiler/P2Build.bnf (SetType): Pass set token pos to
BuildSetType.
(PointerType): Pass pointer token pos to BuildPointerType.
* gm2-compiler/P2SymBuild.def (BuildPointerType): Add parameter
pointerpos.
(BuildSetType): Add parameter setpos.
* gm2-compiler/P2SymBuild.mod (BuildPointerType): Add parameter
pointerpos.  Build combined token and use it when creating a
pointer type.
(BuildSetType): Add parameter setpos.  Build combined token and
use it when creating a set type.
* gm2-compiler/SymbolTable.mod (DebugUnknownToken): New constant.
(CheckTok): New procedure function.
(MakeProcedure): Call CheckTok.
(MakeRecord): Ditto.
(MakeVarient): Ditto.
(MakeEnumeration): Ditto.
(MakeHiddenType): Ditto.
(MakeConstant): Ditto.
(MakeConstStringCnul): Ditto.
(MakeSubrange): Ditto.
(MakeTemporary): Ditto.
(MakeVariableForParam): Ditto.
(MakeParameterHeapVar): Ditto.
(MakePointer): Ditto.
(MakeSet): Ditto.
(MakeUnbounded): Ditto.
(MakeProcType): Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

testsuite: vect: Require vect_hw_misalign in gcc.dg/vect/vect-cost-model-1.c etc. [PR98238]

Several gcc.dg/vect/vect-cost-model-?.c tests FAIL on 32 and 64-bit
Solaris/SPARC:

FAIL: gcc.dg/vect/vect-cost-model-1.c -flto -ffat-lto-objects
scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-cost-model-1.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-cost-model-3.c -flto -ffat-lto-objects
scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-cost-model-3.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-cost-model-5.c -flto -ffat-lto-objects
scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-cost-model-5.c scan-tree-dump vect "LOOP VECTORIZED"

The dumps show

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c:7:30:
note: ==> examining statement: _3 = *_2;
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c:7:30:
missed: unsupported unaligned access
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c:8:6:
missed: not vectorized: relevant stmt not supported: _3 = *_2;

so I think the tests need to require vect_hw_misalign. This is what
this patch does.

Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.

2024-02-22 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
PR tree-optimization/98238
* gcc.dg/vect/vect-cost-model-1.c (scan-tree-dump): Also require
vect_hw_misalign.
* gcc.dg/vect/vect-cost-model-3.c: Likewise.
* gcc.dg/vect/vect-cost-model-5.c: Likewise.

testsuite: vect: Require vect_perm in several tests [PR114071, PR113557, PR96109]

Several vectorization tests FAIL on 32 and 64-bit Solaris/SPARC:

FAIL: gcc.dg/vect/pr37027.c -flto -ffat-lto-objects scan-tree-dump-times
vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/pr37027.c -flto -ffat-lto-objects scan-tree-dump-times
vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/pr37027.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/pr37027.c scan-tree-dump-times vect "vectorizing stmts
using SLP" 1
FAIL: gcc.dg/vect/pr67790.c -flto -ffat-lto-objects scan-tree-dump vect
"vectorizing stmts using SLP"
FAIL: gcc.dg/vect/pr67790.c scan-tree-dump vect "vectorizing stmts using SLP"
FAIL: gcc.dg/vect/slp-47.c -flto -ffat-lto-objects scan-tree-dump-times
vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-47.c scan-tree-dump-times vect "vectorizing stmts
using SLP" 2
FAIL: gcc.dg/vect/slp-48.c -flto -ffat-lto-objects scan-tree-dump-times
vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-48.c scan-tree-dump-times vect "vectorizing stmts
using SLP" 2
FAIL: gcc.dg/vect/slp-reduc-1.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-reduc-1.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-reduc-1.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-2.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-reduc-2.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-2.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-reduc-2.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-7.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-reduc-7.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1
FAIL: gcc.dg/vect/slp-reduc-8.c -flto -ffat-lto-objects scan-tree-dump vect
"vectorized 1 loops"
FAIL: gcc.dg/vect/slp-reduc-8.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/vect-multi-peel-gaps.c -flto -ffat-lto-objects
scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-multi-peel-gaps.c scan-tree-dump vect "LOOP VECTORIZED"

The dumps show variations of

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/pr37027.c:24:17:
note: ==> examining statement: _4 = a[i_19].f2;
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/pr37027.c:24:17:
missed: unsupported vect permute { 1 0 3 2 5 4 }
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/pr37027.c:24:17:
missed: unsupported load permutation
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/pr37027.c:27:17:
missed: not vectorized: relevant stmt not supported: _4 = a[i_19].f2;

so I think the tests should require vect_perm. This is what this patch does

Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.

2024-02-22 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
PR tree-optimization/114071
* gcc.dg/vect/pr37027.c: Require vect_perm.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-7.c: Likewise.
* gcc.dg/vect/slp-reduc-8.c: Likewise.

PR tree-optimization/113557
* gcc.dg/vect/vect-multi-peel-gaps.c (scan-tree-dump): Also
require vect_perm.

PR testsuite/96109
* gcc.dg/vect/slp-47.c: Require vect_perm.
* gcc.dg/vect/slp-48.c: Likewise.

aarch64,arm: Move branch-protection data to targets

The branch-protection types are target specific, not the same on arm
and aarch64. This currently affects pac-ret+b-key, but there will be
a new type on aarch64 that is not relevant for arm.

After the move, change aarch_ identifiers to aarch64_ or arm_ as
appropriate.

Refactor aarch_validate_mbranch_protection to take the target specific
branch-protection types as an argument.

In case of invalid input currently no hints are provided: the way
branch-protection types and subtypes can be mixed makes it difficult
without causing confusion.

gcc/ChangeLog:

* config/aarch64/aarch64.md: Rename aarch_ to aarch64_.
* config/aarch64/aarch64.opt: Likewise.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Likewise.
* config/aarch64/aarch64.cc (aarch64_expand_prologue): Likewise.
(aarch64_expand_epilogue): Likewise.
(aarch64_post_cfi_startproc): Likewise.
(aarch64_handle_no_branch_protection): Copy and rename.
(aarch64_handle_standard_branch_protection): Likewise.
(aarch64_handle_pac_ret_protection): Likewise.
(aarch64_handle_pac_ret_leaf): Likewise.
(aarch64_handle_pac_ret_b_key): Likewise.
(aarch64_handle_bti_protection): Likewise.
(aarch64_override_options): Update branch protection validation.
(aarch64_handle_attr_branch_protection): Likewise.
* config/arm/aarch-common-protos.h (aarch_validate_mbranch_protection):
Pass branch protection type description as argument.
(struct aarch_branch_protect_type): Move from aarch-common.h.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Remove.
(aarch_handle_standard_branch_protection): Remove.
(aarch_handle_pac_ret_protection): Remove.
(aarch_handle_pac_ret_leaf): Remove.
(aarch_handle_pac_ret_b_key): Remove.
(aarch_handle_bti_protection): Remove.
(aarch_validate_mbranch_protection): Pass branch protection type
description as argument.
* config/arm/aarch-common.h (enum aarch_key_type): Remove.
(struct aarch_branch_protect_type): Remove.
* config/arm/arm-c.cc (arm_cpu_builtins): Remove aarch_ra_sign_key.
* config/arm/arm.cc (arm_handle_no_branch_protection): Copy and rename.
(arm_handle_standard_branch_protection): Likewise.
(arm_handle_pac_ret_protection): Likewise.
(arm_handle_pac_ret_leaf): Likewise.
(arm_handle_bti_protection): Likewise.
(arm_configure_build_target): Update branch protection validation.
* config/arm/arm.opt: Remove aarch_ra_sign_key.

middle-end/114299 - missing error recovery from gimplify failure

When internal_get_tmp_var fails to gimplify the value the temporary
SSA name is supposed to be initialized with we can leak SSA names
with a NULL SSA_NAME_DEF_STMT into the IL. That's bad, so recover
from this by instead returning a decl in that case.

PR middle-end/114299
* gimplify.cc (internal_get_tmp_var): When gimplification
of VAL failed, return a decl.

* gcc.target/i386/pr114299.c: New testcase.

bitint: Avoid rewriting large/huge _BitInt vars into SSA after bitint lowering [PR114278]

The following testcase ICEs, because update-address-taken subpass of
fre5 rewrites
  _BitInt(128) b;
  vector(16) unsigned char _3;

  <bb 2> [local count: 1073741824]:
  _3 = MEM <vector(16) unsigned char> [(char * {ref-all})p_2(D)];
  MEM <vector(16) unsigned char> [(char * {ref-all})&b] = _3;
  b ={v} {CLOBBER(eos)};
to
  _BitInt(128) b;
  vector(16) unsigned char _3;

  <bb 2> [local count: 1073741824]:
  _3 = MEM <vector(16) unsigned char> [(char * {ref-all})p_2(D)];
  b_5 = VIEW_CONVERT_EXPR<_BitInt(128)>(_3);
but we can't have large/huge _BitInt vars in SSA form after the bitint
lowering except for function arguments loaded from memory, as expansion
isn't able to deal with those, it relies on bitint lowering to lower
those operations.
The following patch fixes that by setting DECL_NOT_GIMPLE_REG_P for
large/huge _BitInt vars after bitint lowering, such that we don't
rewrite them into SSA form.

2024-03-11  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/114278
* tree-ssa.cc (maybe_optimize_var): If large/huge _BitInt vars are no
longer addressable, set DECL_NOT_GIMPLE_REG_P on them.

* gcc.dg/bitint-99.c: New test.

Fix placement of recently implemented DIE

It's the DIE added for enumeration types with reverse scalar storage order.

gcc/
PR debug/113519
PR debug/113777
* dwarf2out.cc (gen_enumeration_type_die): In the reverse case,
generate the DIE with the same parent as in the regular case.

gcc/testsuite/
* gcc.dg/sso-20.c: New test.
* gcc.dg/sso-21.c: Likewise.

Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

The problem here is that merge_truthop_with_opposite_arm would
use the type of the result of the comparison rather than the operands
of the comparison to figure out if we are honoring NaNs.
This fixes that oversight and now we get the correct results in this
case.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/95351

gcc/ChangeLog:

* fold-const.cc (merge_truthop_with_opposite_arm): Use
the type of the operands of the comparison and not the type
of the comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/float_opposite_arm-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Daily bump.

d: Fix -fpreview=in ICEs with forward referenced parameter [PR112285]

The way that the target hook preferPassByRef is implemented, it relied
on the GCC "back-end" tree type to determine whether or not to use `ref'
ABI for D `in' parameters; e.g: prefer by value if it is expected that
the target will pass the type around in registers.

Building the GCC tree type depends on the AST type being complete - all
semantic processing is finished - but as this hook is called from the
front-end, this will not be the case for forward referenced or
self-referencing types.

The consensus in upstream is that `in' parameters should always be
implicitly `ref', but as the front-end does not yet support all types
being rvalue references, limit this just static arrays and structs.

PR d/112285
PR d/112290

gcc/d/ChangeLog:

* d-target.cc (Target::preferPassByRef): Return true for all static
array and struct types.

gcc/testsuite/ChangeLog:

* gdc.dg/pr112285.d: New test.
* gdc.dg/pr112290.d: New test.

[committed] [PR tree-optimization/110199] Simplify MIN/MAX more often

So as I mentioned in the BZ, the case of

t = MIN_EXPR (A, B)

where we know something about the relationship between A and B can be trivially
handled by some existing code in DOM.  That existing code would simplify when A
== B.  But by testing GE and LE instead of EQ we can cover more cases with
minimal effort.  When applicable the MIN/MAX turns into a simple copy.

I made one other change.  We have other binary operations that we simplify when
we know something about the relationship between the operands.  That code was
not canonicalizing the order of operands when building the expression to lookup
in the hash tables to discover that relationship.  Since those paths are only
testing for equality, we can trivially reverse them and not have to worry about
changing codes or anything like that.  So extremely safe and avoids having to
come back and fix that code to match the MIN_EXPR/MAX_EXPR case later.

Bootstrapped on x86 and also tested on the crosses.  I briefly thought there
was an sh regression, but that was actually the recent fwprop changes twiddling
code generation for one test.

PR tree-optimization/110199
gcc/
* tree-ssa-scopedtables.cc
(avail_exprs_stack::simplify_binary_operation): Generalize handling
of MIN_EXPR/MAX_EXPR to allow additional simplifications.  Canonicalize
comparison operands for other cases.

gcc/testsuite

* gcc.dg/tree-ssa/minmax-27.c: New test.
* gcc.dg/tree-ssa/minmax-28.c: New test.

VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled

This patch would like to fix one ICE in vectorizable_store when both the
loop_masks and loop_lens are enabled.  The ICE looks like below when build
with "-march=rv64gcv -O3".

during GIMPLE pass: vect
test.c: In function ‘d’:
test.c:6:6: internal compiler error: in vectorizable_store, at
tree-vect-stmts.cc:8691
    6 | void d() {
      |      ^
0x37a6f2f vectorizable_store
        .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691
0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*,
_slp_tree*, _slp_instance*, vec<stmt_info_for_cost, va_heap, vl_ptr>*)
        .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242
0x1db5dca vect_analyze_loop_operations
        .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208
0x1db885b vect_analyze_loop_2
        .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041
0x1dba029 vect_analyze_loop_1
        .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481
0x1dbabad vect_analyze_loop(loop*, vec_info_shared*)
        .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639
0x1e389d1 try_vectorize_loop_1
        .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066
0x1e38f3d try_vectorize_loop
        .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182
0x1e39230 execute
        .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298

There are two ways to reach vectorizer LD/ST, one is the analysis and
the other is transform.  We cannot have both the lens and the masks
enabled during transform but it is valid during analysis.  Given the
transform doesn't required cost_vec,  we can only enable the assert
based on cost_vec is NULL or not.

Below testsuites are passed for this patch:
* The x86 bootstrap tests.
* The x86 fully regression tests.
* The aarch64 fully regression tests.
* The riscv fully regressison tests.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Enable the assert
during transform process.
(vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr114195-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Revert "[committed] Adjust expectations for pr59533-1.c"

This reverts commit 7e16f819ff413c48702f9087b62eaac39a060a14.

[committed] [target/102250] Document python requirement for risc-v

PR target/102250
gcc/

* doc/install.texi: Document need for python when building
RISC-V compilers.

[committed] [PR target/111362] Fix compare-debug issue with mode switching

The issue here is the code we emit for mode-switching can change when -g is
added to the command line.  This is caused by processing debug notes occurring
after a call which is the last real statement in a basic block.

Without -g the CALL_INSN is literally the last insn in the block and the loop
exits.  If mode switching after the call is needed, it'll be handled as we
process outgoing edges.

With -g the loop iterates again and in the processing of the node the backend
signals that a mode switch is necessary.

I pondered fixing this in the target, but the better fix is to ignore the debug
notes in the insn stream.

I did a cursory review of some of the other compare-debug failures, but did not
immediately see others which would likely be fixed by this change.  Sigh.

Anyway, bootstrapped and regression tested on x86.  Regression tested on rv64
as well.

PR target/111362
gcc/
* mode-switching.cc (optimize_mode_switching): Only process
NONDEBUG insns.

gcc/testsuite

* gcc.target/riscv/compare-debug-1.c: New test.
* gcc.target/riscv/compare-debug-2.c: New test.

Daily bump.

AVR: Fix typos in comment, indentation glitches in avr.md.

gcc/
* config/avr/avr.md: Fix typos in comment, indentation glitches
and some other nits.

fwprop: Restore previous behavior for forward propagation of RTL with MEMs [PR114284]

Before the recent PR111267 r14-8319 fwprop changes, fwprop would never try
to propagate what was not considered PROFITABLE, where the profitable part
actually was partly about profitability, partly about very good reasons
not to actually propagate and partly for cases where propagation is
completely incorrect.
In particular, classify_result has:
  /* Allow (subreg (mem)) -> (mem) simplifications with the following
     exceptions:
     1) Propagating (mem)s into multiple uses is not profitable.
     2) Propagating (mem)s across EBBs may not be profitable if the source EBB
        runs less frequently.
     3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
     4) Creating new (mem/v)s is not correct, since DCE will not remove the old
        ones.  */
  if (single_use_p
      && single_ebb_p
      && SUBREG_P (old_rtx)
      && !paradoxical_subreg_p (old_rtx)
      && MEM_P (new_rtx)
      && !MEM_VOLATILE_P (new_rtx))
    return PROFITABLE;
and didn't mark any other MEM_P (new_rtx) or rtxes which contain
a MEM in its subrtxes as PROFITABLE.  Now, since r14-8319 profitable_p
method has been renamed to likely_profitable_p and has just a minor role.
Now, rule 4) above is something that isn't about profitability, but about
correct behavior, if you propagate mem/v, the code is miscompiled.
This particular case has been fixed elsewhere by Haochen in r14-9379.
But I think even the 1) and 2) and maybe 3) are a strong don't do it,
don't rely solely on rtx costs, increasing the number of loads of the same
memory, even when cached, is undesirable, canceling load hoisting can
be undesirable as well.

So, the following patch restores previous behavior of src contains any MEMs,
in that case likely_profitable_p () is taken as the old profitable_p ()
as a requirement rather than just a hint.  For propagation of something
which doesn't load from memory this keeps the r14-8319 behavior.

2024-03-09  Jakub Jelinek  <jakub@redhat.com>

PR target/114284
* fwprop.cc (try_fwprop_subst_pattern): Don't propagate
src containing MEMs unless prop.likely_profitable_p ().

LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

In Binutils we need to make IE to LE relaxation only allowed when there
is an R_LARCH_RELAX after R_LARCH_TLE_IE_PC_{HI20,LO12} so an invalid
"partial" relaxation won't happen with the extreme code model. So if we
are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an
R_LARCH_RELAX to allow the relaxation. The IE to LE relaxation does not
require the pcalau12i and the ld instruction to be adjacent, so we don't
need to limit ourselves to use the macro.

For the distro maintainers backporting changes: this change depends on
r14-8721, without r14-8721 R_LARCH_RELAX can be emitted mistakenly in
the extreme code model.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Support 'Q' for R_LARCH_RELAX for TLS IE.
(loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS
IE.
* config/loongarch/loongarch.md (ld_from_got<mode>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/tls-ie-relax.c: New test.
* gcc.target/loongarch/tls-ie-norelax.c: New test.
* gcc.target/loongarch/tls-ie-extreme.c: New test.

AVR: Add cost computation for some insn combine patterns.

gcc/
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS]: Determine cost for
usum_widenqihi and add_zero_extend1.
[MINUS]: Determine costs for udiff_widenqihi, sub+zero_extend,
sub+sign_extend.
* config/avr/avr.md (*addhi3.sign_extend1, *subhi3.sign_extend2):
Compute exact insn lengths.
(*usum_widenqihi3): Allow input operands to commute.

i386: Regenerate i386.opt.urls

When I've added the -mnoreturn-no-callee-saved-registers option
to i386.opt, I forgot to regenerate i386.opt.urls and Mark's
CI kindly reminded me of that.

Fixed thusly.

2024-03-09 Jakub Jelinek <jakub@redhat.com>

* config/i386/i386.opt.urls: Regenerate.

LoongArch: testsuite: Add compilation options to the regname-fp-s9.c.

When the value of the macro DEFAULT_CFLAGS is set to '-ansi -pedantic-errors',
regname-s9-fp.c will test to fail. To solve this problem, add the compilation
option '-Wno-pedantic -std=gnu90' to this test case.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/regname-fp-s9.c: Add compilation option
'-Wno-pedantic -std=gnu90'.

LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". In the implementation of the instruction sequence,
it is necessary to determine whether the two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit, the two operand registers that need to be compared are symbolically
extended, and one of the operand registers is obtained from memory through the
"ll.w" instruction, which can ensure that the symbolic expansion is carried out.
However, the value of the other operand register is not guaranteed to be the
value of the sign extension.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
In loongarch64, a sign extension operation is added when
operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/atomic-cas-int.C: New test.

libstdc++: Do not require a time-of-day when parsing sys_days [PR114240]

When parsing a std::chrono::sys_days (or a sys_time with an even longer
period) we should not require a time-of-day to be present in the input,
because we can't represent that in the result type anyway.

Rather than trying to decide which specializations should require a
time-of-date and which should not, follow the direction of Howard
Hinnant's date library, which allows extracting a sys_time of any period
from input that only contains a date, defaulting the time-of-day part to
00:00:00. This seems consistent with the intent of the standard, which
says it's an error "If the parse fails to decode a valid date" (i.e., it
doesn't care about decoding a valid time, only a date).

libstdc++-v3/ChangeLog:

PR libstdc++/114240
* include/bits/chrono_io.h (_Parser::operator()): Assume
hours(0) for a time_point, so that a time is not required
to be present.
* testsuite/std/time/parse/114240.cc: New test.

libstdc++: Fix parsing of leap seconds as chrono::utc_time [PR114279]

Implementing all chrono::from_stream overloads in terms of
chrono::sys_time meant that a leap second time like 23:59:60.001 cannot
be parsed, because that cannot be represented in a sys_time.

The fix to support parsing leap seconds as utc_time is to convert the
parsed date to utc_time<days> and then add the parsed time to that,
which allows the result to land in a leap second, rather than doing all
the arithmetic with sys_time which doesn't have leap seconds.

For local_time we also allow %S to parse a 60s value, because doing
otherwise might disallow some valid uses. We can't know all use cases
users have for treating times as local_time.

For all other clocks, we can reject times that have 60 or 60.nnn as the
seconds part, because that cannot occur in a valid UNIX, GPS, or TAI
time. Since our chrono::file_clock uses sys_time, it can't occur for
that clock either.

In order to support this a new _M_is_leap_second member is needed in the
_Parser type. This can be added at the end, where most targets currently
have padding bytes. Similar to what I did recently for formatter _Spec
structs, we can also reserve additional padding bits for future
expansion.

This also fixes bugs in the from_stream overloads for utc_time,
tai_time, gps_time, and file_time, which were not using time_point_cast
to explicitly convert to the result type. That's needed because the
result type might have lower precision than the value returned from
from_sys or from_utc, which has a precision no lower than seconds.

libstdc++-v3/ChangeLog:

PR libstdc++/114279
* include/bits/chrono_io.h (_Parser::_M_is_leap_second): New
data member.
(_Parser::_M_reserved): Reserve padding bits for future use.
(_Parser::operator()): Set _M_is_leap_second if %S reads 60s.
(from_stream): Only allow _M_is_leap_second for utc_time and
local_time. Adjust arithmetic for utc_time so that leap seconds
are preserved. Use time_point_cast to convert to a possibly
lower-precision result type.
* testsuite/std/time/parse.cc: Move to ...
* testsuite/std/time/parse/parse.cc: ... here.
* testsuite/std/time/parse/114279.cc: New test.

Daily bump.

ipa: Avoid excessive removing of SSAs (PR 113757)

PR 113757 shows that the code which was meant to debug-reset and
remove SSAs defined by LHSs of calls redirected to
__builtin_unreachable can trigger also when speculative
devirtualization creates a call to a noreturn function (and since it
is noreturn, it does not bother dealing with its return value).

What is more, it seems that the code handling this case is not really
necessary.  I feel slightly idiotic about this because I have a
feeling that I added it because of a failing test-case but I can
neither find the testcase nor a reason why the code in
cgraph_edge::redirect_call_stmt_to_callee would not be sufficient (it
turns the SSA name into a default-def, a bit like IPA-SRA, but any
code dominated by a call to a noreturn is not dangerous when it comes
to its side-effects).  So this patch just removes the handling.

gcc/ChangeLog:

2024-02-07  Martin Jambor  <mjambor@suse.cz>

PR ipa/113757
* tree-inline.cc (redirect_all_calls): Remove code adding SSAs to
id->killed_new_ssa_names.

gcc/testsuite/ChangeLog:

2024-02-07  Martin Jambor  <mjambor@suse.cz>

PR ipa/113757
* g++.dg/ipa/pr113757.C: New test.

libbacktrace: don't assume compressed section is aligned

Patch originally by GitHub user ubyte at
https://github.com/ianlancetaylor/libbacktrace/pull/120.

* elf.c (elf_uncompress_chdr): Don't assume compressed section is
aligned.

[PR113790][LRA]: Fixing LRA ICE on riscv64

  LRA failed to consider all insn alternatives when non-reload pseudo
did not get a hard register.  This resulted in failure to generate
code by LRA.  The patch fixes this problem.

gcc/ChangeLog:

PR target/113790
* lra-assigns.cc (assign_by_spills): Set up all_spilled_pseudos
for non-reload pseudo too.

bpf: add size threshold for inlining mem builtins

BPF cannot fall back on library calls to implement memmove, memcpy and
memset, so we attempt to expand these inline always if possible.
However, this inline expansion was being attempted even for excessively
large operations, which could result in gcc consuming huge amounts of
memory and hanging.

Add a size threshold in the BPF backend below which to always expand
these operations inline, and introduce an option
-minline-memops-threshold= to control the threshold. Defaults to
1024 bytes.

gcc/

* config/bpf/bpf.cc (bpf_expand_cpymem, bpf_expand_setmem): Do
not attempt inline expansion if size is above threshold.
* config/bpf/bpf.opt (-minline-memops-threshold): New option.
* doc/invoke.texi (eBPF Options) <-minline-memops-threshold>:
Document.

gcc/testsuite/

* gcc.target/bpf/inline-memops-threshold-1.c: New test.
* gcc.target/bpf/inline-memops-threshold-2.c: New test.

arm: testsuite: tweak bics_3.c [PR113542]

This test was too simple, which meant that the compiler was sometimes
able to find a better optimization of the code than using a BICS
instruction. Fix this by changing the test slightly to produce a
sequence where BICS should always be the preferred solution.

gcc/testsuite:
PR target/113542
* gcc.target/arm/bics_3.c: Adjust code to something which should
always result in BICS.

bpf: testsuite: fix unresolved test in memset-1.c

The test was trying to do too much by both checking for an error, and
checking the resulting assembly. Of course, due to the error no asm was
produced, so the scan-asm went unresolved. Split it into two separate
tests to fix the issue.

gcc/testsuite/

* gcc.target/bpf/memset-1.c: Move error test case to...
* gcc.target/bpf/memset-2.c: ... here. New test.

GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)

'GCN_SUPPRESS_HOST_FALLBACK' originated as 'HSA_SUPPRESS_HOST_FALLBACK' in the
libgomp HSA plugin, where the idea was -- in my understanding -- that you
wouldn't have device code available for all functions that may be called, and
in that case transparently (shared memory system!) do host-fallback execution.
Or, with 'HSA_SUPPRESS_HOST_FALLBACK' set, you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin as
'GCN_SUPPRESS_HOST_FALLBACK'. However, the original meaning isn't applicable
for the libgomp GCN plugin anymore: we assume that we're generating device code
for all relevant functions, and we're implementing a non-shared memory system,
where we cannot transparently do host-fallback execution for individual
functions.

However, 'GCN_SUPPRESS_HOST_FALLBACK' has gained an additional meaning, to
enforce a fatal error in case that 'libhsa-runtime64.so.1' can't be dynamically
loaded; keep that meaning.

libgomp/
* plugin/plugin-gcn.c (GOMP_OFFLOAD_can_run): Don't consider
'GCN_SUPPRESS_HOST_FALLBACK' anymore (assume always-'true').
(init_hsa_context): Adjust 'GCN_SUPPRESS_HOST_FALLBACK' error
message.

nvptx: 'cuDeviceGetCount' failure is fatal

Per commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp", we're now using 'return -1'
from 'GOMP_OFFLOAD_get_num_devices' for 'omp_requires_mask' purposes. This
missed that via 'nvptx_get_num_devices', we could also 'return -1' for
'cuDeviceGetCount' failure. Before, this meant (in 'gomp_target_init') to
silently ignore the plugin/device -- which also has been doubtful behavior.
Let's instead turn 'cuDeviceGetCount' failure into a fatal error, similar to
other errors during device initialization.

libgomp/
* plugin/plugin-nvptx.c (nvptx_get_num_devices):
'cuDeviceGetCount' failure is fatal.

GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1'

If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding
libgomp plugin/device gets disabled, as before. But if they are available,
report any inconsistencies such as missing symbols, similar to how we fail in
presence of other issues during device initialization.

libgomp/
* plugin/plugin-gcn.c (init_hsa_runtime_functions): Fatal error
for missing symbols.
* plugin/plugin-nvptx.c (init_cuda_lib): Likewise.

ARM: Fix builtin-bswap-1.c test [PR113915]

On Thumb-2 the use of CBZ blocks conditional execution, so change the
test to compare with a non-zero value.

gcc/testsuite/ChangeLog:
PR target/113915
* gcc.target/arm/builtin-bswap.x: Fix test to avoid emitting CBZ.

contrib: Improve dg-extract-results.sh's Python detection [PR109668]

'python' on some systems (e.g. SLES 15) might be Python 2. Prefer python3,
then python, then python2 (as the script still tries to work there).

PR other/109668
* dg-extract-results.sh: Check for python3 before python. Check for
python2 last.

testsuite: Fix up pr113617 test for darwin [PR113617]

The test attempts to link a shared library, and apparently Darwin doesn't
allow by default for shared libraries to contain undefined symbols.

The following patch just adds dummy definitions for the symbols, so that
the library no longer has any undefined symbols at least in my linux
testing.
Furthermore, for target { !shared } targets (like darwin until the it is
fixed in target-supports.exp), because we then link a program rather than
shared library, the patch also adds a dummy main definition so that it
can link.

2024-03-08 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/113617
PR target/114233
* g++.dg/other/pr113617.C: Define -DSHARED when linking with -shared.
* g++.dg/other/pr113617-aux.cc: Add definitions for used methods and
templates not defined elsewhere.

tree-optimization/114269 - 434.zeusmp regression after SCEV analysis fix

The following addresses a performance regression caused by the recent
SCEV analysis fix with regard to folding multiplications and undefined
behavior on overflow. We do not handle (T) { a, +, b } * c but can
treat sign-conversions from unsigned by performing the multiplication
in the unsigned type. That's what we already do for additions (but
that misses one case that turns out important).

This fixes the 434.zeusmp regression for me.

PR tree-optimization/114269
PR tree-optimization/114074
* tree-chrec.cc (chrec_fold_plus_1): Handle sign-conversions
in the third CASE_CONVERT case as well.
(chrec_fold_multiply): Handle sign-conversions from unsigned
by performing the operation in the unsigned type.

modula2: Rebuild bootstrap tools with faster dynamic arrays

This patch configures the larger dynamic arrays to use a larger
growth factor and larger initial size. It also rebuilds mc and pge
using the improved default array sizes in Indexing.mod.

gcc/m2/ChangeLog:

* gm2-compiler/M2Quads.mod (Init): Use InitIndexTuned with
default size 65K.
* gm2-compiler/SymbolConversion.mod (Init): Ditto.
* gm2-compiler/SymbolTable.mod (BEGIN): Ditto.
* mc-boot/GM2Dependent.cc: Rebuild.
* mc-boot/GM2Dependent.h: Rebuild.
* mc-boot/GM2RTS.cc: Rebuild.
* pge-boot/GIndexing.cc: Rebuild.
* pge-boot/GIndexing.h: Rebuild.
* pge-boot/GM2Dependent.cc: Rebuild.
* pge-boot/GM2Dependent.h: Rebuild.
* pge-boot/GM2RTS.cc: Rebuild.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

AVR: Add an insn combine pattern for offset computation.

Computing  uint16_t += 2 * uint8_t  can occur when an offset
into a 16-bit array is computed.  Without this pattern is costs
six instructions: A move (1), a zero-extend (1), a shift (2) and
an addition (2).  With this pattern it costs 4.

gcc/
* config/avr/avr.md (*addhi3_zero_extend.ashift1): New pattern.
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS]: Compute its cost.

bb-reorder: Fix assertion

When touching bb-reorder yesterday, I've noticed the checking assert
doesn't actually check what it meant to.
Because asm_noperands returns >= 0 for inline asm patterns (in that case
number of input+output+label operands, so asm goto has at least one)
and -1 if it isn't inline asm.

The following patch fixes the assertion to actually check that it is
asm goto.

2024-03-08 Jakub Jelinek <jakub@redhat.com>

* bb-reorder.cc (fix_up_fall_thru_edges): Fix up checking assert,
asm_noperands < 0 means it is not asm goto too.

i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

The following patch hides the noreturn no_callee_saved_registers (except bp)
optimization with a not enabled by default option.
The reason is that most noreturn functions should be called just once in a
program (unless they are recursive or invoke longjmp or similar, for exceptions
we already punt), so it isn't that essential to save a few instructions in their
prologue, but more importantly because it interferes with debugging.
And unlike most other optimizations, doesn't actually make it harder to debug
the given function, which can be solved by recompiling the given function if
it is too hard to debug, but makes it harder to debug the callers of that
noreturn function. Those can be from a different translation unit, different
binary or even different package, so if e.g. glibc abort needs to use all
of the callee saved registers (%rbx, %rbp, %r12, %r13, %r14, %r15), debugging
any programs which abort will be harder because any DWARF expressions which
use those registers will be optimized out, not just in the immediate caller,
but in other callers as well until some frame restores a particular register
from some stack slot.

2024-03-08 Jakub Jelinek <jakub@redhat.com>

PR target/38534
* config/i386/i386.opt (mnoreturn-no-callee-saved-registers): New
option.
* config/i386/i386-options.cc (ix86_set_func_type): Don't use
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP unless
ix86_noreturn_no_callee_saved_registers is enabled.
* doc/invoke.texi (-mnoreturn-no-callee-saved-registers): Document.

* gcc.target/i386/pr38534-1.c: Add -mnoreturn-no-callee-saved-registers
to dg-options.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/pr38534-5.c: Likewise.
* gcc.target/i386/pr38534-6.c: Likewise.
* gcc.target/i386/pr114097-1.c: Likewise.
* gcc.target/i386/stack-check-17.c: Likewise.

c-family, c++: Fix up handling of types which may have padding in __atomic_{compare_}exchange

On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
> Ah, although __atomic_compare_exchange only takes pointers, the
> compiler replaces that with a call to __atomic_compare_exchange_n
> which takes the newval by value, which presumably uses an 80-bit FP
> register and so the padding bits become indeterminate again.

The problem is that __atomic_{,compare_}exchange lowering if it has
a supported atomic 1/2/4/8/16 size emits code like:
  _3 = *p2;
  _4 = VIEW_CONVERT_EXPR<I_type> (_3);
so if long double or some small struct etc. has some carefully filled
padding bits, those bits can be lost on the assignment.  The library call
for __atomic_{,compare_}exchange would actually work because it woiuld
load the value from memory using integral type or memcpy.
E.g. on
void
foo (long double *a, long double *b, long double *c)
{
  __atomic_compare_exchange (a, b, c, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
we end up with -O0 with:
        fldt    (%rax)
        fstpt   -48(%rbp)
        movq    -48(%rbp), %rax
        movq    -40(%rbp), %rdx
i.e. load *c from memory into 387 register, store it back to uninitialized
stack slot (the padding bits are now random in there) and then load a
__uint128_t (pair of GPR regs).  The problem is that we first load it using
whatever type the pointer points to and then VIEW_CONVERT_EXPR that value:
  p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
  p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
The following patch fixes that by creating a MEM_REF instead, with the
I_type type, but with the original pointer type on the second argument for
aliasing purposes, so we actually preserve the padding bits that way.
With this patch instead of the above assembly we emit
        movq    8(%rax), %rdx
        movq    (%rax), %rax
I had to add support for MEM_REF in pt.cc, though with the assumption
that it has been already originally created with non-dependent
types/operands (which is the case here for the __atomic*exchange lowering).

2024-03-08  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
* c-common.cc (resolve_overloaded_atomic_exchange): Instead of setting
p1 to VIEW_CONVERT_EXPR<I_type> (*p1), set it to MEM_REF with p1 and
(typeof (p1)) 0 operands and I_type type.
(resolve_overloaded_atomic_compare_exchange): Similarly for p2.
gcc/cp/
* pt.cc (tsubst_expr): Handle MEM_REF.
gcc/testsuite/
* g++.dg/ext/atomic-5.C: New test.

dwarf2out: Emit DW_AT_export_symbols on anon unions/structs [PR113918]

DWARF5 added DW_AT_export_symbols both for use on inline namespaces (where
we emit it), but also on anonymous unions/structs (and we didn't emit that
attribute there).
The following patch fixes it.

2024-03-08 Jakub Jelinek <jakub@redhat.com>

PR debug/113918
gcc/
* dwarf2out.cc (gen_field_die): Emit DW_AT_export_symbols
on anonymous unions or structs for -gdwarf-5 or -gno-strict-dwarf.
gcc/c/
* c-tree.h (c_type_dwarf_attribute): Declare.
* c-objc-common.h (LANG_HOOKS_TYPE_DWARF_ATTRIBUTE): Redefine.
* c-objc-common.cc: Include dwarf2.h.
(c_type_dwarf_attribute): New function.
gcc/cp/
* cp-objcp-common.cc (cp_type_dwarf_attribute): Return 1
for DW_AT_export_symbols on anonymous structs or unions.
gcc/testsuite/
* c-c++-common/dwarf2/pr113918.c: New test.

c++: Fix up parameter pack diagnostics on xobj vs. varargs functions [PR113802]

The simple presence of ellipsis as next token after the parameter
declaration doesn't imply it is a parameter pack, it sometimes is, e.g.
if its type is a pack, but sometimes is not and in that case it acts
the same as if the next tokens were , ... instead of just ...
The xobj param cannot be a function parameter pack though treats both
the declarator->parameter_pack_p and token->type == CPP_ELLIPSIS as
sufficient conditions for the error.  The conditions for CPP_ELLIPSIS
are done a little bit later in the same function and complex enough that
IMHO shouldn't be repeated, on the other side for the
declarator->parameter_pack_p case we clear that flag for xobj params
for error recovery reasons.

This patch just moves the diagnostics later (after the CPP_ELLIPSIS handling)
and changes the error recovery behavior by pretending the this specifier
didn't appear if an error is reported.

2024-03-08  Jakub Jelinek  <jakub@redhat.com>

PR c++/113802
* parser.cc (cp_parser_parameter_declaration): Move the xobj_param_p
pack diagnostics after ellipsis handling and if an error is reported,
pretend this specifier didn't appear.  Formatting fix.

* g++.dg/cpp23/explicit-obj-diagnostics3.C (S0, S1, S2, S3, S4): Don't
expect any diagnostics on f and fd member function templates, add
similar templates with ...Selves instead of Selves as k and kd and
expect diagnostics for those.  Expect extra diagnostics in error
recovery for g and gd member function templates.

MAINTAINERS: Fix order in Write After Aproval

ChangeLog:

* MAINTAINERS: Fix order of names in Write After Aproval

Signed-off-by: Filip Kastl <fkastl@suse.cz>

testsuite/108355 - make gcc.dg/tree-ssa/ssa-fre-104.c properly XFAIL

The testcase only XFAILs on targets where int has an alignment
of sizeof(int). Align the respective array this way to make it
XFAIL consistenlty.

PR testsuite/108355
* gcc.dg/tree-ssa/ssa-fre-104.c: Align e.

modula2: Add constant aggregate tests

This patch adds four constant aggregate tests and assignment of
arrays by a constant in two different scopes.

gcc/testsuite/ChangeLog:

* gm2/iso/pass/arrayconst.mod: New test.
* gm2/iso/pass/arrayconst2.mod: New test.
* gm2/iso/pass/arrayconst3.mod: New test.
* gm2/iso/pass/arrayconst4.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

RISC-V: Fix ICE in riscv vector costs

The following code can result in ICE:
-march=rv64gcv --param riscv-autovec-lmul=dynamic -O3

char *jpeg_difference7_input_buf;
void jpeg_difference7(int *diff_buf) {
  unsigned width;
  int samp, Rb;
  while (--width) {
    Rb = samp = *jpeg_difference7_input_buf;
    *diff_buf++ = -(int)(samp + (long)Rb >> 1);
  }
}

One biggest_mode update missed in one branch and trigger assertion fail.
gcc_assert (biggest_size >= mode_size);

Tested On RV64 and no regression.

PR target/114264

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc: Fix ICE

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr114264.c: New test.

Signed-off-by: demin.han <demin.han@starfivetech.com>

fwprop: Avoid volatile rtx to be propagated

The patch for PR111267 (commit id 86de9b66480b710202a2898cf513db105d8c432f)
which introduces an exception for propagation on single set insn.  The
propagation which might not be profitable (checked by profitable_p) is still
allowed to be propagated to single set insn.  It has a potential problem
that a volatile operand might be propagated to a singel set insn.  If the
define insn is not eliminated after propagation, the volatile operand will
be executed for multiple times.  This patch fixes the problem by skipping
volatile set source rtx in propagation.

gcc/
* fwprop.cc (forward_propagate_into): Return false for volatile set
source rtx.

gcc/testsuite/
* gcc.target/powerpc/fwprop-1.c: New.

Daily bump.

libstdc++: Use std::from_chars to speed up parsing subsecond durations

With std::from_chars we can parse subsecond durations much faster than
with std::num_get, as shown in the microbenchmarks below. We were using
std::num_get and std::numpunct in order to parse a number with the
locale's decimal point character. But we copy the chars from the input
stream into a new buffer anyway, so we can replace the locale's decimal
point with '.' in that buffer, and then we can use std::from_chars on
it.

Benchmark                Time             CPU   Iterations
----------------------------------------------------------
from_chars_millisec    158 ns          158 ns      4524046
num_get_millisec       192 ns          192 ns      3644626
from_chars_microsec    164 ns          163 ns      4330627
num_get_microsec       205 ns          205 ns      3413452
from_chars_nanosec     173 ns          173 ns      4072653
num_get_nanosec        227 ns          227 ns      3105161

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (_Parser::operator()): Use
std::from_chars to parse fractional seconds.

libstdc++: Fix parsing of fractional seconds [PR114244]

When converting a chrono::duration<long double> to a result type with an
integer representation we should use chrono::round<_Duration> so that we
don't truncate towards zero. Rounding ensures that e.g. 0.001999s
becomes 2ms not 1ms.

We can also remove some redundant uses of chrono::duration_cast to
convert from seconds to _Duration, because the _Parser class template
requires _Duration type to be able to represent seconds without loss of
precision.

This also fixes a bug where no fractional part would be parsed for
chrono::duration<long double> because its period is ratio<1>. We should
also consider treat_as_floating_point<rep> when deciding whether to skip
reading a fractional part.

libstdc++-v3/ChangeLog:

PR libstdc++/114244
* include/bits/chrono_io.h (_Parser::operator()): Remove
redundant uses of duration_cast. Use chrono::round to convert
long double value to durations with integer representations.
Check represenation type when deciding whether to skip parsing
fractional seconds.
* testsuite/20_util/duration/114244.cc: New test.
* testsuite/20_util/duration/io.cc: Check that a floating-point
duration with ratio<1> precision can be parsed.

c++: Redetermine whether to write vtables on stream-in [PR114229]

We currently always stream DECL_INTERFACE_KNOWN, which is needed since
many kinds of declarations already have their interface determined at
parse time. But for vtables and type-info declarations we need to
re-evaluate on stream-in as whether they need to be emitted or not
changes in each TU, so this patch clears DECL_INTERFACE_KNOWN on these
kinds of declarations so that they can go through 'import_export_decl'
again.

Note that the precise details of the virt-2 tests will need to change
when we implement the resolution of [1], for now I just updated the test
to not fail with the new (current) semantics.

[1]: https://github.com/itanium-cxx-abi/cxx-abi/pull/171

PR c++/114229

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Redetermine
DECL_INTERFACE_KNOWN on stream-in for vtables and tinfo.
* decl2.cc (import_export_decl): Add fixme for ABI changes with
module vtables and tinfo.

gcc/testsuite/ChangeLog:

* g++.dg/modules/virt-2_b.C: Update test to acknowledge that we
now emit vtables here too.
* g++.dg/modules/virt-3_a.C: New test.
* g++.dg/modules/virt-3_b.C: New test.
* g++.dg/modules/virt-3_c.C: New test.
* g++.dg/modules/virt-3_d.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: member alias tmpl partial inst [PR103994]

Alias templates are weird in that their specializations can appear in
both decl_specializations and type_specializations.  They're always in
the decl table, and additionally appear in the type table only at parse
time via finish_template_type.  There seems to be no good reason for
them to appear in both tables, and the code paths end up stepping over
each other in particular for a partial instantiation such as
A<B>::key_arg<T> in the below modules testcase: the type code path
(lookup_template_class) wants to set TI_TEMPLATE to the most general
template whereas the decl code path (tsubst_template_decl called during
instantiation of A<B>) already set TI_TEMPLATE to the partially
instantiated TEMPLATE_DECL.  This TI_TEMPLATE change ends up confusing
modules which decides to stream the logically equivalent TYPE_DECL and
TEMPLATE_DECL for this partial instantiation separately.

This patch fixes this by making lookup_template_class dispatch to
instantiate_alias_template early for alias template specializations.
In turn we now add such specializations only to the decl table.  This
admits some nice simplification in the modules code which otherwise has
to cope with such specializations appearing in both tables.

PR c++/103994

gcc/cp/ChangeLog:

* cp-tree.h (add_mergeable_specialization): Remove second
parameter.
* module.cc (depset::disc_bits::DB_ALIAS_TMPL_INST_BIT): Remove.
(depset::disc_bits::DB_ALIAS_SPEC_BIT): Remove.
(depset::is_alias_tmpl_inst): Remove.
(depset::is_alias): Remove.
(merge_kind::MK_tmpl_alias_mask): Remove.
(merge_kind::MK_alias_spec): Remove.
(merge_kind_name): Remove entries for alias specializations.
(trees_out::core_vals) <case TEMPLATE_DECL>: Adjust after
removing is_alias_tmpl_inst.
(trees_in::decl_value): Adjust add_mergeable_specialization
calls.
(trees_out::get_merge_kind) <case depset::EK_SPECIALIZATION>:
Use MK_decl_spec for alias template specializations.
(trees_out::key_mergeable): Simplify after MK_tmpl_alias_mask
removal.
(depset::hash::make_dependency): Adjust after removing
DB_ALIAS_TMPL_INST_BIT.
(specialization_add): Don't allow alias templates when !decl_p.
(depset::hash::add_specializations): Remove now-dead code
accomodating alias template specializations in the type table.
* pt.cc (lookup_template_class): Dispatch early to
instantiate_alias_template for alias templates.  Simplify
accordingly.
(add_mergeable_specialization): Remove alias_p parameter and
simplify accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99425-1_b.H: s/alias/decl in dump scan.
* g++.dg/modules/tpl-alias-1_a.H: Likewise.
* g++.dg/modules/tpl-alias-2_a.H: New test.
* g++.dg/modules/tpl-alias-2_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

AArch64: memcpy/memset expansions should not emit LDP/STP [PR113618]

The new RTL introduced for LDP/STP results in regressions due to use of UNSPEC.
Given the new LDP fusion pass is good at finding LDP opportunities, change the
memcpy, memmove and memset expansions to emit single vector loads/stores.
This fixes the regression and enables more RTL optimization on the standard
memory accesses. Handling of unaligned tail of memcpy/memmove is improved
with -mgeneral-regs-only. SPEC2017 performance improves slightly. Codesize
is a bit worse due to missed LDP opportunities as discussed in the PR.

gcc/ChangeLog:
PR target/113618
* config/aarch64/aarch64.cc (aarch64_copy_one_block): Remove.
(aarch64_expand_cpymem): Emit single load/store only.
(aarch64_set_one_block): Emit single stores only.

gcc/testsuite/ChangeLog:
PR target/113618
* gcc.target/aarch64/pr113618.c: New test.

c++/modules: inline namespace abi_tag streaming [PR110730]

The unreduced testcase from PR110730 crashes at runtime ultimately
because we don't stream the abi_tag attribute on inline namespaces and
so the filesystem::current_path() call resolves to the non-C++11 ABI
version even though the C++11 ABI is active, leading to a crash when
destroying the path temporary (which contains an std::string member).
Similar story for the PR105512 testcase.

While we do stream the DECL_ATTRIBUTES of all decls that go through
the generic tree streaming routines, it seems namespaces are streamed
separately from other decls and we don't use the generic routines for
them. So this patch makes us stream the abi_tag manually for (inline)
namespaces.

PR c++/110730
PR c++/105512

gcc/cp/ChangeLog:

* module.cc (module_state::write_namespaces): Stream the
abi_tag attribute of an inline namespace.
(module_state::read_namespaces): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/hello-2_a.C: New test.
* g++.dg/modules/hello-2_b.C: New test.
* g++.dg/modules/namespace-6_a.H: New test.
* g++.dg/modules/namespace-6_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

libstdc++: Do not define lock-free atomic aliases if not fully lock-free [PR114103]

The whole point of these typedefs is to guarantee lock-freedom, so if
the target has no such types, we shouldn't defined the typedefs at all.

libstdc++-v3/ChangeLog:

PR libstdc++/114103
* include/bits/version.def (atomic_lock_free_type_aliases): Add
extra_cond to check for at least one always-lock-free type.
* include/bits/version.h: Regenerate.
* include/std/atomic (atomic_signed_lock_free)
(atomic_unsigned_lock_free): Only use always-lock-free types.
* src/c++20/tzdb.cc (time_zone::_Impl::RulesCounter): Don't use
atomic counter if lock-free aliases aren't available.
* testsuite/29_atomics/atomic/lock_free_aliases.cc: XFAIL for
targets without lock-free word-size compare_exchange.

libstdc++: Update expiry times for leap seconds lists

The list in tzdb.cc isn't the only hardcoded list of leap seconds in the
library, there's the one defined inline in <chrono> (to avoid loading
the tzdb for the common case) and another in a testcase. This updates
them to note that there are no new leap seconds in 2024 either, until at
least 2024-12-28.

libstdc++-v3/ChangeLog:

* include/std/chrono (__get_leap_second_info): Update expiry
time for hardcoded list of leap seconds.
* testsuite/std/time/tzdb/leap_seconds.cc: Update comment.

libstdc++: Replace unnecessary uses of built-ins in testsuite

I don't see why we should rely on __builtin_memset etc. in tests. We can
just include <cstring> and use the public API.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/allocator/default_init.cc: Use
std::memset instead of __builtin_memset.
* testsuite/23_containers/forward_list/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/list/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/map/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/set/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/unordered_map/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/unordered_set/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/vector/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/vector/bool/allocator/default_init.cc:
Likewise.
* testsuite/29_atomics/atomic/compare_exchange_padding.cc:
Likewise.
* testsuite/util/atomic/wait_notify_util.h: Likewise.

libstdc++: Better diagnostics for std::format errors

This adds two new static_assert messages to the internals of
std::make_format_args to give better diagnostics for invalid format
args. Rather than just getting an error saying that basic_format_arg
cannot be constructed, we get more specific errors for the cases where
std::formatter isn't specialized for the type at all, and where it's
specialized but only meets the BasicFormatter requirements and so can
only format non-const arguments.

Also add a test for the existing static_assert when constructing a
format_string for non-formattable args.

libstdc++-v3/ChangeLog:

* include/std/format (_Arg_store::_S_make_elt): Add two
static_assert checks to give more user-friendly error messages.
* testsuite/lib/prune.exp (libstdc++-dg-prune): Prune another
form of "in requirements with" note.
* testsuite/std/format/arguments/args_neg.cc: Check for
user-friendly diagnostics for non-formattable types.
* testsuite/std/format/string_neg.cc: Likewise.

testsuite, darwin: improve check for -shared support

The undefined symbols are allowed for C checks, but when
this is run as C++, the mangled foo() symbol is still
seen as undefined, and the testsuite thinks darwin does not
support -shared.

gcc/testsuite/ChangeLog:

PR target/114233
* lib/target-supports.exp: Fix test for C++.

vect: Do not peel epilogue for partial vectors.

r14-7036-gcbf569486b2dec added an epilogue vectorization guard for early
break but PR114196 shows that we also run into the problem without early
break. Therefore merge the condition into the topmost vectorization
guard.

gcc/ChangeLog:

PR middle-end/114196

* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Merge
vectorization guards.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr114196.c: New test.
* gcc.target/riscv/rvv/autovec/pr114196.c: New test.

PR modula2/109969 Linking large project causes an ICE

This patch contains a re-write of M2LexBuf.mod which removes the linked
list of token buckets and simplifies the implementation using a dynamic
array. It contains more checking (for empty source files for example).
The patch also contains a fix for an ICE in gcc/m2/gm2-gcc/builtins.cc

gcc/m2/ChangeLog:

PR modula2/109969
* gm2-compiler/M2LexBuf.def (TokenToLineNo): Rename parameter.
(TokenToColumnNo): Rename parameter.
(TokenToLocation): Rename parameter.
(FindFileNameFromToken): Rename parameter.
(DumpTokens): Rewrite comment.
* gm2-compiler/M2LexBuf.mod: Rewrite.
* gm2-compiler/P0SyntaxCheck.bnf (CheckInsertCandidate):
DumpTokens before and after inserting recovery token.
* gm2-gcc/m2builtins.cc (do_target_support_exists): Add
bf_c99_compl case.
* gm2-libs/Indexing.def (InitIndexTuned): New procedure
function.
(IsEmpty): New procedure function.
* gm2-libs/Indexing.mod (InitIndexTuned): New procedure
function.
(IsEmpty): New procedure function.
(Index): New field GrowFactor.
(PutIndice): Use GrowFactor to extend dynamic array.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: ICE with variable template and [[deprecated]] [PR110031]

lookup_and_finish_template_variable already has and uses the complain
parameter but it is not passing it down to mark_used so we got the
default tf_warning_or_error, which causes various problems when
lookup_and_finish_template_variable gets called with complain=tf_none.

PR c++/110031

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Pass complain to
mark_used.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/inline-var11.C: New test.

doc: Fix docs for -dD regarding predefined macros

The manual has always claimed that -dD differs from -dM by not
outputting predefined macros, but that's untrue. It has been untrue
since GCC 3.0 (probably with the change to use libcpp as the default
preprocessor implementation).

gcc/ChangeLog:

* doc/cppopts.texi: Remove incorrect claim about -dD not
outputting predefined macros.

rs6000: Don't ICE when compiling the __builtin_vsx_splat_2di [PR113950]

When we expand the __builtin_vsx_splat_2di built-in, we were allowing immediate
value for second operand which causes an unrecognizable insn ICE. Even though
the immediate value was forced into a register, it wasn't correctly assigned
to the second operand. So corrected the assignment of op1 to operands[1].

2024-03-07 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/
PR target/113950
* config/rs6000/vsx.md (vsx_splat_<mode>): Correct assignment to operand1
and simplify else if with else.

gcc/testsuite/
PR target/113950
* gcc.target/powerpc/pr113950.c: New testcase.

Fix bogus error on allocator for array type with Dynamic_Predicate

This is a regression present on all active branches: the compiler gives
a bogus error on an allocator for an unconstrained array type declared
with a Dynamic_Predicate because Apply_Predicate_Check is invoked directly
on a subtype reference, which it cannot handle.

This moves the check to the resulting access value (after dereference) like
in Expand_Allocator_Expression.

gcc/ada/
PR ada/113979
* exp_ch4.adb (Expand_N_Allocator): In the subtype indication case,
call Apply_Predicate_Check on the resulting access value if needed.

gcc/testsuite/
* gnat.dg/predicate15.adb: New test.

Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

When building gcc's C++ sources against recent libc++, the poisoning of
the ctype macros due to including safe-ctype.h before including C++
standard headers such as <list>, <map>, etc, causes many compilation
errors, similar to:

  In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
  In file included from /home/dim/src/gcc/master/gcc/system.h:233:
  In file included from /usr/include/c++/v1/vector:321:
  In file included from
  /usr/include/c++/v1/__format/formatter_bool.h:20:
  In file included from
  /usr/include/c++/v1/__format/formatter_integral.h:32:
  In file included from /usr/include/c++/v1/locale:202:
  /usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
  only applies to structs, variables, functions, and namespaces
    546 |     _LIBCPP_INLINE_VISIBILITY
        |     ^
  /usr/include/c++/v1/__config:813:37: note: expanded from macro
  '_LIBCPP_INLINE_VISIBILITY'
    813 | #  define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
        |                                     ^
  /usr/include/c++/v1/__config:792:26: note: expanded from macro
  '_LIBCPP_HIDE_FROM_ABI'
    792 |
    __attribute__((__abi_tag__(_LIBCPP_TOSTRING(
  _LIBCPP_VERSIONED_IDENTIFIER))))
        |                          ^
  In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
  In file included from /home/dim/src/gcc/master/gcc/system.h:233:
  In file included from /usr/include/c++/v1/vector:321:
  In file included from
  /usr/include/c++/v1/__format/formatter_bool.h:20:
  In file included from
  /usr/include/c++/v1/__format/formatter_integral.h:32:
  In file included from /usr/include/c++/v1/locale:202:
  /usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
  declaration list
    547 |     char_type toupper(char_type __c) const
        |                                     ^
  /usr/include/c++/v1/__locale:553:48: error: too many arguments
  provided to function-like macro invocation
    553 |     const char_type* toupper(char_type* __low, const
    char_type* __high) const
        |                                                ^
  /home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
  macro 'toupper' defined here
    146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
        |         ^

This is because libc++ uses different transitive includes than
libstdc++, and some of those transitive includes pull in various ctype
declarations (typically via <locale>).

There was already a special case for including <string> before
safe-ctype.h, so move the rest of the C++ standard header includes to
the same location, to fix the problem.

gcc/ChangeLog:

* system.h: Include safe-ctype.h after C++ standard headers.

Signed-off-by: Dimitry Andric <dimitry@andric.com>

analyzer: Fix up some -Wformat* warnings

I'm seeing warnings like
../../gcc/analyzer/access-diagram.cc: In member function ‘void ana::bit_size_expr::print(pretty_printer*) const’:
../../gcc/analyzer/access-diagram.cc:399:26: warning: unknown conversion type character ‘E’ in format [-Wformat=]
  399 |         pp_printf (pp, _("%qE bytes"), bytes_expr);
      |                          ^~~~~~~~~~~
when building stage2/stage3 gcc.  While such warnings would be
understandable when building stage1 because one could e.g. have some
older host compiler which doesn't understand some of the format specifiers,
the above seems to be because we have in pretty-print.h
#ifdef GCC_DIAG_STYLE
#define GCC_PPDIAG_STYLE GCC_DIAG_STYLE
#else
#define GCC_PPDIAG_STYLE __gcc_diag__
#endif
and use GCC_PPDIAG_STYLE e.g. for pp_printf, and while
diagnostic-core.h has
#ifndef GCC_DIAG_STYLE
#define GCC_DIAG_STYLE __gcc_tdiag__
#endif
(and similarly various FE headers include their own GCC_DIAG_STYLE)
when including pretty-print.h before diagnostic-core.h we end up
with __gcc_diag__ style rather than __gcc_tdiag__ style, which I think
is the right thing for the analyzer, because analyzer seems to use
default_tree_printer everywhere:
grep pp_format_decoder.*=.default_tree_printer analyzer/* | wc -l
57

The following patch fixes that by making sure diagnostic-core.h is included
before pretty-print.h.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

* access-diagram.cc: Include diagnostic-core.h before including
diagnostic.h or diagnostic-path.h.
* sm-malloc.cc: Likewise.
* diagnostic-manager.cc: Likewise.
* call-summary.cc: Likewise.
* record-layout.cc: Likewise.

contrib: Update test_mklog to correspond to mklog

contrib/ChangeLog:

* test_mklog.py: "Moved to..." -> "Move to..."

Signed-off-by: Filip Kastl <fkastl@suse.cz>

c++: Fix ICE diagnosing incomplete type of overloaded function set [PR98356]

In the linked PR the result of 'get_first_fn' is a USING_DECL against
the template parameter, to be filled in on instantiation. But we don't
actually need to get the first set of the member functions: it's enough
to know that we have a (possibly overloaded) member function at all.

PR c++/98356

gcc/cp/ChangeLog:

* typeck2.cc (cxx_incomplete_type_diagnostic): Don't assume
'member' will be a FUNCTION_DECL (or something like it).

gcc/testsuite/ChangeLog:

* g++.dg/pr98356.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Stream DECL_CONTEXT for template template parms [PR98881]

When streaming in a nested template-template parameter as in the
attached testcase, we end up reaching the containing template-template
parameter in 'tpl_parms_fini'. We should not set the DECL_CONTEXT to
this (nested) template-template parameter, as it should already be the
struct that the outer template-template parameter is declared on.

The precise logic for what DECL_CONTEXT should be for a template
template parameter in various situations seems rather obscure. Rather
than trying to determine the assumptions that need to hold, it seems
simpler to just always re-stream the DECL_CONTEXT as needed for now.

PR c++/98881

gcc/cp/ChangeLog:

* module.cc (trees_out::tpl_parms_fini): Stream out DECL_CONTEXT
for template template parameters.
(trees_in::tpl_parms_fini): Read it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-tpl-parm-3.h: New test.
* g++.dg/modules/tpl-tpl-parm-3_a.H: New test.
* g++.dg/modules/tpl-tpl-parm-3_b.C: New test.
* g++.dg/modules/tpl-tpl-parm-3_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]

The following testcase ICEs, because fix_crossing_unconditional_branches
thinks that asm goto is an unconditional jump and removes it, replacing it
with unconditional jump to one of the labels.
This doesn't happen on x86 because the function in question isn't invoked
there at all:
  /* If the architecture does not have unconditional branches that
     can span all of memory, convert crossing unconditional branches
     into indirect jumps.  Since adding an indirect jump also adds
     a new register usage, update the register usage information as
     well.  */
  if (!HAS_LONG_UNCOND_BRANCH)
    fix_crossing_unconditional_branches ();
I think for the asm goto case, for the non-fallthru edge if any we should
handle it like any other fallthru (and fix_crossing_unconditional_branches
doesn't really deal with those, it only looks at explicit branches at the
end of bbs and we are in cfglayout mode at that point) and for the labels
we just pass the labels as immediates to the assembly and it is up to the
user to figure out how to store them/branch to them or whatever they want to
do.
So, the following patch fixes this by not treating asm goto as a simple
unconditional jump.

I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
somewhere else, where outofcfglayout or whatever should actually create
those indirect jumps on the crossing edges instead of adding normal
unconditional jumps, I see e.g. in
__attribute__((cold)) int bar (char *);
__attribute__((hot)) int baz (char *);
void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; l1: baz (""); }
void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: return; l1: bar (""); goto l2; }
with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
just b .L? jumps which I believe are +-32MB, so if .text is larger than
32MB, it could fail to link, but this patch doesn't address that.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/110079
* bb-reorder.cc (fix_crossing_unconditional_branches): Don't adjust
asm goto.

* gcc.dg/pr110079.c: New test.

expand: Fix UB in choose_mult_variant [PR105533]

As documented in the function comment, choose_mult_variant attempts to
compute costs of 3 different cases, val, -val and val - 1.
The -val case is actually only done if val fits into host int, so there
should be no overflow, but the val - 1 case is done unconditionally.
val is shwi (but inside of synth_mult already uhwi), so when val is
HOST_WIDE_INT_MIN, val - 1 invokes UB.  The following patch fixes that
by using val - HOST_WIDE_INT_1U, but I'm not really convinced it would
DTRT for > 64-bit modes, so I've guarded it as well.  Though, arch
would need to have really strange costs that something that could be
expressed as x << 63 would be better expressed as (x * 0x7fffffffffffffff) + 1
In the long term, I think we should just rewrite
choose_mult_variant/synth_mult etc. to work on wide_int.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/105533
* expmed.cc (choose_mult_variant): Only try the val - 1 variant
if val is not HOST_WIDE_INT_MIN or if mode has exactly
HOST_BITS_PER_WIDE_INT precision.  Avoid triggering UB while computing
val - 1.

* gcc.dg/pr105533.c: New test.

sccvn: Avoid UB in ao_ref_init_from_vn_reference [PR105533]

When compiling libgcc or on e.g.
int a[64];
int p;

void
foo (void)
{
  int s = 1;
  while (p)
    {
      s -= 11;
      a[s] != 0;
    }
}
sccvn invokes UB in the compiler as detected by ubsan:
../../gcc/poly-int.h:1089:5: runtime error: left shift of negative value -40
The problem is that we still use C++11..C++17 as the implementation language
and in those C++ versions shifting negative values left is UB (well defined
since C++20) and above in
           offset += op->off << LOG2_BITS_PER_UNIT;
op->off is poly_int64 with -40 value (in libgcc with -8).
I understand the offset_int << LOG2_BITS_PER_UNIT shifts but it is then well
defined during underlying implementation which is done on the uhwi limbs,
but for poly_int64 we use
                offset += pop->off * BITS_PER_UNIT;
a few lines earlier and I think that is both more readable in what it
actually does and triggers UB only if there would be signed multiply
overflow.  In the end, the compiler will treat them the same at least at the
RTL level (at least, if not and they aren't the same cost, it should).

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/105533
* tree-ssa-sccvn.cc (ao_ref_init_from_vn_reference) <case ARRAY_REF>:
Multiple op->off by BITS_PER_UNIT instead of shifting it left by
LOG2_BITS_PER_UNIT.

LoongArch: testsuite:Fix problems with incorrect results in vector test cases.

In simd_correctness_check.h, the role of the macro ASSERTEQ_64 is to check the
result of the passed vector values for the 64-bit data of each array element.
It turns out that it uses the abs() function to check only the lower 32 bits
of the data at a time, so it replaces abs() with the llabs() function.

However, the following two problems may occur after modification:

1.FAIL in lasx-xvfrint_s.c and lsx-vfrint_s.c
The reason for the error is because vector test cases that use __m{128,256} to
define vector types are composed of 32-bit primitive types, they should use
ASSERTEQ_32 instead of ASSERTEQ_64 to check for correctness.

2.FAIL in lasx-xvshuf_b.c and lsx-vshuf.c
The cause of the error is that the expected result of the function setting in
the test case is incorrect.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c: Replace
ASSERTEQ_64 with the macro ASSERTEQ_32.
* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Modify the expected
test results of some functions according to the function of the vector
instruction.
* gcc.target/loongarch/vector/lsx/lsx-vfrint_s.c: Same
modification as lasx-xvfrint_s.c.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Same
modification as lasx-xvshuf_b.c.
* gcc.target/loongarch/vector/simd_correctness_check.h: Use the llabs()
function instead of abs() to check the correctness of the results.

LoongArch: Use /lib instead of /lib64 as the library search path for MUSL.

gcc/ChangeLog:

* config.gcc: Add a case for loongarch*-*-linux-musl*.
* config/loongarch/linux.h: Disable the multilib-compatible
treatment for *musl* targets.
* config/loongarch/musl.h: New file.

match.pd: Optimize a * !a to 0 [PR114009]

The following patch attempts to fix an optimization regression through
adding a simple simplification.  We already have the
/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
(if (!canonicalize_math_p ())
(for cmp (tcc_comparison)
  (simplify
   (mult:c (convert (cmp@0 @1 @2)) @3)
   (if (INTEGRAL_TYPE_P (type)
        && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
     (cond @0 @3 { build_zero_cst (type); })))
optimization which otherwise triggers during the a * !a multiplication,
but that is done only late and we aren't able through range assumptions
optimize it yet anyway.

The patch adds a specific simplification for it.
If a is zero, then a * !a will be 0 * 1 (or for signed 1-bit 0 * -1)
and so 0.
If a is non-zero, then a * !a will be a * 0 and so again 0.
THe pattern is valid for scalar integers, complex integers and vector types,
but I think will actually trigger only for the scalar integers.  For
vector types I've added other two with VEC_COND_EXPR in it, for complex
there are different GENERIC trees to match and it is something that likely
would be never matched in GIMPLE, so I didn't handle that.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/114009
* genmatch.cc (decision_tree::gen): Emit ARG_UNUSED for captures
argument even for GENERIC, not just for GIMPLE.
* match.pd (a * !a -> 0): New simplifications.

* gcc.dg/tree-ssa/pr114009.c: New test.

RISC-V: Refactor expand_vec_cmp [NFC]

There are two expand_vec_cmp functions.
They have same structure and similar code.
We can use default arguments instead of overloading.

Tested on RV32 and RV64.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_cmp): Change proto
* config/riscv/riscv-v.cc (expand_vec_cmp): Use default arguments
(expand_vec_cmp_float): Adapt arguments

Signed-off-by: demin.han <demin.han@starfivetech.com>

Fortran: Fix issue with using snprintf function.

The previous patch used snprintf to set the message
string. The message string is not a formatted string
and the snprintf will interpret '%' related characters
as format specifiers when there are no associated
output variables. A segfault ensues.

This change replaces snprintf with a fortran string copy
function and null terminates the message string.

PR libfortran/105456

libgfortran/ChangeLog:

* io/list_read.c (list_formatted_read_scalar): Use fstrcpy
from libgfortran/runtime/string.c to replace snprintf.
(nml_read_obj): Likewise.
* io/transfer.c (unformatted_read): Likewise.
(unformatted_write): Likewise.
(formatted_transfer_scalar_read): Likewise.
(formatted_transfer_scalar_write): Likewise.
* io/write.c (list_formatted_write_scalar): Likewise.
(nml_write_obj): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105456.f90: Revise using '%' characters
in users error message.

Daily bump.

i386: Fix and improve insn constraint for V2QI arithmetic/shift insns

optimize_function_for_size_p predicate is not stable during optab selection,
because it also depends on node->count/node->frequency of the current function,
which are updated during IPA, so they may change between early opts and
late opts.  Use optimize_size instead - optimize_size implies
optimize_function_for_size_p (cfun), so if a named pattern uses
"&& optimize_size" and the insn it splits into uses
optimize_function_for_size_p (cfun), it shouldn't fail.

PR target/114232

gcc/ChangeLog:

* config/i386/mmx.md (negv2qi2): Enable for optimize_size instead
of optimize_function_for_size_p.  Explictily enable for TARGET_SSE2.
(negv2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(<plusminus:insn>v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.  Explictily enable for TARGET_SSE2.
(<plusminus:insn>v2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(<any_shift:insn>v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.

RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads [PR114200].

Three-operand instructions like vmacc are modeled with an implicit
output reload when the output does not match one of the operands. For
this we use vmv.v.v which is subject to length masking.

In a situation where the current vl is less than the full vlenb
and the fma's result value is used as input for a vector reduction
(which is never length masked) we effectively only reduce vl
elements. The masked-out elements are relevant for the
reduction, though, leading to a wrong result.

This patch replaces the vmv reloads by full-register reloads.

gcc/ChangeLog:

PR target/114200
PR target/114202

* config/riscv/vector.md: Use vmv[1248]r.v instead of vmv.v.v.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr114200.c: New test.
* gcc.target/riscv/rvv/autovec/pr114202.c: New test.

RISC-V: Adjust vec unit-stride load/store costs.

Scalar loads provide offset addressing while unit-stride vector
instructions cannot.  The offset must be loaded into a general-purpose
register before it can be used.  In order to account for this, this
patch adds an address arithmetic heuristic that keeps track of data
reference operands.  If we haven't seen the operand before we add the
cost of a scalar statement.

This helps to get rid of an lbm regression when vectorizing (roughly
0.5% fewer dynamic instructions).  gcc5 improves by 0.2% and deepsjeng
by 0.25%.  wrf and nab degrade by 0.1%.  This is because before we now
adjust the cost of SLP as well as loop-vectorized instructions whereas
we would only adjust loop-vectorized instructions before.
Considering higher scalar_to_vec costs (3 vs 1) for all vectorization
types causes some snippets not to get vectorized anymore.  Given these
costs the decision looks correct but appears worse when just counting
dynamic instructions.

In total SPECint 2017 has 4 bln dynamic instructions less and SPECfp 0.7
bln.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Move...
(costs::adjust_stmt_cost): ... to here and add vec_load/vec_store
offset handling.
(costs::add_stmt_cost): Also adjust cost for statements without
stmt_info.
* config/riscv/riscv-vector-costs.h: Define zero constant.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-2.c: New test.

ARM: Fix conditional execution [PR113915]

By default most patterns can be conditionalized on Arm targets.  However
Thumb-2 predication requires the "predicable" attribute be explicitly
set to "yes".  Most patterns are shared between Arm and Thumb(-2) and are
marked with "predicable".  Given this sharing, it does not make sense to
use a different default for Arm.  So only consider conditional execution
of instructions that have the predicable attribute set to yes.  This ensures
that patterns not explicitly marked as such are never conditionally executed.

gcc/ChangeLog:
PR target/113915
* config/arm/arm.md (NOCOND): Improve comment.
(arm_rev*) Add predicable.
* config/arm/arm.cc (arm_final_prescan_insn): Add check for
PREDICABLE_YES.

gcc/testsuite/ChangeLog:
PR target/113915
* gcc.target/arm/builtin-bswap-1.c: Fix test to allow conditional
execution both for Arm and Thumb-2.

Revert "Set num_threads to 50 on 32-bit hppa in two libgomp loop tests"

This reverts commit b14209715e659f6d3ca0f9eef9a4851e7bd6e373.

[PR target/113001] Fix incorrect operand swapping in conditional move

This bug totally fell off my radar.  Sorry about that.

We have some special casing the conditional move expander to simplify a
conditional move when comparing a register against zero and that same register
is one of the arms.

Specifically a (eq (reg) (const_int 0)) where reg is also the true arm or (ne
(reg) (const_int 0)) where reg is the false arm need not use the fully
generalized conditional move, thus saving an instruction for those cases.

In the NE case we swapped the operands, but didn't swap the condition, which
led to the ICE due to an unrecognized pattern.  THe backend actually has
distinct patterns for those two cases.  So swapping the operands is neither
needed nor advisable.

Regression tested on rv64gc and verified the new tests pass.

Pushing to the trunk.

PR target/113001
PR target/112871
gcc/
* config/riscv/riscv.cc (expand_conditional_move): Do not swap
operands when the comparison operand is the same as the false
arm for a NE test.

gcc/testsuite
* gcc.target/riscv/zicond-ice-3.c: New test.
* gcc.target/riscv/zicond-ice-4.c: New test.

Fortran: error recovery while simplifying expressions [PR103707,PR106987]

When an exception is encountered during simplification of arithmetic
expressions, the result may depend on whether range-checking is active
(-frange-check) or not.  However, the code path in the front-end should
stay the same for "soft" errors for which the exception is triggered by the
check, while "hard" errors should always terminate the simplification, so
that error recovery is independent of the flag.  Separation of arithmetic
error codes into "hard" and "soft" errors shall be done consistently via
is_hard_arith_error().

PR fortran/103707
PR fortran/106987

gcc/fortran/ChangeLog:

* arith.cc (is_hard_arith_error): New helper function to determine
whether an arithmetic error is "hard" or not.
(check_result): Use it.
(gfc_arith_divide): Set "Division by zero" only for regular
numerators of real and complex divisions.
(reduce_unary): Use is_hard_arith_error to determine whether a hard
or (recoverable) soft error was encountered.  Terminate immediately
on hard error, otherwise remember code of first soft error.
(reduce_binary_ac): Likewise.
(reduce_binary_ca): Likewise.
(reduce_binary_aa): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr99350.f90:
* gfortran.dg/arithmetic_overflow_3.f90: New test.

c++: ICE with noexcept and local specialization [PR114114]

Here we ICE because we call register_local_specialization while
local_specializations is null, so

  local_specializations->put ();

crashes on null this.  It's null since maybe_instantiate_noexcept calls
push_to_top_level which creates a new scope.  Normally, I would have
guessed that we need a new local_specialization_stack.  But here we're
dealing with an operand of a noexcept, which is an unevaluated operand,
and those aren't registered in the hash map.  maybe_instantiate_noexcept
wasn't signalling that it's substituting an unevaluated operand though.

PR c++/114114

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Save/restore
cp_unevaluated_operand, c_inhibit_evaluation_warnings, and
cp_noexcept_operand around the tsubst_expr call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept84.C: New test.

i386: Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move

Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move and
use generic code instead.

No functional changes.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_move) [TARGET_MACHO]:
Eliminate common code and use generic code instead.

amdgcn: additional gfx1030/gfx1100 support: adjust test cases

The "SDWA" changes in commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support" caused a few regressions:

    PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler zero_extendv64qiv64si2

    PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler zero_extendv64hiv64si2

    PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler zero_extendv64qiv64si2

    PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler zero_extendv64hiv64si2

Those test cases need corresponding adjustment.

gcc/testsuite/
* gcc.target/gcn/sram-ecc-3.c: Adjust.
* gcc.target/gcn/sram-ecc-4.c: Likewise.
* gcc.target/gcn/sram-ecc-7.c: Likewise.
* gcc.target/gcn/sram-ecc-8.c: Likewise.

AVR: Adjust rtx cost of plus + zero_extend.

gcc/
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS+ZERO_EXTEND]: Adjust
rtx cost.