gcc.gnu.org Git - gcc.git/log
8 months ago  testsuite: fix lambda-decltype3.C in C++11
Marek Polacek [Sat, 11 Nov 2023 00:36:17 +0000 (19:36 -0500)]
testsuite: fix lambda-decltype3.C in C++11

This fixes
FAIL: g++.dg/cpp0x/lambda/lambda-decltype3.C  -std=c++11 (test for excess errors)
due to
lambda-decltype3.C:25:6: error: lambda capture initializers only available with '-std=c++14' or '-std=gnu++14' [-Wc++14-extensions]
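
For reference, a minimal hypothetical reduction (not the actual testcase) of the kind of guard the fix adds: init-captures are a C++14 feature, so testing __cpp_init_captures keeps the lambda out of a -std=c++11 run.

// Hypothetical sketch; the real test guards its init-capture similarly.
int main ()
{
#if __cpp_init_captures
  int i = 42;
  auto l = [x = i] { return x; };   // init-capture: C++14 and later only
  return l () - 42;
#else
  return 0;                         // C++11: nothing to check here
#endif
}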

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-decltype3.C: Check __cpp_init_captures.

8 months ago  [PATCH] libgcc/m68k: Fixes for soft float
Keith Packard [Fri, 10 Nov 2023 23:41:19 +0000 (16:41 -0700)]
[PATCH] libgcc/m68k: Fixes for soft float

Check for non-zero denorm in __adddf3. Need to check both the upper and
lower 32-bit chunks of a 64-bit float for a non-zero value when
checking to see if the value is -0.

Fix __addsf3 when the sum exponent is exactly 0xff to ensure that it
produces infinity and not NaN.

Handle converting NaN/inf values between formats.

Handle underflow and overflow when truncating.

Write a replacement for __fixxfsi so that it does not raise extra
exceptions during an extra conversion from long double to double.
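
At the C level, the -0 check amounts to looking at both 32-bit words of the double's representation; a rough sketch of the idea (not the libgcc assembly):

#include <cstdint>
#include <cstring>

/* Sketch: split a 64-bit double into the two 32-bit chunks the soft-float
   code works with.  Ignoring the sign bit, the value is zero only if *both*
   chunks are zero; a denormal can have all its nonzero bits in the low word.  */
static bool
double_is_plus_or_minus_zero (double d)
{
  uint64_t bits;
  std::memcpy (&bits, &d, sizeof bits);
  uint32_t hi = (uint32_t) (bits >> 32);   /* sign, exponent, high mantissa */
  uint32_t lo = (uint32_t) bits;           /* low mantissa bits */
  return ((hi & 0x7fffffff) | lo) == 0;
}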

libgcc/
* config/m68k/lb1sf68.S (__adddf3): Properly check for non-zero denorm.
(__divdf3): Restore sign bit properly.
(__addsf3): Correct exponent check.
* config/m68k/fpgnulib.c (EXPMASK): Define.
(__extendsfdf2): Handle Inf and NaN properly.
(__truncdfsf2): Handle underflow and overflow correctly.
(__extenddfxf2): Handle underflow, denorms, Inf and NaN correctly.
(__truncxfdf2): Handle underflow and denorms correctly.
(__fixxfsi): Reimplement.

8 months ago  [PATCH] doc: Add fpatchable-function-entry to Option-Summary page [PR110983]
Mao [Fri, 10 Nov 2023 23:22:51 +0000 (16:22 -0700)]
[PATCH] doc: Add fpatchable-function-entry to Option-Summary page [PR110983]

gcc/
PR middle-end/110983
* doc/invoke.texi (Option Summary): Add -fpatchable-function-entry.

8 months ago  RISC-V: Fix indentation of "length" attribute for branches and jumps
Maciej W. Rozycki [Fri, 10 Nov 2023 21:52:18 +0000 (21:52 +0000)]
RISC-V: Fix indentation of "length" attribute for branches and jumps

The "length" attribute calculation expressions for branches and jumps
are incorrectly and misleadingly indented, and they overrun the 80
column limit as well, all of this causing troubles in following them.
Correct all these issues.

gcc/
* config/riscv/riscv.md (length): Fix indentation for branch and
jump length calculation expressions.

8 months ago  c23: recursive type checking of tagged type
Martin Uecker [Tue, 15 Aug 2023 12:58:32 +0000 (14:58 +0200)]
c23: recursive type checking of tagged type

Adapt the old and unused code for type checking for C23.

gcc/c/:
* c-typeck.cc (struct comptypes_data): Add anon_field flag.
(comptypes, comptypes_check_enum_int,
comptypes_check_different_types): Remove old cache.
(tagged_tu_types_compatible_p): Rewrite.

8 months ago  g++: Rely on dg-do-what-default to avoid running pr102788.cc on non-vector targets
Patrick O'Neill [Thu, 2 Nov 2023 21:34:48 +0000 (14:34 -0700)]
g++: Rely on dg-do-what-default to avoid running pr102788.cc on non-vector targets

Testcases in g++.dg/vect rely on check_vect_support_and_set_flags
to set dg-do-what-default and avoid running vector tests on non-vector
targets. The testcase in this patch overwrites the default with
dg-do run.

Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr102788.cc: Remove dg-do run directive.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
8 months ago  Handle constant CONSTRUCTORs in operand_compare
Eric Botcazou [Fri, 10 Nov 2023 17:59:31 +0000 (18:59 +0100)]
Handle constant CONSTRUCTORs in operand_compare

This teaches operand_compare to compare constant CONSTRUCTORs, which is
quite helpful for so-called fat pointers in Ada, i.e. objects that are
semantically pointers but are represented by structures made up of two
pointers.  This is modeled on the implementation present in the ICF pass.

gcc/
* fold-const.cc (operand_compare::operand_equal_p) <CONSTRUCTOR>:
Deal with nonempty constant CONSTRUCTORs.
(operand_compare::hash_operand) <CONSTRUCTOR>: Hash DECL_FIELD_OFFSET
and DECL_FIELD_BIT_OFFSET for FIELD_DECLs.

gcc/testsuite/
* gnat.dg/opt103.ads, gnat.dg/opt103.adb: New test.

8 months ago  [IRA]: Check autoinc and memory address after temporary equivalence substitution
Vladimir N. Makarov [Fri, 10 Nov 2023 16:14:46 +0000 (11:14 -0500)]
[IRA]: Check autoinc and memory address after temporary equivalence substitution

My previous RA patches to take register equivalence into account do a
temporary register equivalence substitution to find out whether the
equivalence can be consumed by insns.  The insn with the substitution is
checked for validity using target-dependent code.  This code expects
autoinc operations to work on a register, but that register can be
substituted by equivalent memory.  The patch fixes this problem.  It also
adds a check that the substitution can be consumed in a memory address too.

gcc/ChangeLog:

PR target/112337
* ira-costs.cc: (validate_autoinc_and_mem_addr_p): New function.
(equiv_can_be_consumed_p): Use it.

gcc/testsuite/ChangeLog:

PR target/112337
* gcc.target/arm/pr112337.c: New.

8 months ago  ada: Fix syntax error
Andris Pavēnis [Sat, 19 Aug 2023 08:01:18 +0000 (11:01 +0300)]
ada: Fix syntax error

gcc/ada/
* expect.c (__gnat_waitpid): Fix syntax errors.

8 months ago  c++: decltype of (by-value captured reference) [PR79620]
Patrick Palka [Fri, 10 Nov 2023 15:58:06 +0000 (10:58 -0500)]
c++: decltype of (by-value captured reference) [PR79620]

The capture_decltype handling in finish_decltype_type wasn't looking
through implicit INDIRECT_REF (added by convert_from_reference), which
caused us to incorrectly resolve decltype((r)) to float& below.  This
patch fixes this, and adds an assert to outer_automatic_var_p to help
protect against such bugs.

We still don't fully accept the example ultimately because for the
decltype inside the lambda's trailing return type, at that point we're
in lambda type scope but not yet in lambda function scope that the
capture_decltype handling looks for (which is an orthogonal bug).
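
The scenario is essentially the classic lambda-capture example from the C++ standard; roughly (a sketch, not the new testcase):

void f ()
{
  float x, &r = x;
  [=] {
    decltype(x) y1;          // float
    decltype((x)) y2 = y1;   // float const&: non-mutable lambda, x captured by copy
    decltype(r) r1 = y1;     // float&
    decltype((r)) r2 = y2;   // float const& as well -- the bug resolved this to float&
  };
}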

PR c++/79620

gcc/cp/ChangeLog:

* cp-tree.h (STRIP_REFERENCE_REF): Define.
* semantics.cc (outer_var_p): Assert REFERENCE_REF_P is false.
(finish_decltype_type): Look through implicit INDIRECT_REF when
deciding whether to call capture_decltype.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-decltype3.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
8 months ago  c++: decltype of capture proxy [PR79378, PR96917]
Patrick Palka [Fri, 10 Nov 2023 15:58:04 +0000 (10:58 -0500)]
c++: decltype of capture proxy [PR79378, PR96917]

We typically don't see capture proxies in finish_decltype_type because
process_outer_var_ref is a no-op within an unevaluated context and so a
use of a captured variable within decltype resolves to the captured
variable, not the capture.  But we can see them during decltype(auto)
deduction and for decltype of an init-capture, which suggests we need to
handle capture proxies specially within finish_decltype_type after all.
This patch adds such handling.
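
A hedged sketch (hypothetical code, not the new testcases) of the two situations where a capture proxy can now reach finish_decltype_type:

int main ()
{
  int n = 0;
  // decltype(auto) deduction inside the lambda body sees the proxy for n.
  auto f = [n] () -> decltype(auto) { return n; };
  // decltype applied to an init-capture also resolves through a proxy.
  auto g = [m = n] { return sizeof (decltype((m))); };
  return f () + (int) g ();
}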

PR c++/79378
PR c++/96917

gcc/cp/ChangeLog:

* semantics.cc (finish_decltype_type): Handle an id-expression
naming a capture proxy specially.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto7.C: New test.
* g++.dg/cpp1y/lambda-init20.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
8 months ago  Allow md iterators to include other iterators
Richard Sandiford [Fri, 10 Nov 2023 15:46:21 +0000 (15:46 +0000)]
Allow md iterators to include other iterators

This patch allows an .md iterator to include the contents of
previous iterators, possibly with an extra condition attached.

Too much indirection might become hard to follow, so for the
AArch64 changes I tried to stick to things that seemed likely
to be uncontroversial:

(a) structure iterators that combine modes for different sizes
    and vector counts

(b) iterators that explicitly duplicate another iterator
    (for iterating over the cross product)

gcc/
* read-rtl.cc (md_reader::read_mapping): Allow iterators to
include other iterators.
* doc/md.texi: Document the change.
* config/aarch64/iterators.md (DREG2, VQ2, TX2, DX2, SX2): Include
the iterator that is being duplicated, rather than reproducing it.
(VSTRUCT_D): Redefine using VSTRUCT_[234]D.
(VSTRUCT_Q): Likewise VSTRUCT_[234]Q.
(VSTRUCT_2QD, VSTRUCT_3QD, VSTRUCT_4QD, VSTRUCT_QD): Redefine using
the individual D and Q iterators.

8 months ago  i386: Clear stack protector scratch with zero/sign-extend instruction
Uros Bizjak [Fri, 10 Nov 2023 15:22:44 +0000 (16:22 +0100)]
i386: Clear stack protector scratch with zero/sign-extend instruction

Use unrelated register initializations involving zero/sign-extend
instructions to clear the stack protector scratch register.

Handle only SI -> DImode extensions for 64-bit targets, as this is the
only extension that triggers the peephole a non-negligible number of times.

Also use an explicit check for word_mode instead of a mode iterator in the
peephole2 patterns to avoid pattern explosion.

gcc/ChangeLog:

* config/i386/i386.md (stack_protect_set_1 peephole2):
Explicitly check operand 2 for word_mode.
(stack_protect_set_1 peephole2 #2): Ditto.
(stack_protect_set_2 peephole2): Ditto.
(stack_protect_set_3 peephole2): Ditto.
(*stack_protect_set_4z_<mode>_di): New insn pattern.
(*stack_protect_set_4s_<mode>_di): Ditto.
(stack_protect_set_4 peephole2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization involving zero/sign-extend instruction.

8 months ago  i386: Fix ashift insn mnemonic in shift code attribute
Uros Bizjak [Fri, 10 Nov 2023 15:06:10 +0000 (16:06 +0100)]
i386: Fix ashift insn mnemonic in shift code attribute

gcc/ChangeLog:

* config/i386/i386.md (shift): Use SAL instead of SLL
for ashift insn mnemonic.

8 months ago  Middle-end: Fix bug of induction variable vectorization for RVV
Juzhe-Zhong [Fri, 10 Nov 2023 12:20:11 +0000 (20:20 +0800)]
Middle-end: Fix bug of induction variable vectorization for RVV

PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

1. The SELECT_VL result is not necessarily always VF in non-final iterations.

Current GIMPLE IR is wrong:

...
_35 = .SELECT_VL (ivtmp_33, VF);
_21 = vect_vec_iv_.8_22 + { VF, ... };

E.g. consider total iterations N = 6 and VF = 4.
The SELECT_VL output is defined as not necessarily being VF in non-final
iterations; it depends on the hardware implementation.

Suppose we have an RVV CPU core whose vsetvl does even-distribution workload optimization.
It may process 3 elements in the 1st iteration and 3 elements in the last iteration.
Then the induction variable update here: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
is wrong: it adds VF, which is 4, but we didn't actually process 4 elements.

It should add 3 elements, which is the result of SELECT_VL.
So here the correct IR should be:

  _36 = .SELECT_VL (ivtmp_34, VF);
  _22 = (int) _36;
  vect_cst__21 = [vec_duplicate_expr] _22;

2. This issue only happens on non-SLP vectorization single rgroup since:

     if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
    {
      tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
      if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
  OPTIMIZE_FOR_SPEED)
  && LOOP_VINFO_LENS (loop_vinfo).length () == 1
  && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
  && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
    }

3. This issue doesn't appear on nested loops, no matter whether LOOP_VINFO_USING_SELECT_VL_P is true or false.

Since:

  # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
  # vect_diff_15.7_20 = PHI <vect_diff_9.8_22(3), vect_diff_18.5_11(5)>
  _19 = vect_vec_iv_.6_5 + { 1, ... };
  vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
  ivtmp_1 = ivtmp_4 + 4294967295;
  ....
  <bb 5> [local count: 6549826]:
  # vect_diff_18.5_11 = PHI <vect_diff_9.8_22(4), { 0, ... }(2)>
  # ivtmp_26 = PHI <ivtmp_27(4), 40(2)>
  _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
  goto <bb 3>; [100.00%]

Note that the induction variable IR: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... }; updates the induction
variable independently of VF (it doesn't care how many elements are processed in the iteration).

The update is loop invariant, so it won't be a problem even if LOOP_VINFO_USING_SELECT_VL_P is true.

Testing passed, Ok for trunk ?

PR tree-optimization/112438

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_induction): Bugfix when
LOOP_VINFO_USING_SELECT_VL_P.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112438.c: New test.

8 months ago  libatomic: Improve ifunc selection on AArch64
Wilco Dijkstra [Fri, 10 Nov 2023 14:06:50 +0000 (14:06 +0000)]
libatomic: Improve ifunc selection on AArch64

Add support for ifunc selection based on CPUID register.  Neoverse N1 supports
atomic 128-bit load/store, so use the FEAT_USCAT ifunc like newer Neoverse
cores.

Reviewed-by: Kyrylo.Tkachov@arm.com
libatomic:
* config/linux/aarch64/host-config.h (ifunc1): Use CPUID in ifunc
selection.

8 months ago  RISC-V: Add combine optimization by slideup for vec_init vectorization
Juzhe-Zhong [Fri, 10 Nov 2023 03:36:51 +0000 (11:36 +0800)]
RISC-V: Add combine optimization by slideup for vec_init vectorization

This patch is a small optimization for vector initialization.
Discovered when I am evaluating benchmarks.

Consider this following case:
void foo3 (int8_t *out, int8_t x, int8_t y)
{
  v16qi v = {y, y, y, y, y, y, y, x, x, x, x, x, x, x, x, x};
  *(v16qi*)out = v;
}

Before this patch:

        vsetivli        zero,16,e8,m1,ta,ma
        vmv.v.x v1,a2
        vslide1down.vx  v1,v1,a1
        vslide1down.vx  v1,v1,a1
        vslide1down.vx  v1,v1,a1
        vslide1down.vx  v1,v1,a1
        vslide1down.vx  v1,v1,a1
        vslide1down.vx  v1,v1,a1
        vslide1down.vx  v1,v1,a1
        vslide1down.vx  v1,v1,a1
        vslide1down.vx  v1,v1,a1
        vse8.v  v1,0(a0)
        ret

After this patch:

vsetivli zero,16,e8,m1,ta,ma
vmv.v.x v1,a1
vmv.v.x v2,a2
vslideup.vi v1,v2,8
vse8.v v1,0(a0)
ret

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): New enum.
* config/riscv/riscv-v.cc
(rvv_builder::combine_sequence_use_slideup_profitable_p): New function.
(expand_vector_init_slideup_combine_sequence): Ditto.
(expand_vec_init): Add slideup combine optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add combine test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/combine-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-7.c: New test.

8 months ago  RISC-V: testsuite: Fix 32-bit FAILs.
Robin Dapp [Fri, 10 Nov 2023 09:03:30 +0000 (10:03 +0100)]
RISC-V: testsuite: Fix 32-bit FAILs.

This patch fixes several more FAILs that would only show in 32-bit runs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: Adjust.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/slp-mask-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c:
Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
Ditto.

8 months ago  vect: Look through pattern stmt in fold_left_reduction.
Robin Dapp [Fri, 10 Nov 2023 07:56:18 +0000 (08:56 +0100)]
vect: Look through pattern stmt in fold_left_reduction.

It appears as if we "look through" a statement pattern in
vect_finish_replace_stmt but not before when we replace the newly
created vector statement's lhs.  Then the lhs is the statement pattern's
lhs while in vect_finish_replace_stmt we assert that it's from the
statement the pattern replaced.

This patch uses vect_orig_stmt on the scalar destination's definition so
the replaced statement is used everywhere.

gcc/ChangeLog:

PR tree-optimization/112464

* tree-vect-loop.cc (vectorize_fold_left_reduction): Use
vect_orig_stmt on scalar_dest_def_info.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112464.c: New test.

8 months ago  RISC-V: XTheadMemPair: Fix missing fcsr handling in ISR prologue/epilogue
Jin Ma [Fri, 10 Nov 2023 07:14:31 +0000 (15:14 +0800)]
RISC-V: XTheadMemPair: Fix missing fcsr handling in ISR prologue/epilogue

The t0 register is used as a temporary register for interrupts, so it needs
special treatment. It is necessary to avoid using "th.ldd" in the interrupt
code in a way that interferes with the subsequent handling of the t0
register, so the two operations need to exchange positions in the function
"riscv_for_each_saved_reg".

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the interrupt
operation before the XTheadMemPair.

8 months ago  tree-optimization/110221 - SLP and loop mask/len
Richard Biener [Fri, 10 Nov 2023 11:39:11 +0000 (12:39 +0100)]
tree-optimization/110221 - SLP and loop mask/len

The following fixes the issue that when SLP stmts are internal defs
but appear invariant because they end up only using invariant defs
then they get scheduled outside of the loop.  This nice optimization
breaks down when loop masks or lens are applied since those are not
explicitly tracked as dependences.  The following makes sure to never
schedule internal defs outside of the vectorized loop when the
loop uses masks/lens.

PR tree-optimization/110221
* tree-vect-slp.cc (vect_schedule_slp_node): When loop
masking / len is applied, make sure not to schedule
internal defs outside of the loop.

* gfortran.dg/pr110221.f: New testcase.

8 months ago  vect: Don't set excess bits in uniform masks
Andrew Stubbs [Fri, 20 Oct 2023 15:26:51 +0000 (16:26 +0100)]
vect: Don't set excess bits in uniform masks

AVX ignores any excess bits in the mask (at least for vector sizes >=8), but
AMD GCN magically uses a larger vector than was intended (the smaller sizes are
"fake"), leading to wrong-code.

This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c,
gfortran.dg/c-interop/contiguous-1.f90,
gfortran.dg/c-interop/ff-descriptor-7.f90, and others.
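
Conceptually the change is the usual trick of ANDing away the bits beyond the vector length before a uniform mask constant is handed to the target; a sketch of the idea (not the expr.cc code):

#include <cstdint>

// Build an "all lanes on" mask for nunits lanes without setting excess high
// bits, which a target like amdgcn would otherwise treat as extra active lanes.
static uint64_t
uniform_mask (unsigned nunits)
{
  uint64_t mask = ~UINT64_C (0);
  if (nunits < 64)
    mask &= (UINT64_C (1) << nunits) - 1;  // the added AND clears the excess bits
  return mask;
}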

gcc/ChangeLog:

* expr.cc (store_constructor): Add "and" operation to uniform mask
generation.

8 months ago  amdgcn: Fix v_add constraints (pr112308)
Andrew Stubbs [Fri, 10 Nov 2023 11:13:55 +0000 (11:13 +0000)]
amdgcn: Fix v_add constraints (pr112308)

The instruction doesn't allow "B" constants for the vop3b encoding (used when
the cc register isn't VCC), so fix the pattern and all the insns that might get
split to it post-reload.

Also switch to the new constraint format for ease of adding new alternatives.

gcc/ChangeLog:

PR target/112308
* config/gcn/gcn-valu.md (add<mode>3<exec_clobber>): Fix B constraint
and switch to the new format.
(add<mode>3_dup<exec_clobber>): Likewise.
(add<mode>3_vcc<exec_vcc>): Likewise.
(add<mode>3_vcc_dup<exec_vcc>): Likewise.
(add<mode>3_vcc_zext_dup): Likewise.
(add<mode>3_vcc_zext_dup_exec): Likewise.
(add<mode>3_vcc_zext_dup2): Likewise.
(add<mode>3_vcc_zext_dup2_exec): Likewise.

8 months ago  middle-end/112469 - fix missing converts in vec_cond_expr simplification
Richard Biener [Fri, 10 Nov 2023 08:56:01 +0000 (09:56 +0100)]
middle-end/112469 - fix missing converts in vec_cond_expr simplification

The following avoids type inconsistencies in .COND_op generated by
simplifications of VEC_COND_EXPRs.

PR middle-end/112469
* match.pd (cond ? op a : b -> .COND_op (cond, a, b)): Add
missing view_converts.

* gcc.dg/torture/pr112469.c: New testcase.

8 months ago  amdgcn: Fix vector min/max ICE
Andrew Stubbs [Fri, 10 Nov 2023 09:43:21 +0000 (09:43 +0000)]
amdgcn: Fix vector min/max ICE

The DImode min/max instructions need a clobber that SImode does not, so
add the special case to the reduction expand code.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_expand_reduc_scalar): Add clobber to DImode
min/max instructions.

8 months ago  libgomp.texi: Update OpenMP 6.0-preview implementation-status list
Tobias Burnus [Fri, 10 Nov 2023 09:26:56 +0000 (10:26 +0100)]
libgomp.texi: Update OpenMP 6.0-preview implementation-status list

libgomp/ChangeLog:

* libgomp.texi (OpenMP Impl. Status): Update for OpenMP TR12;
renamed section from TR11.

8 months ago  LoongArch: Fix instruction name typo in lsx_vreplgr2vr_<lsxfmt_f> template
Chenghui Pan [Fri, 3 Nov 2023 09:01:36 +0000 (17:01 +0800)]
LoongArch: Fix instruction name typo in lsx_vreplgr2vr_<lsxfmt_f> template

gcc/ChangeLog:

* config/loongarch/lsx.md: Fix instruction name typo in
lsx_vreplgr2vr_<lsxfmt_f> template.

8 months ago  RISC-V: Robustify vec_init pattern[NFC]
Juzhe-Zhong [Fri, 10 Nov 2023 03:33:16 +0000 (11:33 +0800)]
RISC-V: Robustify vec_init pattern[NFC]

Although current GCC doesn't ICE when I create an FP16 vec_init case
with -march=rv64gcv (no ZVFH), the current vec_init pattern looks wrong,
since the V_VLS FP16 predicate is TARGET_VECTOR_ELEN_FP_16, whereas
vec_init needs vfslide1down/vfslide1up.

It makes more sense to robustify the vec_init patterns by splitting them
into 2 patterns (one integer, the other float) like the other autovectorization patterns.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_init<mode><vel>): Split patterns.

8 months ago  Revert "RISC-V: Support vec_init for trailing same element"
Pan Li [Fri, 10 Nov 2023 07:55:54 +0000 (15:55 +0800)]
Revert "RISC-V: Support vec_init for trailing same element"

This reverts commit e7f4040d9d6ec40c48ada940168885d7dde03af9 as it
introduces some legacy vmv insns.

8 months ago  RISC-V: Support vec_init for trailing same element
Pan Li [Fri, 10 Nov 2023 02:57:00 +0000 (10:57 +0800)]
RISC-V: Support vec_init for trailing same element

This patch would like to support the vec_init for the trailing same
element in the array. For example as below

typedef double vnx16df __attribute__ ((vector_size (128)));

__attribute__ ((noipa)) void
f_vnx16df (double a, double b, double *out)
{
  vnx16df v = {a, a, a, b, b, b, b, b, b, b, b, b, b, b, b, b};
  *(vnx16df *) out = v;
}

Before this patch:
f_vnx16df:
  vsetivli        zero,16,e64,m8,ta,ma
  vfmv.v.f        v8,fa0
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vfslide1down.vf v8,v8,fa1
  vs8r.v  v8,0(a0)
  ret

After this patch:
f_vnx16df:
  vsetivli      zero,16,e64,m8,ta,ma
  vfmv.v.f      v16,fa1
  vfslide1up.vf v8,v16,fa0
  vmv8r.v       v16,v8
  vfslide1up.vf v8,v16,fa0
  vmv8r.v       v16,v8
  vfslide1up.vf v8,v16,fa0
  vs8r.v        v8,0(a0)
  ret

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vector_init_trailing_same_elem):
New fun impl to expand the insn when trailing same elements.
(expand_vec_init): Try trailing same elements when vec_init.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
8 months ago  [PATCH v3] libiberty: Use posix_spawn in pex-unix when available.
Brendan Shanks [Fri, 10 Nov 2023 04:01:07 +0000 (21:01 -0700)]
[PATCH v3] libiberty: Use posix_spawn in pex-unix when available.

Hi,

This patch implements pex_unix_exec_child using posix_spawn when
available.

This should especially benefit recent macOS (where vfork just calls
fork), but should have equivalent or faster performance on all
platforms.
In addition, the implementation is substantially simpler than the
vfork+exec code path.

Tested on x86_64-linux.

v2: Fix error handling (previously the function would be run twice in
case of error), and don't use a macro that changes control flow.

v3: Match file style for error-handling blocks, don't close
in/out/errdes on error, and check close() for errors.
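
For reference, a minimal standalone sketch of the posix_spawn call itself (not the pex-unix.c code, which additionally wires up the pex file actions and the error handling described above):

#include <spawn.h>
#include <sys/wait.h>
#include <cstdio>

extern char **environ;

int main ()
{
  pid_t pid;
  char prog[] = "/bin/echo";
  char arg0[] = "echo";
  char arg1[] = "hello";
  char *argv[] = { arg0, arg1, nullptr };

  /* posix_spawn replaces the fork/vfork + exec pair with one call; on
     failure it returns an errno value directly.  */
  int err = posix_spawn (&pid, prog, nullptr, nullptr, argv, environ);
  if (err != 0)
    {
      std::fprintf (stderr, "posix_spawn failed: %d\n", err);
      return 1;
    }
  int status;
  waitpid (pid, &status, 0);
  return 0;
}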

libiberty/
* configure.ac (AC_CHECK_HEADERS): Add spawn.h.
(checkfuncs): Add posix_spawn, posix_spawnp.
(AC_CHECK_FUNCS): Add posix_spawn, posix_spawnp.
* aclocal.m4, configure, config.in: Rebuild.
* pex-unix.c [HAVE_POSIX_SPAWN] (pex_unix_exec_child): New function.

8 months ago  test: Fix FAIL of pr97428.c for RVV
Juzhe-Zhong [Tue, 7 Nov 2023 15:18:59 +0000 (23:18 +0800)]
test: Fix FAIL of pr97428.c for RVV

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97428.c: Add additional compile option for riscv.

8 months ago  RISC-V: Move cond_copysign from combine pattern to autovec pattern
Juzhe-Zhong [Thu, 9 Nov 2023 23:33:25 +0000 (07:33 +0800)]
RISC-V: Move cond_copysign from combine pattern to autovec pattern

Since cond_copysign is now supported in match.pd (middle-end), we don't
need to support conditional copysign via the RTL combine pass.

Instead, we can support it via the direct explicit cond_copysign optab.

Conditional copysign tests are already available in the testsuite, so
no new tests are needed.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_copysign<mode>): Remove.
* config/riscv/autovec.md (cond_copysign<mode>): New pattern.

8 months ago  Internal-fn: Add FLOATN support for l/ll round and rint [PR/112432]
Pan Li [Thu, 9 Nov 2023 14:04:39 +0000 (22:04 +0800)]
Internal-fn: Add FLOATN support for l/ll round and rint [PR/112432]

The defined DEF_EXT_LIB_FLOATN_NX_BUILTINS functions should also
have DEF_INTERNAL_FLT_FLOATN_FN instead of DEF_INTERNAL_FLT_FN for
the FLOATN support.  According to the glibc API and the gcc builtins,
the table below shows whether FLOATN is supported or not.

+---------+-------+-------------------------------------+
|         | glibc | gcc: DEF_EXT_LIB_FLOATN_NX_BUILTINS |
+---------+-------+-------------------------------------+
| iceil   | N     | N                                   |
| ifloor  | N     | N                                   |
| irint   | N     | N                                   |
| iround  | N     | N                                   |
| lceil   | N     | N                                   |
| lfloor  | N     | N                                   |
| lrint   | Y     | Y                                   |
| lround  | Y     | Y                                   |
| llceil  | N     | N                                   |
| llfloor | N     | N                                   |
| llrint  | Y     | Y                                   |
| llround | Y     | Y                                   |
+---------+-------+-------------------------------------+

This patch would like to support FLOATN for:
1. lrint
2. lround
3. llrint
4. llround

The below tests are passed within this patch:
1. x86 bootstrap and regression test.
2. aarch64 regression test.
3. riscv regression tests.

PR target/112432

gcc/ChangeLog:

* internal-fn.def (LRINT): Add FLOATN support.
(LROUND): Ditto.
(LLRINT): Ditto.
(LLROUND): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
8 months ago  [committed] Improve single bit zero extraction on H8.
Jeff Law [Fri, 10 Nov 2023 00:34:01 +0000 (17:34 -0700)]
[committed] Improve single bit zero extraction on H8.

When zero extracting a single bit bitfield from bits 16..31 on the H8 we
currently generate some pretty bad code.

The fundamental issue is we can't shift efficiently and there's no trivial way
to extract a single bit out of the high half word of an SImode value.

What usually happens is we use a synthesized right shift to get the single bit
into the desired position, then a bit-and to mask off everything we don't care
about.

The shifts are expensive, even using tricks like half and quarter word moves to
implement shift-by-16 and shift-by-8.  Additionally a logical right shift must
clear out the upper bits which is redundant since we're going to mask things
with &1 later.

This patch provides a consistently better sequence for such extractions.  The
general form moves the high half into the low half, a bit extraction into C,
clear the destination, then move C into the destination with a few special
cases.

This also avoids all the shenanigans for H8/SX which has a much more capable
shifter.  It's not single cycle, but it is reasonably efficient.

This has been regression tested on the H8 without issues.  Pushing to the trunk
momentarily.

jeff

ps.  Yes, supporting zero extraction of multi-bit fields might be improvable as
well.  But I've already spent more time on this than I can reasonably justify.
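
The C-level shape of the operation in question is a single-bit extraction from the upper half of a 32-bit value, e.g. (a hypothetical reduction):

/* Extract bit 20 of a 32-bit value.  Previously the H8 synthesized this as an
   expensive multi-step right shift followed by "& 1"; the new sequence moves
   the high half down, pulls the bit into C and then materializes it.  */
unsigned char
bit20 (unsigned long x)
{
  return (x >> 20) & 1;
}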

gcc/
* config/h8300/combiner.md (single bit sign_extract): Avoid recently
added patterns for H8/SX.
(single bit zero_extract): New patterns.

8 months ago  Fix wrong code due to vec_merge + pcmp to blendvb splitter.
liuhongt [Thu, 9 Nov 2023 05:20:05 +0000 (13:20 +0800)]
Fix wrong code due to vec_merge + pcmp to blendvb splitter.

gcc/ChangeLog:

PR target/112443
* config/i386/sse.md (*avx2_pcmp<mode>3_4): Fix swap condition
from LT to GT since there's no NOT in the pattern.
(*avx2_pcmp<mode>3_5): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr112443.C: New test.

8 months ago  bpf: fix pseudo-c asm emitted for *mulsidi3_zeroextend
Jose E. Marchesi [Fri, 10 Nov 2023 00:12:49 +0000 (01:12 +0100)]
bpf: fix pseudo-c asm emitted for *mulsidi3_zeroextend

This patch fixes the pseudo-c BPF assembly syntax used for
*mulsidi3_zeroextend, which was being emitted as:

  rN *= wM

instead of the proper way to denote a mul32 in pseudo-C syntax:

  wN *= wM

Includes test.
Tested in bpf-unknown-none-gcc target in x86_64-linux-gnu host.

gcc/ChangeLog:

* config/bpf/bpf.cc (bpf_print_register): Accept modifier code 'W'
to force emitting register names using the wN form.
* config/bpf/bpf.md (*mulsidi3_zeroextend): Force operands to
always use wN written form in pseudo-C assembly syntax.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c: New test.

8 months ago  bpf: testsuite: fix expected regexp in gcc.target/bpf/ldxdw.c
Jose E. Marchesi [Thu, 9 Nov 2023 23:47:19 +0000 (00:47 +0100)]
bpf: testsuite: fix expected regexp in gcc.target/bpf/ldxdw.c

gcc/testsuite/ChangeLog:

* gcc.target/bpf/ldxdw.c: Fix regexp with expected result.

8 months ago  libstdc++: mark 20_util/scoped_allocator/noexcept.cc R-E-T hosted
Arsen Arsenović [Thu, 9 Nov 2023 19:22:26 +0000 (20:22 +0100)]
libstdc++: mark 20_util/scoped_allocator/noexcept.cc R-E-T hosted

libstdc++-v3/ChangeLog:

* testsuite/20_util/scoped_allocator/noexcept.cc: Mark as
requiring hosted.

8 months ago  libstdc++: declare std::allocator in !HOSTED as an extension
Arsen Arsenović [Wed, 8 Nov 2023 09:22:47 +0000 (10:22 +0100)]
libstdc++: declare std::allocator in !HOSTED as an extension

This allows us to add features to freestanding which allow specifying
non-default allocators (generators, collections, ...) without having to
modify them.

libstdc++-v3/ChangeLog:

* include/bits/memoryfwd.h: Remove HOSTED check around allocator
and its specializations.

8 months ago  diagnostics: cleanups to diagnostic-show-locus.cc
David Malcolm [Thu, 9 Nov 2023 22:22:52 +0000 (17:22 -0500)]
diagnostics: cleanups to diagnostic-show-locus.cc

Reduce implicit usage of line_table global, and move source printing to
within diagnostic_context.

gcc/ChangeLog:
* diagnostic-show-locus.cc (layout::m_line_table): New field.
(compatible_locations_p): Convert to...
(layout::compatible_locations_p): ...this, replacing uses of
line_table global with m_line_table.
(layout::layout): Convert "richloc" param from a pointer to a
const reference.  Initialize m_line_table member.
(layout::maybe_add_location_range):  Replace uses of line_table
global with m_line_table.  Pass the latter to
linemap_client_expand_location_to_spelling_point.
(layout::print_leading_fixits): Pass m_line_table to
affects_line_p.
(layout::print_trailing_fixits): Likewise.
(gcc_rich_location::add_location_if_nearby): Update for change
to layout ctor params.
(diagnostic_show_locus): Convert to...
(diagnostic_context::maybe_show_locus): ...this, converting
richloc param from a pointer to a const reference.  Make "loc"
const.  Split out printing part of function to...
(diagnostic_context::show_locus): ...this.
(selftest::test_offset_impl): Update for change to layout ctor
params.
(selftest::test_layout_x_offset_display_utf8): Likewise.
(selftest::test_layout_x_offset_display_tab): Likewise.
(selftest::test_tab_expansion): Likewise.
* diagnostic.h (diagnostic_context::maybe_show_locus): New decl.
(diagnostic_context::show_locus): New decl.
(diagnostic_show_locus): Convert from a decl to an inline function.
* gdbinit.in (break-on-diagnostic): Update from a breakpoint
on diagnostic_show_locus to one on
diagnostic_context::maybe_show_locus.
* genmatch.cc (linemap_client_expand_location_to_spelling_point):
Add "set" param and use it in place of line_table global.
* input.cc (expand_location_1): Likewise.
(expand_location): Update for new param of expand_location_1.
(expand_location_to_spelling_point): Likewise.
(linemap_client_expand_location_to_spelling_point): Add "set"
param and use it in place of line_table global.
* tree-diagnostic-path.cc (event_range::print): Pass line_table
for new param of linemap_client_expand_location_to_spelling_point.

libcpp/ChangeLog:
* include/line-map.h (rich_location::get_expanded_location): Make
const.
(rich_location::get_line_table): New accessor.
(rich_location::m_line_table): Make the pointer be const.
(rich_location::m_have_expanded_location): Make mutable.
(rich_location::m_expanded_location): Likewise.
(fixit_hint::affects_line_p): Add const line_maps * param.
(linemap_client_expand_location_to_spelling_point): Likewise.
* line-map.cc (rich_location::get_expanded_location): Make const.
Pass m_line_table to
linemap_client_expand_location_to_spelling_point.
(rich_location::maybe_add_fixit): Likewise.
(fixit_hint::affects_line_p): Add set param and pass to
linemap_client_expand_location_to_spelling_point.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
8 months ago  Add missing declaration of get_restrict in C++ interface
Guillaume Gomez [Thu, 9 Nov 2023 21:29:02 +0000 (22:29 +0100)]
Add missing declaration of get_restrict in C++ interface

gcc/jit/ChangeLog:

* libgccjit++.h: Add missing declaration of get_restrict.

8 months ago  MAINTAINERS: Add myself to write after approval
Jivan Hakobyan [Thu, 9 Nov 2023 20:57:12 +0000 (00:57 +0400)]
MAINTAINERS: Add myself to write after approval

Signed-off-by: Jeff Law <jeffreyalaw@gmail.com>
ChangeLog:

* MAINTAINERS: Add myself.

8 months ago  libstdc++: Fix forwarding in __take/drop_of_repeat_view [PR112453]
Patrick Palka [Thu, 9 Nov 2023 20:15:08 +0000 (15:15 -0500)]
libstdc++: Fix forwarding in __take/drop_of_repeat_view [PR112453]

We need to respect the value category of the repeat_view passed to these
two functions when accessing the view's _M_value member.  This revealed
that the space-efficient partial specialization of __box lacks && overloads
of operator* to match those of the primary template (inherited from
std::optional).
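
A rough usage sketch (assuming a C++23 library with std::views::repeat; not the new testcase) of the kind of expression that exercises the rvalue path through __take_of_repeat_view:

#include <ranges>
#include <string>

int main ()
{
  // The repeat_view here is a prvalue, so take must forward it as an rvalue
  // when it reads the stored value; before the fix that access could land on
  // the wrong (or a missing) overload of the value box's operator*.
  auto v = std::views::repeat (std::string ("x"), 10) | std::views::take (3);
  return v.front ().size () == 1 ? 0 : 1;
}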

PR libstdc++/112453

libstdc++-v3/ChangeLog:

* include/std/ranges (__detail::__box<_Tp>::operator*): Define
&& overloads as well.
(__detail::__take_of_repeat_view): Forward __r when accessing
its _M_value member.
(__detail::__drop_of_repeat_view): Likewise.
* testsuite/std/ranges/repeat/1.cc (test07): New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
8 months ago  RISC-V/testsuite: Fix several zvfh tests.
Robin Dapp [Thu, 9 Nov 2023 10:32:30 +0000 (11:32 +0100)]
RISC-V/testsuite: Fix several zvfh tests.

This fixes some zvfh test oversights as well as adds zfh to the target
requirements.  It's not strictly necessary to have zfh but it greatly
simplifies test handling when we can just calculate the reference value
instead of working around it.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh-1.c: Adjust.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-3.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-4.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-3.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-4.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/reduc/reduc_zvfh-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/reduc/reduc_zvfh_run-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_zvfh-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_zvfh-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_zvfh-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_zvfh-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_zvfh-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_zvfh-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_zvfh_run-2.c: New test.

8 months ago  i386: Improve stack protector patterns and peephole2s even more
Uros Bizjak [Thu, 9 Nov 2023 18:47:57 +0000 (19:47 +0100)]
i386: Improve stack protector patterns and peephole2s even more

Improve stack protector patterns and peephole2s even more:

a. Use unrelated register clears with integer mode size <= word
   mode size to clear stack protector scratch register.

b. Use unrelated register initializations in front of stack
   protector sequence to clear stack protector scratch register.

c. Use unrelated register initializations using LEA instructions
   to clear stack protector scratch register.

These stack protector improvements reuse 6914 unrelated register
initializations to substitute the clear of the stack protector scratch
register in 12034 instances of the stack protector sequence in a recent
linux defconfig build.

gcc/ChangeLog:

* config/i386/i386.md (@stack_protect_set_1_<PTR:mode>_<W:mode>):
Use W mode iterator instead of SWI48.  Output MOV instead of XOR
for TARGET_USE_MOV0.
(stack_protect_set_1 peephole2): Use integer modes with
mode size <= word mode size for operand 3.
(stack_protect_set_1 peephole2 #2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization, originally in front of stack
protector sequence.
(*stack_protect_set_3_<PTR:mode>_<SWI48:mode>): New insn pattern.
(stack_protect_set_1 peephole2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization involving LEA instruction.

8 months ago  [IRA]: Fixing conflict calculation from region landing pads.
Vladimir N. Makarov [Thu, 9 Nov 2023 13:51:15 +0000 (08:51 -0500)]
[IRA]: Fixing conflict calculation from region landing pads.

The following patch fixes conflict calculation from exception landing
pads.  The previous patch processed only one newly created landing pad.
Besides being wrong, it also resulted in large memory consumption by IRA.

gcc/ChangeLog:

PR rtl-optimization/110215
* ira-lives.cc (add_conflict_from_region_landing_pads): New
function.
(process_bb_node_lives): Use it.

8 months ago  libstdc++: [_Hashtable] Use RAII type to manage rehash functor state
François Dumont [Thu, 26 Oct 2023 05:06:18 +0000 (07:06 +0200)]
libstdc++: [_Hashtable] Use RAII type to manage rehash functor state

Replace usage of __try/__catch with a RAII type to restore rehash functor
state when needed.
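
The general shape is the familiar scope-guard idiom; a generic sketch with hypothetical names (not the actual _RehashStateGuard):

#include <utility>

// Restore a saved copy of some state object on scope exit unless dismissed.
template <typename State>
struct restore_on_exit
{
  State &target;
  State saved;
  bool active = true;

  explicit restore_on_exit (State &s) : target (s), saved (s) { }
  void dismiss () { active = false; }
  ~restore_on_exit () { if (active) target = std::move (saved); }
};

// Usage sketch: the rehash policy state is rolled back automatically if the
// insertion throws, with no explicit __try/__catch:
//   restore_on_exit guard (rehash_state);
//   ... insertion work that may throw ...
//   guard.dismiss ();   // success: keep the new state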

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_RehashStateGuard): New.
(_Insert_base<>::_M_insert_range(_IIt, _IIt, const _NodeGet&, false_type)):
Adapt.
* include/bits/hashtable.h (__rehash_guard_t): New.
(__rehash_state): Remove.
(_M_rehash): Remove.
(_M_rehash_aux): Rename into _M_rehash.
(_M_assign_elements, _M_insert_unique_node, _M_insert_multi_node): Adapt.
(rehash): Adapt.

8 months ago  i386 PIE: accept @GOTOFF in load/store multi base address
Alexandre Oliva [Thu, 9 Nov 2023 15:26:41 +0000 (12:26 -0300)]
i386 PIE: accept @GOTOFF in load/store multi base address

Looking at the code generated for sse2-{load,store}-multi.c with PIE,
I realized we could use UNSPEC_GOTOFF as a base address, and that this
would enable the test to use the vector insns expected by the tests
even with PIC, so I extended the base + offset logic used by the SSE2
multi-load/store peepholes to accept reg + symbolic base + offset too,
so that the test generated the expected insns even with PIE.

for  gcc/ChangeLog

* config/i386/i386.cc (symbolic_base_address_p,
base_address_p): New, factored out from...
(extract_base_offset_in_addr): ... here and extended to
recognize REG+GOTOFF, as in gcc.target/i386/sse2-load-multi.c
and sse2-store-multi.c with PIE enabled by default.

8 months ago  testsuite: xfail scev-[35].c on ia32
Alexandre Oliva [Thu, 9 Nov 2023 15:26:38 +0000 (12:26 -0300)]
testsuite: xfail scev-[35].c on ia32

These gimplefe tests never got the desired optimization on ia32, but
they only started visibly failing when the representation of MEMs in
dumps changed from printing 'symbol: a' to '&a'.

The transformation is not considered profitable on ia32; that's why it
doesn't take place.  Maybe that's a bug in itself, but it's not a
regression, and not something to be noisy about.

for  gcc/testsuite/ChangeLog

* gcc.dg/tree-ssa/scev-3.c: xfail on ia32.
* gcc.dg/tree-ssa/scev-5.c: Likewise.

8 months ago  AArch64: Add SVE implementation for cond_copysign.
Tamar Christina [Thu, 9 Nov 2023 14:05:40 +0000 (14:05 +0000)]
AArch64: Add SVE implementation for cond_copysign.

This adds an implementation for masked copysign along with an optimized
pattern for masked copysign (x, -1).

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64-sve.md (cond_copysign<mode>): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/sve/fneg-abs_5.c: New test.

8 months ago  AArch64: Handle copysign (x, -1) expansion efficiently
Tamar Christina [Thu, 9 Nov 2023 14:04:57 +0000 (14:04 +0000)]
AArch64: Handle copysign (x, -1) expansion efficiently

copysign (x, -1) is effectively fneg (abs (x)) which on AArch64 can be
most efficiently done by doing an OR of the signbit.

The middle-end will optimize fneg (abs (x)) now to copysign as the
canonical form and so this optimizes the expansion.

If the target has an inclusive-OR that takes an immediate, then the transformed
instruction is both shorter and faster.  For those that don't, the immediate
has to be separately constructed, but this still ends up being faster as the
immediate construction is not on the critical path.

Note that this is part of another patch series; the additional testcases
are mutually dependent on the match.pd patch.  As such the tests are added
there instead of here.
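
In scalar terms the transformed operation is just setting the sign bit; a bit-level sketch (not the aarch64.md change itself):

#include <cstdint>
#include <cstring>

/* copysign (x, -1.0) == -fabs (x): both reduce to ORing in the sign bit,
   which can then be a single immediate OR on the FP/SIMD side.  */
double
neg_abs (double x)
{
  uint64_t bits;
  std::memcpy (&bits, &x, sizeof bits);
  bits |= UINT64_C (1) << 63;
  std::memcpy (&x, &bits, sizeof bits);
  return x;
}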

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64.md (copysign<GPF:mode>3): Handle
copysign (x, -1).
* config/aarch64/aarch64-simd.md (copysign<mode>3): Likewise.
* config/aarch64/aarch64-sve.md (copysign<mode>3): Likewise.

8 months ago  AArch64: Use SVE unpredicated LOGICAL expressions when Advanced SIMD inefficient...
Tamar Christina [Thu, 9 Nov 2023 14:18:48 +0000 (14:18 +0000)]
AArch64: Use SVE unpredicated LOGICAL expressions when Advanced SIMD inefficient [PR109154]

SVE has a much bigger immediate encoding range for bitmasks than Advanced SIMD,
so on an SVE-capable system, if we need an Advanced SIMD inclusive-OR by
immediate that would require a reload, use an unpredicated SVE ORR instead.

This has both speed and size improvements.

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64.md (<optab><mode>3): Add SVE split case.
* config/aarch64/aarch64-simd.md (ior<mode>3<vczle><vczbe>): Likewise.
* config/aarch64/predicates.md(aarch64_orr_imm_sve_advsimd): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/sve/fneg-abs_1.c: Updated.
* gcc.target/aarch64/sve/fneg-abs_2.c: Updated.
* gcc.target/aarch64/sve/fneg-abs_4.c: Updated.

8 months ago  AArch64: Add movi for 0 moves for scalar types [PR109154]
Tamar Christina [Thu, 9 Nov 2023 14:03:04 +0000 (14:03 +0000)]
AArch64: Add movi for 0 moves for scalar types [PR109154]

Following the Neoverse N/V and Cortex-A optimization guides, SIMD 0 immediates
should be created with a movi of 0.

At the moment we generate an `fmov .., xzr` which is slower and requires a
GP -> FP transfer.

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64.md (*mov<mode>_aarch64, *movsi_aarch64,
*movdi_aarch64): Add new w -> Z case.
* config/aarch64/iterators.md (Vbtype): Add QI and HI.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/fneg-abs_2.c: Updated.
* gcc.target/aarch64/fneg-abs_4.c: Updated.
* gcc.target/aarch64/dbl_mov_immediate_1.c: Updated.

8 months ago  AArch64: Add special patterns for creating DI scalar and vector constant 1 << 63...
Tamar Christina [Thu, 9 Nov 2023 14:02:21 +0000 (14:02 +0000)]
AArch64: Add special patterns for creating DI scalar and vector constant 1 << 63 [PR109154]

This adds a way to generate special sequences for the creation of constants
for which we don't have single-instruction sequences and which would
normally have led to a GP -> FP transfer or a literal load.

The patch starts out by adding support for creating 1 << 63 using fneg (mov 0).

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64-protos.h (aarch64_simd_special_constant_p,
aarch64_maybe_generate_simd_constant): New.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VQMOV:mode>,
*aarch64_simd_mov<VDMOV:mode>): Add new coden for special constants.
* config/aarch64/aarch64.cc (aarch64_extract_vec_duplicate_wide_int):
Take optional mode.
(aarch64_simd_special_constant_p,
aarch64_maybe_generate_simd_constant): New.
* config/aarch64/aarch64.md (*movdi_aarch64): Add new codegen for
special constants.
* config/aarch64/constraints.md (Dx): new.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/fneg-abs_1.c: Updated.
* gcc.target/aarch64/fneg-abs_2.c: Updated.
* gcc.target/aarch64/fneg-abs_4.c: Updated.
* gcc.target/aarch64/dbl_mov_immediate_1.c: Updated.

8 months ago  ifcvt: Add support for conditional copysign
Tamar Christina [Thu, 9 Nov 2023 14:00:20 +0000 (14:00 +0000)]
ifcvt: Add support for conditional copysign

This adds a masked variant of copysign.  Nothing very exciting, just the
general machinery to define and use a new masked IFN.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note: This patch is part of a test series and tests for it are added in the
AArch64 patch that adds support for the optab.

gcc/ChangeLog:

PR tree-optimization/109154
* internal-fn.def (COPYSIGN): New.
* match.pd (UNCOND_BINARY, COND_BINARY): Map IFN_COPYSIGN to
IFN_COND_COPYSIGN.
* optabs.def (cond_copysign_optab, cond_len_copysign_optab): New.

8 months ago  middle-end: optimize fneg (fabs (x)) to copysign (x, -1) [PR109154]
Tamar Christina [Thu, 9 Nov 2023 13:59:39 +0000 (13:59 +0000)]
middle-end: optimize fneg (fabs (x)) to copysign (x, -1) [PR109154]

This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
canonical and allows a target to expand this sequence efficiently.  Such
sequences are common in scientific code working with gradients.

There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
which I remove since this is a less efficient form.  The testsuite is also
updated in light of this.
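
The source-level shape of the affected pattern is simply negation of an absolute value; with the patch such code is canonicalized to a copysign (x, -1) call rather than the fneg (fabs (x)) pair (a trivial example, not one of the new testcases):

#include <cmath>

// -fabs (x): now canonicalized to copysign (x, -1.0), which targets such as
// AArch64 can expand as a single OR of the sign bit.
double
grad_step (double x)
{
  return -std::fabs (x);
}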

gcc/ChangeLog:

PR tree-optimization/109154
* match.pd: Add new neg+abs rule, remove inverse copysign rule.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.dg/fold-copysign-1.c: Updated.
* gcc.dg/pr55152-2.c: Updated.
* gcc.dg/tree-ssa/abs-4.c: Updated.
* gcc.dg/tree-ssa/backprop-6.c: Updated.
* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
* gcc.target/aarch64/fneg-abs_1.c: New test.
* gcc.target/aarch64/fneg-abs_2.c: New test.
* gcc.target/aarch64/fneg-abs_3.c: New test.
* gcc.target/aarch64/fneg-abs_4.c: New test.
* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
* gcc.target/aarch64/sve/fneg-abs_4.c: New test.

8 months ago  middle-end: expand copysign handling from lockstep to nested iters
Tamar Christina [Thu, 9 Nov 2023 13:58:59 +0000 (13:58 +0000)]
middle-end: expand copysign handling from lockstep to nested iters

Various optimizations in match.pd only happened on COPYSIGN in lock step,
which means they exclude IFN_COPYSIGN.  COPYSIGN, however, is restricted to
the C99 builtins and so doesn't work for vectors.

The patch expands these optimizations to work as nested iters.

This is needed for the second patch which will add the testcase.

gcc/ChangeLog:

PR tree-optimization/109154
* match.pd: expand existing copysign optimizations.

8 months ago  Fix PR ada/111813 (Inconsistent limit in Ada.Calendar.Formatting)
Simon Wright [Mon, 16 Oct 2023 13:32:43 +0000 (14:32 +0100)]
Fix PR ada/111813 (Inconsistent limit in Ada.Calendar.Formatting)

The description of the second Value function (returning Duration) (ARM 9.6.1(87))
doesn't place any limitation on the Elapsed_Time parameter's value, beyond
"Constraint_Error is raised if the string is not formatted as described for Image, or
the function cannot interpret the given string as a Duration value".

It would seem reasonable that Value and Image should be consistent, in that any
string produced by Image should be accepted by Value. Since Image must produce
a two-digit representation of the Hours, there's an implication that its
Elapsed_Time parameter should be less than 100.0 hours (the ARM merely says
that in that case the result is implementation-defined).

The current implementation of Value raises Constraint_Error if the Elapsed_Time
parameter is greater than or equal to 24 hours.

This patch removes the restriction, so that the Elapsed_Time parameter must only
be less than 100.0 hours.

2023-10-15 Simon Wright <simon@pushface.org>

PR ada/111813
gcc/ada/
* libgnat/a-calfor.adb (Value (2)): Allow values of
parameter Elapsed_Time greater than or equal to 24 hours, by doing
the hour calculations in Natural rather than Hour_Number (0 ..
23). Calculate the result directly rather than by using Seconds_Of
(whose Hour parameter is of type Hour_Number).
If an exception occurs of type Constraint_Error, re-raise it
rather than raising a new CE.

gcc/testsuite/
* gnat.dg/calendar_format_value.adb: New test.

8 months ago  Do not prepend target triple to -fuse-ld=lld,mold.
Tatsuyuki Ishi [Mon, 16 Oct 2023 05:04:12 +0000 (14:04 +0900)]
Do not prepend target triple to -fuse-ld=lld,mold.

lld and mold are platform-agnostic and not prefixed with target triple.
Prepending the target triple makes it less likely to find the intended
linker executable.

A potential breaking change is that we no longer try to search for
triple-prefixed lld/mold binaries anymore. However, since there doesn't
seem to be support to build LLVM or mold with triple-prefixed executable
names, it seems better to just not bother with that case.

PR driver/111605
* collect2.cc (main): Do not prepend target triple to
-fuse-ld=lld,mold.

8 months ago  Refactor x86 decl based scatter vectorization, prepare SLP
Richard Biener [Wed, 8 Nov 2023 12:14:59 +0000 (13:14 +0100)]
Refactor x86 decl based scatter vectorization, prepare SLP

The following refactors the x86 decl based scatter vectorization
similar to what I did to the gather path.  This prepares scatters
for SLP as well, mainly single-lane since there are multiple
missing bits to support multi-lane scatters.

Tested extensively on the SLP-only branch which has the ability
to force SLP even for single lanes.

PR tree-optimization/111133
* tree-vect-stmts.cc (vect_build_scatter_store_calls):
Remove and refactor to ...
(vect_build_one_scatter_store_call): ... this new function.
(vectorizable_store): Use vect_check_scalar_mask to record
the SLP node for the mask operand.  Code generate scatters
with builtin decls from the main scatter vectorization
path and prepare that for SLP.
* tree-vect-slp.cc (vect_get_operand_map): Do not look
at the VDEF to decide between scatter or gather since that
doesn't work for patterns.  Use the LHS being an SSA_NAME
or not instead.

8 months ago  RISC-V: Refine frm emit after bb end in succ edges
Pan Li [Thu, 9 Nov 2023 06:42:04 +0000 (14:42 +0800)]
RISC-V: Refine frm emit after bb end in succ edges

This patch refines the frm insn emission when we
meet an abnormal edge in the loop. Conceptually, we only need
to emit once for the abnormal edge instead of on every iteration
of the loop.

This patch fixes this defect and only performs
insert_insn_end_basic_block when at least one succ edge is
abnormal.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_frm_emit_after_bb_end): Only
emit once when at least one succ edge is abnormal.

Signed-off-by: Pan Li <pan2.li@intel.com>
8 months agoRISC-V: Add PR112450 test to avoid regression
Juzhe-Zhong [Thu, 9 Nov 2023 12:00:38 +0000 (20:00 +0800)]
RISC-V: Add PR112450 test to avoid regression

ICE has been fixed by Richard: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112450.

Add test to avoid future regression. Committed.

PR target/112450

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112450.c: New test.

8 months agotree-optimization/112450 - avoid AVX512 style masking for BImode masks
Richard Biener [Thu, 9 Nov 2023 10:44:07 +0000 (11:44 +0100)]
tree-optimization/112450 - avoid AVX512 style masking for BImode masks

The following avoids running into the AVX512 style masking code for
RVV which would theoretically be able to handle it if I were not
relying on integer mode maskness in vect_get_loop_mask.  While that's
easy to fix (patch in PR), the preference is to not have AVX512 style
masking for RVV, thus the following.

* tree-vect-loop.cc (vect_verify_full_masking_avx512):
Check we have integer mode masks as required by
vect_get_loop_mask.

8 months agotree-optimization/112444 - avoid bogus PHI value-numbering
Richard Biener [Thu, 9 Nov 2023 08:41:10 +0000 (09:41 +0100)]
tree-optimization/112444 - avoid bogus PHI value-numbering

With .DEFERRED_INIT ssa_undefined_value_p () can return true for
values we did not visit (because they proved unreachable) but that
are not .VN_TOP.  Avoid using those as the value since, because they
are not visited, they are assumed to be defined outside of the region.

PR tree-optimization/112444
* tree-ssa-sccvn.cc (visit_phi): Avoid using not visited
defs as undefined vals.

* gcc.dg/torture/pr112444.c: New testcase.

8 months agoMAINTAINERS: Update my email address
YunQiang Su [Thu, 9 Nov 2023 10:03:01 +0000 (18:03 +0800)]
MAINTAINERS: Update my email address

ChangeLog:

* MAINTAINERS: Update my email address.

8 months agoMIPS: Use -mnan value for -mabs if not specified
YunQiang Su [Thu, 9 Nov 2023 09:21:41 +0000 (17:21 +0800)]
MIPS: Use -mnan value for -mabs if not specified

On most hardware, FCSR.ABS2008 is set to the same value as FCSR.NAN2008.
Let's use this behavior by default in GCC, i.e.
gcc -mnan=2008 -c fabs.c
will imply `-mabs=2008`.

And of course, `gcc -mnan=2008 -mabs=legacy` continues to work
as before.
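
As a hedged illustration (an assumption on my part, not the contents of the new
fabs-nan2008.c tests), the kind of source such a test compiles is simply:

double
my_fabs (double x)
{
  /* With -mnan=2008 the abs encoding now defaults to 2008 as well.  */
  return __builtin_fabs (x);
}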

gcc/ChangeLog

* config/mips/mips.cc (mips_option_override): Set mips_abs to
2008, if mips_abs is default and mips_nan is 2008.

gcc/testsuite/
* gcc.target/mips/fabs-nan2008.c: New test.
* gcc.target/mips/fabsf-nan2008.c: New test.

8 months agoi386: Fix C99 compatibility issues in the x86-64 AVX ABI test suite
Florian Weimer [Wed, 8 Nov 2023 14:14:40 +0000 (15:14 +0100)]
i386: Fix C99 compatibility issues in the x86-64 AVX ABI test suite

gcc/testsuite/

* gcc.target/x86_64/abi/avx/avx-check.h (main): Call
__builtin_printf instead of printf.
* gcc.target/x86_64/abi/avx/test_passing_m256.c
(fun_check_passing_m256_8_values): Add missing void return
type.
* gcc.target/x86_64/abi/avx512f/avx512f-check.h (main): Call
__builtin_printf instead of printf.
* gcc.target/x86_64/abi/avx512f/test_passing_m512.c
(fun_check_passing_m512_8_values): Add missing void return
type.
* gcc.target/x86_64/abi/bf16/bf16-check.h (main): Call
__builtin_printf instead of printf.
* gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h (main):
Likewise.
* gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c
(fun_check_passing_m256bf16_8_values): Add missing void
return type.
* gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h (main):
Call __builtin_printf instead of printf.
* gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c
(fun_check_passing_m512bf16_8_values): Add missing void
return type.

8 months agoc: Add -Wreturn-mismatch warning, split from -Wreturn-type
Florian Weimer [Thu, 9 Nov 2023 08:50:54 +0000 (09:50 +0100)]
c: Add -Wreturn-mismatch warning, split from -Wreturn-type

The existing -Wreturn-type option covers both constraint violations
(which are mandatory to diagnose) and warnings that have known
false positives.  The new -Wreturn-mismatch warning is only about
the constraint violations (missing or extra return expressions),
and should eventually be turned into a permerror.

The -std=gnu89 test cases show that by default, we do not warn for
return; in a function not returning void.  This matches previous
practice for -Wreturn-type.
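
As a minimal sketch (not taken from the new Wreturn-mismatch-*.c tests) of how
the two warnings now divide the work:

int
missing_value (void)
{
  return;       /* missing return expression: -Wreturn-mismatch */
}

void
extra_value (int x)
{
  return x;     /* extra return expression: -Wreturn-mismatch */
}

int
maybe_falls_off (int x)
{
  if (x)
    return 1;   /* may fall off the end: still -Wreturn-type */
}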

gcc/c-family/

* c.opt (Wreturn-mismatch): New.

gcc/c/

* c-typeck.cc (c_finish_return): Use pedwarn with
OPT_Wreturn_mismatch for missing/extra return expressions.

gcc/

* doc/invoke.texi (Warning Options): Document
-Wreturn-mismatch.  Update -Wreturn-type documentation.

gcc/testsuite/

* gcc.dg/Wreturn-mismatch-1.c: New.
* gcc.dg/Wreturn-mismatch-2.c: New.
* gcc.dg/Wreturn-mismatch-3.c: New.
* gcc.dg/Wreturn-mismatch-4.c: New.
* gcc.dg/Wreturn-mismatch-5.c: New.
* gcc.dg/Wreturn-mismatch-6.c: New.
* gcc.dg/noncompile/pr55976-1.c: Change -Werror=return-type
to -Werror=return-mismatch.
* gcc.dg/noncompile/pr55976-2.c: Change -Wreturn-type
to -Wreturn-mismatch.

8 months agogcc.dg/Wmissing-parameter-type*: Test the intended warning
Florian Weimer [Thu, 9 Nov 2023 08:50:53 +0000 (09:50 +0100)]
gcc.dg/Wmissing-parameter-type*: Test the intended warning

gcc/testsuite/ChangeLog:

* gcc.dg/Wmissing-parameter-type.c: Build with -std=gnu89
to trigger the -Wmissing-parameter-type warning
and not the default -Wimplicit warning.  Also match
against -Wmissing-parameter-type.
* gcc.dg/Wmissing-parameter-type-Wextra.c: Likewise.

8 months agos390: Revise vector reverse elements
Stefan Schulze Frielinghaus [Thu, 9 Nov 2023 08:33:10 +0000 (09:33 +0100)]
s390: Revise vector reverse elements

Replace UNSPEC_VEC_ELTSWAP with a vec_select implementation.

Furthermore, for a vector reverse elements operation between registers
of mode V8HI perform three rotates instead of a vperm operation since
the latter involves loading the permutation vector from the literal
pool.

Prior to z15, instead of
  larl + vl + vl + vperm
prefer
  vl + vpdi (+ verllg (+ verllf))
for a load operation.

Likewise, prior to z15, instead of
  larl + vl + vperm + vst
prefer
  vpdi (+ verllg (+ verllf)) + vst
for a store operation.
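
A sketch of the kind of V8HI element reversal these patterns target, written
with the generic __builtin_shufflevector built-in rather than the zvector test
sources; whether it ends up as vpdi plus verllg/verllf depends on the
architecture level:

typedef short v8hi __attribute__ ((vector_size (16)));

v8hi
reverse (v8hi x)
{
  /* Reverse the eight halfword elements.  */
  return __builtin_shufflevector (x, x, 7, 6, 5, 4, 3, 2, 1, 0);
}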

gcc/ChangeLog:

* config/s390/s390.md: Remove UNSPEC_VEC_ELTSWAP.
* config/s390/vector.md (eltswapv16qi): New expander.
(*eltswapv16qi): New insn and splitter.
(eltswapv8hi): New insn and splitter.
(eltswap<mode>): New insn and splitter for modes V_HW_4 as well
as V_HW_2.
* config/s390/vx-builtins.md (eltswap<mode>): Remove.
(*eltswapv16qi): Remove.
(*eltswap<mode>): Remove.
(*eltswap<mode>_emu): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/vec-reve-load-halfword-z14.c: Remove
vperm and substitute vpdi et al.
* gcc.target/s390/zvector/vec-reve-load-halfword.c: Likewise.
* gcc.target/s390/vector/reverse-elements-1.c: New test.
* gcc.target/s390/vector/reverse-elements-2.c: New test.
* gcc.target/s390/vector/reverse-elements-3.c: New test.
* gcc.target/s390/vector/reverse-elements-4.c: New test.
* gcc.target/s390/vector/reverse-elements-5.c: New test.
* gcc.target/s390/vector/reverse-elements-6.c: New test.
* gcc.target/s390/vector/reverse-elements-7.c: New test.

8 months agos390: Add expand_perm_reverse_elements
Stefan Schulze Frielinghaus [Thu, 9 Nov 2023 08:33:05 +0000 (09:33 +0100)]
s390: Add expand_perm_reverse_elements

Replace expand_perm_with_rot, expand_perm_with_vster, and
expand_perm_with_vstbrq with a general implementation
expand_perm_reverse_elements.

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_with_rot): Remove.
(expand_perm_reverse_elements): New.
(expand_perm_with_vster): Remove.
(expand_perm_with_vstbrq): Remove.
(vectorize_vec_perm_const_1): Replace removed functions with new
one.

8 months agos390: Recognize further vpdi and vmr{l,h} pattern
Stefan Schulze Frielinghaus [Thu, 9 Nov 2023 08:32:58 +0000 (09:32 +0100)]
s390: Recognize further vpdi and vmr{l,h} pattern

Deal with cases where vpdi and vmr{l,h} are still applicable if the
operands of those instructions are swapped.  For example, currently for

V2DI foo (V2DI x)
{
  return (V2DI) {x[1], x[0]};
}

the assembler sequence

vlgvg   %r1,%v24,1
vzero   %v0
vlvgg   %v0,%r1,0
vmrhg   %v24,%v0,%v24

is emitted.  With this patch a single vpdi is emitted.

Extensive tests are included in a subsequent patch of this series where
more cases are covered.

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_with_merge): Deal with cases
where vmr{l,h} are still applicable if the operands are swapped.
(expand_perm_with_vpdi): Likewise for vpdi.

8 months agos390: Reduce number of patterns where the condition is false anyway
Stefan Schulze Frielinghaus [Thu, 9 Nov 2023 08:30:45 +0000 (09:30 +0100)]
s390: Reduce number of patterns where the condition is false anyway

For patterns which make use of two modes, do not build the cross product
and then exclude illegal combinations via conditions but rather do not
create those in the first place.  Here we are following the idea of the
attribute TOINTVEC/tointvec and introduce TOINT/toint.

gcc/ChangeLog:

* config/s390/s390.md (VX_CONV_INT): Remove iterator.
(gf): Add float mappings.
(TOINT, toint): New attribute.
(*fixuns_trunc<VX_CONV_BFP:mode><VX_CONV_INT:mode>2_z13):
Remove.
(*fixuns_trunc<mode><toint>2_z13): Add.
(*fix_trunc<VX_CONV_BFP:mode><VX_CONV_INT:mode>2_bfp_z13):
Remove.
(*fix_trunc<mode><toint>2_bfp_z13): Add.
(*floatuns<VX_CONV_INT:mode><VX_CONV_BFP:mode>2_z13): Remove.
(*floatuns<toint><mode>2_z13): Add.
* config/s390/vector.md (VX_VEC_CONV_INT): Remove iterator.
(float<VX_VEC_CONV_INT:mode><VX_VEC_CONV_BFP:mode>2): Remove.
(float<tointvec><mode>2): Add.
(floatuns<VX_VEC_CONV_INT:mode><VX_VEC_CONV_BFP:mode>2): Remove.
(floatuns<tointvec><mode>2): Add.
(fix_trunc<VX_VEC_CONV_BFP:mode><VX_VEC_CONV_INT:mode>2):
Remove.
(fix_trunc<mode><tointvec>2): Add.
(fixuns_trunc<VX_VEC_CONV_BFP:mode><VX_VEC_CONV_INT:mode>2):
Remove.
(fixuns_trunc<VX_VEC_CONV_BFP:mode><tointvec>2): Add.

8 months agolibgcc: Add {unsigned ,}__int128 <-> _Decimal{32,64,128} conversion support [PR65833]
Jakub Jelinek [Thu, 9 Nov 2023 08:14:07 +0000 (09:14 +0100)]
libgcc: Add {unsigned ,}__int128 <-> _Decimal{32,64,128} conversion support [PR65833]

The following patch adds the missing
{unsigned ,}__int128 <-> _Decimal{32,64,128}
conversion support into libgcc.a on top of the _BitInt support
(doing it without that would be larger amount of code and I hope all
the targets which support __int128 will eventually support _BitInt,
after all it is a required part of C23) and because it is in libgcc.a
only, it doesn't hurt that much if it is added for some architectures
only in GCC 15.
Initially I thought about doing this on the compiler side, but doing
it on the library side seems to be easier and more -Os friendly.
The tests currently require the bitint effective target; that can be
removed when all the int128 targets support bitint.
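
A hedged sketch of the conversions the new routines make available; the routine
names in the comments are inferred from the file names above, and a target with
both __int128 and decimal float support is assumed:

_Decimal64
ti_to_dd (__int128 x)
{
  return x;                        /* presumably __floattidd */
}

unsigned __int128
td_to_uti (_Decimal128 d)
{
  return (unsigned __int128) d;    /* presumably __fixunstdti */
}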

2023-11-09  Jakub Jelinek  <jakub@redhat.com>

PR libgcc/65833
libgcc/
* config/t-softfp (softfp_bid_list): Add
{U,}TItype <-> _Decimal{32,64,128} conversions.
* soft-fp/floattisd.c: New file.
* soft-fp/floattidd.c: New file.
* soft-fp/floattitd.c: New file.
* soft-fp/floatuntisd.c: New file.
* soft-fp/floatuntidd.c: New file.
* soft-fp/floatuntitd.c: New file.
* soft-fp/fixsdti.c: New file.
* soft-fp/fixddti.c: New file.
* soft-fp/fixtdti.c: New file.
* soft-fp/fixunssdti.c: New file.
* soft-fp/fixunsddti.c: New file.
* soft-fp/fixunstdti.c: New file.
gcc/testsuite/
* gcc.dg/dfp/int128-1.c: New test.
* gcc.dg/dfp/int128-2.c: New test.
* gcc.dg/dfp/int128-3.c: New test.
* gcc.dg/dfp/int128-4.c: New test.

8 months agoattribs: Fix ICE with -Wno-attributes= [PR112339]
Jakub Jelinek [Thu, 9 Nov 2023 08:05:54 +0000 (09:05 +0100)]
attribs: Fix ICE with -Wno-attributes= [PR112339]

The following testcase ICEs, because with -Wno-attributes=foo::no_sanitize
(but generally any other non-gnu namespace and some gnu well known attribute
name within that other namespace) the FEs don't really parse attribute
arguments of such attribute, but lookup_attribute_spec is non-NULL with
NULL handler and such attributes are added to DECL_ATTRIBUTES or
TYPE_ATTRIBUTES and then when e.g. middle-end does lookup_attribute
on a particular attribute and expects the attribute to mean something
and/or have a particular verified arguments, it can crash when seeing
the foreign attribute in there instead.

The following patch fixes that by never adding ignored attributes
to DECL_ATTRIBUTES/TYPE_ATTRIBUTES, previously that was the case just
for attributes in an ignored namespace (where lookup_attribute_spec
returned NULL).  We don't really know anything about those attributes,
so shouldn't pretend we know something about them, especially when
the arguments are error_mark_node or NULL instead of something that
would have been parsed.  And it would be really weird if we normally
ignore, say, the [[clang::unused]] attribute, but when people use
-Wno-attributes=clang::unused we actually treat it as gnu::unused.
All the user asked for is suppress warnings about that attribute being
unknown.

The first hunk is just playing it safe; I'm worried people could use
-Wno-attributes=gnu::
and get various crashes with known GNU attributes not being actually
parsed and recorded (or worse e.g. when we tweak standard attributes
into GNU attributes and we wouldn't add those).
The -Wno-attributes= documentation says that it suppresses warning about
unknown attributes, so I think -Wno-attributes=gnu:: should prevent
warning about say [[gnu::foobarbaz]] attribute, but not about
[[gnu::unused]] because the latter is a known attribute.
The routine would return true for any scoped attribute in the ignored
namespace; with the change it ignores only unknown attributes in an ignored
namespace, and known ones in there will be ignored only if they have
max_length of -2 (e.g. with
-Wno-attributes=gnu:: -Wno-attributes=gnu::foobarbaz).
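
A sketch of the kind of translation unit that used to ICE; the foo:: namespace
follows the description above, but the exact flags (-fsanitize=undefined
-Wno-attributes=foo::no_sanitize) and source are my assumption, not the actual
Wno-attributes-1.c testcase:

/* Needs C23/C++ [[]] attribute syntax; foo:: is a made-up namespace.  */
[[foo::no_sanitize ("undefined")]]
int
f (int x)
{
  return x + 1;
}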

2023-11-09  Jakub Jelinek  <jakub@redhat.com>

PR c/112339
* attribs.cc (attribute_ignored_p): Only return true for
attr_namespace_ignored_p if as is NULL.
(decl_attributes): Never add ignored attributes.

* c-c++-common/ubsan/Wno-attributes-1.c: New test.

8 months agoRISC-V: Fix the illegal operands for the XTheadMemidx extension.
Jin Ma [Thu, 9 Nov 2023 07:40:08 +0000 (15:40 +0800)]
RISC-V: Fix the illegal operands for the XTheadMemidx extension.

The pattern "*extend<SHORT:mode><SUPERQI:mode>2_bitmanip" and
"*zero_extendhi<GPR:mode>2_bitmanip" in bitmanip.md are similar
to the pattern "*th_memidx_bb_extendqi<SUPERQI:mode>2" and
"*th_memidx_bb_zero_extendhi<GPR:mode>2" in thead.md, which will
cause the wrong instruction to be generated and report the
following error in binutils:
Assembler messages:
Error: illegal operands `lb a5,(a0),1,0'

In fact, the correct instruction is "th.lbia a5,(a0),1,0".

gcc/ChangeLog:

* config/riscv/bitmanip.md: Avoid the conflict between
zbb and xtheadmemidx in patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-uindex-zbb.c: New test.

8 months agoFix SIMD clone SLP a bit more
Richard Biener [Wed, 8 Nov 2023 15:32:11 +0000 (16:32 +0100)]
Fix SIMD clone SLP a bit more

The following fixes an omission that mangled the non-SLP and SLP
simd-clone info.

* tree-vect-stmts.cc (vectorizable_simd_clone_call): Record
to the correct simd_clone_info.

8 months agolibstdc++: [_Hashtable] Use RAII type to guard node while constructing value
François Dumont [Mon, 30 Oct 2023 05:39:00 +0000 (06:39 +0100)]
libstdc++: [_Hashtable] Use RAII type to guard node while constructing value

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h
(struct _NodePtrGuard<_HashtableAlloc, _NodePtr>): New.
(_ReuseAllocNode::operator()(_Args&&...)): Use the latter to guard the
allocated node pointer while constructing the value_type instance in place.

8 months agoRISC-V: Fix dynamic LMUL cost model ICE
Juzhe-Zhong [Thu, 9 Nov 2023 02:39:17 +0000 (10:39 +0800)]
RISC-V: Fix dynamic LMUL cost model ICE

When trying to use dynamic LMUL to compile benchmarks,
I noticed a bunch of ICEs.

This patch fixes those ICEs and appends tests.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::preferred_new_lmul_p): Fix ICE.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-ice-3.c: New test.

8 months agolibstdc++: optimize bit iterators assuming normalization [PR110807]
Alexandre Oliva [Thu, 9 Nov 2023 03:01:37 +0000 (00:01 -0300)]
libstdc++: optimize bit iterators assuming normalization [PR110807]

The representation of bit iterators, using a pointer into an array of
words, and an unsigned bit offset into that word, makes for some
optimization challenges: because the compiler doesn't know that the
offset is always in a certain narrow range, beginning at zero and
ending before the word bitwidth, when a function loads an offset that
it hasn't normalized itself, it may fail to derive certain reasonable
conclusions, even to the point of retaining useless calls that elicit
incorrect warnings.

Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
list to a global bit vector variable.  Based on the compile-time
constant length of the list, we decide in _M_insert_range whether to
use the existing storage or to allocate new storage for the vector.
After allocation, we decide in _M_copy_aligned how to copy any
preexisting portions of the vector to the newly-allocated storage.
When copying two or more words, we use __builtin_memmove.

However, because we compute the available room using bit offsets
without range information, even comparing them with constants, we fail
to infer ranges for the preexisting vector depending on word size, and
may thus retain the memmove call despite knowing we've only allocated
one word.

Other parts of the compiler then detect the mismatch between the
constant allocation size and the much larger range that could
theoretically be copied into the newly-allocated storage if we could
reach the call.

Ensuring the compiler is aware of the constraints on the offset range
enables it to do a much better job at optimizing.  Using attribute
assume (_M_offset <= ...) didn't work, because gimple lowered that to
something that vrp could only use to ensure 'this' was non-NULL.
Exposing _M_offset as an automatic variable/gimple register outside
the unevaluated assume operand enabled the optimizer to do its job.

Rather than placing such load-then-assume constructs all over, I
introduced an always-inline member function in bit iterators that does
the job of conveying to the compiler the information that the
assumption is supposed to hold, and various calls throughout functions
pertaining to bit iterators that might not otherwise know that the
offsets have to be in range, so that the compiler no longer needs to
make conservative assumptions that prevent optimizations.
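
Not the libstdc++ code itself, but a generic GNU C sketch of the
load-then-assume idea: expose the offset as a plain value and make its range
explicit, so the optimizer can prove a multi-word copy unreachable when only
one word was allocated:

#include <limits.h>

static inline unsigned int
assume_normalized (unsigned int offset)
{
  /* The offset is always a bit position within a single word.  */
  if (offset >= CHAR_BIT * sizeof (unsigned long))
    __builtin_unreachable ();
  return offset;
}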

With the explicit assumptions, the compiler can correlate the test for
available storage in the vector with the test for how much storage
might need to be copied, and determine that, if we're not asking for
enough room for two or more words, we can omit entirely the code to
copy two or more words, without any runtime overhead whatsoever: no
traces remain of the undefined behavior or of the tests that inform
the compiler about the assumptions that must hold.

for  libstdc++-v3/ChangeLog

PR libstdc++/110807
* include/bits/stl_bvector.h (_Bit_iterator_base): Add
_M_assume_normalized member function.  Call it in _M_bump_up,
_M_bump_down, _M_incr, operator==, operator<=>, operator<, and
operator-.
(_Bit_iterator): Also call it in operator*.
(_Bit_const_iterator): Likewise.

8 months agotestsuite: adjust gomp test for x86 -m32
Alexandre Oliva [Thu, 9 Nov 2023 03:01:35 +0000 (00:01 -0300)]
testsuite: adjust gomp test for x86 -m32

declare-target-3.C expects .quad for entries in offload_var_table, but
the entries are pointer-wide, so 32-bit targets use .long instead.
Accept both.

for  gcc/testsuite/ChangeLog

* g++.dg/gomp/declare-target-3.C: Adjust for 32-bit targets.

8 months agotestsuite: force PIC/PIE off for pr58245-1.C
Alexandre Oliva [Thu, 9 Nov 2023 03:01:32 +0000 (00:01 -0300)]
testsuite: force PIC/PIE off for pr58245-1.C

This test expects a single mention of stack_chk_fail, as part of a
call sequence, but when e.g. PIE is enabled by default, we output
.hidden stack_chk_fail_local, which makes for a count mismatch.

Disable PIC/PIE so as to not depend on the configurable default.

for  gcc/testsuite/ChangeLog

* g++.dg/pr58245-1.C: Disable PIC/PIE.

8 months agoskip debug stmts when assigning locus discriminators
Alexandre Oliva [Thu, 9 Nov 2023 03:01:30 +0000 (00:01 -0300)]
skip debug stmts when assigning locus discriminators

c-c++-common/goacc/kernels-loop-g.c has been failing (compare-debug)
on i686-linux-gnu since r13-3172, because the implementation enabled
debug stmts to cause discriminators to be assigned differently, and
the discriminators are printed in the .gkd dumps that -fcompare-debug
compares.

This patch prevents debug stmts from affecting the discriminators in
nondebug stmts, but enables debug stmts to get discriminators just as
nondebug stmts would if their line numbers match.

I suppose we could arrange for discriminators to be omitted from the
-fcompare-debug dumps, but keeping discriminators in sync is probably
good to avoid other potential sources of divergence between debug and
nondebug.

for  gcc/ChangeLog

* tree-cfg.cc (assign_discriminators): Handle debug stmts.

8 months agoRISC-V: Fix dynamic tests [NFC]
Juzhe-Zhong [Thu, 9 Nov 2023 01:55:37 +0000 (09:55 +0800)]
RISC-V: Fix dynamic tests [NFC]

This patch just adapts the dynamic LMUL tests for the following preparatory patches.

Committed.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-9.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/no-dynamic-lmul-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: Run all tests.

8 months agoDaily bump.
GCC Administrator [Thu, 9 Nov 2023 00:17:25 +0000 (00:17 +0000)]
Daily bump.

8 months agoi386: Apply LRA reload workaround to insns with high registers [PR82524]
Uros Bizjak [Wed, 8 Nov 2023 20:46:26 +0000 (21:46 +0100)]
i386: Apply LRA reload workaround to insns with high registers [PR82524]

LRA is not able to reload a zero_extracted in-out operand with a matched input
operand in the same way as a strict_low_part in-out operand.  The patch
applies the strict_low_part workaround, where we allow LRA to generate
an instruction with a non-matched input operand which is split post reload
into an instruction that inserts the non-matched input operand into the in-out
operand and an instruction that uses the matched operand, to the
zero_extracted in-out operand case as well.

The generated code from the pr82524.c testcase improves from:

movl    %esi, %ecx
movl    %edi, %eax
movsbl  %ch, %esi
addl    %esi, %edx
movb    %dl, %ah

to:
movl    %edi, %eax
movl    %esi, %ecx
movb    %ch, %ah
addb    %dl, %ah

The compiler is now also able to handle non-commutative operations:

movl    %edi, %eax
movl    %esi, %ecx
movb    %ch, %ah
subb    %dl, %ah

and unary operations:

movl    %edi, %eax
movl    %esi, %edx
movb    %dh, %ah
negb    %ah

The patch also robustifies the split condition of the splitters to ensure that
only alternatives with unmatched operands are split.
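
A hedged sketch (not the pr82524.c testcase) of the shape being handled: an
addition whose result is inserted into the high byte of an in-out operand,
i.e. a zero_extract destination with a matched input:

unsigned int
set_high_byte (unsigned int x, unsigned int b, unsigned int c)
{
  /* Insert (high byte of b) + (low byte of c) into bits 8..15 of x.  */
  unsigned char sum = (unsigned char) (b >> 8) + (unsigned char) c;
  return (x & ~0xff00u) | ((unsigned int) sum << 8);
}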

PR target/82524

gcc/ChangeLog:

* config/i386/i386.md (*add<mode>_1_slp):
Split insn only for unmatched operand 0.
(*sub<mode>_1_slp): Ditto.
(*<any_logic:code><mode>_1_slp): Merge pattern from "*and<mode>_1_slp"
and "*<any_logic:code><mode>_1_slp" using any_logic code iterator.
Split insn only for unmatched operand 0.
(*neg<mode>1_slp): Split insn only for unmatched operand 0.
(*one_cmpl<mode>_1_slp): Ditto.
(*ashl<mode>3_1_slp): Ditto.
(*<any_shiftrt:insn><mode>_1_slp): Ditto.
(*<any_rotate:insn><mode>_1_slp): Ditto.
(*addqi_ext<mode>_1): Redefine as define_insn_and_split.  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*<plusminus:insn>qi_ext<mode>_2): Merge pattern from
"*addqi_ext<mode>_2" and "*subqi_ext<mode>_2" using plusminus code
iterator. Redefine as define_insn_and_split.  Add alternative 1
and split insn after reload for unmatched operand 0.
(*subqi_ext<mode>_1): Redefine as define_insn_and_split.  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*<any_logic:code>qi_ext<mode>_0): Merge pattern from
"*andqi_ext<mode>_0" and and "*<any_logic:code>qi_ext<mode>_0" using
any_logic code iterator.
(*<any_logic:code>qi_ext<mode>_1): Merge pattern from
"*andqi_ext<mode>_1" and "*<any_logic:code>qi_ext<mode>_1" using
any_logic code iterator. Redefine as define_insn_and_split.  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*<any_logic:code>qi_ext<mode>_1_cc): Merge pattern from
"*andqi_ext<mode>_1_cc" and "*xorqi_ext<mode>_1_cc" using any_logic
code iterator. Redefine as define_insn_and_split.  Add alternative 1
and split insn after reload for unmatched operand 0.
(*<any_logic:code>qi_ext<mode>_2): Merge pattern from
"*andqi_ext<mode>_2" and "*<any_or:code>qi_ext<mode>_2" using
any_logic code iterator. Redefine as define_insn_and_split.  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*<any_logic:code>qi_ext<mode>_3): Redefine as define_insn_and_split.
Add alternative 1 and split insn after reload for unmatched operand 0.
(*negqi_ext<mode>_1): Rename from "*negqi_ext<mode>_2".  Add
alternative 1 and split insn after reload for unmatched operand 0.
(*one_cmplqi_ext<mode>_1): Ditto.
(*ashlqi_ext<mode>_1): Ditto.
(*<any_shiftrt:insn>qi_ext<mode>_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr78904-1.c (test_sub): New test.
* gcc.target/i386/pr78904-1a.c (test_sub): Ditto.
* gcc.target/i386/pr78904-1b.c (test_sub): Ditto.
* gcc.target/i386/pr78904-2.c (test_sub): Ditto.
* gcc.target/i386/pr78904-2a.c (test_sub): Ditto.
* gcc.target/i386/pr78904-2b.c (test_sub): Ditto.
* gcc.target/i386/pr78952-4.c (test_sub): Ditto.
* gcc.target/i386/pr82524.c: New test.
* gcc.target/i386/pr82524-1.c: New test.
* gcc.target/i386/pr82524-2.c: New test.
* gcc.target/i386/pr82524-3.c: New test.

8 months agoFix SLP of emulated gathers
Richard Biener [Wed, 8 Nov 2023 14:42:16 +0000 (15:42 +0100)]
Fix SLP of emulated gathers

The following fixes an error in the SLP of emulated gathers,
discovered by x86 specific tests when enabling single-lane SLP.

* tree-vect-stmts.cc (vectorizable_load): Adjust offset
vector gathering for SLP of emulated gathers.

8 months agoTLC to vect_check_store_rhs and vect_slp_child_index_for_operand
Richard Biener [Wed, 8 Nov 2023 14:25:51 +0000 (15:25 +0100)]
TLC to vect_check_store_rhs and vect_slp_child_index_for_operand

This prepares us for the SLP of scatters.  We have to tell
vect_slp_child_index_for_operand whether we are dealing with
a scatter/gather stmt so this adds an argument similar to
the one we have for vect_get_operand_map.  This also refactors
vect_check_store_rhs to get the actual rhs and the associated
SLP node instead of leaving that to the caller.

* tree-vectorizer.h (vect_slp_child_index_for_operand):
Add gatherscatter_p argument.
* tree-vect-slp.cc (vect_slp_child_index_for_operand): Likewise.
Pass it on.
* tree-vect-stmts.cc (vect_check_store_rhs): Turn the rhs
argument into an output, also output the SLP node associated
with it.
(vectorizable_simd_clone_call): Adjust.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.

8 months agoFix SLP of masked loads
Richard Biener [Wed, 8 Nov 2023 14:18:21 +0000 (15:18 +0100)]
Fix SLP of masked loads

The following adjusts things to use the correct mask operand for
the SLP of masked loads and gathers.  Test coverage is from
runtime fails of i386 specific AVX512 tests when enabling single-lane
SLP.

* tree-vect-stmts.cc (vectorizable_load): Use the correct
vectorized mask operand.

8 months agoRISC-V: Removed unnecessary sign-extend for vsetvl
Lehua Ding [Wed, 8 Nov 2023 13:17:48 +0000 (21:17 +0800)]
RISC-V: Removed unnecessary sign-extend for vsetvl

This patch tries to combine the two insns below and then further remove
the unnecessary sign_extend operation. This optimization is borrowed
from LLVM (https://godbolt.org/z/4f6v56xej):
  (set (reg:DI 134 [ _1 ])
       (unspec:DI [
               (const_int 19 [0x13])
               (const_int 8 [0x8])
               (const_int 5 [0x5])
               (const_int 2 [0x2]) repeated x2
           ] UNSPEC_VSETVL))
  (set (reg/v:DI 135 [ <retval> ])
          (sign_extend:DI (subreg:SI (reg:DI 134 [ _1 ]) 0)))

The reason we can remove the sign_extend is that currently the vl value
returned by the vsetvl instruction ranges from 0 to 65536 (uint16_t), and
bits 17 to 63 (including 31) are always 0, so there is no change after
sign_extend. Note that for HI and QI modes we cannot do this.
Of course, if the range returned by vsetvl later expands to 32 bits,
then this combine pattern needs to be removed. But that could be
a long time from now.
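
A sketch of where the sign-extension shows up at the source level; this assumes
the __riscv_vsetvl_e8m1 intrinsic and an rv64 target with the V extension, and
is not the committed vsetvl_int.c test:

#include <stddef.h>
#include <riscv_vector.h>

int
get_vl (size_t avl)
{
  /* The size_t result is at most 65536, so truncating to int and
     sign-extending back is a no-op the new pattern can elide.  */
  return __riscv_vsetvl_e8m1 (avl);
}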

gcc/ChangeLog:

* config/riscv/vector.md (*vsetvldi_no_side_effects_si_extend):
New combine pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_int.c: New test.

8 months agoImprove C99 compatibility of gcc.dg/setjmp-7.c test
Florian Weimer [Wed, 8 Nov 2023 10:19:10 +0000 (11:19 +0100)]
Improve C99 compatibility of gcc.dg/setjmp-7.c test

gcc/testsuite/ChangeLog:

* gcc.dg/setjmp-7.c (_setjmp): Declare.

8 months agoLibF7: Tweak IEEE double multiplication.
Georg-Johann Lay [Wed, 8 Nov 2023 11:43:49 +0000 (12:43 +0100)]
LibF7: Tweak IEEE double multiplication.

libgcc/config/avr/libf7/
* libf7-asm.sx (mul_mant) [AVR_HAVE_MUL]: Tweak code.

8 months agoRISC-V: Fix VSETVL VL check condition bug
Juzhe-Zhong [Wed, 8 Nov 2023 11:33:06 +0000 (19:33 +0800)]
RISC-V: Fix VSETVL VL check condition bug

When fixing the induction variable vectorization bug, I noticed an ICE
in the VSETVL pass:

0x178015b rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
        ../../../../gcc/gcc/rtl.cc:770
0x1079cdd rhs_regno(rtx_def const*)
        ../../../../gcc/gcc/rtl.h:1934
0x1dab360 vsetvl_info::parse_insn(rtl_ssa::insn_info*)
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:1070
0x1daa272 vsetvl_info::vsetvl_info(rtl_ssa::insn_info*)
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:746
0x1da5d98 pre_vsetvl::fuse_local_vsetvl_info()
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:2708
0x1da94d9 pass_vsetvl::lazy_vsetvl()
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:3444
0x1da977c pass_vsetvl::execute(function*)
        ../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:3504

Committed as it is obvious.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vl-use-ice.c: New test.

8 months agolibgfortran: Remove empty array descriptor first dimension overwrite [PR112371]
Mikael Morin [Tue, 7 Nov 2023 10:24:04 +0000 (11:24 +0100)]
libgfortran: Remove empty array descriptor first dimension overwrite [PR112371]

Remove the forced overwrite of the first dimension of the result array
descriptor to set it to zero extent, in the function templates for
transformational functions doing an array reduction along a dimension.  This
overwrite, which happened before early returning in case the result array
was empty, was wrong because an array may have a non-zero extent in the
first dimension and still be empty if it has a zero extent in a higher
dimension.  Overwriting the dimension was resulting in wrong array result
upper bound for the first dimension in that case.
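
A small C sketch of the rule the overwrite violated (an illustration, not the
library code): emptiness must be derived from all dimensions, so the first
extent of an empty result may legitimately stay non-zero and must not be
clobbered:

int
result_is_empty (const long *extent, int rank)
{
  /* Empty iff any dimension has zero (or negative) extent.  */
  for (int d = 0; d < rank; d++)
    if (extent[d] <= 0)
      return 1;
  return 0;
}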

The offending piece of code was present in several places, and this removes
them all.  More precisely, there is only one case to fix for logical
reduction functions, and there are three cases for other reduction
functions, corresponding to non-masked reduction, reduction with array mask,
and reduction with scalar mask.  The impacted m4 files are
ifunction_logical.m4 for logical reduction functions, ifunction.m4 for
regular functions and types, ifunction-s.m4 for character minloc and maxloc,
ifunction-s2.m4 for character minval and maxval, and ifindloc1.m4 for
findloc.

PR fortran/112371

libgfortran/ChangeLog:

* m4/ifunction.m4 (START_ARRAY_FUNCTION, START_MASKED_ARRAY_FUNCTION,
SCALAR_ARRAY_FUNCTION): Remove overwrite of the first dimension of the
array descriptor.
* m4/ifunction-s.m4 (START_ARRAY_FUNCTION, START_MASKED_ARRAY_FUNCTION,
SCALAR_ARRAY_FUNCTION): Ditto.
* m4/ifunction-s2.m4 (START_ARRAY_FUNCTION,
START_MASKED_ARRAY_FUNCTION, SCALAR_ARRAY_FUNCTION): Ditto.
* m4/ifunction_logical.m4 (START_ARRAY_FUNCTION): Ditto.
* m4/ifindloc1.m4: Ditto.
* generated/all_l1.c: Regenerate.
* generated/all_l16.c: Regenerate.
* generated/all_l2.c: Regenerate.
* generated/all_l4.c: Regenerate.
* generated/all_l8.c: Regenerate.
* generated/any_l1.c: Regenerate.
* generated/any_l16.c: Regenerate.
* generated/any_l2.c: Regenerate.
* generated/any_l4.c: Regenerate.
* generated/any_l8.c: Regenerate.
* generated/count_16_l.c: Regenerate.
* generated/count_1_l.c: Regenerate.
* generated/count_2_l.c: Regenerate.
* generated/count_4_l.c: Regenerate.
* generated/count_8_l.c: Regenerate.
* generated/findloc1_c10.c: Regenerate.
* generated/findloc1_c16.c: Regenerate.
* generated/findloc1_c17.c: Regenerate.
* generated/findloc1_c4.c: Regenerate.
* generated/findloc1_c8.c: Regenerate.
* generated/findloc1_i1.c: Regenerate.
* generated/findloc1_i16.c: Regenerate.
* generated/findloc1_i2.c: Regenerate.
* generated/findloc1_i4.c: Regenerate.
* generated/findloc1_i8.c: Regenerate.
* generated/findloc1_r10.c: Regenerate.
* generated/findloc1_r16.c: Regenerate.
* generated/findloc1_r17.c: Regenerate.
* generated/findloc1_r4.c: Regenerate.
* generated/findloc1_r8.c: Regenerate.
* generated/findloc1_s1.c: Regenerate.
* generated/findloc1_s4.c: Regenerate.
* generated/iall_i1.c: Regenerate.
* generated/iall_i16.c: Regenerate.
* generated/iall_i2.c: Regenerate.
* generated/iall_i4.c: Regenerate.
* generated/iall_i8.c: Regenerate.
* generated/iany_i1.c: Regenerate.
* generated/iany_i16.c: Regenerate.
* generated/iany_i2.c: Regenerate.
* generated/iany_i4.c: Regenerate.
* generated/iany_i8.c: Regenerate.
* generated/iparity_i1.c: Regenerate.
* generated/iparity_i16.c: Regenerate.
* generated/iparity_i2.c: Regenerate.
* generated/iparity_i4.c: Regenerate.
* generated/iparity_i8.c: Regenerate.
* generated/maxloc1_16_i1.c: Regenerate.
* generated/maxloc1_16_i16.c: Regenerate.
* generated/maxloc1_16_i2.c: Regenerate.
* generated/maxloc1_16_i4.c: Regenerate.
* generated/maxloc1_16_i8.c: Regenerate.
* generated/maxloc1_16_r10.c: Regenerate.
* generated/maxloc1_16_r16.c: Regenerate.
* generated/maxloc1_16_r17.c: Regenerate.
* generated/maxloc1_16_r4.c: Regenerate.
* generated/maxloc1_16_r8.c: Regenerate.
* generated/maxloc1_16_s1.c: Regenerate.
* generated/maxloc1_16_s4.c: Regenerate.
* generated/maxloc1_4_i1.c: Regenerate.
* generated/maxloc1_4_i16.c: Regenerate.
* generated/maxloc1_4_i2.c: Regenerate.
* generated/maxloc1_4_i4.c: Regenerate.
* generated/maxloc1_4_i8.c: Regenerate.
* generated/maxloc1_4_r10.c: Regenerate.
* generated/maxloc1_4_r16.c: Regenerate.
* generated/maxloc1_4_r17.c: Regenerate.
* generated/maxloc1_4_r4.c: Regenerate.
* generated/maxloc1_4_r8.c: Regenerate.
* generated/maxloc1_4_s1.c: Regenerate.
* generated/maxloc1_4_s4.c: Regenerate.
* generated/maxloc1_8_i1.c: Regenerate.
* generated/maxloc1_8_i16.c: Regenerate.
* generated/maxloc1_8_i2.c: Regenerate.
* generated/maxloc1_8_i4.c: Regenerate.
* generated/maxloc1_8_i8.c: Regenerate.
* generated/maxloc1_8_r10.c: Regenerate.
* generated/maxloc1_8_r16.c: Regenerate.
* generated/maxloc1_8_r17.c: Regenerate.
* generated/maxloc1_8_r4.c: Regenerate.
* generated/maxloc1_8_r8.c: Regenerate.
* generated/maxloc1_8_s1.c: Regenerate.
* generated/maxloc1_8_s4.c: Regenerate.
* generated/maxval1_s1.c: Regenerate.
* generated/maxval1_s4.c: Regenerate.
* generated/maxval_i1.c: Regenerate.
* generated/maxval_i16.c: Regenerate.
* generated/maxval_i2.c: Regenerate.
* generated/maxval_i4.c: Regenerate.
* generated/maxval_i8.c: Regenerate.
* generated/maxval_r10.c: Regenerate.
* generated/maxval_r16.c: Regenerate.
* generated/maxval_r17.c: Regenerate.
* generated/maxval_r4.c: Regenerate.
* generated/maxval_r8.c: Regenerate.
* generated/minloc1_16_i1.c: Regenerate.
* generated/minloc1_16_i16.c: Regenerate.
* generated/minloc1_16_i2.c: Regenerate.
* generated/minloc1_16_i4.c: Regenerate.
* generated/minloc1_16_i8.c: Regenerate.
* generated/minloc1_16_r10.c: Regenerate.
* generated/minloc1_16_r16.c: Regenerate.
* generated/minloc1_16_r17.c: Regenerate.
* generated/minloc1_16_r4.c: Regenerate.
* generated/minloc1_16_r8.c: Regenerate.
* generated/minloc1_16_s1.c: Regenerate.
* generated/minloc1_16_s4.c: Regenerate.
* generated/minloc1_4_i1.c: Regenerate.
* generated/minloc1_4_i16.c: Regenerate.
* generated/minloc1_4_i2.c: Regenerate.
* generated/minloc1_4_i4.c: Regenerate.
* generated/minloc1_4_i8.c: Regenerate.
* generated/minloc1_4_r10.c: Regenerate.
* generated/minloc1_4_r16.c: Regenerate.
* generated/minloc1_4_r17.c: Regenerate.
* generated/minloc1_4_r4.c: Regenerate.
* generated/minloc1_4_r8.c: Regenerate.
* generated/minloc1_4_s1.c: Regenerate.
* generated/minloc1_4_s4.c: Regenerate.
* generated/minloc1_8_i1.c: Regenerate.
* generated/minloc1_8_i16.c: Regenerate.
* generated/minloc1_8_i2.c: Regenerate.
* generated/minloc1_8_i4.c: Regenerate.
* generated/minloc1_8_i8.c: Regenerate.
* generated/minloc1_8_r10.c: Regenerate.
* generated/minloc1_8_r16.c: Regenerate.
* generated/minloc1_8_r17.c: Regenerate.
* generated/minloc1_8_r4.c: Regenerate.
* generated/minloc1_8_r8.c: Regenerate.
* generated/minloc1_8_s1.c: Regenerate.
* generated/minloc1_8_s4.c: Regenerate.
* generated/minval1_s1.c: Regenerate.
* generated/minval1_s4.c: Regenerate.
* generated/minval_i1.c: Regenerate.
* generated/minval_i16.c: Regenerate.
* generated/minval_i2.c: Regenerate.
* generated/minval_i4.c: Regenerate.
* generated/minval_i8.c: Regenerate.
* generated/minval_r10.c: Regenerate.
* generated/minval_r16.c: Regenerate.
* generated/minval_r17.c: Regenerate.
* generated/minval_r4.c: Regenerate.
* generated/minval_r8.c: Regenerate.
* generated/norm2_r10.c: Regenerate.
* generated/norm2_r16.c: Regenerate.
* generated/norm2_r17.c: Regenerate.
* generated/norm2_r4.c: Regenerate.
* generated/norm2_r8.c: Regenerate.
* generated/parity_l1.c: Regenerate.
* generated/parity_l16.c: Regenerate.
* generated/parity_l2.c: Regenerate.
* generated/parity_l4.c: Regenerate.
* generated/parity_l8.c: Regenerate.
* generated/product_c10.c: Regenerate.
* generated/product_c16.c: Regenerate.
* generated/product_c17.c: Regenerate.
* generated/product_c4.c: Regenerate.
* generated/product_c8.c: Regenerate.
* generated/product_i1.c: Regenerate.
* generated/product_i16.c: Regenerate.
* generated/product_i2.c: Regenerate.
* generated/product_i4.c: Regenerate.
* generated/product_i8.c: Regenerate.
* generated/product_r10.c: Regenerate.
* generated/product_r16.c: Regenerate.
* generated/product_r17.c: Regenerate.
* generated/product_r4.c: Regenerate.
* generated/product_r8.c: Regenerate.
* generated/sum_c10.c: Regenerate.
* generated/sum_c16.c: Regenerate.
* generated/sum_c17.c: Regenerate.
* generated/sum_c4.c: Regenerate.
* generated/sum_c8.c: Regenerate.
* generated/sum_i1.c: Regenerate.
* generated/sum_i16.c: Regenerate.
* generated/sum_i2.c: Regenerate.
* generated/sum_i4.c: Regenerate.
* generated/sum_i8.c: Regenerate.
* generated/sum_r10.c: Regenerate.
* generated/sum_r16.c: Regenerate.
* generated/sum_r17.c: Regenerate.
* generated/sum_r4.c: Regenerate.
* generated/sum_r8.c: Regenerate.

gcc/testsuite/ChangeLog:

* gfortran.dg/bound_11.f90: New test.

8 months agolibgfortran: Remove early return if extent is zero [PR112371]
Mikael Morin [Tue, 7 Nov 2023 10:24:03 +0000 (11:24 +0100)]
libgfortran: Remove early return if extent is zero [PR112371]

Remove the early return present in function templates for transformational
functions doing a (masked) reduction of an array along a dimension.
This early return, which triggered if the extent in the reduction dimension
was zero, was wrong because even if the reduction operation degenerates to
a constant value in that case, one has to loop anyway along the other
dimensions to initialize every element of the resulting array with that
constant value.  The case of negative extent (not sure whether it may happen
in practice) which was also early returning, is handled by clamping to zero.

The offending piece of code was present in several places, and this removes
them all.  Namely, the impacted m4 files are ifunction.m4 for regular
functions and types, ifunction-s.m4 for character minloc and maxloc, and
ifunction-s2.m4 for character minval and maxval.

PR fortran/112371

libgfortran/ChangeLog:

* m4/ifunction.m4 (START_MASKED_ARRAY_FUNCTION): Remove early return if
extent is zero or less, and clamp negative value to zero.
* m4/ifunction-s.m4 (START_MASKED_ARRAY_FUNCTION): Ditto.
* m4/ifunction-s2.m4 (START_MASKED_ARRAY_FUNCTION): Ditto.
* generated/iall_i1.c: Regenerate.
* generated/iall_i16.c: Regenerate.
* generated/iall_i2.c: Regenerate.
* generated/iall_i4.c: Regenerate.
* generated/iall_i8.c: Regenerate.
* generated/iany_i1.c: Regenerate.
* generated/iany_i16.c: Regenerate.
* generated/iany_i2.c: Regenerate.
* generated/iany_i4.c: Regenerate.
* generated/iany_i8.c: Regenerate.
* generated/iparity_i1.c: Regenerate.
* generated/iparity_i16.c: Regenerate.
* generated/iparity_i2.c: Regenerate.
* generated/iparity_i4.c: Regenerate.
* generated/iparity_i8.c: Regenerate.
* generated/maxloc1_16_i1.c: Regenerate.
* generated/maxloc1_16_i16.c: Regenerate.
* generated/maxloc1_16_i2.c: Regenerate.
* generated/maxloc1_16_i4.c: Regenerate.
* generated/maxloc1_16_i8.c: Regenerate.
* generated/maxloc1_16_r10.c: Regenerate.
* generated/maxloc1_16_r16.c: Regenerate.
* generated/maxloc1_16_r17.c: Regenerate.
* generated/maxloc1_16_r4.c: Regenerate.
* generated/maxloc1_16_r8.c: Regenerate.
* generated/maxloc1_16_s1.c: Regenerate.
* generated/maxloc1_16_s4.c: Regenerate.
* generated/maxloc1_4_i1.c: Regenerate.
* generated/maxloc1_4_i16.c: Regenerate.
* generated/maxloc1_4_i2.c: Regenerate.
* generated/maxloc1_4_i4.c: Regenerate.
* generated/maxloc1_4_i8.c: Regenerate.
* generated/maxloc1_4_r10.c: Regenerate.
* generated/maxloc1_4_r16.c: Regenerate.
* generated/maxloc1_4_r17.c: Regenerate.
* generated/maxloc1_4_r4.c: Regenerate.
* generated/maxloc1_4_r8.c: Regenerate.
* generated/maxloc1_4_s1.c: Regenerate.
* generated/maxloc1_4_s4.c: Regenerate.
* generated/maxloc1_8_i1.c: Regenerate.
* generated/maxloc1_8_i16.c: Regenerate.
* generated/maxloc1_8_i2.c: Regenerate.
* generated/maxloc1_8_i4.c: Regenerate.
* generated/maxloc1_8_i8.c: Regenerate.
* generated/maxloc1_8_r10.c: Regenerate.
* generated/maxloc1_8_r16.c: Regenerate.
* generated/maxloc1_8_r17.c: Regenerate.
* generated/maxloc1_8_r4.c: Regenerate.
* generated/maxloc1_8_r8.c: Regenerate.
* generated/maxloc1_8_s1.c: Regenerate.
* generated/maxloc1_8_s4.c: Regenerate.
* generated/maxval1_s1.c: Regenerate.
* generated/maxval1_s4.c: Regenerate.
* generated/maxval_i1.c: Regenerate.
* generated/maxval_i16.c: Regenerate.
* generated/maxval_i2.c: Regenerate.
* generated/maxval_i4.c: Regenerate.
* generated/maxval_i8.c: Regenerate.
* generated/maxval_r10.c: Regenerate.
* generated/maxval_r16.c: Regenerate.
* generated/maxval_r17.c: Regenerate.
* generated/maxval_r4.c: Regenerate.
* generated/maxval_r8.c: Regenerate.
* generated/minloc1_16_i1.c: Regenerate.
* generated/minloc1_16_i16.c: Regenerate.
* generated/minloc1_16_i2.c: Regenerate.
* generated/minloc1_16_i4.c: Regenerate.
* generated/minloc1_16_i8.c: Regenerate.
* generated/minloc1_16_r10.c: Regenerate.
* generated/minloc1_16_r16.c: Regenerate.
* generated/minloc1_16_r17.c: Regenerate.
* generated/minloc1_16_r4.c: Regenerate.
* generated/minloc1_16_r8.c: Regenerate.
* generated/minloc1_16_s1.c: Regenerate.
* generated/minloc1_16_s4.c: Regenerate.
* generated/minloc1_4_i1.c: Regenerate.
* generated/minloc1_4_i16.c: Regenerate.
* generated/minloc1_4_i2.c: Regenerate.
* generated/minloc1_4_i4.c: Regenerate.
* generated/minloc1_4_i8.c: Regenerate.
* generated/minloc1_4_r10.c: Regenerate.
* generated/minloc1_4_r16.c: Regenerate.
* generated/minloc1_4_r17.c: Regenerate.
* generated/minloc1_4_r4.c: Regenerate.
* generated/minloc1_4_r8.c: Regenerate.
* generated/minloc1_4_s1.c: Regenerate.
* generated/minloc1_4_s4.c: Regenerate.
* generated/minloc1_8_i1.c: Regenerate.
* generated/minloc1_8_i16.c: Regenerate.
* generated/minloc1_8_i2.c: Regenerate.
* generated/minloc1_8_i4.c: Regenerate.
* generated/minloc1_8_i8.c: Regenerate.
* generated/minloc1_8_r10.c: Regenerate.
* generated/minloc1_8_r16.c: Regenerate.
* generated/minloc1_8_r17.c: Regenerate.
* generated/minloc1_8_r4.c: Regenerate.
* generated/minloc1_8_r8.c: Regenerate.
* generated/minloc1_8_s1.c: Regenerate.
* generated/minloc1_8_s4.c: Regenerate.
* generated/minval1_s1.c: Regenerate.
* generated/minval1_s4.c: Regenerate.
* generated/minval_i1.c: Regenerate.
* generated/minval_i16.c: Regenerate.
* generated/minval_i2.c: Regenerate.
* generated/minval_i4.c: Regenerate.
* generated/minval_i8.c: Regenerate.
* generated/minval_r10.c: Regenerate.
* generated/minval_r16.c: Regenerate.
* generated/minval_r17.c: Regenerate.
* generated/minval_r4.c: Regenerate.
* generated/minval_r8.c: Regenerate.
* generated/product_c10.c: Regenerate.
* generated/product_c16.c: Regenerate.
* generated/product_c17.c: Regenerate.
* generated/product_c4.c: Regenerate.
* generated/product_c8.c: Regenerate.
* generated/product_i1.c: Regenerate.
* generated/product_i16.c: Regenerate.
* generated/product_i2.c: Regenerate.
* generated/product_i4.c: Regenerate.
* generated/product_i8.c: Regenerate.
* generated/product_r10.c: Regenerate.
* generated/product_r16.c: Regenerate.
* generated/product_r17.c: Regenerate.
* generated/product_r4.c: Regenerate.
* generated/product_r8.c: Regenerate.
* generated/sum_c10.c: Regenerate.
* generated/sum_c16.c: Regenerate.
* generated/sum_c17.c: Regenerate.
* generated/sum_c4.c: Regenerate.
* generated/sum_c8.c: Regenerate.
* generated/sum_i1.c: Regenerate.
* generated/sum_i16.c: Regenerate.
* generated/sum_i2.c: Regenerate.
* generated/sum_i4.c: Regenerate.
* generated/sum_i8.c: Regenerate.
* generated/sum_r10.c: Regenerate.
* generated/sum_r16.c: Regenerate.
* generated/sum_r17.c: Regenerate.
* generated/sum_r4.c: Regenerate.
* generated/sum_r8.c: Regenerate.

gcc/testsuite/ChangeLog:

* gfortran.dg/bound_10.f90: New test.

8 months agolibgfortran: Don't skip allocation if size is zero [PR112412]
Mikael Morin [Tue, 7 Nov 2023 10:24:02 +0000 (11:24 +0100)]
libgfortran: Don't skip allocation if size is zero [PR112412]

In the function template of transformational functions doing a reduction
of an array along one dimension, if the passed in result array was
unallocated and the calculated allocation size was zero (this is the case
of empty result arrays), an early return used to skip the allocation.  This
change moves the allocation before the early return, so that empty result
arrays are not seen as unallocated.  This is possible because zero size is
explicitly supported by the allocation function.

The offending code is present in several places, and this updates them all.
More precisely, there is one place in the template for logical reductions,
and there are two places in the templates corresponding to masked reductions
with respectively array mask and scalar mask.  Templates for unmasked
reductions, which already allocate before returning, are not affected, but
unmasked reductions are checked nevertheless in the testcase.  The affected
m4 files are ifunction.m4 for regular functions and types, ifunction-s.m4
for character minloc and maxloc, ifunction-s2.m4 for character minval and
maxval, and ifunction_logical for logical reductions.

PR fortran/112412

libgfortran/ChangeLog:

* m4/ifunction.m4 (START_MASKED_ARRAY_FUNCTION, SCALAR_ARRAY_FUNCTION):
Don't skip allocation if the allocation size is zero.
* m4/ifunction-s.m4 (START_MASKED_ARRAY_FUNCTION,
SCALAR_ARRAY_FUNCTION): Ditto.
* m4/ifunction-s2.m4 (START_MASKED_ARRAY_FUNCTION,
SCALAR_ARRAY_FUNCTION): Ditto.
* m4/ifunction_logical.m4 (START_ARRAY_FUNCTION): Ditto.
* generated/all_l1.c: Regenerate.
* generated/all_l16.c: Regenerate.
* generated/all_l2.c: Regenerate.
* generated/all_l4.c: Regenerate.
* generated/all_l8.c: Regenerate.
* generated/any_l1.c: Regenerate.
* generated/any_l16.c: Regenerate.
* generated/any_l2.c: Regenerate.
* generated/any_l4.c: Regenerate.
* generated/any_l8.c: Regenerate.
* generated/count_16_l.c: Regenerate.
* generated/count_1_l.c: Regenerate.
* generated/count_2_l.c: Regenerate.
* generated/count_4_l.c: Regenerate.
* generated/count_8_l.c: Regenerate.
* generated/iall_i1.c: Regenerate.
* generated/iall_i16.c: Regenerate.
* generated/iall_i2.c: Regenerate.
* generated/iall_i4.c: Regenerate.
* generated/iall_i8.c: Regenerate.
* generated/iany_i1.c: Regenerate.
* generated/iany_i16.c: Regenerate.
* generated/iany_i2.c: Regenerate.
* generated/iany_i4.c: Regenerate.
* generated/iany_i8.c: Regenerate.
* generated/iparity_i1.c: Regenerate.
* generated/iparity_i16.c: Regenerate.
* generated/iparity_i2.c: Regenerate.
* generated/iparity_i4.c: Regenerate.
* generated/iparity_i8.c: Regenerate.
* generated/maxloc1_16_i1.c: Regenerate.
* generated/maxloc1_16_i16.c: Regenerate.
* generated/maxloc1_16_i2.c: Regenerate.
* generated/maxloc1_16_i4.c: Regenerate.
* generated/maxloc1_16_i8.c: Regenerate.
* generated/maxloc1_16_r10.c: Regenerate.
* generated/maxloc1_16_r16.c: Regenerate.
* generated/maxloc1_16_r17.c: Regenerate.
* generated/maxloc1_16_r4.c: Regenerate.
* generated/maxloc1_16_r8.c: Regenerate.
* generated/maxloc1_16_s1.c: Regenerate.
* generated/maxloc1_16_s4.c: Regenerate.
* generated/maxloc1_4_i1.c: Regenerate.
* generated/maxloc1_4_i16.c: Regenerate.
* generated/maxloc1_4_i2.c: Regenerate.
* generated/maxloc1_4_i4.c: Regenerate.
* generated/maxloc1_4_i8.c: Regenerate.
* generated/maxloc1_4_r10.c: Regenerate.
* generated/maxloc1_4_r16.c: Regenerate.
* generated/maxloc1_4_r17.c: Regenerate.
* generated/maxloc1_4_r4.c: Regenerate.
* generated/maxloc1_4_r8.c: Regenerate.
* generated/maxloc1_4_s1.c: Regenerate.
* generated/maxloc1_4_s4.c: Regenerate.
* generated/maxloc1_8_i1.c: Regenerate.
* generated/maxloc1_8_i16.c: Regenerate.
* generated/maxloc1_8_i2.c: Regenerate.
* generated/maxloc1_8_i4.c: Regenerate.
* generated/maxloc1_8_i8.c: Regenerate.
* generated/maxloc1_8_r10.c: Regenerate.
* generated/maxloc1_8_r16.c: Regenerate.
* generated/maxloc1_8_r17.c: Regenerate.
* generated/maxloc1_8_r4.c: Regenerate.
* generated/maxloc1_8_r8.c: Regenerate.
* generated/maxloc1_8_s1.c: Regenerate.
* generated/maxloc1_8_s4.c: Regenerate.
* generated/maxval1_s1.c: Regenerate.
* generated/maxval1_s4.c: Regenerate.
* generated/maxval_i1.c: Regenerate.
* generated/maxval_i16.c: Regenerate.
* generated/maxval_i2.c: Regenerate.
* generated/maxval_i4.c: Regenerate.
* generated/maxval_i8.c: Regenerate.
* generated/maxval_r10.c: Regenerate.
* generated/maxval_r16.c: Regenerate.
* generated/maxval_r17.c: Regenerate.
* generated/maxval_r4.c: Regenerate.
* generated/maxval_r8.c: Regenerate.
* generated/minloc1_16_i1.c: Regenerate.
* generated/minloc1_16_i16.c: Regenerate.
* generated/minloc1_16_i2.c: Regenerate.
* generated/minloc1_16_i4.c: Regenerate.
* generated/minloc1_16_i8.c: Regenerate.
* generated/minloc1_16_r10.c: Regenerate.
* generated/minloc1_16_r16.c: Regenerate.
* generated/minloc1_16_r17.c: Regenerate.
* generated/minloc1_16_r4.c: Regenerate.
* generated/minloc1_16_r8.c: Regenerate.
* generated/minloc1_16_s1.c: Regenerate.
* generated/minloc1_16_s4.c: Regenerate.
* generated/minloc1_4_i1.c: Regenerate.
* generated/minloc1_4_i16.c: Regenerate.
* generated/minloc1_4_i2.c: Regenerate.
* generated/minloc1_4_i4.c: Regenerate.
* generated/minloc1_4_i8.c: Regenerate.
* generated/minloc1_4_r10.c: Regenerate.
* generated/minloc1_4_r16.c: Regenerate.
* generated/minloc1_4_r17.c: Regenerate.
* generated/minloc1_4_r4.c: Regenerate.
* generated/minloc1_4_r8.c: Regenerate.
* generated/minloc1_4_s1.c: Regenerate.
* generated/minloc1_4_s4.c: Regenerate.
* generated/minloc1_8_i1.c: Regenerate.
* generated/minloc1_8_i16.c: Regenerate.
* generated/minloc1_8_i2.c: Regenerate.
* generated/minloc1_8_i4.c: Regenerate.
* generated/minloc1_8_i8.c: Regenerate.
* generated/minloc1_8_r10.c: Regenerate.
* generated/minloc1_8_r16.c: Regenerate.
* generated/minloc1_8_r17.c: Regenerate.
* generated/minloc1_8_r4.c: Regenerate.
* generated/minloc1_8_r8.c: Regenerate.
* generated/minloc1_8_s1.c: Regenerate.
* generated/minloc1_8_s4.c: Regenerate.
* generated/minval1_s1.c: Regenerate.
* generated/minval1_s4.c: Regenerate.
* generated/minval_i1.c: Regenerate.
* generated/minval_i16.c: Regenerate.
* generated/minval_i2.c: Regenerate.
* generated/minval_i4.c: Regenerate.
* generated/minval_i8.c: Regenerate.
* generated/minval_r10.c: Regenerate.
* generated/minval_r16.c: Regenerate.
* generated/minval_r17.c: Regenerate.
* generated/minval_r4.c: Regenerate.
* generated/minval_r8.c: Regenerate.
* generated/product_c10.c: Regenerate.
* generated/product_c16.c: Regenerate.
* generated/product_c17.c: Regenerate.
* generated/product_c4.c: Regenerate.
* generated/product_c8.c: Regenerate.
* generated/product_i1.c: Regenerate.
* generated/product_i16.c: Regenerate.
* generated/product_i2.c: Regenerate.
* generated/product_i4.c: Regenerate.
* generated/product_i8.c: Regenerate.
* generated/product_r10.c: Regenerate.
* generated/product_r16.c: Regenerate.
* generated/product_r17.c: Regenerate.
* generated/product_r4.c: Regenerate.
* generated/product_r8.c: Regenerate.
* generated/sum_c10.c: Regenerate.
* generated/sum_c16.c: Regenerate.
* generated/sum_c17.c: Regenerate.
* generated/sum_c4.c: Regenerate.
* generated/sum_c8.c: Regenerate.
* generated/sum_i1.c: Regenerate.
* generated/sum_i16.c: Regenerate.
* generated/sum_i2.c: Regenerate.
* generated/sum_i4.c: Regenerate.
* generated/sum_i8.c: Regenerate.
* generated/sum_r10.c: Regenerate.
* generated/sum_r16.c: Regenerate.
* generated/sum_r17.c: Regenerate.
* generated/sum_r4.c: Regenerate.
* generated/sum_r8.c: Regenerate.

gcc/testsuite/ChangeLog:

* gfortran.dg/allocated_4.f90: New test.

8 months agoRISC-V: Eliminate unused parameter warning.
xuli [Wed, 8 Nov 2023 08:46:02 +0000 (08:46 +0000)]
RISC-V: Eliminate unused parameter warning.

The parameter orig_fndecl is not used; use an anonymous parameter instead.

../.././gcc/gcc/config/riscv/riscv-c.cc: In function ‘bool riscv_check_builtin_call(location_t, vec<unsigned int>, tree, tree, unsigned int, tree_node**)’:
../.././gcc/gcc/config/riscv/riscv-c.cc:207:11: warning: unused parameter ‘orig_fndecl’ [-Wunused-parameter]
      tree orig_fndecl, unsigned int nargs, tree *args)
           ^~~~~~~~~~~

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_check_builtin_call): Eliminate warning.

8 months ago[i386] APX: Fix ICE due to movti postreload splitter [PR112394]
Hongyu Wang [Tue, 7 Nov 2023 02:02:53 +0000 (10:02 +0800)]
[i386] APX: Fix ICE due to movti postreload splitter [PR112394]

When APX EGPR is enabled, the TImode move pattern *movti_internal allows a
move between a gpr and an sse reg using the constraint pair ("r","Yd"). A
post-reload splitter then transforms such a move into vec_extractv2di, while under
-msse4.1 -mno-avx EGPR is not allowed for its enabled alternative, which
caused an ICE because the insn does not match the constraint. To prevent such an ICE,
we need to adjust the constraint corresponding to "Yd". Add a new
constraint "jc" to disable EGPR under -mno-avx.

gcc/ChangeLog:

PR target/112394
* config/i386/constraints.md (jc): New constraint that prohibits
EGPR on -mno-avx.
* config/i386/i386.md (*movdi_internal): Change r constraint
corresponds to Yd.
(*movti_internal): Likewise.

gcc/testsuite/ChangeLog:

PR target/112394
* gcc.target/i386/pr112394.c: New test.

8 months agotest: Fix bb-slp-33.c for RVV
Juzhe-Zhong [Tue, 7 Nov 2023 15:13:15 +0000 (23:13 +0800)]
test: Fix bb-slp-33.c for RVV

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-33.c: Rewrite the condition.
