gcc.gnu.org Git - gcc.git/log

Simplify abs (copysign (x, y))

The following adds simplification of abs (copysign (x, y)) to abs (x).

* match.pd (abs (copysign (x, y)) -> abs (x)): New pattern.

* gcc.dg/fold-abs-6.c: New testcase.
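
The identity behind this pattern can be illustrated in plain C. This is a minimal sketch and not the contents of fold-abs-6.c; the function names are made up for the illustration. copysign only changes the sign of x, and fabs discards the sign again, so both forms should agree for every non-NaN x.

```c
#include <assert.h>
#include <math.h>

/* Left-hand side of the simplification: the sign taken from y is
   discarded again by fabs.  */
double abs_of_copysign(double x, double y)
{
    return fabs(copysign(x, y));
}

/* Right-hand side: what the expression is expected to simplify to.  */
double abs_only(double x)
{
    return fabs(x);
}

/* Returns nonzero when both forms agree.  NaN never compares equal,
   so the check is only meaningful for non-NaN x.  */
int forms_agree(double x, double y)
{
    assert(!isnan(x));
    return abs_of_copysign(x, y) == abs_only(x);
}
```

For example, forms_agree (-3.5, 1.0) and forms_agree (2.0, -0.0) both return nonzero.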

Harden scan patterns with a bit of scripting:

$ egrep -r 'scan-assembler(|-not|-times) "[[:alnum:].]{1,7}"' riscv
$ egrep -rl 'scan-assembler(|-not|-times) "[[:alnum:].]{1,7}"' riscv > files
$ cat edcmds
g/\(scan-assembler\(\|-not\|-times\) \+\)"\([[:alnum:]]\{1,5\}\)\.\([[:alpha:].]\{1,3\}\)"/s//\1{\\m\3\\.\4\\M}/
g/\(scan-assembler\(\|-not\|-times\) \+\)"\([[:alnum:]]\{1,7\}\)"/s//\1{\\m\3}/
w
q
$ sed 's/.*/ed & < edcmds/' < files > tmp
$ source tmp
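
Why the word anchors matter can be sketched in C. Tcl's \m and \M match only at the start and end of a word; the helper below approximates that by hand. It is an illustration only, not part of the testsuite, and the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if C can be part of a word (letters, digits, underscore).  */
static bool is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Plain substring search, as done by an unanchored pattern "and".  */
bool matches_substring(const char *line, const char *mnemonic)
{
    return strstr(line, mnemonic) != NULL;
}

/* Search for MNEMONIC as a whole word, approximating {\mand\M}.  */
bool matches_whole_word(const char *line, const char *mnemonic)
{
    size_t len = strlen(mnemonic);
    assert(len > 0);
    for (const char *p = strstr(line, mnemonic); p != NULL;
         p = strstr(p + 1, mnemonic)) {
        bool starts_word = (p == line) || !is_word_char(p[-1]);
        bool ends_word = !is_word_char(p[len]);
        if (starts_word && ends_word)
            return true;
    }
    return false;
}
```

With the line "andi a0,a0,1", matches_substring finds "and" although the insn is andi, while matches_whole_word does not.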

gcc/testsuite/
* gcc.target/riscv/shift-shift-1.c: Avoid spurious pattern matches.
* gcc.target/riscv/shift-shift-3.c: Likewise.
* gcc.target/riscv/zba-shNadd-01.c: Likewise.
* gcc.target/riscv/zba-shNadd-02.c: Likewise.
* gcc.target/riscv/zbb-andn-orn-xnor-01.c: Likewise.
* gcc.target/riscv/zbb-andn-orn-xnor-02.c: Likewise.
* gcc.target/riscv/zbb-min-max.c: Likewise.
* gcc.target/riscv/zero-extend-1.c: Likewise.
* gcc.target/riscv/zero-extend-2.c: Likewise.
* gcc.target/riscv/zero-extend-3.c: Likewise.
* gcc.target/riscv/zero-extend-4.c: Likewise.
* gcc.target/riscv/zero-extend-5.c: Likewise.
* gcc.target/riscv/_Float16-soft-2.c: Likewise.
* gcc.target/riscv/_Float16-soft-3.c: Likewise.
* gcc.target/riscv/_Float16-zfh-1.c: Likewise.
* gcc.target/riscv/_Float16-zfh-2.c: Likewise.
* gcc.target/riscv/_Float16-zfh-3.c: Likewise.
* gcc.target/riscv/and-extend-1.c: Likewise.
* gcc.target/riscv/and-extend-2.c: Likewise.
* gcc.target/riscv/pr108987.c: Likewise.
* gcc.target/riscv/ret-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/align-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/align-2.c: Likewise.
* gcc.target/riscv/zba-shNadd-04.c: Likewise.
* gcc.target/riscv/zba-shNadd-07.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-02.c: Likewise.
* gcc.target/riscv/zbbw.c: Likewise.
* gcc.target/riscv/zbc32.c: Likewise.
* gcc.target/riscv/zbc64.c: Likewise.
* gcc.target/riscv/zbkb32.c: Likewise.
* gcc.target/riscv/zbkb64.c: Likewise.
* gcc.target/riscv/zbkc32.c: Likewise.
* gcc.target/riscv/zbkc64.c: Likewise.
* gcc.target/riscv/zbkx32.c: Likewise.
* gcc.target/riscv/zbkx64.c: Likewise.
* gcc.target/riscv/zfa-fleq-fltq.c: Likewise.
* gcc.target/riscv/zfa-fli-zfh.c: Likewise.
* gcc.target/riscv/zfa-fli.c: Likewise.
* gcc.target/riscv/zknd64.c: Likewise.
* gcc.target/riscv/zksed32.c: Likewise.
* gcc.target/riscv/zksed64.c: Likewise.
* gcc.target/riscv/zksh32.c: Likewise.
* gcc.target/riscv/zksh64.c: Likewise.
* gcc.target/riscv/_Float16-soft-1.c: Likewise.
* gcc.target/riscv/_Float16-zfhmin-1.c: Likewise.
* gcc.target/riscv/_Float16-zfhmin-2.c: Likewise.
* gcc.target/riscv/_Float16-zfhmin-3.c: Likewise.
* gcc.target/riscv/_Float16-zhinxmin-1.c: Likewise.
* gcc.target/riscv/_Float16-zhinxmin-2.c: Likewise.
* gcc.target/riscv/_Float16-zhinxmin-3.c: Likewise.
* gcc.target/riscv/fle-ieee.c: Likewise.
* gcc.target/riscv/fle-snan.c: Likewise.
* gcc.target/riscv/flef-ieee.c: Likewise.
* gcc.target/riscv/flef-snan.c: Likewise.
* gcc.target/riscv/flt-ieee.c: Likewise.
* gcc.target/riscv/flt-snan.c: Likewise.
* gcc.target/riscv/fltf-ieee.c: Likewise.
* gcc.target/riscv/fltf-snan.c: Likewise.
* gcc.target/riscv/interrupt-1.c: Likewise.
* gcc.target/riscv/interrupt-mmode.c: Likewise.
* gcc.target/riscv/interrupt-smode.c: Likewise.
* gcc.target/riscv/interrupt-umode.c: Likewise.
* gcc.target/riscv/pr106888.c: Likewise.
* gcc.target/riscv/pr89835.c: Likewise.
* gcc.target/riscv/shift-and-1.c: Likewise.
* gcc.target/riscv/shift-and-2.c: Likewise.
* gcc.target/riscv/shift-shift-2.c: Likewise.
* gcc.target/riscv/shift-shift-4.c: Likewise.
* gcc.target/riscv/shift-shift-5.c: Likewise.
* gcc.target/riscv/shorten-memrefs-7.c: Likewise.
* gcc.target/riscv/sign-extend.c: Likewise.
* gcc.target/riscv/switch-qi.c: Likewise.
* gcc.target/riscv/switch-si.c: Likewise.
* gcc.target/riscv/xtheadbb-ext-1.c: Likewise.
* gcc.target/riscv/xtheadbb-ext.c: Likewise.
* gcc.target/riscv/xtheadbb-extu-1.c: Likewise.
* gcc.target/riscv/xtheadbb-extu.c: Likewise.
* gcc.target/riscv/xtheadbb-strlen.c: Likewise.
* gcc.target/riscv/xtheadbs-tst.c: Likewise.
* gcc.target/riscv/xtheadfmv-fmv.c: Likewise.
* gcc.target/riscv/xventanacondops-primitiveSemantics.c: Likewise.
* gcc.target/riscv/zba-adduw.c: Likewise.
* gcc.target/riscv/zba-shadd.c: Likewise.
* gcc.target/riscv/zba-slliuw.c: Likewise.
* gcc.target/riscv/zba-zextw.c: Likewise.
* gcc.target/riscv/zbb-min-max-02.c: Likewise.
* gcc.target/riscv/zbb-min-max-03.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-01.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-03.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-04.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-05.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-06.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-07.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-08.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-09.c: Likewise.
* gcc.target/riscv/zbb-strlen.c: Likewise.
* gcc.target/riscv/zbb_32_bswap-1.c: Likewise.
* gcc.target/riscv/zbb_32_bswap-2.c: Likewise.
* gcc.target/riscv/zbb_bswap-1.c: Likewise.
* gcc.target/riscv/zbb_bswap-2.c: Likewise.
* gcc.target/riscv/zbs-bclr.c: Likewise.
* gcc.target/riscv/zbs-bext-02.c: Likewise.
* gcc.target/riscv/zbs-bext.c: Likewise.
* gcc.target/riscv/zbs-binv.c: Likewise.
* gcc.target/riscv/zbs-bset.c: Likewise.
* gcc.target/riscv/zero-scratch-regs-2.c: Likewise.
* gcc.target/riscv/zicond-primitiveSemantics.c: Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: Likewise.

remove workaround for GCC 4.1-4.3 [PR105606]

While looking into vec.h, I've noticed we still have a workaround for
GCC 4.1-4.3 bugs.
As we now use C++11 and thus need to be built by GCC 4.8 or later,
I think this is now never used.

2023-09-27 Jakub Jelinek <jakub@redhat.com>

PR c++/105606
* system.h (BROKEN_VALUE_INITIALIZATION): Don't define.
* vec.h (vec_default_construct): Remove BROKEN_VALUE_INITIALIZATION
workaround.
* function.cc (assign_parm_find_data_types): Likewise.

RISC-V: Support FP roundeven auto-vectorization

This patch supports auto-vectorization of the
roundeven API in math.h. It depends on the -ffast-math option.

A call to roundeven such as v2 = roundeven (v1) is converted
into the insns below (following the implementation in LLVM).

* vfcvt.x.f v3, v1, RNE
* vfcvt.f.x v2, v3

However, a floating-point value does not need the cvt above if
it has no fractional part. Take single precision floating point as an example:

  +-----------+---------------+-----------------+
  | raw float | binary layout | after roundeven |
  +-----------+---------------+-----------------+
  | 8388607.5 | 0x4affffff    | 8388608.0       |
  | 8388608.0 | 0x4b000000    | 8388608.0       |
  | 8388609.0 | 0x4b000001    | 8388609.0       |
  +-----------+---------------+-----------------+

All single-precision values with magnitude >= 8388608.0 have no fractional bits.
We use vmflt to build a mask that filters them out of the vector and only do the
cvt on the masked elements.
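
The binary layouts in the table and the 8388608.0 (2^23) threshold can be checked with a small C helper. This is a sketch assuming float is IEEE 754 binary32; the helper names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float expected");

/* Reinterpret a binary32 value as its raw bit pattern.  */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a raw bit pattern as a binary32 value.  */
float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Distance from F to the next representable value above it
   (valid for positive, finite F below the largest float).  */
float gap_above(float f)
{
    return bits_float(float_bits(f) + 1) - f;
}
```

Just below the threshold the gap is still 0.5 (gap_above (8388607.0f)); from 8388608.0f on it is 1.0 or more, so no fractional part can be represented.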

Before this patch:
math-roundeven-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    roundeven
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi       0   // Rounding to nearest, ties to even
.L4:
  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2
  bne         .L4
.L14:
  fsrm        a6
  ret

Please note VLS mode is also involved in this patch and covered by the
test cases.  We will add more run tests with zfa support later.

gcc/ChangeLog:

* config/riscv/autovec.md (roundeven<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_roundeven): New func decl.
* config/riscv/riscv-v.cc (expand_vec_roundeven): New func impl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

DSE: Fix ICE when the mode with access_size don't exist on the target[PR111590]

When running the Fortran tests with the 'V' extension enabled on the RISC-V port,
I saw multiple ICEs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111590

The root cause is on DSE:

internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356
0x1918f70 smallest_mode_for_size(poly_int<2u, unsigned long>, mode_class)
        ../../../../gcc/gcc/stor-layout.cc:356
0x11f75bb smallest_int_mode_for_size(poly_int<2u, unsigned long>)
        ../../../../gcc/gcc/machmode.h:916
0x3304141 find_shift_sequence
        ../../../../gcc/gcc/dse.cc:1738
0x3304f1a get_stored_val
        ../../../../gcc/gcc/dse.cc:1906
0x3305377 replace_read
        ../../../../gcc/gcc/dse.cc:2010
0x3306226 check_mem_read_rtx
        ../../../../gcc/gcc/dse.cc:2310
0x330667b check_mem_read_use
        ../../../../gcc/gcc/dse.cc:2415

After investigation, DSE is trying to optimize code like the following:

(insn 86 85 87 9 (set (reg:V4DI 168)
        (mem/u/c:V4DI (reg/f:DI 171) [0  S32 A128])) "bug.f90":6:18 discrim 6 1167 {*movv4di}
     (expr_list:REG_EQUAL (const_vector:V4DI [
                (const_int 4 [0x4])
                (const_int 1 [0x1]) repeated x2
                (const_int 3 [0x3])
            ])
        (nil)))

(set (mem) (reg:V4DI 168))

It then ICEs on: auto new_mode = smallest_int_mode_for_size (access_size * BITS_PER_UNIT);

The access_size may be 24 or 32. There are no integer modes of these sizes, so it ICEs.

TODO: A better way may be to make DSE use native_encode_rtx/native_decode_rtx,
      but I don't know how to do that.  So let's quickly fix this issue; we
      can improve the fix later.
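
The idea of checking for a mode before using it can be modelled outside of GCC. The sketch below is a toy model, not the dse.cc change: the mode table, the 8-bit unit and the function name are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the integer modes a target offers,
   expressed as widths in bits.  Real targets differ.  */
static const unsigned int_mode_bits[] = { 8, 16, 32, 64, 128 };

/* Return true and store the width in *BITS_OUT if some integer mode is
   at least ACCESS_SIZE bytes wide; return false instead of aborting
   when no such mode exists.  */
bool find_int_mode_for_bytes(unsigned access_size, unsigned *bits_out)
{
    unsigned wanted = access_size * 8;   /* BITS_PER_UNIT assumed to be 8 */
    assert(bits_out != NULL);
    for (size_t i = 0; i < sizeof int_mode_bits / sizeof int_mode_bits[0]; i++) {
        if (int_mode_bits[i] >= wanted) {
            *bits_out = int_mode_bits[i];
            return true;
        }
    }
    return false;
}
```

In this model an access_size of 24 or 32 returns false, so a caller can give up on the optimization rather than hit an assertion.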

PR target/111590

gcc/ChangeLog:

* dse.cc (find_shift_sequence): Check that a mode with access_size exists on the target.

ifcvt: Fix comments

Fix comments since original comment is confusing.

gcc/ChangeLog:

* tree-if-conv.cc (is_cond_scalar_reduction): Fix comments.

RISCV test infrastructure for d / v / zfh extensions

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_rv_float_abi_soft):
New proc.
(check_effective_target_riscv_d): Likewise.
(check_effective_target_riscv_v): Likewise.
(check_effective_target_riscv_zfh): Likewise.
(check_effective_target_riscv_v_ok): Likewise.
(check_effective_target_riscv_zfh_ok): Likewise.
(riscv_get_arch, add_options_for_riscv_v): Likewise.
(add_options_for_riscv_zfh): Likewise.
(add_options_for_riscv_d): Likewise.

RISC-V: Support FP trunc auto-vectorization

This patch supports auto-vectorization of the
trunc API in math.h. It depends on the -ffast-math option.

A call to trunc/truncf such as v2 = trunc (v1) is converted
into the insns below (following the implementation in
LLVM).

* vfcvt.rtz.x.f v3, v1
* vfcvt.f.x v2, v3

However, a floating-point value does not need the cvt above if
it has no fractional part. Take single precision floating point as an example:

  +------------+---------------+-----------------+
  | raw float  | binary layout | after trunc     |
  +------------+---------------+-----------------+
  | -8388607.5 | 0xcaffffff    | -8388607.0      |
  | 8388607.5  | 0x4affffff    | 8388607.0       |
  | 8388608.0  | 0x4b000000    | 8388608.0       |
  | 8388609.0  | 0x4b000001    | 8388609.0       |
  +------------+---------------+-----------------+

All single-precision values with magnitude >= 8388608.0 have no fractional bits.
We use vmflt to build a mask that filters them out of the vector and only do
the cvt on the masked elements.

Before this patch:
math-trunc-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    trunc
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  vfcvt.rtz.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  vfsgnj.vv   v2,v2,v1
  bne         .L4

Please note VLS mode is also involved in this patch and covered by the
test cases.
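
The masked sequence above can be modelled for a single element in plain C. This is a sketch, not the code in riscv-v.cc: trunc_sketch and FLOAT_INT_LIMIT are made-up names, and the threshold held in fa5 is assumed to be 2^23.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* 2^23: from here on every binary32 value is already an integer.  */
#define FLOAT_INT_LIMIT 8388608.0f

float trunc_sketch(float x)
{
    /* Plays the role of vfabs.v + vmflt.vf: select values that may
       still carry a fractional part.  NaN and infinities compare
       false and are passed through unchanged.  */
    if (fabsf(x) < FLOAT_INT_LIMIT) {
        int32_t i = (int32_t)x;     /* like vfcvt.rtz.x.f.v: round toward zero */
        float r = (float)i;         /* like vfcvt.f.x.v */
        return copysignf(r, x);     /* like vfsgnj.vv: -0.3f gives -0.0f */
    }
    return x;
}
```

The C cast from float to integer truncates toward zero, which is why it can stand in for the RTZ convert here; the values from the table give -8388607.0, 8388607.0, 8388608.0 and 8388609.0.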

gcc/ChangeLog:

* config/riscv/autovec.md (btrunc<mode>2): New pattern.
* config/riscv/riscv-protos.h (expand_vec_trunc): New func decl.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): New func impl.
(expand_vec_trunc): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-trunc-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Fix pr111456-1.c for targets that use unsigned char by default

This fixes the testcase to use an explicit `signed char` instead of plain `char`.

Committed as obvious after a test with a cross to powerpc64-linux-gnu and x86_64-linux-gnu.

gcc/testsuite/ChangeLog:

PR testsuite/111603
* gcc.dg/tree-ssa/pr111456-1.c: Use `signed char` instead of plain `char`.
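
The target dependence can be shown with a few lines of C. This is an illustration only, not an excerpt from pr111456-1.c; the function names are hypothetical.

```c
#include <assert.h>   /* for the checks in the note below */
#include <limits.h>
#include <stdbool.h>

/* Whether plain char is signed on this target; the C standard leaves
   the choice to the implementation.  */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* With plain char the result depends on the target ...  */
bool plain_is_negative(void)
{
    char c = (char)0xff;
    return c < 0;
}

/* ... with an explicit signed char it does not.  */
bool signed_is_negative(void)
{
    signed char c = -1;
    return c < 0;
}
```

assert (signed_is_negative ()) holds everywhere, while plain_is_negative () simply follows plain_char_is_signed (), which is expected to differ between the two targets used for testing.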

__atomic_test_and_set: Fall back to library, not non-atomic code

Make __atomic_test_and_set consistent with other __atomic_ and __sync_
builtins: call a matching library function instead of emitting
non-atomic code when the target has no direct insn support.

There's special-case code handling targetm.atomic_test_and_set_trueval
!= 1 trying a modified maybe_emit_sync_lock_test_and_set.  Previously,
if that worked but its matching emit_store_flag_force returned NULL,
we'd segfault later on.  Now that the caller handles NULL, gcc_assert
here instead.

While the referenced PRs are ARM-specific, the issue is general.

PR target/107567
PR target/109166
* builtins.cc (expand_builtin) <case BUILT_IN_ATOMIC_TEST_AND_SET>:
Handle failure from expand_builtin_atomic_test_and_set.
* optabs.cc (expand_atomic_test_and_set): When all attempts fail to
generate atomic code through target support, return NULL
instead of emitting non-atomic code.  Also, for code handling
targetm.atomic_test_and_set_trueval != 1, gcc_assert result
from calling emit_store_flag_force instead of returning NULL.
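
For reference, a typical use of the builtin is a one-byte lock flag. The sketch below uses only the documented __atomic_test_and_set and __atomic_clear builtins; the flag and function names are made up. Whether the calls expand inline or become library calls depends on the target.

```c
#include <assert.h>   /* for the checks in the note below */
#include <stdbool.h>

/* A one-byte flag used as a minimal lock.  */
static bool lock_flag;

/* Try to take the lock once; returns true on success.  */
bool try_lock(void)
{
    /* __atomic_test_and_set returns the previous contents, so "false"
       means the flag was clear and now belongs to the caller.  */
    return !__atomic_test_and_set(&lock_flag, __ATOMIC_ACQUIRE);
}

/* Release the lock taken by try_lock.  */
void unlock(void)
{
    __atomic_clear(&lock_flag, __ATOMIC_RELEASE);
}
```

A first try_lock () succeeds, a second one fails until unlock () is called. Emitting non-atomic code for the test-and-set would silently break exactly this kind of use.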

testsuite: Require thread-fence for 29_atomics/atomic_flag/cons/value_init.cc

A recent patch made __atomic_test_and_set no longer fall
back to emitting non-atomic code, but instead will then emit
a call to __atomic_test_and_set, thereby exposing the need
to gate also this test on support for atomics, similar to
r14-3980-g62b29347c38394.

libstdc++-v3:
* testsuite/29_atomics/atomic_flag/cons/value_init.cc: Add
dg-require-thread-fence.

RISC-V: Add zicond tests

These are tests from patch 3/5 of Ziao Zeng's zicond submission.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: New test.

Co-Authored-By: Jeff Law <jlaw@ventanamicro.com>

Ensure ssa_name is still valid.

When the IL changes, an equivalence set may contain ssa_names that no
longer exist. Ensure names are still valid and not in the free list.

PR tree-optimization/111599
gcc/
* value-relation.cc (relation_oracle::valid_equivs): Ensure
ssa_name is valid.

gcc/testsuite/
* gcc.dg/pr111599.c: New.

PR modula2/111510 runtime ICE findChildAndParent has caused internal runtime error

This patch fixes the runtime bug above.  The full runtime message is:
findChildAndParent has caused internal runtime error, RTentity is either
corrupt or the module storage has not been initialized yet.  The bug is
due to a non-nul-terminated string determining the module initialization order.
This results in modules being uninitialized and the above crash.  The bug
manifests itself on 32 bit systems - but obviously is latent on all
targets and the fix should be applied to both gcc-14 and gcc-13.

gcc/m2/ChangeLog:

PR modula2/111510
* gm2-compiler/M2GenGCC.mod (IsExportedGcc): Minor spacing changes.
(BuildTrashTreeFromInterface): Minor spacing changes.
* gm2-compiler/M2Options.mod (GetRuntimeModuleOverride): Call
string to generate a nul terminated C style string.
* gm2-compiler/M2Quads.mod (BuildStringAdrParam): New procedure.
(BuildM2InitFunction): Replace inline parameter generation with
calls to BuildStringAdrParam.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

AArch64: Remove BTI from outline atomics

The outline atomic functions have hidden visibility and can only be called
directly. Therefore we can remove the BTI at function entry. This improves
security by reducing the number of indirect entry points in a binary.
The BTI markings on the objects are still emitted.

libgcc/ChangeLog:
* config/aarch64/lse.S (BTI_C): Remove define.

MATCH: Simplify `(A ==/!= B) &/| (((cast)A) CMP C)`

This patch adds support to the pattern for `(A == B) &/| (A CMP C)`
where the second A could be cast to a different type.
Some cases were handled correctly when using separate `if` statements
but not when combined with BIT_AND/BIT_IOR.
In the case of pr111456-1.c, the testcase would pass if
`--param=logical-op-non-short-circuit=0` was used but now
can always be optimized.
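
The shape in question can be sketched in C as follows. These functions are made up for illustration and are not taken from the new tests; whether GCC applies exactly this pattern to them is an assumption. The exhaustive loop only confirms that the expected results are semantically valid.

```c
#include <assert.h>   /* for the checks in the note below */

/* An equality test on A combined with a comparison of a cast of A
   against another constant.  5 > 10 is false, so this is always 0.  */
int eq_and_cmp(signed char a)
{
    return (a == 5) & ((int)a > 10);
}

/* The BIT_IOR counterpart.  5 < 10 is true, so this is always 1.  */
int ne_or_cmp(signed char a)
{
    return (a != 5) | ((int)a < 10);
}

/* Returns 1 if both functions are constant over all 8-bit inputs.  */
int both_constant(void)
{
    for (int v = -128; v <= 127; v++) {
        if (eq_and_cmp((signed char)v) != 0 || ne_or_cmp((signed char)v) != 1)
            return 0;
    }
    return 1;
}
```

assert (both_constant ()) passes, so a compiler is free to replace the two function bodies with return 0 and return 1.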

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/106164
PR tree-optimization/111456

gcc/ChangeLog:

* match.pd (`(A ==/!= B) & (A CMP C)`):
Support an optional cast on the second A.
(`(A ==/!= B) | (A CMP C)`): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cmpbit-6.c: New test.
* gcc.dg/tree-ssa/cmpbit-7.c: New test.
* gcc.dg/tree-ssa/pr111456-1.c: New test.

PHIOPT: Fix minmax_replacement for three way

When diamond bb support was added to minmax_replacement in r13-1950-g9bb19e143cfe,
the code did not expect alt_middle_bb to be absent when it was empty (for threeway_p).
When factor_out_conditional_conversion was used to factor out conversions, the
assumption about alt_middle_bb turned out to be wrong: we ended up with threeway_p being true,
middle_bb being empty and alt_middle_bb not being empty, which causes wrong code in
many cases.

This patch fixes the issue by adding a test for the 2 cases that assume, in the
threeway_p case, that the other bb is empty.

Changes made:
v2: Fix test for `(a <= u) b = MAX(a, d) else b = u`.

Note my plan for GCC 15 is to remove minmax_replacement as match.pd will catch all cases
at that point.
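
To see why this kind of replacement is sensitive to its preconditions, consider the shape from the v2 note in C. This is a plain arithmetic illustration and not a description of the exact checks minmax_replacement performs; the function names are made up.

```c
#include <assert.h>   /* for the checks in the note below */

/* Three-way shape: the "then" arm computes a MAX, the "else" arm
   yields the other operand of the comparison.  */
int three_way(int a, int d, int u)
{
    int b;
    if (a <= u)
        b = a > d ? a : d;   /* MAX (a, d) */
    else
        b = u;
    return b;
}

/* Candidate replacement MIN (MAX (a, d), u).  It agrees with
   three_way only when d <= u.  */
int min_of_max(int a, int d, int u)
{
    int m = a > d ? a : d;
    return m < u ? m : u;
}
```

three_way (1, 2, 5) and min_of_max (1, 2, 5) both give 2, but with d > u the two differ: three_way (1, 9, 5) is 9 while min_of_max (1, 9, 5) is 5. A transformation that overlooks such a condition produces wrong code.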

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111469

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Fix
the assumption for the `non-diamond` handling cases
of diamond code.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr111469-1.c: New test.

MATCH: Optimize COND_ADD reduction pattern

The current COND_ADD reduction pattern can't optimize floating-point vectors.
As Richard suggested: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631336.html
allow the COND_ADD reduction pattern to optimize floating-point vectors.
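
A floating-point loop of the kind this is aimed at might look like the sketch below. It mirrors the uint64_t case quoted for the COND_LEN_ADD change in this log, with float elements; the function name is made up, and whether the loop is actually vectorized depends on the target and on the options in effect.

```c
#include <assert.h>   /* for the check in the note below */
#include <stddef.h>

/* Conditional floating-point reduction: only elements that pass the
   test contribute to the sum.  */
float cond_sum(const float *restrict a, const float *restrict b, size_t n)
{
    float result = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (b[i] <= a[i])
            result += a[i];
    }
    return result;
}
```

With a = {1, 2, 3, 4} and b = {0, 5, 3, 9} only the first and third elements are added, so assert (cond_sum (a, b, 4) == 4.0f) holds.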

Bootstrap and Regression is running.

Ok for trunk if tests pass ?

gcc/ChangeLog:

* match.pd: Optimize COND_ADD reduction pattern.

MATCH: Optimize COND_ADD_LEN reduction pattern

This patch leverages this commit: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=62b505a4d5fc89
to optimize the COND_LEN_ADD reduction pattern.

We are doing optimization of VEC_COND_EXPR + COND_LEN_ADD -> COND_LEN_ADD.

Consider the following case:

#include <stdint.h>

void
pr11594 (uint64_t *restrict a, uint64_t *restrict b, int loop_size)
{
  uint64_t result = 0;

  for (int i = 0; i < loop_size; i++)
    {
      if (b[i] <= a[i])
        {
          result += a[i];
        }
    }

  a[0] = result;
}

Before this patch:
        vsetvli a7,zero,e64,m1,ta,ma
        vmv.v.i v2,0
        vmv1r.v v3,v2                    --- redundant
.L3:
        vsetvli a5,a2,e64,m1,ta,ma
        vle64.v v1,0(a3)
        vle64.v v0,0(a1)
        slli    a6,a5,3
        vsetvli a7,zero,e64,m1,ta,ma
        sub     a2,a2,a5
        vmsleu.vv       v0,v0,v1
        add     a1,a1,a6
        vmerge.vvm      v1,v3,v1,v0     ---- redundant.
        add     a3,a3,a6
        vsetvli zero,a5,e64,m1,tu,ma
        vadd.vv v2,v2,v1
        bne     a2,zero,.L3
        li      a5,0
        vsetvli a4,zero,e64,m1,ta,ma
        vmv.s.x v1,a5
        vredsum.vs      v2,v2,v1
        vmv.x.s a5,v2
        sd      a5,0(a0)
        ret

After this patch:

        vsetvli a6,zero,e64,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a2,e64,m1,ta,ma
        vle64.v v2,0(a4)
        vle64.v v0,0(a1)
        slli    a3,a5,3
        vsetvli a6,zero,e64,m1,ta,ma
        sub     a2,a2,a5
        vmsleu.vv       v0,v0,v2
        add     a1,a1,a3
        vsetvli zero,a5,e64,m1,tu,mu
        add     a4,a4,a3
        vadd.vv v1,v1,v2,v0.t
        bne     a2,zero,.L3
        li      a5,0
        vsetivli        zero,1,e64,m1,ta,ma
        vmv.s.x v2,a5
        vsetvli a5,zero,e64,m1,ta,ma
        vredsum.vs      v1,v1,v2
        vmv.x.s a5,v1
        sd      a5,0(a0)
        ret

Bootstrap && Regression is running.

Ok for trunk when testing passes ?

PR tree-optimization/111594
PR tree-optimization/110660

gcc/ChangeLog:

* match.pd: Optimize COND_LEN_ADD reduction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111594.c: New test.

ada: Fix missing call to Finalize_Protection for simple protected objects

There is a glitch in Exp_Ch7.Build_Finalizer causing the finalizer to do
nothing for simple protected objects.

The change also removes redundant calls to the Is_Simple_Protected_Type
predicate and fixes a minor inconsistency between Requires_Cleanup_Actions
and Build_Finalizer for this case.

gcc/ada/

* exp_ch7.adb (Build_Finalizer.Process_Declarations): Remove call
to Is_Simple_Protected_Type as redundant.
(Build_Finalizer.Process_Object_Declaration): Do not retrieve the
corresponding record type for simple protected objects. Make the
flow of control more explicit in their specific processing.
* exp_util.adb (Requires_Cleanup_Actions): Return false for simple
protected objects present in library-level package bodies for the
sake of consistency with Build_Finalizer and remove call to
Is_Simple_Protected_Type as redundant.

ada: Fix deferred constant wrongly rejected

This recent regression occurs when the nominal subtype of the constant is a
discriminated record type with default discriminants.

gcc/ada/
PR ada/110488
* sem_ch3.adb (Analyze_Object_Declaration): Do not build a default
subtype for a deferred constant in the definite case too.

ada: Fix unnesting generated loops with nested finalization procedure

The compiler can generate loops for creating array aggregates, for
example during the initialization of a variable. If the component
type of the array element requires finalization, the compiler also
creates a block and a nested procedure that need to be correctly
unnested if unnesting is enabled. During the unnesting transformation,
the scopes for these inner blocks need to be fixed and set to the
enclosing loop entity.

gcc/ada/

* exp_ch7.adb (Contains_Subprogram): Recursively search for subp
in loop's statements.
(Unnest_Loop)<Fixup_Inner_Scopes>: New.
(Unnest_Loop): Rename local variable for more clarity.
* exp_unst.ads: Refresh comment.

ada: Crash processing the accessibility level of an actual parameter

gcc/ada/

* exp_ch6.adb (Expand_Call_Helper): When computing the
accessibility level of an actual parameter based on the
expression of a constant declaration, add missing support for
deferred constants.

ada: Fix missing finalization of extended return object on abnormal completion

This happens in the case of a nonlimited return type and is a fallout of the
optimization recently implemented for them.

gcc/ada/

* einfo.ads (Status_Flag_Or_Transient_Decl): Remove ??? comment.
* exp_ch6.adb (Expand_N_Extended_Return_Statement): Extend the
handling of finalizable return objects to the non-BIP case.
* exp_ch7.adb (Build_Finalizer.Process_Declarations): Adjust the
comment accordingly.
* exp_util.adb (Requires_Cleanup_Actions): Likewise.

ada: Update personality function for CHERI purecap

This makes two changes to the GNAT personality function to reflect
differences for pure capability CHERI/Morello. The first is to use
__builtin_code_address_from_pointer to drop the LSB from Morello
code pointers when searching through call-site tables (without this
we would never find the right landing pad when unwinding).

The second change is to reflect the change in the exception table
format for pure-capability Morello where the landing pad is a capability
indirected by an offset in the call-site table.

gcc/ada/

* raise-gcc.c (get_ip_from_context): Adapt for CHERI purecap.
(get_call_site_action_for): Adapt for CHERI purecap.

ada: Fix conversions between addresses and integers

On CHERI targets the size of System.Address and Integer_Address
(or similar) are not the same. The operations in System.Storage_Elements
should be used to convert between integers and addresses.

gcc/ada/

* libgnat/a-tags.adb (To_Tag): Use System.Storage_Elements for
integer to address conversion.
* libgnat/s-putima.adb (Put_Image_Pointer): Likewise.

ada: Add CHERI variant of System.Stream_Attributes

Reading and writing System.Address to a stream on CHERI targets does
not preserve the capability tag; it will always be invalid since
a valid capability cannot be created out of thin air. Reading an Address
from a stream would therefore never yield a capability that can be
dereferenced.

This patch introduces a CHERI variant of System.Stream_Attributes that
raises Program_Error when attempting to read a System.Address from a stream.

gcc/ada/

* libgnat/s-stratt__cheri.adb: New file.

ada: Define CHERI exception types

These exception types map to the CHERI hardware exceptions that are
triggered due to misuse of capabilities.

gcc/ada/

* libgnat/i-cheri.ads (Capability_Bound_Error)
(Capability_Permission_Error, Capability_Sealed_Error)
(Capability_Tag_Error): New, define CHERI exception types.

ada: Make minor corrections to CUDA-related comments

gcc/ada/

* exp_prag.adb: Make minor corrections in comments.
* rtsfind.ads: Remove unused element from RTU_Id definition.

ada: Dimensional analysis when used with elementary functions

gcc/ada/

* doc/gnat_ugn/gnat_and_program_execution.rst: Add more details on
using Generic Elementary Functions with dimensional analysis.
* gnat_ugn.texi: Regenerate.

ada: Clarify RM references that justify a constraint check

gcc/ada/

* exp_ch5.adb (Expand_N_Case_Statement): Reference both sections
of the Ada RM that deal with case statements and case expressions
to justify the insertion of a runtime check.

RISC-V: Support FP round auto-vectorization

This patch supports auto-vectorization of the
round API in math.h. It depends on the -ffast-math option.

A call to round/roundf such as v2 = round (v1) is converted
into the insns below (following the implementation in LLVM).

* vfcvt.x.f v3, v1, RMM
* vfcvt.f.x v2, v3

However, a floating-point value does not need the cvt above if
it has no fractional part. Take single precision floating point as an example:

  +------------+---------------+-----------------+
  | raw float  | binary layout | after round     |
  +------------+---------------+-----------------+
  | -8388607.5 | 0xcaffffff    | -8388608.0      |
  | 8388607.5  | 0x4affffff    | 8388608.0       |
  | 8388608.0  | 0x4b000000    | 8388608.0       |
  | 8388609.0  | 0x4b000001    | 8388609.0       |
  +------------+---------------+-----------------+

All single-precision values with magnitude >= 8388608.0 have no fractional bits.
We use vmflt to build a mask that filters them out of the vector and only do
the cvt on the masked elements.

Before this patch:
math-round-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    round
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi       4   // RMM, rounding to nearest, ties to max magnitude
.L4:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  vfsgnj.vv   v2,v2,v1
  bne         .L4
.L14:
  fsrm        a6
  ret

Please note VLS mode is also involved in this patch and covered by the
test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (round<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_round): New function decl.
* config/riscv/riscv-v.cc (expand_vec_round): New function impl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-round-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-round-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V/testsuite: Fix ILP32 RVV failures from missing <gnu/stubs-ilp32d.h>

In non-multilib installations system headers may not be available for
compilation options using a non-default model, causing build errors such
as:

In file included from .../include/features.h:527,
                 from .../include/assert.h:35,
                 from .../gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h:2,
                 from .../gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c:4:
.../include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No such file or directory

Therefore we have to be very cautious when trying to use a non-default
model in the testsuite, preferably avoiding reliance on headers that have
not been supplied by GCC itself, or otherwise verifying in a preparatory
step whether the given model is buildable in a given test environment.

In this case however we can easily avoid the issue, because <assert.h>
facilities are not used at all by "vmv-imm-template.h", which includes
the header.  Remove the inclusion then, turning these issues:

FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize (test for excess errors)
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.i 32
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.x 8
FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize (test for excess errors)
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.i 32
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.x 8

into successful results:

PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize (test for excess errors)
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.i 32
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.x 8
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize (test for excess errors)
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.i 32
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.x 8

in a plain LP64 `riscv64-linux-gnu' configuration.

gcc/testsuite/
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Remove
<assert.h> inclusion.

Darwin: Handle -dynamiclib on cc1 lines.

The changes of r14-4172 missed a case where we accept -dynamiclib on the
command line and then pass it to cc1 (which does not accept it).

This prunes the -dynamiclib from cc1 lines.

gcc/ChangeLog:

* config/darwin.h (DARWIN_CC1_SPEC): Remove -dynamiclib.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

invoke.texi: Update -fopenmp and -fopenmp-simd for omp::decl and loop semantic

gcc/ChangeLog:

PR middle-end/111547
* doc/invoke.texi (-fopenmp): Mention C++11 [[omp::decl(...)]] syntax.
(-fopenmp-simd): Likewise. Clarify 'loop' directive semantic.

RISC-V: Support FP rint auto-vectorization

This patch would like to support auto-vectorization for the
rint API in math.h. It depends on the -ffast-math option.

When we would like to call rint/rintf like v2 = rint (v1),
we will convert it into below insns (reference the implementation of llvm).

* vfcvt.x.f v3, v1
* vfcvt.f.x v2, v3

However, the floating point value may not need the cvt as above if
its mantissa is zero. Take single precision floating point as example:

Assume we have RTZ rounding mode

  +------------+---------------+-----------------+
  | raw float  | binary layout | after int       |
  +------------+---------------+-----------------+
  | -8388607.5 | 0xcaffffff    | -8388607.0      |
  | 8388607.5  | 0x4affffff    | 8388607.0       |
  | 8388608.0  | 0x4b000000    | 8388608.0       |
  | 8388609.0  | 0x4b000001    | 8388609.0       |
  +------------+---------------+-----------------+

All single precision floating point values >= 8388608.0 (2^23) are already
integral, i.e. they have no fractional bits left in the mantissa.
We leverage vmflt and a mask to filter them out in the vector and only do
the cvt on the masked elements.

Before this patch:
math-rint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    rint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  vfsgnj.vv   v2,v2,v1

Please note VLS mode is also involved in this patch and covered by the
test cases.
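
The masked conversion described above can be modelled in scalar code. This
is only a sketch of the idea, not the expander in riscv-v.cc; the name
rint_model is hypothetical, and std::lrint stands in for vfcvt.x.f under
the current rounding mode.

```cpp
#include <cmath>

// Scalar model of the masked vfcvt.x.f / vfcvt.f.x sequence (illustrative only).
float rint_model(float x)
{
  const float limit = 8388608.0f;   // 2^23: floats at or above this are integral
  if (!(std::fabs(x) < limit))      // mask off: large values, infinities and NaN pass through
    return x;
  long i = std::lrint(x);           // like vfcvt.x.f: float -> integer, current rounding mode
  float r = static_cast<float>(i);  // like vfcvt.f.x: integer -> float, exact since |i| <= 2^23
  return std::copysign(r, x);       // like vfsgnj: restore the sign, so -0.25f yields -0.0f
}
```

The comparison is written as !(|x| < limit) so that NaN inputs also skip the
conversion, which mirrors a mask produced by vmflt.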

gcc/ChangeLog:

* config/riscv/autovec.md (rint<mode>2): New pattern.
* config/riscv/riscv-protos.h (expand_vec_rint): New function decl.
* config/riscv/riscv-v.cc (expand_vec_rint): New function impl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-rint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-rint-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Support FP nearbyint auto-vectorization

This patch would like to support auto-vectorization for the
nearbyint API in math.h. It depends on the -ffast-math option.

When we would like to call nearbyint/nearbyintf like v2 = nearbyint (v1),
we will convert it into below insns (reference the implementation of llvm).

* frflags a5
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
* fsflags a5

However, the floating point value may not need the cvt as above if
its mantissa is zero. Take single precision floating point as example:

Assume we have RTZ rounding mode

  +------------+---------------+-----------------+
  | raw float  | binary layout | after nearbyint |
  +------------+---------------+-----------------+
  | 8388607.5  | 0x4affffff    | 8388607.0       |
  | 8388608.0  | 0x4b000000    | 8388608.0       |
  | 8388609.0  | 0x4b000001    | 8388609.0       |
  +------------+---------------+-----------------+

All single precision floating point values >= 8388608.0 (2^23) are already
integral, i.e. they have no fractional bits left in the mantissa.
We leverage vmflt and a mask to filter them out in the vector and only do the
cvt on the masked elements.

Before this patch:
math-nearbyint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    nearbyint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  frflags     a7
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  fsflags     a7
  vfsgnj.vv   v2,v2,v1

Please note VLS mode is also involved in this patch and covered by the
test cases.
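
The frflags/fsflags pair in the sequence above saves and restores the
floating point exception flags around the conversion, so that nearbyint
does not leave "inexact" raised. A scalar sketch of that idea, assuming the
<cfenv> functions are an acceptable stand-in for the two CSR instructions
(nearbyint_model is a hypothetical name, not part of the patch):

```cpp
#include <cfenv>
#include <cmath>

// Scalar model of nearbyint: round like rint, but keep the exception flags unchanged.
float nearbyint_model(float x)
{
  const float limit = 8388608.0f;               // 2^23: larger magnitudes are already integral
  if (!(std::fabs(x) < limit))                  // also lets infinities and NaN pass through
    return x;
  std::fexcept_t saved;
  std::fegetexceptflag(&saved, FE_ALL_EXCEPT);  // like frflags: remember the current flags
  long i = std::lrint(x);                       // may raise FE_INEXACT
  float r = static_cast<float>(i);
  std::fesetexceptflag(&saved, FE_ALL_EXCEPT);  // like fsflags: put the old flags back
  return std::copysign(r, x);                   // keep the sign of the input, including -0.0f
}
```

Note that a compiler which does not model floating point environment access
may reorder such code, so this is a description of intent rather than a
guaranteed implementation.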

gcc/ChangeLog:

* config/riscv/autovec.md (nearbyint<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_nearbyint): New function decl.
* config/riscv/riscv-v.cc (expand_vec_nearbyint): New func impl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add helper function.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Rename rounding const fp function for refactor

The rounding related API shared one const, rename it to avoid
unnecessary redundant code.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (gen_ceil_const_fp): Remove.
(get_fp_rounding_coefficient): Rename.
(gen_floor_const_fp): Remove.
(expand_vec_ceil): Take renamed func.
(expand_vec_floor): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

[PR111497][LRA]: Copy substituted equivalence

When we substitute the equivalence and it becomes shared, we can fail
to correctly update reg info used by LRA.  This can result in wrong
code generation, e.g. because of incorrect live analysis.  It can also
result in compiler crash as the pseudo survives RA.  This is exactly
what happened for the PR.  This patch solves this problem by
unsharing substituted equivalences.

gcc/ChangeLog:

PR middle-end/111497
* lra-constraints.cc (lra_constraints): Copy substituted
equivalence.
* lra.cc (lra): Change comment for calling unshare_all_rtl_again.

gcc/testsuite/ChangeLog:

PR middle-end/111497
* g++.target/i386/pr111497.C: New test.

Add missing return in gori_compute::logical_combine

The varying case currently falls through to the 1/true case.

gcc/
* gimple-range-gori.cc (gori_compute::logical_combine): Add missing
return statement in the varying case.

gcc/testsuite/
* gnat.dg/opt102.adb: New test.
* gnat.dg/opt102_pkg.adb, gnat.dg/opt102_pkg.ads: New helper.

libstdc++: Shorten integer std::to/from_chars symbol names

For std::to_chars:

The constrained alias __integer_to_chars_result_type seems unnecessary
ever since r10-3080-g28f0075742ed58 got rid of the only public overload
which used it.  Now only non-public overloads are constrained by it
(through their return type) and these non-public overloads aren't used
in a SFINAE context, so the constraints have no observable effect.  So
this patch gets rid of this alias, which greatly shortens the symbol names
of the affected functions (since the expanded alias is quite large).

For std::from_chars:

We can't get rid of the corresponding alias because it constrains the
public integer std::from_chars overload.  But we can avoid having the
constraint bloat the mangled name by instead encoding it as a defaulted
template parameter.  We use the non-type parameter form

  enable_if_t<..., int> = 0

instead of the type parameter form

  typename = enable_if_t<...>

because the type form can be bypassed by giving an explicit template
argument for the type parameter, e.g. 'std::from_chars<int, void>(...)',
so the non-type form seems like the more robust choice.
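
The difference between the two forms can be seen in a small standalone
sketch. The function and trait names below (type_form, nontype_form,
nontype_form_callable) are hypothetical and only illustrate the technique;
they are not the libstdc++ declarations. C++17 is assumed.

```cpp
#include <type_traits>
#include <utility>

// Type parameter form: the constraint lives in a defaulted type parameter.
template<typename T, typename = std::enable_if_t<std::is_integral_v<T>>>
constexpr bool type_form(T) { return true; }

// Non-type parameter form: the constraint is the *type* of a defaulted parameter.
template<typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
constexpr bool nontype_form(T) { return true; }

// The type form is bypassed by naming the second argument explicitly.
static_assert(type_form<double, void>(1.5), "constraint bypassed");

// Detect whether nontype_form<T, 0>(T) is a valid call.
template<typename T, typename = void>
struct nontype_form_callable : std::false_type {};

template<typename T>
struct nontype_form_callable<
  T, std::void_t<decltype(nontype_form<T, 0>(std::declval<T>()))>>
  : std::true_type {};

// Supplying an explicit argument does not help here: the parameter's type
// itself fails to form for a non-integral T.
static_assert(nontype_form_callable<int>::value, "int is accepted");
static_assert(!nontype_form_callable<double>::value, "double stays rejected");
```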

In passing, use __is_standard_integer in the constraint.

libstdc++-v3/ChangeLog:

* include/std/charconv (__detail::__integer_to_chars_result_type):
Remove.
(__detail::__to_chars_16): Use to_chars_result as return type.
(__detail::__to_chars_10): Likewise.
(__detail::__to_chars_8): Likewise.
(__detail::__to_chars_2): Likewise.
(__detail::__to_chars_i): Likewise.
(__detail::__integer_from_chars_result_type): Inline the
constraint into ...
(from_chars): ... here.  Use __is_standard_integer in the
constraint.  Encode constraint as a defaulted non-type template
parameter instead of within the return type.

Update baseline symbols for hppa-linux.

2023-09-25 John David Anglin <danglin@gcc.gnu.org>

libstdc++-v3/ChangeLog:

* config/abi/post/hppa-linux-gnu/baseline_symbols.txt: Update.

libstdc++: Prevent unwanted ADL in std::to_array [PR111512]

As noted in PR c++/111512, GCC does ADL for __builtin_memcpy if it is
unqualified, which can cause errors for template argument types which
cannot be completed.

Casting the memcpy arguments to void* prevents ADL from considering the
problem type.

libstdc++-v3/ChangeLog:

PR libstdc++/111511
PR c++/111512
* include/std/array (to_array): Cast memcpy arguments to void*.
* testsuite/23_containers/array/creation/111512.cc: New test.

libstdc++: Define C++23 std::forward_like (P2445R1)

libstdc++-v3/ChangeLog:

* include/bits/move.h (forward_like): Define for C++23.
* include/bits/version.def (forward_like): Define.
* include/bits/version.h: Regenerate.
* include/std/utility (__glibcxx_want_forward_like): Define.
* testsuite/20_util/forward_like/1.cc: New test.
* testsuite/20_util/forward_like/2_neg.cc: New test.
* testsuite/20_util/forward_like/version.cc: New test.

LoongArch: doc: Update -m[no-]explicit-relocs for r14-4160

gcc/ChangeLog:

* doc/invoke.texi: Update -m[no-]explicit-relocs for r14-4160.

Fix PR 110386: backprop vs ABSU_EXPR

The issue here is that when backprop tries to go
and strip sign ops, it skips over ABSU_EXPR but
ABSU_EXPR not only does an ABS, it also changes the
type to unsigned.
Since strip_sign_op_1 is only supposed to strip off
sign changing operands and not ones that change types,
removing ABSU_EXPR here is correct. We don't handle
nop conversions so this does not cause any missed optimizations either.
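
To illustrate why ABSU_EXPR is more than a sign operation, here is a scalar
sketch of its semantics as described above: the operand is signed, the
result is unsigned. The name absu_model is hypothetical and a 32-bit int is
assumed in the comments.

```cpp
#include <climits>

// Model of ABSU: absolute value of a signed operand, produced in the unsigned type.
unsigned absu_model(int x)
{
  unsigned u = static_cast<unsigned>(x);  // the type change that plain ABS does not have
  return x < 0 ? 0u - u : u;              // well defined even for INT_MIN
}
```

For INT_MIN the result is INT_MAX + 1 as an unsigned value, which has no
signed counterpart, so dropping the operation would also drop a conversion.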

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/110386

gcc/ChangeLog:

* gimple-ssa-backprop.cc (strip_sign_op_1): Remove ABSU_EXPR.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr110386-1.c: New test.
* gcc.c-torture/compile/pr110386-2.c: New test.

RISC-V: Fix AVL/VL bug of VSETVL PASS[PR111548]

This patch fixes the incorrect fetch of the AVL/VL reg in the VSETVL PASS.

C/C++ regression passed.

But gfortran didn't run yet. I am still finding a way to run it.

Will commit it when I pass the fortran regression.

PR target/111548

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (earliest_pred_can_be_fused_p): Bugfix

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr111548.c: New test.

rs6000: Skip empty inline asm in rs6000_update_ipa_fn_target_info [PR111366]

PR111366 exposes one thing that can be improved in function
rs6000_update_ipa_fn_target_info: it should skip an empty
inline asm string, since an empty string cannot adopt any
hardware features (so far HTM).

Since this rs6000_update_ipa_fn_target_info related approach
exists in GCC12 and later, the affected project highway has
updated its target pragma with ",htm", see the link:
https://github.com/google/highway/commit/15e63d61eb535f478bc
I'd not bother to consider an inline asm parser for now but
will file a separated PR for further enhancement.

PR target/111366

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_update_ipa_fn_target_info): Skip
empty inline asm.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr111366.C: New test.

rs6000: Use default target option node for callee by default [PR111380]

As PR111380 (and the discussion in related PRs) shows, for
now how function rs6000_can_inline_p treats the callee
without any target option node is wrong.  It considers it
always safe to inline this kind of callee, but actually its
target flags come from the command line options
(target_option_default_node), so it's possible that the flags
of the callee don't satisfy the condition for inlining, yet it
is still inlined, with unexpected consequences.

As the associated test case pr111380-1.c shows, the caller
main is attributed with power8, but the callee foo is
compiled with power9 from command line, it's unexpected to
make main inline foo since foo can contain something that
requires power9 capability.  Without this patch, for lto
(with -flto) we can get error message (as it forces the
callee to have a target option node), but for non-lto, it's
inlined unexpectedly.

This patch is to make callee adopt target_option_default_node
when it doesn't have a target option node, it can avoid wrong
inlining decision and fix the inconsistency between LTO and
non-LTO.  It also aligns with what the other ports do.

PR target/111380

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_can_inline_p): Adopt
target_option_default_node when the callee has no option
attributes, also simplify the existing code accordingly.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr111380-1.c: New test.
* gcc.target/powerpc/pr111380-2.c: New test.

LoongArch: Optimizations of vector construction.

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_vecinit_merge_<LASX:mode>): New
pattern for vector construction.
(vec_set<mode>_internal): Ditto.
(lasx_xvinsgr2vr_<mode256_i_half>_internal): Ditto.
(lasx_xvilvl_<lasxfmt_f>_internal): Ditto.
* config/loongarch/loongarch.cc (loongarch_expand_vector_init):
Optimized the implementation of vector construction.
(loongarch_expand_vector_init_same): New function.
* config/loongarch/lsx.md (lsx_vilvl_<lsxfmt_f>_internal): New
pattern for vector construction.
(lsx_vreplvei_mirror_<lsxfmt_f>): New pattern for vector
construction.
(vec_concatv2df): Ditto.
(vec_concatv4sf): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vec-construct-opt.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vec-construct-opt.c: New test.

Daily bump.

RISC-V: Fix fortran ICE/PR111546 when RV32 vec_init

When broadcasting the repeated element, we take the mask_int_mode
by mistake. This patch fixes it by leveraging the machine
mode of the element.

The below test case in RV32 will be fixed.

* gcc/testsuite/gfortran.dg/overload_5.f90

PR target/111546

gcc/ChangeLog:

* config/riscv/riscv-v.cc
(expand_vector_init_merge_repeating_sequence): Bugfix

Signed-off-by: Pan Li <pan2.li@intel.com>

Fortran: Pad mismatched charlens in component initializers [PR68155]

2023-09-24 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/68155
* decl.cc (fix_initializer_charlen): New function broken out of
add_init_expr_to_sym.
(add_init_expr_to_sym, build_struct): Call the new function.

PR fortran/111271
* trans-expr.cc (gfc_conv_intrinsic_to_class): Remove repeated
condition.

gcc/testsuite/
PR fortran/68155
* gfortran.dg/pr68155.f90: New test.

MATCH: Add `(X & ~Y) & Y` and `(X | ~Y) | Y`

Even though this gets optimized by reassociation, catching it more often
will always be better.

Note the reason why I didn't add `(X ^ ~Y) ^ Y` is that it gets caught
by preferring `~(X ^ Y)` to `(X ^ ~Y)`, which is then caught by the
pattern for `(X ^ Y) ^ Y` already.
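
The identities behind these patterns can be checked with a few lines of
code. The function names are hypothetical; each function is written in the
unsimplified form, and the comment gives the value it always evaluates to.

```cpp
// (X & ~Y) & Y: the bits of Y are cleared first, then selected, so nothing is left.
unsigned and_not_and(unsigned x, unsigned y) { return (x & ~y) & y; }  // always 0

// (X | ~Y) | Y: every bit is set either by ~Y or by Y.
unsigned or_not_or(unsigned x, unsigned y) { return (x | ~y) | y; }    // always ~0u

// (X ^ ~Y) ^ Y: ~Y ^ Y is all ones, so the result is ~X.
unsigned xor_not_xor(unsigned x, unsigned y) { return (x ^ ~y) ^ y; }  // always ~x
```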

PR tree-optimization/111543

gcc/ChangeLog:

* match.pd (`(X & ~Y) & Y`, `(X | ~Y) | Y`): New patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-4.c: New test.

Daily bump.

RISC-V: Support full coverage VLS combine support

Support full coverage VLS combine support.

Committed.

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Extend VLS modes
* config/riscv/vector-iterators.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h:
* gcc.target/riscv/rvv/autovec/vls/cond_convert-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-9.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_copysign-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_ext-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_ext-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_ext-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_ext-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_ext-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_narrow-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_narrow-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wadd-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wadd-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wadd-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wadd-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wfma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wfma-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wfms-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wfnma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wmul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wmul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wmul-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wsub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wsub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wsub-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_wsub-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/narrow-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/narrow-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/narrow-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wred-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wred-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wred-3.c: New test.

fortran: error recovery on duplicate declaration of class variable [PR95710]

gcc/fortran/ChangeLog:

PR fortran/95710
* class.cc (gfc_build_class_symbol): Do not try to build class
container for invalid typespec.
* resolve.cc (resolve_fl_var_and_proc): Prevent NULL pointer
dereference.
(resolve_symbol): Likewise.

gcc/testsuite/ChangeLog:

PR fortran/95710
* gfortran.dg/pr95710.f90: New test.

d: Merge upstream dmd, druntime 4574d1728d, phobos d7e79f024.

D front-end changes:

- Import dmd v2.105.0.
- Catch clause must take only `const' or mutable exceptions.
- Creating a `scope' class instance with a non-scope constructor
is now `@system' only with `-fpreview=dip1000'.
- Global `const' variables can no longer be initialized from a
non-shared static constructor

D runtime changes:

- Import druntime v2.105.0.

Phobos changes:

- Import phobos v2.105.0.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 4574d1728d.
* dmd/VERSION: Bump version to v2.105.0.
* d-diagnostic.cc (verror): Remove.
(verrorSupplemental): Remove.
(vwarning): Remove.
(vwarningSupplemental): Remove.
(vdeprecation): Remove.
(vdeprecationSupplemental): Remove.
(vmessage): Remove.
(vtip): Remove.
(verrorReport): New function.
(verrorReportSupplemental): New function.
* d-lang.cc (d_parse_file): Update for new front-end interface.
* decl.cc (d_mangle_decl): Update for new front-end interface.
* intrinsics.cc (maybe_set_intrinsic): Update for new front-end
interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 4574d1728d.
* src/MERGE: Merge upstream phobos d7e79f024.

testsuite: Add new test for already fixed PR111455

The following testcase has been fixed by r14-4231.

2023-09-23 Jakub Jelinek <jakub@redhat.com>

PR c++/111455
* g++.dg/ext/integer-pack8.C: New test.

RISC-V: Add VLS unary combine patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add VLS modes for conditional ABS/SQRT.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/cond_abs-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_sqrt-1.c: New test.

RISC-V: Support FP floor auto-vectorization

This patch would like to support auto-vectorization for the
floor API in math.h. It depends on the -ffast-math option.

When we would like to call floor/floorf like v2 = floor (v1), we will
convert it into below insns (reference the implementation of llvm).

* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3

However, the floating point value may not need the cvt as above if
its mantissa is zero. For example single precision floating point below.

  +-----------+---------------+-------------+
  | raw float | binary layout | after floor |
  +-----------+---------------+-------------+
  | 8388607.5 | 0x4affffff    | 8388607.0   |
  | 8388608.0 | 0x4b000000    | 8388608.0   |
  | 8388609.0 | 0x4b000001    | 8388609.0   |
  +-----------+---------------+-------------+

All single precision floating point values >= 8388608.0 (2^23) are already
integral, i.e. they have no fractional bits left in the mantissa.
We leverage vmflt and a mask to filter them out in the vector and only do the
cvt on the masked elements.

Before this patch:
math-floor-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    floorf
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi       2   // Rounding Down
.L4:
  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2
  bne         .L4
.L14:
  fsrm        a6
  ret

Please note VLS mode is also involved in this patch and covered by the
test cases.
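
The fsrmi/fsrm pair above switches the dynamic rounding mode to
round-down around the conversions and restores it afterwards. A scalar
sketch of the same idea using <cfenv> follows; floor_model is a
hypothetical name, and the volatile temporaries are only there to keep the
conversion between the two fesetround calls on compilers that do not model
floating point environment access.

```cpp
#include <cfenv>
#include <cmath>

// Scalar model of floor via float -> integer -> float under round-down.
float floor_model(float x)
{
  const float limit = 8388608.0f;    // 2^23: larger magnitudes are already integral
  if (!(std::fabs(x) < limit))       // also lets infinities and NaN pass through
    return x;
  const int saved = std::fegetround();
  std::fesetround(FE_DOWNWARD);      // like fsrmi 2: round towards minus infinity
  volatile float in = x;             // force the conversion to happen at run time
  volatile long i = std::lrint(in);  // like vfcvt.x.f under the dynamic rounding mode
  std::fesetround(saved);            // like fsrm: restore the previous rounding mode
  return std::copysign(static_cast<float>(i), x);  // keep -0.0f for a -0.0f input
}
```

Unlike the vectorized loop, which sets the rounding mode once outside the
loop, this model switches it on every call.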

gcc/ChangeLog:

* config/riscv/autovec.md (floor<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_floor): New function decl.
* config/riscv/riscv-v.cc (gen_floor_const_fp): New function impl.
(expand_vec_floor): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-floor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-floor-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove FP run test for ceil.

FP16 is not well reconciled when linking.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: Remove.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

c++ __integer_pack conversion again [PR111357]

As Jakub pointed out, the real problem here is that in a partial
substitution we're forgetting the conversion to the type of the non-type
template argument, because maybe_convert_nontype_argument doesn't do
anything with value-dependent arguments. I'm experimenting with changing
that, but in the meantime we can work around it here.

PR c++/111357

gcc/cp/ChangeLog:

* pt.cc (expand_integer_pack): Use IMPLICIT_CONV_EXPR.

c++: constexpr and designated initializer

The change of active member being non-constant (before C++20) results in a
CONSTRUCTOR with a null value for the first field, don't crash.

gcc/cp/ChangeLog:

* constexpr.cc (free_constructor): Handle null ce->value.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-union7.C: New test.

c++: unroll pragma in templates [PR111529]

We were failing to handle ANNOTATE_EXPR in tsubst_copy_and_build, leading to
problems with substitution of any wrapped expressions.

Let's also not tell users that lambda templates are available in C++14.

PR c++/111529

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_declarator_opt): Don't suggest
-std=c++14 for lambda templates.
* pt.cc (tsubst_expr): Move ANNOTATE_EXPR handling...
(tsubst_copy_and_build): ...here.

gcc/testsuite/ChangeLog:

* g++.dg/ext/unroll-4.C: New test.

RISC-V: Refine the code gen for ceil auto vectorization.

We already vectorize the ceil code below.

void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1

After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2

We can generate better code, including the items below.

* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(emit_vec_float_cmp_mask): Rename.
(expand_vec_copysign): Ditto.
(emit_vec_copysign): Ditto.
(emit_vec_abs): New function impl.
(emit_vec_cvt_x_f): Ditto.
(emit_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add VLS mode widen ternary tests

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS modes.
* gcc.target/riscv/rvv/autovec/vls/wfma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wfma-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wfma-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wfms-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wfnma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wfnms-1.c: New test.

RISC-V: Add VLS widen binary combine patterns

Regression passed.

Committed.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Extend VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS modes cond tests.
* gcc.target/riscv/rvv/autovec/vls/wadd-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wadd-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wadd-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wadd-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wmul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wmul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wmul-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wsub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wsub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wsub-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/wsub-4.c: New test.

c++: missing SFINAE in grok_array_decl [PR111493]

We should guard both the diagnostic and backward compatibility fallback
code with tf_error, so that in a SFINAE context we don't issue any
diagnostics and correctly treat ill-formed C++23 multidimensional
subscript operator expressions as such.

PR c++/111493

gcc/cp/ChangeLog:

* decl2.cc (grok_array_decl): Guard diagnostic and backward
compatibility fallback code paths with tf_error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/subscript15.C: New test.

c++: constraint rewriting during ttp coercion [PR111485]

In order to compare the constraints of a ttp with that of its argument,
we rewrite the ttp's constraints in terms of the argument template's
template parameters. The substitution to achieve this currently uses a
single level of template arguments, but that never does the right thing
because a ttp's template parameters always have level >= 2. This patch
fixes this by including the outer template arguments in the substitution,
which ought to match the depth of the ttp.

The second testcase demonstrates it's better to substitute the concrete
outer template arguments instead of generic ones since a ttp's constraints
could depend on outer parameters.

PR c++/111485

gcc/cp/ChangeLog:

* pt.cc (is_compatible_template_arg): New parameter 'args'.
Add the outer template arguments 'args' to 'new_args'.
(convert_template_argument): Pass 'args' to
is_compatible_template_arg.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-ttp5.C: New test.
* g++.dg/cpp2a/concepts-ttp6.C: New test.

RISC-V: Move ceil test cases to unop folder

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/math-ceil-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c: ...here.
* gcc.target/riscv/rvv/autovec/test-math.h: Moved to...
* gcc.target/riscv/rvv/autovec/unop/test-math.h: ...here.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove @ of vec_duplicate pattern

It's obvious the @ of the vec_duplicate pattern is redundant.

Regression passed.

Committed.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (gen_const_vector_dup): Use global expand function.
* config/riscv/vector.md (@vec_duplicate<mode>): Remove @.
(vec_duplicate<mode>): Ditto.

RISC-V: Add VLS conditional patterns support

Regression passed.

Committed.

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS conditional patterns.
* config/riscv/riscv-protos.h (expand_cond_unop): Ditto.
(expand_cond_binop): Ditto.
(expand_cond_ternop): Ditto.
* config/riscv/riscv-v.cc (expand_cond_unop): Ditto.
(expand_cond_binop): Ditto.
(expand_cond_ternop): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS conditional tests.
* gcc.target/riscv/rvv/autovec/vls/cond_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_and-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_div-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_div-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fma-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fms-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnma-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnms-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_ior-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_max-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_max-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_min-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_min-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mod-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_not-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_shift-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_shift-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_sub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_sub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_xor-1.c: New test.

RISC-V: Rename the test macro for math autovec test

Rename TEST_CEIL to TEST_UNARY_CALL so that the macro can be used when
testing other function autovec patches.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/test-math.h: Rename.
* gcc.target/riscv/rvv/autovec/math-ceil-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Optimization of vrgather.vv into vrgatherei16.vv[PR111451]

Consider the following case:

typedef int32_t vnx32si __attribute__ ((vector_size (128)));

#define PERMUTE(TYPE, NUNITS)                                                  \
  __attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2,     \
                                               TYPE *out)                      \
  {                                                                            \
    TYPE v                                                                     \
      = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
    *(TYPE *) out = v;                                                         \
  }

#define TEST_ALL(T)                                                            \
  T (vnx32si, 32)

TEST_ALL (PERMUTE)

Before this patch:
  li a4,31
  vsetvli a5,zero,e32,m8,ta,ma
  vl8re32.v v24,0(a0)
  vid.v v8
  vrsub.vx v8,v8,a4
  vrgather.vv v16,v24,v8
  vs8r.v v16,0(a2)
  ret

The index vector register "v8" occupies 8 registers.
We can optimize this into vrgatherei16.vv, which
uses int16 index elements.

After this patch:
  vsetvli a5,zero,e16,m4,ta,ma
  li a4,31
  vid.v v4
  vl8re32.v v16,0(a0)
  vrsub.vx v4,v4,a4
  vsetvli zero,zero,e32,m8,ta,ma
  vrgatherei16.vv v8,v16,v4
  vs8r.v v8,0(a2)
  ret
With vrgatherei16.vv, the index vector (now v4) occupies 4 registers instead
of 8, which lowers register consumption and register pressure.
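
A scalar C sketch of what the gather computes may help here. It models only
the reversal permutation from the listing above (element i is taken from
element 31 - i) and shows that 16-bit indices are enough to address 32
elements. The function name and the fixed NUNITS are assumptions made for
illustration, not code from the patch.

```c
#include <stdint.h>

#define NUNITS 32 /* matches vnx32si: 32 x int32_t */

/* Scalar model of vid.v + vrsub.vx + vrgatherei16.vv for a reversal.
   The index vector uses 16-bit elements, so it needs half the storage
   of a 32-bit index vector while selecting the same data elements.  */
void
reverse_gather_ei16 (const int32_t *in, int32_t *out)
{
  uint16_t index[NUNITS];

  for (int i = 0; i < NUNITS; i++)
    index[i] = (uint16_t) (NUNITS - 1 - i); /* vid.v, then vrsub.vx with 31 */

  for (int i = 0; i < NUNITS; i++)
    out[i] = in[index[i]]; /* gather: out[i] = in[index[i]] */
}
```

The data elements stay 32 bits wide; only the index array shrinks, which is
the source of the register saving described above.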

PR target/111451

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_gather_insn): Optimization of vrgather.vv
into vrgatherei16.vv.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust case.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto.

RISC-V: Remove arch and abi option for run test case.

Remove the -march and -mabi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Remove arch and abi.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Support combine cond extend and reduce sum to widen reduce sum

This patch supports combining cond extend and reduce_sum into cond widen
reduce_sum, for example combining the following three insns:
   (set (reg:RVVM2HI 149)
        (if_then_else:RVVM2HI
          (unspec:RVVMF8BI [
            (const_vector:RVVMF8BI repeat [
              (const_int 1 [0x1])
            ])
            (reg:DI 146)
            (const_int 2 [0x2]) repeated x2
            (const_int 1 [0x1])
            (reg:SI 66 vl)
            (reg:SI 67 vtype)
          ] UNSPEC_VPREDICATE)
         (const_vector:RVVM2HI repeat [
           (const_int 0 [0])
         ])
         (unspec:RVVM2HI [
           (reg:SI 0 zero)
         ] UNSPEC_VUNDEF)))
  (set (reg:RVVM2HI 138)
    (if_then_else:RVVM2HI
      (reg:RVVMF8BI 135)
      (reg:RVVM2HI 148)
      (reg:RVVM2HI 149)))
  (set (reg:HI 150)
    (unspec:HI [
      (reg:RVVM2HI 138)
    ] UNSPEC_REDUC_SUM))
into one insn:
  (set (reg:SI 147)
    (unspec:SI [
      (if_then_else:RVVM2SI
        (reg:RVVMF16BI 135)
        (sign_extend:RVVM2SI (reg:RVVM1HI 136))
        (if_then_else:RVVM2HI
          (unspec:RVVMF8BI [
            (const_vector:RVVMF8BI repeat [
              (const_int 1 [0x1])
            ])
            (reg:DI 146)
            (const_int 2 [0x2]) repeated x2
            (const_int 1 [0x1])
            (reg:SI 66 vl)
            (reg:SI 67 vtype)
          ] UNSPEC_VPREDICATE)
         (const_vector:RVVM2HI repeat [
           (const_int 0 [0])
         ])
         (unspec:RVVM2HI [
           (reg:SI 0 zero)
         ] UNSPEC_VUNDEF)))
    ] UNSPEC_REDUC_SUM))

Consider the following C code:

int16_t foo (int8_t *restrict a, int8_t *restrict pred)
{
  int16_t sum = 0;
  for (int i = 0; i < 16; i += 1)
    if (pred[i])
      sum += a[i];
  return sum;
}

assembly before this patch:

foo:
        vsetivli        zero,16,e16,m2,ta,ma
        li      a5,0
        vmv.v.i v2,0
        vsetvli zero,zero,e8,m1,ta,ma
        vl1re8.v        v0,0(a1)
        vmsne.vi        v0,v0,0
        vsetvli zero,zero,e16,m2,ta,mu
        vle8.v  v4,0(a0),v0.t
        vmv.s.x v1,a5
        vsext.vf2       v2,v4,v0.t
        vredsum.vs      v2,v2,v1
        vmv.x.s a0,v2
        slliw   a0,a0,16
        sraiw   a0,a0,16
        ret

assembly after this patch:

foo:
        li      a5,0
        vsetivli        zero,16,e16,m1,ta,ma
        vmv.s.x v3,a5
        vsetivli        zero,16,e8,m1,ta,ma
        vl1re8.v        v0,0(a1)
        vmsne.vi        v0,v0,0
        vle8.v  v2,0(a0),v0.t
        vwredsum.vs     v1,v2,v3,v0.t
        vsetivli        zero,0,e16,m1,ta,ma
        vmv.x.s a0,v1
        slliw   a0,a0,16
        sraiw   a0,a0,16
        ret
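
As a rough scalar model of why the combine is valid, the sketch below
compares an unfused form (widen under the mask into a temporary, then
reduce) with a fused masked widening reduction. Both function names and the
fixed length N are hypothetical; this only illustrates the arithmetic, not
the RTL patterns themselves.

```c
#include <stdint.h>

#define N 16 /* loop trip count used in the C example above */

/* Unfused form: materialize a widened vector (zero in inactive lanes),
   then reduce it.  */
int16_t
select_then_reduce (const int8_t *a, const int8_t *pred)
{
  int16_t widened[N];
  for (int i = 0; i < N; i++)
    widened[i] = pred[i] ? (int16_t) a[i] : 0;

  int16_t sum = 0;
  for (int i = 0; i < N; i++)
    sum = (int16_t) (sum + widened[i]);
  return sum;
}

/* Fused form: widen and accumulate only the active lanes in one pass,
   as a masked widening reduction would.  */
int16_t
masked_widen_reduce (const int8_t *a, const int8_t *pred)
{
  int16_t sum = 0;
  for (int i = 0; i < N; i++)
    if (pred[i])
      sum = (int16_t) (sum + a[i]);
  return sum;
}
```

Inactive lanes contribute zero in the first form and are skipped in the
second, so both return the same sum for any input.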

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_widen_reduc_plus_scal_<mode>):
New combine patterns.
* config/riscv/riscv-protos.h (enum insn_type): New insn_type.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-2.c: New test.

RISC-V: Split VLS avl_type from NONVLMAX avl_type

This patch splits a VLS avl_type from the NONVLMAX avl_type, denoting
those RVV insns whose length is set to the number of units of a VLS mode.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum avl_type): New VLS avl_type.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Move comments.

RISC-V: Leverage __builtin_xx instead of math.h for test

math.h may cause problems in some environments, so use __builtin_xx
instead for testing.
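
A minimal sketch of the idea, assuming a GCC-compatible compiler: the
builtin is available without including math.h. The function below is a
hypothetical illustration and is not taken from the adjusted tests.

```c
/* No #include <math.h>: __builtin_copysignf is provided by the compiler.  */
float
mul_by_sign (float a, float b)
{
  /* Multiply a by +1.0 or -1.0 depending on the sign of b.  */
  return a * __builtin_copysignf (1.0f, b);
}
```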

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/floating-point-max-5.c:
Remove reference to math.h.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sgnjx-2.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Support ceil and ceilf auto-vectorization

Update in v4:

* Add test for _Float16.
* Remove unnecessary macro in def.h for test.

Original log:

This patch adds auto-vectorization support for both ceil and ceilf
from math.h. It depends on the -ffast-math option.

When we call ceil/ceilf like v2 = ceil (v1), we convert it into the
insns below (following the implementation in LLVM).

* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3

However, a floating-point value does not need the cvt above if it has
no fractional part. For example, consider the single-precision values below.

  +-----------+---------------+
  | float     | binary layout |
  +-----------+---------------+
  | 8388607.5 | 0x4affffff    |
  | 8388608.0 | 0x4b000000    |
  | 8388609.0 | 0x4b000001    |
  +-----------+---------------+

All single-precision values with magnitude at or above 8388608.0 (2^23) have
no fractional bits in the mantissa. We leverage vmflt and a mask to filter
them out of the vector and only do the cvt on the masked elements.

Before this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    ceilf
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi   3
.L4:
  vfabs.v     v0,v1
  vmv1r.v     v2,v1
  vmflt.vv    v0,v0,v4
  sub         a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv   v2,v2,v1
  bne         .L4
.L14:
  fsrm    a6
  ret
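
The per-element logic of the vector sequence above (magnitude test, masked
convert pair with round-up, sign restore) can be sketched in scalar C as
follows. This is only a model under stated assumptions: the function name is
hypothetical, the round-up conversion is emulated with a truncating cast plus
a correction instead of a rounding-mode change, and the builtins assume a
GCC-compatible compiler.

```c
#include <stdint.h>

/* Scalar model of the masked ceil sequence for single precision.  */
float
ceil_sketch (float x)
{
  const float limit = 8388608.0f; /* 2^23: no fractional bits at or above */

  /* vfabs + vmflt: only lanes with |x| < 2^23 are converted.  NaN and
     infinity fail the comparison and pass through unchanged.  */
  if (!(__builtin_fabsf (x) < limit))
    return x;

  /* vfcvt.x.f with round-up, emulated: truncate toward zero, then step
     up by one if truncation went below x.  */
  int32_t i = (int32_t) x;
  if ((float) i < x)
    i += 1;

  /* vfcvt.f.x, then vfsgnj to restore the sign (keeps -0.0 for inputs
     in (-1, 0)).  */
  return __builtin_copysignf ((float) i, x);
}
```

For the values in the table above, 8388607.5 takes the conversion path and
yields 8388608.0, while 8388608.0 and 8388609.0 are returned unchanged.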

Please note VLS mode is also involved in this patch and covered by the
test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (ceil<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector.md: Add VLS mode support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/math-ceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
* gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

RISC-V: Add VLS integer ABS support

Regression passed.

Committed.

gcc/ChangeLog:

* config/riscv/autovec.md: Extend VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/abs-2.c: New test.

RISC-V: Add more VLS unary tests

Notice we are missing these tests.

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/abs-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/not-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/sqrt-1.c: New test.

RISC-V: Support VLS mult high

Regression passed.

Committed.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Extend VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS mult high.
* gcc.target/riscv/rvv/autovec/vls/mulh-1.c: New test.

RISC-V: Adjusting the comments of the emit_vlmax_insn/emit_vlmax_insn_lra/emit_nonvlmax_insn functions

V2 Change: Use Robin's comments.

This patch adjusts the comments of the
emit_vlmax_insn/emit_vlmax_insn_lra/emit_nonvlmax_insn functions.
The purpose of the adjustment is to make it clear that vlmax here is not
VLMAX as defined in the RVV ISA, because these functions are used
by VLS modes (e.g. V16QI) in addition to RVV modes (e.g. RVVM1SImode). For an
RVV mode, it means the same thing; for a VLS mode, it indicates setting the vl
to the number of units of the mode. Only the comments are changed because I
could not think of a better name. If there is a suitable name, feel free to
discuss it.
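
The distinction can be illustrated with a toy C model. Everything in it
(the enum, the function, and its parameters) is hypothetical and exists only
to show the two meanings of "vlmax" described above; it is not GCC code.

```c
/* Hypothetical classification of a vector mode.  */
enum mode_kind
{
  RVV_MODE, /* scalable mode, e.g. RVVM1SImode */
  VLS_MODE  /* fixed-length mode, e.g. V16QI */
};

/* Return the vl that a "vlmax" emit helper would request in this model:
   the ISA VLMAX for an RVV mode, the number of units for a VLS mode.  */
unsigned
requested_vl (enum mode_kind kind, unsigned isa_vlmax, unsigned nunits)
{
  return kind == RVV_MODE ? isa_vlmax : nunits;
}
```

For a V16QI-like mode with 16 units, the model returns 16 regardless of the
hardware VLMAX.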

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_insn): Adjust comments.
(emit_nonvlmax_insn): Adjust comments.
(emit_vlmax_insn_lra): Adjust comments.

Co-Authored-By: Robin Dapp <rdapp.gcc@gmail.com>

rust: Implement TARGET_RUST_OS_INFO for *-*-*linux*.

gcc/ChangeLog:

* config.gcc (*linux*): Set rust_target_objs and
target_has_targetrustm.
* config/t-linux (linux-rust.o): New rule.
* config/linux-rust.cc: New file.

rust: Implement TARGET_RUST_OS_INFO for i[34567]86-*-mingw* and x86_64-*-mingw*.

gcc/ChangeLog:

* config.gcc (i[34567]86-*-mingw* | x86_64-*-mingw*): Set
rust_target_objs and target_has_targetrustm.
* config/t-winnt (winnt-rust.o): New rule.
* config/winnt-rust.cc: New file.

rust: Implement TARGET_RUST_OS_INFO for *-*-fuchsia*.

gcc/ChangeLog:

* config.gcc (*-*-fuchsia*): Set tmake_rule, rust_target_objs,
and target_has_targetrustm.
* config/fuchsia-rust.cc: New file.
* config/t-fuchsia: New file.

rust: Implement TARGET_RUST_OS_INFO for *-*-vxworks*

gcc/ChangeLog:

* config.gcc (*-*-vxworks*): Set rust_target_objs and
target_has_targetrustm.
* config/t-vxworks (vxworks-rust.o): New rule.
* config/vxworks-rust.cc: New file.

rust: Implement TARGET_RUST_OS_INFO for *-*-dragonfly*

gcc/ChangeLog:

* config.gcc (*-*-dragonfly*): Set rust_target_objs and
target_has_targetrustm.
* config/t-dragonfly (dragonfly-rust.o): New rule.
* config/dragonfly-rust.cc: New file.

rust: Implement TARGET_RUST_OS_INFO for *-*-solaris2*.

gcc/ChangeLog:

* config.gcc (*-*-solaris2*): Set rust_target_objs and
target_has_targetrustm.
* config/t-sol2 (sol2-rust.o): New rule.
* config/sol2-rust.cc: New file.

rust: Implement TARGET_RUST_OS_INFO for *-*-openbsd*

gcc/ChangeLog:

* config.gcc (*-*-openbsd*): Set rust_target_objs and
target_has_targetrustm.
* config/t-openbsd (openbsd-rust.o): New rule.
* config/openbsd-rust.cc: New file.

rust: Implement TARGET_RUST_OS_INFO for *-*-netbsd*

gcc/ChangeLog:

* config.gcc (*-*-netbsd*): Set rust_target_objs and
target_has_targetrustm.
* config/t-netbsd (netbsd-rust.o): New rule.
* config/netbsd-rust.cc: New file.

rust: Implement TARGET_RUST_OS_INFO for *-*-freebsd*

gcc/ChangeLog:

* config.gcc (*-*-freebsd*): Set rust_target_objs and
target_has_targetrustm.
* config/t-freebsd (freebsd-rust.o): New rule.
* config/freebsd-rust.cc: New file.

rust: Implement TARGET_RUST_OS_INFO for *-*-darwin*

gcc/ChangeLog:

* config.gcc (*-*-darwin*): Set rust_target_objs and
target_has_targetrustm.
* config/t-darwin (darwin-rust.o): New rule.
* config/darwin-rust.cc: New file.

rust: Implement TARGET_RUST_CPU_INFO for i[34567]86-*-* and x86_64-*-*

A large part of the previously reverted i386-rust.cc is still
missing, so this is only a partial reimplementation.

gcc/ChangeLog:

* config/i386/t-i386 (i386-rust.o): New rule.
* config/i386/i386-rust.cc: New file.
* config/i386/i386-rust.h: New file.

rust: Reintroduce TARGET_RUST_OS_INFO hook

gcc/ChangeLog:

* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document TARGET_RUST_OS_INFO.

gcc/rust/ChangeLog:

* rust-session-manager.cc (Session::init): Call
targetrustm.rust_os_info.
* rust-target.def (rust_os_info): New hook.

rust: Reintroduce TARGET_RUST_CPU_INFO hook

gcc/ChangeLog:

* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add @node for Rust language and ABI, and document
TARGET_RUST_CPU_INFO.

gcc/rust/ChangeLog:

* rust-lang.cc (rust_add_target_info): Remove sorry.
* rust-session-manager.cc: Replace include of target.h with
include of tm.h and rust-target.h.
(Session::init): Call targetrustm.rust_cpu_info.
* rust-target.def (rust_cpu_info): New hook.
* rust-target.h (rust_add_target_info): Declare.
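
To show how the two hooks and rust_add_target_info fit together, here is a
self-contained mock in C. It is a sketch under assumptions, not the GCC
implementation: the storage, the mock hook table type, the example hook
bodies, the key/value strings, and the signature of rust_add_target_info are
all made up for illustration. Only the names rust_add_target_info,
targetrustm, rust_cpu_info and rust_os_info come from the log above.

```c
#include <string.h>

#define MAX_INFO 16

/* Mock key/value store that the hooks fill in.  */
static const char *info_keys[MAX_INFO];
static const char *info_values[MAX_INFO];
static int info_count;

/* Stand-in for rust_add_target_info; the real signature may differ.  */
static void
rust_add_target_info (const char *key, const char *value)
{
  if (info_count < MAX_INFO)
    {
      info_keys[info_count] = key;
      info_values[info_count] = value;
      info_count++;
    }
}

/* Hypothetical hook implementations with made-up pairs.  */
static void
example_cpu_info (void)
{
  rust_add_target_info ("target_arch", "x86_64");
}

static void
example_os_info (void)
{
  rust_add_target_info ("target_os", "linux");
}

/* Mock hook table: one function pointer per hook.  */
struct mock_targetrustm
{
  void (*rust_cpu_info) (void);
  void (*rust_os_info) (void);
};

static struct mock_targetrustm targetrustm
  = { example_cpu_info, example_os_info };

/* Mock of session initialization: call both hooks and return the number
   of recorded pairs.  */
int
session_init (void)
{
  info_count = 0;
  targetrustm.rust_cpu_info ();
  targetrustm.rust_os_info ();
  return info_count;
}

/* Look up a recorded value, or return NULL if the key is absent.  */
const char *
lookup_target_info (const char *key)
{
  for (int i = 0; i < info_count; i++)
    if (strcmp (info_keys[i], key) == 0)
      return info_values[i];
  return NULL;
}
```

In this model a target only supplies the two hook functions; the session
code stays the same for every target.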