gcc.gnu.org Git - gcc.git/log

rtl-optimization: [PR102733] DSE removing address which only differ by address space.

The problem here is DSE was not taking into account the address space
which meant if you had two addresses say `fs:0` and `gs:0` (on x86_64),
DSE would think they were the same and remove the first store.
This fixes that issue by adding a check for the address space too.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR rtl-optimization/102733

gcc/ChangeLog:

* dse.cc (store_info): Add addrspace field.
(record_store): Record the address space
and check to make sure they are the same.

gcc/testsuite/ChangeLog:

* gcc.target/i386/addr-space-6.c: New test.

Fix PR 110042: ifcvt regression due to paradoxical subregs

After r14-1014-gc5df248509b489364c573e8, GCC started to emit
directly a zero_extract for `(t1&0x8)!=0`. This introduced
a small regression where ifcvt would not do the ifconversion
as there is now a paradoxical subreg in the dest which
was being rejected. Since paradoxical subreg set the whole
register, we can treat it as the same as a reg in the two places.

OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

gcc/ChangeLog:

PR rtl-optimization/110042
* ifcvt.cc (bbs_ok_for_cmove_arith): Allow paradoxical subregs.
(bb_valid_for_noce_process_p): Strip the subreg for the SET_DEST.

gcc/testsuite/ChangeLog:

PR rtl-optimization/110042
* gcc.target/aarch64/csel_bfx_2.c: New test.

Darwin, PPC: Fix struct layout with pragma pack [PR110044].

This bug was essentially that darwin_rs6000_special_round_type_align()
was ignoring externally-imposed capping of field alignment.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/110044

gcc/ChangeLog:

* config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align):
Make sure that we do not have a cap on field alignment before altering
the struct layout based on the type alignment of the first entry.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/darwin-abi-13-0.c: New test.
* gcc.target/powerpc/darwin-abi-13-1.c: New test.
* gcc.target/powerpc/darwin-abi-13-2.c: New test.
* gcc.target/powerpc/darwin-structs-0.h: New test.

Fortran: fix diagnostics for SELECT RANK [PR100607]

gcc/fortran/ChangeLog:

PR fortran/100607
* resolve.cc (resolve_select_rank): Remove duplicate error.
(resolve_fl_var_and_proc): Prevent NULL pointer dereference and
suppress error message for temporary.

gcc/testsuite/ChangeLog:

PR fortran/100607
* gfortran.dg/select_rank_6.f90: New test.

btf: fix bootstrap -Wformat errors [PR110073]

Commit 7aae58b04b9 "btf: improve -dA comments for testsuite" broke
bootstrap on a number of architectures because it introduced some
new -Wformat errors.

Fix those errors by properly using PRIu64 and a small refactor to
the offending code.

Based on the suggested patch from Rainer Orth.

PR debug/110073

gcc/ChangeLog:

* btfout.cc (btf_absolute_func_id): New function.
(btf_asm_func_type): Call it here. Change index parameter from
size_t to ctf_id_t. Use PRIu64 formatter.

btf: Fix -Wformat errors

g:7aae58b04b92303ccda3ead600be98f0d4b7f462 introduced -Wformat errors
breaking bootstrap on some targets. This patch fixes that.

Committed as obvious.

gcc/ChangeLog:

* btfout.cc (btf_asm_type): Use PRIu64 instead of %lu for uint64_t.
(btf_asm_datasec_type): Likewise.

c++: fix explicit/copy problem [PR109247]

In the testcase, the user wants the assignment to use the operator= declared
in the class, but because [over.match.list] says that explicit constructors
are also considered for list-initialization, as affirmed in CWG1228, we end
up choosing the implicitly-declared copy assignment operator, using the
explicit constructor template for the argument, which is ill-formed. Other
implementations haven't implemented CWG1228, so we keep getting bug reports.

Discussion in CWG led to the idea for this targeted relaxation: if we use an
explicit constructor for the conversion to the argument of a copy or move
special member function, that makes the candidate worse than another.

DR 2735
PR c++/109247

gcc/cp/ChangeLog:

* call.cc (sfk_copy_or_move): New.
(joust): Add tiebreaker for explicit conv and copy ctor.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-explicit3.C: New test.

rs6000: Fix arguments for __builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrhx

The third argument for __builtin_altivec_tr_stxvrhx should be short *
not int *. Similarly, the third argument for __builtin_altivec_tr_stxvrwx
should be int * not short *. This patch fixes the arguments in the two
builtins.

A runnable test case is added to test the __builtin_altivec_tr_stxvrbx,
__builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx and
__builtin_altivec_tr_stxvrdx builtins.

gcc/
* config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of third argument.

gcc/testsuite/
* gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c: New test
for __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx.

c++: make initializer_list array static again [PR110070]

After the maybe_init_list_as_* patches, I noticed that we were putting the
array of strings into .rodata, but then memcpying it into an automatic
array, which is pointless; we should be able to use it directly.

This doesn't happen automatically because TREE_ADDRESSABLE is set (since
r12-657 for PR100464), and so gimplify_init_constructor won't promote the
variable to static.  Theoretically we could do escape analysis to recognize
that the address, though taken, never leaves the function; that would allow
promotion when we're only using the address for indexing within the
function, as in initlist-opt2.C.  But this would be a new pass.

And in initlist-opt1.C, we're passing the array address to another function,
so it definitely escapes; it's only safe in this case because it's calling a
standard library function that we know only uses it for indexing.  So, a
flag seems needed.  I first thought to put the flag on the TARGET_EXPR, but
the VAR_DECL seems more appropriate.

In a previous revision of the patch I called this flag DECL_NOT_OBSERVABLE,
but I think DECL_MERGEABLE is a better name, especially if we're going to
apply it to the backing array of initializer_list, which is observable.  I
then also check it in places that check for -fmerge-all-constants, so that
multiple equivalent initializer-lists can also be combined.  And then it
seemed to make sense for [[no_unique_address]] to have this meaning for
user-written variables.

I think the note in [dcl.init.list]/6 intended to allow this kind of merging
for initializer_lists, but it didn't actually work; for an explicit array
with the same initializer, if the address escapes the program could tell
whether the same variable in two frames have the same address.  P2752 is
trying to correct this defect, so I'm going to assume that this is the
intent.

PR c++/110070
PR c++/105838

gcc/ChangeLog:

* tree.h (DECL_MERGEABLE): New.
* tree-core.h (struct tree_decl_common): Mention it.
* gimplify.cc (gimplify_init_constructor): Check it.
* cgraph.cc (symtab_node::address_can_be_compared_p): Likewise.
* varasm.cc (categorize_decl_for_section): Likewise.

gcc/cp/ChangeLog:

* call.cc (maybe_init_list_as_array): Set DECL_MERGEABLE.
(convert_like_internal) [ck_list]: Set it.
(set_up_extended_ref_temp): Copy it.
* tree.cc (handle_no_unique_addr_attribute): Set it.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/initlist-opt1.C: Check for static array.
* g++.dg/tree-ssa/initlist-opt2.C: Likewise.
* g++.dg/tree-ssa/initlist-opt4.C: New test.
* g++.dg/opt/icf1.C: New test.
* g++.dg/opt/icf2.C: New test.
* g++.dg/opt/icf3.C: New test.
* g++.dg/tree-ssa/array-temp1.C: Revert r12-657 change.

reg-stack: Change return type of predicate functions from int to bool

Also change some internal variables to bool and recode handling of
boolean varialbes to not use bitwise or.

gcc/ChangeLog:

* rtl.h (stack_regs_mentioned): Change return type from int to bool.
* reg-stack.cc (struct_block_info_def): Change "done" to bool.
(stack_regs_mentioned_p): Change return type from int to bool
and adjust function body accordingly.
(stack_regs_mentioned): Ditto.
(check_asm_stack_operands): Ditto.  Change "malformed_asm"
variable to bool.
(move_for_stack_reg): Recode handling of control_flow_insn_deleted.
(swap_rtx_condition_1): Change return type from int to bool
and adjust function body accordingly.  Change "r" variable to bool.
(swap_rtx_condition): Change return type from int to bool
and adjust function body accordingly.
(subst_stack_regs_pat): Recode handling of control_flow_insn_deleted.
(subst_stack_regs): Ditto.
(convert_regs_entry): Change return type from int to bool and adjust
function body accordingly.  Change "inserted" variable to bool.
(convert_regs_1): Recode handling of control_flow_insn_deleted.
(convert_regs_2): Recode handling of cfg_altered.
(convert_regs): Ditto.  Change "inserted" variable to bool.

varasm: check float size

In PR95226, the testcase was failing because we tried to output_constant a
NOP_EXPR to float from a double REAL_CST, and so we output a double where
the caller wanted a float. That doesn't happen anymore, but with the
output_constant hunk we will ICE in that situation rather than emit the
wrong number of bytes.

Part of the problem was that initializer_constant_valid_p_1 returned true
for that NOP_EXPR, because it compared the sizes of integer types but not
floating-point types. So the C++ front end assumed it didn't need to fold
the initializer.

PR c++/95226

gcc/ChangeLog:

* varasm.cc (output_constant) [REAL_TYPE]: Check that sizes match.
(initializer_constant_valid_p_1): Compare float precision.

analyzer: implement various atomic builtins [PR109015]

This patch implements many of the __atomic_* builtins from
sync-builtins.def as known_function subclasses within the analyzer.

gcc/analyzer/ChangeLog:
PR analyzer/109015
* kf.cc (class kf_atomic_exchange): New.
(class kf_atomic_exchange_n): New.
(class kf_atomic_fetch_op): New.
(class kf_atomic_op_fetch): New.
(class kf_atomic_load): New.
(class kf_atomic_load_n): New.
(class kf_atomic_store_n): New.
(register_atomic_builtins): New function.
(register_known_functions): Call register_atomic_builtins.

gcc/testsuite/ChangeLog:
PR analyzer/109015
* gcc.dg/analyzer/atomic-builtins-1.c: New test.
* gcc.dg/analyzer/atomic-builtins-haproxy-proxy.c: New test.
* gcc.dg/analyzer/atomic-builtins-qemu-sockets.c: New test.
* gcc.dg/analyzer/atomic-types-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: regions in different memory spaces can't alias

gcc/analyzer/ChangeLog:
* store.cc (store::eval_alias_1): Regions in different memory
spaces can't alias.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite: Require LTO for pr107557-[12].c

pr107557-[12].c invoke -flto option but do not check that the target
support LTO. This patch adds dg-require lto to the testcases.

* gcc.dg/pr107557-1.c: Require LTO support.
* gcc.dg/pr107557-2.c: Require LTO support.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>

doc: clarify semantics of vector bitwise shifts

Explicitly say that attempted shift past element bit width is UB for
vector types. Mention that integer promotions do not happen.

gcc/ChangeLog:

* doc/extend.texi (Vector Extensions): Clarify bitwise shift
semantics.

VECT: Change flow of decrement IV

Follow Richi's suggestion, I change current decrement IV flow from:

do {
 remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
 old_remain = remain;
 len = MIN (vf, remain);
 remain -= vf;
} while (old_remain >= vf);

to enhance SCEV.

Include fixes from kewen.

This patch will need to wait for Kewen's test feedback.

Testing on X86 is on-going

Co-Authored by: Kewen Lin <linkw@linux.ibm.com>

 PR tree-optimization/109971

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.

target/110088: Improve operation of l-reg with const after move from d-reg.

After reload, there may be sequences like
 lreg = dreg
 lreg = lreg <op> const
with an LD_REGS dreg, non-LD_REGS lreg, and <op> in PLUS, IOR, AND.
If dreg dies after the first insn, it is possible to use
 dreg = dreg <op> const
 lreg = dreg
instead which is more efficient.

gcc/
PR target/110088
* config/avr/avr.md: Add an RTL peephole to optimize operations on
non-LD_REGS after a move from LD_REGS.
(piaop): New code iterator.

libstdc++: Fix broken _GLIBCXX_PARALLEL mode

Add missing <parallel/search.h> include in <parallel/algobase.h>.

libstdc++-v3/ChangeLog:

* include/parallel/algobase.h: Include <parallel/search.h>.

Support parallel testing in libgomp: fallback Perl 'flock' [PR66005]

Follow-up to commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba
"Support parallel testing in libgomp, part II [PR66005]"
("..., and enable if 'flock' is available for serializing execution testing"),
where we saw:

> On my Dell Precision 7530 laptop:
>
> $ uname -srvi
> Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
> $ grep '^model name' < /proc/cpuinfo | uniq -c
> 12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
> $ nvidia-smi -L
> GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
>
> ... [...]: case (c) standard configuration, no offloading
> configured, [...]

> $ \time make check-target-libgomp
>
> Case (c), baseline; [...]:
>
> 1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
> 1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k
>
> Case (c), parallelized [using 'flock']:
>
> [...]
> -j12 GCC_TEST_PARALLEL_SLOTS=12
> 2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
> 2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k

Quite the same when instead of 'flock' using this fallback Perl 'flock':

 2565.23user 194.35system 4:46.77elapsed 962%CPU (0avgtext+0avgdata 505216maxresident)k
 2549.38user 200.20system 4:46.08elapsed 961%CPU (0avgtext+0avgdata 505216maxresident)k

PR testsuite/66005
gcc/
* doc/install.texi: Document (optional) Perl usage for parallel
testing of libgomp.
libgomp/
* testsuite/lib/libgomp.exp: 'flock' through stdout.
* testsuite/flock: New.
* configure.ac (FLOCK): Point to that if no 'flock' available, but
'perl' is.
* configure: Regenerate.

Remove stale Autoconf checks for Perl

Subversion r110220 (Git commit 03b8fe495d716c004f5491eb2347537f115ab2d8) for
PR25884 "libgomp should not require perl to compile" removed all '$(PERL)'
usage from libgomp -- but didn't remove the then-unused Autoconf Perl check
itself. Later, this Autoconf Perl check appears to have been copied from
libgomp into other GCC libraries, likewise unused.

libgomp/
* configure.ac (PERL): Remove.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
libatomic/
* configure.ac (PERL): Remove.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
libgm2/
* configure.ac (PERL): Remove.
* configure: Regenerate.
* Makefile.in: Likewise.
* libm2cor/Makefile.in: Likewise.
* libm2iso/Makefile.in: Likewise.
* libm2log/Makefile.in: Likewise.
* libm2min/Makefile.in: Likewise.
* libm2pim/Makefile.in: Likewise.
libitm/
* configure.ac (PERL): Remove.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.

Back to requiring "Perl version 5.6.1 (or later)" [PR82856]

With Subversion r265695 (Git commit 22e052725189a472e4e86ebb6595278a49f4bcdd)
"Update GCC to autoconf 2.69, automake 1.15.1 (PR bootstrap/82856)" we're back
to normal; per Automake 1.15.1 'configure.ac' still "[...] perl 5.6 or better
is required [...]".

PR bootstrap/82856
gcc/
* doc/install.texi (Perl): Back to requiring "Perl version 5.6.1 (or
later)".

Fortran: Fix some problems blocking associate meta-bug [PR87477]

2023-06-02 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/87477
* parse.cc (parse_associate): Replace the existing evaluation
of the target rank with calls to gfc_resolve_ref and
gfc_expression_rank. Identify untyped target function results
with structure constructors by finding the appropriate derived
type.
* resolve.cc (resolve_symbol): Allow associate variables to be
assumed shape.

gcc/testsuite/
PR fortran/87477
* gfortran.dg/associate_54.f90 : Cope with extra error.

PR fortran/102109
* gfortran.dg/pr102109.f90 : New test.

PR fortran/102112
* gfortran.dg/pr102112.f90 : New test.

PR fortran/102190
* gfortran.dg/pr102190.f90 : New test.

PR fortran/102532
* gfortran.dg/pr102532.f90 : New test.

PR fortran/109948
* gfortran.dg/pr109948.f90 : New test.

PR fortran/99326
* gfortran.dg/pr99326.f90 : New test.

RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

Base on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233

Add _mu C++ overloaded intrinsics for load && viota && vid.

Co-authored-by: KuanLin Chen <best124612@gmail.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto.

RISC-V: Optimize reverse series index vector

This patch optimizes the following seriese vector:
[nunits - 1, nunits - 2, ...., 0]

Before this patch:
vid
vmul
vadd

After this patch:
vid
vrsub

This patch is an obvious and simple optimization, ok for trunk?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Optimize reverse series index vector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Add assembly check.

RISC-V: Fix warning in predicated.md

Notice there is warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
(match_test "INTVAL (op) == GET_MODE_MASK (HImode)
../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
|| INTVAL (op) == GET_MODE_MASK (SImode)"))))

gcc/ChangeLog:

* config/riscv/predicates.md: Change INTVAL into UINTVAL.

MAINTAINERS: Add myself as MIPS port maintainer

ChangeLog:

* MAINTAINERS (CPU Port Maintainers): Add myself as MIPS
port maintainer.
(Write After Approval): Remove myself.

RISC-V: Add test for vfloat16*_t (non tuple) types

This patch would like to add some test cases of vfloat16*_t (non tuple),
no 'zvfh' or 'zvfhmin' will meet unknown type.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-16.c: Add test cases.
* gcc.target/riscv/rvv/base/user-7.c: Likewise.

RISC-V: Add __RISCV_ prefix to VXRM and FRM enum

According to doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

Add __RISCV_ prefix to VXRM and FRM enum.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add
__RISCV_ prefix.
(DEF_RVV_FRM_ENUM): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/frm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-10.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-11.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-12.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-6.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-7.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-8.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.

RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimization

1. This patch optimize the codegen of the following auto-vectorization codes:

void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n)
{
 for (int i = 0; i < n; i++)
 c[i] = (int64_t)a[i] + b[i];
}

Combine instruction from:

...
vsext.vf2
vadd.vv
...

into:

...
vwadd.wv
...

Since for PLUS operation, GCC prefer the following RTL operand order when combining:

(plus: (sign_extend:..)
 (reg:)

instead of

(plus: (reg:..)
 (sign_extend:)

which is different from MINUS pattern.

I split patterns of vwadd/vwsub, and add dedicated patterns for them.

2. This patch not only optimize the case as above (1) mentioned, also enhance vwadd.vv/vwsub.vv
 optimization for complicate PLUS/MINUS codes, consider this following codes:

__attribute__ ((noipa)) void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
 int16_t *__restrict dst3, int8_t *__restrict a,
 int8_t *__restrict b, int8_t *__restrict a2,
 int8_t *__restrict b2, int n)
{
 for (int i = 0; i < n; i++)
 {
 dst[i] = (int16_t) a[i] + (int16_t) b[i];
 dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
 dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
 }
}

Before this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v2,0(a3)
vle8.v v1,0(a4)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v3,v2
vsext.vf2 v2,v1
vadd.vv v1,v2,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v1,0(a0)
vle8.v v4,0(a5)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v1,v4
vadd.vv v2,v1,v2
...

After this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v3,0(a4)
vle8.v v1,0(a3)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v2,v1,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v2,0(a0)
vle8.v v2,0(a5)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v4,v3,v2
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v4,0(a1)
vsetvli t4,zero,e8,mf2,ta,ma
sub a7,a7,a6
vwadd.vv v3,v2,v1
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v3,0(a2)
...

The reason why current upstream GCC can not optimize codes using vwadd thoroughly is combine PASS
needs intermediate RTL IR (extend one of the operand pattern (vwadd.wv)), then base on this intermediate
RTL IR, extend the other operand to generate vwadd.vv.

So vwadd.wv/vwsub.wv definitely helps to vwadd.vv/vwsub.vv code optimizations.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv
intrinsic API expander
* config/riscv/vector.md
(@pred_single_widen_<plus_minus:optab><any_extend:su><mode>): Remove it.
(@pred_single_widen_sub<any_extend:su><mode>): New pattern.
(@pred_single_widen_add<any_extend:su><mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.

RISC-V: Support RVV permutation auto-vectorization

This patch supports vector permutation for VLS only by vec_perm pattern.
We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.

Fixed following comments from Robin.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_perm<mode>): New pattern.
* config/riscv/predicates.md (vector_perm_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_perm): New function.
* config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_vec_perm): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.

Daily bump.

Fortran: force error on bad KIND specifier [PR88552]

gcc/fortran/ChangeLog:

PR fortran/88552
* decl.cc (gfc_match_kind_spec): Use error path on missing right
parenthesis.
(gfc_match_decl_type_spec): Use error return when an error occurred
during matching a KIND specifier.

gcc/testsuite/ChangeLog:

PR fortran/88552
* gfortran.dg/pr88552.f90: New test.

testsuite: print any leaking torture options for debugging

This was helpful when debugging the recent multilib testsuite failure.

gcc/testsuite:
* lib/torture-options.exp: print the value of non-empty options:
torture_without_loops, torture_with_loops, LTO_TORTURE_OPTIONS.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

testsuite: Unbork multilib setups using -march flags (RISC-V)

RISC-V multilib testing is currently busted with follow splat all over:

| Schedule of variations:
| riscv-sim/-march=rv64imafdc/-mabi=lp64d/-mcmodel=medlow
| riscv-sim/-march=rv32imafdc/-mabi=ilp32d/-mcmodel=medlow
| riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
| riscv-sim/-march=rv64imac/-mabi=lp64/-mcmodel=medlow
...
...
| ERROR: tcl error code NONE
| ERROR: torture-init: torture_without_loops is not empty as expected

causing insane amount of false failures.

| ========= Summary of gcc testsuite =========
| | # of unexpected case / # of unique unexpected case
| | gcc | g++ | gfortran |
| rv64imafdc/ lp64d/ medlow | 5421 / 4 | 1 / 1 | 6 / 1 |
| rv32imafdc/ ilp32d/ medlow | 5422 / 5 | 3 / 2 | 6 / 1 |
| rv32imac/ ilp32/ medlow | 391 / 5 | 3 / 2 | 43 / 8 |
| rv64imac/ lp64/ medlow | 5422 / 5 | 1 / 1 | 43 / 8 |

The error splat itself is from recent test harness improvements for stricter
checks for torture-{init,finish} pairing. But the real issue is a latent bug
from 2009: commit 3dd1415dc88, ("i386-prefetch.exp: Skip tests when multilib
flags contain -march") which added an "early exit" condition to i386-prefetch.exp
which could potentially cause an unpaired torture-{init,finish}.

The early exit only happens in a multlib setup using -march in flags
which is what RISC-V happens to use, hence the reason this was only seen
on RISC-V multilib testing.

Moving the early exit outside of torture-{init,finish} bracket
reinstates RISC-V testing.

| rv64imafdc/ lp64d/ medlow | 3 / 2 | 1 / 1 | 6 / 1 |
| rv32imafdc/ ilp32d/ medlow | 4 / 3 | 3 / 2 | 6 / 1 |
| rv32imac/ ilp32/ medlow | 3 / 2 | 3 / 2 | 43 / 8 |
| rv64imac/ lp64/ medlow | 5 / 4 | 1 / 1 | 43 / 8 |

gcc/testsuite:
* gcc.misc-tests/i386-prefetch.exp: Move early return outside
the torture-{init,finish}

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

doc: improve docs for -pedantic{,-errors}

Recent discussion of -Wimplicit led me to want to clarify this section of
the documentation, and mark which diagnostics other than -Wpedantic are
affected by -pedantic-errors.

gcc/ChangeLog:

* doc/invoke.texi (-Wpedantic): Improve clarity.

testsuite: Skip powerpc tests on AIX.

AIX does not support -mstrict-align.

pr109566.c had skip directive in wrong order for DejaGNU.

* gcc.target/powerpc/pr100106-sa.c: Skip on AIX.
* gcc.target/powerpc/pr109566.c: Skip on AIX.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>

libstdc++: Fix PSTL test that fails in C++20

This test fails in C++20 and later due to a warning:

warning: C++20 says that these are ambiguous, even though the second is reversed:
note: candidate 1: 'bool MyClass::operator==(const MyClass&)'
note: candidate 2: 'bool MyClass::operator==(const MyClass&)' (reversed)
note: try making the operator a 'const' member function
FAIL: 26_numerics/pstl/numeric_ops/transform_reduce.cc (test for excess errors)

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/pstl/numeric_ops/transform_reduce.cc:
Add const to equality operator.

libstdc++: Do not use std::expected::value() in monadic ops (LWG 3938)

The monadic operations in std::expected always check has_value() so we
can avoid the execptional path in value() and the assertions in error()
by accessing _M_val and _M_unex directly. This means that the monadic
operations no longer require _M_unex to be copyable so that it can be
thrown from value(), as modified by LWG 3938.

This also fixes two incorrect uses of std::move in transform(F&&)& and
transform(F&&) const& which I found while making these changes.

Now that move-only error types are supported, it's possible to properly
test the constraints that LWG 3877 added to and_then and transform. The
lwg3877.cc test now does that.

libstdc++-v3/ChangeLog:

* include/std/expected (expected::and_then, expected::or_else)
(expected::transform_error): Use _M_val and _M_unex instead of
calling value() and error(), as per LWG 3938.
(expected::transform): Likewise. Remove incorrect std::move
calls from lvalue overloads.
(expected<void, E>::and_then, expected<void, E>::or_else)
(expected<void, E>::transform): Use _M_unex instead of calling
error().
* testsuite/20_util/expected/lwg3877.cc: Add checks for and_then
and transform, and for std::expected<void, E>.
* testsuite/20_util/expected/lwg3938.cc: New test.

libstdc++: Fix code size regressions in std::vector [PR110060]

My r14-1452-gfb409a15d9babc change to add optimization hints to
std::vector causes regressions because it makes std::vector::size() and
std::vector::capacity() too big to inline. That's the opposite of what
I wanted, so revert the changes to those functions.

To achieve the original aim of optimizing vec.assign(vec.size(), x) we
can add a local optimization hint to _M_fill_assign, so that it doesn't
affect all other uses of size() and capacity().

Additionally, add the same hint to the _M_assign_aux overload for
forward iterators and add that to the testcase.

It would be nice to similarly optimize:
if (vec1.size() == vec2.size()) vec1 = vec2;
but adding hints to operator=(const vector&) doesn't help. Presumably
the relationships between the two sizes and two capacities are too
complex to track effectively.

libstdc++-v3/ChangeLog:

PR libstdc++/110060
* include/bits/stl_vector.h (_Vector_base::_M_invariant):
Remove.
(vector::size, vector::capacity): Remove calls to _M_invariant.
* include/bits/vector.tcc (vector::_M_fill_assign): Add
optimization hint to reallocating path.
(vector::_M_assign_aux(FwdIter, FwdIter, forward_iterator_tag)):
Likewise.
* testsuite/23_containers/vector/capacity/invariant.cc: Moved
to...
* testsuite/23_containers/vector/modifiers/assign/no_realloc.cc:
...here. Check assign(FwdIter, FwdIter) too.
* testsuite/23_containers/vector/types/1.cc: Revert addition
of -Wno-stringop-overread option.

libstdc++: Document removal of implicit allocator rebinding extensions

Traditionally libstdc++ allowed containers and strings to be
instantiated with allocator's that have the wrong value type, implicitly
rebinding the allocator to the container's value type. Since C++20 that
has been explicitly ill-formed, so the extension is no longer supported
in strict modes (e.g. -std=c++17) and in C++20 and later.

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document removal of implicit
allocator rebinding extensions in strict mode and for C++20.
* doc/html/*: Regenerate.

cse: Change return type of predicate functions from int to bool

Also change some function arguments to bool and remove one instance
of always zero function argument.

gcc/ChangeLog:

* rtl.h (exp_equiv_p): Change return type from int to bool.
* cse.cc (mention_regs): Change return type from int to bool
and adjust function body accordingly.
(exp_equiv_p): Ditto.
(insert_regs): Ditto. Change "modified" function argument to bool
and update usage accordingly.
(record_jump_cond): Remove always zero "reversed_nonequality"
function argument and update usage accordingly.
(fold_rtx): Change "changed" variable to bool.
(record_jump_equiv): Remove unneeded "reversed_nonequality" variable.
(is_dead_reg): Change return type from int to bool.

xtensa: Add 'adddi3' and 'subdi3' insn patterns

More optimized than the default RTL generation.

gcc/ChangeLog:

* config/xtensa/xtensa.md (adddi3, subdi3):
New RTL generation patterns implemented according to the instruc-
tion idioms described in the Xtensa ISA reference manual (p. 600).

PR target/109973: CCZmode and CCCmode variants of [v]ptest on x86.

This is my proposed minimal fix for PR target/109973 (hopefully suitable
for backporting) that follows Jakub Jelinek's suggestion that we introduce
CCZmode and CCCmode variants of ptest and vptest, so that the i386
backend treats [v]ptest instructions similarly to testl instructions;
using different CCmodes to indicate which condition flags are desired,
and then relying on the RTL cmpelim pass to eliminate redundant tests.

This conveniently matches Intel's intrinsics, that provide different
functions for retrieving different flags, _mm_testz_si128 tests the
Z flag, _mm_testc_si128 tests the carry flag. Currently we use the
same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
optimization is only valid when the ptest/vptest instruction is used to
set/test the Z flag.

The downside, as predicted by Jakub, is that GCC's cmpelim pass is
currently COMPARE-centric and not able to merge the ptests from expressions
such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
known issue, PR target/80040.

2023-06-01 Roger Sayle <roger@nextmovesoftware.com>
 Uros Bizjak <ubizjak@gmail.com>

gcc/ChangeLog
PR target/109973
* config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
CODE_for_sse4_1_ptestzv2di.
(__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
(__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
(__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
* config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
when expanding UNSPEC_PTEST to compare against zero.
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
(general_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
(timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
* config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
* config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
check for suitable matching modes for the UNSPEC_PTEST pattern.
* config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
(*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn. Remove
":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
(<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
(<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
(<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
current behavior.
(*ptest<mode>_and): Specify CCZ to only perform this optimization
when only the Z flag is required.

gcc/testsuite/ChangeLog
PR target/109973
* gcc.target/i386/pr109973-1.c: New test case.
* gcc.target/i386/pr109973-2.c: Likewise.

libstdc++: optimize EH phase 2

In the ABI's two-phase EH model, first we walk the stack looking for a
handler, then we walk the stack running cleanups until we reach that
handler.  In the cleanup phase, we shouldn't redundantly check the handlers
along the way, e.g. when walking through g():

  void f() { throw 42; }
  void g() { try { f(); } catch (void *) { } }
  int main() { try { g(); } catch (int) { } }

libstdc++-v3/ChangeLog:

* libsupc++/eh_personality.cc (PERSONALITY_FUNCTION): Don't check
handlers in the cleanup phase.

doc: Fix description of x86 -m32 option [PR109954]

This option does not imply -march=i386 so it's incorrect to say it
generates code that will run on "any i386 system".

gcc/ChangeLog:

PR target/109954
* doc/invoke.texi (x86 Options): Fix description of -m32 option.

libstdc++: Fix condition for supported SIMD types on ARMv8

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/110050
* include/experimental/bits/simd.h (__vectorized_sizeof): With
__have_neon_a32 only single-precision float works (in addition
to integers).

aarch64: Add =r,m and =m,r alternatives to 64-bit vector move patterns

We can use the X registers to load and store 64-bit vector modes, we just need to add the alternatives
to the mov patterns. This straightforward patch does that and for the pair variants too.
For the testcase in the code we now generate the optimal assembly without any superfluous
GP<->SIMD moves.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Add =r,m and =r,m alternatives.
(load_pair<DREG:mode><DREG2:mode>): Likewise.
(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/xreg-vec-modes_1.c: New test.

OpenMP/Fortran: Permit pure directives inside PURE

Update permitted directives for directives marked in OpenMP's 5.2 as pure.
To ensure that list is updated, unimplemented directives are placed into
pure-2.f90 such the test FAILs once a known to be pure directive is
implemented without handling its pureness.

gcc/fortran/ChangeLog:

* parse.cc (decode_omp_directive): Accept all pure directives
inside a PURE procedures; handle 'error at(execution).

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.2): Mark pure-directive handling as 'Y'.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/nothing-2.f90: Remove one dg-error.
* gfortran.dg/gomp/pr79154-2.f90: Update expected dg-error wording.
* gfortran.dg/gomp/pr79154-simd.f90: Likewise.
* gfortran.dg/gomp/pure-1.f90: New test.
* gfortran.dg/gomp/pure-2.f90: New test.
* gfortran.dg/gomp/pure-3.f90: New test.
* gfortran.dg/gomp/pure-4.f90: New test.

RISC-V: Introduce vfloat16m{f}*_t and their machine mode.

This patch would like to introduce the built-in type vfloat16m{f}*_t, as
well as their machine mode VNx*HF. They depend on architecture zvfhmin
or zvfh.

When givn the zvfhmin or zvfh, the macro TARGET_VECTOR_ELEN_FP_16 will
be true.

The underlying PATCH will implement the zvfhmin extension based on this.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add FP_16 mask to zvfhmin
and zvfh.
* config/riscv/genrvv-type-indexer.cc (valid_type): Allow FP16.
(main): Disable FP16 tuple.
* config/riscv/riscv-opts.h (MASK_VECTOR_ELEN_FP_16): New macro.
(TARGET_VECTOR_ELEN_FP_16): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add FP16.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4_t): New type.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_FP_16):
New macro.
* config/riscv/riscv-vector-switch.def (ENTRY): Allow FP16
machine mode based on TARGET_VECTOR_ELEN_FP_16.

libstdc++: Reduce <functional> inclusion to <stl_algobase.h>

Move the std::search definition from stl_algo.h to stl_algobase.h and use
the later in <functional>.

For consistency also move std::__parallel::search and associated helpers from
<parallel/stl_algo.h> to <parallel/stl_algobase.h> so that std::__parallel::search
is accessible along with std::search.

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h
(std::__search, std::search(_FwdIt1, _FwdIt1, _FwdIt2, _FwdIt2, _BinPred)): Move...
* include/bits/stl_algobase.h: ...here.
* include/std/functional: Replace <stl_algo.h> include by <stl_algobase.h>.
* include/parallel/algo.h (std::__parallel::search<_FIt1, _FIt2, _BinaryPred>)
(std::__parallel::__search_switch<_FIt1, _FIt2, _BinaryPred, _ItTag1, _ItTag2>):
Move...
* include/parallel/algobase.h: ...here.
* include/experimental/functional: Remove <bits/stl_algo.h> and <parallel/algorithm>
includes. Include <bits/stl_algobase.h>.

c++: make -fpermissive avoid -Werror=narrowing

Currently we make -Wnarrowing an error by default by forcing pedantic_errors
on, but for consistency -fpermissive should prevent that.

In general I'm inclined to move away from using permerror in favor of this
kind of model, with specific flags for each diagnostic.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Check flag_permissive.

Daily bump.

RISC-V: Add RVV FRM enum for floating-point rounding mode intriniscs

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (register_frm): New function.
(DEF_RVV_FRM_ENUM): New macro.
(handle_pragma_vector): Add FRM enum
* config/riscv/riscv-vector-builtins.def (DEF_RVV_FRM_ENUM): New macro.
(RNE): Ditto.
(RTZ): Ditto.
(RDN): Ditto.
(RUP): Ditto.
(RMM): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/frm-1.c: New test.

Refactor wi::bswap as a function (instead of a method).

This patch implements Richard Sandiford's suggestion from
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618215.html
that wi::bswap (and a new wi::bitreverse) should be functions,
and ideally only accessors are member functions. This patch
implements the first step, moving/refactoring wi::bswap.

2023-05-31 Roger Sayle <roger@nextmovesoftware.com>
 Richard Sandiford <richard.sandiford@arm.com>

gcc/ChangeLog
* fold-const-call.cc (fold_const_call_ss) <CFN_BUILT_IN_BSWAP*>:
Update call to wi::bswap.
* simplify-rtx.cc (simplify_const_unary_operation) <case BSWAP>:
Update call to wi::bswap.
* tree-ssa-ccp.cc (evaluate_stmt) <case BUILT_IN_BSWAP*>:
Update calls to wi::bswap.

* wide-int.cc (wide_int_storage::bswap): Remove/rename to...
(wi::bswap_large): New function, with revised API.
* wide-int.h (wi::bswap): New (template) function prototype.
(wide_int_storage::bswap): Remove method.
(sext_large, zext_large): Consistent indentation/line wrapping.
(bswap_large): Prototype helper function containing implementation.
(wi::bswap): New template wrapper around bswap_large.

libstdc++: Add separate autoconf macro for std::float_t and std::double_t [PR109818]

This should make it possible to use openlibm with djgpp (and other
targets with missing C99 <math.h> functions). The <math.h> from openlibm
provides all the functions, but not the float_t and double_t typedefs.
By separating the autoconf checks for the functionsand the typedefs, we
don't disable support for all the functions just because those typedefs
are not present.

libstdc++-v3/ChangeLog:

PR libstdc++/109818
* acinclude.m4 (GLIBCXX_ENABLE_C99): Add separate check for
float_t and double_t and define HAVE_C99_FLT_EVAL_TYPES.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_global/cmath (float_t, double_t): Guard using new
_GLIBCXX_HAVE_C99_FLT_EVAL_TYPES macro.

libstdc++: Stop using _GLIBCXX_USE_C99_MATH_TR1 in <cmath>

Similar to the three commits r14-908, r14-909 and r14-910, the
_GLIBCXX_USE_C99_MATH_TR1 macro is misleading when it is also used for
<cmath>, not only for <tr1/cmath> headers. It is also wrong, because the
configure checks for TR1 use -std=c++98 and a target might define the
C99 features for C++11 but not for C++98.

Add separate configure checks for the <math.h> functions using
-std=c++11 for the checks. Use the new macro defined by those checks in
the C++11-specific parts of <cmath>, and in <complex>, <random> etc.

The check that defines _GLIBCXX_NO_C99_ROUNDING_FUNCS is only needed for
the C++11 <cmath> checks, so remove that from GLIBCXX_CHECK_C99_TR1 and
only do it for GLIBCXX_ENABLE_C99.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_C99): Add checks for C99 math
functions and define _GLIBCXX_USE_C99_MATH_FUNCS. Move checks
for C99 rounding functions to here.
(GLIBCXX_CHECK_C99_TR1): Remove checks for C99 rounding
functions from here.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/bits/random.h: Use _GLIBCXX_USE_C99_MATH_FUNCS instead
of _GLIBCXX_USE_C99_MATH_TR1.
* include/bits/random.tcc: Likewise.
* include/c_compatibility/math.h: Likewise.
* include/c_global/cmath: Likewise.
* include/ext/random: Likewise.
* include/ext/random.tcc: Likewise.
* include/std/complex: Likewise.
* testsuite/20_util/from_chars/4.cc: Likewise.
* testsuite/20_util/from_chars/8.cc: Likewise.
* testsuite/26_numerics/complex/proj.cc: Likewise.
* testsuite/26_numerics/headers/cmath/60401.cc: Likewise.
* testsuite/26_numerics/headers/cmath/types_std_c++0x.cc:
Likewise.
* testsuite/lib/libstdc++.exp (check_v3_target_cstdint):
Likewise.
* testsuite/util/testsuite_random.h: Likewise.

libstdc++: Express std::vector's size() <= capacity() invariant in code

This adds optimizer hints so that GCC knows that size() <= capacity() is
always true. This allows the compiler to optimize away re-allocating
paths when assigning new values to the vector without resizing it, e.g.,
vec.assign(vec.size(), new_val).

libstdc++-v3/ChangeLog:

* include/bits/stl_vector.h (_Vector_base::_M_invariant()): New
function.
(vector::size(), vector::capacity()): Call _M_invariant().
* testsuite/23_containers/vector/capacity/invariant.cc: New test.
* testsuite/23_containers/vector/types/1.cc: Add suppression for
false positive warning (PR110060).

libstdc++: Fix build for targets without _Float128 [PR109921]

My r14-1431-g7037e7b6e4ac41 change caused the _Float128 overload to be
compiled unconditionally, by moving the USE_STRTOF128_FOR_FROM_CHARS
check into the function body. That function should still only be
compiled if the target actually supports _Float128.

libstdc++-v3/ChangeLog:

PR libstdc++/109921
* src/c++17/floating_from_chars.cc: Check __FLT128_MANT_DIG__ is
defined before trying to use _Float128.

libstdc++: Fix configure test for 32-bit targets

The -mlarge model for msp430-elf uses 20-bit pointers, which means that
sizeof(void*) == 4 and so the r14-1432-g51cf0b3949b88b change gives the
wrong answer. Check __INTPTR_WIDTH__ >= 32 instead.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Fix for 32-bit pointers
to check __INT_PTR_WIDTH__ instead of sizeof(void*).
* configure: Regenerate.

testsuite: rename force_conventional_output

The procedure force_conventional_output_for is a bit misnomed, what it
primarily does is to set the required options for the corresponding
test. So rename the proc to set_required_options_for and also rename the
participating variable accordingly.

gcc/testsuite/ChangeLog:

* lib/gcc-dg.exp: Rename gcc_force_conventional_output to
gcc_set_required_options.
* lib/target-supports.exp: Rename force_conventional_output_for
to set_required_options_for.
* lib/scanasm.exp: Adjust callers.
* lib/scanrtl.exp: Same.

aarch64: PR target/99195 Annotate dot-product patterns for vec-concat-zero

This straightforward patch annotates the dotproduct instructions, including the i8mm ones.
Tests included.
Nothing unexpected here.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Rename to...
(<sur>dot_prod<vsi2qi><vczle><vczbe>): ... This.
(usdot_prod<vsi2qi>): Rename to...
(usdot_prod<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<sur>dot_lane<vsi2qi>): Rename to...
(aarch64_<sur>dot_lane<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<sur>dot_laneq<vsi2qi>): Rename to...
(aarch64_<sur>dot_laneq<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<DOTPROD_I8MM:sur>dot_lane<VB:isquadop><VS:vsi2qi>): Rename to...
(aarch64_<DOTPROD_I8MM:sur>dot_lane<VB:isquadop><VS:vsi2qi><vczle><vczbe>):
... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_11.c: New test.

aarch64: PR target/99195 Annotate saturating mult patterns for vec-concat-zero

This patch goes through the various alphabet soup saturating multiplication patterns, including those in TARGET_RDMA
and annotates them with <vczle><vczbe>. Many other patterns are widening and always write the full 128-bit vectors
so this annotation doesn't apply to them. Nothing out of the ordinary in this patch.

Bootstrapped and tested on aarch64-none-linux and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_sq<r>dmulh<mode>): Rename to...
(aarch64_sq<r>dmulh<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_n<mode>): Rename to...
(aarch64_sq<r>dmulh_n<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_lane<mode>): Rename to...
(aarch64_sq<r>dmulh_lane<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_laneq<mode>): Rename to...
(aarch64_sq<r>dmulh_laneq<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for qdmulh, qrdmulh.
* gcc.target/aarch64/simd/pr99195_10.c: New test.

btf: improve -dA comments for testsuite

Many BTF type kinds refer to other types via index to the final types
list. However, the order of the final types list is not guaranteed to
remain the same for the same source program between different runs of
the compiler, making it difficult to test inter-type references.

This patch updates the assembler comments output when writing a
given BTF record to include minimal information about the referenced
type, if any. This allows for the regular expressions used in the gcc
testsuite to do some basic integrity checks on inter-type references.

For example, for the type

unsigned int *

Assembly comments like the following are written with -dA:

.4byte 0 ; TYPE 2 BTF_KIND_PTR ''
.4byte 0x2000000 ; btt_info: kind=2, kflag=0, vlen=0
.4byte 0x1 ; btt_type: (BTF_KIND_INT 'unsigned int')

Several BTF tests which can immediately be made more robust with this
change are updated. It will also be useful in new tests for the upcoming
btf_type_tag support.

gcc/

* btfout.cc (btf_kind_names): New.
(btf_kind_name): New.
(btf_absolute_var_id): New utility function.
(btf_relative_var_id): Likewise.
(btf_relative_func_id): Likewise.
(btf_absolute_datasec_id): Likewise.
(btf_asm_type_ref): New.
(btf_asm_type): Update asm comments and use btf_asm_type_ref ().
(btf_asm_array): Likewise. Accept ctf_container_ref parameter.
(btf_asm_varent): Likewise.
(btf_asm_func_arg): Likewise.
(btf_asm_datasec_entry): Likewise.
(btf_asm_datasec_type): Likewise.
(btf_asm_func_type): Likewise. Add index parameter.
(btf_asm_enum_const): Likewise.
(btf_asm_sou_member): Likewise.
(output_btf_vars): Update btf_asm_* call accordingly.
(output_asm_btf_sou_fields): Likewise.
(output_asm_btf_enum_list): Likewise.
(output_asm_btf_func_args_list): Likewise.
(output_asm_btf_vlen_bytes): Likewise.
(output_btf_func_types): Add ctf_container_ref parameter.
Pass it to btf_asm_func_type.
(output_btf_datasec_types): Update btf_asm_datsec_type call similarly.
(btf_output): Update output_btf_func_types call similarly.

gcc/testsuite/

* gcc.dg/debug/btf/btf-array-1.c: Use new BTF asm comments
in scan-assembler expressions where useful.
* gcc.dg/debug/btf/btf-anonymous-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-anonymous-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-2.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-3.c: Likewise.
* gcc.dg/debug/btf/btf-datasec-2.c: Likewise.
* gcc.dg/debug/btf/btf-enum-1.c: Likewise.
* gcc.dg/debug/btf/btf-function-6.c: Likewise.
* gcc.dg/debug/btf/btf-pointers-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-2.c: Likewise.
* gcc.dg/debug/btf/btf-typedef-1.c: Likewise.
* gcc.dg/debug/btf/btf-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-2.c: Likewise. Update outdated comment.
* gcc.dg/debug/btf/btf-function-3.c: Update outdated comment.

btf: be clear when record size/type is not used

All BTF type records have a 4-byte field used to encode a size or link
to another type, depending on the type kind. But BTF_KIND_ARRAY and
BTF_KIND_FWD do not use this field at all, and should write zero.

GCC already correctly writes zero in this field for these type kinds,
but the process is not straightforward and results in the -dA comment
claiming the field is a reference to another type. This patch makes
the behavior explicit and updates the assembler comment to state
clearly that the field is unused.

gcc/

* btfout.cc (btf_asm_type): Add dedicated cases for BTF_KIND_ARRAY
and BTF_KIND_FWD which do not use the size/type field at all.

emit-rtl: Change return type of predicate functions from int to bool

Also fix some stalled comments.

gcc/ChangeLog:

* rtl.h (subreg_lowpart_p): Change return type from int to bool.
(active_insn_p): Ditto.
(in_sequence_p): Ditto.
(unshare_all_rtl): Change return type from int to void.
* emit-rtl.h (mem_expr_equal_p): Change return type from int to bool.
* emit-rtl.cc (subreg_lowpart_p): Change return type from int to bool
and adjust function body accordingly.
(mem_expr_equal_p): Ditto.
(unshare_all_rtl): Change return type from int to void
and adjust function body accordingly.
(verify_rtx_sharing): Remove unneeded return.
(active_insn_p): Change return type from int to bool
and adjust function body accordingly.
(in_sequence_p): Ditto.

alias: Change return type of predicate functions from int to bool

Also remove a bunch of unneeded forward declarations.

gcc/ChangeLog:

* rtl.h (true_dependence): Change return type from int to bool.
(canon_true_dependence): Ditto.
(read_dependence): Ditto.
(anti_dependence): Ditto.
(canon_anti_dependence): Ditto.
(output_dependence): Ditto.
(canon_output_dependence): Ditto.
(may_alias_p): Ditto.
* alias.h (alias_sets_conflict_p): Ditto.
(alias_sets_must_conflict_p): Ditto.
(objects_must_conflict_p): Ditto.
(nonoverlapping_memrefs_p): Ditto.
* alias.cc (rtx_equal_for_memref_p): Remove forward declaration.
(record_set): Ditto.
(base_alias_check): Ditto.
(find_base_value): Ditto.
(mems_in_disjoint_alias_sets_p): Ditto.
(get_alias_set_entry): Ditto.
(decl_for_component_ref): Ditto.
(write_dependence_p): Ditto.
(memory_modified_1): Ditto.
(mems_in_disjoint_alias_set_p): Change return type from int to bool
and adjust function body accordingly.
(alias_sets_conflict_p): Ditto.
(alias_sets_must_conflict_p): Ditto.
(objects_must_conflict_p): Ditto.
(rtx_equal_for_memref_p): Ditto.
(base_alias_check): Ditto.
(read_dependence): Ditto.
(nonoverlapping_memrefs_p): Ditto.
(true_dependence_1): Ditto.
(true_dependence): Ditto.
(canon_true_dependence): Ditto.
(write_dependence_p): Ditto.
(anti_dependence): Ditto.
(canon_anti_dependence): Ditto.
(output_dependence): Ditto.
(canon_output_dependence): Ditto.
(may_alias_p): Ditto.
(init_alias_analysis): Change "changed" variable to bool.

RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

Base on V1 patch, adding comment:
;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
;; to combine instructions as below:
;; vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><v_double_trunc><mode>2): Change
expand into define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test.

RISC-V: Add testcase for vrsub.vi auto-vectorization

Apparently, we are missing vrsub.vi tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vsub.vi.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

RISC-V: Remove FRM for vfwcvt (RVV float to float widening conversion)

Base on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt doesn't depend on FRM. So remove FRM preparing for mode switching support.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to float widening conversion)

Base on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt.f.x.v doesn't depend on FRM. So remove FRM preparing for mode switching support.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove FRM for vfncvt.rod instruction

Apparently, vfncvt.rod rounding mode is encoded, so we don't need FRM.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64: Add pattern for bswap + rotate [PR 110039]

After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
pattern to match the new GIMPLE form.

With this patch, gcc.target/aarch64/rev16_2.c passes again.

2023-05-31 Christophe Lyon <christophe.lyon@linaro.org>

PR target/110039
gcc/
* config/aarch64/aarch64.md (aarch64_rev16si2_alt3): New
pattern.

libstdc++: Do not include <exception> in <mutex>

We previously needed <exception> in <mutex> for the std::lock_error
exception class, but that was moved out of <mutex> in 2009 when it was
removed from the C++0x draft. We can stop including <exception> now.

Move the include for <bits/error_constants.h> to <bits/unique_lock.h>
where it's actually used, and only include <errno.h> in <mutex> (for
EAGAIN and EDEADLK).

Also add some headers to <mutex> that are needed but are not included
directly: <bits/functexcept.h>, <bits/invoke.h> and <bits/move.h>.

libstdc++-v3/ChangeLog:

* include/bits/unique_lock.h: Include <bits/error_constants.h>
here for std::errc constants.
* include/std/mutex: Do not include <bits/error_constants.h> and
<exception> here.

libstdc++: Replace obsolete shell syntax in configure.ac

The current POSIX standard says that the -a and -o operators to the
'test' utility are obsolete, and the shell operators && and || should be
used instead.

libstdc++-v3/ChangeLog:

* configure.ac: Replace use of -o operator for test.
* configure: Regenerate.

libstdc++: Add missing noexcept to std::scoped_allocator_adaptor

The standard requires these constructors and accessors to be noexcept.

libstdc++-v3/ChangeLog:

* include/std/scoped_allocator (scoped_allocator_adaptor): Add
noexcept to all constructors except the default constructor.
(scoped_allocator_adaptor::inner_allocator): Add noexcept.
(scoped_allocator_adaptor::outer_allocator): Likewise.
* testsuite/20_util/scoped_allocator/noexcept.cc: New test.

libstdc++: Add std::numeric_limits<__float128> specialization [PR104772]

As suggested by Jakub in the PR, this just hardcodes the constants with
a Q suffix, since the properties of __float128 are not going to change.

We can only define it for non-strict modes because the suffix gives an
error otherwise, even in system headers:

limits:2085: error: unable to find numeric literal operator 'operator""Q'

libstdc++-v3/ChangeLog:

PR libstdc++/104772
* include/std/limits (numeric_limits<__float128>): Define.
* testsuite/18_support/numeric_limits/128bit.cc: New test.

libstdc++: Disable embedded tzdata for all 16-bit targets

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Extend logic for avr and
msp430 to all 16-bit targets.
* configure: Regenerate.

libstdc++: Fix preprocessor conditions for std::from_chars [PR109921]

We use the from_chars_strtod function with __strtof128 to read a
_Float128 value, but from_chars_strtod is not defined unless uselocale
is available. This can lead to compilation failures for some targets,
because we try to define the _Flaot128 overload in terms of a
non-existing from_chars_strtod function.

Only try to use __strtof128 if uselocale is available, otherwise
fallback to the long double overload of std::from_chars (which might
fallback to the double overload, which should use fast_float).

This ensures we always define the full set of overloads, even if they
are not always accurate for all values of the wider types.

libstdc++-v3/ChangeLog:

PR libstdc++/109921
* src/c++17/floating_from_chars.cc (USE_STRTOF128_FOR_FROM_CHARS):
Only define when USE_STRTOD_FOR_FROM_CHARS is also defined.
(USE_STRTOD_FOR_FROM_CHARS): Do not undefine when long double is
binary64.
(from_chars(const char*, const char*, double&, chars_format)):
Check __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ here.
(from_chars(const char*, const char*, _Float128&, chars_format))
Only use from_chars_strtod when USE_STRTOD_FOR_FROM_CHARS is
defined, otherwise parse a long double and convert to _Float128.

libstdc++: Deprecate std::setfill for std::basic_istream [PR109922]

Prior to N0966 (July 1996) the std::setfill manipulator was specified to
work with both input and output streams. In the final C++98 standard it
is only specified to work with output streams.

We have always supported it for input streams, despite that never being
in the standard, and having no meaning for any input streams defined by
the standard. This commit adds a deprecated attribute to the overload
for input streams, so that we can stop supporting this some day.

libstdc++-v3/ChangeLog:

PR libstdc++/109922
* include/std/iomanip (operator>>(basic_istream&, _Setfill)):
Add deprecated attribute to non-standard overload.
* doc/xml/manual/evolution.xml: Document deprecation.
* doc/html/*: Regenerate.
* testsuite/27_io/manipulators/standard/char/1.cc: Add
dg-warning for expected deprecated warning.
* testsuite/27_io/manipulators/standard/char/2.cc: Likewise.
* testsuite/27_io/manipulators/standard/wchar_t/1.cc: Likewise.
* testsuite/27_io/manipulators/standard/wchar_t/2.cc: Likewise.

ipa/109983 - (IPA) PTA speedup

This improves the edge avoidance heuristic by re-ordering the
topological sort of the graph to make sure the component with
the ESCAPED node is processed first. This improves the number
of created edges which directly correlates with the number
of bitmap_ior_into calls from 141447426 to 239596 and the
compile-time from 1083s to 3s. It also improves the compile-time
for the related PR109143 from 81s to 27s.

I've modernized the topological sorting API on the way as well.

PR ipa/109983
PR tree-optimization/109143
* tree-ssa-structalias.cc (struct topo_info): Remove.
(init_topo_info): Likewise.
(free_topo_info): Likewise.
(compute_topo_order): Simplify API, put the component
with ESCAPED last so it's processed first.
(topo_visit): Adjust.
(solve_graph): Likewise.

IPA PTA stats enhancement and non-details dump slimming

The following keeps track of the number of edges we avoid to create
because they redundandly feed ESCAPED. It also avoids printing
a header for -details when not using -details.

* tree-ssa-structalias.cc (constraint_stats::num_avoided_edges):
New.
(add_graph_edge): Count redundant edges we avoid to create.
(dump_sa_stats): Dump them.
(ipa_pta_execute): Do not dump generating constraints when
we are not dumping them.

aarch64: Simplify output template emission code for a few patterns

If the output code for a define_insn just does a switch (which_alternative) with no other computation we can almost always
replace it with more compact MD syntax for each alternative in a mult-alternative '@' block.
This patch cleans up some such patterns in the aarch64 backend, making them shorter and more concise.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>): Rewrite
output template to avoid explicit switch on which_alternative.
(*aarch64_simd_mov<VQMOV:mode>): Likewise.
(and<mode>3): Likewise.
(ior<mode>3): Likewise.
* config/aarch64/aarch64.md (*mov<mode>_aarch64): Likewise.

xtensa: Improve "*shlrd_reg" insn pattern and its variant

The insn "*shlrd_reg" shifts two registers with a funnel shifter by the
third register to get a single word result:

 reg0 = (reg1 SHIFT_OP0 reg3) BIT_JOIN_OP (reg2 SHIFT_OP1 (32 - reg3))

where the funnel left shift is SHIFT_OP0 := ASHIFT, SHIFT_OP1 := LSHIFTRT
and its right shift is SHIFT_OP0 := LSHIFTRT, SHIFT_OP1 := ASHIFT,
respectively. And also, BIT_JOIN_OP can be either PLUS or IOR in either
shift direction.

 [(set (match_operand:SI 0 "register_operand" "=a")
(match_operator:SI 6 "xtensa_bit_join_operator"
[(match_operator:SI 4 "logical_shift_operator"
[(match_operand:SI 1 "register_operand" "r")
(match_operand:SI 3 "register_operand" "r")])
(match_operator:SI 5 "logical_shift_operator"
[(match_operand:SI 2 "register_operand" "r")
(neg:SI (match_dup 3))])]))]

Although the RTL matching template can express it as above, there is no
way of direcing that the operator (operands[6]) that combines the two
individual shifts is commutative.
Thus, if multiple insn sequences matching the above pattern appear
adjacently, the combiner may accidentally mix them up and get partial
results.

This patch adds a new insn-and-split pattern with the two sides swapped
representation of the bit-combining operation that was lacking and
described above.

And also changes the other "*shlrd" variants from previously describing
the arbitraryness of bit-combining operations with code iterators to a
combination of the match_operator and the predicate above.

gcc/ChangeLog:

* config/xtensa/predicates.md (xtensa_bit_join_operator):
New predicate.
* config/xtensa/xtensa.md (ior_op): Remove.
(*shlrd_reg): Rename from "*shlrd_reg_<code>", and add the
insn_and_split pattern of the same name to express and capture
the bit-combining operation with both sides swapped.
In addition, replace use of code iterator with new operator
predicate.
(*shlrd_const, *shlrd_per_byte):
Likewise regarding the code iterator.

Fix ICE in rewrite_expr_tree_parallel

1. Limit the value of tree-reassoc-width to IntegerRange(0, 256).
2. Add width limit in rewrite_expr_tree_parallel.

gcc/ChangeLog:

PR tree-optimization/110038
* params.opt: Add a limit on tree-reassoc-width.
* tree-ssa-reassoc.cc
(rewrite_expr_tree_parallel): Add width limit.

gcc/testsuite/ChangeLog:

PR tree-optimization/110038
* gcc.dg/pr110038.c: New test.

RISC-V: Add ZVFH extension to the -march= option

This patch would like to add new sub extension (aka ZVFH) to the -march= option.
To make it simple, only the sub extension itself is involved in this patch, and
the underlying FP16 related RVV intrinsic API depends on the TARGET_ZVFH.

The Zvfh extension depends on the Zve32f and Zfhmin extensions. You can locate
more information about ZVFH from below spec doc.

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfh item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZVFH): New macro.
(TARGET_ZVFH): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-21.c: New test.
* gcc.target/riscv/predef-27.c: New test.

RISC-V: Fix unreachable test code for init repeat sequence.

This patch fix one unreachable test code, which is for debugging purpose
without cleanup before commit.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c:
Remove debug code.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

Similar like WIDEN FLOAT_EXPR, when direct_optab is not existed, try
intermediate integer type whenever gimple ranger can tell it's safe.

.i.e.
When there's no direct optab for vector long long -> vector float, but
the value range of integer can be represented as int, try vector int
-> vector float if availble.

gcc/ChangeLog:

PR tree-optimization/108804
* tree-vect-patterns.cc (vect_get_range_info): Remove static.
* tree-vect-stmts.cc (vect_create_vectorized_demotion_stmts):
Add new parameter narrow_src_p.
(vectorizable_conversion): Enhance NARROW FLOAT_EXPR
vectorization by truncating to lower precision.
* tree-vectorizer.h (vect_get_range_info): New declare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108804.c: New test.

testsuite: add verify-sarif-file to some testcases that were missing it

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/malloc-sarif-1.c: Add missing verify-sarif-file
directive.
* gcc.dg/analyzer/sarif-pr107366.c: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[libstdc++] [testsuite] xfail double-prec from_chars for x86_64 ldbl

When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail. Mark them as such on x86_64-*-vxworks*.

for libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/4.cc: Skip long double test06
on x86_64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
x86_64-vxworks.

testsuite/52641: Fix more of implicit int=32 assumption fallout.

gcc/testsuite/
PR testsuite/52641
* gcc.dg/torture/pr107451.c: Require int32plus.
* gcc.dg/torture/pr108574-3.c: Use __INT32_TYPE__ instead of int.
* gcc.dg/torture/pr109940.c: Use __INTPTR_TYPE__ instead of long.
* gcc.dg/torture/pr95248.c: Require size24plus.
* gcc.dg/torture/pr95295-3.c: Use var_* with at least 32 bits int.
* gcc.dg/torture/pr98640.c: Cast to __INT32_TYPE__ instead of int.
* gcc.dg/tree-ssa/pr103771.c: Use int with at least 32 bits.

LRA: Update insn sp offset if its input reload changes SP

The patch fixes a bug when there is input reload changing SP.  The bug was
triggered by switching H8300 target to LRA.  The insn in question is

(insn 21 20 22 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
        (reg/f:SI 31)) "j.c":10:3 19 {*movsi}
     (expr_list:REG_DEAD (reg/f:SI 31)
        (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
            (nil))))

The memory address is reloaded but the SP offset for the original insn was not updated.

gcc/ChangeLog:

* lra-int.h (lra_update_sp_offset): Add the prototype.
* lra.cc (setup_sp_offset): Change the return type.  Use
lra_update_sp_offset.
* lra-eliminations.cc (lra_update_sp_offset): New function.
(lra_process_new_insns): Push the current insn to reprocess if the
input reload changes sp offset.

i386: Fix misleading identation in i386-expand.cc [PR110041]

gcc/ChangeLog:

PR target/110041
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Fix misleading identation.

jump: Change return type of predicate functions from int to bool

gcc/ChangeLog:

* rtl.h (comparison_dominates_p): Change return type from int to bool.
(condjump_p): Ditto.
(any_condjump_p): Ditto.
(any_uncondjump_p): Ditto.
(simplejump_p): Ditto.
(returnjump_p): Ditto.
(eh_returnjump_p): Ditto.
(onlyjump_p): Ditto.
(invert_jump_1): Ditto.
(invert_jump): Ditto.
(rtx_renumbered_equal_p): Ditto.
(redirect_jump_1): Ditto.
(redirect_jump): Ditto.
(condjump_in_parallel_p): Ditto.
* jump.cc (invert_exp_1): Adjust forward declaration.
(comparison_dominates_p): Change return type from int to bool
and adjust function body accordingly.
(simplejump_p): Ditto.
(condjump_p): Ditto.
(condjump_in_parallel_p): Ditto.
(any_uncondjump_p): Ditto.
(any_condjump_p): Ditto.
(returnjump_p): Ditto.
(eh_returnjump_p): Ditto.
(onlyjump_p): Ditto.
(redirect_jump_1): Ditto.
(redirect_jump): Ditto.
(invert_exp_1): Ditto.
(invert_jump_1): Ditto.
(invert_jump): Ditto.
(rtx_renumbered_equal_p): Ditto.

MAINTAINERS: Add myself to write after approval

2023-05-30 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

ChangeLog:
* MAINTAINERS (Write After Approval): Add myself.

testsuite: make mve_intrinsic_type_overloads-int.c libc-agnostic

Glibc defines int32_t as 'int' while newlib defines it as 'long int'.

Although these correspond to the same size, g++ complains when using the
'wrong' version:
 invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
or
 invalid conversion from 'int*' to 'int32_t*' {aka 'long int*'} [-fpermissive]

when calling vst1q(int32*, int32x4_t) with a first parameter of type
'long int *' (resp. 'int *')

To make this test pass with any type of toolchain, this patch defines
'word_type' according to which libc is in use.

2023-05-23 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c:
Support both definitions of int32_t.

Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparison

This patch adds the support for match that was implemented for PR 87913 in phiopt.
It implements it by adding support to minmax_from_comparison for the check.
It uses the range information if available which allows to produce MIN/MAX expression
when comparing against the lower/upper bound of the range instead of lower/upper
of the type.

minmax-20.c is the new testcase which tests the ranges part.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* fold-const.cc (minmax_from_comparison): Add support for NE_EXPR.
* match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern):
Add ne as a possible cmp.
((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-22.c: New test.

MATCH: Move `a <= CST1 ? MAX<a, CST2> : a` optimization to match

This moves the `a <= CST1 ? MAX<a, CST2> : a` optimization
from phiopt to match. It just adds a new pattern to match.pd.

There is one more change needed before being able to remove
minmax_replacement from phiopt.

A few notes on the testsuite changes:
* phi-opt-5.c is now able to optimize at phiopt1 so remove
the xfail.
* pr66726-4.c can be optimized during fold before phiopt1
so need to change the scanning.
* pr66726-5.c needs two phiopt passes currently to optimize
to the right thing, it needed 2 phiopt passes before, the cast
from int to unsigned char is the reason.
* pr66726-6.c is what the original pr66726-4.c was testing
before the fold was able to optimize it.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`(a CMP CST1) ? max<a,CST2> : a`): New
pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail.
* gcc.dg/tree-ssa/pr66726-4.c: Change how scanning
works.
* gcc.dg/tree-ssa/pr66726-5.c: New test.
* gcc.dg/tree-ssa/pr66726-6.c: New test.

Fix ACLE data-intrinsics testcases

data-intrinsics-assembly.c forces -march=armv6 using dg-add-options
arm_arch_v6, which implicitly adds -mfloat-abi=softfp.

However, for a toolchain configured for arm-linux-gnueabihf and
--with-arch=armv7-a, the testcase will fail when including arm_acle.h
(which includes stdint.h, which will fail to include the non-existing
gnu/stubs-soft.h).

Other effective-targets related to arm_acle.h would also pass because
they first try without -mfloat-abi=softfp, so it seems the
simplest/safest is to add { dg-require-effective-target arm_softfp_ok }
to make sure arm_arch_v6_ok's assumption is valid.

The patch also fixes what seems to be an oversight in
data-intrinsics-armv6.c: it requires arm_arch_v6_ok, but uses
arm_arch_v6t2: the patch makes it require arm_arch_v6t2_ok.

2023-05-30 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/acle/data-intrinsics-armv6.c: Fix typo.
* gcc.target/arm/acle/data-intrinsics-assembly.c: Require
arm_softfp_ok.

libstdc++: Correct NTTP and simd_mask ctor call

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/109822
* include/experimental/bits/simd.h (to_native): Use int NTTP
as specified in PTS2.
(to_compatible): Likewise. Add missing tag to call mask
generator ctor.
* testsuite/experimental/simd/pr109822_cast_functions.cc: New
test.