gcc.gnu.org Git - gcc.git/log

aarch64: Add eh_return compile tests

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/eh_return-2.c: New test.
* gcc.target/aarch64/eh_return-3.c: New test.

aarch64: Do not force a stack frame for EH returns

EH returns no longer rely on clobbering the return address on the stack
so forcing a stack frame is not necessary.

This does not actually change the code gen for the unwinder since there
are calls before the EH return.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_needs_frame_chain): Do not
force frame chain for eh_return.

aarch64: Use br instead of ret for eh_return

The expected way to handle eh_return is to pass the stack adjustment
offset and landing pad address via

  EH_RETURN_STACKADJ_RTX
  EH_RETURN_HANDLER_RTX

to the epilogue that is shared between normal return paths and the
eh_return paths.  EH_RETURN_HANDLER_RTX is the stack slot of the
return address that is overwritten with the landing pad in the
eh_return case and EH_RETURN_STACKADJ_RTX is a register added to sp
right before return and it is set to 0 in the normal return case.

The issue with this design is that eh_return and normal return may
require different return sequences, but there is no way to distinguish
the two cases in the epilogue (the stack adjustment may be 0 in the
eh_return case too).

The reason eh_return and normal return require different return
sequences is that control flow integrity hardening may need to treat
eh_return as a forward-edge transfer (it is not returning to the
previous stack frame) and normal return as a backward-edge one.
On AArch64, the forward edge is protected by BTI and requires a br
instruction, while the backward edge is protected by PAUTH or GCS and
requires a ret (or authenticated ret) instruction.

This patch resolves the issue by introducing EH_RETURN_TAKEN_RTX that
is a flag set to 1 in the eh_return path and 0 in normal return paths.
Branching on the EH_RETURN_TAKEN_RTX flag, the right return sequence
can be used in the epilogue.
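
As a rough illustration of the idea (not the actual aarch64 backend
code), the epilogue's choice can be modelled in plain C.  The enum and
function names below are hypothetical; only the flag semantics (1 on
the eh_return path, 0 otherwise) come from the description above.

```c
/* Hypothetical model of the epilogue decision; this is not GCC source.  */
enum return_insn { RETURN_RET, RETURN_BR };

/* EH_RETURN_TAKEN mirrors the EH_RETURN_TAKEN_RTX flag: 1 on the
   eh_return path, 0 on every normal return path.  */
enum return_insn
select_return_insn (int eh_return_taken)
{
  /* A forward-edge transfer to the landing pad uses br; a normal
     backward-edge return uses ret (or an authenticated ret).  */
  return eh_return_taken ? RETURN_BR : RETURN_RET;
}
```

The point of the sketch is that the decision depends only on the flag,
not on whether the stack adjustment happens to be 0.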

The handler could be passed the old way via clobbering the return
address, but since now the eh_return case can be distinguished, the
handler can be in a different register than x30 and no stack frame
is needed for eh_return.

This patch fixes a return-to-anywhere gadget in the unwinder with
existing standard branch protection and makes EH return compatible
with the Guarded Control Stack (GCS) extension.

Some tests are adjusted because eh_return no longer prevents pac-ret
in the normal return path.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_eh_return_handler_rtx):
Remove.
* config/aarch64/aarch64.cc (aarch64_return_address_signing_enabled):
Sign return address even in functions with eh_return.
(aarch64_expand_epilogue): Conditionally return with br or ret.
(aarch64_eh_return_handler_rtx): Remove.
* config/aarch64/aarch64.h (EH_RETURN_TAKEN_RTX): Define.
(EH_RETURN_STACKADJ_RTX): Change to R5.
(EH_RETURN_HANDLER_RTX): Change to R6.
* df-scan.cc: Handle EH_RETURN_TAKEN_RTX.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document EH_RETURN_TAKEN_RTX.
* except.cc (expand_eh_return): Handle EH_RETURN_TAKEN_RTX.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/return_address_sign_1.c: Move func4 to ...
* gcc.target/aarch64/return_address_sign_2.c: ... here and fix the
scan asm check.
* gcc.target/aarch64/return_address_sign_b_1.c: Move func4 to ...
* gcc.target/aarch64/return_address_sign_b_2.c: ... here and fix the
scan asm check.

GCN: Remove 'last_arg' spec function

The LLVM 13.0.1 assembler ('llvm-mc') does indeed still complain in the presence of
multiple '-mcpu=[...]' options:

as: for the --mcpu option: may only occur zero or one times!

However, as of
"GCN: Tag '-march=[...]', '-mtune=[...]' as 'Negative' of themselves [PR112669]",
the GCC-side special handling is no longer necessary.

gcc/
* config.gcc <amdgcn-*-amdhsa> (extra_gcc_objs): Don't set.
* config/gcn/driver-gcn.cc: Remove.
* config/gcn/gcn-hsa.h (ASM_SPEC, EXTRA_SPEC_FUNCTIONS): Remove
'last_arg' spec function.
* config/gcn/t-gcn-hsa (driver-gcn.o): Remove.

GCN: Tag '-march=[...]', '-mtune=[...]' as 'Negative' of themselves [PR112669]

Certain other command-line flags are mutually exclusive (random example: GCN
'-march=gfx906', '-march=gfx908').  If they're not appropriately marked up,
this does disturb the multilib selection machinery, for example:

    $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-multi-directory -march=gfx906
    gfx906
    $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-multi-directory -march=gfx908
    gfx908
    $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-multi-directory -march=gfx906 -march=gfx908
    .

In the last invocation ('-march=gfx906 -march=gfx908'), for example,
'gcc/gcc.cc:set_multilib_dir' sees both flags.  No multilib matches
both, so we "fail" to the default ('.').  Tagged as 'Negative', only the
last flag survives, and we get the expected:

    $ build-gcc-offload-amdgcn-amdhsa/gcc/xgcc -print-multi-directory -march=gfx906 -march=gfx908
    gfx908
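
The "only the last flag survives" behavior can be sketched in C as
follows.  This is a simplified model, not the GCC driver's option
handling; the helper name last_march is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return the value of the last "-march=" option in ARGV, or NULL if
   there is none.  Hypothetical helper: it only mimics the effect that
   the 'Negative' tagging has on what multilib selection gets to see.  */
const char *
last_march (int argc, const char *const *argv)
{
  static const char prefix[] = "-march=";
  const char *value = NULL;

  for (int i = 0; i < argc; i++)
    if (strncmp (argv[i], prefix, sizeof prefix - 1) == 0)
      value = argv[i] + sizeof prefix - 1;	/* A later option overrides.  */

  return value;
}
```

With the arguments from the example above ('-march=gfx906
-march=gfx908'), this model yields "gfx908".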

I quickly found that the same also applies to GCN's '-mtune=[...]', but I've
not otherwise reviewed the GCN options.

PR target/112669
gcc/
* config/gcn/gcn.opt (march=, mtune=): Tag as 'Negative' of
themselves.

hurd: Add default-pie and static-pie support

This fixes the Hurd spec in the default-pie case, and adds static-pie
support.

gcc/ChangeLog:

* config/i386/gnu.h: Use PIE_SPEC, add static-pie case.
* config/i386/gnu64.h: Use PIE_SPEC, add static-pie case.

hurd: Add multilib paths for gnu-x86_64

We need the multilib paths in gcc to find e.g. glibc crt files on
Debian. This is essentially based on t-linux64 version.

gcc/ChangeLog:

* config/i386/t-gnu64: New file.
* config.gcc [x86_64-*-gnu*]: Add i386/t-gnu64 to
tmake_file.

aarch64: Remove redundant zeroing/merging in SVE intrinsics [PR106326]

Many predicated SVE intrinsics provide three forms of predication:
zeroing, merging, and any/dont-care. All three are equivalent when
the predicate is all-true, so this patch drops the zeroing and
merging in that case.
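
The equivalence can be seen in a scalar C model of a predicated add.
This is only a sketch with hypothetical function names, not ACLE code:
zeroing sets inactive lanes to 0 and merging keeps the first operand in
inactive lanes, so with no inactive lanes both give the plain sum.

```c
#include <stdbool.h>
#include <stddef.h>

/* Zeroing form: inactive lanes become 0.  */
void
add_z (int *out, const bool *pg, const int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = pg[i] ? a[i] + b[i] : 0;
}

/* Merging form: inactive lanes keep the value of the first operand.  */
void
add_m (int *out, const bool *pg, const int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = pg[i] ? a[i] + b[i] : a[i];
}
```

When every element of pg is true, the fallback value is never used, so
both functions compute the same result in every lane.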

gcc/
PR target/106326
* config/aarch64/aarch64-sve-builtins.h (is_ptrue): Declare.
* config/aarch64/aarch64-sve-builtins.cc (is_ptrue): New function.
(gimple_folder::redirect_pred_x): Likewise.
(gimple_folder::fold): Use it.

gcc/testsuite/
PR target/106326
* gcc.target/aarch64/sve/acle/general/pr106326_1.c: New test.

aarch64: Move and generalise vect_all_same

The fix for PR106326 needs a way of testing for a ptrue of a particular
element size. We already had such a function for svlast, so this patch
moves it to common code and generalises it to work with all kinds of
vectors.

gcc/
* config/aarch64/aarch64-sve-builtins.h (vector_cst_all_same): Declare.
* config/aarch64/aarch64-sve-builtins.cc (vector_cst_all_same): New
function, a generalized replacement of...
* config/aarch64/aarch64-sve-builtins-base.cc
(svlast_impl::vect_all_same): ...this.
(svlast_impl::fold): Update accordingly.

tree-optimization/112653 - PTA and return

The following separates the escape solution for return stmts not
only during points-to solving but also for later querying.  This
requires adjusting the points-to-global tests to include escapes
through returns.  Technically the patch replaces the existing
post-processing which computes the transitive closure of the
returned value solution by a proper artificial variable with
transitive closure constraints.  Instead of adding the solution
to escaped we track it separately.

PR tree-optimization/112653
* gimple-ssa.h (gimple_df): Add escaped_return solution.
* tree-ssa.cc (init_tree_ssa): Reset it.
(delete_tree_ssa): Likewise.
* tree-ssa-structalias.cc (escaped_return_id): New.
(find_func_aliases): Handle non-IPA return stmts by
adding to ESCAPED_RETURN.
(set_uids_in_ptset): Adjust HEAP escaping to also cover
escapes through return.
(init_base_vars): Initialize ESCAPED_RETURN.
(compute_points_to_sets): Replace ESCAPED post-processing
with recording the ESCAPED_RETURN solution.
* tree-ssa-alias.cc (ref_may_alias_global_p_1): Check
the ESCAPED_RETURN solution.
(dump_alias_info): Dump it.
* cfgexpand.cc (update_alias_info_with_stack_vars): Update it.
* ipa-icf.cc (sem_item_optimizer::fixup_points_to_sets):
Likewise.
* tree-inline.cc (expand_call_inline): Reset it.
* tree-parloops.cc (parallelize_loops): Likewise.
* tree-sra.cc (maybe_add_sra_candidate): Check it.

* gcc.dg/tree-ssa/pta-return-1.c: New testcase.

vect: Avoid duplicate_and_interleave for uniform vectors [PR112661]

can_duplicate_and_interleave_p checks whether we know a way of
building a particular VLA SLP invariant.  g:60034ecf25597bd515f
skipped that test for booleans, to support MASK_LEN_GATHER_LOAD
calls with a dummy all-ones mask.  But there's nothing fundamentally
different about VLA masks vs VLA data vectors.  If we have a VLA mask
that isn't all-ones, we need some way of loading it.  This ultimately
led to the ICE in the PR.

This patch fixes it by applying can_duplicate_and_interleave_p
to masks, while also adding a special path for uniform vectors
(of all kinds) to support the MASK_LEN_GATHER_LOAD usage.  This
also fixes an XFAIL in pr36648.cc for SVE.

The patch is mostly Richard's.  My only changes were to skip
redundant conversions and to use gimple_build_vector_from_val
for all eligible vectors.

2023-11-27  Richard Biener  <rguenther@suse.de>
    Richard Sandiford  <richard.sandiford@arm.com>

gcc/
PR tree-optimization/112661
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Defer duplicate-and-
interleave test to...
(vect_build_slp_tree_2): ...here, once we have all the operands.
Skip the test for uniform vectors.
(vect_create_constant_vectors): Detect uniform vectors.  Avoid
redundant conversions in that case.  Use gimple_build_vector_from_val
to build the vector.

gcc/testsuite/
* g++.dg/vect/pr36648.cc: Remove XFAIL for VLA load-lanes.

attribs: Use existing traits for excl_hash_traits

excl_hash_traits can be defined more simply by reusing existing traits.

gcc/
* attribs.cc (excl_hash_traits): Delete.
(test_attribute_exclusions): Use pair_hash and nofree_string_hash
instead.

amdgcn: Disallow TImode vector permute

We don't support it and it doesn't happen without vector extensions, so
just remove the unhandled case.

Fixes gcc.dg/pr78575.c failure.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Disallow TImode.

s390: Add missing builtin type

One builtin type slipped through the cracks of the last commits.

gcc/ChangeLog:

* config/s390/s390-builtin-types.def (BT_FN_UV8HI_UV8HI_UINT):
Add missing builtin type.

s390: Fixup builtins vec_rli and verll

Commit 248df13b966f46649e16dc3c8c92b263790ef503 restricted the rotate
count to immediates.  Although the documentation of vec_rli (Vector
Element Rotate Left Immediate) can be read as if it were restricted to
immediates, this is not the case.  Thus, revert this commit.

In order to finally allow register operands, the rotate count must be of
type unsigned char since the expander expects it to be of mode QI.  The
previously used type unsigned integer worked out for immediates since
those are of VOID mode anyway.

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Remove types.
* config/s390/s390-builtins.def (O_U64): Remove 64-bit literal support.
Don't restrict s390_vec_rli and s390_verll[bhfg] to immediates.
* config/s390/s390.cc (s390_const_operand_ok): Remove 64-bit
literal support.

c-family: Implement __has_feature and __has_extension [PR60512]

This patch implements clang's __has_feature and __has_extension in GCC.
Currently the patch aims to implement all documented features (and some
undocumented ones) following the documentation at
https://clang.llvm.org/docs/LanguageExtensions.html with the exception
of the legacy features for C++ type traits. These are omitted, since as
the clang documentation notes, __has_builtin is the correct "modern" way
to query for these (which GCC already implements).
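
A minimal usage sketch, assuming the address_sanitizer feature name
from the clang documentation; the fallback definitions keep the code
compiling on compilers that lack these macros, and the function name
built_with_asan is hypothetical.

```c
/* Fall back to "not available" on compilers without these macros.  */
#ifndef __has_feature
# define __has_feature(x) 0
#endif
#ifndef __has_extension
# define __has_extension(x) __has_feature(x)
#endif

/* Return 1 if this translation unit is built with AddressSanitizer,
   as reported by __has_feature, and 0 otherwise.  */
int
built_with_asan (void)
{
#if __has_feature(address_sanitizer)
  return 1;
#else
  return 0;
#endif
}
```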

gcc/c-family/ChangeLog:

PR c++/60512
* c-common.cc (struct hf_feature_info): New.
(c_common_register_feature): New.
(init_has_feature): New.
(has_feature_p): New.
* c-common.h (c_common_has_feature): New.
(c_family_register_lang_features): New.
(c_common_register_feature): New.
(has_feature_p): New.
* c-lex.cc (init_c_lex): Plumb through has_feature callback.
(c_common_has_builtin): Generalize and move common part ...
(c_common_lex_availability_macro): ... here.
(c_common_has_feature): New.
* c-ppoutput.cc (init_pp_output): Plumb through has_feature.

gcc/c/ChangeLog:

PR c++/60512
* c-lang.cc (c_family_register_lang_features): New.
* c-objc-common.cc (struct c_feature_info): New.
(c_register_features): New.
* c-objc-common.h (c_register_features): New.

gcc/cp/ChangeLog:

PR c++/60512
* cp-lang.cc (c_family_register_lang_features): New.
* cp-objcp-common.cc (struct cp_feature_selector): New.
(cp_feature_selector::has_feature): New.
(struct cp_feature_info): New.
(cp_register_features): New.
* cp-objcp-common.h (cp_register_features): New.

gcc/ChangeLog:

PR c++/60512
* doc/cpp.texi: Document __has_{feature,extension}.

gcc/objc/ChangeLog:

PR c++/60512
* objc-act.cc (struct objc_feature_info): New.
(objc_nonfragile_abi_p): New.
(objc_common_register_features): New.
* objc-act.h (objc_common_register_features): New.
* objc-lang.cc (c_family_register_lang_features): New.

gcc/objcp/ChangeLog:

PR c++/60512
* objcp-lang.cc (c_family_register_lang_features): New.

libcpp/ChangeLog:

PR c++/60512
* include/cpplib.h (struct cpp_callbacks): Add has_feature.
(enum cpp_builtin_type): Add BT_HAS_{FEATURE,EXTENSION}.
* init.cc: Add __has_{feature,extension}.
* macro.cc (_cpp_builtin_macro_text): Handle
BT_HAS_{FEATURE,EXTENSION}.

gcc/testsuite/ChangeLog:

PR c++/60512
* c-c++-common/has-feature-common.c: New test.
* c-c++-common/has-feature-pedantic.c: New test.
* g++.dg/ext/has-feature.C: New test.
* gcc.dg/asan/has-feature-asan.c: New test.
* gcc.dg/has-feature.c: New test.
* gcc.dg/ubsan/has-feature-ubsan.c: New test.
* obj-c++.dg/has-feature.mm: New test.
* objc.dg/has-feature.m: New test.

Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

tree-optimization/112706 - missed simplification of condition

We lack a match.pd pattern recognizing ptr + o ==/!= ptr + o'.
The following extends handling we have for integral types to
pointers.
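
For illustration, a comparison of this shape looks as follows in C.
The function name is hypothetical, and whether a given compiler
version actually folds it is not claimed here.

```c
#include <stddef.h>

/* Both offsets are applied to the same base pointer, so the result
   depends only on the offsets.  Callers must keep P + A and P + B
   within (or one past the end of) the same object.  */
int
same_element (const char *p, size_t a, size_t b)
{
  return p + a == p + b;	/* Candidate for simplification to a == b.  */
}
```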

PR tree-optimization/112706
* match.pd (ptr + o ==/!=/- ptr + o'): New patterns.

* gcc.dg/tree-ssa/pr112706.c: New testcase.

s390: Streamline NNPA builtins with their LLVM counterparts

For the opaque NNP-data type prefer unsigned over signed integer types.

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Add/remove types.
* config/s390/s390-builtins.def
(s390_vclfnhs,s390_vclfnls,s390_vcrnfs,s390_vcfn,s390_vcnf):
Replace type V8HI with UV8HI.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/vec-nnpa-fp16-convert.c: Replace V8HI
types with UV8HI.
* gcc.target/s390/zvector/vec-nnpa-fp32-convert-1.c: Ditto.
* gcc.target/s390/zvector/vec_convert_from_fp16.c: Ditto.
* gcc.target/s390/zvector/vec_convert_to_fp16.c: Ditto.
* gcc.target/s390/zvector/vec_extend_to_fp32_hi.c: Ditto.
* gcc.target/s390/zvector/vec_extend_to_fp32_lo.c: Ditto.
* gcc.target/s390/zvector/vec_round_from_fp32.c: Ditto.

s390: Fix builtins floating-point convert to/from fixed

Remove flags for non-existing operands 2 and 3.

gcc/ChangeLog:

* config/s390/s390-builtins.def
(s390_vcefb,s390_vcdgb,s390_vcelfb,s390_vcdlgb,s390_vcfeb,s390_vcgdb,
s390_vclfeb,s390_vclgdb): Remove flags for non-existing operands
2 and 3.

s390: Fix constraint for insn *cmphi_ccu

Currently for an unsigned 16-bit comparison between memory and an
immediate where the high bit is set, a clc is emitted.  This is because
the constant is created for mode HI and therefore sign extended.  This
means constraint D does not hold anymore.  Since the mode already
restricts the immediate to 16 bit, it is enough to make use of
constraint n and chop off the high bits in the output template.
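
The effect of the sign extension, and of chopping off the high bits,
can be modelled in C.  This is a sketch of the arithmetic only, with
hypothetical helper names; it does not reproduce the s390.md pattern.

```c
#include <stdint.h>

/* Model a 16-bit constant that is stored sign-extended in a wider
   integer: values with the high bit set become negative.  */
int64_t
as_himode_constant (uint16_t imm)
{
  return imm >= 0x8000 ? (int64_t) imm - 0x10000 : (int64_t) imm;
}

/* Recover the unsigned 16-bit immediate by dropping the high bits.  */
uint16_t
low_16_bits (int64_t value)
{
  return (uint16_t) (value & 0xffff);
}
```

For example, 0x8000 is stored as -32768, which no longer looks like an
unsigned 16-bit value, but masking with 0xffff gives back 0x8000.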

gcc/ChangeLog:

* config/s390/s390.md (*cmphi_ccu): For immediate operand 1 make
use of constraint n instead of D and chop off high bits in the
output template.

mips: Fix up mips*-sde-elf* build [PR112300]

As reported in the PR, mipsisa64r2-sde-elf doesn't build because HEAP_TRAMPOLINES_INIT
macro isn't defined anywhere.
It is normally defined by
# Figure out if we need to enable heap trampolines by default
case ${target} in
*-*-darwin2*)
   # Currently, we do this for macOS 11 and above.
   tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=1"
   ;;
*)
   tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=0"
   ;;
esac
in config.gcc, but mips*-sde-elf* is the only target which overwrites
the tm_defines shell variable rather than appending to it (or, in one
case, prepending); all other targets, including the other mips*
triplets, append to it.
I believe (just from looking at config.gcc) that the only difference is
that
LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3 LIBC_MUSL=4 HEAP_TRAMPOLINES_INIT=0
was not defined without the patch and is defined with it.

I think defining those first 4 shouldn't cause any harm and defining the
last one is required for it to actually build at all.

2023-11-27  Jakub Jelinek  <jakub@redhat.com>

PR target/112300
* config.gcc (mips*-sde-elf*): Append to tm_defines rather than
overwriting them.

RISC-V: Remove incorrect function gate gather_scatter_valid_offset_mode_p

While reviewing the gather/scatter code, I noticed that gather_scatter_valid_offset_mode_p looks odd.
gather_scatter_valid_offset_mode_p is supposed to block vluxei64/vsuxei64 on RV32 systems.
However, it fails to do that because it is passed the data mode instead of the index mode:

riscv_vector::gather_scatter_valid_offset_mode_p (<RATIO2:MODE>mode)
It should be RATIO2I instead of RATIO2.
We already have the following iterators, which block this situation:

(define_mode_iterator RATIO8I [
  RVVM1QI
  RVVM2HI
  RVVM4SI
  (RVVM8DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
])

The TARGET_64BIT condition blocks the EEW64 index mode on RV32 systems.
So gather_scatter_valid_offset_mode_p is no longer needed.

After removing it, I found that, because of the incorrect gather_scatter_valid_offset_mode_p,
we previously failed to vectorize the following case on RV32:

  void __attribute__ ((noinline, noclone))                                     \
  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
  {                                                                            \
    for (int i = 0; i < 128; ++i)                                              \
      if (cond[i])                                                             \
        dest[i] += src[indices[i]];                                            \
  }
  T (int64_t, 8)
  TEST_ALL (TEST_LOOP)

https://godbolt.org/z/T3ara3fM3
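
For readers who want to try the loop outside the test harness, here is
one hand-expanded instance for DATA_TYPE = int64_t and BITS = 8.  The
index type is an assumption: the macro fragment above does not show how
INDEX8 is defined, so index8_t below is a hypothetical stand-in.

```c
#include <stdint.h>

/* Assumption: INDEX8 is an 8-bit unsigned type.  */
typedef uint8_t index8_t;

/* Conditional gather load: dest[i] accumulates src[indices[i]] only
   where cond[i] is non-zero.  */
void
f_int64_t (int64_t *restrict dest, int64_t *restrict src,
	   index8_t *restrict indices, index8_t *restrict cond)
{
  for (int i = 0; i < 128; ++i)
    if (cond[i])
      dest[i] += src[indices[i]];
}
```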

Compiler Explorer shows that GCC failed to vectorize it, while Clang can.

So adapt the tests, changing the expected number of vectorized cases from 8 to 11.

This confirms that we now have the same behavior as Clang.

Tested on zvl128/zvl256/zvl512/zvl1024 with no regressions.

Note that this is not an optimization patch; it is a bug fix.

gcc/ChangeLog:

* config/riscv/autovec.md
(mask_len_gather_load<RATIO1:mode><RATIO1:mode>):
Remove gather_scatter_valid_offset_mode_p.
(mask_len_gather_load<mode><mode>): Ditto.
(mask_len_scatter_store<RATIO1:mode><RATIO1:mode>): Ditto.
(mask_len_scatter_store<mode><mode>): Ditto.
* config/riscv/predicates.md (const_1_or_8_operand): New predicate.
(vector_gs_scale_operand_64): Remove.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): Remove.
* config/riscv/riscv-v.cc (expand_gather_scatter): Refine code.
(gather_scatter_valid_offset_mode_p): Remove.
* config/riscv/vector-iterators.md: Fix iterator bugs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_32-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_32-9.c: Ditto.

RISC-V: Initial RV64E and LP64E support

Along with RV32E, RV64E is ratified. Though the ILP32E and LP64E ABIs
are still drafts, they are worth supporting.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_ext_version_table): Set version to ratified 2.0.
(riscv_subset_list::parse_std_ext): Allow RV64E.
* config.gcc: Parse base ISA 'rv64e' and ABI 'lp64e'.
* config/riscv/arch-canonicalize: Parse base ISA 'rv64e'.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Define different macro per XLEN. Add handling for ABI_LP64E.
* config/riscv/riscv-d.cc (riscv_d_handle_target_float_abi):
Add handling for ABI_LP64E.
* config/riscv/riscv-opts.h (enum riscv_abi_type): Add ABI_LP64E.
* config/riscv/riscv.cc (riscv_option_override): Enhance error
handling to support RV64E and LP64E.
(riscv_conditional_register_usage): Change "RV32E" in a comment
to "RV32E/RV64E".
* config/riscv/riscv.h
(UNITS_PER_FP_ARG): Add handling for ABI_LP64E.
(STACK_BOUNDARY): Ditto.
(ABI_STACK_BOUNDARY): Ditto.
(MAX_ARGS_IN_REGISTERS): Ditto.
(ABI_SPEC): Add support for "lp64e".
* config/riscv/riscv.opt: Parse -mabi=lp64e as ABI_LP64E.
* doc/invoke.texi: Add documentation of the LP64E ABI.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-1.c: Test for __riscv_64e.
* gcc.target/riscv/predef-2.c: Ditto.
* gcc.target/riscv/predef-3.c: Ditto.
* gcc.target/riscv/predef-4.c: Ditto.
* gcc.target/riscv/predef-5.c: Ditto.
* gcc.target/riscv/predef-6.c: Ditto.
* gcc.target/riscv/predef-7.c: Ditto.
* gcc.target/riscv/predef-8.c: Ditto.
* gcc.target/riscv/predef-9.c: New test for RV64E and LP64E,
based on predef-7.c.

bpf: remove bpf-helpers.h

Now that we are finally able to use the kernel-provided bpf_helpers.h
file and associated machinery, there is no longer any need to distribute
our own version.

This patch removes bpf-helpers.h and deletes most of the associated
tests from the gcc.target/bpf testsuite. Two tests are adapted and
retained: one testing the kernel_helper attribute, which is still
useful, and the other making sure that proper constant propagation is
performed with -O2, which is necessary to use the helpers defined as
static pointers in the kernel's bpf_helpers.h.

Regtested in target bpf-unknown-none and host x86_64-linux-gnu.

gcc/ChangeLog

* config/bpf/bpf-helpers.h: Remove.
* config.gcc: Adapt accordingly.

gcc/testsuite/ChangeLog

* gcc.target/bpf/helper-bind.c: Do not include bpf-helpers.h.
* gcc.target/bpf/helper-skb-ancestor-cgroup-id.c: Likewise, and
renamed from skb-ancestor-cgroup-id.c.
* gcc.target/bpf/helper-bpf-redirect.c: Remove.
* gcc.target/bpf/helper-clone-redirect.c: Likewise.
* gcc.target/bpf/helper-csum-diff.c: Likewise.
* gcc.target/bpf/helper-csum-update.c: Likewise.
* gcc.target/bpf/helper-current-task-under-cgroup.c: Likewise.
* gcc.target/bpf/helper-fib-lookup.c: Likewise.
* gcc.target/bpf/helper-get-cgroup-classid.c: Likewise.
* gcc.target/bpf/helper-get-current-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-get-current-comm.c: Likewise.
* gcc.target/bpf/helper-get-current-pid-tgid.c: Likewise.
* gcc.target/bpf/helper-get-current-task.c: Likewise.
* gcc.target/bpf/helper-get-current-uid-gid.c: Likewise.
* gcc.target/bpf/helper-get-hash-recalc.c: Likewise.
* gcc.target/bpf/helper-get-listener-sock.c: Likewise.
* gcc.target/bpf/helper-get-local-storage.c: Likewise.
* gcc.target/bpf/helper-get-numa-node-id.c: Likewise.
* gcc.target/bpf/helper-get-prandom-u32.c: Likewise.
* gcc.target/bpf/helper-get-route-realm.c: Likewise.
* gcc.target/bpf/helper-get-smp-processor-id.c: Likewise.
* gcc.target/bpf/helper-get-socket-cookie.c: Likewise.
* gcc.target/bpf/helper-get-socket-uid.c: Likewise.
* gcc.target/bpf/helper-get-stack.c: Likewise.
* gcc.target/bpf/helper-get-stackid.c: Likewise.
* gcc.target/bpf/helper-getsockopt.c: Likewise.
* gcc.target/bpf/helper-ktime-get-ns.c: Likewise.
* gcc.target/bpf/helper-l3-csum-replace.c: Likewise.
* gcc.target/bpf/helper-l4-csum-replace.c: Likewise.
* gcc.target/bpf/helper-lwt-push-encap.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-action.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-adjust-srh.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-store-bytes.c: Likewise.
* gcc.target/bpf/helper-map-delete-elem.c: Likewise.
* gcc.target/bpf/helper-map-lookup-elem.c: Likewise.
* gcc.target/bpf/helper-map-peek-elem.c: Likewise.
* gcc.target/bpf/helper-map-pop-elem.c: Likewise.
* gcc.target/bpf/helper-map-push-elem.c: Likewise.
* gcc.target/bpf/helper-map-update-elem.c: Likewise.
* gcc.target/bpf/helper-msg-apply-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-cork-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-pop-data.c: Likewise.
* gcc.target/bpf/helper-msg-pull-data.c: Likewise.
* gcc.target/bpf/helper-msg-push-data.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-hash.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-map.c: Likewise.
* gcc.target/bpf/helper-override-return.c: Likewise.
* gcc.target/bpf/helper-perf-event-output.c: Likewise.
* gcc.target/bpf/helper-perf-event-read-value.c: Likewise.
* gcc.target/bpf/helper-perf-event-read.c: Likewise.
* gcc.target/bpf/helper-perf-prog-read-value.c: Likewise.
* gcc.target/bpf/helper-probe-read-str.c: Likewise.
* gcc.target/bpf/helper-probe-read.c: Likewise.
* gcc.target/bpf/helper-probe-write-user.c: Likewise.
* gcc.target/bpf/helper-rc-keydown.c: Likewise.
* gcc.target/bpf/helper-rc-pointer-rel.c: Likewise.
* gcc.target/bpf/helper-rc-repeat.c: Likewise.
* gcc.target/bpf/helper-redirect-map.c: Likewise.
* gcc.target/bpf/helper-set-hash-invalid.c: Likewise.
* gcc.target/bpf/helper-set-hash.c: Likewise.
* gcc.target/bpf/helper-setsockopt.c: Likewise.
* gcc.target/bpf/helper-sk-fullsock.c: Likewise.
* gcc.target/bpf/helper-sk-lookup-tcp.c: Likewise.
* gcc.target/bpf/helper-sk-lookup-upd.c: Likewise.
* gcc.target/bpf/helper-sk-redirect-hash.c: Likewise.
* gcc.target/bpf/helper-sk-redirect-map.c: Likewise.
* gcc.target/bpf/helper-sk-release.c: Likewise.
* gcc.target/bpf/helper-sk-select-reuseport.c: Likewise.
* gcc.target/bpf/helper-sk-storage-delete.c: Likewise.
* gcc.target/bpf/helper-sk-storage-get.c: Likewise.
* gcc.target/bpf/helper-skb-adjust-room.c: Likewise.
* gcc.target/bpf/helper-skb-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-skb-change-head.c: Likewise.
* gcc.target/bpf/helper-skb-change-proto.c: Likewise.
* gcc.target/bpf/helper-skb-change-tail.c: Likewise.
* gcc.target/bpf/helper-skb-change-type.c: Likewise.
* gcc.target/bpf/helper-skb-ecn-set-ce.c: Likewise.
* gcc.target/bpf/helper-skb-get-tunnel-key.c: Likewise.
* gcc.target/bpf/helper-skb-get-tunnel-opt.c: Likewise.
* gcc.target/bpf/helper-skb-get-xfrm-state.c: Likewise.
* gcc.target/bpf/helper-skb-load-bytes-relative.c: Likewise.
* gcc.target/bpf/helper-skb-load-bytes.c: Likewise.
* gcc.target/bpf/helper-skb-pull-data.c: Likewise.
* gcc.target/bpf/helper-skb-set-tunnel-key.c: Likewise.
* gcc.target/bpf/helper-skb-set-tunnel-opt.c: Likewise.
* gcc.target/bpf/helper-skb-store-bytes.c: Likewise.
* gcc.target/bpf/helper-skb-under-cgroup.c: Likewise.
* gcc.target/bpf/helper-skb-vlan-pop.c: Likewise.
* gcc.target/bpf/helper-skb-vlan-push.c: Likewise.
* gcc.target/bpf/helper-skc-lookup-tcp.c: Likewise.
* gcc.target/bpf/helper-sock-hash-update.c: Likewise.
* gcc.target/bpf/helper-sock-map-update.c: Likewise.
* gcc.target/bpf/helper-sock-ops-cb-flags-set.c: Likewise.
* gcc.target/bpf/helper-spin-lock.c: Likewise.
* gcc.target/bpf/helper-spin-unlock.c: Likewise.
* gcc.target/bpf/helper-strtol.c: Likewise.
* gcc.target/bpf/helper-strtoul.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-current-value.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-name.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-new-value.c: Likewise.
* gcc.target/bpf/helper-sysctl-set-new-value.c: Likewise.
* gcc.target/bpf/helper-tail-call.c: Likewise.
* gcc.target/bpf/helper-tcp-check-syncookie.c: Likewise.
* gcc.target/bpf/helper-tcp-sock.c: Likewise.
* gcc.target/bpf/helper-trace-printk.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-head.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-meta.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-tail.c: Likewise.
* gcc.target/bpf/skb-ancestor-cgroup-id.c: Likewise.

LoongArch: Fix runtime error in a gcc build with --with-build-config=bootstrap-ubsan

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_split_plus_constant):
Avoid left shift of negative value -0x8000.
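
Left-shifting a negative value is undefined behavior in C, which is
what the undefined behavior sanitizer reports.  One common way to
avoid it is to shift the unsigned representation instead.  The helper
below is a hypothetical sketch of that technique; it is not the change
made in loongarch_split_plus_constant.

```c
#include <stdint.h>

/* Shift VALUE left by BITS without shifting a negative number: do the
   shift on the unsigned representation and convert back.  Assumes
   BITS < 64; the conversion back to int64_t is implementation-defined
   for out-of-range values and wraps modulo 2^64 on GCC.  */
int64_t
shift_left_wrapping (int64_t value, unsigned int bits)
{
  return (int64_t) ((uint64_t) value << bits);
}
```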

LoongArch: Optimize the loading of immediate numbers with the same high and low 32-bit values

For the following immediate load operation in gcc/testsuite/gcc.target/loongarch/imm-load1.c:

long long r = 0x0101010101010101;

Before this patch:

lu12i.w     $r15,16842752>>12
ori         $r15,$r15,257
lu32i.d     $r15,0x1010100000000>>32
lu52i.d     $r15,$r15,0x100000000000000>>52

After this patch:

lu12i.w     $r15,16842752>>12
ori         $r15,$r15,257
bstrins.d   $r15,$r15,63,32
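
The condition and the effect of the final instruction can be modelled
in C.  This is a sketch with hypothetical helper names, not the code in
loongarch_build_integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the high and low 32-bit halves of VALUE are identical.  */
bool
halves_match (uint64_t value)
{
  return (uint32_t) (value >> 32) == (uint32_t) value;
}

/* Model of "bstrins.d rd,rd,63,32": copy bits 31..0 of LOW into bits
   63..32 and keep bits 31..0 unchanged.  */
uint64_t
replicate_low_half (uint64_t low)
{
  uint64_t lo32 = low & 0xffffffffu;
  return (lo32 << 32) | lo32;
}
```

For the constant above, the low half 0x01010101 is built first and then
replicated into the high half to give 0x0101010101010101.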

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(enum loongarch_load_imm_method): Add new method.
(loongarch_build_integer): Add relevant implementations for
new method.
(loongarch_move_integer): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load1.c: Change old check.

Daily bump.

testsuite/gcc.dg/uninit-pred-9_b.c:20: Fix XPASS for various targets

The xfail for "*-*-*" here, set in r14-4089-gd45ddc2c04e471
"tree-optimization/111294 - backwards threader PHI costing"
was somewhat too general and made this test XPASS for a
number of targets.  The common factor for those targets is
that they either explicitly or by default define
LOGICAL_OP_NON_SHORT_CIRCUIT as 0 (see fold-const.cc).

Instead of changing *-*-* to a seemingly random set of
xfailed targets or inventing a new testsuite
effective-target predicate for logical-op-short-circuited
targets or the opposite, let's just force a setting that
removes the need for the xfail for all targets, by
overriding with --param=logical-op-non-short-circuit=0.

* gcc.dg/uninit-pred-9_b.c: Remove xfail for line 20.  Pass
--param=logical-op-non-short-circuit=0.  Comment why.

testsuite/gcc.dg/uninit-pred-9_b.c:23: Un-xfail for MMIX

In a recent all-target test-round investigating XPASSes for
this file, I noticed this line XPASSing for MMIX. From the
commit history it's obvious it was left out from related
target-xfail tweaks, now the last target xfailing a bogus
warning for this line.

* gcc.dg/uninit-pred-9_b.c: Remove xfail for MMIX from line 23.

Fortran: avoid obsolescence warning for COMMON with submodule [PR111880]

gcc/fortran/ChangeLog:

PR fortran/111880
* resolve.cc (resolve_common_vars): Do not call gfc_add_in_common
for symbols that are USE associated or used in a submodule.

gcc/testsuite/ChangeLog:

PR fortran/111880
* gfortran.dg/pr111880.f90: New test.

sort.cc: fix mentions of sorting networks in comments

Avoid using 'network sort' (a misnomer) in sort.cc, the correct term is
'sorting networks'.

gcc/ChangeLog:

* sort.cc: Use 'sorting networks' in comments.

Skip analyzer strndup test on hppa*-*-hpux*

2023-11-26 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/strndup-1.c: Skip on hppa*-*-hpux*.

Skip analyzer socket tests on hppa*-*-hpux*

2023-11-26 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/fd-glibc-datagram-client.c: Skip on hppa*-*-hpux*.
* gcc.dg/analyzer/fd-glibc-datagram-socket.c: Likewise.

hppa: Fix pr104869.C on hpux

2023-11-26 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* g++.dg/pr104869.C: Add attribute visibility default to
main prototype.

hppa: Really fix g++.dg/modules/bad-mapper-1.C on hpux

2023-11-23 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* g++.dg/modules/bad-mapper-1.C: Add hppa*-*-hpux* to dg-error
"this-will-not-work" targets.

testsuite, i386: fix -fhardened test

The new test at gcc.target/i386/cf_check-6.c fails on darwin with:
Excess errors:
cc1: warning: '-fhardened' not supported for this target

gcc/testsuite/ChangeLog:

* gcc.target/i386/cf_check-6.c: Only run on Linux.

testsuite, i386: fix split-stack test

The new test at gcc.target/i386/pr112686.c fails on darwin with:

Excess errors:
cc1: error: '-fsplit-stack' currently only supported on GNU/Linux
cc1: error: '-fsplit-stack' is not supported by this compiler configuration

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112686.c: Add a requirement for split_stack.

RISC-V: Disable AVL propagation of slidedown instructions

After re-checking the RVV ISA, I found that AVL propagation cannot be allowed
for slidedown instructions either, not only for vrgather.

Committed.

PR target/112599

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (avl_can_be_propagated_p): Add slidedown.
(vlmax_ta_p): Ditto.
(pass_avlprop::get_vlmax_ta_preferred_avl): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vf_avl-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr112599-3.c: New test.

Fix gcc.dg/vla-1.c

r14-5628-g53ba8d669550d3 added noipa to f1, but `-fno-ipa-vrp` should have been used
instead. The testcase tests the clone of f1, so turning off IPA VRP is the
correct approach here rather than turning off IPA on the function.

gcc/testsuite/ChangeLog:

PR testsuite/112691
* gcc.dg/vla-1.c: Add -fno-ipa-vrp.
Remove noipa from f1.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Fix gcc.target/aarch64/simd/vmulxd_{f64,f32}_2.c after IPA-VRP improvement for return values

Just like the patch against gcc.target/aarch64/movk.c, the issue here
is that the two functions, foo32 and foo64, need to be marked as noipa so that
IPA-VRP cannot propagate the return value.

gcc/testsuite/ChangeLog:

PR testsuite/112688
* gcc.target/aarch64/simd/vmulx.x (foo32): Mark as noipa rather
than noinline.
(foo64): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Fix contracts-tmpl-spec2.C on targets where plain char is unsigned by default

Since contracts-tmpl-spec2.C is just testing contracts, I thought it would be better
to just add `-fsigned-char` to the options rather than change the testcase to support
both cases.
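
Whether plain char is signed is a property of the target and the command-line options, which a short C check can make visible. This is a minimal sketch, not part of the testcase; both helper names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* True when plain char behaves like signed char for this compilation.
   GCC's -fsigned-char and -funsigned-char flip the answer.  */
static bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widen the byte 0xff through plain char: with GCC this yields -1 when
   char is signed and 255 when char is unsigned.  */
static int high_byte_as_int(void)
{
    char c = (char)0xff;
    return c;
}
```

On aarch64-linux-gnu plain char is unsigned by default, so a test whose expected results assume the signed behavior needs either `-fsigned-char` or source changes.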

Committed after testing on aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

PR testsuite/108321
* g++.dg/contracts/contracts-tmpl-spec2.C: Add -fsigned-char
to options.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

RISC-V: Fix typo

Fix typo. Committed.

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (alv_can_be_propagated_p): Fix typo.
(avl_can_be_propagated_p): Ditto.
(vlmax_ta_p): Ditto.

Daily bump.

Fix gcc.target/aarch64/movk.c testcase after IPA-VRP improvement for return values

The problem here is that dummy_number_generator returns a constant, which IPA VRP is now able
to propagate, so we need to mark the function as noipa to stop that.
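
A minimal sketch of the pattern follows, with hypothetical function names and a made-up constant rather than the contents of movk.c. The NOIPA macro is an assumption of this sketch so that compilers without the GCC attribute still accept the code.

```c
/* noipa is a GCC function attribute (GCC 8 and later); other compilers
   get an empty macro here.  */
#if defined(__GNUC__) && !defined(__clang__)
#define NOIPA __attribute__((noipa))
#else
#define NOIPA
#endif

/* Hypothetical stand-in for a test helper that returns a constant.
   Without noipa, interprocedural passes may learn the return value
   and fold it into callers.  */
NOIPA int make_constant(void)
{
    return 42;
}

/* The caller must treat the result as unknown, so the arithmetic is
   kept rather than folded to 43 by interprocedural analysis.  */
int use_constant(void)
{
    return make_constant() + 1;
}
```

The run-time result is the same either way; the attribute only changes what the optimizer is allowed to assume, which is what a scan-assembler style test depends on.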

gcc/testsuite/ChangeLog:

PR testsuite/112688
* gcc.target/aarch64/movk.c: Add noipa on dummy_number_generator
and remove -fno-inline option.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

doc: Complete and sort the list of front ends

gcc:

PR other/69374
* doc/install.texi (Downloading the source): Sort the list of
front ends and add D, Go, and Modula-2.

doc: Remove obsolete notes on GCC 4.x on FreeBSD

FreeBSD 6 and 7 have been end of life for years as have been GCC 4.x
releases, so no point in detailing specifics of changes around those.

gcc:

PR target/69374
* doc/install.texi (Specific) <*-*-freebsd*>: Remove older
contents referencing GCC 4.x.

doc: Update ISO C++ reference

gcc:

* doc/standards.texi (Standards): Update ISO C++ reference.

i386: Fix up *jcc_bt*_mask{,_1} [PR111408]

The following testcase is miscompiled in GCC 14 because the
*jcc_bt<mode>_mask and *jcc_bt<SWI48:mode>_mask_1 patterns have just
one argument in (match_operator 0 "bt_comparison_operator" [...])
but as bt_comparison_operator is eq,ne, we need two.
The md readers don't warn about it, after all, some checks can
be done in the predicate rather than specified explicitly, and the
behavior is that anything is accepted as the second argument.

I went through all other i386.md match_operator uses and all others
looked right (extract_operator using 3 operands, all others 2).

I think we'll want to fix this at different spots in older releases
because I think the bug was introduced already in 2008, though most
likely just latent.

2023-11-25 Jakub Jelinek <jakub@redhat.com>

PR target/111408
* config/i386/i386.md (*jcc_bt<mode>_mask,
*jcc_bt<SWI48:mode>_mask_1): Add (const_int 0) as expected
second operand of bt_comparison_operator.

* gcc.c-torture/execute/pr111408.c: New test.

aarch64: Fix up aarch64_simd_stp<mode> [PR109977]

The aarch64_simd_stp<mode> pattern uses w constraint in one alternative and
r in another, but for the latter incorrectly uses <vw> iterator in %<vw>1 which
expands to %d1 for V2DF and %s1 for V2SF and V4SF (this one not relevant to
the pattern) and %w1 for others, so it ICEs if the alternative is selected
during final.  Compared to this, <vwcore> macro has the same values for all
modes but uses w for V2DF and V2SF.

2023-11-24  Andrew Pinski  <pinskia@gmail.com>
    Jakub Jelinek  <jakub@redhat.com>

PR target/109977
* config/aarch64/aarch64-simd.md (aarch64_simd_stp<mode>): Use <vwcore>
rather than %<vw> for alternative with r constraint on input operand.

* gcc.dg/pr109977.c: New test.

c++: more checks for exporting names with using-declarations

Currently only functions are directly checked for validity when
exporting via a using-declaration. This patch also checks exporting
non-external names of variables, types, and enumerators. This also
prevents ICEs with `export using enum` for internal-linkage enums.

While we're at it this patch also improves the error messages for these
cases to provide more context about what went wrong.

gcc/cp/ChangeLog:

* name-lookup.cc (check_can_export_using_decl): New.
(do_nonmember_using_decl): Use above to check if names can be
exported.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-10.C: New test.
* g++.dg/modules/using-enum-2.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Allow exporting a typedef redeclaration [PR102341]

A typedef doesn't create a new entity, and thus should be allowed to be
exported even if it has been previously declared un-exported. See the
example in [module.interface] p6:

  export module M;
  struct S { int n; };
  typedef S S;
  export typedef S S;             // OK, does not redeclare an entity

PR c++/102341

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Allow exporting a redeclaration of
a typedef.

gcc/testsuite/ChangeLog:

* g++.dg/modules/export-1.C: Adjust test.
* g++.dg/modules/export-2_a.C: New test.
* g++.dg/modules/export-2_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

Daily bump.

preprocessor: Reinitialize frontend parser after loading a PCH [PR112319]

Since r14-2893, the frontend parser object needs to exist when running in
preprocess-only mode, because pragma_lex() is now called in that mode and
needs to make use of it. This is handled by calling c_init_preprocess() at
startup. If -fpch-preprocess is in effect (commonly, because of
-save-temps), a PCH file may be loaded during preprocessing, in which
case the parser will be destroyed, causing the issue noted in the
PR. Resolve it by reinitializing the frontend parser after loading the PCH.

gcc/c-family/ChangeLog:

PR pch/112319
* c-ppoutput.cc (cb_read_pch): Reinitialize the frontend parser
after loading a PCH.

gcc/testsuite/ChangeLog:

PR pch/112319
* g++.dg/pch/pr112319.C: New test.
* g++.dg/pch/pr112319.Hs: New test.
* gcc.dg/pch/pr112319.c: New test.
* gcc.dg/pch/pr112319.hs: New test.

c-family/c.opt (-Wopenmp): Add missing trailing '.'

gcc/c-family/ChangeLog:

* c.opt (-Wopenmp): Add missing trailing '.'.

install.texi: Update GCN entry - @uref and LLVM version remark

gcc/ChangeLog:

* doc/install.texi (amdgcn-*-amdhsa): Fix URL to ROCm;
change 'in the future' to 'in LLVM 18'.

hppa: Use INT14_OK_STRICT in a couple of places in pa_emit_move_sequence

The 64-bit Linux target has a relocation issue and can't use 14-bit offsets.

2023-11-22 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.cc (pa_emit_move_sequence): Use INT14_OK_STRICT
in a couple of places.

Use memcpy instead of memmove in __relocate_a_1

__relocate_a_1 is used to copy data after vector resizing. This can be done with memcpy
rather than memmove.
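
The reasoning can be illustrated with a C analogue of growing a buffer. This is a sketch under stated assumptions, not libstdc++ code: struct int_buf and both functions are hypothetical. The point is that the destination is a fresh allocation, so source and destination cannot overlap and memcpy's precondition holds.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer of ints.  */
struct int_buf {
    int *data;
    size_t size;
    size_t capacity;
};

/* Append VALUE, reallocating when full.  Returns 0 on success.  */
static int int_buf_push(struct int_buf *b, int value)
{
    if (b->size == b->capacity) {
        size_t new_cap = b->capacity ? b->capacity * 2 : 4;
        int *fresh = malloc(new_cap * sizeof *fresh);
        if (!fresh)
            return -1;
        /* FRESH is a distinct allocation, so the ranges are disjoint
           and memcpy is sufficient; memmove is not needed.  */
        if (b->size)
            memcpy(fresh, b->data, b->size * sizeof *fresh);
        free(b->data);
        b->data = fresh;
        b->capacity = new_cap;
    }
    b->data[b->size++] = value;
    return 0;
}

/* Usage: push 0..n-1 and return their sum, or -1 on allocation failure.  */
static long sum_after_pushes(int n)
{
    struct int_buf b = { NULL, 0, 0 };
    long sum = 0;
    for (int i = 0; i < n; i++)
        if (int_buf_push(&b, i) != 0) {
            free(b.data);
            return -1;
        }
    for (size_t k = 0; k < b.size; k++)
        sum += b.data[k];
    free(b.data);
    return sum;
}
```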

libstdc++-v3/ChangeLog:

PR middle-end/109849
* include/bits/stl_uninitialized.h (__relocate_a_1): Use memcpy instead
of memmove.

sra: SRA of non-escaped aggregates passed by reference to calls

PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because the
vector structure is not split by SRA and so we end up in many loads
and stores into it.  This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it to distinct
replacements afterwards.
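
The shape of code in question can be sketched in C. This is an illustration only, with hypothetical names; it is not the testcase from the patch. The local aggregate s has its address passed to stack_grow, which does not let the pointer escape, yet that call alone used to keep s in memory for the whole loop.

```c
#include <stdlib.h>

/* Hypothetical stack of ints, held in a small aggregate.  */
struct stack {
    int *items;
    size_t len;
    size_t cap;
};

/* Receives the aggregate by reference but does not store the pointer
   anywhere, so the aggregate does not escape.  */
static void stack_grow(struct stack *s)
{
    size_t cap = s->cap ? 2 * s->cap : 8;
    int *p = realloc(s->items, cap * sizeof *p);
    if (!p)
        abort();
    s->items = p;
    s->cap = cap;
}

/* Hot loop that pushes and pops.  Ideally s.items, s.len and s.cap
   live in registers and are written back only around the call.  */
long push_pop_sum(int rounds)
{
    struct stack s = { NULL, 0, 0 };
    long sum = 0;
    for (int i = 0; i < rounds; i++) {
        if (s.len == s.cap)
            stack_grow(&s);
        s.items[s.len++] = i;
        if (i % 3 == 2)
            sum += s.items[--s.len];
    }
    while (s.len)
        sum += s.items[--s.len];
    free(s.items);
    return sum;
}
```

Every pushed value is popped exactly once, so the function returns the sum of 0..rounds-1 regardless of how the fields are allocated.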

This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls.  The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it than with current trunk.  Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.

The patch disallows creation of replacements for such aggregates which
are also accessed with a precision smaller than their size because I
have observed that this led to excessive zero-extending of data
leading to slow-downs of perlbench (on some CPUs).  Apart from this
case I have not noticed any regressions, at least not so far.

Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to and then we do not need to generate statements loading
replacements from the original aggregate after the call statement.
Unfortunately, we cannot symmetrically use flags that an aggregate is
not read to avoid re-constructing the aggregate before the
call, because flags don't tell which parts of aggregates were not
written to, so we load all replacements, and so all need to have the
correct value before the call.

This version of the patch also takes care to avoid attempts to modify
abnormal edges, something which was missing in the previous version.

gcc/ChangeLog:

2023-11-23  Martin Jambor  <mjambor@suse.cz>

PR middle-end/109849
* tree-sra.cc (passed_by_ref_in_call): New.
(sra_initialize): Allocate passed_by_ref_in_call.
(sra_deinitialize): Free passed_by_ref_in_call.
(create_access): Add decl pool candidates only if they are not
already candidates.
(build_access_from_expr_1): Bail out on ADDR_EXPRs.
(build_access_from_call_arg): New function.
(asm_visit_addr): Rename to scan_visit_addr, change the
disqualification dump message.
(scan_function): Check taken addresses for all non-call statements,
including phi nodes.  Process all call arguments, including the static
chain, with build_access_from_call_arg.
(maybe_add_sra_candidate): Relax need_to_live_in_memory check to allow
non-escaped local variables.
(sort_and_splice_var_accesses): Disallow smaller-than-precision
replacements for aggregates passed by reference to functions.
(sra_modify_expr): Use a separate stmt iterator for adding statements
before the processed statement and after it.
(enum out_edge_check): New type.
(abnormal_edge_after_stmt_p): New function.
(sra_modify_call_arg): New function.
(sra_modify_assign): Adjust calls to sra_modify_expr.
(sra_modify_function_body): Likewise, use sra_modify_call_arg to
process call arguments, including the static chain.

gcc/testsuite/ChangeLog:

2023-11-23  Martin Jambor  <mjambor@suse.cz>

PR middle-end/109849
* g++.dg/tree-ssa/pr109849.C: New test.
* g++.dg/tree-ssa/sra-eh-1.C: Likewise.
* gcc.dg/tree-ssa/pr109849.c: Likewise.
* gcc.dg/tree-ssa/sra-longjmp-1.c: Likewise.
* gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.

i386: Fix ICE with -fsplit-stack -mcmodel=large [PR112686]

For -mcmodel=large, we have to load function address to a register.

PR target/112686

gcc/ChangeLog:

* config/i386/i386.cc (ix86_expand_split_stack_prologue): Load
function address to a register for ix86_cmodel == CM_LARGE.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112686.c: New test.

OpenMP: Add -Wopenmp and use it

The new warning has two purposes: First, it makes clearer to the
user that it is about OpenMP and, secondly and more importantly,
it permits the use of -Wno-openmp.

The newly added -Wopenmp is enabled by default and replaces the
'0' (always warning) in several OpenMP-related warning calls.
For code shared with OpenACC, it only uses OPT_Wopenmp for
'flag_openmp | flag_openmp_simd'.

gcc/c-family/ChangeLog:

* c.opt (Wopenmp): Add, enable by default.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_num_threads,
c_parser_omp_clause_num_tasks, c_parser_omp_clause_grainsize,
c_parser_omp_clause_priority, c_parser_omp_clause_schedule,
c_parser_omp_clause_num_teams, c_parser_omp_clause_thread_limit,
c_parser_omp_clause_dist_schedule, c_parser_omp_depobj,
c_parser_omp_scan_loop_body, c_parser_omp_assumption_clauses):
Add OPT_Wopenmp to warning_at.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_dist_schedule,
cp_parser_omp_scan_loop_body, cp_parser_omp_assumption_clauses,
cp_parser_omp_depobj): Add OPT_Wopenmp to warning_at.
* semantics.cc (finish_omp_clauses): Likewise.

gcc/ChangeLog:

* doc/invoke.texi (-Wopenmp): Add.
* gimplify.cc (gimplify_omp_for): Add OPT_Wopenmp to warning_at.
* omp-expand.cc (expand_omp_ordered_sink): Likewise.
* omp-general.cc (omp_check_context_selector): Likewise.
* omp-low.cc (scan_omp_for, check_omp_nesting_restrictions,
lower_omp_ordered_clauses): Likewise.
* omp-simd-clone.cc (simd_clone_clauses_extract): Likewise.

gcc/fortran/ChangeLog:

* lang.opt (Wopenmp): Add, enabled by default and documented in C.
* openmp.cc (gfc_match_omp_declare_target, resolve_positive_int_expr,
resolve_nonnegative_int_expr, resolve_omp_clauses,
gfc_resolve_omp_do_blocks): Use OPT_Wopenmp with gfc_warning{,_now}.

arm: libgcc: provide implementations of __sync_synchronize

Prior to Armv6 there was no architected method to synchronize data
across processors.  Armv6 saw the first introduction of
multi-processor support, using a CP15 operation; but this was
deprecated in Armv7 and is not supported on m-profile devices of any
form.  Armv7 (and armv6-m) and later support data synchronization via
the DMB instruction.

This all leads to difficulties when linking programs as the user
generally needs to know which synchronization method is needed, but
there seems no easy way around this, when there are no OS-related
primitives available.

I've addressed this by adding multiple variants of __sync_synchronize
to libgcc, one for each of the above use cases.  I've named these
__sync_synchronize_none, __sync_synchronize_cp15dmb and
__sync_synchronize_dmb.  I've also added three specs files that can be
used to direct the linker to pick the appropriate implementation.
Using specs fragments for this is preferable to directing the user to
directly use --defsym as the latter has to be placed at the correct
position on the command line to be effective and the spec rule ensures
this automatically.

I've also added a default implementation of __sync_synchronize.  The
default implementation will use DMB if that is available in the target
ISA, or fall back to a null implementation if it isn't.  In the latter
case it will cause the linker (GNU LD) to emit a warning that
specifies how to pick a specific implementation.  I've chosen not to
permit this default to use the CP15 solution as that has been
deprecated.
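
For context, user code reaches this symbol through the GCC builtin of the same name. The sketch below uses hypothetical names and a deliberately simple single-threaded round trip; it only shows where the full barrier sits. When the compiler cannot expand the barrier inline for the selected architecture, the builtin is expected to become a call to __sync_synchronize in libgcc, which is the case the variants above address (that reading is an assumption of this note, not a quote from the patch).

```c
/* Hypothetical single-producer handoff; names are illustrative only.  */
static int payload;
static volatile int ready;

static void publish(int value)
{
    payload = value;
    __sync_synchronize();   /* full barrier: payload is written before the flag */
    ready = 1;
}

static int try_consume(int *out)
{
    if (!ready)
        return 0;
    __sync_synchronize();   /* full barrier: flag is read before the payload */
    *out = payload;
    return 1;
}

/* Usage: publish VALUE, then read it back; returns -1 if nothing was ready.  */
static int roundtrip(int value)
{
    int out = -1;
    publish(value);
    return try_consume(&out) ? out : -1;
}
```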

libgcc:

* config.host (arm*-*-eabi* | arm*-*-rtems*):
Add arm/t-sync to the makefile rules.
* config/arm/lib1funcs.S (__sync_synchronize_none)
(__sync_synchronize_cp15dmb, __sync_synchronize_dmb)
(__sync_synchronize): New functions.
* config/arm/t-sync: New file.
* config/arm/sync-none.specs: Likewise.
* config/arm/sync-dmb.specs: Likewise.
* config/arm/sync-cp15dmb.specs: Likewise.

OpenMP: Accept argument to depobj's destroy clause

Since OpenMP 5.2, the destroy clause takes a depend object as its argument;
for the depobj directive, the new argument is optional but, if present,
it must be identical to the directive's argument.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_depobj): Accept optionally an argument
to the destroy clause.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_depobj): Accept optionally an argument
to the destroy clause.

gcc/fortran/ChangeLog:

* openmp.cc (gfc_match_omp_depobj): Accept optionally an argument
to the destroy clause.

libgomp/ChangeLog:

* libgomp.texi (5.2 Impl. Status): An argument to the destroy clause
is now supported.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/depobj-3.c: New test.
* gfortran.dg/gomp/depobj-3.f90: New test.

c++: Allow exporting const-qualified namespace-scope variables [PR99232]

By [basic.link] p3.2.1, a non-template non-volatile const-qualified
variable is not necessarily internal linkage in a module declaration,
and rather may have module linkage (or external linkage if it is
exported, see p4.8).

PR c++/99232

gcc/cp/ChangeLog:

* decl.cc (grokvardecl): Don't mark variables attached to
modules as internal.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99232_a.C: New test.
* g++.dg/modules/pr99232_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

RISC-V: Fix inconsistency among all vectorization hooks

This patch fixes 200+ ICEs exposed by testing with rv64gc_zve64d.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112694

The root cause is that we disallow poly (1,1) size vectorization in preferred_simd_mode
with the following code:
- if (TARGET_MIN_VLEN < 128 && TARGET_MAX_LMUL < RVV_M2)
- return word_mode;

However, we allow poly (1,1) size in the hooks:
TARGET_VECTORIZE_RELATED_MODE
TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES

and also enable it in all vectorization patterns.

I added this check to preferred_simd_mode because a poly (1,1) size mode causes an
ICE in can_duplicate_and_interleave_p.

So the alternative approach would be to block poly (1,1) size in both the TARGET_VECTORIZE_RELATED_MODE
and TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES hooks and in all vectorization patterns,
which is an ugly approach and requires too many code changes.

Now, after investigation, I find that the loop vectorizer can automatically block poly (1,1)
size vectors in interleave vectorization with this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=730909fa858bd691095bc23655077aa13b7941a9

So we don't need to worry about ICEs in interleave vectorization and can allow poly (1,1) size vectors
in vectorization, which fixes 200+ ICEs with the zve64d march.

PR target/112694

gcc/ChangeLog:

* config/riscv/riscv-v.cc (preferred_simd_mode): Allow poly_int (1,1) vectors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112694-1.c: New test.

gcc: configure: drop Valgrind 3.1 compatibility

Our system.h and configure.ac try to accommodate valgrind-3.1, but it is
more than 15 years old at this point. As Valgrind-based checking is a
developer-oriented feature, drop the compatibility stuff and streamline
the detection.

gcc/ChangeLog:

* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Delete manual checks for old Valgrind headers.
* system.h (VALGRIND_MAKE_MEM_NOACCESS): Delete.
(VALGRIND_MAKE_MEM_DEFINED): Delete.
(VALGRIND_MAKE_MEM_UNDEFINED): Delete.
(VALGRIND_MALLOCLIKE_BLOCK): Delete.
(VALGRIND_FREELIKE_BLOCK): Delete.

libcpp: configure: drop unused Valgrind detection

When top-level configure has either --enable-checking=valgrind or
--enable-valgrind-annotations, we want to activate a couple of workarounds
in libcpp. They do not use anything from the Valgrind API, so just
delete all detection.

libcpp/ChangeLog:

* config.in: Regenerate.
* configure: Regenerate.
* configure.ac (ENABLE_VALGRIND_CHECKING): Delete.
(ENABLE_VALGRIND_ANNOTATIONS): Rename to
ENABLE_VALGRIND_WORKAROUNDS. Delete Valgrind header checks.
* lex.cc (new_buff): Adjust for renaming.
(_cpp_free_buff): Ditto.

i386: Fix ICE during cbranchv16qi4 expansion [PR112681]

The following testcase ICEs, because cbranchv16qi4 expansion calls
ix86_expand_branch with op1 being a pre-AVX unaligned memory and
ix86_expand_branch emits a xorv16qi3 instruction without making sure
the operand predicates are satisfied.
While I could manually check whether the argument (or both?) doesn't
match the vector_operand predicate (apparently this one or bcst_vector_operand
is used in all integral 16+ bytes *xorv*3 instructions) and force it into a
register, all gen_xorv*3 expanders call
ix86_expand_vector_logical_operator, so it seems easier to just call that
function, which ensures the right thing happens. Calling the individual
gen_xorv*3 functions would mean an ugly switch on the modes, and using the high
level expand_simple_binop here seems too high level to me.

2023-11-24 Jakub Jelinek <jakub@redhat.com>

PR target/112681
* config/i386/i386-expand.cc (ix86_expand_branch): Use
ix86_expand_vector_logical_operator to expand vector XOR rather than
gen_rtx_SET on gen_rtx_XOR.

* gcc.target/i386/sse4-pr112681.c: New test.

rtl-ssa: Add some helpers for removing accesses

This adds some helpers to access-utils.h for removing accesses from an
access_array. This is needed by the upcoming aarch64 load/store pair
fusion pass.

gcc/ChangeLog:

* rtl-ssa/access-utils.h (filter_accesses): New.
(remove_regno_access): New.
(check_remove_regno_access): New.
* rtl-ssa/accesses.cc (rtl_ssa::remove_note_accesses_base): Use
new filter_accesses helper.

rtl-ssa: Support for inserting new insns

The upcoming aarch64 load pair pass needs to form store pairs, and can
re-order stores over loads when alias analysis determines this is safe.
In the case that both mem defs have uses in the RTL-SSA IR, and both
stores require re-ordering over their uses, we represent that as
(tentative) deletion of the original store insns and creation of a new
insn, to prevent requiring repeated re-parenting of uses during the
pass. We then update all mem uses that require re-parenting in one go
at the end of the pass.

To support this, RTL-SSA needs to handle inserting new insns (rather
than just changing existing ones), so this patch adds support for that.

New insns (and new accesses) are temporaries, allocated above a temporary
obstack_watermark, such that the user can easily back out of a change without
awkward bookkeeping.

gcc/ChangeLog:

* rtl-ssa/accesses.cc (function_info::create_set): New.
* rtl-ssa/accesses.h (access_info::is_temporary): New.
* rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns.
(function_info::finalize_new_accesses): Handle new/temporary
user-created accesses.
(function_info::apply_changes_to_insn): Ensure m_is_temp flag
on new insns gets cleared.
(function_info::change_insns): Handle new/temporary insns.
(function_info::create_insn): New.
* rtl-ssa/changes.h (class insn_change): Make function_info a
friend class.
* rtl-ssa/functions.h (function_info): Declare new entry points:
create_set, create_insn. Declare new change_alloc helper.
* rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in
dump.
* rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying
is_temporary accessor.
* rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to
false.
* rtl-ssa/member-fns.inl (function_info::change_alloc): New.
* rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add
handling for temporary defs.

match.pd: Avoid simplification into invalid BIT_FIELD_REFs [PR112673]

The following testcase is lowered by the bitint lowering pass, then
vectorizer vectorizes one of the loops in it, so we have
  vect__18.6_34 = VIEW_CONVERT_EXPR<vector(4) unsigned long>(x_35(D));
  _8 = BIT_FIELD_REF <vect__18.6_34, 64, 0>;
...
  _18 = BIT_FIELD_REF <vect__18.6_34, 64, 64>;
etc. where x_35(D) is _BitInt(256) argument.  That is valid BIT_FIELD_REF,
the first argument is a vector and it extracts the vector elements from it.
Then comes forwprop4 and simplifies that using match.pd into
  _8 = (unsigned long) x_35(D);
...
  _18 = BIT_FIELD_REF <x_35(D), 64, 64>;
and tree-cfg verification ICEs on the latter (though, even the first cast
is kind of undesirable after bitint lowering, we want large/huge bitints
lowered).  The ICE is because if BIT_FIELD_REFs first argument has
INTEGRAL_TYPE_P, we require type_has_mode_precision_p, but that is not the
case of _BitInt(256), it has BLKmode.

The following patch fixes it by doing the BIT_FIELD_REF with VCE to
BIT_FIELD_REF simplification only if the result is valid.

2023-11-24  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/112673
* match.pd (bit_field_ref (vce @0) -> bit_field_ref @0): Only simplify
if either @0 doesn't have scalar integral type or if it has mode
precision.

* gcc.dg/pr112673.c: New test.

lower-bitint: Lower FLOAT_EXPR from BITINT_TYPE INTEGER_CST [PR112679]

The bitint lowering pass only does something if it sees BITINT_TYPE (medium,
large, huge) SSA_NAMEs.  In the past I've already run into one special case
where the above doesn't work well, if there is a store of medium/large/huge
BITINT_TYPE INTEGER_CST into memory, there might not be any BITINT_TYPE
SSA_NAMEs in the function, yet we need to lower.  This has been solved by
also checking for SSA_NAME_IS_VIRTUAL_OPERAND if at the vdef there isn't
such a store (the whole intent is to make the pass as cheap as possible in the
currently very likely case that the IL doesn't have any BITINT_TYPEs at
all).
And the following testcase shows a similar problem.  With -frounding-math
we don't fold some of FLOAT_EXPRs with INTEGER_CST operands, and if those
INTEGER_CSTs are medium/large/huge BITINT_TYPEs, we need to either cast
the INTEGER_CST to corresponding INTEGER_TYPE (for medium) or lower to
internal fn call which is later turned into libgcc call (for large/huge).
The following patch does that, but of course admittedly this discovery
of stores and FLOAT_EXPRs means we already look through quite a few
SSA_NAME_DEF_STMTs even when BITINT_TYPEs never appear.

2023-11-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112679
* gimple-lower-bitint.cc (gimple_lower_bitint): Also stop first loop on
floating point SSA_NAME set in FLOAT_EXPR assignment from BITINT_TYPE
INTEGER_CST.  Set has_large_huge for those if that BITINT_TYPE is large
or huge.  Set kind to such FLOAT_EXPR assignment rhs1 BITINT_TYPE's kind.

* gcc.dg/bitint-42.c: New test.

tree-optimization/112677 - stack corruption with .COND_* reduction

The following makes sure to allocate enough space for vectype_op
in vectorizable_reduction.

PR tree-optimization/112677
* tree-vect-loop.cc (vectorizable_reduction): Use alloca
to allocate vectype_op.

Clean up by_pieces_ninsns

The by pieces compare can be implemented by overlapped operations. So
it should be taken into consideration when doing the adjustment for
overlap operations.  The mode returned from
widest_fixed_size_mode_for_size is already checked with mov_optab in
by_pieces_mode_supported_p called by widest_fixed_size_mode_for_size.
So there is no need to check mov_optab again in by_pieces_ninsns.  The
patch fixes these issues.
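
As an illustration of what an overlapped by-pieces compare looks like (a hand-written sketch with a hypothetical function name, not output of expr.cc): seven bytes can be compared with two 4-byte pieces whose ranges overlap by one byte, instead of three pieces of 4, 2 and 1 bytes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compare exactly 7 bytes using two 4-byte pieces.  The first piece
   covers bytes 0..3, the second covers bytes 3..6, so byte 3 is
   examined twice and no byte past index 6 is read.  */
static bool equal7_overlapping(const void *a, const void *b)
{
    uint32_t a0, a1, b0, b1;
    memcpy(&a0, a, 4);
    memcpy(&b0, b, 4);
    memcpy(&a1, (const unsigned char *)a + 3, 4);
    memcpy(&b1, (const unsigned char *)b + 3, 4);
    /* Any differing bit in either piece makes the OR non-zero.  */
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

Counting pieces this way is why the instruction estimate for a compare should get the same overlap adjustment as the other by-pieces operations.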

gcc/
* expr.cc (by_pieces_ninsns): Include by pieces compare when
doing the adjustment for overlap operations.  Replace mov_optab
checks with gcc assertion.

lower-bitint: Fix up -fnon-call-exceptions bit-field load lowering [PR112668]

As the following testcase shows, there are some bugs in the
-fnon-call-exceptions bit-field load lowering.  In particular, there
is a case where we want to emit a load early in the initialization
(before m_init_gsi) and because that load might throw an exception, we need
to split the block after the load so that it has an EH edge.
Now, across this splitting, we have m_init_gsi, save_gsi (something
we put back into m_gsi afterwards) statement iterators and m_preheader_bb
which is used to determine the pre-header edge of a loop (if any).
As the testcase shows, both of these statement iterators and m_preheader_bb
as well need adjustments if the block was split.  If the stmt iterators
refer to a statement, they need to be updated so that if the statement is
in the bb after the split gsi_bb and gsi_seq is updated, otherwise they
ought to be the start of the new (second) bb.
Similarly, m_preheader_bb should be updated to the second bb if it was
the first before.  Other spots where we insert something before m_init_gsi
don't split blocks in there and are fine.

The m_gsi iterator is a normal iterator to insert statements before it,
so gsi_end_p means insert statements at the end of basic block.
m_init_gsi is on the other hand an iterator after which statements should be
inserted (so gsi_end_p means insert statements at the start of basic block
after labels), but the whole pass is written for insertion of statements before
iterators, so when in 3 spots it wants to insert something after m_init_gsi,
it saves current iterator to save_gsi and sets m_gsi to gsi_after_labels
if m_init_gsi was gsi_end_p, or to the next statement.  But it actually wasn't
updating m_init_gsi back when switching to normal iterator, this patch changes
that such that further statements after m_init_gsi will appear after the
set of statements inserted before m_init_gsi.

Finally, the pass had a couple of places where it wanted to create a gsi_end_p
iterator for a particular basic block, instead of doing
m_gsi = gsi_last_bb (bb); if (!gsi_end_p (m_gsi)) gsi_next (&m_gsi);
the pass now uses new m_gsi = gsi_end_bb (bb) function.
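
The equivalence of the two idioms can be shown on a toy model. Everything below is hypothetical: struct stmt, struct block and struct iter merely imitate a statement list with a one-past-the-end position and are not GCC's gimple_stmt_iterator.

```c
#include <stddef.h>

/* Toy statement list; ptr == NULL in an iterator means "end".  */
struct stmt  { struct stmt *next; int id; };
struct block { struct stmt *first, *last; };
struct iter  { struct block *bb; struct stmt *ptr; };

static struct iter iter_last(struct block *bb)
{
    struct iter i = { bb, bb->last };
    return i;
}

static int  iter_end_p(struct iter i) { return i.ptr == NULL; }
static void iter_next(struct iter *i) { i->ptr = i->ptr->next; }

/* Old idiom: go to the last statement, then step past it if any.  */
static struct iter end_via_last(struct block *bb)
{
    struct iter i = iter_last(bb);
    if (!iter_end_p(i))
        iter_next(&i);
    return i;
}

/* New-style helper: construct the end position directly.  */
static struct iter iter_end(struct block *bb)
{
    struct iter i = { bb, NULL };
    return i;
}

/* Build a block with N statements (0 <= N <= 4) and check that both
   ways of obtaining the end position agree.  */
static int idioms_agree(int n)
{
    struct stmt nodes[4];
    struct block bb = { NULL, NULL };
    for (int k = 0; k < n && k < 4; k++) {
        nodes[k].next = NULL;
        nodes[k].id = k;
        if (bb.last)
            bb.last->next = &nodes[k];
        else
            bb.first = &nodes[k];
        bb.last = &nodes[k];
    }
    struct iter a = end_via_last(&bb), b = iter_end(&bb);
    return a.bb == b.bb && a.ptr == b.ptr && iter_end_p(a);
}
```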

2023-11-24  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112668
* gimple-iterator.h (gsi_end, gsi_end_bb): New inline functions.
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): After
temporarily adding statements after m_init_gsi, update m_init_gsi
such that later additions after it will be after the added statements.
(bitint_large_huge::handle_load): Likewise.  When splitting
gsi_bb (m_init_gsi) basic block, update m_preheader_bb if needed
and update saved m_gsi as well if needed.
(bitint_large_huge::lower_mergeable_stmt,
bitint_large_huge::lower_comparison_stmt,
bitint_large_huge::lower_mul_overflow,
bitint_large_huge::lower_bit_query): Use gsi_end_bb.

* gcc.dg/bitint-40.c: New test.

tree: Fix up try_catch_may_fallthru [PR112619]

The following testcase ICEs with -std=c++98 since r14-5086 because
block_may_fallthru is called on a TRY_CATCH_EXPR whose second operand
is a MODIFY_EXPR rather than STATEMENT_LIST, which try_catch_may_fallthru
apparently expects.
I've been wondering whether that isn't some kind of FE bug and whether
there isn't some unwritten rule that second operand of TRY_CATCH_EXPR
must be a STATEMENT_LIST. Looking at the FEs, the C++ FE uses mostly its
own trees, TRY_BLOCK (TRY_CATCH_EXPR replacement) with HANDLER in it (CATCH_EXPR
replacement) - but HANDLER can be immediate second operand rather than nested
in STATEMENT_LIST, EH_SPEC_BLOCK (this one stands for both TRY_CATCH_EXPR
and EH_FILTER_EXPR in its second argument); both of these are only replaced
by the generic trees during gimplification though, so will unlikely be seen
by block_may_fallthru; and then CLEANUP_STMT, which is genericized
into TRY_CATCH_EXPR with non-CATCH_EXPR/EH_FILTER_EXPR in its body (this is
the one that causes the ICE on this testcase).
The Go and Rust FEs create TRY_CATCH_EXPR with CATCH_EXPR immediately in its
second argument (but either are unlucky that block_may_fallthru isn't called
or the body can always fallthru, or latent ICE), while the D FE most likely
hit this ICE and attempts to work around it, by checking at TRY_CATCH_EXPR
creation time if the second argument from pop_stmt_list is STATEMENT_LIST and
if not, forcefully wraps it into a STATEMENT_LIST.

Unfortunately, I don't see an easy way to create an artificial tree iterator
from just a single tree statement, so the patch duplicates what the loops
later do (after all, it is very simple; I just didn't want to also duplicate
the large comments explaining it, hence the 3 "See below." comments).

2023-11-24 Jakub Jelinek <jakub@redhat.com>

PR c++/112619
* tree.cc (try_catch_may_fallthru): If second operand of
TRY_CATCH_EXPR is not a STATEMENT_LIST, handle it as if it was a
STATEMENT_LIST containing a single statement.

* g++.dg/eh/pr112619.C: New test.

tree-optimization/112344 - relax final value-replacement fix

The following tries to reduce the number of cases where we use an unsigned
type for the addition.  We know the original signed increment was OK
when the total unsigned increment computed fits the signed type as well,
so in that case the signed addition can be kept.
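
A minimal sketch of the idea on plain 32-bit integers, not on GCC's chrec
trees.  The names final_value and apply_increment are hypothetical, and the
sketch assumes, as the text does, that the original signed increments did not
overflow whenever the total increment fits the signed type.

```cpp
#include <cassert>
#include <cstdint>

struct final_value {
  int32_t value;      // init + step * niter
  bool used_unsigned; // true if the add had to be done in uint32_t
};

// Hypothetical helper, not GCC code.
final_value apply_increment(int32_t init, int32_t step, uint32_t niter)
{
  // Total increment, computed in a wider type so it cannot overflow here.
  const int64_t total = static_cast<int64_t>(step) * static_cast<int64_t>(niter);

  if (total >= INT32_MIN && total <= INT32_MAX) {
    // Assumed precondition: the original signed increments did not overflow.
    const int64_t sum = static_cast<int64_t>(init) + total;
    assert(sum >= INT32_MIN && sum <= INT32_MAX);
    // The increment fits the signed type, so a signed add is fine.
    return {init + static_cast<int32_t>(total), false};
  }

  // Otherwise add in the unsigned type, where wraparound is well defined.
  const uint32_t wrapped = static_cast<uint32_t>(init) + static_cast<uint32_t>(total);
  return {static_cast<int32_t>(wrapped), true};
}
```

For example, apply_increment (0, 1, 10) keeps the signed add, while
apply_increment (INT32_MIN, 1, 0xFFFFFFFF) has a total increment that does
not fit int32_t and therefore takes the unsigned path.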

This fixes the observed testsuite fallout.

PR tree-optimization/112344
* tree-chrec.cc (chrec_apply): Only use an unsigned add
when the overall increment doesn't fit the signed type.

RISC-V: Optimize a special case of VLA SLP

While fixing bugs for zvl1024b, I noticed that a special VLA SLP case
can be better optimized.

v = vec_perm (op1, op2, { nunits - 1, nunits, nunits + 1, ... })

Before this patch, we use the generic approach (vrgather):

vid
vadd.vx
vrgather
vmsgeu
vrgather

With this patch, we use vec_extract + slide1up:

scalar = vec_extract (last element of op1)
v = slide1up (op2, scalar)
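
The equivalence can be checked with a scalar model.  This is only a sketch on
fixed-size std::array values: vec_perm and slide1up below are plain C++
functions named after the pseudocode above, special_selector is a made-up
helper that builds { nunits - 1, nunits, nunits + 1, ... }, and none of them
are RVV intrinsics or GCC internals.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Select from the concatenation of op1 and op2: indices below N pick from
// op1, indices from N to 2N - 1 pick from op2.
template <typename T, std::size_t N>
std::array<T, N> vec_perm(const std::array<T, N> &op1,
                          const std::array<T, N> &op2,
                          const std::array<std::size_t, N> &sel)
{
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(sel[i] < 2 * N);
    out[i] = sel[i] < N ? op1[sel[i]] : op2[sel[i] - N];
  }
  return out;
}

// Shift op2 up by one element and put scalar into element 0.
template <typename T, std::size_t N>
std::array<T, N> slide1up(const std::array<T, N> &op2, T scalar)
{
  std::array<T, N> out{};
  out[0] = scalar;
  for (std::size_t i = 1; i < N; ++i)
    out[i] = op2[i - 1];
  return out;
}

// The selector from the commit message: { N - 1, N, N + 1, ... }.
template <std::size_t N>
std::array<std::size_t, N> special_selector()
{
  std::array<std::size_t, N> sel{};
  for (std::size_t i = 0; i < N; ++i)
    sel[i] = N - 1 + i;
  return sel;
}
```

With op1 = {1, 2, 3, 4} and op2 = {5, 6, 7, 8}, both
vec_perm (op1, op2, special_selector<4> ()) and slide1up (op2, op1[3])
give {4, 5, 6, 7}.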

Tested on zvl128b/zvl256b/zvl512b/zvl1024b of both RV32 and RV64 with no regressions.

Ok for trunk ?

PR target/112599

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns): New function.
(expand_vec_perm_const_1): Add new optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112599-2.c: New test.

RISC-V: Disable BSWAP optimization for NUNITS < 4

While fixing bugs, I noticed an odd piece of code that looks incorrect
and probably makes codegen worse.

#include <stdint.h>

typedef int8_t vnx2qi __attribute__ ((vector_size (2)));

#define MASK_2(X, Y) (Y) - 1 - (X), (Y) - 2 - (X)

#define PERMUTE(TYPE, NUNITS)                                                  \
  __attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2,     \
       TYPE *out)                      \
  {                                                                            \
    TYPE v                                                                     \
      = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
    *(TYPE *) out = v;                                                         \
  }

#define TEST_ALL(T)                                                            \
  T (vnx2qi, 2)

TEST_ALL (PERMUTE)

Before this patch:

        vsetivli        zero,2,e8,mf8,ta,ma
        vle8.v  v1,0(a0)
        vsetivli        zero,1,e16,mf4,ta,ma
        vsrl.vi v2,v1,8
        vsll.vi v1,v1,8
        vor.vv  v1,v2,v1
        vsetivli        zero,2,e8,mf8,ta,ma
        vse8.v  v1,0(a2)
        ret

After this patch:

        vsetivli        zero,2,e8,mf8,ta,ma
        vle8.v  v3,0(a0)
        vid.v   v1
        vrsub.vi        v1,v1,1
        vrgather.vv     v2,v3,v1
        vse8.v  v2,0(a2)
        ret

Committed as it is very obvious on code review.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_bswap_pattern): Disable for NUNIT < 4.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Ditto.

c++: Support lambdas in static template member initialisers [PR107398]

The testcase noted in the PR fails because the context of the lambda is
not in namespace scope, but rather in class scope. This patch removes
the assertion that the context must be a namespace and ensures that
lambdas in class scope still get the correct merge_kind.
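
For orientation, this is roughly the shape of code involved, written as a
plain translation unit rather than a module.  It is not the testcase from the
PR; size_holder and size_of_char are invented names, and the point is only
that the lambda sits in the initialiser of a static member of a class
template.

```cpp
#include <cassert>

template <typename T>
struct size_holder {
  static int value; // static data member of a class template
};

// The initialiser contains a lambda that belongs to the class member
// size_holder<T>::value rather than to a namespace-scope entity.
template <typename T>
int size_holder<T>::value = [] { return static_cast<int>(sizeof(T)); }();

// Reading the member instantiates the initialiser for that T.
int size_of_char()
{
  assert(size_holder<char>::value == 1);
  return size_holder<char>::value;
}
```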

PR c++/107398

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind): Handle lambdas in class
scope.
(maybe_key_decl): Remove assertion and fix whitespace.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-6_a.C: New test.
* g++.dg/modules/lambda-6_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

i386: Fix AVX512 and AVX10 option issues

gcc/ChangeLog:

PR target/112643
* config/i386/driver-i386.cc (check_avx10_avx512_features):
Renamed to ...
(check_avx512_features): this and remove avx10 check.
(host_detect_local_cpu): Never append -mno-avx10.1-{256,512} to
avoid emitting warnings when building GCC with native arch.
* config/i386/i386-builtin.def (BDESC): Add missing AVX512VL for
128/256 bit builtin for AVX512VP2INTERSECT.
* config/i386/i386-options.cc (ix86_option_override_internal):
Also check whether the AVX512 flags are set when trying to reset.
* config/i386/i386.h
(PTA_SKYLAKE_AVX512): Add missing PTA_EVEX512.
(PTA_ZNVER4): Ditto.

c++: check mismatching exports for class tags [PR98885]

The check for exporting a declaration that was previously declared as not
exported is implemented in 'duplicate_decls', but this doesn't handle
declarations of classes. This patch adds the check for classes and slightly
adjusts the associated error messages for clarity.

PR c++/98885

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Adjust error message.
(xref_tag): Adjust error message. Check exporting decl that is
already declared as non-exporting.

gcc/testsuite/ChangeLog:

* g++.dg/modules/export-1.C: Adjust error messages. Remove
xfails for working case. Add new test case.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

Daily bump.

MAINTAINERS: Add myself to write after approval and DCO

ChangeLog:

* MAINTAINERS: Add myself to write after approval and DCO

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

contrib/regression/btest-gcc.sh: Optionally handle XPASS.

Tests with keys that match both PASS and FAIL (or now, optionally,
XPASS) count as fail.  XPASSes were previously
ignored.  Handling them as FAIL seems the most useful
alternative, but not counting XPASSes may be deliberate.
It's also a matter of compatibility, so make it optional.

Attempts to use --handle-xpass-as-fail were previously
flagged as a usage error.  If you pass it now on a state with
previous mixed XPASS and PASS results that don't change in
this run, the XPASS is discovered as a (new) regression.
New XPASSing tests are handled as new FAILs.

* btest-gcc.sh (--handle-xpass-as-fail): New option.

contrib/regression/btest-gcc.sh: Simplify option handling.

* btest-gcc.sh (Option handling): Break out shifts from each
option alternative.

contrib/regression/btest-gcc.sh: Handle multiple options.

This is a long-standing bug: passing "-j --add-passes-despite-regression"
or "--add-passes-despite-regression -j" caused the second option to be
treated as TARGET, the first non-option parameter.

* btest-gcc.sh (Option handling): Handle multiple options.

hppa: Fix g++.dg/modules/bad-mapper-1.C on hpux

2023-11-23 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* g++.dg/modules/bad-mapper-1.C: Add hppa*-*-hpux* to dg-error
"-:failed mapper handshake communication" targets.

hppa: Fix gcc.dg/analyzer/fd-4.c on hpux

2023-11-23 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/fd-4.c: Define _MODE_T on hpux.

hppa: Export main in pr104869.C on hpux

This is needed to avoid a linker warning.

2023-11-23 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* g++.dg/pr104869.C: Export main on hpux.

testsuite, lib: Re-allow multiple function start labels.

The change applied in r14-5760-g2a46e0e7e20 changed the behaviour of
functions with assembly like:

bar:
__acle_se_bar:

where both bar and __acle_se_bar are globals referring to the same
function body. The old behaviour overrides 'bar' with '__acle_se_bar'
and the scan tests for that label.

The change here re-allows the override.

Cases like this are not legal Mach-O (where two global symbols cannot
have the same address in the assembler output). However, given the
constraints on the Mach-O scanning, it does not seem that it is
necessary to skip the change (any incorrect case should be easily
evident in the assembler).

gcc/testsuite/ChangeLog:

* lib/scanasm.exp: Allow multiple function start symbols,
taking the last as the function name.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

testsuite: fortran: fix invalid testcases (missing MOLD argument to NULL)

The Fortran standard requires that NULL() passed to an assumed-rank
dummy argument has a MOLD argument.

gcc/testsuite/ChangeLog:

PR fortran/104819
* gfortran.dg/assumed_rank_10.f90: Add MOLD argument to NULL().
* gfortran.dg/assumed_rank_8.f90: Likewise.

Fortran: restrictions on integer arguments to SYSTEM_CLOCK [PR112609]

Fortran 2023 added restrictions on integer arguments to SYSTEM_CLOCK to
have a decimal exponent range at least as large as a default integer,
and that all integer arguments have the same kind type parameter.

gcc/fortran/ChangeLog:

PR fortran/112609
* check.cc (gfc_check_system_clock): Add checks on integer arguments
to SYSTEM_CLOCK specific to F2023.
* error.cc (notify_std_msg): Adjust to handle new features added
in F2023.
* gfortran.texi (_gfortran_set_options): Document GFC_STD_F2023_DEL,
remove obsolete option GFC_STD_F2008_TS and fix enumeration values.
* libgfortran.h (GFC_STD_F2023_DEL): Add and use in GFC_STD_OPT_F23.
* options.cc (set_default_std_flags): Add GFC_STD_F2023_DEL.

gcc/testsuite/ChangeLog:

PR fortran/112609
* gfortran.dg/system_clock_1.f90: Add option -std=f2003.
* gfortran.dg/system_clock_3.f08: Add option -std=f2008.
* gfortran.dg/system_clock_4.f90: New test.

AVR: PR target/86776: Implement CVE-2017-5753.

gcc/
PR target/86776
* config/avr/avr.cc (TARGET_HAVE_SPECULATION_SAFE_VALUE): Define
to speculation_safe_value_not_needed.

hppa: xfail scan-assembler-not check in g++.dg/cpp0x/initlist-const1.C

2023-11-23 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-const1.C: xfail scan-assembler-not
check on hppa*-*-hpux*.

libstdc++: Define std::ranges::to for C++23 (P1206R7) [PR111055]

This adds the std::ranges::to functions for C++23. The rest of P1206R7
is not yet implemented, i.e. the new constructors taking the
std::from_range tag, and the new insert_range, assign_range, etc. member
functions. std::ranges::to works with the standard containers even
without the new constructors, so this is useful immediately.
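
A small usage sketch follows.  The function evens_up_to is invented for this
example.  The code uses std::ranges::to only when the
__cpp_lib_ranges_to_container macro mentioned below is defined, and otherwise
falls back to a plain loop, so it should build with older standards and
libraries too.

```cpp
#include <cassert>
#include <vector>
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif
#ifdef __cpp_lib_ranges_to_container
#  include <ranges>
#endif

// Collect the even numbers in [0, n] into a std::vector<int>.
std::vector<int> evens_up_to(int n)
{
  assert(n >= 0);
#ifdef __cpp_lib_ranges_to_container
  // C++23 path: materialise a lazy view into a container with ranges::to.
  return std::views::iota(0, n + 1)
         | std::views::filter([](int i) { return i % 2 == 0; })
         | std::ranges::to<std::vector>();
#else
  // Fallback path when ranges::to is not available.
  std::vector<int> out;
  for (int i = 0; i <= n; ++i)
    if (i % 2 == 0)
      out.push_back(i);
  return out;
#endif
}
```

Either path returns {0, 2, 4, 6} for evens_up_to (6).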

The __cpp_lib_ranges_to_container feature test macro can be defined now,
because that only indicates support for the changes in <ranges>, which
are implemented by this patch. The __cpp_lib_containers_ranges macro
will be defined once all containers support the new member functions.

libstdc++-v3/ChangeLog:

PR libstdc++/111055
* include/bits/ranges_base.h (from_range_t): Define new tag
type.
(from_range): Define new tag object.
* include/bits/version.def (ranges_to_container): Define.
* include/bits/version.h: Regenerate.
* include/std/ranges (ranges::to): Define.
* testsuite/std/ranges/conv/1.cc: New test.
* testsuite/std/ranges/conv/2_neg.cc: New test.
* testsuite/std/ranges/conv/version.cc: New test.

libstdc++: Fix access error in __gnu_test::uneq_allocator

The operator== function is only a friend of the LHS argument, so cannot
access the private member of the RHS argument. Use the public accessor
instead.
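
The access problem can be reproduced with a small stand-alone class template.
This is a sketch of the general pattern only: tagged, tag_ and get_tag are
hypothetical names and this is not the code of uneq_allocator.

```cpp
#include <cassert>

template <typename T>
class tagged
{
  int tag_; // private state compared by operator==

public:
  explicit tagged(int tag) : tag_(tag) {}

  int get_tag() const { return tag_; }

  // This operator is a friend of tagged<T> only.  When U differs from T,
  // rhs.tag_ would be inaccessible, so the right-hand side is read through
  // the public accessor.
  template <typename U>
  friend bool operator==(const tagged &lhs, const tagged<U> &rhs)
  {
    return lhs.tag_ == rhs.get_tag();
  }
};
```

With this definition, tagged<int> (7) == tagged<long> (7) compiles and is
true; writing rhs.tag_ in the operator body instead would be rejected for
that heterogeneous comparison.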

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_allocator.h (uneq_allocator): Fix
equality operator for heterogeneous comparisons.

Don't skip check for warning at line 411 in Wattributes.c on hppa*64*-*-*

2023-11-23 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* c-c++-common/Wattributes.c: Don't skip check for warning
at line 411 in Wattributes.c on hppa*64*-*-*.

gcc: Introduce -fhardened

In <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628748.html>
I proposed -fhardened, a new umbrella option that enables a reasonable set
of hardening flags.  The read of the room seems to be that the option
would be useful.  So here's a patch implementing that option.

Currently, -fhardened enables:

  -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
  -D_GLIBCXX_ASSERTIONS
  -ftrivial-auto-var-init=zero
  -fPIE  -pie  -Wl,-z,relro,-z,now
  -fstack-protector-strong
  -fstack-clash-protection
  -fcf-protection=full (x86 GNU/Linux only)

-fhardened will not override options that were specified on the command line
(before or after -fhardened).  For example,

     -D_FORTIFY_SOURCE=1 -fhardened

means that _FORTIFY_SOURCE=1 will be used.  Similarly,

      -fhardened -fstack-protector

will not enable -fstack-protector-strong.

Currently, -fhardened is only supported on GNU/Linux.

In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
to anything.  This patch provides -Whardened, enabled by default, which
warns when -fhardened couldn't enable a particular option.  I think most
often it will say that _FORTIFY_SOURCE wasn't enabled because optimizations
were not enabled.
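
One way to see which of the preprocessor-level settings reached a translation
unit is to test the two macros from the list above.  The helper below is only
an illustration; hardening_macros is a made-up name, and the sketch says
nothing about the non-macro options such as -fstack-protector-strong.

```cpp
#include <cassert>
#include <string>

// Report whether _FORTIFY_SOURCE and _GLIBCXX_ASSERTIONS are defined in
// this translation unit, for example because -fhardened defined them.
std::string hardening_macros()
{
  std::string out;
#ifdef _FORTIFY_SOURCE
  out += "_FORTIFY_SOURCE=" + std::to_string(_FORTIFY_SOURCE) + ", ";
#else
  out += "_FORTIFY_SOURCE unset, ";
#endif
#ifdef _GLIBCXX_ASSERTIONS
  out += "_GLIBCXX_ASSERTIONS set";
#else
  out += "_GLIBCXX_ASSERTIONS unset";
#endif
  assert(!out.empty());
  return out;
}
```

Printing the returned string from builds with and without -fhardened, and
with and without optimization, should show the effect described above.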

gcc/c-family/ChangeLog:

* c-opts.cc: Include "target.h".
(c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
and _GLIBCXX_ASSERTIONS.

gcc/ChangeLog:

* common.opt (Whardened, fhardened): New options.
* config.in: Regenerate.
* config/bpf/bpf.cc: Include "opts.h".
(bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
not inform that -fstack-protector does not work.
* config/i386/i386-options.cc (ix86_option_override_internal): When
-fhardened, maybe enable -fcf-protection=full.
* config/linux-protos.h (linux_fortify_source_default_level): Declare.
* config/linux.cc (linux_fortify_source_default_level): New.
* config/linux.h (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Redefine.
* configure: Regenerate.
* configure.ac: Check if the linker supports '-z now' and '-z relro'.
Check if -fhardened is supported on $target_os.
* doc/invoke.texi: Document -fhardened and -Whardened.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Add.
* gcc.cc (driver_handle_option): Remember if any link options or -static
were specified on the command line.
(process_command): When -fhardened, maybe enable -pie and
-Wl,-z,relro,-z,now.
* opts.cc (flag_stack_protector_set_by_fhardened_p): New global.
(finish_options): When -fhardened, enable
-ftrivial-auto-var-init=zero and -fstack-protector-strong.
(print_help_hardened): New.
(print_help): Call it.
* opts.h (flag_stack_protector_set_by_fhardened_p): Declare.
* target.def (fortify_source_default_level): New target hook.
* targhooks.cc (default_fortify_source_default_level): New.
* targhooks.h (default_fortify_source_default_level): Declare.
* toplev.cc (process_options): When -fhardened, enable
-fstack-clash-protection.  If flag_stack_protector_set_by_fhardened_p,
do not warn that -fstack-protector not supported for this target.
Don't enable -fhardened when !HAVE_FHARDENED_SUPPORT.

gcc/testsuite/ChangeLog:

* gcc.misc-tests/help.exp: Test -fhardened.
* c-c++-common/fhardened-1.S: New test.
* c-c++-common/fhardened-1.c: New test.
* c-c++-common/fhardened-10.c: New test.
* c-c++-common/fhardened-11.c: New test.
* c-c++-common/fhardened-12.c: New test.
* c-c++-common/fhardened-13.c: New test.
* c-c++-common/fhardened-14.c: New test.
* c-c++-common/fhardened-15.c: New test.
* c-c++-common/fhardened-2.c: New test.
* c-c++-common/fhardened-3.c: New test.
* c-c++-common/fhardened-4.c: New test.
* c-c++-common/fhardened-5.c: New test.
* c-c++-common/fhardened-6.c: New test.
* c-c++-common/fhardened-7.c: New test.
* c-c++-common/fhardened-8.c: New test.
* c-c++-common/fhardened-9.c: New test.
* gcc.target/i386/cf_check-6.c: New test.

libgcc: mark __hardcfr_check_fail as always_inline

The function __hardcfr_check_fail in hardcfr.c is internal and static
inline.  It takes many arguments, which need more than five registers
to be passed on bpf-unknown-none targets.  BPF is limited to that number
of registers for passing arguments, and therefore libgcc fails to build
for that target.  This patch marks the function with the
always_inline attribute, fixing the bpf build.
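
For reference, the attribute is spelled as follows in GCC's attribute syntax.
The helper sum_many and its caller checksum are invented for this sketch and
have nothing to do with the real signature of __hardcfr_check_fail; the point
is only that a static inline helper with more than five arguments is forced
into its caller, so no out-of-line call needs to pass them.

```cpp
#include <cassert>

// Seven arguments: more than five argument registers would be needed for
// an out-of-line call.  always_inline asks the compiler to inline the body
// into every caller instead.
static inline __attribute__((always_inline)) int
sum_many(int a, int b, int c, int d, int e, int f, int g)
{
  return a + b + c + d + e + f + g;
}

// After inlining, checksum contains the additions directly.
int checksum(int x)
{
  return sum_many(x, 1, 2, 3, 4, 5, 6);
}

void check_checksum()
{
  assert(checksum(0) == 21);
}
```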

Tested in bpf-unknown-none target and x86_64-linux-gnu host.

libgcc/ChangeLog:

* hardcfr.c (__hardcfr_check_fail): Mark as always_inline.

testsuite: Fix subexpressions with `scan-assembler-times'

We have an issue with `scan-assembler-times' handling expressions using
subexpressions as produced by capturing parentheses `()' in an odd way,
and one that is inconsistent with `scan-assembler', `scan-assembler-not',
etc.  The problem comes from calling `regexp' with `-inline -all', which
causes a list to be returned that would otherwise be placed in match
variables.

Consequently if we have say:

/* { dg-final { scan-assembler-times "\\s(foo|bar)\\s" 1 } } */

in a test case and there is a lone `foo' present in output being matched,
then our invocation of `regexp -inline -all' in `scan-assembler-times'
will return:

{ foo } foo

and that in turn will confuse our match count calculation as `llength'
will return 2 rather than 1, making the test fail even though `foo' was
only actually matched once.

It seems unclear why we chose to call `regexp' in such an odd way in the
first place just to figure out the number of matches.  The first version
of TCL that supports the `-all' option to `regexp' is 8.3, and according
to its documentation[1][2] `regexp' already returns the number of matches
found whenever `-all' has been used *unless* `-inline' has also been used.

Remove the `-inline' option then along with the `llength' invocation.
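
The miscount can be modelled outside Tcl.  The sketch below uses C++
std::regex as a stand-in: count_matches plays the role of `regexp -all',
and count_inline_elements plays the role of `llength' applied to the
result of `regexp -inline -all'.  Both function names are made up, and
std::regex is only assumed to behave like Tcl's engine for this simple
pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Number of matches of re in text.
std::size_t count_matches(const std::string &text, const std::regex &re)
{
  return static_cast<std::size_t>(
      std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                    std::sregex_iterator()));
}

// Number of elements in a flat list holding, for every match, the whole
// match followed by one entry per capturing group.
std::size_t count_inline_elements(const std::string &text, const std::regex &re)
{
  std::size_t n = 0;
  for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
    n += it->size(); // whole match + capture groups
  return n;
}

void check_example()
{
  const std::regex re("\\s(foo|bar)\\s");
  const std::string text = "\tcall foo \n";
  assert(count_matches(text, re) == 1);         // one real match
  assert(count_inline_elements(text, re) == 2); // match + one subexpression
}
```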

References:

[1] "Tcl Built-In Commands - regexp manual page",
    <https://www.tcl.tk/man/tcl8.2.3/TclCmd/regexp.html>

[2] "Tcl Built-In Commands - regexp manual page",
    <https://www.tcl.tk/man/tcl8.3/TclCmd/regexp.html>

gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Remove the `-inline'
option to `regexp' and the wrapping `llength' call.