gcc.gnu.org Git - gcc.git/log

i386: Fine tune STV register conversion costs for -Os.

The eagle-eyed may have spotted that my recent testcases for DImode shifts
on x86_64 included -mno-stv in the dg-options.  This is because the
Scalar-To-Vector (STV) pass currently transforms these shifts to use
SSE vector operations, producing larger code even with -Os.  The issue
is that the compute_convert_gain currently underestimates the size of
instructions required for interunit moves, which is corrected with the
patch below.

For the simple test case:

unsigned long long shl1(unsigned long long x) { return x << 1; }

without this patch, GCC -m32 -Os -mavx2 currently generates:

shl1: push   %ebp // 1 byte
mov    %esp,%ebp // 2 bytes
vmovq  0x8(%ebp),%xmm0 // 5 bytes
pop    %ebp // 1 byte
vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes
vmovd  %xmm0,%eax // 4 bytes
vpextrd $0x1,%xmm0,%edx  // 6 bytes
ret // 1 byte  = 24 bytes total

with this patch, we now generate the shorter

shl1: push   %ebp // 1 byte
mov    %esp,%ebp // 2 bytes
mov    0x8(%ebp),%eax // 3 bytes
mov    0xc(%ebp),%edx // 3 bytes
pop    %ebp // 1 byte
add    %eax,%eax // 2 bytes
adc    %edx,%edx // 2 bytes
ret // 1 byte  = 15 bytes total

Benchmarking using CSiBE, shows that this patch saves 1361 bytes
when compiling with -m32 -Os, and saves 172 bytes when compiling
with -Os.

2023-10-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Provide
more accurate values (sizes) for inter-unit moves with -Os.

ARC: Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.

This patch completes the ARC back-end's transition to using pre-reload
splitters for SImode shifts and rotates on targets without a barrel
shifter.  The core part is that the shift_si3 define_insn is no longer
needed, as shifts and rotates that don't require a loop are split
before reload, and then because shift_si3_loop is the only caller
of output_shift, both can be significantly cleaned up and simplified.
The output_shift function (Claudiu's "the elephant in the room") is
renamed output_shift_loop, which handles just the four instruction
zero-overhead loop implementations.

Aside from the clean-ups, the user visible changes are much improved
implementations of SImode shifts and rotates on affected targets.

For the function:
unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }

GCC with -O2 -mcpu=em would previously generate:

rotr_1: lsr_s r2,r0
        bmsk_s r0,r0,0
        ror     r0,r0
        j_s.d   [blink]
        or_s    r0,r0,r2

with this patch, we now generate:

        j_s.d   [blink]
        ror     r0,r0

For the function:
unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }

GCC with -O2 -mcpu=em would previously generate:

rotr_31:
        mov_s   r2,r0   ;4
        asl_s r0,r0
        add.f 0,r2,r2
        rlc r2,0
        j_s.d   [blink]
        or_s    r0,r0,r2

with this patch we now generate an add.f followed by an adc:

rotr_31:
        add.f   r0,r0,r0
        j_s.d   [blink]
        add.cs  r0,r0,1

Shifts by constants requiring a loop have been improved for even counts
by performing two operations in each iteration:

int shl10(int x) { return x >> 10; }

Previously looked like:

shl10: mov.f lp_count, 10
        lpnz    2f
        asr r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]

And now becomes:

shl10:
        mov     lp_count,5
        lp      2f
        asr     r0,r0
        asr     r0,r0
2:      # end single insn loop
        j_s     [blink]

So emulating ARC's SWAP on architectures that don't have it:

unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }

previously required 10 instructions and ~70 cycles:

rotr_16:
        mov_s   r2,r0   ;4
        mov.f lp_count, 16
        lpnz    2f
        add r0,r0,r0
        nop
2:      # end single insn loop
        mov.f lp_count, 16
        lpnz    2f
        lsr r2,r2
        nop
2:      # end single insn loop
        j_s.d   [blink]
        or_s    r0,r0,r2

now becomes just 4 instructions and ~18 cycles:

rotr_16:
        mov     lp_count,8
        lp      2f
        ror     r0,r0
        ror     r0,r0
2:      # end single insn loop
        j_s     [blink]

2023-10-24  Roger Sayle  <roger@nextmovesoftware.com>
    Claudiu Zissulescu  <claziss@gmail.com>

gcc/ChangeLog
* config/arc/arc-protos.h (output_shift): Rename to...
(output_shift_loop): Tweak API to take an explicit rtx_code.
(arc_split_ashl): Prototype new function here.
(arc_split_ashr): Likewise.
(arc_split_lshr): Likewise.
(arc_split_rotl): Likewise.
(arc_split_rotr): Likewise.
* config/arc/arc.cc (output_shift): Delete local prototype.  Rename.
(output_shift_loop): New function replacing output_shift to output
a zero overheap loop for SImode shifts and rotates on ARC targets
without barrel shifter (i.e. no hardware support for these insns).
(arc_split_ashl): New helper function to split *ashlsi3_nobs.
(arc_split_ashr): New helper function to split *ashrsi3_nobs.
(arc_split_lshr): New helper function to split *lshrsi3_nobs.
(arc_split_rotl): New helper function to split *rotlsi3_nobs.
(arc_split_rotr): New helper function to split *rotrsi3_nobs.
(arc_print_operand): Correct whitespace.
(arc_rtx_costs): Likewise.
(hwloop_optimize): Likewise.
* config/arc/arc.md (ANY_SHIFT_ROTATE): New define_code_iterator.
(define_code_attr insn): New code attribute to map to pattern name.
(<ANY_SHIFT_ROTATE>si3): New expander unifying previous ashlsi3,
ashrsi3 and lshrsi3 define_expands.  Adds rotlsi3 and rotrsi3.
(*<ANY_SHIFT_ROTATE>si3_nobs): New define_insn_and_split that
unifies the previous *ashlsi3_nobs, *ashrsi3_nobs and *lshrsi3_nobs.
We now call arc_split_<insn> in arc.cc to implement each split.
(shift_si3): Delete define_insn, all shifts/rotates are now split.
(shift_si3_loop): Rename to...
(<insn>si3_loop): define_insn to handle loop implementations of
SImode shifts and rotates, calling ouput_shift_loop for template.
(rotrsi3): Rename to...
(*rotrsi3_insn): define_insn for TARGET_BARREL_SHIFTER's ror.
(*rotlsi3): New define_insn_and_split to transform left rotates
into right rotates before reload.
(rotlsi3_cnt1): New define_insn_and_split to implement a left
rotate by one bit using an add.f followed by an adc.
* config/arc/predicates.md (shiftr4_operator): Delete.

testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

The test was declaring 'int *carry;' and wrote to '*carry' without
initializing 'carry' first, leading to an attempt to write at address
zero, and a crash.

Fix by declaring 'int carry;' and passing '&carrry' instead of 'carry'
as parameter.

2023-09-08 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.

arc: Remove mpy_dest_reg_operand predicate

The mpy_dest_reg_operand is just a wrapper for
register_operand. Remove it.

gcc/

* config/arc/arc.md (mulsi3_700): Update pattern.
(mulsi3_v2): Likewise.
* config/arc/predicates.md (mpy_dest_reg_operand): Remove it.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

Improve factor_out_conditional_operation for conversions and constants

In the case of a NOP conversion (precisions of the 2 types are equal),
factoring out the conversion can be done even if int_fits_type_p returns
false and even when the conversion is defined by a statement inside the
conditional. Since it is a NOP conversion there is no zero/sign extending
happening which is why it is ok to be done here; we were trying to prevent
an extra sign/zero extend from being moved away from definition which no-op
conversions are not.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/104376
PR tree-optimization/101541
* tree-ssa-phiopt.cc (factor_out_conditional_operation):
Allow nop conversions even if it is defined by a statement
inside the conditional.

gcc/testsuite/ChangeLog:

PR tree-optimization/101541
* gcc.dg/tree-ssa/phi-opt-39.c: New test.

match: Fix the `popcnt(a&b) + popcnt(a|b)` pattern for types [PR111913]

So this pattern needs a little help on the gimple side of things to know what
the type popcount should be. For most builtins, the type is the same as the input
but popcount and others are not. And when using it with another outer expression,
genmatch needs some slight help to know that the return type was type rather than
the argument type.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111913

gcc/ChangeLog:

* match.pd (`popcount(X&Y) + popcount(X|Y)`): Add the resulting
type for popcount.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/fold-popcount-1.c: New test.
* gcc.dg/fold-popcount-8a.c: New test.

rtl-ssa: Avoid creating duplicated phis

If make_uses_available was called twice for the same use,
we could end up trying to create duplicate definitions for
the same extended live range.

gcc/
* rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Check
whether the requested phi already exists.

rtl-ssa: Don't insert after insns that can throw

rtl_ssa::can_insert_after didn't handle insns that can throw.
Fixing that avoids a regression with a later patch.

gcc/
* rtl-ssa.h: Include cfgbuild.h.
* rtl-ssa/movement.h (can_insert_after): Replace is_jump with the
more comprehensive control_flow_insn_p.

rtl-ssa: Fix handling of deleted insns

RTL-SSA queues up some invasive changes for later. But sometimes
the insns involved in those changes can be deleted by later
optimisations, making the queued change unnecessary. This patch
checks for that case.

gcc/
* rtl-ssa/changes.cc (function_info::perform_pending_updates): Check
whether an insn has been replaced by a note.

rtl-ssa: Fix null deref in first_any_insn_use

first_any_insn_use implicitly (but contrary to its documentation)
assumed that there was at least one use.

gcc/
* rtl-ssa/member-fns.inl (first_any_insn_use): Handle null
m_first_use.

i386: Avoid paradoxical subreg dests in vector zero_extend

For the V2HI -> V2SI zero extension in:

  typedef unsigned short v2hi __attribute__((vector_size(4)));
  typedef unsigned int v2si __attribute__((vector_size(8)));
  v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }

ix86_expand_sse_extend would generate:

   (set (reg:V2HI 102)
        (const_vector:V2HI [(const_int 0 [0])
    (const_int 0 [0])]))
   (set (subreg:V8HI (reg:V2HI 101) 0)
        (vec_select:V8HI
          (vec_concat:V16HI (subreg:V8HI (reg/v:V2HI 99 [ x ]) 0)
                            (subreg:V8HI (reg:V2HI 102) 0))
          (parallel [(const_int 0 [0])
                     (const_int 8 [0x8])
                     (const_int 1 [0x1])
                     (const_int 9 [0x9])
                     (const_int 2 [0x2])
                     (const_int 10 [0xa])
                     (const_int 3 [0x3])
                     (const_int 11 [0xb])])))
  (set (reg:V2SI 100)
       (subreg:V2SI (reg:V2HI 101) 0))
    (expr_list:REG_EQUAL (zero_extend:V2SI (reg/v:V2HI 99 [ x ])))

But using (subreg:V8HI (reg:V2HI 101) 0) as the destination of
the vec_select means that only the low 4 bytes of the destination
are stored.  Only the lower half of reg 100 is well-defined.

Things tend to happen to work if the register allocator ties reg 101
to reg 100.  But it caused problems with the upcoming late-combine pass
because we propagated the set of reg 100 into its uses.

gcc/
* config/i386/i386-expand.cc (ix86_split_mmx_punpck): Allow the
destination to be wider than the sources.  Take the mode from the
first source.
(ix86_expand_sse_extend): Pass the destination directly to
ix86_split_mmx_punpck, rather than using a fresh register that
is half the size.

i386: Fix unprotected REGNO in aeswidekl_operation

I hit an ICE in aeswidekl_operation while testing the late-combine
pass on x86. The predicate tested REGNO without first testing REG_P.

gcc/
* config/i386/predicates.md (aeswidekl_operation): Protect
REGNO check with REG_P.

aarch64: Define TARGET_INSN_COST

This patch adds a bare-bones TARGET_INSN_COST.  See the comment
in the patch for the rationale.

Just to get a flavour for how much difference it makes, I tried
compiling the testsuite with -Os -fno-schedule-insns{,2} and
seeing what effect the patch had on the number of instructions.
Very few tests changed, but all the changes were positive:

  Tests   Good    Bad   Delta    Best   Worst  Median
  =====   ====    ===   =====    ====   =====  ======
     19     19      0    -177     -52      -1      -4

The change for -O2 was even smaller, but more mixed:

  Tests   Good    Bad   Delta    Best   Worst  Median
  =====   ====    ===   =====    ====   =====  ======
      6      3      3      -8      -9       6      -2

There were no obvious effects on SPEC CPU2017.

The patch is needed to avoid a regression with a later change.

gcc/
* config/aarch64/aarch64.cc (aarch64_insn_cost): New function.
(TARGET_INSN_COST): Define.

aarch64: Avoid bogus atomics match

The non-LSE pattern aarch64_atomic_exchange<mode> comes before the
LSE pattern aarch64_atomic_exchange<mode>_lse. From a recog
perspective, the only difference between the patterns is that
the non-LSE one clobbers CC and needs a scratch.

However, combine and RTL-SSA can both add clobbers to make a
pattern match. This means that if they try to rerecognise an
LSE pattern, they could end up turning it into a non-LSE pattern.
This patch adds a !TARGET_LSE test to avoid that.

This is needed to avoid a regression with later patches.

gcc/
* config/aarch64/atomics.md (aarch64_atomic_exchange<mode>): Require
!TARGET_LSE.

RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

Calling vget/vset intrinsic without receiving a return value will cause
a crash. Because in this case e.target is null.
This patch should be backported to releases/gcc-13.

PR target/111935

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111935.c: New test.

libgcc: make heap-based trampolines conditional on libc presence

To build `libc` for a target one needs to build `gcc` without `libc`
support first. Commit r14-4823-g8abddb187b3348 "libgcc: support
heap-based trampolines" added unconditional `libc` dependency and broke
libc-less `gcc` builds.

An example failure on `x86_64-unknown-linux-gnu`:

    $ mkdir -p /tmp/empty
    $ ../gcc/configure \
        --disable-multilib \
        --without-headers \
        --with-newlib \
        --enable-languages=c \
        --disable-bootstrap \
        --disable-gcov \
        --disable-threads \
        --disable-shared \
        --disable-libssp \
        --disable-libquadmath \
        --disable-libgomp \
        --disable-libatomic \
        --with-build-sysroot=/tmp/empty
    $ make
    ...
    /tmp/gb/./gcc/xgcc -B/tmp/gb/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -B/usr/local/x86_64-pc-linux-gnu/lib/ -isystem /usr/local/x86_64-pc-linux-gnu/include -isystem /usr/local/x86_64-pc-linux-gnu/sys-include --sysroot=/tmp/empty   -g -O2 -O2  -g -O2 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -I. -I. -I../.././gcc -I/home/slyfox/dev/git/gcc/libgcc -I/home/slyfox/dev/git/gcc/libgcc/. -I/home/slyfox/dev/git/gcc/libgcc/../gcc -I/home/slyfox/dev/git/gcc/libgcc/../include  -DHAVE_CC_TLS  -DUSE_TLS  -o heap-trampoline.o -MT heap-trampoline.o -MD -MP -MF heap-trampoline.dep  -c .../gcc/libgcc/config/i386/heap-trampoline.c -fvisibility=hidden -DHIDE_EXPORTS
    ../gcc/libgcc/config/i386/heap-trampoline.c:3:10: fatal error: unistd.h: No such file or directory
        3 | #include <unistd.h>
          |          ^~~~~~~~~~
    compilation terminated.
    make[2]: *** [.../gcc/libgcc/static-object.mk:17: heap-trampoline.o] Error 1
    make[2]: Leaving directory '/tmp/gb/x86_64-pc-linux-gnu/libgcc'
    make[1]: *** [Makefile:13307: all-target-libgcc] Error 2

The change inhibits any heap-based trampoline code.

libgcc/

* config/aarch64/heap-trampoline.c: Disable when libc is not
present.
* config/i386/heap-trampoline.c: Ditto.

Remove obsolete debugging formats from names list

* opts.cc (debug_type_names): Remove stabs and xcoff.
(df_set_names): Adjust.

RISC-V: Fix ICE of RTL CHECK on VSETVL PASS[PR111947]

ICE on vsetvli a5, 8 instruction demand info.

The AVL is const_int 8 which ICE on RENGO caller.

Committed as it is obvious fix.

PR target/111947

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_lcm_local_properties): Add REGNO check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111947.c: New test.

Daily bump.

libcpp: Improve the diagnostic for poisoned identifiers [PR36887]

The PR requests an enhancement to the diagnostic issued for the use of a
poisoned identifier. Currently, we show the location of the usage, but not
the location which requested the poisoning, which would be helpful for the
user if the decision to poison an identifier was made externally, such as
in a library header.

In order to output this information, we need to remember a location_t for
each identifier that has been poisoned, and that data needs to be preserved
as well in a PCH. One option would be to add a field to struct cpp_hashnode,
but there is no convenient place to add it without increasing the size of
the struct for all identifiers. Given this facility will be needed rarely,
it seemed better to add a second hash map, which is handled PCH-wise the
same as the current one in gcc/stringpool.cc. This hash map associates a new
struct cpp_hashnode_extra with each identifier that needs one. Currently
that struct only contains the new location_t, but it could be extended in
the future if there is other ancillary data that may be convenient to put
there for other purposes.

libcpp/ChangeLog:

PR preprocessor/36887
* directives.cc (do_pragma_poison): Store in the extra hash map the
location from which an identifier has been poisoned.
* lex.cc (identifier_diagnostics_on_lex): When issuing a diagnostic
for the use of a poisoned identifier, also add a note indicating the
location from which it was poisoned.
* identifiers.cc (alloc_node): Convert to template function.
(_cpp_init_hashtable): Handle the new extra hash map.
(_cpp_destroy_hashtable): Likewise.
* include/cpplib.h (struct cpp_hashnode_extra): New struct.
(cpp_create_reader): Update prototype to...
* init.cc (cpp_create_reader): ...accept an argument for the extra
hash table and pass it to _cpp_init_hashtable.
* include/symtab.h (ht_lookup): New overload for convenience.
* internal.h (struct cpp_reader): Add EXTRA_HASH_TABLE member.
(_cpp_init_hashtable): Adjust prototype.

gcc/c-family/ChangeLog:

PR preprocessor/36887
* c-opts.cc (c_common_init_options): Pass new extra hash map
argument to cpp_create_reader().

gcc/ChangeLog:

PR preprocessor/36887
* toplev.h (ident_hash_extra): Declare...
* stringpool.cc (ident_hash_extra): ...this new global variable.
(init_stringpool): Handle ident_hash_extra as well as ident_hash.
(ggc_mark_stringpool): Likewise.
(ggc_purge_stringpool): Likewise.
(struct string_pool_data_extra): New struct.
(spd2): New GC root variable.
(gt_pch_save_stringpool): Use spd2 to handle ident_hash_extra,
analogous to how spd is used to handle ident_hash.
(gt_pch_restore_stringpool): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/36887
* c-c++-common/cpp/diagnostic-poison.c: New test.
* g++.dg/pch/pr36887.C: New test.
* g++.dg/pch/pr36887.Hs: New test.

compiler: move Selector_expression up in file

This is a mechanical change to move Selector_expression up in expressions.cc.
This will make it visible to Builtin_call_expression for later work.
This produces a very large "git --diff", but "git diff --minimal" is clear.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536642

compiler: make xx_constant_value methods non-const

This changes the Expression {numeric,string,boolean}_constant_value
methods non-const. This does not affect anything immediately,
but will be useful for later CLs in this series.

The only real effect is to Builtin_call_expression::do_export,
which remains const and can no longer call numeric_constant_value.
But it never needed to call it, as do_export runs after do_lower,
and do_lower replaces a constant expression with the actual constant.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536641

compiler: pass gogo to Runtime::make_call

This is a boilerplate change to pass gogo to Runtime::make_call.
It's not currently used but will be used by later CLs in this series.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536640

compiler: add Expression::is_untyped method

This method is not currently used by anything, but it will be used
by later CLs in this series.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536639

syscall: add missing type conversion

The gofrontend incorrectly accepted code that was missing a type conversion.
The test case for this is bug518.go in https://go.dev/cl/536537.
Future CLs in this series will detect the type error.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536638

vect: Allow same precision for bit-precision conversions.

In PR111794 we miss a vectorization because on riscv type precision and
mode precision differ for mask types. We can still vectorize when
allowing assignments with the same precision for dest and source which
is what this patch does.

gcc/ChangeLog:

PR tree-optimization/111794
* tree-vect-stmts.cc (vectorizable_assignment): Add
same-precision exception for dest and source.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/slp-mask-1.c: New test.
* gcc.target/riscv/rvv/autovec/slp-mask-run-1.c: New test.

RISC-V: Add popcount fallback expander.

I didn't manage to get back to the generic vectorizer fallback for
popcount so I figured I'd rather create a popcount fallback in the
riscv backend. It uses the WWG algorithm from libgcc.

gcc/ChangeLog:

* config/riscv/autovec.md (popcount<mode>2): New expander.
* config/riscv/riscv-protos.h (expand_popcount): Define.
* config/riscv/riscv-v.cc (expand_popcount): Vectorize popcount
with the WWG algorithm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount.c: New test.

tree-optimization/111916 - SRA of BIT_FIELD_REF of constant pool entries

The following adjusts a leftover BIT_FIELD_REF special-casing to only
cover the cases general code doesn't handle.

PR tree-optimization/111916
* tree-sra.cc (sra_modify_assign): Do not lower all
BIT_FIELD_REF reads that are sra_handled_bf_read_p.

* gcc.dg/torture/pr111916.c: New testcase.

tree-optimization/111915 - mixing grouped and non-grouped accesses

The change to allow SLP of non-grouped accesses failed to check
for the case of mixing with grouped accesses.

PR tree-optimization/111915
* tree-vect-slp.cc (vect_build_slp_tree_1): Check all
accesses are either grouped or not.

* gcc.dg/vect/pr111915.c: New testcase.

ipa/111914 - perform parameter init after remapping types

The following addresses a mismatch in SSA name vs. symbol when
we emit a dummy assignment when not optimizing. The temporary
we create is not remapped by initialize_inlined_parameters because
we have no easy way to get at it. The following instead emits
the additional statement after we have remapped the type of
the replacement variable.

PR ipa/111914
* tree-inline.cc (setup_one_parameter): Move code emitting
a dummy load when not optimizing ...
(initialize_inlined_parameters): ... here to after when
we remapped the parameter type.

* gcc.dg/pr111914.c: New testcase.

configure, libquadmath: Remove unintended AC_CHECK_LIBM [PR111928]

This was a rebase error, that managed to pass testing on Darwin and
Linux (but fails on bare metal).

PR libquadmath/111928

libquadmath/ChangeLog:

* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Remove AC_CHECK_LIBM.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

SH: Fix PR 111001

gcc/ChangeLog:

PR target/111001
* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
Skip over nop move insns.

middle-end: don't keep .MEM guard nodes for PHI nodes who dominate loop [PR111860]

The previous patch tried to remove PHI nodes that dominated the first loop,
however the correct fix is to only remove .MEM nodes.

This patch thus makes the condition a bit stricter and only tries to remove
MEM phi nodes.

I couldn't figure out a way to easily determine if a particular PHI is vUSE
related, so the patch does:

1. check if the definition is a vDEF and not defined in main loop.
2. check if the definition is a PHI and not defined in main loop.
3. check if the definition is a default definition.

For no 2 and 3 we may misidentify the PHI, in both cases the value is defined
outside of the loop version block which also makes it ok to remove.

gcc/ChangeLog:

PR tree-optimization/111860
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Drop .MEM nodes only.

gcc/testsuite/ChangeLog:

PR tree-optimization/111860
* gcc.dg/vect/pr111860-2.c: New test.
* gcc.dg/vect/pr111860-3.c: New test.

move the (a-b) CMP 0 ? (a-b) : (b-a) optimization from fold_cond_expr_with_comparison to match

This patch moves the `(a-b) CMP 0 ? (a-b) : (b-a)` optimization
from fold_cond_expr_with_comparison to match.

Bootstrapped and tested on x86_64-linux-gnu.

Changes in:
v2: Removes `(a == b) ? 0 : (b - a)` handling since it was handled
    via r14-3606-g3d86e7f4a8ae
    Change zerop to integer_zerop for `(a - b) == 0 ? 0 : (b - a)`,
    Add `(a - b) != 0 ? (a - b) : 0` handling.

gcc/ChangeLog:

* match.pd (`(A - B) CMP 0 ? (A - B) : (B - A)`):
New patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-38.c: New test.

Use error_mark_node after error in convert

While working on PR c/111903, I Noticed that
convert will convert integer_zero_node to that
type after an error instead of returning error_mark_node.
From what I can tell this was the old way of not having
error recovery since other places in this file does return
error_mark_node and the places I am replacing date from
when the file was imported into the repro (either via a gcc2 merge
or earlier).

I also had to update the objc front-end to allow for the error_mark_node
change, I suspect you could hit the ICE without this change though.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* convert.cc (convert_to_pointer_1): Return error_mark_node
after an error.
(convert_to_real_1): Likewise.
(convert_to_integer_1): Likewise.
(convert_to_complex_1): Likewise.

gcc/objc/ChangeLog:

* objc-gnu-runtime-abi-01.cc (build_objc_method_call): Allow
for error_operand after call to build_c_cast.
* objc-next-runtime-abi-01.cc (build_objc_method_call): Likewise.
* objc-next-runtime-abi-02.cc (build_v2_build_objc_method_call): Likewise.

convert_to_complex vs invalid_conversion [PR111903]

convert_to_complex when creating a COMPLEX_EXPR does
not currently check if either the real or imag parts
was not error_mark_node. This later on confuses the gimpilfier
when there was a SAVE_EXPR wrapped around that COMPLEX_EXPR.
The simple fix is after calling convert inside convert_to_complex_1,
check that the either result was an error_operand and return
an error_mark_node in that case.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/111903

gcc/ChangeLog:

* convert.cc (convert_to_complex_1): Return
error_mark_node if either convert was an error
when converting from a scalar.

gcc/testsuite/ChangeLog:

* gcc.target/i386/float16-8.c: New test.

tree-optimization/111917 - bougs IL after guard hoisting

The unswitching code to hoist guards inserts conditions in wrong
places. The following fixes this, simplifying code.

PR tree-optimization/111917
* tree-ssa-loop-unswitch.cc (hoist_guard): Always insert
new conditional after last stmt.

* gcc.dg/torture/pr111917.c: New testcase.

RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927]

ICE:

during RTL pass: vsetvl
<source>: In function 'riscv_lms_f32':
<source>:240:1: internal compiler error: in merge, at config/riscv/riscv-vsetvl.cc:1997
  240 | }

In general compatible_p (avl_equal_p) has:

    if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
      return false;

Don't fuse AVL of vsetvl if the VL operand is used by non-RVV instructions.

It is reasonable to add it into 'can_use_next_avl_p' since we don't want to
fuse AVL of vsetvl into a scalar move instruction which doesn't demand AVL.
And after the fusion, we will alway use compatible_p to check whether the demand
is correct or not.

PR target/111927

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111927.c: New test.

RISC-V: Remove unnecessary asm check for vec cvt

The vsetvl asm check is unnecessary for the vector convert. We
should be focus for constrait and leave the vsetvl test to the
vsetvl pass.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: Remove the vsetvl
asm check from func body.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

libatomic: drop redundant all-multi command

./multilib.am already specifies this same command, and make warns about
the earlier one being ignored when seeing the later one. All that needs
retaining to still satisfy the preceding comment is the extra
dependency.

libatomic/

* Makefile.am (all-multi): Drop commands.
* Makefile.in: Update accordingly.

RISC-V: Bugfix for merging undef tmp register for trunc

For trunc function autovec, there will be one step like below take MU
for the merge operand.

rtx tmp = gen_reg_rtx (vec_int_mode);
emit_vec_cvt_x_f_rtz (tmp, op_1, mask, vec_fp_mode);

The MU will leave the tmp (aka dest register) register unmasked elements
unchanged and it is undefined here. This patch would like to adjust the
MU to MA.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): Add insn type
arg.
(expand_vec_trunc): Take MA instead of MU for cvt_x_f_rtz.

Signed-off-by: Pan Li <pan2.li@intel.com>

LoongArch: Document -mexplicit-relocs={auto,none,always}

gcc/ChangeLog:

* doc/invoke.texi (-mexplicit-relocs=style): Document.
(-mexplicit-relocs): Document as an alias of
-mexplicit-relocs=always.
(-mno-explicit-relocs): Document as an alias of
-mexplicit-relocs=none.
(-mcmodel=extreme): Mention -mexplicit-relocs=always instead of
-mexplicit-relocs.

LoongArch: Use explicit relocs for addresses only used for one load or store with -mexplicit-relocs=auto and -mcmodel={normal,medium}

In these cases, if we use explicit relocs, we end up with 2
instructions:

    pcalau12i    t0, %pc_hi20(x)
    ld.d         t0, t0, %pc_lo12(x)

If we use la.local pseudo-op, in the best scenario (x is in +/- 2MiB
range) we still have 2 instructions:

    pcaddi       t0, %pcrel_20(x)
    ld.d         t0, t0, 0

If x is out of the range we'll have 3 instructions.  So for these cases
just emit machine instructions with explicit relocs.

gcc/ChangeLog:

* config/loongarch/predicates.md (symbolic_pcrel_operand): New
predicate.
* config/loongarch/loongarch.md (define_peephole2): Optimize
la.local + ld/st to pcalau12i + ld/st if the address is only used
once if -mexplicit-relocs=auto and -mcmodel=normal or medium.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-single-load-store.c:
New test.
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c:
New test.

LoongArch: Use explicit relocs for TLS access with -mexplicit-relocs=auto

The linker does not know how to relax TLS access for LoongArch, so let's
emit machine instructions with explicit relocs for TLS.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
Return true for TLS symbol types if -mexplicit-relocs=auto.
(loongarch_call_tls_get_addr): Replace TARGET_EXPLICIT_RELOCS
with la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@tls_low<mode>): Remove
TARGET_EXPLICIT_RELOCS from insn condition.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: New
test.
* gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c: New
test.

LoongArch: Use explicit relocs for GOT access when -mexplicit-relocs=auto and LTO during a final link with linker plugin

If we are performing LTO for a final link and linker plugin is enabled,
then we are sure any GOT access may resolve to a symbol out of the link
unit (otherwise the linker plugin will tell us the symbol should be
resolved locally and we'll use PC-relative access instead).

Produce machine instructions with explicit relocs instead of la.global
for better scheduling.

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_explicit_relocs_p): Declare new function.
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
Implement.
(loongarch_symbol_insns): Call loongarch_explicit_relocs_p for
SYMBOL_GOT_DISP, instead of using TARGET_EXPLICIT_RELOCS.
(loongarch_split_symbol): Call loongarch_explicit_relocs_p for
deciding if return early, instead of using
TARGET_EXPLICIT_RELOCS.
(loongarch_output_move): CAll loongarch_explicit_relocs_p
instead of using TARGET_EXPLICIT_RELOCS.
* config/loongarch/loongarch.md (*low<mode>): Remove
TARGET_EXPLICIT_RELOCS from insn condition.
(@ld_from_got<mode>): Likewise.
* config/loongarch/predicates.md (move_operand): Call
loongarch_explicit_relocs_p instead of using
TARGET_EXPLICIT_RELOCS.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-lto.c: New test.

LoongArch: Add enum-style -mexplicit-relocs= option

To take a better balance between scheduling and relaxation when -flto is
enabled, add three-way -mexplicit-relocs={auto,none,always} options.
The old -mexplicit-relocs and -mno-explicit-relocs options are still
supported, they are mapped to -mexplicit-relocs=always and
-mexplicit-relocs=none.

The default choice is determined by probing assembler capabilities at
build time.  If the assembler does not supports explicit relocs at all,
the default will be none; if it supports explicit relocs but not
relaxation, the default will be always; if both explicit relocs and
relaxation are supported, the default will be auto.

Currently auto is same as none.  We will make auto more clever in
following changes.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Add strings for
-mexplicit-relocs={auto,none,always}.
* config/loongarch/genopts/loongarch.opt.in: Add options for
-mexplicit-relocs={auto,none,always}.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-def.h
(EXPLICIT_RELOCS_AUTO): Define.
(EXPLICIT_RELOCS_NONE): Define.
(EXPLICIT_RELOCS_ALWAYS): Define.
(N_EXPLICIT_RELOCS_TYPES): Define.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Error out if the old-style
-m[no-]explicit-relocs option is used with
-mexplicit-relocs={auto,none,always} together.  Map
-mno-explicit-relocs to -mexplicit-relocs=none and
-mexplicit-relocs to -mexplicit-relocs=always for backward
compatibility.  Set a proper default for -mexplicit-relocs=
based on configure-time probed linker capability.  Update a
diagnostic message to mention -mexplicit-relocs=always instead
of the old-style -mexplicit-relocs.
(loongarch_handle_model_attribute): Update a diagnostic message
to mention -mexplicit-relocs=always instead of the old-style
-mexplicit-relocs.
* config/loongarch/loongarch.h (TARGET_EXPLICIT_RELOCS): Define.

RISC-V: Fix typo[VSETVL PASS]

When fixing an issue, I find there is a typo in VSETVL PASS.

Change 'use_by' into 'used_by'.

Committed it as it is very obvious.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): Fix typo.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.

gcc.c-torture/execute/builtins/pr93262-chk.c: Remove return statement

The main_test function returns void, so return with an expression
is a constraint violation. The test case still fails with this
change applied before the fix for PR 93262 in r14-4813.

gcc/testsuite/

* gcc.c-torture/execute/builtins/pr93262-chk.c (main_test):
Remove unnecessary return statement.

RISC-V: Remove unnecessary asm check for binop constraint

The vsetvl asm check is unnecessary for the binop constraint. We
should be focus for constrait and leave the vsetvl test to the
vsetvl pass.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vv_constraint-1.c: Remove the
vsetvl asm check from func body.
* gcc.target/riscv/rvv/base/binop_vx_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-10.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-11.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-12.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-129.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-13.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-130.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-131.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-133.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-134.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-135.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-14.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-15.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-153.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-154.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-155.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-158.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-16.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-17.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-172.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-174.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-18.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-19.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-2.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-20.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-21.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-22.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-23.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-24.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-25.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-26.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-27.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-28.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-29.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-3.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-30.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-31.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-32.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-33.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-34.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-35.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-36.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-37.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-38.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-39.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-4.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-40.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-41.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-42.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-43.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-44.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-5.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-6.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-7.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-8.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-9.c: Ditto.
* gcc.target/riscv/rvv/base/shift_vx_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vv_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vv_constraint-2.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vv_constraint-3.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vv_constraint-4.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vv_constraint-5.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vv_constraint-6.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vx_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vx_constraint-8.c: Ditto.
* gcc.target/riscv/rvv/base/ternop_vx_constraint-9.c: Ditto.
* gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/unop_v_constraint-2.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove unnecessary asm check for rounding autovec

The vsetvl asm check is unnecessary for the rounding function autovec.
These rounding test cases should focus on the rounding insn sequence.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: Remove the
vsetvl check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-floor-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-floor-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-floor-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-floor-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-iceil-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ifloor-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-irint-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-iround-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-llfloor-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-llround-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-rint-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-rint-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-rint-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-rint-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-round-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-round-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-round-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-round-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Fix AVL_TYPE attribute of tuple mode mov<mode>

The tuple mode mov<mode> pattern doesn't have avl_type so it is invalid attribute.

gcc/ChangeLog:

* config/riscv/vector.md: Fix avl_type attribute of tuple mov<mode>.

vect: Cost adjacent vector loads/stores together [PR111784]

As comments[1][2], this patch is to change the costing way
on some adjacent vector loads/stores from costing one by
one to costing them together with the total number once.

It helps to fix the exposed regression PR111784 on aarch64,
as aarch64 specific costing could make different decisions
according to the different costing ways (counting with total
number vs. counting one by one). Based on a reduced test
case from PR111784, only considering vec_num can fix the
regression already, but vector loads/stores in regard to
ncopies are also adjacent accesses, so they are considered
as well.

btw, this patch leaves the costing on dr_explicit_realign
and dr_explicit_realign_optimized alone to make it simple.
The costing way change can cause the differences for them
since there is one costing depending on targetm.vectorize.
builtin_mask_for_load and it's costed according to the
calling times. IIUC, these two dr_alignment_support are
mainly used for old Power? (only having 16 bytes aligned
vector load/store but no unaligned vector load/store).

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630742.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630744.html

PR tree-optimization/111784

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Adjust costing way for
adjacent vector stores, by costing them with the total number
rather than costing them one by one.
(vectorizable_load): Adjust costing way for adjacent vector
loads, by costing them with the total number rather than costing
them one by one.

i386: Prevent splitting to xmm16+ when !TARGET_AVX512VL

Currently, there will be a chance in split to use x/ymm16+ w/o AVX512VL,
which finally leads to an ICE as pr111753 does.

This patch aims to fix that.

gcc/ChangeLog:

PR target/111753
* config/i386/i386.cc (ix86_standard_x87sse_constant_load_p):
Do not split to xmm16+ when !TARGET_AVX512VL.

gcc/testsuite/ChangeLog:

PR target/111753
* gcc.target/i386/pr111753.c: New test.

RISC-V: Bugfix for merging undefined tmp register in math

For math function autovec, there will be one step like

rtx tmp = gen_reg_rtx (vec_int_mode);
emit_vec_cvt_x_f (tmp, op_1, mask, UNARY_OP_TAMU_FRM_DYN, vec_fp_mode);

The MU will leave the tmp (aka dest register) register unmasked elements
unchanged and it is undefined here. This patch would like to adjust the
MU to MA.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): Add new type
values.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): Add undef merge
operand handling.
(expand_vec_ceil): Take MA instead of MU for tmp register.
(expand_vec_floor): Ditto.
(expand_vec_nearbyint): Ditto.
(expand_vec_rint): Ditto.
(expand_vec_round): Ditto.
(expand_vec_roundeven): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

compiler: remove traverse_assignments pass

The last caller was removed in https://go.dev/cl/18261 in 2016.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536637

compiler: remove name_ field from Type_switch_statement

It's not used for anything.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536636

compiler: pass Gogo to determine types pass

Also pass Gogo to the type verification pass.

This is a refactoring that does not change the compiler behavior.
This is in preparation for future CLs that rearrange the pass ordering.

This introduces one new call to go_get_gogo, which will be removed in
a future CL.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536635

LoongArch: Define macro CLEAR_INSN_CACHE.

LoongArch's microstructure ensures cache consistency by hardware.
Due to out-of-order execution, "ibar" is required to ensure the visibility of the
store (invalidated icache) executed by this CPU before "ibar" (to the instance).
"ibar" will not invalidate the icache, so the start and end parameters are not Affect
"ibar" performance.

gcc/ChangeLog:

* config/loongarch/loongarch.h (CLEAR_INSN_CACHE): New definition.

Expand: Enable vector mode for by pieces compares

Vector mode compare instructions are efficient for equality compare on
rs6000. This patch refactors the codes of by pieces operation to enable
vector mode for compare.

gcc/
PR target/111449
* expr.cc (can_use_qi_vectors): New function to return true if
we know how to implement OP using vectors of bytes.
(qi_vector_mode_supported_p): New function to check if optabs
exists for the mode and certain by pieces operations.
(widest_fixed_size_mode_for_size): Replace the second argument
with the type of by pieces operations.  Call can_use_qi_vectors
and qi_vector_mode_supported_p to do the check.  Call
scalar_mode_supported_p to check if the scalar mode is supported.
(by_pieces_ninsns): Pass the type of by pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
record the type of by pieces operations.
(op_by_pieces_d::op_by_pieces_d): Change last argument to the
type of by pieces operations, initialize m_op with it.  Pass
m_op to function widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Pass m_op to function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
can_use_qi_vectors and qi_vector_mode_supported_p to do the
check.
(op_by_pieces_d::run): Pass m_op to function
widest_fixed_size_mode_for_size.
(move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
(can_store_by_pieces): Pass the type of by pieces operations to
widest_fixed_size_mode_for_size.
(clear_by_pieces): Initialize class store_by_pieces_d with
CLEAR_BY_PIECES.
(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
COMPARE_BY_PIECES.

Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big.

There's loop in vect_peel_nonlinear_iv_init to get init_expr *
pow (step_expr, skip_niters). When skipn_iters is too big, compile time
hogs. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to
init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is
pow of 2, otherwise give up vectorization when skip_niters >=
TYPE_PRECISION (TREE_TYPE (init_expr)).

Also give up vectorization when niters_skip is negative which will be
used for fully masked loop.

gcc/ChangeLog:

PR tree-optimization/111820
PR tree-optimization/111833
* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give
up vectorization for nonlinear iv vect_step_op_mul when
step_expr is not exact_log2 and niters is greater than
TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize
for nagative niters_skip which will be used by fully masked
loop.
(vect_can_advance_ivs_p): Pass whole phi_info to
vect_can_peel_nonlinear_iv_p.
* tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize
init_expr * pow (step_expr, skipn) to init_expr
<< (log2 (step_expr) * skipn) when step_expr is exact_log2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111820-1.c: New test.
* gcc.target/i386/pr111820-2.c: New test.
* gcc.target/i386/pr111820-3.c: New test.
* gcc.target/i386/pr103144-mul-1.c: Adjust testcase.
* gcc.target/i386/pr103144-mul-2.c: Adjust testcase.

Remove unused mmx_pinsrw.

gcc/ChangeLog:

* config/i386/mmx.md (mmx_pinsrw): Remove.

Daily bump.

aarch64: Emit csinv again for `a ? ~b : b` [PR110986]

After r14-3110-g7fb65f10285, the canonical form for
`a ? ~b : b` changed to be `-(a) ^ b` that means
for aarch64 we need to add a few new insn patterns
to be able to catch this and change it to be
what is the canonical form for the aarch64 backend.
A secondary pattern was needed to support a zero_extended
form too; this adds a testcase for all 3 cases.

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/110986

gcc/ChangeLog:

* config/aarch64/aarch64.md (*cmov<mode>_insn_insv): New pattern.
(*cmov_uxtw_insn_insv): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cond_op-1.c: New test.

d: Merge upstream dmd f4be7f6f7b.

D front-end changes:

    - Fix bootstrap failure with i686-darwin9.
      ```
      Undefined symbols for architecture i386:
          "gendocfile", referenced from:
          __ZL20d_generate_ddoc_fileP6ModuleR9OutBuffer in d-lang.o
      ld: symbol(s) not found for architecture i386
      ```
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd f4be7f6f7b.
* Make-lang.in (D_FRONTEND_OBJS): Rename d/root-rootobject.o to
d/rootobject.o.

objc++: type/expr tsubst conflation [PR111920]

After r14-4796-g3e3d73ed5e85e7, tsubst_copy_and_build (now named
tsubst_expr) no longer dispatches to tsubst for type trees, and
callers have to do it themselves if appropriate. This patch makes
some overlooked adjustments to Objective-C++-specific code paths.

PR objc++/111920

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <case AT_ENCODE_EXPR>: Use tsubst instead
of tsubst_expr.

gcc/objcp/ChangeLog:

* objcp-lang.cc (objcp_tsubst_expr) <case CLASS_REFERENCE_EXPR>:
Use tsubst instead of tsubst_expr for type operands.

Doc: document the new Darwin options

gcc/ChangeLog:

* doc/invoke.texi: Document the new -nodefaultrpaths option.
* doc/install.texi: Document the new --with-darwin-extra-rpath
option.

Testsuite: allow non-installed testing on darwin

DYLD_LIBRARY_PATH is now removed from the environment for all system
tools, including the shell. Adapt the testsuite and pass the right
options to allow testing, even when the compiler and libraries have not
been installed.

gcc/ChangeLog:

* Makefile.in: set ENABLE_DARWIN_AT_RPATH in site.tmp.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/caf.exp: Correctly set
libatomic flags.
* gfortran.dg/dg.exp: Likewise.
* lib/asan-dg.exp: Set correct -B flags.
* lib/atomic-dg.exp: Likewise.
* lib/target-libpath.exp: Handle ENABLE_DARWIN_AT_RPATH.

libatomic/ChangeLog:

* testsuite/lib/libatomic.exp: Pass correct flags on darwin.

libffi/ChangeLog:

* testsuite/lib/libffi.exp: Likewise.

libitm/ChangeLog:

* testsuite/lib/libitm.exp: Likewise.
* testsuite/libitm.c++/c++.exp: Likewise.

Darwin, rpaths: Add --with-darwin-extra-rpath.

This is provided to allow distributions to add a single additional
runpath to allow for cases where the installed GCC library directories
are then symlinked to a common dirctory outside of any of the GCC
installations.

For example:

/opt/distro/lib:
libgfortran.dylib -> /opt/distro/lib/gcc-11.3/lib/libgfortran.dylib

So that libraries which are designed to be found in the runpath we would then
add --with-darwin-add-rpath=/opt/distro/lib to the configure line.

This patch makes the configuration a little more forgiving of using
--disable-darwin-at-rpath (although for platform versions >= 10.11 this will
result in misconfigured target libraries).

gcc/ChangeLog:

* configure.ac: Add --with-darwin-extra-rpath option.
* config/darwin.h: Handle DARWIN_EXTRA_RPATH.
* config.in: Regenerate.
* configure: Regenerate.

Config,Darwin: Allow for configuring Darwin to use embedded runpath.

Recent Darwin versions place contraints on the use of run paths
specified in environment variables.  This breaks some assumptions
in the GCC build.

This change allows the user to configure a Darwin build to use
'@rpath/libraryname.dylib' in library names and then to add an
embedded runpath to executables (and libraries with dependents).

The embedded runpath is added by default unless the user adds
'-nodefaultrpaths' to the link line.

For an installed compiler, it means that any executable built with
that compiler will reference the runtimes installed with the
compiler (equivalent to hard-coding the library path into the name
of the library).

During build-time configurations  any "-B" entries will be added to
the runpath thus the newly-built libraries will be found by exes.

Since the install name is set in libtool, that decision needs to be
available here (but might also cause dependent ones in Makefiles,
so we need to export a conditional).

This facility is not available for Darwin 8 or earlier, however the
existing environment variable runpath does work there.

We default this on for systems where the external DYLD_LIBRARY_PATH
does not work and off for Darwin 8 or earlier.  For systems that can
use either method, if the value is unset, we use the default (which
is currently DYLD_LIBRARY_PATH).

ChangeLog:

* configure: Regenerate.
* configure.ac: Do not add default runpaths to GCC exes
when we are building -static-libstdc++/-static-libgcc (the
default).
* libtool.m4: Add 'enable-darwin-at-runpath'.  Act  on the
enable flag to alter Darwin libraries to use @rpath names.

gcc/ChangeLog:

* aclocal.m4: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* config/darwin.h: Handle Darwin rpaths.
* config/darwin.opt: Handle Darwin rpaths.
* Makefile.in:  Handle Darwin rpaths.

gcc/ada/ChangeLog:

* gcc-interface/Makefile.in: Handle Darwin rpaths.

gcc/jit/ChangeLog:
* Make-lang.in: Handle Darwin rpaths.

libatomic/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

libbacktrace/ChangeLog:

* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

libcc1/ChangeLog:

* configure: Regenerate.

libffi/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.

libgcc/ChangeLog:

* config/t-slibgcc-darwin: Generate libgcc_s
with an @rpath name.
* config.host: Handle Darwin rpaths.

libgfortran/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths

libgm2/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* libm2cor/Makefile.am: Handle Darwin rpaths.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.am: Handle Darwin rpaths.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.am: Handle Darwin rpaths.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.am: Handle Darwin rpaths.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.am: Handle Darwin rpaths.
* libm2pim/Makefile.in: Regenerate.

libgomp/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths

libitm/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

libobjc/ChangeLog:

* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

libphobos/ChangeLog:

* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* libdruntime/Makefile.am: Handle Darwin rpaths.
* libdruntime/Makefile.in: Regenerate.
* src/Makefile.am: Handle Darwin rpaths.
* src/Makefile.in: Regenerate.

libquadmath/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

libsanitizer/ChangeLog:

* asan/Makefile.am: Handle Darwin rpaths.
* asan/Makefile.in: Regenerate.
* configure: Regenerate.
* hwasan/Makefile.am: Handle Darwin rpaths.
* hwasan/Makefile.in: Regenerate.
* lsan/Makefile.am: Handle Darwin rpaths.
* lsan/Makefile.in: Regenerate.
* tsan/Makefile.am: Handle Darwin rpaths.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.am: Handle Darwin rpaths.
* ubsan/Makefile.in: Regenerate.

libssp/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

libstdc++-v3/ChangeLog:

* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* src/Makefile.am: Handle Darwin rpaths.
* src/Makefile.in: Regenerate.

libvtv/ChangeLog:

* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

lto-plugin/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

zlib/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

Driver: Provide a spec to insert rpaths for compiler lib dirs.

This provides a spec to insert "-rpath DDD" for each DDD corresponding
to a compiler startfile directory. This allows a target to use @rpath
as the install path for libraries, and have the compiler provide the
necessary rpath to handle this.

Embed real paths, not relative ones.

We embed a runpath for every path in which libraries might be found. This
change ensures that we embed the actual real path and not a relative one from
the compiler's version-specific directory.

e.g.
/opt/distro/gcc-11-3Dr0/lib

instead of:
/opt/distro/gcc-11-3Dr0/lib/gcc/x86_64-apple-darwin19/11.3.0/../../..

This ensures that if we install, for example, 11.4.0 (and delete the 11.3.0
installation) exes built by 11.3 would continue to function (providing, of course
that 11.4 does not bump any SO names).

gcc/ChangeLog:
* gcc.cc (RUNPATH_OPTION): New.
(do_spec_1): Provide '%P' as a spec to insert rpaths for
each compiler startfile path.

libgcc: support heap-based trampolines

Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted functions for these targets.

Co-Authored-By: Maxim Blinov <maxim.blinov@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>
Co-Authored-By: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
libgcc/ChangeLog:

* libgcc2.h (__builtin_nested_func_ptr_created): Declare.
(__builtin_nested_func_ptr_deleted): Declare.
* libgcc-std.ver.in: Add the new symbols.
* config/aarch64/heap-trampoline.c: Implement heap-based
trampolines for aarch64.
* config/aarch64/t-heap-trampoline: Add rule to build
config/aarch64/heap-trampoline.c
* config/i386/heap-trampoline.c: Implement heap-based
trampolines for x86_64.
* config/i386/t-heap-trampoline: Add rule to build
config/i386/heap-trampoline.cc
* config.host: Handle --enable-heap-trampolines on
x86_64-*-linux*, aarch64-*-linux*, x86_64-*-darwin*.

target: Support heap-based trampolines

Enable -ftrampoline-impl=heap by default if we are on macOS 11
or later.

Co-Authored-By: Maxim Blinov <maxim.blinov@embecosm.com>
Co-Authored-By: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* config.gcc: Default to heap trampolines on macOS 11 and above.
* config/i386/darwin.h: Define X86_CUSTOM_FUNCTION_TEST.
* config/i386/i386.h: Define X86_CUSTOM_FUNCTION_TEST.
* config/i386/i386.cc: Use X86_CUSTOM_FUNCTION_TEST.

core: Support heap-based trampolines

Generate heap-based nested function trampolines

Add support for allocating nested function trampolines on an
executable heap rather than on the stack. This is motivated by targets
such as AArch64 Darwin, which globally prohibit executing code on the
stack.

The target-specific routines for allocating and writing trampolines are
to be provided in libgcc.

The gcc flag -ftrampoline-impl controls whether to generate code
that instantiates trampolines on the stack, or to emit calls to
__builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted. Note that this flag is completely
independent of libgcc: If libgcc is for any reason missing those
symbols, you will get a link failure.

This implementation imposes some implicit restrictions as compared to
stack trampolines. longjmp'ing back to a state before a trampoline was
created will cause us to skip over the corresponding
__builtin_nested_func_ptr_deleted, which will leak trampolines
starting from the beginning of the linked list of allocated
trampolines. There may be scope for instrumenting longjmp/setjmp to
trigger cleanups of trampolines.

Co-Authored-By: Maxim Blinov <maxim.blinov@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>
Co-Authored-By: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
gcc/ChangeLog:

* builtins.def (BUILT_IN_NESTED_PTR_CREATED): Define.
(BUILT_IN_NESTED_PTR_DELETED): Ditto.
* common.opt (ftrampoline-impl): Add option to control
generation of trampoline instantiation (heap or stack).
* coretypes.h: Define enum trampoline_impl.
* tree-nested.cc (convert_tramp_reference_op): Don't bother calling
__builtin_adjust_trampoline for heap trampolines.
(finalize_nesting_tree_1): Emit calls to
__builtin_nested_...{created,deleted} if we're generating with
-ftrampoline-impl=heap.
* tree.cc (build_common_builtin_nodes): Build
__builtin_nested_...{created,deleted}.
* doc/invoke.texi (-ftrampoline-impl): Document.

RISC-V: Prohibit combination of 'E' and 'H'

According to the ratified privileged specification (version 20211203),
it says:

> The hypervisor extension depends on an "I" base integer ISA with 32 x
> registers (RV32I or RV64I), not RV32E, which has only 16 x registers.

Also in the latest draft, it also prohibits RV64E with the 'H' extension.
This commit prohibits the combination of 'E' and 'H' extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_subset_list::parse):
Prohibit 'E' and 'H' combinations.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-26.c: New test.

RISC-V: 'Zfa' extension is now ratified

Since this extension is ratified, it now has the version number 1.0.

Reference:
<https://github.com/riscv/riscv-isa-manual/pull/1096>

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Change version number of the 'Zfa' extension to 1.0.

Daily bump.

libstdc++: Split std::basic_string::_M_use_local_data into two functions

This splits out the activate-the-union-member-for-constexpr logic from
_M_use_local_data, so that it can be used separately in cases that don't
need to use std::pointer_traits<pointer>::pointer_to to obtain the
return value.

This leaves only three uses of _M_use_local_data() which are all of the
same form:

  __s._M_data(_M_use_local_data());
  __s._M_set_length(0);

We could remove _M_use_local_data() and change those three places to use
a new _M_reset() function that does:

  _M_init_local_buf();
  _M_data(_M_local_data());
  _M_set_length(0);

This is left for a future change.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (_M_init_local_buf()): New
function.
(_M_use_local_data()): Use _M_init_local_buf.
(basic_string(), basic_string(const Alloc&))
(basic_string(basic_string&&))
(basic_string(basic_string&&, const Alloc&)): Use
_M_init_local_buf instead of _M_use_local_data().
* include/bits/basic_string.tcc (swap(basic_string&))
(_M_construct(InIter, InIter, input_iterator_tag))
(_M_construct(InIter, InIter, forward_iterator_tag))
(_M_construct(size_type, CharT), reserve()): Likewise.

libstdc++: Workaround for LLVM-61763 in <ranges>

This patch adds a small workaround that avoids declaring constrained
friends when compiling with Clang, instead making some members public.
MSVC's standard library has implemented a similar workaround.

Signed-off-by: Benjamin Brock <brock@cs.berkeley.edu>
libstdc++-v3/ChangeLog:

* include/std/ranges (zip_view, adjacent_view): Implement
workaround for LLVM-61763.

libstdc++: testsuite: Enhance codecvt_unicode with tests for length()

We can test codecvt::length() with the same data that we test
codecvt::in(). For each call of in() we add another call to length().
Some additional small cosmentic changes are applied.

libstdc++-v3/ChangeLog:

* testsuite/22_locale/codecvt/codecvt_unicode.h: Test length()

libstdc++: Fix formatting of filesystem directory iterators

Fix indentation.

libstdc++-v3/ChangeLog:

* include/bits/fs_dir.h (operator==(default_sentinel_t)): Fix
indentation.

C99 testsuite readiness: Compile more tests with -std=gnu89

gcc/testsuite/

* gcc.c-torture/compile/20000403-1.c: Compile with -std=gnu89.
* gcc.c-torture/compile/20000511-1.c: Likewise.
* gcc.c-torture/compile/20000804-1.c: Likewise.
* gcc.c-torture/compile/20020418-1.c: Likewise.
* gcc.c-torture/compile/20020927-1.c: Likewise.
* gcc.c-torture/compile/20030109-1.c: Likewise.
* gcc.c-torture/compile/20030224-1.c: Likewise.
* gcc.c-torture/compile/20030415-1.c: Likewise.
* gcc.c-torture/compile/20030612-1.c: Likewise.
* gcc.c-torture/compile/20030917-1.c: Likewise.
* gcc.c-torture/compile/20031113-1.c: Likewise.
* gcc.c-torture/compile/20031220-2.c: Likewise.
* gcc.c-torture/compile/20040309-1.c: Likewise.
* gcc.c-torture/compile/20040310-1.c: Likewise.
* gcc.c-torture/compile/20040317-3.c: Likewise.
* gcc.c-torture/compile/20040817-1.c: Likewise.
* gcc.c-torture/compile/20091215-1.c: Likewise.
* gcc.c-torture/compile/86.c: Likewise.
* gcc.c-torture/compile/900216-1.c: Likewise.
* gcc.c-torture/compile/900313-1.c: Likewise.
* gcc.c-torture/compile/900407-1.c: Likewise.
* gcc.c-torture/compile/900516-1.c: Likewise.
* gcc.c-torture/compile/920409-2.c: Likewise.
* gcc.c-torture/compile/920415-1.c: Likewise.
* gcc.c-torture/compile/920428-1.c: Likewise.
* gcc.c-torture/compile/920428-5.c: Likewise.
* gcc.c-torture/compile/920428-7.c: Likewise.
* gcc.c-torture/compile/920501-1.c: Likewise.
* gcc.c-torture/compile/920501-13.c: Likewise.
* gcc.c-torture/compile/920501-15.c: Likewise.
* gcc.c-torture/compile/920501-16.c: Likewise.
* gcc.c-torture/compile/920501-18.c: Likewise.
* gcc.c-torture/compile/920501-20.c: Likewise.
* gcc.c-torture/compile/920501-6.c: Likewise.
* gcc.c-torture/compile/920501-7.c: Likewise.
* gcc.c-torture/compile/920502-1.c: Likewise.
* gcc.c-torture/compile/920502-2.c: Likewise.
* gcc.c-torture/compile/920520-1.c: Likewise.
* gcc.c-torture/compile/920521-1.c: Likewise.
* gcc.c-torture/compile/920608-1.c: Likewise.
* gcc.c-torture/compile/920617-1.c: Likewise.
* gcc.c-torture/compile/920617-2.c: Likewise.
* gcc.c-torture/compile/920625-1.c: Likewise.
* gcc.c-torture/compile/920625-2.c: Likewise.
* gcc.c-torture/compile/920626-1.c: Likewise.
* gcc.c-torture/compile/920706-1.c: Likewise.
* gcc.c-torture/compile/920710-2.c: Likewise.
* gcc.c-torture/compile/920723-1.c: Likewise.
* gcc.c-torture/compile/920808-1.c: Likewise.
* gcc.c-torture/compile/920809-1.c: Likewise.
* gcc.c-torture/compile/920817-1.c: Likewise.
* gcc.c-torture/compile/920831-1.c: Likewise.
* gcc.c-torture/compile/920917-1.c: Likewise.
* gcc.c-torture/compile/920928-2.c: Likewise.
* gcc.c-torture/compile/920928-5.c: Likewise.
* gcc.c-torture/compile/921012-1.c: Likewise.
* gcc.c-torture/compile/921021-1.c: Likewise.
* gcc.c-torture/compile/921024-1.c: Likewise.
* gcc.c-torture/compile/921103-1.c: Likewise.
* gcc.c-torture/compile/921109-1.c: Likewise.
* gcc.c-torture/compile/921111-1.c: Likewise.
* gcc.c-torture/compile/921116-2.c: Likewise.
* gcc.c-torture/compile/921118-1.c: Likewise.
* gcc.c-torture/compile/921202-1.c: Likewise.
* gcc.c-torture/compile/921202-2.c: Likewise.
* gcc.c-torture/compile/921203-1.c: Likewise.
* gcc.c-torture/compile/921203-2.c: Likewise.
* gcc.c-torture/compile/921206-1.c: Likewise.
* gcc.c-torture/compile/930109-1.c: Likewise.
* gcc.c-torture/compile/930111-1.c: Likewise.
* gcc.c-torture/compile/930117-1.c: Likewise.
* gcc.c-torture/compile/930118-1.c: Likewise.
* gcc.c-torture/compile/930120-1.c: Likewise.
* gcc.c-torture/compile/930217-1.c: Likewise.
* gcc.c-torture/compile/930325-1.c: Likewise.
* gcc.c-torture/compile/930411-1.c: Likewise.
* gcc.c-torture/compile/930427-2.c: Likewise.
* gcc.c-torture/compile/930503-2.c: Likewise.
* gcc.c-torture/compile/930506-2.c: Likewise.
* gcc.c-torture/compile/930513-2.c: Likewise.
* gcc.c-torture/compile/930530-1.c: Likewise.
* gcc.c-torture/compile/930602-1.c: Likewise.
* gcc.c-torture/compile/930618-1.c: Likewise.
* gcc.c-torture/compile/930623-1.c: Likewise.
* gcc.c-torture/compile/931003-1.c: Likewise.
* gcc.c-torture/compile/931013-1.c: Likewise.
* gcc.c-torture/compile/931013-2.c: Likewise.
* gcc.c-torture/compile/931102-2.c: Likewise.
* gcc.c-torture/compile/931203-1.c: Likewise.
* gcc.c-torture/compile/940718-1.c: Likewise.
* gcc.c-torture/compile/941014-1.c: Likewise.
* gcc.c-torture/compile/941014-2.c: Likewise.
* gcc.c-torture/compile/941014-3.c: Likewise.
* gcc.c-torture/compile/941014-4.c: Likewise.
* gcc.c-torture/compile/941111-1.c: Likewise.
* gcc.c-torture/compile/941113-1.c: Likewise.
* gcc.c-torture/compile/950124-1.c: Likewise.
* gcc.c-torture/compile/950329-1.c: Likewise.
* gcc.c-torture/compile/950612-1.c: Likewise.
* gcc.c-torture/compile/950618-1.c: Likewise.
* gcc.c-torture/compile/950719-1.c: Likewise.
* gcc.c-torture/compile/950910-1.c: Likewise.
* gcc.c-torture/compile/950922-1.c: Likewise.
* gcc.c-torture/compile/951106-1.c: Likewise.
* gcc.c-torture/compile/951222-1.c: Likewise.
* gcc.c-torture/compile/960106-1.c: Likewise.
* gcc.c-torture/compile/960319-1.c: Likewise.
* gcc.c-torture/compile/960829-1.c: Likewise.
* gcc.c-torture/compile/970206-1.c: Likewise.
* gcc.c-torture/compile/980825-1.c: Likewise.
* gcc.c-torture/compile/990829-1.c: Likewise.
* gcc.c-torture/compile/991213-2.c: Likewise.

RISC-V: Support partial VLS mode when preference fixed-vlmax [PR111857]

Given we have code like below:

typedef char vnx16i __attribute__ ((vector_size (16)));

vnx16i __attribute__ ((noinline, noclone))
test (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, 11, 12, 13, 14, 11, 12, 13, 14,
                                        11, 12, 13, 14, 11, 12, 13, 14);
}

It can perform the auto vectorization when

-march=rv64gcv_zvl1024b --param=riscv-autovec-preference=fixed-vlmax

but cannot when

-march=rv64gcv_zvl2048b --param=riscv-autovec-preference=fixed-vlmax

The reason comes from the miniaml machine mode of QI is RVVMF8QI, which
is 1024 / 8 = 128 bits, aka the size of VNx16QI. When we set zvl2048b,
the bit size of RVVMFQI is 2048 / 8 = 256, which is not matching the
bit size of VNx16QI (128 bits).

Thus, this patch would like to enable the VLS mode for such case, aka
VNx16QI vls mode for zvl2048b.

Before this patch:
test:
  srli    a4,a1,40
  andi    a4,a4,0xff
  srli    a3,a1,32
  srli    a5,a1,48
  slli    a0,a4,8
  andi    a3,a3,0xff
  andi    a5,a5,0xff
  slli    a2,a5,16
  or      a0,a3,a0
  srli    a1,a1,56
  or      a0,a0,a2
  slli    a2,a1,24
  slli    a3,a3,32
  or      a0,a0,a2
  slli    a4,a4,40
  or      a0,a0,a3
  slli    a5,a5,48
  or      a0,a0,a4
  or      a0,a0,a5
  slli    a1,a1,56
  or      a0,a0,a1
  mv      a1,a0
  ret

After this patch:
test:
  vsetivli        zero,16,e8,mf8,ta,ma
  vle8.v  v2,0(a1)
  vsetivli        zero,4,e32,mf2,ta,ma
  vrgather.vi     v1,v2,3
  vsetivli        zero,16,e8,mf8,ta,ma
  vse8.v  v1,0(a0)
  ret

PR target/111857

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_VLS): Remove.
* config/riscv/riscv-protos.h (vls_mode_valid_p): New func decl.
* config/riscv/riscv-v.cc (autovectorize_vector_modes): Replace
macro reference to func.
(vls_mode_valid_p): New func impl for vls mode valid or not.
* config/riscv/riscv-vector-switch.def (VLS_ENTRY): Replace
macro reference to func.
* config/riscv/vector-iterators.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/vls/def.h: Add help define.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-6.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

This patch is the backend piece of a solution to PRs 101955 and 106245,
that adds a define_insn_and_split to the i386 backend, to perform sign
extension of a single (least significant) bit using and $1 then neg.

Previously, (x<<31)>>31 would be generated as

        sall    $31, %eax // 3 bytes
        sarl    $31, %eax // 3 bytes

with this patch the backend now generates:

        andl    $1, %eax // 3 bytes
        negl    %eax // 2 bytes

Not only is this smaller in size, but microbenchmarking confirms
that it's a performance win on both Intel and AMD; Intel sees only a
2% improvement (perhaps just a size effect), but AMD sees a 7% win.

2023-10-21  Roger Sayle  <roger@nextmovesoftware.com>
    Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.

gcc/testsuite/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* gcc.target/i386/pr106245-2.c: New test case.
* gcc.target/i386/pr106245-3.c: New 32-bit test case.
* gcc.target/i386/pr106245-4.c: New 64-bit test case.
* gcc.target/i386/pr106245-5.c: Likewise.

c++: abstract class and overload resolution

In my implementation of P0929 I treated a conversion to an rvalue of
abstract class type as a bad conversion, but that's still too soon to check
it; we need to wait until we're done with overload resolution.

gcc/cp/ChangeLog:

* call.cc (implicit_conversion_1): Rename...
(implicit_conversion): ...to this. Remove the old wrapper.

gcc/testsuite/ChangeLog:

* g++.dg/template/sfinae-dr657.C: Adjust.

testsuite: constexpr-diag1.C and implicit constexpr

This test doesn't break as expected with implicit constexpr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-diag1.C: Add -fno-implicit-constexpr.

c++: fix tourney logic

In r13-3766 I changed the logic at the end of tourney to avoid redundant
comparisons, but the change also meant skipping any less-good matches
between the champ_compared_to_predecessor candidate and champ itself.

This should not be a correctness issue, since we believe that joust is a
partial order. But it can lead to missed warnings, as in this testcase.

gcc/cp/ChangeLog:

* call.cc (tourney): Only skip champ_compared_to_predecessor.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wsign-promo1.C: New test.

c++: Constructor streaming [PR105322]

An expresion node's type is streamed after the expression's operands,
because the type can come from some aspect of an operand (for instance
decltype and noexcept). There's a comment in the code explaining that.

But that doesn't work for constructors, which can directly reference
components of their type (eg FIELD_DECLS). If this is a
type-introducing CONSTRUCTOR, we need to ensure the type has been
streamed first. So move CONSTRUCTOR stream to after the type streaming.

The reason things like COMPONENT_REF work is that they stream their
first operand first, and that introduces the type that their second
operand looks up a field in.

gcc/cp/
PR c++/105322
* module.cc (trees_out::core_vals): Stream CONSTRUCTOR operands
after the type.
(trees_in::core_vals): Likewise.
gcc/testsuite/
* g++.dg/modules/decltype-1_a.C: New.
* g++.dg/modules/decltype-1_b.C: New.
* g++.dg/modules/lambda-5_a.C: New.
* g++.dg/modules/lambda-5_b.C: New.

c: -Wint-conversion should cover pointer/integer mismatches in ?:

gcc/c/

PR c/109827
PR other/44209
* c-typeck.cc (build_conditional_expr): Use OPT_Wint_conversion
for pointer/integer mismatch warnings.

gcc/testsuite/

* gcc.dg/Wint-conversion-3.c: New.

c: -Wincompatible-pointer-types should cover mismatches in ?:

gcc/c/

PR c/109826
PR other/44209
* c-typeck.cc (build_conditional_expr): Use
OPT_Wincompatible_pointer_types for pointer mismatches.
Emit location information for the operand.

gcc/testsuite/

* gcc.dg/Wincompatible-pointer-types-2.c: New.
* gcc.dg/Wincompatible-pointer-types-3.c: New.
* gcc.dg/Wincompatible-pointer-types-4.c: New.

bootstrap: tm_p.h requires memmodel.h on SPARC.

SPARC target header references memmodel, which requires memmodel.h.

gcc/ChangeLog:

* gimple-harden-control-flow.cc: Include memmodel.h.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>

c-family: char8_t and aliasing in C vs C++ [PR111884]

In the PR, Joseph says that in C char8_t is not a distinct type.  So
we should behave as if it can alias anything, like ordinary char.
In C, unsigned_char_type_node == char8_type_node, so with this patch
we return 0 instead of -1.  And the following comment says:

  /* The C standard guarantees that any object may be accessed via an
     lvalue that has narrow character type (except char8_t).  */
  if (t == char_type_node
      || t == signed_char_type_node
      || t == unsigned_char_type_node)
    return 0;

Which appears to be wrong, so I'm adjusting that as well.

PR c/111884

gcc/c-family/ChangeLog:

* c-common.cc (c_common_get_alias_set): Return -1 for char8_t only
in C++.

gcc/testsuite/ChangeLog:

* c-c++-common/alias-1.c: New test.

bootstrap: Include tm_p.h

ASM_GENERATE_INTERNAL_LABEL may use target-specific functions, so
tm_p.h must be included. This fixes bootstrap breakage on AIX, at least.

gcc/ChangeLog

* gimple-harden-control-flow.cc: Include tm_p.h.

Signed-off-by: David Edelsohn <dje.gcc@gmail.com>

rust: build failure after NON_DEPENDENT_EXPR removal [PR111899]

This patch removes stray NON_DEPENDENT_EXPR checks following the removal
of this tree code from the C++ FE. (Since this restores the build I
supppose it means the Rust FE never creates NON_DEPENDENT_EXPR trees in
the first place, so no further analysis is needed.)

PR rust/111899

gcc/rust/ChangeLog:

* backend/rust-constexpr.cc (potential_constant_expression_1):
Remove NON_DEPENDENT_EXPR handling.
* backend/rust-tree.cc (mark_exp_read): Likewise.
(mark_use): Likewise.
(lvalue_kind): Likewise.

libstdc++: add casts to from_chars in <charconv> [PR111883]

This fixes

.../charconv: In function 'std::from_chars_result std::from_chars(const char*, const char*, _Float16&, chars_format)':
.../charconv:687:17: warning: converting to '_Float16' from 'float' with greater conversion rank
  687 |       __value = __val;
      |                 ^~~~~
.../charconv: In function 'std::from_chars_result std::from_chars(const char*, const char*, __gnu_cxx::__bfloat16_t&, chars_format)':
.../charconv:763:17: warning: converting to '__gnu_cxx::__bfloat16_t' {aka '__bf16'} from 'float' with greater conversion rank
  763 |       __value = __val;
      |                 ^~~~~

which was breaking a test:

FAIL: g++.dg/warn/Wstringop-overflow-6.C  -std=gnu++26 (test for excess errors)

PR testsuite/111883

libstdc++-v3/ChangeLog:

* include/std/charconv (from_chars): Add explicit casts.

ifcvt: Don't lower bitfields with non-constant offsets [PR 111882]

This patch stops lowering of bitfields by ifcvt when they have non-constant
offsets as we are not likely to be able to do anything useful with those during
vectorization. That also fixes the issue reported in PR 111882, which was
being caused by an offset with a side-effect being lowered, but constants have
no side-effects so we will no longer run into that problem.

gcc/ChangeLog:

PR tree-optimization/111882
* tree-if-conv.cc (get_bitfield_rep): Return NULL_TREE for bitfields
with non-constant offsets.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr111882.c: New test.

c++: rename tsubst_copy_and_build and tsubst_expr

After the previous patch, we now only have two tsubst entry points for
expression trees: tsubst_copy_and_build and tsubst_expr.  The former
despite its unwieldy name is the main entry point, and the latter is
just a superset of the former that also handles statement trees.  We
could merge them so that we just have tsubst_expr, but it seems natural
to distinguish statement trees from expression trees and to maintain a
separate entry point for them.

To that end, this this patch renames tsubst_copy_and_build to
tsubst_expr, and renames the current tsubst_expr to tsubst_stmt, which
continues to be a superset of the former (which is convenient since
sometimes expression trees appear in statement contexts, e.g. a branch
of an IF_STMT could be NOP_EXPR).  (Making tsubst_stmt disjoint from
tsubst_expr is left as future work if deemed desirable.)

This patch in turn renames suitable existing uses of tsubst_expr (that
expect to take statement trees) to use tsubst_stmt.  Thus untouched
tsubst_expr calls are implicitly strengthened to expect only expression
trees after this patch.  For the tsubst_omp_* routines I opted to rename
all existing uses to ensure no unintended functional change.  This patch
also moves the handling of CO_YIELD_EXPR and CO_AWAIT_EXPR from tsubst_stmt
to tsubst_expr since they're indeed expression trees.

gcc/cp/ChangeLog:

* cp-lang.cc (objcp_tsubst_copy_and_build): Rename to ...
(objcp_tsubst_expr): ... this.
* cp-objcp-common.h (objcp_tsubst_copy_and_build): Rename to ...
(objcp_tsubst_expr): ... this.
* cp-tree.h (tsubst_copy_and_build): Remove declaration.
* init.cc (maybe_instantiate_nsdmi_init): Use tsubst_expr
instead of tsubst_copy_and_build.
* pt.cc (expand_integer_pack): Likewise.
(instantiate_non_dependent_expr_internal): Likewise.
(instantiate_class_template): Use tsubst_stmt instead of
tsubst_expr for STATIC_ASSERT.
(tsubst_function_decl): Adjust tsubst_copy_and_build uses.
(tsubst_arg_types): Likewise.
(tsubst_exception_specification): Likewise.
(tsubst_tree_list): Likewise.
(tsubst): Likewise.
(tsubst_name): Likewise.
(tsubst_omp_clause_decl): Use tsubst_stmt instead of tsubst_expr.
(tsubst_omp_clauses): Likewise.
(tsubst_copy_asm_operands): Adjust tsubst_copy_and_build use.
(tsubst_omp_for_iterator): Use tsubst_stmt instead of tsubst_expr.
(tsubst_expr): Rename to ...
(tsubst_stmt): ... this.
<case CO_YIELD_EXPR, CO_AWAIT_EXPR>: Move to tsubst_expr.
(tsubst_omp_udr): Use tsubst_stmt instead of tsubst_expr.
(tsubst_non_call_postfix_expression): Adjust tsubst_copy_and_build
use.
(tsubst_lambda_expr): Likewise.  Use tsubst_stmt instead of
tsubst_expr for the body of a lambda.
(tsubst_copy_and_build_call_args): Rename to ...
(tsubst_call_args): ... this.  Adjust tsubst_copy_and_build use.
(tsubst_copy_and_build): Rename to tsubst_expr.  Adjust
tsubst_copy_and_build and tsubst_copy_and_build_call_args use.
<case TRANSACTION_EXPR>: Use tsubst_stmt instead of tsubst_expr.
(maybe_instantiate_noexcept): Adjust tsubst_copy_and_build use.
(instantiate_body): Use tsubst_stmt instead of tsubst_expr for
substituting the function body.
(tsubst_initializer_list): Adjust tsubst_copy_and_build use.

gcc/objcp/ChangeLog:

* objcp-lang.cc (objcp_tsubst_copy_and_build): Rename to ...
(objcp_tsubst_expr): ... this.  Adjust tsubst_copy_and_build
uses.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: merge tsubst_copy into tsubst_copy_and_build

The relationship between tsubst_copy_and_build and tsubst_copy (two of
the main template argument substitution routines for expression trees)
is rather hazy.  The former is mostly a superset of the latter, with
some differences.

The main apparent difference is their handling of various tree codes,
but much of the tree code handling in tsubst_copy appears to be dead
code.  This is because tsubst_copy mostly gets (directly) called on
id-expressions rather than on arbitrary expressions.  The interesting
tree codes are PARM_DECL, VAR_DECL, BIT_NOT_EXPR, SCOPE_REF,
TEMPLATE_ID_EXPR and IDENTIFIER_NODE:

* for PARM_DECL and VAR_DECL, tsubst_copy_and_build calls tsubst_copy
   followed by doing some extra handling of its own
* for BIT_NOT_EXPR tsubst_copy implicitly handles unresolved destructor
   calls (i.e. the first operand is an identifier or a type)
* for SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE tsubst_copy
   refrains from doing name lookup of the terminal name

Other more minor differences are that tsubst_copy exits early when
'args' is null, and it calls maybe_dependent_member_ref, and finally
it dispatches to tsubst for type trees.[1]

Thus tsubst_copy is similar enough to tsubst_copy_and_build that it
makes sense to merge the two functions, with the main difference we
want to preserve is tsubst_copy's lack of name lookup for id-expressions.
This patch achieves this via a new tsubst flag tf_no_name_lookup which
controls name lookup and resolution of a (top-level) id-expression.

[1]: Exiting early for null 'args' doesn't seem right since it means we
return templated trees even when !processing_template_decl.  And
dispatching to tsubst for type trees muddles the distinction between
type and expressions which makes things less clear at the call site.
So these properties of tsubst_copy don't seem worth preserving.

N.B. the diff for this patch looks much cleaner when generated using
the "patience diff" algorithm via Git's --patience flag.

gcc/cp/ChangeLog:

* cp-tree.h (enum tsubst_flags): Add tf_no_name_lookup.
* pt.cc (tsubst_pack_expansion): Use tsubst for substituting
BASES_TYPE.
(tsubst_decl) <case USING_DECL>: Use tsubst_name instead of
tsubst_copy.
(tsubst) <case TEMPLATE_TYPE_PARM>: Use tsubst_copy_and_build
instead of tsubst_copy for substituting
CLASS_PLACEHOLDER_TEMPLATE.
<case TYPENAME_TYPE>: Use tsubst_name instead of tsubst_copy for
substituting TYPENAME_TYPE_FULLNAME.
(tsubst_name): Define.
(tsubst_qualified_id): Use tsubst_name instead of tsubst_copy
for substituting the component name of a SCOPE_REF.
(tsubst_copy): Remove.
(tsubst_copy_and_build): Clear tf_no_name_lookup at the start,
and remember if it was set.  Call maybe_dependent_member_ref if
tf_no_name_lookup was not set.
<case IDENTIFIER_NODE>: Don't do name lookup if tf_no_name_lookup
was set.
<case TEMPLATE_ID_EXPR>: If tf_no_name_lookup was set, use
tsubst_name instead of tsubst_copy_and_build to substitute the
template and don't finish the template-id.
<case BIT_NOT_EXPR>: Handle identifier and type operand (if
tf_no_name_lookup was set).
<case SCOPE_REF>: Avoid trying to resolve a SCOPE_REF if
tf_no_name_lookup was set by calling build_qualified_name directly
instead of tsubst_qualified_id.
<case SIZEOF_EXPR>: Handling of sizeof...  copied from tsubst_copy.
<case CALL_EXPR>: Use tsubst_name instead of tsubst_copy to
substitute a TEMPLATE_ID_EXPR callee naming an unresolved template.
<case COMPONENT_REF>: Likewise to substitute the member.
<case FUNCTION_DECL>: Copied from tsubst_copy and merged with ...
<case VAR_DECL, PARM_DECL>: ... these.  Initial handling copied
from tsubst_copy.  Optimize local variable substitution by
trying retrieve_local_specialization before checking
uses_template_parms.
<case CONST_DECL>: Copied from tsubst_copy.
<case FIELD_DECL>: Likewise.
<case NAMESPACE_DECL>: Likewise.
<case OVERLOAD>: Likewise.
<case TEMPLATE_DECL>: Likewise.
<case TEMPLATE_PARM_INDEX>: Likewise.
<case TYPE_DECL>: Likewise.
<case CLEANUP_POINT_EXPR>: Likewise.
<case OFFSET_REF>: Likewise.
<case EXPR_PACK_EXPANSION>: Likewise.
<case NONTYPE_ARGUMENT_PACK>: Likewise.
<case *_CST>: Likewise.
<case *_*_FOLD_EXPR>: Likewise.
<case DEBUG_BEGIN_STMT>: Likewise.
<case CO_AWAIT_EXPR>: Likewise.
<case TRAIT_EXPR>: Use tsubst and tsubst_copy_and_build instead
of tsubst_copy.
<default>: Copied from tsubst_copy.
(tsubst_initializer_list): Use tsubst and tsubst_copy_and_build
instead of tsubst_copy.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: non-static memfn call dependence cleanup [PR106086]

In cp_parser_postfix_expression, and in the CALL_EXPR case of
tsubst_copy_and_build, we essentially repeat the type-dependent and
COMPONENT_REF callee cases of finish_call_expr. This patch deduplicates
this logic by making both spots consistently go through finish_call_expr.

This allows us to easily fix PR106086 -- which is about us neglecting to
capture 'this' when we resolve a use of a non-static member function of
the current instantiation only at lambda regeneration time -- by moving
the call to maybe_generic_this_capture from the parser to finish_call_expr
so that we consider capturing 'this' at regeneration time as well.

PR c++/106086

gcc/cp/ChangeLog:

* parser.cc (cp_parser_postfix_expression): Consolidate three
calls to finish_call_expr, one to build_new_method_call and
one to build_min_nt_call_vec into one call to finish_call_expr.
Don't call maybe_generic_this_capture here.
* pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Remove
COMPONENT_REF callee handling.
(type_dependent_expression_p): Use t_d_object_e_p instead of
t_d_e_p for COMPONENT_REF and OFFSET_REF.
* semantics.cc (finish_call_expr): In the type-dependent case,
call maybe_generic_this_capture here instead.

gcc/testsuite/ChangeLog:

* g++.dg/template/crash127.C: Expect additional error due to
being able to check the member access expression ahead of time.
Strengthen the test by not instantiating the class template.
* g++.dg/cpp1y/lambda-generic-this5.C: New test.

c++: remove NON_DEPENDENT_EXPR, part 2

This follow-up patch removes build_non_dependent_expr (and
make_args_non_dependent) and calls thereof, no functional change.

gcc/cp/ChangeLog:

* call.cc (build_new_method_call): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
* coroutines.cc (finish_co_return_stmt): Likewise.
* cp-tree.h (build_non_dependent_expr): Remove.
(make_args_non_dependent): Remove.
* decl2.cc (grok_array_decl): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
(build_offset_ref_call_from_tree): Likewise.
* init.cc (build_new): Likewise.
* pt.cc (make_args_non_dependent): Remove.
(test_build_non_dependent_expr): Remove.
(cp_pt_cc_tests): Adjust.
* semantics.cc (finish_expr_stmt): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
(finish_for_expr): Likewise.
(finish_call_expr): Likewise.
(finish_omp_atomic): Likewise.
* typeck.cc (finish_class_member_access_expr): Likewise.
(build_x_indirect_ref): Likewise.
(build_x_binary_op): Likewise.
(build_x_array_ref): Likewise.
(build_x_vec_perm_expr): Likewise.
(build_x_shufflevector): Likewise.
(build_x_unary_op): Likewise.
(cp_build_addressof): Likewise.
(build_x_conditional_expr): Likewise.
(build_x_compound_expr): Likewise.
(build_static_cast): Likewise.
(build_x_modify_expr): Likewise.
(check_return_expr): Likewise.
* typeck2.cc (build_x_arrow): Likewise.

Reviewed-by: Jason Merrill <jason@redhat.com>