gcc.gnu.org Git - gcc.git/log

Revert some changes

Update ChangeLog.*

Add -mcpu=power11 support.

2024-03-08 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config.gcc (rs6000*-*-*, powerpc*-*-*): Add support for power11.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=power11.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/power10.md (all reservations): Add power11 as an
alterntive to power10.
* config/rs6000/ppc-auxv.h (PPC_PLATFORM_POWER11): New define.
* config/rs6000/rs6000-builtin.cc (cpu_is_info): Add power11.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR11 if -mcpu=power11.
* config/rs6000/rs6000-cpus.def (ISA_POWER11_MASKS_SERVER): New define.
(POWERPC_MASKS): Add power11 isa bit.
(power11 cpu): Add power11 definition.
* rs6000-opts.h (PROCESSOR_POWER11): Add power11 processor.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (power11_cost): Add power11 support.
(rs6000_option_override_internal): Likewise.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000.md (cpu attribute): Add power11.
* config/rs6000/rs6000.opt (-mpower11): Add internal power11 ISA flag.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mcpu=power11.

gcc/testsuite/

* gcc.target/powerpc/power11-1.c: New test.
* gcc.target/powerpc/power11-2.c: Likewise.
* gcc.target/powerpc/power11-3.c: Likewise.
* lib/target-supports.exp (check_effective_target_power11_ok): Add new
effective target.

Revert some changes

Update ChangeLog.*

Fix thinko

2024-03-07 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/rs6000.cc (rs6000_sched_reorder2): Fix thinko.

Update ChangeLog.*

Add power11 to rs6000-string.cc

2024-03-07 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/rs6000-string.cc (expand_compare_loop): Add power11.

Update ChangeLog.*

Allow configuration of power11

2024-03-07 Michael Meissner <meissner@linux.ibm.com>

* config.gcc (rs6000*-*-*, powerpc*-*-*): Allow configuration of
power11.

Reallow power11 2nd sched charges for sched pass one.

2024-03-07 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Set power10
fusion if -mcpu=power11
(rs6000_sched_reorder2): Add power11 change.

Update ChangeLog.*

Reallow power11 sched charges for sched pass one.

2024-03-07 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config/rs6000/rs6000.cc (rs6000_sched_reorder): Add power11 change.

Update ChangeLog.*

Revert some changes

Update ChangeLog.*

Add -mcpu=power11 support.

2024-03-07 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=power11.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/power10.md (all reservations): Add power11 as an
alterntive to power10.
* config/rs6000/ppc-auxv.h (PPC_PLATFORM_POWER11): New define.
* config/rs6000/rs6000-builtin.cc (cpu_is_info): Add power11.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR11 if -mcpu=power11.
* config/rs6000/rs6000-cpus.def (ISA_POWER11_MASKS_SERVER): New define.
(POWERPC_MASKS): Add power11 isa bit.
(power11 cpu): Add power11 definition.
* rs6000-opts.h (PROCESSOR_POWER11): Add power11 processor.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (power11_cost): Add power11 support.
(rs6000_option_override_internal): Likewise.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000.md (cpu attribute): Add power11.
* config/rs6000/rs6000.opt (-mpower11): Add internal power11 ISA flag.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mcpu=power11.

gcc/testsuite/

* gcc.target/powerpc/power11-1.c: New test.
* gcc.target/powerpc/power11-2.c: Likewise.
* gcc.target/powerpc/power11-3.c: Likewise.
* lib/target-supports.exp (check_effective_target_power11_ok): Add new
effective target.

Add ChangeLog.meissner and REVISION.

2024-03-07 Michael Meissner <meissner@linux.ibm.com>

gcc/

* REVISION: New file for branch.
* ChangeLog.meissner: New file.

gcc/c-family/

* ChangeLog.meissner: New file.

gcc/c/

* ChangeLog.meissner: New file.

gcc/cp/

* ChangeLog.meissner: New file.

gcc/fortran/

* ChangeLog.meissner: New file.

gcc/testsuite/

* ChangeLog.meissner: New file.

libgcc/

* ChangeLog.meissner: New file.

c++: ICE with variable template and [[deprecated]] [PR110031]

lookup_and_finish_template_variable already has and uses the complain
parameter but it is not passing it down to mark_used so we got the
default tf_warning_or_error, which causes various problems when
lookup_and_finish_template_variable gets called with complain=tf_none.

PR c++/110031

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Pass complain to
mark_used.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/inline-var11.C: New test.

doc: Fix docs for -dD regarding predefined macros

The manual has always claimed that -dD differs from -dM by not
outputting predefined macros, but that's untrue. It has been untrue
since GCC 3.0 (probably with the change to use libcpp as the default
preprocessor implementation).

gcc/ChangeLog:

* doc/cppopts.texi: Remove incorrect claim about -dD not
outputting predefined macros.

rs6000: Don't ICE when compiling the __builtin_vsx_splat_2di [PR113950]

When we expand the __builtin_vsx_splat_2di built-in, we were allowing immediate
value for second operand which causes an unrecognizable insn ICE. Even though
the immediate value was forced into a register, it wasn't correctly assigned
to the second operand. So corrected the assignment of op1 to operands[1].

2024-03-07 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/
PR target/113950
* config/rs6000/vsx.md (vsx_splat_<mode>): Correct assignment to operand1
and simplify else if with else.

gcc/testsuite/
PR target/113950
* gcc.target/powerpc/pr113950.c: New testcase.

Fix bogus error on allocator for array type with Dynamic_Predicate

This is a regression present on all active branches: the compiler gives
a bogus error on an allocator for an unconstrained array type declared
with a Dynamic_Predicate because Apply_Predicate_Check is invoked directly
on a subtype reference, which it cannot handle.

This moves the check to the resulting access value (after dereference) like
in Expand_Allocator_Expression.

gcc/ada/
PR ada/113979
* exp_ch4.adb (Expand_N_Allocator): In the subtype indication case,
call Apply_Predicate_Check on the resulting access value if needed.

gcc/testsuite/
* gnat.dg/predicate15.adb: New test.

Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

When building gcc's C++ sources against recent libc++, the poisoning of
the ctype macros due to including safe-ctype.h before including C++
standard headers such as <list>, <map>, etc, causes many compilation
errors, similar to:

  In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
  In file included from /home/dim/src/gcc/master/gcc/system.h:233:
  In file included from /usr/include/c++/v1/vector:321:
  In file included from
  /usr/include/c++/v1/__format/formatter_bool.h:20:
  In file included from
  /usr/include/c++/v1/__format/formatter_integral.h:32:
  In file included from /usr/include/c++/v1/locale:202:
  /usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
  only applies to structs, variables, functions, and namespaces
    546 |     _LIBCPP_INLINE_VISIBILITY
        |     ^
  /usr/include/c++/v1/__config:813:37: note: expanded from macro
  '_LIBCPP_INLINE_VISIBILITY'
    813 | #  define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
        |                                     ^
  /usr/include/c++/v1/__config:792:26: note: expanded from macro
  '_LIBCPP_HIDE_FROM_ABI'
    792 |
    __attribute__((__abi_tag__(_LIBCPP_TOSTRING(
  _LIBCPP_VERSIONED_IDENTIFIER))))
        |                          ^
  In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
  In file included from /home/dim/src/gcc/master/gcc/system.h:233:
  In file included from /usr/include/c++/v1/vector:321:
  In file included from
  /usr/include/c++/v1/__format/formatter_bool.h:20:
  In file included from
  /usr/include/c++/v1/__format/formatter_integral.h:32:
  In file included from /usr/include/c++/v1/locale:202:
  /usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
  declaration list
    547 |     char_type toupper(char_type __c) const
        |                                     ^
  /usr/include/c++/v1/__locale:553:48: error: too many arguments
  provided to function-like macro invocation
    553 |     const char_type* toupper(char_type* __low, const
    char_type* __high) const
        |                                                ^
  /home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
  macro 'toupper' defined here
    146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
        |         ^

This is because libc++ uses different transitive includes than
libstdc++, and some of those transitive includes pull in various ctype
declarations (typically via <locale>).

There was already a special case for including <string> before
safe-ctype.h, so move the rest of the C++ standard header includes to
the same location, to fix the problem.

gcc/ChangeLog:

* system.h: Include safe-ctype.h after C++ standard headers.

Signed-off-by: Dimitry Andric <dimitry@andric.com>

analyzer: Fix up some -Wformat* warnings

I'm seeing warnings like
../../gcc/analyzer/access-diagram.cc: In member function ‘void ana::bit_size_expr::print(pretty_printer*) const’:
../../gcc/analyzer/access-diagram.cc:399:26: warning: unknown conversion type character ‘E’ in format [-Wformat=]
  399 |         pp_printf (pp, _("%qE bytes"), bytes_expr);
      |                          ^~~~~~~~~~~
when building stage2/stage3 gcc.  While such warnings would be
understandable when building stage1 because one could e.g. have some
older host compiler which doesn't understand some of the format specifiers,
the above seems to be because we have in pretty-print.h
#ifdef GCC_DIAG_STYLE
#define GCC_PPDIAG_STYLE GCC_DIAG_STYLE
#else
#define GCC_PPDIAG_STYLE __gcc_diag__
#endif
and use GCC_PPDIAG_STYLE e.g. for pp_printf, and while
diagnostic-core.h has
#ifndef GCC_DIAG_STYLE
#define GCC_DIAG_STYLE __gcc_tdiag__
#endif
(and similarly various FE headers include their own GCC_DIAG_STYLE)
when including pretty-print.h before diagnostic-core.h we end up
with __gcc_diag__ style rather than __gcc_tdiag__ style, which I think
is the right thing for the analyzer, because analyzer seems to use
default_tree_printer everywhere:
grep pp_format_decoder.*=.default_tree_printer analyzer/* | wc -l
57

The following patch fixes that by making sure diagnostic-core.h is included
before pretty-print.h.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

* access-diagram.cc: Include diagnostic-core.h before including
diagnostic.h or diagnostic-path.h.
* sm-malloc.cc: Likewise.
* diagnostic-manager.cc: Likewise.
* call-summary.cc: Likewise.
* record-layout.cc: Likewise.

contrib: Update test_mklog to correspond to mklog

contrib/ChangeLog:

* test_mklog.py: "Moved to..." -> "Move to..."

Signed-off-by: Filip Kastl <fkastl@suse.cz>

c++: Fix ICE diagnosing incomplete type of overloaded function set [PR98356]

In the linked PR the result of 'get_first_fn' is a USING_DECL against
the template parameter, to be filled in on instantiation. But we don't
actually need to get the first set of the member functions: it's enough
to know that we have a (possibly overloaded) member function at all.

PR c++/98356

gcc/cp/ChangeLog:

* typeck2.cc (cxx_incomplete_type_diagnostic): Don't assume
'member' will be a FUNCTION_DECL (or something like it).

gcc/testsuite/ChangeLog:

* g++.dg/pr98356.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Stream DECL_CONTEXT for template template parms [PR98881]

When streaming in a nested template-template parameter as in the
attached testcase, we end up reaching the containing template-template
parameter in 'tpl_parms_fini'. We should not set the DECL_CONTEXT to
this (nested) template-template parameter, as it should already be the
struct that the outer template-template parameter is declared on.

The precise logic for what DECL_CONTEXT should be for a template
template parameter in various situations seems rather obscure. Rather
than trying to determine the assumptions that need to hold, it seems
simpler to just always re-stream the DECL_CONTEXT as needed for now.

PR c++/98881

gcc/cp/ChangeLog:

* module.cc (trees_out::tpl_parms_fini): Stream out DECL_CONTEXT
for template template parameters.
(trees_in::tpl_parms_fini): Read it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-tpl-parm-3.h: New test.
* g++.dg/modules/tpl-tpl-parm-3_a.H: New test.
* g++.dg/modules/tpl-tpl-parm-3_b.C: New test.
* g++.dg/modules/tpl-tpl-parm-3_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]

The following testcase ICEs, because fix_crossing_unconditional_branches
thinks that asm goto is an unconditional jump and removes it, replacing it
with unconditional jump to one of the labels.
This doesn't happen on x86 because the function in question isn't invoked
there at all:
  /* If the architecture does not have unconditional branches that
     can span all of memory, convert crossing unconditional branches
     into indirect jumps.  Since adding an indirect jump also adds
     a new register usage, update the register usage information as
     well.  */
  if (!HAS_LONG_UNCOND_BRANCH)
    fix_crossing_unconditional_branches ();
I think for the asm goto case, for the non-fallthru edge if any we should
handle it like any other fallthru (and fix_crossing_unconditional_branches
doesn't really deal with those, it only looks at explicit branches at the
end of bbs and we are in cfglayout mode at that point) and for the labels
we just pass the labels as immediates to the assembly and it is up to the
user to figure out how to store them/branch to them or whatever they want to
do.
So, the following patch fixes this by not treating asm goto as a simple
unconditional jump.

I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
somewhere else, where outofcfglayout or whatever should actually create
those indirect jumps on the crossing edges instead of adding normal
unconditional jumps, I see e.g. in
__attribute__((cold)) int bar (char *);
__attribute__((hot)) int baz (char *);
void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; l1: baz (""); }
void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: return; l1: bar (""); goto l2; }
with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
just b .L? jumps which I believe are +-32MB, so if .text is larger than
32MB, it could fail to link, but this patch doesn't address that.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/110079
* bb-reorder.cc (fix_crossing_unconditional_branches): Don't adjust
asm goto.

* gcc.dg/pr110079.c: New test.

expand: Fix UB in choose_mult_variant [PR105533]

As documented in the function comment, choose_mult_variant attempts to
compute costs of 3 different cases, val, -val and val - 1.
The -val case is actually only done if val fits into host int, so there
should be no overflow, but the val - 1 case is done unconditionally.
val is shwi (but inside of synth_mult already uhwi), so when val is
HOST_WIDE_INT_MIN, val - 1 invokes UB.  The following patch fixes that
by using val - HOST_WIDE_INT_1U, but I'm not really convinced it would
DTRT for > 64-bit modes, so I've guarded it as well.  Though, arch
would need to have really strange costs that something that could be
expressed as x << 63 would be better expressed as (x * 0x7fffffffffffffff) + 1
In the long term, I think we should just rewrite
choose_mult_variant/synth_mult etc. to work on wide_int.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/105533
* expmed.cc (choose_mult_variant): Only try the val - 1 variant
if val is not HOST_WIDE_INT_MIN or if mode has exactly
HOST_BITS_PER_WIDE_INT precision.  Avoid triggering UB while computing
val - 1.

* gcc.dg/pr105533.c: New test.

sccvn: Avoid UB in ao_ref_init_from_vn_reference [PR105533]

When compiling libgcc or on e.g.
int a[64];
int p;

void
foo (void)
{
  int s = 1;
  while (p)
    {
      s -= 11;
      a[s] != 0;
    }
}
sccvn invokes UB in the compiler as detected by ubsan:
../../gcc/poly-int.h:1089:5: runtime error: left shift of negative value -40
The problem is that we still use C++11..C++17 as the implementation language
and in those C++ versions shifting negative values left is UB (well defined
since C++20) and above in
           offset += op->off << LOG2_BITS_PER_UNIT;
op->off is poly_int64 with -40 value (in libgcc with -8).
I understand the offset_int << LOG2_BITS_PER_UNIT shifts but it is then well
defined during underlying implementation which is done on the uhwi limbs,
but for poly_int64 we use
                offset += pop->off * BITS_PER_UNIT;
a few lines earlier and I think that is both more readable in what it
actually does and triggers UB only if there would be signed multiply
overflow.  In the end, the compiler will treat them the same at least at the
RTL level (at least, if not and they aren't the same cost, it should).

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/105533
* tree-ssa-sccvn.cc (ao_ref_init_from_vn_reference) <case ARRAY_REF>:
Multiple op->off by BITS_PER_UNIT instead of shifting it left by
LOG2_BITS_PER_UNIT.

LoongArch: testsuite:Fix problems with incorrect results in vector test cases.

In simd_correctness_check.h, the role of the macro ASSERTEQ_64 is to check the
result of the passed vector values for the 64-bit data of each array element.
It turns out that it uses the abs() function to check only the lower 32 bits
of the data at a time, so it replaces abs() with the llabs() function.

However, the following two problems may occur after modification:

1.FAIL in lasx-xvfrint_s.c and lsx-vfrint_s.c
The reason for the error is because vector test cases that use __m{128,256} to
define vector types are composed of 32-bit primitive types, they should use
ASSERTEQ_32 instead of ASSERTEQ_64 to check for correctness.

2.FAIL in lasx-xvshuf_b.c and lsx-vshuf.c
The cause of the error is that the expected result of the function setting in
the test case is incorrect.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c: Replace
ASSERTEQ_64 with the macro ASSERTEQ_32.
* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Modify the expected
test results of some functions according to the function of the vector
instruction.
* gcc.target/loongarch/vector/lsx/lsx-vfrint_s.c: Same
modification as lasx-xvfrint_s.c.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Same
modification as lasx-xvshuf_b.c.
* gcc.target/loongarch/vector/simd_correctness_check.h: Use the llabs()
function instead of abs() to check the correctness of the results.

LoongArch: Use /lib instead of /lib64 as the library search path for MUSL.

gcc/ChangeLog:

* config.gcc: Add a case for loongarch*-*-linux-musl*.
* config/loongarch/linux.h: Disable the multilib-compatible
treatment for *musl* targets.
* config/loongarch/musl.h: New file.

match.pd: Optimize a * !a to 0 [PR114009]

The following patch attempts to fix an optimization regression through
adding a simple simplification.  We already have the
/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
(if (!canonicalize_math_p ())
(for cmp (tcc_comparison)
  (simplify
   (mult:c (convert (cmp@0 @1 @2)) @3)
   (if (INTEGRAL_TYPE_P (type)
        && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
     (cond @0 @3 { build_zero_cst (type); })))
optimization which otherwise triggers during the a * !a multiplication,
but that is done only late and we aren't able through range assumptions
optimize it yet anyway.

The patch adds a specific simplification for it.
If a is zero, then a * !a will be 0 * 1 (or for signed 1-bit 0 * -1)
and so 0.
If a is non-zero, then a * !a will be a * 0 and so again 0.
THe pattern is valid for scalar integers, complex integers and vector types,
but I think will actually trigger only for the scalar integers.  For
vector types I've added other two with VEC_COND_EXPR in it, for complex
there are different GENERIC trees to match and it is something that likely
would be never matched in GIMPLE, so I didn't handle that.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/114009
* genmatch.cc (decision_tree::gen): Emit ARG_UNUSED for captures
argument even for GENERIC, not just for GIMPLE.
* match.pd (a * !a -> 0): New simplifications.

* gcc.dg/tree-ssa/pr114009.c: New test.

RISC-V: Refactor expand_vec_cmp [NFC]

There are two expand_vec_cmp functions.
They have same structure and similar code.
We can use default arguments instead of overloading.

Tested on RV32 and RV64.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_cmp): Change proto
* config/riscv/riscv-v.cc (expand_vec_cmp): Use default arguments
(expand_vec_cmp_float): Adapt arguments

Signed-off-by: demin.han <demin.han@starfivetech.com>

Fortran: Fix issue with using snprintf function.

The previous patch used snprintf to set the message
string. The message string is not a formatted string
and the snprintf will interpret '%' related characters
as format specifiers when there are no associated
output variables. A segfault ensues.

This change replaces snprintf with a fortran string copy
function and null terminates the message string.

PR libfortran/105456

libgfortran/ChangeLog:

* io/list_read.c (list_formatted_read_scalar): Use fstrcpy
from libgfortran/runtime/string.c to replace snprintf.
(nml_read_obj): Likewise.
* io/transfer.c (unformatted_read): Likewise.
(unformatted_write): Likewise.
(formatted_transfer_scalar_read): Likewise.
(formatted_transfer_scalar_write): Likewise.
* io/write.c (list_formatted_write_scalar): Likewise.
(nml_write_obj): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105456.f90: Revise using '%' characters
in users error message.

Daily bump.

i386: Fix and improve insn constraint for V2QI arithmetic/shift insns

optimize_function_for_size_p predicate is not stable during optab selection,
because it also depends on node->count/node->frequency of the current function,
which are updated during IPA, so they may change between early opts and
late opts.  Use optimize_size instead - optimize_size implies
optimize_function_for_size_p (cfun), so if a named pattern uses
"&& optimize_size" and the insn it splits into uses
optimize_function_for_size_p (cfun), it shouldn't fail.

PR target/114232

gcc/ChangeLog:

* config/i386/mmx.md (negv2qi2): Enable for optimize_size instead
of optimize_function_for_size_p.  Explictily enable for TARGET_SSE2.
(negv2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(<plusminus:insn>v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.  Explictily enable for TARGET_SSE2.
(<plusminus:insn>v2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(<any_shift:insn>v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.

RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads [PR114200].

Three-operand instructions like vmacc are modeled with an implicit
output reload when the output does not match one of the operands. For
this we use vmv.v.v which is subject to length masking.

In a situation where the current vl is less than the full vlenb
and the fma's result value is used as input for a vector reduction
(which is never length masked) we effectively only reduce vl
elements. The masked-out elements are relevant for the
reduction, though, leading to a wrong result.

This patch replaces the vmv reloads by full-register reloads.

gcc/ChangeLog:

PR target/114200
PR target/114202

* config/riscv/vector.md: Use vmv[1248]r.v instead of vmv.v.v.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr114200.c: New test.
* gcc.target/riscv/rvv/autovec/pr114202.c: New test.

RISC-V: Adjust vec unit-stride load/store costs.

Scalar loads provide offset addressing while unit-stride vector
instructions cannot.  The offset must be loaded into a general-purpose
register before it can be used.  In order to account for this, this
patch adds an address arithmetic heuristic that keeps track of data
reference operands.  If we haven't seen the operand before we add the
cost of a scalar statement.

This helps to get rid of an lbm regression when vectorizing (roughly
0.5% fewer dynamic instructions).  gcc5 improves by 0.2% and deepsjeng
by 0.25%.  wrf and nab degrade by 0.1%.  This is because before we now
adjust the cost of SLP as well as loop-vectorized instructions whereas
we would only adjust loop-vectorized instructions before.
Considering higher scalar_to_vec costs (3 vs 1) for all vectorization
types causes some snippets not to get vectorized anymore.  Given these
costs the decision looks correct but appears worse when just counting
dynamic instructions.

In total SPECint 2017 has 4 bln dynamic instructions less and SPECfp 0.7
bln.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Move...
(costs::adjust_stmt_cost): ... to here and add vec_load/vec_store
offset handling.
(costs::add_stmt_cost): Also adjust cost for statements without
stmt_info.
* config/riscv/riscv-vector-costs.h: Define zero constant.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-2.c: New test.

ARM: Fix conditional execution [PR113915]

By default most patterns can be conditionalized on Arm targets.  However
Thumb-2 predication requires the "predicable" attribute be explicitly
set to "yes".  Most patterns are shared between Arm and Thumb(-2) and are
marked with "predicable".  Given this sharing, it does not make sense to
use a different default for Arm.  So only consider conditional execution
of instructions that have the predicable attribute set to yes.  This ensures
that patterns not explicitly marked as such are never conditionally executed.

gcc/ChangeLog:
PR target/113915
* config/arm/arm.md (NOCOND): Improve comment.
(arm_rev*) Add predicable.
* config/arm/arm.cc (arm_final_prescan_insn): Add check for
PREDICABLE_YES.

gcc/testsuite/ChangeLog:
PR target/113915
* gcc.target/arm/builtin-bswap-1.c: Fix test to allow conditional
execution both for Arm and Thumb-2.

Revert "Set num_threads to 50 on 32-bit hppa in two libgomp loop tests"

This reverts commit b14209715e659f6d3ca0f9eef9a4851e7bd6e373.

[PR target/113001] Fix incorrect operand swapping in conditional move

This bug totally fell off my radar.  Sorry about that.

We have some special casing the conditional move expander to simplify a
conditional move when comparing a register against zero and that same register
is one of the arms.

Specifically a (eq (reg) (const_int 0)) where reg is also the true arm or (ne
(reg) (const_int 0)) where reg is the false arm need not use the fully
generalized conditional move, thus saving an instruction for those cases.

In the NE case we swapped the operands, but didn't swap the condition, which
led to the ICE due to an unrecognized pattern.  THe backend actually has
distinct patterns for those two cases.  So swapping the operands is neither
needed nor advisable.

Regression tested on rv64gc and verified the new tests pass.

Pushing to the trunk.

PR target/113001
PR target/112871
gcc/
* config/riscv/riscv.cc (expand_conditional_move): Do not swap
operands when the comparison operand is the same as the false
arm for a NE test.

gcc/testsuite
* gcc.target/riscv/zicond-ice-3.c: New test.
* gcc.target/riscv/zicond-ice-4.c: New test.

Fortran: error recovery while simplifying expressions [PR103707,PR106987]

When an exception is encountered during simplification of arithmetic
expressions, the result may depend on whether range-checking is active
(-frange-check) or not.  However, the code path in the front-end should
stay the same for "soft" errors for which the exception is triggered by the
check, while "hard" errors should always terminate the simplification, so
that error recovery is independent of the flag.  Separation of arithmetic
error codes into "hard" and "soft" errors shall be done consistently via
is_hard_arith_error().

PR fortran/103707
PR fortran/106987

gcc/fortran/ChangeLog:

* arith.cc (is_hard_arith_error): New helper function to determine
whether an arithmetic error is "hard" or not.
(check_result): Use it.
(gfc_arith_divide): Set "Division by zero" only for regular
numerators of real and complex divisions.
(reduce_unary): Use is_hard_arith_error to determine whether a hard
or (recoverable) soft error was encountered.  Terminate immediately
on hard error, otherwise remember code of first soft error.
(reduce_binary_ac): Likewise.
(reduce_binary_ca): Likewise.
(reduce_binary_aa): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr99350.f90:
* gfortran.dg/arithmetic_overflow_3.f90: New test.

c++: ICE with noexcept and local specialization [PR114114]

Here we ICE because we call register_local_specialization while
local_specializations is null, so

  local_specializations->put ();

crashes on null this.  It's null since maybe_instantiate_noexcept calls
push_to_top_level which creates a new scope.  Normally, I would have
guessed that we need a new local_specialization_stack.  But here we're
dealing with an operand of a noexcept, which is an unevaluated operand,
and those aren't registered in the hash map.  maybe_instantiate_noexcept
wasn't signalling that it's substituting an unevaluated operand though.

PR c++/114114

gcc/cp/ChangeLog:

* pt.cc (maybe_instantiate_noexcept): Save/restore
cp_unevaluated_operand, c_inhibit_evaluation_warnings, and
cp_noexcept_operand around the tsubst_expr call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept84.C: New test.

i386: Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move

Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move and
use generic code instead.

No functional changes.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_move) [TARGET_MACHO]:
Eliminate common code and use generic code instead.

amdgcn: additional gfx1030/gfx1100 support: adjust test cases

The "SDWA" changes in commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support" caused a few regressions:

    PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler zero_extendv64qiv64si2

    PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler zero_extendv64hiv64si2

    PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler zero_extendv64qiv64si2

    PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler zero_extendv64hiv64si2

Those test cases need corresponding adjustment.

gcc/testsuite/
* gcc.target/gcn/sram-ecc-3.c: Adjust.
* gcc.target/gcn/sram-ecc-4.c: Likewise.
* gcc.target/gcn/sram-ecc-7.c: Likewise.
* gcc.target/gcn/sram-ecc-8.c: Likewise.

AVR: Adjust rtx cost of plus + zero_extend.

gcc/
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS+ZERO_EXTEND]: Adjust
rtx cost.

tree-optimization/114239 - rework reduction epilogue driving

The following reworks vectorizable_live_operation to pass the
live stmt to vect_create_epilog_for_reduction also for early breaks
and a peeled main exit.  This is to be able to figure the scalar
definition to replace.  This reverts the PR114192 fix as it is
subsumed by this cleanup.

PR tree-optimization/114239
* tree-vect-loop.cc (vect_get_vect_def): Remove.
(vect_create_epilog_for_reduction): The passed in stmt_info
should now be the live stmt that produces the scalar reduction
result.  Revert PR114192 fix.  Base reduction info off
info_for_reduction.  Remove special handling of
early-break/peeled, restore original vector def gathering.
Make sure to pick the correct exit PHIs.
(vectorizable_live_operation): Pass in the proper stmt_info
for early break exits.

* gcc.dg/vect/vect-early-break_122-pr114239.c: New testcase.

LoongArch: testsuite: Rewrite {x,}vfcmp-{d,f}.c to avoid named registers

Loops on named vector register are not vectorized (see comment 11 of
PR113622), so the these test cases have been failing for a while.
Rewrite them using check-function-bodies to remove hard coding register
names. A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.

aarch64: Define out-of-class static constants

While reworking the aarch64 feature descriptions, I forgot
to add out-of-class definitions of some static constants.
This could lead to a build failure with some compilers.

This was seen with some WIP to increase the number of extensions
beyond 64. It's latent on trunk though, and a regression from
before the rework.

gcc/
* config/aarch64/aarch64-feature-deps.h (feature_deps::info): Add
out-of-class definitions of static constants.

c++: Fix template deduction for conversion operators with xobj parameters [PR113629]

Unification for conversion operators (DEDUCE_CONV) doesn't perform
transformations like handling forwarding references. This is correct in
general, but not for xobj parameters, which should be handled "normally"
for the purposes of deduction: [temp.deduct.conv] only applies to the
return type of the conversion function.

PR c++/113629

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Only use DEDUCE_CONV for the
return type of a conversion function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-conv-op.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

tree-optimization/114249 - ICE with BB reduction vectorization

When we scrap the last def of an odd lane numbered BB reduction
we can end up recording a pattern def which will later wreck
code generation. The following puts this logic where it better
belongs, avoiding this issue.

PR tree-optimization/114249
* tree-vect-slp.cc (vect_build_slp_instance): Move making
a BB reduction lane number even ...
(vect_slp_check_for_roots): ... here to avoid leaking
pattern defs.

* gcc.dg/vect/bb-slp-pr114249.c: New testcase.

tree-optimization/114246 - invalid call argument from DSE

The following makes sure to strip type conversions added by
build_fold_addr_expr before placing the result in a call argument.

PR tree-optimization/114246
* tree-ssa-dse.cc (increment_start_addr): Strip useless
type conversions from the adjusted address.

* gcc.dg/torture/pr114246.c: New testcase.

i386: Fix up the vzeroupper REG_DEAD/REG_UNUSED note workaround [PR114190]

When writing the rest_of_handle_insert_vzeroupper workaround to manually
remove all the REG_DEAD/REG_UNUSED notes from the IL, I've missed that
there is a df_analyze () call right after it and that the problems added
earlier in the pass, like df_note_add_problem () done during mode switching,
doesn't affect just the next df_analyze () call right after it, but all
other df_analyze () calls until the end of the current pass where
df_finish_pass removes the optional problems.

So, as can be seen on the following patch, the workaround doesn't actually
work there, because while rest_of_handle_insert_vzeroupper carefully removes
all REG_DEAD/REG_UNUSED notes, the df_analyze () call at the end of the
function immediately adds them in again (so, I must say I have no idea
why the workaround worked on the earlier testcases).

Now, I could move the df_analyze () call just before the REG_DEAD/REG_UNUSED
note removal loop, but I think the following patch is better, because
the df_analyze () call doesn't have to recompute the problem when we don't
care about it and will actively strip all traces of it away.

2024-03-06 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/114190
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Call df_remove_problem for df_note before calling df_analyze.

* gcc.target/i386/avx-pr114190.c: New test.

Fortran: Add user defined error messages for UDTIO.

The defines IOMSG_LEN and MSGLEN were redundant so these are combined
into IOMSG_LEN as defined in io.h.

The remainder of the patch adds checks for when a user defined
derived type IO procedure sets the IOSTAT or IOMSG variables
independent of the librrary defined I/O messages.

PR libfortran/105456

libgfortran/ChangeLog:

* io/io.h (IOMSG_LEN): Moved to here.
* io/list_read.c (MSGLEN): Removed MSGLEN.
(convert_integer): Changed MSGLEN to IOMSG_LEN.
(parse_repeat): Likewise.
(read_logical): Likewise.
(read_integer): Likewise.
(read_character): Likewise.
(parse_real): Likewise.
(read_complex): Likewise.
(read_real): Likewise.
(check_type): Likewise.
(list_formatted_read_scalar): Adjust to IOMSG_LEN.
(nml_read_obj): Add user defined error message.
* io/transfer.c (unformatted_read): Add user defined error
message.
(unformatted_write): Add user defined error message.
(formatted_transfer_scalar_read): Add user defined error message.
(formatted_transfer_scalar_write): Add user defined error message.
* io/write.c (list_formatted_write_scalar): Add user defined error message.
(nml_write_obj): Add user defined error message.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105456-nmlr.f90: New test.
* gfortran.dg/pr105456-nmlw.f90: New test.
* gfortran.dg/pr105456-ruf.f90: New test.
* gfortran.dg/pr105456-wf.f90: New test.
* gfortran.dg/pr105456-wuf.f90: New test.

c++/modules: befriending template from current class scope

Here the TEMPLATE_DECL representing the template friend declaration
naming B has class scope since the template B has class scope, but
get_merge_kind assumes all DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P
TEMPLATE_DECL have namespace scope and wrongly returns MK_named instead
of MK_local_friend for the friend.

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind) <case depset::EK_DECL>:
Accomodate class-scope DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P
TEMPLATE_DECL. Consolidate IDENTIFIER_ANON_P cases.

gcc/testsuite/ChangeLog:

* g++.dg/modules/friend-7.h: New test.
* g++.dg/modules/friend-7_a.H: New test.
* g++.dg/modules/friend-7_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

ctf: fix incorrect CTF for multi-dimensional array types

PR debug/114186

DWARF DIEs of type DW_TAG_subrange_type are linked together to represent
the information about the subsequent dimensions. The CTF processing was
so far working through them in the opposite (incorrect) order.

While fixing the issue, refactor the code a bit for readability.

co-authored-By: Indu Bhagat <indu.bhagat@oracle.com>

gcc/
PR debug/114186
* dwarf2ctf.cc (gen_ctf_array_type): Invoke the ctf_add_array ()
in the correct order of the dimensions.
(gen_ctf_subrange_type): Refactor out handling of
DW_TAG_subrange_type DIE to here.

gcc/testsuite/
PR debug/114186
* gcc.dg/debug/ctf/ctf-array-6.c: Add test.

asan: Handle poly-int sizes in ASAN_MARK [PR97696]

This patch makes the expansion of IFN_ASAN_MARK let through
poly-int-sized objects. The expansion itself was already generic
enough, but the tests for the fast path were too strict.

gcc/
PR sanitizer/97696
* asan.cc (asan_expand_mark_ifn): Allow the length to be a poly_int.

gcc/testsuite/
PR sanitizer/97696
* gcc.target/aarch64/sve/pr97696.c: New test.

aarch64: Remove SME2.1 forms of LUTI2/4

I was over-eager when adding support for strided SME2 instructions
and accidentally included forms of LUTI2 and LUTI4 that are only
available with SME2.1, not SME2. This patch removes them for now.
We're planning to add proper support for SME2.1 in the GCC 15
timeframe.

Sorry for the blunder :(

gcc/
* config/aarch64/aarch64.md (stride_type): Remove luti_consecutive
and luti_strided.
* config/aarch64/aarch64-sme.md
(@aarch64_sme_lut<LUTI_BITS><mode>): Remove stride_type attribute.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided2): Delete.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
* config/aarch64/aarch64-early-ra.cc (is_stride_candidate)
(early_ra::maybe_convert_to_strided_access): Remove support for
strided LUTI2 and LUTI4.

gcc/testsuite/
* gcc.target/aarch64/sme/strided_1.c (test5): Remove.

arm: check for low register before applying peephole [PR113510]

For thumb1, when using a peephole to fuse

mov reg, #const
add reg, reg, SP

into

add reg, SP, #const

we must first check that reg is a low register, otherwise we will ICE
when trying to recognize the resulting insn.

gcc/ChangeLog:

PR target/113510
* config/arm/thumb1.md (peephole2 to fuse mov imm/add SP): Use
low_register_operand.

Fix testcase pr112337.c to check the options [PR112337]

gcc.target/arm/pr112337.c was failing to validate that adding MVE options
was compatible with the test environment, so add the missing checks.

gcc/testsuite/ChangeLog:

PR target/112337
* gcc.target/arm/pr112337.c: Check for, then use the right MVE
options.

AVR: Add two RTL peepholes.

Register alloc may expand a 3-operand arithmetic X = Y o CST as
   X = CST
   X o= Y
where it may be better to instead:
   X = Y
   X o= CST
because 1) the first insn may use MOVW for "X = Y", and 2) the
operation may be more efficient when performed with a constant,
for example when ADIW or SBIW can be used, or some bytes of
the constant are 0x00 or 0xff.

gcc/
* config/avr/avr.md: Add two RTL peepholes for PLUS, IOR and AND
in HI, PSI, SI that swap operation order from "X = CST, X o= Y"
to "X = Y, X o= CST".

Regenerate c.opt.urls

Fixes: 08edf85f747b ("c++/modules: relax diagnostic about GMF contents")
gcc/c-family/ChangeLog:

* c.opt.urls: Regenerate.

LoongArch: Allow s9 as a register alias

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/regname-fp-s9.c: New test.

AVR: Improve output of insn "*insv.any_shift.<mode>_split".

The instructions printed by insn "*insv.any_shift.<mode>_split" were
sub-optimal.  The code to print the improved output is lengthy and
performed by new function avr_out_insv.  As it turns out, the function
can also handle shift-offsets of zero, which is "*andhi3", "*andpsi3"
and "*andsi3".  Thus, these tree insns get a new 3-operand alternative
where the 3rd operand is an exact power of 2.

gcc/
* config/avr/avr-protos.h (avr_out_insv): New proto.
* config/avr/avr.cc (avr_out_insv): New function.
(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle case.
(avr_cbranch_cost) [ZERO_EXTRACT]: Adjust rtx costs.
* config/avr/avr.md (define_attr "adjust_len") Add insv.
(andhi3, *andhi3, andpsi3, *andpsi3, andsi3, *andsi3):
Add constraint alternative where the 3rd operand is a power
of 2, and the source register may differ from the destination.
(*insv.any_shift.<mode>_split): Call avr_out_insv to output
instructions.  Set attr "length" to "insv".
* config/avr/constraints.md (Cb2, Cb3, Cb4): New constraints.

gcc/testsuite/
* gcc.target/avr/torture/insv-anyshift-hi.c: New test.
* gcc.target/avr/torture/insv-anyshift-si.c: New test.

tree-optimization/114231 - use patterns for BB SLP discovery root stmts

The following makes sure to use recognized patterns when vectorizing
roots during BB SLP discovery. We need to apply those late since
during root discovery we've not yet done pattern recognition.
All parts of the vectorizer assume patterns get used, for the testcase
we mix this up when doing live lane computation.

PR tree-optimization/114231
* tree-vect-slp.cc (vect_analyze_slp): Lookup patterns when
processing a BB SLP root.

* gcc.dg/vect/pr114231.c: New testcase.

lower-subreg: Fix ROTATE handling [PR114211]

On the following testcase, we have
(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
        (rotate:TI (reg/v:TI 106 [ h ])
            (const_int 64 [0x40]))) "pr114211.c":8:5 1042 {rotl64ti2_doubleword}
     (nil))
before subreg1 and the pass decides to use
(reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
register pair instead of (reg/v:TI 106 [ h ]).
resolve_operand_for_swap_move_operator implements it by pretending it is
an assignment from
(concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
to
(concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
The problem is that if the rotate argument is the same as destination or
if there is even an overlap between the first half of the destination with
second half of the source we emit incorrect code, because the store to
(reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
move.  The following patch detects that case and uses a temporary pseudo
to hold the original (reg:DI 128 [ h+8 ]) value across the first store.

2024-03-05  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/114211
* lower-subreg.cc (resolve_simple_move): For double-word
rotates by BITS_PER_WORD if there is overlap between source
and destination use a temporary.

* gcc.dg/pr114211.c: New test.

bitint: Handle BIT_FIELD_REF lowering [PR114157]

The following patch adds support for BIT_FIELD_REF lowering with
large/huge _BitInt lhs. BIT_FIELD_REF requires mode argument first
operand, so the operand shouldn't be any huge _BitInt.
If we only access limbs from inside of BIT_FIELD_REF using constant
indexes, we can just create a new BIT_FIELD_REF to extract the limb,
but if we need to use variable index in a loop, I'm afraid we need
to spill it into memory, which is what the following patch does.
If there is some bitwise type for the extraction, it extracts just
what we need and not more than that, otherwise it spills the whole
first argument of BIT_FIELD_REF and uses MEM_REF with an offset
with VIEW_CONVERT_EXPR around it.

2024-03-05 Jakub Jelinek <jakub@redhat.com>

PR middle-end/114157
* gimple-lower-bitint.cc: Include stor-layout.h.
(mergeable_op): Return true for BIT_FIELD_REF.
(struct bitint_large_huge): Declare handle_bit_field_ref method.
(bitint_large_huge::handle_bit_field_ref): New method.
(bitint_large_huge::handle_stmt): Use it for BIT_FIELD_REF.

* gcc.dg/bitint-98.c: New test.
* gcc.target/i386/avx2-pr114157.c: New test.
* gcc.target/i386/avx512f-pr114157.c: New test.

i386: For noreturn functions save at least the bp register if it is used [PR114116]

As mentioned in the PR, on x86_64 currently a lot of ICEs end up
with crashes in the unwinder like:
during RTL pass: expand
pr114044-2.c: In function ‘foo’:
pr114044-2.c:5:3: internal compiler error: in expand_fn_using_insn, at internal-fn.cc:208
    5 |   __builtin_clzg (a);
      |   ^~~~~~~~~~~~~~~~~~
0x7d9246 expand_fn_using_insn
        ../../gcc/internal-fn.cc:208

pr114044-2.c:5:3: internal compiler error: Segmentation fault
0x1554262 crash_signal
        ../../gcc/toplev.cc:319
0x2b20320 x86_64_fallback_frame_state
        ./md-unwind-support.h:63
0x2b20320 uw_frame_state_for
        ../../../libgcc/unwind-dw2.c:1013
0x2b2165d _Unwind_Backtrace
        ../../../libgcc/unwind.inc:303
0x2acbd69 backtrace_full
        ../../libbacktrace/backtrace.c:127
0x2a32fa6 diagnostic_context::action_after_output(diagnostic_t)
        ../../gcc/diagnostic.cc:781
0x2a331bb diagnostic_action_after_output(diagnostic_context*, diagnostic_t)
        ../../gcc/diagnostic.h:1002
0x2a331bb diagnostic_context::report_diagnostic(diagnostic_info*)
        ../../gcc/diagnostic.cc:1633
0x2a33543 diagnostic_impl
        ../../gcc/diagnostic.cc:1767
0x2a33c26 internal_error(char const*, ...)
        ../../gcc/diagnostic.cc:2225
0xe232c8 fancy_abort(char const*, int, char const*)
        ../../gcc/diagnostic.cc:2336
0x7d9246 expand_fn_using_insn
        ../../gcc/internal-fn.cc:208
Segmentation fault (core dumped)

The problem are the PR38534 r14-8470 changes which avoid saving call-saved
registers in noreturn functions.  If such functions ever touch the
bp register but because of the r14-8470 changes don't save it in the
prologue, the caller or any other function in the backtrace uses a frame
pointer and the noreturn function or anything it calls directly or
indirectly calls backtrace, then the unwinder crashes, because bp register
contains some unrelated value, but in the frames which do use frame pointer
CFA is based on the bp register.

In theory this could happen with any other call-saved register, e.g. code
written by hand in assembly with .cfi_* directives could use any other
call-saved register as register into which store the CFA or something
related to that, but in reality at least compiler generated code and usual
assembly probably just making sure bp doesn't contain garbage could be
enough for backtrace purposes.  In the debugger of course it will not be
enough, the values of the arguments etc. can be lost (if DW_CFA_undefined
is emitted) or garbage.

So, I think for noreturn function we should at least save the bp register
if we use it.  If user asks for it using no_callee_saved_registers
attribute, let's honor what is asked for (but then it is up to the user
to make sure e.g. backtrace isn't called from the function or anything it
calls).  As discussed in the PR, whether to save bp or not shouldn't be
based on whether compiling with -g or -g0, because we don't want code
generation changes without/with debugging, it would also break
-fcompare-debug, and users can call backtrace(3), that doesn't use debug
info, just unwind info, even backtrace_symbols{,_fd}(3) don't use debug info
but just looks at dynamic symbol table.

The patch also adds check for no_caller_saved_registers
attribute in the implicit addition of not saving callee saved register
in noreturn functions, because on I think
__attribute__((no_caller_saved_registers, noreturn)) will otherwise
error that no_caller_saved_registers and no_callee_saved_registers
attributes are incompatible (but user didn't specify anything like that).

2024-03-05  Jakub Jelinek  <jakub@redhat.com>

PR target/114116
* config/i386/i386.h (enum call_saved_registers_type): Add
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP enumerator.
* config/i386/i386-options.cc (ix86_set_func_type): Remove
has_no_callee_saved_registers variable, add no_callee_saved_registers
instead, initialize it depending on whether it is
no_callee_saved_registers function or not.  Don't set it if
no_caller_saved_registers attribute is present.  Adjust users.
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Handle
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP like
TYPE_NO_CALLEE_SAVED_REGISTERS.
(ix86_save_reg): Handle TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP.

* gcc.target/i386/pr38534-1.c: Allow push/pop of bp.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr114097-1.c: Likewise.
* gcc.target/i386/stack-check-17.c: Expect no pop on ! ia32.

RISC-V: Cleanup unused code in riscv_v_adjust_bytesize [NFC]

Cleanup mode_size related code which is not used anymore. Below tests are
passed for this patch.

* The RVV fully regresssion test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_v_adjust_bytesize): Cleanup unused
mode_size related code.

Signed-off-by: Pan Li <pan2.li@intel.com>

c++/modules: relax diagnostic about GMF contents

Issuing a hard error when the GMF doesn't consist only of preprocessing
directives happens to be inconvenient for automated testcase reduction
via cvise. This patch relaxes this diagnostic into a pedwarn that can
be disabled with -Wno-global-module.

gcc/c-family/ChangeLog:

* c.opt (Wglobal-module): New warning.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_translation_unit): Relax GMF contents
error into a pedwarn.

gcc/ChangeLog:

* doc/invoke.texi (-Wno-global-module): Document.

gcc/testsuite/ChangeLog:

* g++.dg/modules/friend-6_a.C: Pass -Wno-global-module instead
of -Wno-pedantic. Remove now unnecessary preprocessing
directives from GMF.

Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

c++: Support exporting using-decls in same namespace as target

Currently a using-declaration bringing a name into its own namespace is
a no-op, except for functions. This prevents people from being able to
redeclare a name brought in from the GMF as exported, however, which
this patch fixes.

Apart from marking declarations as exported they are also now marked as
effectively being in the module purview (due to the using-decl) so that
they are properly processed, as 'add_binding_entity' assumes that
declarations not in the module purview cannot possibly be exported.

gcc/cp/ChangeLog:

* name-lookup.cc (walk_module_binding): Remove completed FIXME.
(do_nonmember_using_decl): Mark redeclared entities as exported
when needed. Check for re-exporting internal linkage types.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-12.C: New test.
* g++.dg/modules/using-13.h: New test.
* g++.dg/modules/using-13_a.C: New test.
* g++.dg/modules/using-13_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

PR modula2/114227 InstallTerminationProcedure does not work with -fiso

This patch moves the initial/termination user procedure functionality in
pim and iso versions of M2RTS into M2Dependent. This ensures that
finalization/initialization procedures will always be invoked for both -fiso
and -fpim. Prior to this patch M2Dependent called M2RTS for
termination procedure cleanup and always invoked the pim M2RTS.

gcc/m2/ChangeLog:

PR modula2/114227
* gm2-libs-iso/M2RTS.mod (ProcedureChain): Remove.
(ProcedureList): Remove.
(ExecuteReverse): Remove.
(ExecuteTerminationProcedures): Rewrite.
(ExecuteInitialProcedures): Rewrite.
(AppendProc): Remove.
(InstallTerminationProcedure): Rewrite.
(InstallInitialProcedure): Rewrite.
(InitProcList): Remove.
* gm2-libs/M2Dependent.def (InstallTerminationProcedure):
New procedure.
(ExecuteTerminationProcedures): New procedure.
(InstallInitialProcedure): New procedure.
(ExecuteInitialProcedures): New procedure.
* gm2-libs/M2Dependent.mod (ProcedureChain): New type.
(ProcedureList): New type.
(ExecuteReverse): New procedure.
(ExecuteTerminationProcedures): New procedure.
(ExecuteInitialProcedures): New procedure.
(AppendProc): New procedure.
(InstallTerminationProcedure): New procedure.
(InstallInitialProcedure): New procedure.
(InitProcList): New procedure.
* gm2-libs/M2RTS.mod (ProcedureChain): Remove.
(ProcedureList): Remove.
(ExecuteReverse): Remove.
(ExecuteTerminationProcedures): Rewrite.
(ExecuteInitialProcedures): Rewrite.
(AppendProc): Remove.
(InstallTerminationProcedure): Rewrite.
(InstallInitialProcedure): Rewrite.
(InitProcList): Remove.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

libstdc++: Add missing std::tuple constructor [PR114147]

I caused a regression with commit r10-908 by adding a constraint to the
non-explicit allocator-extended default constructor, but seemingly
forgot to add an explicit overload with the corresponding constraint.

libstdc++-v3/ChangeLog:

PR libstdc++/114147
* include/std/tuple (tuple::tuple(allocator_arg_t, const Alloc&)):
Add missing overload of allocator-extended default constructor.
(tuple<T1,T2>::tuple(allocator_arg_t, const Alloc&)): Likewise.
* testsuite/20_util/tuple/cons/114147.cc: New test.

bpf: add inline memset expansion

Similar to memmove and memcpy, the BPF backend cannot fall back on a
library call to implement __builtin_memset, and should always expand
calls to it inline if possible.

This patch implements simple inline expansion of memset in the BPF
backend in a verifier-friendly way. Similar to memcpy and memmove, the
size must be an integer constant, as is also required by clang.

gcc/
* config/bpf/bpf-protos.h (bpf_expand_setmem): New prototype.
* config/bpf/bpf.cc (bpf_expand_setmem): New.
* config/bpf/bpf.md (setmemdi): New define_expand.

gcc/testsuite/
* gcc.target/bpf/memset-1.c: New test.

Update gcc sv.po

* sv.po: Update.

combine: Fix recent WORD_REGISTER_OPERATIONS check [PR113010]

On Mon, Mar 04, 2024 at 05:18:39PM +0100, Rainer Orth wrote:
> unfortunately, the patch broke Solaris/SPARC bootstrap
> (sparc-sun-solaris2.11):
>
> .../gcc/combine.cc: In function 'rtx_code simplify_comparison(rtx_code, rtx_def**, rtx_def**)':
> .../gcc/combine.cc:12101:25: error: '*(unsigned int*)((char*)&inner_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))' may be used uninitialized [-Werror=maybe-uninitialized]
> 12101 |   scalar_int_mode mode, inner_mode, tmode;
>       |                         ^~~~~~~~~~

I don't see how it could ever work properly, inner_mode in that spot is
just uninitialized.

I think we shouldn't worry about paradoxical subregs of non-scalar_int_mode
REGs/MEMs and for the scalar_int_mode ones should initialize inner_mode
before we use it.
Another option would be to use
maybe_lt (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))), BITS_PER_WORD)
and
load_extend_op (GET_MODE (SUBREG_REG (op0))) == ZERO_EXTEND,
or set machine_mode smode = GET_MODE (SUBREG_REG (op0)); and use it in
those two spots.

2024-03-04  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/113010
* combine.cc (simplify_comparison): Guard the
WORD_REGISTER_OPERATIONS check on scalar_int_mode of SUBREG_REG
and initialize inner_mode.

arm: Fix a wrong attribute use and remove unused unspecs and iterators

This patch fixes the erroneous use of a mode attribute without a mode iterator
in the pattern and removes unused unspecs and iterators.

gcc/ChangeLog:

* config/arm/iterators.md (supf): Remove VMLALDAVXQ_U, VMLALDAVXQ_P_U,
VMLALDAVAXQ_U cases.
(VMLALDAVXQ): Remove iterator.
(VMLALDAVXQ_P): Likewise.
(VMLALDAVAXQ): Likewise.
* config/arm/mve.md (mve_vstrwq_p_fv4sf): Replace use of <MVE_VPRED>
mode iterator attribute with V4BI mode.
* config/arm/unspecs.md (VMLALDAVXQ_U, VMLALDAVXQ_P_U,
VMLALDAVAXQ_U): Remove unused unspecs.

arm: Annotate instructions with mve_safe_imp_xlane_pred

This patch annotates some MVE across lane instructions with a new attribute.
We use this attribute to let the compiler know that these instructions can be
safely implicitly predicated when tail predicating if their operands are
guaranteed to have zeroed tail predicated lanes. These instructions were
selected because having the value 0 in those lanes or 'tail-predicating' those
lanes have the same effect.

gcc/ChangeLog:

* config/arm/arm.md (mve_safe_imp_xlane_pred): New attribute.
* config/arm/iterators.md (mve_vmaxmin_safe_imp): New iterator
attribute.
* config/arm/mve.md (vaddvq_s, vaddvq_u, vaddlvq_s, vaddlvq_u,
vaddvaq_s, vaddvaq_u, vmaxavq_s, vmaxvq_u, vmladavq_s, vmladavq_u,
vmladavxq_s, vmlsdavq_s, vmlsdavxq_s, vaddlvaq_s, vaddlvaq_u,
vmlaldavq_u, vmlaldavq_s, vmlaldavq_u, vmlaldavxq_s, vmlsldavq_s,
vmlsldavxq_s, vrmlaldavhq_u, vrmlaldavhq_s, vrmlaldavhxq_s,
vrmlsldavhq_s, vrmlsldavhxq_s, vrmlaldavhaq_s, vrmlaldavhaq_u,
vrmlaldavhaxq_s, vrmlsldavhaq_s, vrmlsldavhaxq_s, vabavq_s, vabavq_u,
vmladavaq_u, vmladavaq_s, vmladavaxq_s, vmlsdavaq_s, vmlsdavaxq_s,
vmlaldavaq_s, vmlaldavaq_u, vmlaldavaxq_s, vmlsldavaq_s,
vmlsldavaxq_s): Added mve_safe_imp_xlane_pred.

arm: Add define_attr to to create a mapping between MVE predicated and unpredicated insns

This patch adds an attribute to the mve md patterns to be able to identify
predicable MVE instructions and what their predicated and unpredicated variants
are. This attribute is used to encode the icode of the unpredicated variant of
an instruction in its predicated variant.

This will make it possible for us to transform VPT-predicated insns in
the insn chain into their unpredicated equivalents when transforming the loop
into a MVE Tail-Predicated Low Overhead Loop. For example:
`mve_vldrbq_z_<supf><mode> -> mve_vldrbq_<supf><mode>`.

gcc/ChangeLog:

* config/arm/arm.md (mve_unpredicated_insn): New attribute.
* config/arm/arm.h (MVE_VPT_PREDICATED_INSN_P): New define.
(MVE_VPT_UNPREDICATED_INSN_P): Likewise.
(MVE_VPT_PREDICABLE_INSN_P): Likewise.
* config/arm/vec-common.md (mve_vshlq_<supf><mode>): Add attribute.
* config/arm/mve.md (arm_vcx1q<a>_p_v16qi): Add attribute.
(arm_vcx1q<a>v16qi): Likewise.
(arm_vcx1qav16qi): Likewise.
(arm_vcx1qv16qi): Likewise.
(arm_vcx2q<a>_p_v16qi): Likewise.
(arm_vcx2q<a>v16qi): Likewise.
(arm_vcx2qav16qi): Likewise.
(arm_vcx2qv16qi): Likewise.
(arm_vcx3q<a>_p_v16qi): Likewise.
(arm_vcx3q<a>v16qi): Likewise.
(arm_vcx3qav16qi): Likewise.
(arm_vcx3qv16qi): Likewise.
(@mve_<mve_insn>q_<supf><mode>): Likewise.
(@mve_<mve_insn>q_int_<supf><mode>): Likewise.
(@mve_<mve_insn>q_<supf>v4si): Likewise.
(@mve_<mve_insn>q_n_<supf><mode>): Likewise.
(@mve_<mve_insn>q_r_<supf><mode>): Likewise.
(@mve_<mve_insn>q_f<mode>): Likewise.
(@mve_<mve_insn>q_m_<supf><mode>): Likewise.
(@mve_<mve_insn>q_m_n_<supf><mode>): Likewise.
(@mve_<mve_insn>q_m_r_<supf><mode>): Likewise.
(@mve_<mve_insn>q_m_f<mode>): Likewise.
(@mve_<mve_insn>q_int_m_<supf><mode>): Likewise.
(@mve_<mve_insn>q_p_<supf>v4si): Likewise.
(@mve_<mve_insn>q_p_<supf><mode>): Likewise.
(@mve_<mve_insn>q<mve_rot>_<supf><mode>): Likewise.
(@mve_<mve_insn>q<mve_rot>_f<mode>): Likewise.
(@mve_<mve_insn>q<mve_rot>_m_<supf><mode>): Likewise.
(@mve_<mve_insn>q<mve_rot>_m_f<mode>): Likewise.
(mve_v<absneg_str>q_f<mode>): Likewise.
(mve_<mve_addsubmul>q<mode>): Likewise.
(mve_<mve_addsubmul>q_f<mode>): Likewise.
(mve_vadciq_<supf>v4si): Likewise.
(mve_vadciq_m_<supf>v4si): Likewise.
(mve_vadcq_<supf>v4si): Likewise.
(mve_vadcq_m_<supf>v4si): Likewise.
(mve_vandq_<supf><mode>): Likewise.
(mve_vandq_f<mode>): Likewise.
(mve_vandq_m_<supf><mode>): Likewise.
(mve_vandq_m_f<mode>): Likewise.
(mve_vandq_s<mode>): Likewise.
(mve_vandq_u<mode>): Likewise.
(mve_vbicq_<supf><mode>): Likewise.
(mve_vbicq_f<mode>): Likewise.
(mve_vbicq_m_<supf><mode>): Likewise.
(mve_vbicq_m_f<mode>): Likewise.
(mve_vbicq_m_n_<supf><mode>): Likewise.
(mve_vbicq_n_<supf><mode>): Likewise.
(mve_vbicq_s<mode>): Likewise.
(mve_vbicq_u<mode>): Likewise.
(@mve_vclzq_s<mode>): Likewise.
(mve_vclzq_u<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op>q_<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op>q_n_<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op>q_f<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op>q_n_f<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op1>q_m_f<mode>): Likewise.
(@mve_vcmp_<mve_cmp_op1>q_m_n_<supf><mode>): Likewise.
(@mve_vcmp_<mve_cmp_op1>q_m_<supf><mode>): Likewise.
(@mve_vcmp_<mve_cmp_op1>q_m_n_f<mode>): Likewise.
(mve_vctp<MVE_vctp>q<MVE_vpred>): Likewise.
(mve_vctp<MVE_vctp>q_m<MVE_vpred>): Likewise.
(mve_vcvtaq_<supf><mode>): Likewise.
(mve_vcvtaq_m_<supf><mode>): Likewise.
(mve_vcvtbq_f16_f32v8hf): Likewise.
(mve_vcvtbq_f32_f16v4sf): Likewise.
(mve_vcvtbq_m_f16_f32v8hf): Likewise.
(mve_vcvtbq_m_f32_f16v4sf): Likewise.
(mve_vcvtmq_<supf><mode>): Likewise.
(mve_vcvtmq_m_<supf><mode>): Likewise.
(mve_vcvtnq_<supf><mode>): Likewise.
(mve_vcvtnq_m_<supf><mode>): Likewise.
(mve_vcvtpq_<supf><mode>): Likewise.
(mve_vcvtpq_m_<supf><mode>): Likewise.
(mve_vcvtq_from_f_<supf><mode>): Likewise.
(mve_vcvtq_m_from_f_<supf><mode>): Likewise.
(mve_vcvtq_m_n_from_f_<supf><mode>): Likewise.
(mve_vcvtq_m_n_to_f_<supf><mode>): Likewise.
(mve_vcvtq_m_to_f_<supf><mode>): Likewise.
(mve_vcvtq_n_from_f_<supf><mode>): Likewise.
(mve_vcvtq_n_to_f_<supf><mode>): Likewise.
(mve_vcvtq_to_f_<supf><mode>): Likewise.
(mve_vcvttq_f16_f32v8hf): Likewise.
(mve_vcvttq_f32_f16v4sf): Likewise.
(mve_vcvttq_m_f16_f32v8hf): Likewise.
(mve_vcvttq_m_f32_f16v4sf): Likewise.
(mve_vdwdupq_m_wb_u<mode>_insn): Likewise.
(mve_vdwdupq_wb_u<mode>_insn): Likewise.
(mve_veorq_s><mode>): Likewise.
(mve_veorq_u><mode>): Likewise.
(mve_veorq_f<mode>): Likewise.
(mve_vidupq_m_wb_u<mode>_insn): Likewise.
(mve_vidupq_u<mode>_insn): Likewise.
(mve_viwdupq_m_wb_u<mode>_insn): Likewise.
(mve_viwdupq_wb_u<mode>_insn): Likewise.
(mve_vldrbq_<supf><mode>): Likewise.
(mve_vldrbq_gather_offset_<supf><mode>): Likewise.
(mve_vldrbq_gather_offset_z_<supf><mode>): Likewise.
(mve_vldrbq_z_<supf><mode>): Likewise.
(mve_vldrdq_gather_base_<supf>v2di): Likewise.
(mve_vldrdq_gather_base_wb_<supf>v2di_insn): Likewise.
(mve_vldrdq_gather_base_wb_z_<supf>v2di_insn): Likewise.
(mve_vldrdq_gather_base_z_<supf>v2di): Likewise.
(mve_vldrdq_gather_offset_<supf>v2di): Likewise.
(mve_vldrdq_gather_offset_z_<supf>v2di): Likewise.
(mve_vldrdq_gather_shifted_offset_<supf>v2di): Likewise.
(mve_vldrdq_gather_shifted_offset_z_<supf>v2di): Likewise.
(mve_vldrhq_<supf><mode>): Likewise.
(mve_vldrhq_fv8hf): Likewise.
(mve_vldrhq_gather_offset_<supf><mode>): Likewise.
(mve_vldrhq_gather_offset_fv8hf): Likewise.
(mve_vldrhq_gather_offset_z_<supf><mode>): Likewise.
(mve_vldrhq_gather_offset_z_fv8hf): Likewise.
(mve_vldrhq_gather_shifted_offset_<supf><mode>): Likewise.
(mve_vldrhq_gather_shifted_offset_fv8hf): Likewise.
(mve_vldrhq_gather_shifted_offset_z_<supf><mode>): Likewise.
(mve_vldrhq_gather_shifted_offset_z_fv8hf): Likewise.
(mve_vldrhq_z_<supf><mode>): Likewise.
(mve_vldrhq_z_fv8hf): Likewise.
(mve_vldrwq_<supf>v4si): Likewise.
(mve_vldrwq_fv4sf): Likewise.
(mve_vldrwq_gather_base_<supf>v4si): Likewise.
(mve_vldrwq_gather_base_fv4sf): Likewise.
(mve_vldrwq_gather_base_wb_<supf>v4si_insn): Likewise.
(mve_vldrwq_gather_base_wb_fv4sf_insn): Likewise.
(mve_vldrwq_gather_base_wb_z_<supf>v4si_insn): Likewise.
(mve_vldrwq_gather_base_wb_z_fv4sf_insn): Likewise.
(mve_vldrwq_gather_base_z_<supf>v4si): Likewise.
(mve_vldrwq_gather_base_z_fv4sf): Likewise.
(mve_vldrwq_gather_offset_<supf>v4si): Likewise.
(mve_vldrwq_gather_offset_fv4sf): Likewise.
(mve_vldrwq_gather_offset_z_<supf>v4si): Likewise.
(mve_vldrwq_gather_offset_z_fv4sf): Likewise.
(mve_vldrwq_gather_shifted_offset_<supf>v4si): Likewise.
(mve_vldrwq_gather_shifted_offset_fv4sf): Likewise.
(mve_vldrwq_gather_shifted_offset_z_<supf>v4si): Likewise.
(mve_vldrwq_gather_shifted_offset_z_fv4sf): Likewise.
(mve_vldrwq_z_<supf>v4si): Likewise.
(mve_vldrwq_z_fv4sf): Likewise.
(mve_vmvnq_s<mode>): Likewise.
(mve_vmvnq_u<mode>): Likewise.
(mve_vornq_<supf><mode>): Likewise.
(mve_vornq_f<mode>): Likewise.
(mve_vornq_m_<supf><mode>): Likewise.
(mve_vornq_m_f<mode>): Likewise.
(mve_vornq_s<mode>): Likewise.
(mve_vornq_u<mode>): Likewise.
(mve_vorrq_<supf><mode>): Likewise.
(mve_vorrq_f<mode>): Likewise.
(mve_vorrq_m_<supf><mode>): Likewise.
(mve_vorrq_m_f<mode>): Likewise.
(mve_vorrq_m_n_<supf><mode>): Likewise.
(mve_vorrq_n_<supf><mode>): Likewise.
(mve_vorrq_s<mode>): Likewise.
(mve_vorrq_s<mode>): Likewise.
(mve_vsbciq_<supf>v4si): Likewise.
(mve_vsbciq_m_<supf>v4si): Likewise.
(mve_vsbcq_<supf>v4si): Likewise.
(mve_vsbcq_m_<supf>v4si): Likewise.
(mve_vshlcq_<supf><mode>): Likewise.
(mve_vshlcq_m_<supf><mode>): Likewise.
(mve_vshrq_m_n_<supf><mode>): Likewise.
(mve_vshrq_n_<supf><mode>): Likewise.
(mve_vstrbq_<supf><mode>): Likewise.
(mve_vstrbq_p_<supf><mode>): Likewise.
(mve_vstrbq_scatter_offset_<supf><mode>_insn): Likewise.
(mve_vstrbq_scatter_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrdq_scatter_base_<supf>v2di): Likewise.
(mve_vstrdq_scatter_base_p_<supf>v2di): Likewise.
(mve_vstrdq_scatter_base_wb_<supf>v2di): Likewise.
(mve_vstrdq_scatter_base_wb_p_<supf>v2di): Likewise.
(mve_vstrdq_scatter_offset_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_offset_p_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn): Likewise.
(mve_vstrhq_<supf><mode>): Likewise.
(mve_vstrhq_fv8hf): Likewise.
(mve_vstrhq_p_<supf><mode>): Likewise.
(mve_vstrhq_p_fv8hf): Likewise.
(mve_vstrhq_scatter_offset_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_offset_p_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Likewise.
(mve_vstrwq_<supf>v4si): Likewise.
(mve_vstrwq_fv4sf): Likewise.
(mve_vstrwq_p_<supf>v4si): Likewise.
(mve_vstrwq_p_fv4sf): Likewise.
(mve_vstrwq_scatter_base_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_fv4sf): Likewise.
(mve_vstrwq_scatter_base_p_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_p_fv4sf): Likewise.
(mve_vstrwq_scatter_base_wb_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_wb_fv4sf): Likewise.
(mve_vstrwq_scatter_base_wb_p_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Likewise.
(mve_vstrwq_scatter_offset_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_offset_p_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_offset_p_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Likewise.

doc: update [[gnu::no_dangling]]

...to offer a more realistic example.

gcc/ChangeLog:

* doc/extend.texi: Update [[gnu::no_dangling]].

vect: Fix integer overflow calculating mask

The masks and bitvectors were broken when nunits==32 on hosts where int is
32-bit.

gcc/ChangeLog:

* dojump.cc (do_compare_and_jump): Use full-width integers for shifts.
* expr.cc (store_constructor): Likewise.
(do_store_flag): Likewise.

Regenerate opt.urls

There were several commits that didn't regenerate the opt.urls files.

Fixes: 438ef143679e ("rs6000: Neuter option -mpower{8,9}-vector")
Fixes: 50c549ef3db6 ("gccrs: enable -Winfinite-recursion warnings by default")
Fixes: 25bb8a40abd9 ("Move docs for -Wuse-after-free and -Wuseless-cast")
Fixes: 48448055fb70 ("AVR: Support .rodata in Flash for AVR64* and AVR128*")
Fixes: 42503cc257fb ("AVR: Document option -mskip-bug")
Fixes: 7de5bb642c12 ("i386: [APX] Document inline asm behavior and new switch")
Fixes: 49a14ee488b8 ("Add -mevex512 into invoke.texi")
Fixes: 4666cbde5e6d ("Sort warning options in c-family/c.opt.")
Fixes: cda383616183 ("AVR: target/114100 - Better indirect accesses for reduced Tiny")
gcc/c-family/ChangeLog:

* c.opt.urls: Regenerate.

gcc/ChangeLog:

* common.opt.urls: Regenerate.
* config/avr/avr.opt.urls: Likewise.
* config/i386/i386.opt.urls: Likewise.
* config/pru/pru.opt.urls: Likewise.
* config/riscv/riscv.opt.urls: Likewise.
* config/rs6000/rs6000.opt.urls: Likewise.

gcc/rust/ChangeLog:

* lang.opt.urls: Regenerate.

Fix 201001011-1.c on H8

Excerpt from gcc.sum:
[...]
PASS: gcc.c-torture/execute/20101011-1.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/execute/20101011-1.c   -O0  execution test
PASS: gcc.c-torture/execute/20101011-1.c   -O1  (test for excess errors)
FAIL: gcc.c-torture/execute/20101011-1.c   -O1  execution test
[ ... ]

This is because H8 MCUs do not throw a "divide by zero" exception.

gcc/testsuite
* gcc.c-torture/execute/20101011-1.c: Do not test on H8 series.

tree-optimization/114197 - unexpected if-conversion for vectorization

The following avoids lowering a volatile bitfiled access and in case
the if-converted and original loops end up in different outer loops
because of simplifcations enabled scrap the result since that is not
how the vectorizer expects the loops to be laid out.

PR tree-optimization/114197
* tree-if-conv.cc (bitfields_to_lower_p): Do not lower if
there are volatile bitfield accesses.
(pass_if_conversion::execute): Throw away result if the
if-converted and original loops are not nested as expected.

* gcc.dg/torture/pr114197.c: New testcase.

tree-optimization/114164 - unsupported SIMD clone call, unsupported VEC_COND

The following avoids creating unsupported VEC_COND_EXPRs as part of
SIMD clone call mask argument setup during vectorization which results
in inefficient decomposing of the operation during vector lowering.

PR tree-optimization/114164
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Fail if
the code generated for mask argument setup is not supported.

libgomp: Use void (*) (void *) rather than void (*)() for host_fn type [PR114216]

For the type of the target callbacks we use elsehwere void (*) (void *) and
IMHO should use that for the reverse offload fallback as well (where the actual
callback is emitted using the same code as for host fallback or device kernel
entry routines), even when it is also ok to use void (*) () before C23 and
we aren't building libgomp with C23 yet. On some arches perhaps void (*) ()
could result in worse code generation because calls in that case like casts
to unprototyped functions need to sometimes pass argument in two different spots
etc. so that it deals with both passing it through ... and as a named argument.

2024-03-04 Jakub Jelinek <jakub@redhat.com>

PR libgomp/114216
* target.c (gomp_target_rev): Change host_fn type and corresponding
cast from void (*)() to void (*) (void *).

tree-optimization/114203 - wrong CLZ niter computation

For precision less than int we apply the adjustment to make it defined
at zero after the adjustment to make it compute CLZ rather than CTZ.
That's wrong.

PR tree-optimization/114203
* tree-ssa-loop-niter.cc (build_cltz_expr): Apply CTZ->CLZ
adjustment before making the result defined at zero.

* gcc.dg/torture/pr114203.c: New testcase.

tree-optimization/114192 - scalar reduction kept live with early break vect

The following fixes a missing replacement of the reduction value
used in the epilog, causing the scalar reduction to be kept live
across the early break exit path.

PR tree-optimization/114192
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use the
appropriate def for the live out stmt in case of an alternate
exit.

bitint: Fix tree node sharing bug [PR114209]

We ICE on the following testcase due to invalid tree sharing.
The second hunk fixes that, the first one is from me looking around at
other spots which might need end up with invalid tree sharing too.

2024-03-04 Jakub Jelinek <jakub@redhat.com>

PR middle-end/114209
* gimple-lower-bitint.cc (bitint_large_huge::limb_access): Call
unshare_expr when creating a MEM_REF from MEM_REF.
(bitint_large_huge::lower_stmt): Call unshare_expr.

* gcc.dg/bitint-97.c: New test.

testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

The vect_int_mod target selector is evaluated with the options in
DEFAULT_VECTCFLAGS in effect, but these options are not automatically
passed to tests out of the vect directories. So this test fails on
targets where integer vector modulo operation is supported but requiring
an option to enable, for example LoongArch.

In this test case, the only expected optimization not happened in
original is in corge because it needs forward propogation. So we can
scan the forwprop2 dump (where the vector operation is not expanded to
scalars yet) instead of optimized, then we don't need to consider
vect_int_mod or not.

gcc/testsuite/ChangeLog:

PR testsuite/113418
* gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
instead of -fdump-tree-optimized.
(dg-final): Scan forwprop2 dump instead of optimized, and remove
the use of vect_int_mod.
* lib/target-supports.exp (check_effective_target_vect_int_mod):
Remove because it's not used anymore.

i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]

The Intel extended format has the various weird number categories,
pseudo denormals, pseudo infinities, pseudo NaNs and unnormals.
Those are not representable in the GCC real_value and so neither
GIMPLE nor RTX VIEW_CONVERT_EXPR/SUBREG folding folds those into
constants.

As can be seen on the following testcase, because it isn't folded
(since GCC 12, before that we were folding it) we can end up with
a SUBREG of a CONST_VECTOR or similar constant, which isn't valid
general_operand, so we ICE during vregs pass trying to recognize
the move instruction.
Initially I thought it is a middle-end bug, the movxf instruction
has general_operand predicate, but the middle-end certainly never
tests that predicate, seems moves are special optabs.
And looking at other mov optabs, e.g. for vector modes the i386
patterns use nonimmediate_operand predicate on the input, yet
ix86_expand_vector_move deals with CONSTANT_P and SUBREG of CONSTANT_P
arguments which if the predicate was checked couldn't ever make it through.

The following patch handles this case similarly to the
ix86_expand_vector_move's SUBREG of CONSTANT_P case, does it just for XFmode
because I believe that is the only mode that needs it from the scalar ones,
others should just be folded.

2024-03-04 Jakub Jelinek <jakub@redhat.com>

PR target/114184
* config/i386/i386-expand.cc (ix86_expand_move): If XFmode op1
is SUBREG of CONSTANT_P, force the SUBREG_REG into memory or
register.

* gcc.target/i386/pr114184.c: New test.

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add myself

Signed-off-by: demin.han <demin.han@starfivetech.com>

PR target/114187: Fix ?Fmode SUBREG simplification in simplify_subreg.

This patch fixes PR target/114187 a typo/missed-optimization in simplify-rtx
that's exposed by (my) changes to x86_64's parameter passing.  The context
is that construction of double word (TImode) values now uses the idiom:

(ior:TI (ashift:TI (zero_extend:TI (reg:DI x)) (const_int 64 [0x40]))
        (zero_extend:TI (reg:DI y)))

Extracting the DImode highpart and lowpart halves of this complex expression
is supported by simplications in simplify_subreg.  The problem is when the
doubleword TImode value represents two DFmode fields, there isn't a direct
simplification to extract the highpart or lowpart SUBREGs, but instead GCC
uses two steps, extract the DImode {high,low} part and then cast the result
back to a floating point mode, DFmode.

The (buggy) code to do this is:

  /* If the outer mode is not integral, try taking a subreg with the equivalent
     integer outer mode and then bitcasting the result.
     Other simplifications rely on integer to integer subregs and we'd
     potentially miss out on optimizations otherwise.  */
  if (known_gt (GET_MODE_SIZE (innermode),
                GET_MODE_SIZE (outermode))
      && SCALAR_INT_MODE_P (innermode)
      && !SCALAR_INT_MODE_P (outermode)
      && int_mode_for_size (GET_MODE_BITSIZE (outermode),
                            0).exists (&int_outermode))
    {
      rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
      if (tem)
        return simplify_gen_subreg (outermode, tem, int_outermode, byte);
    }

The issue/mistake is that the second call, to simplify_subreg, shouldn't
use "byte" as the final argument; the offset has already been handled by
the first call, to simplify_subreg, and this second call is just a type
conversion from an integer mode to floating point (from DImode to DFmode).

Interestingly, this mistake was already spotted by Richard Sandiford when
the optimization was originally contributed in January 2023.
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610920.html
>> Richard Sandiford writes:
>> Also, the final line should pass 0 rather than byte.

Unfortunately a miscommunication/misunderstanding in a later thread
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612898.html
resulted in this correction being undone.  Using lowpart_subreg should
avoid/reduce confusion in future.

2024-03-03  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/114187
* simplify-rtx.cc (simplify_context::simplify_subreg): Call
lowpart_subreg to perform type conversion, to avoid confusion
over the offset to use in the call to simplify_reg_subreg.

gcc/testsuite/ChangeLog
PR target/114187
* g++.target/i386/pr114187.C: New test case.

Daily bump.

d: Merge upstream dmd, druntime f8bae04558, phobos ba2ade9dec

D front-end changes:

    - Import dmd v2.108.1-beta-1.

D runtime changes:

    - Import druntime v2.108.1-beta-1.

Phobos changes:

    - Import phobos v2.108.1-beta-1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd f8bae04558.
* dmd/VERSION: Bump version to v2.108.0-beta.1.
* d-builtins.cc (build_frontend_type): Update for new front-end
interface.
* d-codegen.cc (build_assert_call): Likewise.
* d-convert.cc (d_array_convert): Likewise.
* decl.cc (get_vtable_decl): Likewise.
* expr.cc (ExprVisitor::visit (EqualExp *)): Likewise.
(ExprVisitor::visit (VarExp *)): Likewise.
(ExprVisitor::visit (ArrayLiteralExp *)): Likewise.
(ExprVisitor::visit (AssocArrayLiteralExp)): Likewise.
* intrinsics.cc (build_shuffle_mask_type): Likewise.
(maybe_warn_intrinsic_mismatch): Likewise.
* runtime.cc (get_libcall_type): Likewise.
* typeinfo.cc (TypeInfoVisitor::layout_string): Likewise.
(TypeInfoVisitor::visit(TypeInfoTupleDeclaration *)): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 02d6d07a69.
* src/MERGE: Merge upstream phobos a2ade9dec.

[PATCH] combine: Don't simplify paradoxical SUBREG on WORD_REGISTER_OPERATIONS [PR113010]

The sign-bit-copies of a sign-extending load cannot be known until runtime on
WORD_REGISTER_OPERATIONS targets, except in the case of a zero-extending MEM
load. See the fix for PR112758.

gcc/
PR rtl-optimization/113010
* combine.cc (simplify_comparison): Simplify a SUBREG on
WORD_REGISTER_OPERATIONS targets only if it is a zero-extending
MEM load.

gcc/testsuite
* gcc.c-torture/execute/pr113010.c: New test.

AVR: Use more C++ ish coding style.

gcc/
* config/avr/avr.cc: Resolve ATTRIBUTE_UNUSED.
Use bool in place of int for boolean logic (if possible).
Move declarations to definitions (if possible).
* config/avr/avr.md: Use C++ comments. Fix some indentation glitches.
* config/avr/avr-dimode.md: Same.
* config/avr/constraints.md: Same.
* config/avr/predicates.md: Same.