gcc.gnu.org Git - gcc.git/log

coroutines : Handle rethrow from unhandled_exception [PR98704].

Although there is still some discussion in CWG 2451 on this, the
implementors are agreed on the intent.

When promise.unhandled_exception () is entered, the coroutine is
considered to be still running - returning from the method will
cause the final await expression to be evaluated.

If the method throws, that action is considered to make the
coroutine suspend (since, otherwise, it would be impossible to
reclaim its resources, since one cannot destroy a running coro).

The wording issue is to do with how to represent the place at
which the coroutine should be considered suspended.

For the implementation here, that place is immediately before the
promise life-time ends. A handler for the rethrown exception can
thus call xxxx.destroy() which will run DTORs for the promise and
any parameter copies [as needed] then the coroutine frame will be
deallocated.

At present, we also set "done=true" in this case (for compatibility
with other current implementations). One might consider 'done()'
to be misleading in the case of an abnormal termination - that is
also part of the CWG 2451 discussion.

gcc/cp/ChangeLog:

PR c++/98704
* coroutines.cc (build_actor_fn): Make destroy index 1
correspond to the abnormal unhandled_exception() exit.
Substitute the proxy for the resume index.
(coro_rewrite_function_body): Arrange to reset the resume
index and make done = true for a rethrown exception from
unhandled_exception ().
(morph_fn_to_coro): Adjust calls to build_actor_fn and
coro_rewrite_function_body.

gcc/testsuite/ChangeLog:

PR c++/98704
* g++.dg/coroutines/torture/pr98704.C: New test.

(cherry picked from commit 020b286c769f4dc8a6b45491351f6bc2e69d7a7f)

coroutines : Handle for await expressions in for stmts [PR98480].

The handling of await expressions in the init, condition and iteration
expressions of for loops had been omitted. Fixed thus.

gcc/cp/ChangeLog:

PR c++/98480
* coroutines.cc (replace_continue): Rewrite continue into
'goto label'.
(await_statement_walker): Handle await expressions in the
initializer, condition and iteration expressions of for
loops.
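
The 'continue' rewrite noted for replace_continue can be pictured with an
ordinary loop.  What follows is a hand-written analogy, not the trees GCC
builds: once the iteration expression has to be expanded into separate
statements, 'continue' must become a jump to the place where those statements
now live.  The function names and the label 'cont' are illustrative only.

```cpp
// Original form: 'continue' implicitly jumps to the iteration expression.
int sum_odd_for(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0)
            continue;
        sum += i;
    }
    return sum;
}

// Lowered form: init, condition and iteration expression are now plain
// statements, so 'continue' is spelled as 'goto cont'.
int sum_odd_lowered(int n) {
    int sum = 0;
    {
        int i = 0;                 // init statement
        while (true) {
            if (!(i < n))          // condition
                break;
            if (i % 2 == 0)
                goto cont;         // was 'continue'
            sum += i;
        cont:
            ++i;                   // iteration expression
        }
    }
    return sum;
}
```

Both functions return the same value for any n; the second form leaves room
for each of the three loop-header expressions to grow into several statements.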

gcc/testsuite/ChangeLog:

PR c++/98480
* g++.dg/coroutines/pr98480.C: New test.
* g++.dg/coroutines/torture/co-await-24-for-init.C: New test.
* g++.dg/coroutines/torture/co-await-25-for-condition.C: New test.
* g++.dg/coroutines/torture/co-await-26-for-iteration-expr.C: New test.

(cherry picked from commit 26e0eb1071e318728bcd33f28d055729ac48792c)

coroutines : Avoid generating empty statements [PR96749].

In the compiler-only idiom:
" a = (target expr creates temp, op uses temp) "
the target expression variable needs to be promoted to a frame one
(if the expression has a suspend point). However, the only uses of
the var are in the second part of the compound expression - and we
were creating an empty statement corresponding to the (now unused)
first arm. This then produces the spurious warnings noted.

Fixed by avoiding generation of a separate variable nest for
isolated target expressions (or similarly isolated co_awaits used
in a function call).

gcc/cp/ChangeLog:

PR c++/96749
* coroutines.cc (flatten_await_stmt): Allow for the case
where a target expression variable only has uses in the
second part of a compound expression.
(maybe_promote_temps): Avoid emitting empty statements.

gcc/testsuite/ChangeLog:

PR c++/96749
* g++.dg/coroutines/pr96749-1.C: New test.
* g++.dg/coroutines/pr96749-2.C: New test.

(cherry picked from commit ed8198461735f9b5b3c2cbe50f9913690ce4b4ca)

coroutines : Adjust constraints on when to build ctors [PR98118].

PR98118 shows that TYPE_NEEDS_CONSTRUCTING is necessary but not
sufficient. Use type_build_ctor_call() instead.

gcc/cp/ChangeLog:

PR c++/98118
* coroutines.cc (build_co_await): Use type_build_ctor_call()
to determine cases when a CTOR needs to be built.
(flatten_await_stmt): Likewise.
(morph_fn_to_coro): Likewise.

gcc/testsuite/ChangeLog:

PR c++/98118
* g++.dg/coroutines/pr98118.C: New test.

(cherry picked from commit 3d9577c254003f2d18185015b75ce6e3e4af9ca2)

coroutines : Do not accept throwing final await expressions [PR95616].

From the PR:

The wording of [dcl.fct.def.coroutine]/15 states:
* The expression co_await promise.final_suspend() shall not be
potentially-throwing ([except.spec]).

See http://eel.is/c++draft/dcl.fct.def.coroutine#15
and http://eel.is/c++draft/except.spec#6

i.e. all of the following must be declared noexcept (if they form part of the await-expression):
- promise_type::final_suspend()
- finalSuspendObj.operator co_await()
- finalSuspendAwaiter.await_ready()
- finalSuspendAwaiter.await_suspend()
- finalSuspendAwaiter.await_resume()
- finalSuspendObj destructor
- finalSuspendAwaiter destructor

This implements the checks for these cases and rejects such code with
a diagnostic if exceptions are enabled.
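
As a rough user-side illustration of the list above, the noexcept operator can
test the same components.  This is a sketch under assumptions, not the check
the compiler performs: 'handle_t' stands in for std::coroutine_handle<> so the
code builds without <coroutine>, the type names are invented, and the
operator co_await() case is left out for brevity.

```cpp
#include <type_traits>
#include <utility>

// Stand-in for std::coroutine_handle<> (assumption made for this sketch).
using handle_t = void*;

struct GoodAwaiter {
    bool await_ready() const noexcept { return false; }
    void await_suspend(handle_t) const noexcept {}
    void await_resume() const noexcept {}
};
struct GoodPromise {
    GoodAwaiter final_suspend() noexcept { return {}; }
};

struct BadAwaiter {
    bool await_ready() const noexcept { return false; }
    void await_suspend(handle_t) const noexcept {}
    void await_resume() const {}   // not noexcept: potentially throwing
};
struct BadPromise {
    BadAwaiter final_suspend() noexcept { return {}; }
};

// True only if every listed component of the final await is non-throwing.
template <class Promise>
constexpr bool final_await_is_nothrow() {
    using Awaiter = decltype(std::declval<Promise&>().final_suspend());
    return noexcept(std::declval<Promise&>().final_suspend())
        && noexcept(std::declval<Awaiter&>().await_ready())
        && noexcept(std::declval<Awaiter&>().await_suspend(handle_t{}))
        && noexcept(std::declval<Awaiter&>().await_resume())
        && std::is_nothrow_destructible<Awaiter>::value;
}

static_assert(final_await_is_nothrow<GoodPromise>(), "all parts noexcept");
static_assert(!final_await_is_nothrow<BadPromise>(), "await_resume may throw");
```

A promise type shaped like BadPromise is the kind of code that would now be
rejected when exceptions are enabled.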

Backported from 9ee91079fd5879cba046e452ab5593372166b2ab and
4e252e23d34932f13f39cc6544bf5c9379fa2a87

gcc/cp/ChangeLog:

PR c++/95616
* coroutines.cc (coro_diagnose_throwing_fn): New helper.
(coro_diagnose_throwing_final_aw_expr): New helper.
(build_co_await): Diagnose throwing final await expression
components. Look through NOP_EXPRs in build_special_member_call
return value to find the CALL_EXPR. Simplify.
(build_init_or_final_await): Diagnose a throwing promise
final_suspend() call.

gcc/testsuite/ChangeLog:

PR c++/95616
* g++.dg/coroutines/pr95616-0-no-exceptions.C: New test.
* g++.dg/coroutines/pr95616-0.C: New test.
* g++.dg/coroutines/pr95616-1-no-exceptions.C: New test.
* g++.dg/coroutines/pr95616-1.C: New test.
* g++.dg/coroutines/pr95616-2.C: New test.
* g++.dg/coroutines/pr95616-3-no-exceptions.C: New test.
* g++.dg/coroutines/pr95616-3.C: New test.
* g++.dg/coroutines/pr95616-4.C: New test.
* g++.dg/coroutines/pr95616-5.C: New test.
* g++.dg/coroutines/pr95616-6.C: New test.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>

coroutines : Handle exceptions thrown before the first await_resume() [PR95615].

The coroutine body is wrapped in a try-catch block which is responsible for
handling any exceptions thrown by the original function body.  Originally, the
initial suspend expression was outside this, but an amendment to the standard
places the await_resume call inside and everything else outside.

This means that any exception thrown prior to the initial suspend expression
await_resume() will propagate to the ramp function.  However, some portion of
the coroutine state will exist at that point (how much depends on where the
exception is thrown from).  For example, we might have some frame parameter
copies, or the promise object or the return object any of which might have a
non-trivial DTOR.  Also the frame itself needs to be deallocated. This patch
fixes the handling of these cases.
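
The clean-up sequence can be modelled in ordinary C++.  The sketch below is
only a model of the behaviour described, not the code the compiler
synthesises: the Frame layout, the type names, the 'trace' string and the
order in which the pieces are destroyed are all assumptions made for
illustration.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Records what the model does, so the clean-up order can be inspected.
static std::string trace;

struct ParamCopy { ~ParamCopy() { trace += "~param "; } };
struct Promise   { ~Promise()   { trace += "~promise "; } };

// Hypothetical frame: raw storage for the pieces the ramp constructs.
struct Frame {
    alignas(ParamCopy) unsigned char param[sizeof(ParamCopy)];
    alignas(Promise)   unsigned char promise[sizeof(Promise)];
};

// Models a ramp in which something throws before the initial await_resume().
void ramp(bool throw_early) {
    void* mem = ::operator new(sizeof(Frame));
    Frame* frame = new (mem) Frame;
    ParamCopy* param = nullptr;
    Promise* promise = nullptr;
    try {
        param = new (frame->param) ParamCopy;
        promise = new (frame->promise) Promise;
        if (throw_early)
            throw std::runtime_error("thrown before await_resume");
        trace += "started ";
    } catch (...) {
        // Destroy only what was actually constructed, newest first.
        if (promise) promise->~Promise();
        if (param) param->~ParamCopy();
        ::operator delete(mem);    // the frame itself must also be released
        trace += "freed ";
        throw;                     // the ramp's caller sees the exception
    }
    // Normal path of the model: tear down at once so nothing leaks.
    promise->~Promise();
    param->~ParamCopy();
    ::operator delete(mem);
}

// Runs the model and returns the recorded sequence of events.
std::string observe(bool throw_early) {
    trace.clear();
    try {
        ramp(throw_early);
    } catch (const std::runtime_error&) {
        trace += "caught";
    }
    return trace;
}
```

Calling observe(true) shows the destructors and the deallocation running
before the exception reaches the caller of the ramp.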

gcc/cp/ChangeLog:

PR c++/95615
* coroutines.cc (struct param_info): Track parameter copies that need
a DTOR.
(coro_get_frame_dtor): New helper function factored from build_actor().
(build_actor_fn): Use coro_get_frame_dtor().
(morph_fn_to_coro): Track parameters that need DTORs on exception,
likewise the frame promise and the return object.  On exception, run the
DTORs for these, destroy the frame and then rethrow the exception.

gcc/testsuite/ChangeLog:

PR c++/95615
* g++.dg/coroutines/torture/pr95615-01.C: New test.
* g++.dg/coroutines/torture/pr95615-02.C: New test.
* g++.dg/coroutines/torture/pr95615-03.C: New test.
* g++.dg/coroutines/torture/pr95615-04.C: New test.
* g++.dg/coroutines/torture/pr95615-05.C: New test.

(cherry picked from commit fe55086547c9360b530e040a6673dae10ac77847)

coroutines : Call promise CTOR with parm copies [PR97587].

As the PR notes, we were calling the promise CTOR with the original
function parameters, not the copy (as pointed out, a previous wording of
the section was unambiguous). Fixed thus.

gcc/cp/ChangeLog:

PR c++/97587
* coroutines.cc (struct param_info): Track rvalue refs.
(morph_fn_to_coro): Track rvalue refs, and call the promise
CTOR with the frame copy of passed parms.

gcc/testsuite/ChangeLog:

PR c++/97587
* g++.dg/coroutines/coro1-refs-and-ctors.h: Add a CTOR with two
reference parms, to distinguish the rvalue ref. variant.
* g++.dg/coroutines/pr97587.C: New test.

(cherry picked from commit b8ff3f8efeda02a6bedebfaf20b93645ae3bb5b8)

coroutines : Remove throwing_cleanup marks from the ramp [PR95822].

The FE contains a mechanism for cleaning up return expressions if a
function throws during the execution of cleanups prior to the return.

If the original function has a return value with a non-trivial DTOR
and the body contains a var with a DTOR that might throw, the function
decl is marked "throwing_cleanup".

However, we do not [in the coroutine ramp function, which is
synthesised], use any body var types with DTORs that might throw.

The original body [which will then contain the type with the throwing
DTOR] is transformed into the actor function which only contains void
returns, and is also wrapped in a try-catch block.

So (a) the 'throwing_cleanup' is no longer correct for the ramp and
(b) we do not need to transfer it to the actor which only contains
void returns.

gcc/cp/ChangeLog:

PR c++/95822
* coroutines.cc (morph_fn_to_coro): Unconditionally remove any
set throwing_cleanup marker.

gcc/testsuite/ChangeLog:

PR c++/95822
* g++.dg/coroutines/pr95822.C: New test.

(cherry picked from commit 7005a50d0121954031a223ea5a6c57aaa7e3efd3)

testsuite, coroutines : Make final_suspend calls noexcept.

The wording of [dcl.fct.def.coroutine]/15 states:
The expression co_await promise.final_suspend() shall not be
potentially-throwing. A fair number of testcases are not correctly
marked. Fixed here.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/co-await-void_type.C: Mark promise
final_suspend call as noexcept.
* g++.dg/coroutines/co-return-syntax-08-bad-return.C: Likewise.
* g++.dg/coroutines/co-return-syntax-10-movable.C: Likewise.
* g++.dg/coroutines/co-return-warning-1.C: Likewise.
* g++.dg/coroutines/co-yield-syntax-08-needs-expr.C: Likewise.
* g++.dg/coroutines/coro-bad-gro-00-class-gro-scalar-return.C: Likewise.
* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C: Likewise.
* g++.dg/coroutines/coro-missing-gro.C: Likewise.
* g++.dg/coroutines/coro-missing-promise-yield.C: Likewise.
* g++.dg/coroutines/coro-missing-ret-value.C: Likewise.
* g++.dg/coroutines/coro-missing-ret-void.C: Likewise.
* g++.dg/coroutines/coro-missing-ueh.h: Likewise.
* g++.dg/coroutines/coro1-allocators.h: Likewise.
* g++.dg/coroutines/coro1-refs-and-ctors.h: Likewise.
* g++.dg/coroutines/coro1-ret-int-yield-int.h: Likewise.
* g++.dg/coroutines/pr94682-preview-this.C: Likewise.
* g++.dg/coroutines/pr94752.C: Likewise.
* g++.dg/coroutines/pr94760-mismatched-traits-and-promise-prev.C: Likewise.
* g++.dg/coroutines/pr94879-folly-1.C: Likewise.
* g++.dg/coroutines/pr94883-folly-2.C: Likewise.
* g++.dg/coroutines/pr95050.C: Likewise.
* g++.dg/coroutines/pr95345.C: Likewise.
* g++.dg/coroutines/pr95440.C: Likewise.
* g++.dg/coroutines/pr95591.C: Likewise.
* g++.dg/coroutines/pr95711.C: Likewise.
* g++.dg/coroutines/pr95813.C: Likewise.
* g++.dg/coroutines/symmetric-transfer-00-basic.C: Likewise.
* g++.dg/coroutines/torture/co-await-07-tmpl.C: Likewise.
* g++.dg/coroutines/torture/co-await-17-capture-comp-ref.C: Likewise.
* g++.dg/coroutines/torture/co-ret-00-void-return-is-ready.C: Likewise.
* g++.dg/coroutines/torture/co-ret-01-void-return-is-suspend.C: Likewise.
* g++.dg/coroutines/torture/co-ret-03-different-GRO-type.C: Likewise.
* g++.dg/coroutines/torture/co-ret-04-GRO-nontriv.C: Likewise.
* g++.dg/coroutines/torture/co-ret-06-template-promise-val-1.C: Likewise.
* g++.dg/coroutines/torture/co-ret-08-template-cast-ret.C: Likewise.
* g++.dg/coroutines/torture/co-ret-09-bool-await-susp.C: Likewise.
* g++.dg/coroutines/torture/co-ret-15-default-return_void.C: Likewise.
* g++.dg/coroutines/torture/co-ret-17-void-ret-coro.C: Likewise.
* g++.dg/coroutines/torture/co-yield-00-triv.C: Likewise.
* g++.dg/coroutines/torture/co-yield-03-tmpl.C: Likewise.
* g++.dg/coroutines/torture/co-yield-04-complex-local-state.C: Likewise.
* g++.dg/coroutines/torture/exceptions-test-0.C: Likewise.
* g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C: Likewise.
* g++.dg/coroutines/torture/func-params-04.C: Likewise.
* g++.dg/coroutines/torture/local-var-06-structured-binding.C: Likewise.
* g++.dg/coroutines/torture/mid-suspend-destruction-0.C: Likewise.

(cherry picked from commit 9a4eb720b343324f7f8fd2dceed5d0347e5a0153)

testsuite, coroutines : Mark final awaiters and co_await operators noexcept.

This is part of the requirement of [dcl.fct.def.coroutine]/15.

In addition to promise final_suspend() calls, the following cases must
also be noexcept as per discussion in PR95616.

- finalSuspendObj.operator co_await()
- finalSuspendAwaiter.await_ready()
- finalSuspendAwaiter.await_suspend()
- finalSuspendAwaiter.await_resume()
- finalSuspendObj destructor
- finalSuspendAwaiter destructor

Fixed for missing cases in the testsuite as a prerequisite to fixing
PR95616.

Backported from 98fcd2513add205dcdd134eb29a2505ea9f81495 and
3c173f7890cfd6649b687adc5b0598d9e01fcd6d

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr9xxxx-mismatched-traits-and-promise-prev.C: Moved to...
* g++.dg/coroutines/pr94879-folly-1.C: ... here.
Make final suspend expression components noexcept.
* g++.dg/coroutines/pr94883-folly-2.C: Likewise.
* g++.dg/coroutines/pr95345.C: Likewise.

aix: Permit use of AIX Vector extended ABI mode

AIX only permits use of Altivec VSRs 20-31 in a Vector Extended ABI mode.
This patch explicitly enables use of the VSRs using the new -mabi=vec-extabi
command line option also implemented in LLVM for AIX.

Bootstrapped on powerpc-ibm-aix7.2.3.0 and powerpc64le-linux-gnu.

gcc/ChangeLog:

* config/rs6000/rs6000.opt (mabi=vec-extabi): New.
(mabi=vec-default): New.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
__EXTABI__ for AIX Vector extended ABI.
* config/rs6000/rs6000.c (rs6000_debug_reg_global): Print AIX Vector
extabi info.
(conditional_register_usage): If AIX vec_extabi enabled, vs20-vs31
are non-volatile.
* doc/invoke.texi (PowerPC mabi): Add AIX vec-extabi and vec-default.

(cherry picked from commit 2e7750cb518f5abedbd6fb2725882079a6934dce)

PR target/99702: Check RTL type before get value

gcc/ChangeLog:

PR target/99702
* config/riscv/riscv.c (riscv_expand_block_move): Get RTL value
after type checking.

gcc/testsuite/ChangeLog:

PR target/99702
* gcc.target/riscv/pr99702.c: New.

(cherry picked from commit 540dace2ed3949571f2ce6cb007354e69bda0cb2)

Daily bump.

coroutines : Adjust error handling for type-dependent coroutines [PR96251].

Although coroutines are not permitted to be constexpr, generic lambdas
are implicitly constexpr from C++17 and, because of this, a generic coroutine lambda
can be marked as potentially constexpr. As per the PR, this then fails when
type substitution is attempted because the check disallowing constexpr in
the coroutines code was overly restrictive.

This changes the error handling to mark the function as 'invalid_constexpr'
but suppresses the error in the case that we are instantiating a constexpr.
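
For background, the implicit constexpr rule can be seen without any coroutine
code.  The sketch below must be compiled as C++17 or later; the names 'twice'
and 'answer' are illustrative only.

```cpp
// No 'constexpr' keyword appears on the lambda's call operator.
constexpr auto twice = [](auto x) { return x + x; };

// From C++17 the call operator is implicitly constexpr when its body
// qualifies, so the lambda can be used in a constant expression.
static_assert(twice(21) == 42, "usable in a constant expression");
constexpr int answer = twice(21);
```

A generic lambda whose body uses a coroutine keyword starts out with the same
"potentially constexpr" status, which is the situation the PR describes.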

gcc/cp/ChangeLog:

PR c++/96251
* coroutines.cc (coro_common_keyword_context_valid_p): Suppress
error reporting when instantiating for a constexpr.

gcc/testsuite/ChangeLog:

PR c++/96251
* g++.dg/coroutines/pr96251.C: New test.

(cherry picked from commit f13d9e48eeca7ed8f8df55c9a62fc9980d5606ad)

coroutines: Fix unused value found by static analysis.

This fixes up the zero-initialization of the coro frame pointer
to avoid an unused assigned value, spotted by Martin Liska with
static analysis.

gcc/cp/ChangeLog:

* coroutines.cc (morph_fn_to_coro): Revise initialization
of the frame pointer to avoid an unused value.

(cherry picked from commit 9df0ff5f219b9e93d007f42939a6449ce2521cf5)

dwarf2unwind : Force the CFA after remember/restore pairs [44107/48097].

This addresses one of the more long-standing and serious regressions
for Darwin.  GCC emits unwind code by default on the assumption that
the unwinder will be (or have the same capability as) the one in the
current libgcc_s.  For Darwin platforms, this is not the case - some
of them are based on the libgcc_s from GCC-4.2.1 and some are using
the unwinder provided by libunwind (part of the LLVM project). The
latter implementation has gradually adopted a section that deals with
GNU unwind.

The most serious problem for some of the platform versions is in
handling DW_CFA_remember/restore_state pairs.  The DWARF description
talks about these in terms of saving/restoring register rows; this is
what GCC originally did (and is what the unwinders do for the Darwin
versions based on libgcc_s).

However, in r118068, this was changed so that not only the registers
but also the current frame address expression were saved.  The unwind
code assumes that the unwinder will do this; some of Darwin's unwinders
do not, leading to lockups etc.  To date, the only solution has been
to replace the system libgcc_s with a newer one which is not a viable
solution for many end-users (since that means overwriting the one
provided with the system installation).

The fix here provides a target hook that allows the target to specify
that the CFA should be reinstated after a DW_CFA_restore.  This fixes
the issue (and also the closed WONTFIX of 44107).

(As a matter of record, it also fixes reported Java issues if
backported to GCC-5).

gcc/ChangeLog:

PR target/44107
PR target/48097
* config/darwin-protos.h (darwin_should_restore_cfa_state): New.
* config/darwin.c (darwin_should_restore_cfa_state): New.
* config/darwin.h (TARGET_ASM_SHOULD_RESTORE_CFA_STATE): New.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Document TARGET_ASM_SHOULD_RESTORE_CFA_STATE.
* dwarf2cfi.c (connect_traces): If the target requests, restore
the CFA expression after a DW_CFA_restore.
* target.def (TARGET_ASM_SHOULD_RESTORE_CFA_STATE): New hook.

(cherry picked from commit 491d5b3cf8216f9285a67aa213b9a66b0035137b)

Darwin : Simplify headers.

This is a NFC patch, but needed to make follow-on patches apply
cleanly.

The darwinN.h headers were (presumably) introduced to allow specs to be
adjusted when there was no mmacosx-version-min handling, or that was
considered unreliable.

We have version-specific specs for the values that have configuration
data, and the version is set in the driver (so may be considered
reliably present).

Some of the 'darwinN.h' content has become dead code, and the remainder
is either conditionalised on version information (or is setting values
used as fall-backs in cross-compilations).

With the changes needed for Darwin20 / macOS 11 the 'darwinN.h' headers
are now too unwieldy to be useful - so this series moves the relevant
specs definitions to the common 'darwin.h' header and then finally uses
the config.gcc script to supply the fall-back defaults for cross-
compilations.

We can then delete all but the main header, since the darwinN.h are
unused.

There is no need to make the LINK_GCC_C_SEQUENCE_SPEC conditional on
configuration parameters, it is adequately conditionalized on the
macosx-version-min.

We now need a modern C++ toolchain to bootstrap GCC, so there's no
need to skip the stack protect for Darwin < 9.

Darwin defines ASM_OUTPUT_ALIGNED_DECL_COMMON which is used in
preference to ASM_OUTPUT_ALIGNED_COMMON, which makes the latter
definition dead code. Remove this.

The darwinN.h headers (with the sole exception of darwin7.h,
which contains a target macro definition) now only contain
values that set fall-backs for cross-compilations, these can
be provided from the config.gcc script which means we no longer
need the darwinN.h - so delete them.

Backported from 1dfeaca014fae0f129e1408a3e8df992892c8fed,
896607741f1ea98fc8a35e58e76d67248f6e6211,
ac6ecec4b328daf0583a125d647f9a5836fa0023,
5282e22f0e7f0d0d5ca2bdc4a952c0d383300eba,
4a04f09dc7616ebe76ee71aa50eee54f1115f1f2 and
02f305440f29c68b7368c9af9ae689cce6d26d6d

gcc/ChangeLog:

* config/darwin10.h (LINK_GCC_C_SEQUENCE_SPEC): Move the spec
for the Darwin10 unwinder stub from here ..
* config/darwin.h (LINK_COMMAND_SPEC_A): ... to here.
* config/darwin10.h (LINK_GCC_C_SEQUENCE_SPEC): Move from here..
* config/darwin.h (LINK_GCC_C_SEQUENCE_SPEC): ... to here.
* config/darwin9.h (STACK_CHECK_STATIC_BUILTIN): Move from here..
* config/darwin.h (STACK_CHECK_STATIC_BUILTIN): .. to here.
* config.gcc: Compute default version information
from the configured target. Likewise defaults for
ld64. Delete reference to the now removed darwin8.h
* config/darwin10.h: Removed.
* config/darwin12.h: Removed.
* config/darwin9.h: Removed.
* config/rs6000/darwin8.h: Removed.

Darwin : Adjust defaults for current bootstrap constraints.

The toolchain now requires a C++11 compiler to bootstrap and
none of the older Darwin toolchains which were based on stabs
debugging are suitable. We can simplify the debug setup now.

gcc/ChangeLog:

* config/darwin.h (DSYMUTIL_SPEC): Default to DWARF.
(ASM_DEBUG_SPEC): Only define if the assembler supports
stabs.
(PREFERRED_DEBUGGING_TYPE): Default to DWARF.
(DARWIN_PREFER_DWARF): Define.
* config/darwin9.h (PREFERRED_DEBUGGING_TYPE): Remove.
(DARWIN_PREFER_DWARF): Likewise
(DSYMUTIL_SPEC): Likewise.
(COLLECT_RUN_DSYMUTIL): Likewise.
(ASM_DEBUG_SPEC): Likewise.
(ASM_DEBUG_OPTION_SPEC): Likewise.

(cherry picked from commit b2cee5e1e89c8f939bc36fe9756befcb93d96982)

Darwin: Guard two macros in darwin.h.

Work on the Arm64 port shows that these two macros can be declared
ahead of the version in darwin.h which needs to override (for X86
and PPC this wasn't needed).
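
The two guard patterns look like this in miniature.  The macro bodies and
values below are placeholders chosen for the sketch (the real
ASM_DECLARE_FUNCTION_NAME takes different arguments), and the helper functions
exist only to make the result observable.

```cpp
#include <string>

// Pretend an earlier, architecture-specific header already provided these.
#define DEF_MIN_OSX_VERSION "11.0"
#define ASM_DECLARE_FUNCTION_NAME(NAME) "arch:" NAME

// Pattern 1: the common header must win, so drop any earlier definition
// first.  Redefining with a different body and no #undef draws a diagnostic.
#undef ASM_DECLARE_FUNCTION_NAME
#define ASM_DECLARE_FUNCTION_NAME(NAME) "darwin:" NAME

// Pattern 2: the common header only supplies a fall-back value.
#ifndef DEF_MIN_OSX_VERSION
#define DEF_MIN_OSX_VERSION "10.5"
#endif

inline std::string declared_name() { return ASM_DECLARE_FUNCTION_NAME("f"); }
inline std::string min_version()   { return DEF_MIN_OSX_VERSION; }
```

With both guards in place the header can be included after a port-specific
one without caring which of the two macros that port chose to define.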

gcc/ChangeLog:

* config/darwin.h (ASM_DECLARE_FUNCTION_NAME): UNDEF before
use.
(DEF_MIN_OSX_VERSION): Only define if there's no existing
def.

(cherry picked from commit 105fe3e0b896998b4a1b5a79ad6526959c2e2e7a)

Darwin : Avoid a C++ ODR violation seen with LTO.

We have a similar code pattern in darwin-c.c to one in c-pragmas
(most likely a cut & paste) with a struct type used locally to the
TU. With C++ we need to rename the type to avoid an ODR violation.

gcc/ChangeLog:

* config/darwin-c.c (struct f_align_stack): Rename
to type from align_stack to f_align_stack.
(push_field_alignment): Likewise.
(pop_field_alignment): Likewise.

(cherry picked from commit 3c52cd517a34b6b37eb17d4defd63bb31e60888b)

Darwin : Begin rework of zero-fill sections.

Much of the existing work in the Darwin BSS and common sections
was to accommodate the PowerPC section anchors. We want to segregate
this, since it might become desirable to support section anchors for
arm64.

First revision (here) is to use the same section conventions as the Xcode
toolchains for BSS and COMMON.

We also drop the constraint about putting small items into data/static data
that was a work-around for Java issues (irrelevant for several editions).

gcc/ChangeLog:

* config/darwin.c (darwin_emit_local_bss): Amend section names to
match system tools. (darwin_output_aligned_bss): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/darwin-sections.c: Adjust test for renamed BSS and common
sections. Cater for 64 and 128 bit long doubles.

(cherry picked from commit dcf59c5c0100d0649d64ec948dbe24018d48b6a5)

Darwin : Update libc function availability.

Darwin libc has sincos from 10.9 (darwin13) onwards.

gcc/ChangeLog:

* config/darwin.c (darwin_libc_has_function): Report sincos
available from 10.9.

(cherry picked from commit 2e746cebd9c6bb42b4892135942be7ddf865b3bf)

testsuite, Darwin: XFAIL runs for two timode conversion tests.

X86 Darwin fails these at present, because (to work around PR80556)
we insert libSystem ahead of libgcc. The libSystem implementation
has a similar bug to one that was fixed for GCC. We need to fix
80556 properly, and then this issue will go away - we will be able
to use the libgcc impl as intended.

XFAIL the run for now, to reduce testsuite noise.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/fp-int-convert-timode-3.c: XFAIL run.
* gcc.dg/torture/fp-int-convert-timode-4.c: Likewise.

(cherry picked from commit 94d4f4387de8264ee289cf71f692d59ca6ac36f8)

Darwin: Handle poly_int machine_modes.

The common code that selects suitable sections for literals needs
to inspect the machine_mode. For some sub-targets that might be
represented as a poly-int.

There was a workaround in place that allowed for cases where the poly
int had only one component. This removes the workaround and handles
the cases where we care about the machine_mode size.

gcc/ChangeLog:

* config/darwin.c (IN_TARGET_CODE): Remove.
(darwin_mergeable_constant_section): Handle poly-int machine modes.
(machopic_select_rtx_section): Likewise.

(cherry picked from commit 7ddee9cd99beb2c3603bf307d263c6fd9cc05e90)

Daily bump.

Use memcpy instead of strncpy to avoid error with -Werror=stringop-truncation.

gcc/ChangeLog:

* config/pa/pa.c (import_milli): Use memcpy instead of strncpy.
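
The pattern behind this kind of warning is a fixed-size copy into the middle
of a buffer that is already terminated.  The sketch below is only loosely
modelled on that situation: the helper name, the template string and the
offset are assumptions, and 'name' is assumed to hold at least four
characters.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: patch a 4-character name into a fixed template.
std::string make_import(const char* name) {
    char line[] = ".IMPORT $$....,MILLICODE";
    const std::size_t slot = 10;   // offset of the "...." placeholder
    // Exactly four bytes and deliberately no terminating NUL.  memcpy states
    // that intent directly; strncpy(line + slot, name, 4) performs the same
    // copy but can trip -Wstringop-truncation.
    std::memcpy(line + slot, name, 4);
    return line;
}
```

The rest of the template, including its terminator, is left untouched by the
copy.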

Daily bump.

testsuite: Fix up strlenopt-80.c on powerpc [PR99636]

Similar issue as in strlenopt-73.c, various spots in this test rely
on MOVE_MAX >= 8, this time it uses a target selector to pick up a couple
of targets, and all of them but powerpc 32-bit satisfy it, but powerpc
32-bit has MOVE_MAX of just 4.

2021-03-18 Jakub Jelinek <jakub@redhat.com>

PR testsuite/99636
* gcc.dg/strlenopt-80.c: For powerpc*-*-*, only enable for lp64.

(cherry picked from commit 89d44a9f3b9ab97634b7ef894e2c83ebd83582a8)

testsuite: Fix up strlenopt-73.c on powerpc [PR99626]

As mentioned in the testcase as well as in the PR, this testcase relies on
MOVE_MAX being sufficiently large that the memcpy call is folded early into
load + store.  Some popular targets define MOVE_MAX to 8 or even 16 (e.g.
x86_64 or some options on s390x), but many other targets define it to just 4
(e.g. powerpc 32-bit), or even 2.

The testcase has already one test routine guarded on one particular target
with MOVE_MAX 16 (but does it incorrectly, __i386__ is only defined on
32-bit x86 and __SIZEOF_INT128__ is only defined on 64-bit targets), this
patch fixes that, and guards another test that relies on memcpy (, , 8)
being folded that way (which therefore needs MOVE_MAX >= 8) on a couple of
common targets that are known to have such MOVE_MAX.
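
The corrected guard has roughly the following shape.  This is a sketch of the
preprocessor logic only, not the contents of the test; 'widest_copy_t',
'has_i128_variant' and 'widest_copy_bytes' are names invented here.

```cpp
#include <cstddef>

// __x86_64__ marks 64-bit x86.  __i386__ is only set for 32-bit x86, where
// __SIZEOF_INT128__ is not defined, so requiring both can never succeed.
#if defined(__x86_64__) && defined(__SIZEOF_INT128__)
typedef unsigned __int128 widest_copy_t;
constexpr bool has_i128_variant = true;
#else
typedef unsigned long long widest_copy_t;
constexpr bool has_i128_variant = false;
#endif

constexpr std::size_t widest_copy_bytes = sizeof(widest_copy_t);
```

On x86_64 with int128 support the 16-byte variant is selected; everywhere else
the sketch falls back to an 8-byte type.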

2021-03-18  Jakub Jelinek  <jakub@redhat.com>

PR testsuite/99626
* gcc.dg/strlenopt-73.c: Ifdef out test_copy_cond_unequal_length_i64
on targets other than x86, aarch64, s390 and 64-bit powerpc.  Use
test_copy_cond_unequal_length_i128 for __x86_64__ with int128 support
rather than __i386__.

(cherry picked from commit fff9faa79043aa53d361e7f6e31b2680007a97e2)

aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

The gcc.dg/declare-simd.c test does not emit a warning with
-mabi=ilp32.

2021-03-16 Christophe Lyon <christophe.lyon@linaro.org>

PR target/99542
gcc/testsuite/
* gcc.dg/declare-simd.c (fn2): Expect a warning only under lp64.

(cherry picked from commit a2a6e9214e27b32d4582c670faf9cdb74e54c2c6)

c++: Ensure correct destruction order of local statics [PR99613]

As mentioned in the PR, if end of two constructions of local statics
is strongly ordered, their destructors should be run in the reverse order.
As we run __cxa_guard_release before calling __cxa_atexit, it is possible
that we have two threads that access two local statics in the same order
for the first time, one thread wins the __cxa_guard_acquire on the first
one but is rescheduled in between the __cxa_guard_release and __cxa_atexit
calls, then the other thread is scheduled and wins __cxa_guard_acquire
on the second one and calls __cxa_guard_release and __cxa_atexit and only
afterwards the first thread calls its __cxa_atexit.  This means a variable
whose completion of the constructor strongly happened after the completion
of the other one will be destructed after the other variable is destructed.

The following patch fixes that by swapping the __cxa_guard_release and
__cxa_atexit calls.
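
The interleaving can be replayed with a toy model.  This is not GCC's runtime
code: the function simply hard-codes the two schedules described above for a
local static 'a' that finishes construction before a local static 'b', and
the function name is invented for the sketch.

```cpp
#include <algorithm>
#include <string>

// Returns the order in which the destructors of 'a' and 'b' run at exit.
std::string destruction_order(bool atexit_before_release) {
    std::string registered;   // order of the __cxa_atexit registrations
    if (atexit_before_release) {
        // Thread 1 registers ~a while still holding a's guard, so thread 2
        // cannot get past 'a' (and on to 'b') until that has happened.
        registered += 'a';
        registered += 'b';
    } else {
        // Thread 1 releases a's guard and is descheduled; thread 2 builds
        // 'b' and registers ~b; only then does thread 1 register ~a.
        registered += 'b';
        registered += 'a';
    }
    // Handlers run in the reverse order of registration.
    std::reverse(registered.begin(), registered.end());
    return registered;
}
```

With the swapped calls the model yields "ba", the reverse of construction
order; with the old sequence the bad schedule yields "ab".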

2021-03-16  Jakub Jelinek  <jakub@redhat.com>

PR c++/99613
* decl.c (expand_static_init): For thread guards, call __cxa_atexit
before calling __cxa_guard_release rather than after it.  Formatting
fixes.

(cherry picked from commit 0251051db64f13c9a31a05c8133c31dc50b2b235)

i386: Fix up _mm256_vzeroupper() handling [PR99563]

My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for
vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling
when the implicit vzeroupper handling is disabled.
The epilogue_completed splitter for vzeroupper now adds clobbers for all
registers which don't have explicit sets in the pattern and the sets are
added during vzeroupper pass.  Before my changes, for explicit user
vzeroupper, we just weren't modelling its effects at all, it was just
unspec that didn't tell that it clobbers the upper parts of all XMM < %xmm16
registers.  But now the splitter will even for those add clobbers and as
it has no sets, it will add clobbers for all registers, which means
we optimize away anything that lived across that vzeroupper.

The vzeroupper pass has two parts, one is the mode switching that computes
where to put the implicit vzeroupper calls and puts them there, and then
another that uses df to figure out what sets to add to all the vzeroupper.
The former part should be done only under the conditions we have in the
gate, but the latter as this PR shows needs to happen either if we perform
the implicit vzeroupper additions, or if there are (or could be) any
explicit vzeroupper instructions.  As that function does df_analyze and
walks the whole IL, I think it would be too expensive to run it always
whenever TARGET_AVX, so this patch remembers if we've expanded at least
one __builtin_ia32_vzeroupper in the function and runs that part of the
vzeroupper pass both when the old condition is true or when this new
flag is set.

2021-03-16  Jakub Jelinek  <jakub@redhat.com>

PR target/99563
* config/i386/i386.h (struct machine_function): Add
has_explicit_vzeroupper bitfield.
* config/i386/i386-expand.c (ix86_expand_builtin): Set
cfun->machine->has_explicit_vzeroupper when expanding
IX86_BUILTIN_VZEROUPPER.
* config/i386/i386-features.c (rest_of_handle_insert_vzeroupper):
Do the mode switching only when TARGET_VZEROUPPER, expensive
optimizations turned on and not optimizing for size.
(pass_insert_vzeroupper::gate): Enable even when
cfun->machine->has_explicit_vzeroupper is set.

* gcc.target/i386/avx-pr99563.c: New test.

(cherry picked from commit 82085eb3d44833bd1557fdd932c4738d987f559d)

aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

As the patch shows, there are several bugs in
aarch64_simd_clone_compute_vecsize_and_simdlen.
One is that for function declarations that aren't definitions
it completely ignores argument types.  Such decls don't have DECL_ARGUMENTS,
but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like
the simd cloning code in the middle end does too.

Another problem is that it checks types of uniform arguments.  That is
unnecessary: a uniform argument is passed the way it normally is, as
a scalar argument rather than a vector, so there is no reason not to support
uniform arguments of a different size, or long double, structure etc.
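
A declaration of the kind affected might look like the sketch below.  It is
not one of the testcases: 'Affine' and 'apply' are invented names.  Without
-fopenmp or -fopenmp-simd the pragma is ignored and the code behaves as plain
C++.

```cpp
struct Affine { double scale; double offset; };

// Declaration only: the argument types have to come from the function type
// here.  The uniform argument 't' is a structure and stays a scalar argument
// in every SIMD clone; only 'x' is vectorised.
#pragma omp declare simd uniform(t) notinbranch
double apply(double x, Affine t);

double apply(double x, Affine t) { return x * t.scale + t.offset; }
```

The structure type of 't' plays no part in choosing the vector length, since
that argument is never widened.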

2021-03-16  Jakub Jelinek  <jakub@redhat.com>

PR target/99542
* config/aarch64/aarch64.c
(aarch64_simd_clone_compute_vecsize_and_simdlen): If not a function
definition, walk TYPE_ARG_TYPES list if non-NULL for argument types
instead of DECL_ARGUMENTS.  Ignore types for uniform arguments.

* gcc.dg/gomp/pr99542.c: New test.
* gcc.dg/gomp/pr59669-2.c (bar): Don't expect a warning on aarch64.
* gcc.dg/gomp/simd-clones-2.c (setArray): Likewise.
* g++.dg/vect/simd-clone-7.cc (bar): Likewise.
* g++.dg/gomp/declare-simd-1.C (f37): Expect a different warning
on aarch64.
* gcc.dg/declare-simd.c (fn2): Expect a new warning on aarch64.

(cherry picked from commit fcefc59befd396267b824c170b6a37acaf10874e)

c++: Fix up calls to immediate functions returning reference [PR99507]

build_cxx_call calls convert_from_reference at the end, so if an immediate
function returns a reference, we were constant evaluating not just that
call, but that call wrapped in an INDIRECT_REF. That unfortunately means
it can constant evaluate to something non-addressable, so if code later
needs to take its address it will fail.

The following patch fixes that by undoing the convert_from_reference
wrapping for the cxx_constant_value evaluation and re-adding it at the end.

2021-03-12 Jakub Jelinek <jakub@redhat.com>

PR c++/99507
* call.c (build_over_call): For immediate evaluation of functions
that return references, undo convert_from_reference effects before
calling cxx_constant_value and call convert_from_reference
afterwards.

* g++.dg/cpp2a/consteval19.C: New test.

(cherry picked from commit 425afe1f0c907e6469cef1672160c9c95177e71a)

icf: Check return type of internal fn calls [PR99517]

The following testcase is miscompiled, because IPA-ICF considers the two
functions identical.  They aren't; the types of the .VEC_CONVERT call
lhs are different.  But for calls to internal functions, there is no
fntype nor callee with a function type to compare, so all we compare
is just the ifn, arguments and some call flags.

The following patch fixes it by checking the internal fn calls like e.g. gimple
assignments where the type of the lhs is checked too.

2021-03-11  Jakub Jelinek  <jakub@redhat.com>

PR ipa/99517
* ipa-icf-gimple.c (func_checker::compare_gimple_call): For internal
function calls with lhs fail if the lhs don't have compatible types.

* gcc.target/i386/avx2-pr99517-1.c: New test.
* gcc.target/i386/avx2-pr99517-2.c: New test.

(cherry picked from commit 070ab283d16d8e8e8bb70f9801aca347f008cbd0)

expand: Fix ICE in store_bit_field_using_insv [PR93235]

The following testcase ICEs on aarch64. The problem is that
op0 is (subreg:HI (reg:HF ...) 0) and because we can't create a SUBREG of a
SUBREG and aarch64 doesn't have HImode insv, only SImode insv,
store_bit_field_using_insv tries to create (subreg:SI (reg:HF ...) 0)
which is not valid for the target and so gen_rtx_SUBREG ICEs.

The following patch fixes it by punting if the to be created SUBREG
doesn't validate, callers of store_bit_field_using_insv can handle
the fallback.

2021-03-04 Jakub Jelinek <jakub@redhat.com>

PR middle-end/93235
* expmed.c (store_bit_field_using_insv): Return false if xop0 is a
SUBREG and a SUBREG to op_mode can't be created.

* gcc.target/aarch64/pr93235.c: New test.

(cherry picked from commit 0ad6de3883a1641f7ec0bd9cf56d41fa5b313dae)

c++: Fix up [[nodiscard]] on ctors on targetm.cxx.cdtor_returns_this targets [PR99362]

In the P1771R1 changes JeanHeyd reverted part of Alex' PR88146 fix,
but that seems to be incorrect to me.
Where P1771R1 suggests warnings for [[nodiscard]] on constructors is
handled in a different place - in particular the TARGET_EXPR handling
of convert_to_void.  When we have CALL_EXPR of a ctor, on most arches
that call has void return type and so returns early, and on arm where
the ctor returns the this pointer it is undesirable to warn as it warns
about all ctor calls, not just the ones where it should warn.

The P1771R1 changes added a test for this, but as it was given *.c
extension rather than *.C, the test was never run and so this didn't get
spotted immediately.  The test also had a bug, (?n) can't be used
in dg-warning/dg-error because those are implemented by prepending
some regexp before the user provided one and (?n) must come at the start
of the regexp.  Furthermore, while -ftrack-macro-expansion=0 is useful
in one nodiscard test which uses macros, I don't see how it would be
relevant to all the other cpp2a/nodiscard* tests which don't use any
macros.

2021-03-04  Jakub Jelinek  <jakub@redhat.com>

PR c++/88146
PR c++/99362
gcc/cp/
* cvt.c (convert_to_void): Revert 2019-10-17 changes.  Clarify
comment.
gcc/testsuite/
* g++.dg/cpp2a/nodiscard-constructor.c: Renamed to ...
* g++.dg/cpp2a/nodiscard-constructor1.C: ... this.  Remove
-ftrack-macro-expansion=0 from dg-options.  Don't use (?n) in
dg-warning regexps, instead replace .* with \[^\n\r]*.
* g++.dg/cpp2a/nodiscard-constructor2.C: New test.
* g++.dg/cpp2a/nodiscard-reason-only-one.C: Remove
-ftrack-macro-expansion=0 from dg-options.
* g++.dg/cpp2a/nodiscard-reason-nonstring.C: Likewise.
* g++.dg/cpp2a/nodiscard-once.C: Likewise.

(cherry picked from commit c9816196328a4f4b927f08cf2f66cf255849da0b)

c++: Fix -fstrong-eval-order for operator &&, || and , [PR82959]

P0145R3 added the rule
"However, the operands are sequenced in the order prescribed for the built-in
operator" for overloaded operator calls when using the operator syntax.
op_is_ordered follows that, but handled only the operators whose ordering
was added in that paper.  The &&, || and comma operators already had the
rule that lhs is sequenced before rhs in C++98.
The following patch adds those cases to op_is_ordered.
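
For illustration only (this is not the testcase from the patch), a minimal
sketch of how the operand order of overloaded &&, || and comma can be
observed.  The Traced type, the make helper and the *_order functions are
hypothetical names introduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical type used only for this illustration.
struct Traced {
    bool value;
};

// Records the order in which the operands get evaluated.
static std::string order_log;

static Traced make(char tag, bool v) {
    order_log += tag;
    return Traced{v};
}

// Overloaded operators: unlike the built-ins, these never short-circuit,
// so both operands are always evaluated.
static Traced operator&&(Traced a, Traced b) { return Traced{a.value && b.value}; }
static Traced operator||(Traced a, Traced b) { return Traced{a.value || b.value}; }
static Traced operator,(Traced, Traced b) { return b; }

// Each function returns the evaluation order that was observed.
// Under the C++17 rule quoted above the expected order is "LR";
// earlier standards leave it unspecified for overloaded operators.
std::string and_order() {
    order_log.clear();
    Traced r = make('L', false) && make('R', true);
    (void)r;
    return order_log;
}

std::string or_order() {
    order_log.clear();
    Traced r = make('L', true) || make('R', false);
    (void)r;
    return order_log;
}

std::string comma_order() {
    order_log.clear();
    Traced r = (make('L', true), make('R', false));
    (void)r;
    return order_log;
}
```

With -fstrong-eval-order in effect (the default for -std=c++17), each of
these is expected to report the lhs before the rhs.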

2021-03-03 Jakub Jelinek <jakub@redhat.com>

PR c++/82959
* call.c (op_is_ordered): Handle TRUTH_ANDIF_EXPR, TRUTH_ORIF_EXPR
and COMPOUND_EXPR.

* g++.dg/cpp1z/eval-order10.C: New test.

(cherry picked from commit 0b8fa12015f717ac7e4fe2ffbad96a0cb0df2584)

c-family: Avoid ICE on va_arg [PR99324]

build_va_arg calls the middle-end mark_addressable, which e.g. requires that
cfun is non-NULL.  The following patch calls instead c_common_mark_addressable_vec
which is the c-family variant similarly to the FE c_mark_addressable and
cxx_mark_addressable, except that it doesn't error on addresses of register
variables.  As the taking of the address is artificial for the .VA_ARG
ifn and when that is lowered goes away, it is similar case to the vector
subscripting for which c_common_mark_addressable_vec has been added.

2021-03-03  Jakub Jelinek  <jakub@redhat.com>

PR c/99324
* c-common.c (build_va_arg): Call c_common_mark_addressable_vec
instead of mark_addressable.  Fix a comment typo -
neutrallly -> neutrally.

* gcc.c-torture/compile/pr99324.c: New test.

(cherry picked from commit ba09d11a9d0ae2382bab715b102a7746d20dea6d)

cfgrtl: Fix up fixup_partitions caused ICE [PR99085]

fixup_partitions sometimes changes some basic blocks from hot partition to
cold partition, in particular if after unreachable block removal or other
optimizations a hot partition block is dominated by cold partition block(s).
It fixes up the edges and jumps on those edges, but when this happens after
reorder blocks and in rtl (non-cfglayout) mode, that is clearly not enough,
because it keeps the block order the same and so we can end up with more than
1 hot/cold section transition in the same function.

So, this patch fixes that up too.

2021-03-03 Jakub Jelinek <jakub@redhat.com>

PR target/99085
* cfgrtl.c (fixup_partitions): When changing some bbs from hot to cold
partitions, if in non-layout mode after reorder_blocks also move
affected blocks to ensure a single partition transition.

* gcc.dg/graphite/pr99085.c: New test.

(cherry picked from commit 4ad5b1915d50cc39691487f58794d699c7900ace)

c++: Fix operator() lookup in lambdas [PR95451]

During name lookup, name-lookup.c uses:
            if (!(!iter->type && HIDDEN_TYPE_BINDING_P (iter))
                && (bool (want & LOOK_want::HIDDEN_LAMBDA)
                    || !is_lambda_ignored_entity (iter->value))
                && qualify_lookup (iter->value, want))
              binding = iter->value;
Unfortunately as the following testcase shows, this doesn't work in
generic lambdas, where we on the auto b = ... lambda ICE and on the
auto d = lambda reject it even when it should be valid.  The problem
is that the binding doesn't have a FUNCTION_DECL with
LAMBDA_FUNCTION_P for the operator(), but an OVERLOAD with
TEMPLATE_DECL for such FUNCTION_DECL.

The following patch fixes that in is_lambda_ignored_entity, other
possibility would be to do that before calling is_lambda_ignored_entity
in name-lookup.c.

2021-02-26  Jakub Jelinek  <jakub@redhat.com>

PR c++/95451
* lambda.c (is_lambda_ignored_entity): Before checking for
LAMBDA_FUNCTION_P, use OVL_FIRST.  Drop FUNCTION_DECL check.

* g++.dg/cpp1y/lambda-generic-95451.C: New test.

(cherry picked from commit 27f9a87886d48448f83e0e559dcf028b1a4a4ec6)

fold-const: Fix up ((1 << x) & y) != 0 folding for vectors [PR99225]

This optimization was written purely with scalar integers in mind.  It
can work fine even with vectors, but then we can't use build_int_cst and
need to use build_one_cst instead.
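
As a scalar-only sketch of the equivalence the folding relies on (the
function names are made up for this illustration, and the shift count is
assumed to be smaller than the width of unsigned):

```cpp
#include <cassert>

// Original form: test bit x of y by masking with a shifted one.
bool bit_set_mask(unsigned y, unsigned x) {
    return ((1u << x) & y) != 0;
}

// Folded form: shift y down and test the lowest bit.
bool bit_set_shift(unsigned y, unsigned x) {
    return ((y >> x) & 1u) != 0;
}
```

For vector operands the constant 1 in the folded form has to be a vector of
ones, which is what build_one_cst provides.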

2021-02-24 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/99225
* fold-const.c (fold_binary_loc) <case NE_EXPR>: In (x & (1 << y)) != 0
to ((x >> y) & 1) != 0 simplifications use build_one_cst instead of
build_int_cst (..., 1). Formatting fixes.

* gcc.c-torture/compile/pr99225.c: New test.

(cherry picked from commit 6e646abbe02f2c79cc3ba1f3de705ee62ff9dcd1)

fold-const: Fix ICE in fold_read_from_constant_string on invalid code [PR99204]

fold_read_from_constant_string and expand_expr_real_1 have code to optimize
constant reads from string (tree vs. rtl).
If the STRING_CST array type has zero low bound, index is fold converted to
sizetype and so the compare_tree_int works fine, but if it has some other
low bound, it calls size_diffop_loc and that function from 2 sizetype
operands creates a ssizetype difference. expand_expr_real_1 then uses
tree_fits_uhwi_p + compare_tree_int and so works fine, but fold-const.c
only checked if index is INTEGER_CST and calls compare_tree_int, which means
for negative index it will succeed and result in UB in the compiler.
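
A simplified model of the check, assuming the index arrives as a signed
difference; index_in_string is a hypothetical helper, not a GCC function:

```cpp
#include <cassert>

// Hypothetical model: the index may be negative because it was computed
// as a signed difference from a non-zero low bound.
bool index_in_string(long long index, unsigned long long length) {
    // First require the index to be non-negative, i.e. representable as
    // unsigned; this mirrors the role tree_fits_uhwi_p plays in the fix.
    if (index < 0)
        return false;
    // Only then compare it against the string length.
    return static_cast<unsigned long long>(index) < length;
}
```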

2021-02-23 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/99204
* fold-const.c (fold_read_from_constant_string): Check that
tree_fits_uhwi_p (index) rather than just that index is INTEGER_CST.

* gfortran.dg/pr99204.f90: New test.

(cherry picked from commit 37b64a3547b91677189c6cbf4c08d7c80770a93a)

libstdc++: Fix up constexpr std::char_traits<char>::compare [PR99181]

Because of LWG 467, std::char_traits<char>::lt compares the values
cast to unsigned char rather than char, so even when char is signed
we get unsigned comparison.  std::char_traits<char>::compare uses
__builtin_memcmp and that works the same, but during constexpr evaluation
we were calling __gnu_cxx::char_traits<char_type>::compare.  As
char_traits::lt is not virtual, __gnu_cxx::char_traits<char_type>::compare
used __gnu_cxx::char_traits<char_type>::lt rather than
std::char_traits<char>::lt and thus compared chars as signed if char is
signed.
This change fixes it by inlining __gnu_cxx::char_traits<char_type>::compare
into std::char_traits<char>::compare by hand, so that it calls the right
lt method.
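
A minimal sketch of such a comparison loop, assuming C++14 or later for the
constexpr function; compare_as_unsigned is a hypothetical stand-in and not
the actual libstdc++ code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: compares n chars as unsigned char values, the way
// std::char_traits<char>::lt is specified to order them.
constexpr int compare_as_unsigned(const char* a, const char* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char ua = static_cast<unsigned char>(a[i]);
        const unsigned char ub = static_cast<unsigned char>(b[i]);
        if (ua < ub) return -1;
        if (ub < ua) return 1;
    }
    return 0;
}

// '\x80' is negative where char is signed, yet it must order after '\x01'.
static_assert(compare_as_unsigned("\x80", "\x01", 1) > 0, "unsigned ordering");
```

The runtime std::char_traits<char>::compare is expected to agree with this
ordering regardless of the signedness of char.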

2021-02-23  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/99181
* include/bits/char_traits.h (char_traits<char>::compare): For
constexpr evaluation don't call
__gnu_cxx::char_traits<char_type>::compare but do the comparison loop
directly.

* testsuite/21_strings/char_traits/requirements/char/99181.cc: New
test.

(cherry picked from commit efa64fcce12074dd542670feb02eaee53e810a30)

tree-cfg: Fix up gimple_merge_blocks FORCED_LABEL handling [PR99034]

The verifiers require that DECL_NONLOCAL or EH_LANDING_PAD_NR
labels are always the first label if there is more than one label.

When merging blocks, we don't honor that though.
On the following testcase, we try to merge blocks:
<bb 13> [count: 0]:
<L2>:
S::~S (&s);

and
<bb 15> [count: 0]:
<L0>:
resx 1

where <L2> is landing pad and <L0> is FORCED_LABEL. And the code puts
the FORCED_LABEL before the landing pad label, violating the verification
requirements.

The following patch fixes it by moving the FORCED_LABEL after the
DECL_NONLOCAL or EH_LANDING_PAD_NR label if it is the first label.

2021-02-19 Jakub Jelinek <jakub@redhat.com>

PR ipa/99034
* tree-cfg.c (gimple_merge_blocks): If bb a starts with eh landing
pad or non-local label, put FORCED_LABELs from bb b after that label
rather than before it.

* g++.dg/opt/pr99034.C: New test.

(cherry picked from commit 37bde2f87267908a93c07856317a28827f8284f7)

c: Fix ICE with -fexcess-precision=standard [PR99136]

The following testcase ICEs on i686-linux, because c_finish_return wraps
c_fully_folded retval back into EXCESS_PRECISION_EXPR, but when the function
return type is void, we don't call convert_for_assignment on it that would
then be fully folded again, but just put the retval into RETURN_EXPR's
operand, so nothing removes it anymore and during gimplification we
ICE as EXCESS_PRECISION_EXPR is not handled.

This patch fixes it by not adding that EXCESS_PRECISION_EXPR in functions
returning void, the return value is ignored and all we need is evaluate any
side-effects of the expression.

2021-02-18 Jakub Jelinek <jakub@redhat.com>

PR c/99136
* c-typeck.c (c_finish_return): Don't wrap retval into
EXCESS_PRECISION_EXPR in functions that return void.

* gcc.dg/pr99136.c: New test.

(cherry picked from commit d82f829905cfe6cb47d073825f680900274ce764)

c++: Fix up build_zero_init_1 once more [PR99106]

My earlier build_zero_init_1 patch for flexible array members created
an empty CONSTRUCTOR. As the following testcase shows, that doesn't work
very well because the middle-end doesn't expect CONSTRUCTOR elements with
incomplete type (that the empty CONSTRUCTOR at the end of outer CONSTRUCTOR
had).

The following patch just doesn't add any CONSTRUCTOR for the flexible array
members, it doesn't seem to be needed.

2021-02-17 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/99106
* init.c (build_zero_init_1): For flexible array members just return
NULL_TREE instead of returning empty CONSTRUCTOR with non-complete
ARRAY_TYPE.

* g++.dg/ubsan/pr99106.C: New test.

(cherry picked from commit 7768cadb4246117964a9ba159740da3b9c20811d)

match.pd: Fix up A % (cast) (pow2cst << B) simplification [PR99079]

The (mod @0 (convert?@3 (power_of_two_cand@1 @2))) simplification
uses tree_nop_conversion_p (type, TREE_TYPE (@3)) condition, but I believe
it doesn't check what it was meant to check.  On convert?@3
TREE_TYPE (@3) is not the type of what it has been converted from, but
what it has been converted to, which needs to be (because it is operand
of normal binary operation) equal or compatible to type of the modulo
result and first operand - type.
I could fix that by using && tree_nop_conversion_p (type, TREE_TYPE (@1))
and be done with it, but actually most of the non-nop conversions are IMHO
ok and so we would regress those optimizations.
In particular, if we have say narrowing conversions (foo5 and foo6 in
the new testcase), I think we are fine, either the shift of the power of two
constant after narrowing conversion is still that power of two (or negation
of that) and then it will still work, or the result of narrowing conversion
is 0 and then we would have UB which we can ignore.
Similarly, widening conversions where the shift result is unsigned are fine,
or even widening conversions where the shift result is signed, but we sign
extend to a signed wider divisor, the problematic case of INT_MIN will
become x % (long long) INT_MIN and we can still optimize that to
x & (long long) INT_MAX.
What doesn't work is the case in the pr99079.c testcase, widening conversion
of a signed shift result to wider unsigned divisor, where if the shift
is negative, we end up with x % (unsigned long long) INT_MIN which is
x % 0xffffffff80000000ULL where the divisor is not a power of two and
we can't optimize that to x & 0x7fffffffULL.

So, the patch rejects only the single problematic case.

Furthermore, when the shift result is signed, we were introducing UB into
a program which previously didn't have one (well, left shift into the sign
bit is UB in some language/version pairs, but it is definitely valid in
C++20 - wonder if I shouldn't move the gcc.c-torture/execute/pr99079.c
testcase to g++.dg/torture/pr99079.C and use -std=c++20), by adding that
subtraction of 1, x % (1 << 31) in C++20 is well defined, but
x & ((1 << 31) - 1) triggers UB on the subtraction.
So, the patch performs the subtraction in the unsigned type if it isn't
wrapping.
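
The safe and the problematic case can be sketched as follows, assuming a
32-bit int and a 64-bit unsigned long long; the function names are
introduced here for illustration only:

```cpp
#include <cassert>
#include <climits>

// Safe case: unsigned power-of-two divisor, so the mask is exact.
// n is assumed to be smaller than 64.
unsigned long long mod_pow2(unsigned long long x, unsigned n) {
    return x % (1ULL << n);
}
unsigned long long mask_pow2(unsigned long long x, unsigned n) {
    return x & ((1ULL << n) - 1);
}

// Problematic case: INT_MIN widened to an unsigned 64-bit divisor.
// With 32-bit int the divisor is 0xffffffff80000000, not a power of two.
unsigned long long mod_widened_int_min(unsigned long long x) {
    return x % static_cast<unsigned long long>(INT_MIN);
}
// What the incorrect simplification would have computed instead.
unsigned long long mask_widened_int_min(unsigned long long x) {
    return x & 0x7fffffffULL;
}
```

For x = 0x80000000 the modulo leaves x unchanged, because x is smaller than
the divisor, while the mask yields 0.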

2021-02-15  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/99079
* match.pd (A % (pow2pcst << N) -> A & ((pow2pcst << N) - 1)): Remove
useless tree_nop_conversion_p (type, TREE_TYPE (@3)) check.  Instead
require both type and TREE_TYPE (@1) to be integral types and either
type having smaller or equal precision, or TREE_TYPE (@1) being
unsigned type, or type being signed type.  If TREE_TYPE (@1)
doesn't have wrapping overflow, perform the subtraction of one in
unsigned type.

* gcc.dg/fold-modpow2-2.c: New test.
* gcc.c-torture/execute/pr99079.c: New test.

(cherry picked from commit 70099a6acf5169eca55ef74549fb64de14e668f0)

c++: Fix endless errors on invalid requirement seq [PR97742]

As the testcase shows, if we reach CPP_EOF during parsing of requirement
sequence, we end up with endless loop where we always report invalid
requirement expression, don't consume any token (as we are at eof) and
repeat.

This patch stops the loop when we reach CPP_EOF.
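
A much simplified model of such a loop, with hypothetical token kinds
standing in for the real lexer tokens (END_OF_FILE plays the role of
CPP_EOF); it is not the code in parser.c:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds; CLOSE_BRACE ends the sequence normally.
enum class Tok { REQUIREMENT, INVALID, CLOSE_BRACE, END_OF_FILE };

// Counts the errors reported while scanning a requirement sequence.
// The cursor never advances past END_OF_FILE, so the loop must stop there.
int parse_requirement_seq(const std::vector<Tok>& toks) {
    std::size_t pos = 0;
    int errors = 0;
    while (pos < toks.size() && toks[pos] != Tok::CLOSE_BRACE) {
        if (toks[pos] == Tok::END_OF_FILE) {
            ++errors;  // report the invalid requirement once ...
            break;     // ... and stop, since no token can be consumed
        }
        if (toks[pos] == Tok::INVALID)
            ++errors;
        ++pos;         // consume the token
    }
    return errors;
}
```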

2021-02-12 Jakub Jelinek <jakub@redhat.com>

PR c++/97742
* parser.c (cp_parser_requirement_seq): Stop iterating after reaching
CPP_EOF.

* g++.dg/cpp2a/concepts-requires24.C: New test.

(cherry picked from commit cf059e1c099ed45c97f740c030dcb8e146ac7d4a)

c++: Fix zero initialization of flexible array members [PR99033]

array_type_nelts returns error_mark_node for type of flexible array members
and build_zero_init_1 was placing an error_mark_node into the CONSTRUCTOR,
on which e.g. varasm ICEs.  I think there is nothing erroneous on zero
initialization of flexible array members though, such arrays should simply
get no elements, like they do if such classes are constructed (everything
except when some larger initializer comes from an explicit initializer).

So, this patch handles [] arrays in zero initialization like [0] arrays
and fixes handling of the [0] arrays - the
tree_int_cst_equal (max_index, integer_minus_one_node) check
didn't do what it thought it would do, max_index is typically unsigned
integer (sizetype) and so it is never equal to a -1.

What the patch doesn't do and maybe would be desirable is if it returns
error_mark_node for other reasons let the recursive callers not stick that
into CONSTRUCTOR but return error_mark_node instead.  But I don't have a
testcase where that would be needed right now.

2021-02-11  Jakub Jelinek  <jakub@redhat.com>

PR c++/99033
* init.c (build_zero_init_1): Handle zero initialization of
flexible array members like initialization of [0] arrays.
Use integer_minus_onep instead of comparison to integer_minus_one_node
and integer_zerop instead of comparison against size_zero_node.
Formatting fixes.

* g++.dg/ext/flexary38.C: New test.

(cherry picked from commit 2dcdd15d0bafb9b45a8d7ff580217bd6ac1f0975)

varasm: Fix ICE with -fsyntax-only [PR99035]

My FE change from 2 years ago uses TREE_ASM_WRITTEN in -fsyntax-only
mode more aggressively to avoid "expanding" functions multiple times.
With -fsyntax-only nothing is really expanded, so I think it is acceptable
to adjust the assert and allow declare_weak at any time, with -fsyntax-only
we know it is during parsing only anyway.

2021-02-10 Jakub Jelinek <jakub@redhat.com>

PR c++/99035
* varasm.c (declare_weak): For -fsyntax-only, allow even
TREE_ASM_WRITTEN function decls.

* g++.dg/ext/weak6.C: New test.

(cherry picked from commit 0f39fb7b001df7cdba56cd5c572d0737667acd2c)

c++: Consider addresses of heap artificial vars always non-NULL [PR98988, PR99031]

With -fno-delete-null-pointer-checks which is e.g. implied by
-fsanitize=undefined or default on some embedded targets, the middle-end
folder doesn't consider addresses of global VAR_DECLs to be non-NULL, as one
of them could have address 0.  Still, I think malloc/operator new (at least
the nonthrowing) relies on NULL returns meaning allocation failure rather
than success.  Furthermore, the artificial VAR_DECLs we create for
constexpr new never actually live in the address space of the program,
so we can pretend they will never be NULL too.

> I'm surprised that nonzero_address has such a limited set of things it will
> actually believe have non-zero addresses with
> -fno-delete-null-pointer-checks.  But it seems that we should be able to
> arrange to satisfy
>
> >   if (definition && !DECL_EXTERNAL (decl)
>
> since these "variables" are indeed defined within the current translation
> unit.

Doing that seems to work and as added benefit it fixes another PR that has
been filed recently.  I need to create the varpool node explicitly and call
a method that sets the definition member in there, but I can also unregister
those varpool nodes at the end of constexpr processing, as the processing
ensured they don't leak outside of the processing.

2021-02-10  Jakub Jelinek  <jakub@redhat.com>

PR c++/98988
PR c++/99031
* constexpr.c: Include cgraph.h.
(cxx_eval_call_expression): Call varpool_node::finalize_decl on
heap artificial vars.
(cxx_eval_outermost_constant_expr): Remove varpool nodes for
heap artificial vars.

* g++.dg/cpp2a/constexpr-new16.C: New test.
* g++.dg/cpp2a/constexpr-new17.C: New test.

(cherry picked from commit a8db7887dfbf502b7e60d64bfeebd0de592d2d45)

openmp: Temporarily disable into_ssa when gimplifying OpenMP reduction clauses [PR99007]

gimplify_scan_omp_clauses was already calling gimplify_expr with false as
last argument to make sure it is not an SSA_NAME, but as the testcases show,
that is not enough, SSA_NAME temporaries created during that gimplification
can be reused too and we can't allow SSA_NAMEs to be used across OpenMP
region boundaries, as we can only firstprivatize decls.

Fixed by temporarily disabling into_ssa.

2021-02-10 Jakub Jelinek <jakub@redhat.com>

PR middle-end/99007
* gimplify.c (gimplify_scan_omp_clauses): For MEM_REF on reductions,
temporarily disable gimplify_ctxp->into_ssa around gimplify_expr
calls.

* g++.dg/gomp/pr99007.C: New test.
* gcc.dg/gomp/pr99007-1.c: New test.
* gcc.dg/gomp/pr99007-2.c: New test.
* gcc.dg/gomp/pr99007-3.c: New test.

(cherry picked from commit bd0e37f68a3aed944df4eb739a0734bb87153749)

c++: Fix ICE with structured binding initialized to incomplete array [PR97878]

We ICE on the following testcase, for incomplete array a on auto [b] { a }; without
giving any kind of diagnostics, with auto [c] = a; during error-recovery.
The problem is that we get too far through check_initializer and e.g.
store_init_value -> constexpr stuff can't deal with incomplete array types.

As the type of the structured binding artificial variable is always deduced,
I think it is easiest to diagnose this early, even if they have array types
we'll need their deduced type to be complete rather than just its element
type.

2021-02-05 Jakub Jelinek <jakub@redhat.com>

PR c++/97878
* decl.c (check_array_initializer): For structured bindings, require
the array type to be complete.

* g++.dg/cpp1z/decomp54.C: New test.

(cherry picked from commit b229baa75ce4627d1bd38f2d3dcd91af1a7071db)

ifcvt: Avoid ICEs trying to force_operand random RTL [PR97487]

As the testcase shows, RTL ifcvt can throw random RTL (whatever it found in
some insns) at expand_binop or expand_unop and expects it to do something
(and then will check if it created valid insns and punts if not).
These functions in the end if the operands don't match try to
copy_to_mode_reg the operands, which does
if (!general_operand (x, VOIDmode))
  x = force_operand (x, temp);
but force_operand is far from handling all possible RTLs; it will ICE for
all more unusual RTL codes.  It basically handles just simple arithmetic and
unary RTL operations if they have an optab, and
expand_simple_binop/expand_simple_unop ICE on others.

The following patch fixes it by adding some operand verification (whether
there is a hope that copy_to_mode_reg will succeed on those).  It is added
both to noce_emit_move_insn (not needed for this exact testcase,
that function simply tries to recog the insn as is and if it fails,
handles some simple binop/unop cases; the patch performs the verification
of their operands) and noce_try_sign_mask.

2021-02-03  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/97487
* ifcvt.c (noce_can_force_operand): New function.
(noce_emit_move_insn): Use it.
(noce_try_sign_mask): Likewise.  Formatting fix.

* gcc.dg/pr97487-1.c: New test.
* gcc.dg/pr97487-2.c: New test.

(cherry picked from commit 176c7bd840a3902e9e67eb0796de362677905f56)

lra-constraints: Fix error-recovery for bad inline-asms [PR97971]

The following testcase is an ice-on-invalid: it can't be reloaded, but we
shouldn't ICE the compiler because the user typed nonsense.

In current_insn_transform we have:
  if (process_alt_operands (reused_alternative_num))
    alt_p = true;

  if (check_only_p)
    return ! alt_p || best_losers != 0;

  /* If insn is commutative (it's safe to exchange a certain pair of
     operands) then we need to try each alternative twice, the second
     time matching those two operands as if we had exchanged them.  To
     do this, really exchange them in operands.

     If we have just tried the alternatives the second time, return
     operands to normal and drop through.  */

  if (reused_alternative_num < 0 && commutative >= 0)
    {
      curr_swapped = !curr_swapped;
      if (curr_swapped)
        {
          swap_operands (commutative);
          goto try_swapped;
        }
      else
        swap_operands (commutative);
    }

  if (! alt_p && ! sec_mem_p)
    {
      /* No alternative works with reloads??  */
      if (INSN_CODE (curr_insn) >= 0)
        fatal_insn ("unable to generate reloads for:", curr_insn);
      error_for_asm (curr_insn,
                     "inconsistent operand constraints in an %<asm%>");
      lra_asm_error_p = true;
...
and so handle inline asms there differently (and delete/nullify them after
this) - fatal_insn is only called for non-inline asm.
But in process_alt_operands we do:
                /* Both the earlyclobber operand and conflicting operand
                   cannot both be user defined hard registers.  */
                if (HARD_REGISTER_P (operand_reg[i])
                    && REG_USERVAR_P (operand_reg[i])
                    && operand_reg[j] != NULL_RTX
                    && HARD_REGISTER_P (operand_reg[j])
                    && REG_USERVAR_P (operand_reg[j]))
                  fatal_insn ("unable to generate reloads for "
                              "impossible constraints:", curr_insn);
and thus ICE even for inline-asms.

I think it is inappropriate to delete/nullify the insn in
process_alt_operands, as it could be done e.g. in the check_only_p mode,
so this patch just returns false in that case, which results in the
caller having alt_p false, and as inline asm isn't a simple move, sec_mem_p
will also be false (and it isn't commutative either), so for check_only_p
it will suggest to the callers that it isn't ok and otherwise will emit
an error and delete/nullify the inline asm insn.

2021-02-03  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/97971
* lra-constraints.c (process_alt_operands): For inline asm, don't call
fatal_insn, but instead return false.

* gcc.target/i386/pr97971.c: New test.

(cherry picked from commit eb69a49c4d3287e797e0d6279186221354905fe0)

i386: Remove V1DImode shift expanders [PR98287]

On Tue, Feb 02, 2021 at 02:23:55PM +0100, Richard Biener wrote:
> All I say is that the x86 target
> should either not advertise V1DF shifts or advertise the basic
> ops that reasonable simplification would expect to exist.

The backend has several V1?Imode shifts, but optab only for those V1DImode
ones:

grep '[la]sh[lr]v1[qhsdtox]' tmp-mddump.md
(define_insn ("mmx_ashlv1di3")
(define_insn ("mmx_lshrv1di3")
(define_insn ("avx512bw_ashlv1ti3")
(define_insn ("avx512bw_lshrv1ti3")
(define_insn ("sse2_ashlv1ti3")
(define_insn ("sse2_lshrv1ti3")
(define_expand ("ashlv1di3")
(define_expand ("lshrv1di3")
emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, operands[1]),

I think it has been introduced with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021#c13

Before we didn't have any V1DImode expanders (except mov/movmisalign, but
those are needed and are supplied for other V1??mode modes too).

This patch just removes the two V1DImode shift expanders with standard names.

2021-02-03 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/98287
* config/i386/mmx.md (<insn><mode>3): For shifts don't enable expander
for V1DImode.

* gcc.dg/pr98287.c: New test.

(cherry picked from commit 1b5572edb8caaed2f31a7235b8c58628da6bdb8f)

PR fortran/99205 - Out of memory with undefined character length

A character variable appearing as a data statement object cannot
be automatic, thus it shall have constant length.

gcc/fortran/ChangeLog:

PR fortran/99205
* data.c (gfc_assign_data_value): Reject non-constant character
length for lvalue.
* trans-array.c (gfc_conv_array_initializer): Restrict loop to
elements which are defined to avoid NULL pointer dereference.

gcc/testsuite/ChangeLog:

PR fortran/99205
* gfortran.dg/data_char_4.f90: New test.
* gfortran.dg/data_char_5.f90: New test.

(cherry picked from commit 8c21bc6646dbe3365d7f89843a79eee823aa3b52)

substitute @tie{} with a space for the man pages

contrib/

2021-03-19 Matthias Klose <doko@ubuntu.com>

* texi2pod.pl: Substitute @tie{} with a space for the man pages.

(cherry picked from commit 3b0155305e5168b48d19f74e9bfcdf423a532ada)

Fix segfault during encoding of CONSTRUCTORs

The segfault occurs in native_encode_initializer when it is encoding the
CONSTRUCTOR for an array whose lower bound is negative (it's OK in Ada).
The computation of the current position is done in HOST_WIDE_INT and this
does not work for arrays whose original range has a negative lower bound
and a positive upper bound; the computation must be done in sizetype
instead so that it may wrap around.
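
A small sketch of the wrap-around, using std::size_t as a stand-in for
sizetype and assuming that a lower bound of -1 is represented there as the
largest unsigned value; position_in_array is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Offset of `index` from the array's lower bound, computed in an unsigned
// type so that the subtraction is allowed to wrap around.
std::size_t position_in_array(std::size_t index, std::size_t low_bound) {
    return index - low_bound;
}
```

For an array with range -1 .. 1, the element at index 1 then lands at
position 2, as expected.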

gcc/
PR middle-end/99641
* fold-const.c (native_encode_initializer) <CONSTRUCTOR>: For an
array type, do the computation of the current position in sizetype.

Daily bump.

PR target/99314: Fix integer signedness issue for cpymem pattern expansion.

The third operand of the cpymem pattern is an unsigned HOST_WIDE_INT, but we
were interpreting it as a signed HOST_WIDE_INT.  That is not a problem in
most cases, but when the value is larger than the maximum signed
HOST_WIDE_INT, it can go wrong, since we use that value to calculate the
buffer size.
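
A hypothetical sketch of how the signedness changes a size check, with
64-bit integers standing in for HOST_WIDE_INT; the functions below are
illustrative and do not appear in riscv.c:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical size check written with a signed length, as before the fix.
bool fits_signed(std::int64_t length, std::int64_t limit) {
    return length <= limit;
}

// The same check with the length kept unsigned, as after the fix.
bool fits_unsigned(std::uint64_t length, std::uint64_t limit) {
    return length <= limit;
}
```

A length of 0x8000000000000000 becomes negative once converted to the
signed type (the conversion wraps on GCC), so the signed check wrongly
accepts it while the unsigned one rejects it.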

2021-03-05 Sinan Lin <sinan@isrc.iscas.ac.cn>
Kito Cheng <kito.cheng@sifive.com>

gcc/ChangeLog:

* config/riscv/riscv.c (riscv_block_move_straight): Change type
to unsigned HOST_WIDE_INT for parameter and local variable with
HOST_WIDE_INT type.
(riscv_adjust_block_mem): Ditto.
(riscv_block_move_loop): Ditto.
(riscv_expand_block_move): Ditto.

(cherry picked from commit d9f0ade001533c9544bf2153b6baa8844ec0bee4)

aarch64: Improve generic SVE tuning defaults

This patch adds the recently-added tweak to split some SVE VL-based scalar
operations [1] to the generic tuning used for SVE, as enabled by adding +sve
to the -march flag, for example -march=armv8.2-a+sve.

The recommendation for best performance on a particular CPU remains unchanged:
use the -mcpu option for that CPU, where possible. -mcpu=native makes this
straightforward for native compilation.

The tweak to split out SVE VL-based scalar operations is a consistent win for
the Neoverse V1 CPU and should be neutral for the Fujitsu A64FX. A run of
SPEC2017 on A64FX with this tweak on didn't show any non-noise differences.
It is also expected to be neutral on SVE2 implementations.

Therefore, the patch enables the tweak for generic +sve tuning e.g.
-march=armv8.2-a+sve. No SVE2 CPUs are expected to benefit from it,
therefore the tweak is disabled for generic tuning when +sve2 is in
-march e.g. -march=armv8.2-a+sve2.

The implementation of this approach requires a bit of custom logic in
aarch64_override_options_internal to handle these kinds of
architecture-dependent decisions, but we do believe the user-facing principle
here is important to implement.

In general, for the generic target we're using a decision framework that looks
like:

* If all cores that are known to benefit from an optimization
are of architecture X, and all other cores that implement X or above
are not impacted, or have a very slight impact, we will consider it for
generic tuning for architecture X.
* We will not enable that optimisation for generic tuning for architecture X+1
if no known cores of architecture X+1 or above will benefit.

This framework allows us to improve generic tuning for CPUs of generation X
while avoiding accumulating tweaks for future CPUs of generation X+1, X+2...
that do not need them, and thus avoid even the slight negative effects of
these optimisations if the user is willing to tell us the desired architecture
accurately.

X above can mean either annual architecture updates (Armv8.2-a, Armv8.3-a etc)
or optional architecture extensions (like SVE, SVE2).
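
The decision framework above can be written down as a pair of predicates. This
is a rough sketch only: the struct, its fields and both function names are
hypothetical and do not exist in GCC, and treating the X+1 rule as "enabled for
X and some newer core benefits" is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical summary of what is known about one tweak relative to
   architecture level X.  */
struct tweak_evidence
{
  bool known_beneficiaries_are_x;  /* all cores known to benefit are of level X */
  bool others_at_most_slight;      /* other cores at X or above see no or slight impact */
  bool newer_core_benefits;        /* some known core at X+1 or above benefits */
};

/* First rule: consider the tweak for generic tuning of level X.  */
static bool
consider_for_generic_x (const struct tweak_evidence *e)
{
  return e->known_beneficiaries_are_x && e->others_at_most_slight;
}

/* Second rule: do not carry the tweak to generic tuning of level X+1
   unless a known core at X+1 or above benefits.  */
static bool
consider_for_generic_x_plus_1 (const struct tweak_evidence *e)
{
  return consider_for_generic_x (e) && e->newer_core_benefits;
}
```

Under these assumptions the SVE VL tweak corresponds to evidence of
{ true, true, false }: considered for generic +sve, not for generic +sve2.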

[1] http://gcc.gnu.org/g:a65b9ad863c5fc0aea12db58557f4d286a1974d7

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_adjust_generic_arch_tuning): Define.
(aarch64_override_options_internal): Use it.
(generic_tunings): Add AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS to
tune_flags.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/aarch64-sve.exp: Add -moverride=tune=none to
sve_flags.
* g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
* g++.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
* gcc.target/aarch64/sve/aarch64-sve.exp: Likewise.
* gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.

(cherry picked from commit 8f0c9d53ef3a9b8ba2579b53596cc2b7f5d8bf69)

testsuite: Update testcase for PR96078 fix [PR99363]

My fix for PR96078 made us stop warning about flatten on an alias if the
target has the alias, which is exactly the case tested here. So let's
remove the expected warning and add a similar case which does warn.

gcc/testsuite/ChangeLog:

PR c/99363
* gcc.dg/attr-flatten-1.c: Adjust.

Daily bump.

aarch64: Fix status return logic in RNG intrinsics

The RNG intrinsics return the wrong status code. The definition says:

"Stores a 64-bit random number into the object pointed to by the argument and returns zero.
If the implementation could not generate a random number within a reasonable period of time
the object pointed to by the input is set to zero and a non-zero value is returned."

This means we should be testing whether to return non-zero with:
CSET W0, EQ
rather than NE.

This patch fixes that.
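
A small portable model may make the polarity easier to see. It assumes, from
the Arm architecture description of RNDR rather than from this commit, that the
instruction leaves NZCV as 0b0000 on success and 0b0100 (Z set) on failure.
The macro and both function names are hypothetical.

```c
/* Z bit within a 4-bit NZCV value (N = 8, Z = 4, C = 2, V = 1).  */
#define NZCV_Z 0x4u

/* Model of "CSET W0, EQ": yields 1 when Z is set, i.e. when no random
   number could be generated.  This matches the documented contract.  */
static int
rng_status_eq (unsigned nzcv)
{
  return (nzcv & NZCV_Z) != 0;
}

/* Model of the previous "CSET W0, NE": yields 1 when Z is clear, i.e.
   on success, which inverts the documented contract.  */
static int
rng_status_ne (unsigned nzcv)
{
  return (nzcv & NZCV_Z) == 0;
}
```

Under that assumption, a caller that treats a non-zero result as failure sees
the correct outcome only with the EQ form.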

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_expand_rng_builtin): Use EQ
to compare against CC_REG rather than NE.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rng_2.c: New test.

(cherry picked from commit f7581eb38eeaa8af64f3cdfe2faf764f5883f16f)

rs6000: Fix disassembling a vector pair in gcc-10 in little-endian mode

In gcc-10, we don't handle disassembling a vector pair in little-endian mode
correctly. The solution is to make use of the disassemble accumulator code
that is endian friendly.

gcc/

2021-03-17 Peter Bergner <bergner@linux.ibm.com>

* config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Handle
disassembling a vector pair vector by vector in little-endian mode.

Daily bump.

ipa: Fix resolving speculations through cgraph_edge::set_call_stmt

In the PR 98078 testcase, speculative call-graph edges which were
created by IPA-CP are confirmed during inlining but
cgraph_edge::set_call_stmt does not take it very well.

The function enters the update_speculative branch and updates the
edges in the speculation bundle separately (by a recursive call), but
when it processes the first direct edge, most of the bundle actually
ceases to exist because it is devirtualized.  It nevertheless goes on
to attempt to update the indirect edge (which has just been removed)
and surprisingly gets as far as adding the same devirtualized edge to
the call_site_hash a second time, which triggers an assert.

This patch fixes that by making the function aware that it is about to
resolve a speculation, so that it does so instead of updating the
components of the speculation.  Also, it does so before dealing with the
hash because
the speculation resolution code needs the hash to point to the first
speculative direct edge and also cleans the hash up by calling
update_call_stmt_hash_for_removing_direct_edge.

Bootstrapped and tested on x86_64-linux, also profile-LTO-bootstrapped
on the same system.

gcc/ChangeLog:

2021-01-20  Martin Jambor  <mjambor@suse.cz>

PR ipa/98078
* cgraph.c (cgraph_edge::set_call_stmt): Do not update all
corresponding speculative edges if we are about to resolve
speculation.  Make edge direct (and so resolve speculations) before
removing it from call_site_hash.
(cgraph_edge::make_direct): Relax the initial assert to allow calling
the function on speculative direct edges.

(cherry picked from commit b8188b7d7382e4a74af5dd6a125e76e8d43a68a5)

c/99224 - avoid ICEing on invalid __builtin_next_arg

This avoids crashes with __builtin_next_arg on non-parameters. For
the specific testcase we arrive with an anonymous SSA_NAME so that
SSA_NAME_VAR becomes NULL and we crash.

2021-02-24 Richard Biener <rguenther@suse.de>

PR c/99224
* builtins.c (fold_builtin_next_arg): Avoid NULL arg.

* gcc.dg/pr99224.c: New testcase.

(cherry picked from commit 084963dcaca2f0836366fdb001561e29ecbfb483)

tree-optimization/99253 - fix reduction path check

This fixes an ordering problem with verifying that no intermediate
computations in a reduction path are used outside of the chain.  The
check was disabled for value-preserving conversions at the tail
but whether a stmt was a conversion or not was only computed after
the first use.  The following fixes this by re-ordering things
accordingly.

2021-02-25  Richard Biener  <rguenther@suse.de>

PR tree-optimization/99253
* tree-vect-loop.c (check_reduction_path): First compute
code, then verify out-of-loop uses.

* gcc.dg/vect/pr99253.c: New testcase.

(cherry picked from commit 1193d05465acd39b6e3c7095274d8351a1e2cd44)

Daily bump.

coroutines : Avoid a C++11ism.

The master version of the code uses a defaulted CTOR, which had
been inadvertently backported to gcc-10. Fixed thus.

gcc/cp/ChangeLog:

* coroutines.cc (struct var_nest_node): Provide a default
CTOR.

tree-nested: Update assert for Fortran module vars [PR97927]

gcc/ChangeLog:

PR fortran/97927
* tree-nested.c (convert_local_reference_stmt): Avoid calling
lookup_field_for_decl for Fortran module (= namespace context).

gcc/testsuite/ChangeLog:

PR fortran/97927
* gfortran.dg/module_variable_3.f90: New test.

(cherry picked from commit 8a6a62614a8ae4544770420416d1632d6c3d3f6e)

ira: Make sure allocno copies are ordered [PR98791]

gcc/ChangeLog:
2021-02-22 Andre Vieira <andre.simoesdiasvieira@arm.com>

PR rtl-optimization/98791
* ira-conflicts.c (process_regs_for_copy): Don't create allocno copies
for unordered modes.

gcc/testsuite/ChangeLog:
2021-02-22 Andre Vieira <andre.simoesdiasvieira@arm.com>

PR rtl-optimization/98791
* gcc.target/aarch64/sve/pr98791.c: New test.

(cherry picked from commit 4c31a3a6d31b6214ea774d403bf8ab7ebe1ea862)

Fortran: Fix problem with allocate initialization [PR99545].

2021-03-15 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran/ChangeLog

PR fortran/99545
* trans-stmt.c (gfc_trans_allocate): Mark the initialization
assignment by setting init_flag.

gcc/testsuite/ChangeLog

PR fortran/99545
* gfortran.dg/pr99545.f90: New test.

(cherry picked from commit 21ced2776a117924e52f6aab8b41afb613fef0e7)

Daily bump.

aarch64: Set AARCH64_EXTRA_TUNE_PREFER_ADVSIMD_AUTOVEC for Neoverse N2

This patch tweaks the Neoverse N2 tuning on the GCC 10 branch to have it
in line with GCC 8 and 9 to prefer AdvancedSIMD over SVE for
auto-vectorisation.

gcc/ChangeLog:

* config/aarch64/aarch64.c (neoversen2_tunings): Set
AARCH64_EXTRA_TUNE_PREFER_ADVSIMD_AUTOVEC tune_flags.

Daily bump.

aarch64: Add missing error_mark_node check [PR99381]

We were missing a check in function_resolver::require_vector_type to see
if the argument type was already invalid. This was causing us to attempt
to emit a diagnostic and subsequently ICE in print_type. Fixed thusly.

gcc/ChangeLog:

PR target/99381
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::require_vector_type): Handle error_mark_node.

gcc/testsuite/ChangeLog:

PR target/99381
* gcc.target/aarch64/pr99381.c: New test.

(cherry picked from commit a6bc1680a493de356d6a381718021c6a44401201)

Daily bump.

rs6000: Fix pr98959 testcase

It needs the int128 selector because it uses __int128, and the lp64
selector is the best we can do for -mcmodel=.

2021-03-10 Segher Boessenkool <segher@kernel.crashing.org>

gcc/testsuite/
* gcc.target/powerpc/pr98959.c: Add int128 and lp64 selectors.

(cherry picked from commit 8f316f41ce0fd90570f4d4444c29c639a322a0be)

rs6000: Fix invalid splits when using Altivec style addresses [PR98959]

The rs6000_emit_le_vsx_* functions assume they are not passed an Altivec
style "& -16" address.  However, some of our expanders and splitters do
not verify we do not have an Altivec style address before calling those
functions, leading to an ICE.  The solution here is to guard the expanders
and splitters to ensure we do not call them if we're given an Altivec style
address.

2021-03-08  Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/98959
* config/rs6000/rs6000.c (rs6000_emit_le_vsx_permute): Add an assert
to ensure we do not have an Altivec style address.
* config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): Disable if passed
an Altivec style address.
(*vsx_le_perm_store_<mode>): Likewise.
(splitters after *vsx_le_perm_store_<mode>): Likewise.
(vsx_load_<mode>): Disable special expander if passed an Altivec
style address.
(vsx_store_<mode>): Likewise.

gcc/testsuite/
PR target/98959
* gcc.target/powerpc/pr98959.c: New test.

(cherry picked from commit cb25dea3ef2c7768007bffc56f0e31e1c42b44e2)

rs6000: Fix ICE in rs6000_init_builtins when compiling with -mcpu=440 [PR99279]

The initialization of compat builtins assumes the builtin we are creating
a compatible builtin for exists and ICEs if it doesn't. However, there are
valid reasons why some builtins are disabled for a particular compile.
In this case, the MMA builtins are disabled for -mcpu=440 (and other cpus),
so instead of ICEing, we should just skip adding the MMA compat builtin.

2021-02-25 Peter Bergner <bergner@linux.ibm.com>

gcc/
PR target/99279
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Replace assert
with an "if" test.

(cherry picked from commit 0159535adb0e7400f2c6922f14a7602f6b90cf69)

rs6000: Add support for compatibility built-ins

The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. It's too late to remove the
old names, so this patch renames the built-ins to the new names and then
adds support for creating compatibility built-ins (ie, multiple built-in
functions generate the same code) and then creates compatibility built-ins
using the old names.
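
The idea of a compatibility built-in, where an old name resolves to the same
implementation as the new one, can be sketched with a simple alias table. The
four built-in names come from the text above; the struct, table and lookup
function are hypothetical and are not the bdesc_compat machinery itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias record: an old (compatibility) name and the
   canonical name it should behave like.  */
struct builtin_alias
{
  const char *compat_name;
  const char *canonical_name;
};

static const struct builtin_alias builtin_aliases[] = {
  { "__builtin_mma_assemble_pair", "__builtin_vsx_assemble_pair" },
  { "__builtin_mma_disassemble_pair", "__builtin_vsx_disassemble_pair" },
};

/* Map a compatibility name to its canonical name; any other name is
   returned unchanged.  */
static const char *
canonical_builtin_name (const char *name)
{
  size_t n = sizeof builtin_aliases / sizeof builtin_aliases[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, builtin_aliases[i].compat_name) == 0)
      return builtin_aliases[i].canonical_name;
  return name;
}
```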

2021-02-23 Peter Bergner <bergner@linux.ibm.com>

gcc/
* config/rs6000/mma.md (mma_assemble_pair): Rename from this...
(vsx_assemble_pair): ...to this.
* config/rs6000/rs6000-builtin.def (BU_MMA_V2, BU_MMA_V3,
BU_COMPAT): New macros.
(mma_assemble_pair): Rename from this...
(vsx_assemble_pair): ...to this.
(mma_disassemble_pair): Rename from this...
(vsx_disassemble_pair): ...to this.
(mma_assemble_pair): New compatibility built-in.
(mma_disassemble_pair): Likewise.
* config/rs6000/rs6000-call.c (struct builtin_compatibility): New.
(RS6000_BUILTIN_COMPAT): Define.
(bdesc_compat): New.
(rs6000_gimple_fold_mma_builtin): Use VSX_BUILTIN_ASSEMBLE_PAIR.
(rs6000_init_builtins): Register compatibility built-ins.
(mma_init_builtins): Use VSX_BUILTIN_ASSEMBLE_PAIR,
and VSX_BUILTIN_DISASSEMBLE_PAIR.
* doc/extend.texi (__builtin_mma_assemble_pair): Rename from this...
(__builtin_vsx_assemble_pair): ...to this.
(__builtin_mma_disassemble_pair): Rename from this...
(__builtin_vsx_disassemble_pair): ...to this.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-4.c: Add tests for
__builtin_vsx_assemble_pair and __builtin_vsx_disassemble_pair.
Add __has_builtin tests for built-ins.
Update expected instruction counts.

(cherry picked from commit 77ef995c1fbcab76a2a69b9f4700bcfd005d8e62)

rs6000: Fix invalid address used in MMA built-in function

The mma_assemble_input_operand predicate is too lenient on the memory
operands it will accept, leading to an ICE when illegitimate addresses
are passed in.  The solution is to only accept memory operands with
addresses that are valid for quad word memory accesses.  The test case
is a minimized test case from the Eigen library.  The creduced test case
is very noisy with respect to warnings, so the test case has added -w to
silence them.

2021-02-11  Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/99041
* config/rs6000/predicates.md (mma_assemble_input_operand): Restrict
memory addresses to those that are legal for quad word accesses.

gcc/testsuite/
PR target/99041
* g++.target/powerpc/pr99041.C: New test.

(cherry picked from commit 2432c47970024db6410708b582a901259dabaae1)

Fix Ada bootstrap on Cygwin64

gcc/ada/
PR bootstrap/94918
* raise-gcc.c: On Cygwin include mingw32.h to prevent
windows.h from including x86intrin.h or emmintrin.h.

Fix ICE on atomic enumeration type with LTO

This is a strange regression whereby an enumeration type declared as
atomic (or volatile) incorrectly triggers the ODR machinery for its
values in LTO mode.

gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): Build a TYPE_STUB_DECL
for the main variant of an enumeration type declared as volatile.
gcc/testsuite/
* gnat.dg/specs/lto25.ads: New test.

Daily bump.

Fix internal error on lambda function

This boils down to the RTL expander trying to take the address of a DECL
whose RTX is a register.

gcc/
PR c++/90448
* calls.c (initialize_argument_information): When the argument
is passed by reference, do not make a copy in a thunk only if
the argument is already in memory. Remove redundant test for
the case of callee copy.

Daily bump.

runtime: cast SIGSTKSZ to uintptr

PR go/99458
* libgo/runtime/proc.c: Cast SIGSTKSZ to uintptr.
In newer versions of glibc it is long, which causes a signed/unsigned
comparison warning.
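
The kind of comparison involved can be sketched as follows. To stay portable
the sketch does not use the real SIGSTKSZ: sigstksz_like is a hypothetical
stand-in for a macro that expands to a long-valued expression, and
stack_too_small is not the libgo code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a SIGSTKSZ that yields a long.  */
static long
sigstksz_like (void)
{
  return 8192;
}

/* Comparing an unsigned size with a long directly mixes signedness,
   which is the kind of comparison -Wsign-compare reports.  The cast
   makes both operands unsigned.  */
static int
stack_too_small (uintptr_t size)
{
  return size < (uintptr_t) sigstksz_like ();
}
```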

aarch64: Add internal tune flag to minimise VL-based scalar ops

This is a backport of the cse_sve_vl_constants tuning param to GCC 10.

Bootstrapped and tested on the branch on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants):
Define.
* config/aarch64/aarch64.md (add<mode>3): Force CONST_POLY_INT immediates
into a register when the above is enabled.
* config/aarch64/aarch64.c (neoversev1_tunings):
AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
(aarch64_rtx_costs): Use AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.

gcc/testsuite/

* gcc.target/aarch64/sve/cse_sve_vl_constants_1.c: New test.

Daily bump.

PR libfortran/99218 - matmul on temporary array accesses invalid memory

Do not invoke tuned rank-2 times rank-2 matmul if rank(b) == 1.
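
As a rough sketch of the guard, assuming the tuned kernel only handles the
rank-2 times rank-2 case: the enum and function below are hypothetical and are
not taken from matmul_internal.m4.

```c
enum matmul_kernel
{
  MATMUL_GENERIC,
  MATMUL_TUNED
};

/* Choose the tuned kernel only when both operands are rank 2; a rank-1
   b (matrix times vector) falls back to the generic path.  */
static enum matmul_kernel
choose_matmul_kernel (int rank_a, int rank_b)
{
  if (rank_a == 2 && rank_b > 1)
    return MATMUL_TUNED;
  return MATMUL_GENERIC;
}
```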

libgfortran/ChangeLog:

PR libfortran/99218
* m4/matmul_internal.m4: Invoke tuned matmul only for rank(b)>1.
* generated/matmul_c10.c: Regenerated.
* generated/matmul_c16.c: Likewise.
* generated/matmul_c4.c: Likewise.
* generated/matmul_c8.c: Likewise.
* generated/matmul_i1.c: Likewise.
* generated/matmul_i16.c: Likewise.
* generated/matmul_i2.c: Likewise.
* generated/matmul_i4.c: Likewise.
* generated/matmul_i8.c: Likewise.
* generated/matmul_r10.c: Likewise.
* generated/matmul_r16.c: Likewise.
* generated/matmul_r4.c: Likewise.
* generated/matmul_r8.c: Likewise.
* generated/matmulavx128_c10.c: Likewise.
* generated/matmulavx128_c16.c: Likewise.
* generated/matmulavx128_c4.c: Likewise.
* generated/matmulavx128_c8.c: Likewise.
* generated/matmulavx128_i1.c: Likewise.
* generated/matmulavx128_i16.c: Likewise.
* generated/matmulavx128_i2.c: Likewise.
* generated/matmulavx128_i4.c: Likewise.
* generated/matmulavx128_i8.c: Likewise.
* generated/matmulavx128_r10.c: Likewise.
* generated/matmulavx128_r16.c: Likewise.
* generated/matmulavx128_r4.c: Likewise.
* generated/matmulavx128_r8.c: Likewise.

gcc/testsuite/ChangeLog:

PR libfortran/99218
* gfortran.dg/matmul_21.f90: New test.

(cherry picked from commit b1bee29167df6b0fbc9a4c8d06e2acbf3367af47)

OpenACC: C/C++ - fix async parsing [PR99137]

gcc/c/ChangeLog:

PR c/99137
* c-parser.c (c_parser_oacc_clause_async): Reject comma expressions.

gcc/cp/ChangeLog:

PR c/99137
* parser.c (cp_parser_oacc_clause_async): Reject comma expressions.

gcc/testsuite/ChangeLog:

PR c/99137
* c-c++-common/goacc/asyncwait-1.c: Update dg-error; add
additional test.

(cherry picked from commit 6ddedd3efa3fe482f76a4037521a06b3ac9f2a8b)

Daily bump.