gcc.gnu.org Git - gcc.git/log

cfgcleanup: Fix -fcompare-debug issue in outgoing_edges_match [PR100254]

The following testcase fails with -fcompare-debug. The problem is that
outgoing_edges_match behaves differently between -g0 and -g, if
some load/store with REG_EH_REGION is followed by DEBUG_INSNs, the
REG_EH_REGION check is not done, while when there are no DEBUG_INSNs, it is
done.

We already compute last1 and last2 as BB_END (bb{1,2}) with skipped debug
insns and notes, so this patch just uses those.

2021-04-27 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/100254
* cfgcleanup.c (outgoing_edges_match): Check REG_EH_REGION on
last1 and last2 insns rather than BB_END (bb1) and BB_END (bb2) insns.

* g++.dg/opt/pr100254.C: New test.

(cherry picked from commit e600df51a15b2ec7a72731921a2464ffe59cf5ab)

vmsdbgout: Remove useless register keywords

register keyword was removed in C++17, and in vmsdbgout.c it served no
useful purpose.

2021-04-26 Jakub Jelinek <jakub@redhat.com>

PR debug/100255
* vmsdbgout.c (ASM_OUTPUT_DEBUG_STRING, vmsdbgout_begin_block,
vmsdbgout_end_block, lookup_filename, vmsdbgout_source_line): Remove
register keywords.

(cherry picked from commit 297bfacdb448c0d29b8dfac2818350b90902bc75)

cprop: Fix -fcompare-debug bug in constprop_register [PR100148]

The following testcase shows different behavior between -g and -g0
in constprop_register, if a flags register setter is separated
from a conditional jump using those flags with -g by a DEBUG_INSN.
As it uses just NEXT_INSN, for -g it will look at the DEBUG_INSN which is
not a conditional jump, while otherwise it would look at the conditional
jump and call cprop_jump.

2021-04-21 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/100148
* cprop.c (constprop_register): Use next_nondebug_insn instead of
NEXT_INSN.

* g++.dg/opt/pr100148.C: New test.

(cherry picked from commit 022f6ee3ad67ee30f62c8c2aeeb4156494f3284e)

Daily bump.

Minimal change to avoid incorrect strlen constant folding (PR 91914).

gcc/ChangeLog:
PR tree-optimization/91914
* expr.c (string_constant): Pass argument type to
fold_ctor_reference to guide it.

gcc/testsuite/ChangeLog:
PR tree-optimization/91914
* gcc.dg/strlenopt-79.c: New test.

Daily bump.

Darwin, libsanitizer : Support libsanitizer for x86_64-darwin20.

The sanitizer is supported for at least x86_64 and Darwin20.

libsanitizer/ChangeLog:

* configure.tgt: Allow x86_64 Darwin2x.

(cherry picked from commit 93f1186f7d3419c7af692295bd509d7bfaf170a1)

Darwin : Update libtool and dependencies for Darwin20 [PR97865]

The change in major version (and the increment from Darwin19 to 20)
caused libtool tests to fail which resulted in incorrect build settings
for shared libraries.

We take this opportunity to sort out the shared undefined symbols state
rather than propagating the current unsound behaviour into a new rev.

This change means that we default to the case that missing symbols are
considered an error, and if one wants to allow this intentionally, the
confiuration for that case should be set appropriately.

Three existing cases need undefined dynamic lookup:
libitm, where there is already a configuration mechanism to add the
flags.
libcc1, where we add simple configuration to add the flags for Darwin.
libsanitizer, where we can add to the existing extra flags.

Backported from 1352bc88a0525743c952197fb2db6e4f8c091cde and 5dc998933e7aa737f4a45a8a2885d42d5288d51a

libcc1/ChangeLog:

PR target/97865
* Makefile.am: Add dynamic_lookup to LD flags for Darwin.
* configure.ac: Test for Darwin host and set a flag.
* Makefile.in: Regenerate.
* configure: Regenerate.

libitm/ChangeLog:

PR target/97865
* configure.tgt: Add dynamic_lookup to XLDFLAGS for Darwin.
* configure: Regenerate.

libsanitizer/ChangeLog:

PR target/97865
* configure.tgt: Add dynamic_lookup to EXTRA_CXXFLAGS for
Darwin.
* configure: Regenerate.

ChangeLog:

PR target/97865
* libtool.m4: Update handling of Darwin platform link flags
for Darwin20.

gcc/ChangeLog:

PR target/97865
* configure: Regenerate.

libatomic/ChangeLog:

PR target/97865
* configure: Regenerate.

libbacktrace/ChangeLog:

PR target/97865
* configure: Regenerate.

libffi/ChangeLog:

PR target/97865
* configure: Regenerate.

libgfortran/ChangeLog:

PR target/97865
* configure: Regenerate.

libgomp/ChangeLog:

PR target/97865
* configure: Regenerate.

libhsail-rt/ChangeLog:

PR target/97865
* configure: Regenerate.

libobjc/ChangeLog:

PR target/97865
* configure: Regenerate.

libphobos/ChangeLog:

PR target/97865
* configure: Regenerate.

libquadmath/ChangeLog:

PR target/97865
* configure: Regenerate.

libssp/ChangeLog:

PR target/97865
* configure: Regenerate.

libstdc++-v3/ChangeLog:

PR target/97865
* configure: Regenerate.

libvtv/ChangeLog:

PR target/97865
* configure: Regenerate.

zlib/ChangeLog:

PR target/97865
* configure: Regenerate.

Co-Authored-By: Jakub Jelinek <jakub@redhat.com>

Darwin : Adjust handling of MACOSX_DEPLOYMENT_TARGET for macOS 11.

The shift to macOS version 11 also means that '11' without any
following '.x' is accepted as a valid version number. This adjusts
the validation code to accept this and map it to 11.0.0 which
matches what the clang toolchain appears to do.

gcc/ChangeLog:

* config/darwin-driver.c (validate_macosx_version_min): Allow
MACOSX_DEPLOYMENT_TARGET=11.
(darwin_default_min_version): Adjust warning spelling to avoid
an apostrophe.

(cherry picked from commit 96de87b99bf8fd1c46df373bbcc2f7d76db716ad)

Darwin : Update the kernel version to macOS version mapping.

With the change to macOS 11 and Darwin20, the algorithm for mapping
kernel version to macOS version has changed.

We now have darwin 20.X.Y => macOS 11.(X > 0 ? X - 1 : 0).??.
It currently unclear if the Y will be mapped to macOS patch version
and, if so, whether it will be one-based or 0-based.
Likewise, it's unknown if Darwin 21 will map to macOS 12, so these
entries are unchanged for the present.

gcc/ChangeLog:

* config/darwin-driver.c (darwin_find_version_from_kernel):
Compute the minor OS version from the minor kernel version.

(cherry picked from commit 0e1d4b3bfe260667fb8e055ebff2b34d8a2ec253)

Darwin: Darwin 20 is to be macOS 11 (Big Sur).

As per Nigel Tufnel's assertion "... this one goes to 11".

The various parts of the code that deal with mapping Darwin versions
to macOS (X) versions need updating to deal with  a major version of
11.

So now we have, for example:

Darwin  4 => macOS (X) 10.0
…
Darwin 14 => macOS (X) 10.10
...
Darwin 19 => macOS (X) 10.15

Darwin 20 => macOS  11.0

Because of the historical duplication of the "10" in macOSX 10.xx and
the number of tools that expect this, it is likely that system tools will
allow macos11.0 and/or macosx11.0 (despite that the latter makes little
sense).

Update the link test to cover Catalina (Darwin19/10.15) and
Big Sur (Darwin20/11.0).

gcc/ChangeLog:

* config/darwin-c.c: Allow for Darwin20 to correspond to macOS 11.
* config/darwin-driver.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/darwin-minversion-link.c: Allow for Darwin19 (macOS 10.15)
and Darwin20 (macOS 11.0).

(cherry picked from commit 556ab5125912fa2233986eb19d6cd995cf7de1d2)

Darwin: Adjust the PCH area to allow for 16384byte page size.

Newer versions of Darwin report pagesize 20 which means that we
need to adjust the aligment of the PCH area.

gcc/ChangeLog:

* config/host-darwin.c: Align pch_address_space to 16384.

(cherry picked from commit 590febb5f6624f78b36402a7c9a9c318978f1efa)

dwarf2unwind : Force the CFA after remember/restore pairs [44107/48097].

This address one of the more long-standing and serious regressions
for Darwin.  GCC emits unwind code by default on the assumption that
the unwinder will be (of have the same capability) as the one in the
current libgcc_s.  For Darwin platforms, this is not the case - some
of them are based on the libgcc_s from GCC-4.2.1 and some are using
the unwinder provided by libunwind (part of the LLVM project). The
latter implementation has gradually adopted a section that deals with
GNU unwind.

The most serious problem for some of the platform versions is in
handling DW_CFA_remember/restore_state pairs.  The DWARF description
talks about these in terms of saving/restoring register rows; this is
what GCC originally did (and is what the unwinders do for the Darwin
versions based on libgcc_s).

However, in r118068, this was changed so that not only the registers
but also the current frame address expression were saved.  The unwind
code assumes that the unwinder will do this; some of Darwin's unwinders
do not, leading to lockups etc.  To date, the only solution has been
to replace the system libgcc_s with a newer one which is not a viable
solution for many end-users (since that means overwritting the one
provided with the system installation).

The fix here provides a target hook that allows the target to specify
that the CFA should be reinstated after a DW_CFA_restore.  This fixes
the issue (and also the closed WONTFIX of 44107).

(As a matter of record, it also fixes reported Java issues if
backported to GCC-5).

gcc/ChangeLog:

PR target/44107
PR target/48097
* config/darwin-protos.h (darwin_should_restore_cfa_state): New.
* config/darwin.c (darwin_should_restore_cfa_state): New.
* config/darwin.h (TARGET_ASM_SHOULD_RESTORE_CFA_STATE): New.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Document TARGET_ASM_SHOULD_RESTORE_CFA_STATE.
* dwarf2cfi.c (connect_traces): If the target requests, restore
the CFA expression after a DW_CFA_restore.
* target.def (TARGET_ASM_SHOULD_RESTORE_CFA_STATE): New hook.

(cherry picked from commit 491d5b3cf8216f9285a67aa213b9a66b0035137b)

Darwin : Fix out-of-bounds access to df_regs_ever_live.

During changes made for LRA (or, perhaps, even before) we omitted
a check that the current register we are working on is a hard reg
before we tried to note its liveness.

A stage 1 built with fsanitize=address catches this, as does any
attempt to build master with clang and -std=c++11.

gcc/ChangeLog:

* config/darwin.c (machopic_legitimize_pic_address): Check
that the current pic register is one of the hard reg set
before setting liveness.

(cherry picked from commit 89bc1d4e7cdd0b2d012050134ad1d464ec357f0b)

Darwin : Avoid a C++ ODR violation seen with LTO.

We have a similar code pattern in darwin-c.c to one in c-pragmas
(most likely a cut & paste) with a struct type used locally to the
TU. With C++ we need to rename the type to avoid an ODR violation.

gcc/ChangeLog:

* config/darwin-c.c (struct f_align_stack): Rename
to type from align_stack to f_align_stack.
(push_field_alignment): Likewise.
(pop_field_alignment): Likewise.

(cherry picked from commit 3c52cd517a34b6b37eb17d4defd63bb31e60888b)

Darwin : Update libc function availability.

Darwin libc has sincos from 10.9 (darwin13) onwards.

gcc/ChangeLog:

* config/darwin.c (darwin_libc_has_function): Report sincos
available from 10.9.

(cherry picked from commit 2e746cebd9c6bb42b4892135942be7ddf865b3bf)

testsuite, Darwin: XFAIL runs for two timode conversion tests.

X86 Darwin fails these at present, because (to work around PR80556)
we insert libSystem ahead of libgcc. The libSystem implementation
has a similar bug to one that was fixed for GCC. We need to fix
80556 properly, and then this issue will go away - we will be able
to use the libgcc impl as intended.

XFAIL the run for now, to reduce testsuite noise.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/fp-int-convert-timode-3.c: XFAIL run.
* gcc.dg/torture/fp-int-convert-timode-4.c: Likewise.

(cherry picked from commit 94d4f4387de8264ee289cf71f692d59ca6ac36f8)

Darwin, machopic: Arrange to indirect IVARs when needed.

Objective C V2 (m64) IVAR offset refs from Apple GCC-4.x have an indirection
for m64 code on PPC (which is the only 64b user for Mach-O PIC). Apple GCC
4.x places the indirections in the .data section, however this seems to have
been unintentional - and we are placing the indirections in the non-lazy
symbol pointers section.

gcc/ChangeLog:

2020-03-08 Iain Sandoe <iain@sandoe.co.uk>

* config/darwin.c: Lookup Objective C metadata and force indirection
for IVAR refs.

Objective-C, NeXT ABI: Identify V2 IVAR refs by metadata.

For the NeXT 64b ABI, IVAR refs are supposed to be indirected for
Mach-O PIC. Identify them so that we can act as needed.

gcc/objc/ChangeLog:

2020-03-08 Iain Sandoe <iain@sandoe.co.uk>

* objc-next-metadata-tags.h (OCTI_RT_META_IVAR_REF): New.
(meta_ivar_ref): New.
* objc-next-runtime-abi-02.c
(next_runtime_abi_02_init_metadata_attributes): Create the
IVAR ref metadata identifier.
(ivar_offset_ref): Tag IVAR refs with specific metadata.

Darwin: Make sanitizer local vars linker-visible.

Another case where we need a linker-visible symbols in order to
preserve the ld64 atom model.  If these symbols are emitted as
'local' the linker cannot see that they are separate from any
global weak entry that precedes them.  This will cause the linker
to complain that there is (apparently) direct access to such a
weak global, preventing it from being replaced.

This is a short-term fix for the problem - we need generic
handling for relevant cases (that also does not pessimise objects
by emitting unnecessary symbols and relocations).

gcc/ChangeLog:

2020-05-23  Iain Sandoe  <iain@sandoe.co.uk>

* config/darwin.h (ASM_GENERATE_INTERNAL_LABEL):
Make ubsan_{data,type},ASAN symbols linker-visible.

Darwin, testsuite: Fix darwin-version-1.c fails with XCode 11.4.

From XCode 11.4 on 10.14/15 use of 10.6 and 10.7 is deprecated.
The tools issue diagnostics if -mmacosx-version-min= < 10.8

Adjust the testcase to avoid that usage on 10.14, 10.15 for now.

gcc/testsuite/ChangeLog:

2020-04-13 Iain Sandoe <iain@sandoe.co.uk>

* gcc.dg/darwin-version-1.c: Use -mmacosx-version-min= 10.8
for system versions 10.14 and 10.15.

Daily bump.

libstdc++: Fix inconsistent feature test macro

The __cpp_lib_constexpr_string feature test macro is not defined
consistently in <version> and <string>.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (__cpp_lib_constexpr_string):
Only define for C++17 and later.

(cherry picked from commit 3215d4f5b3d08e0087a88df9e155c779927ace1a)

libgfortran, configure : Adjust configure to autoconf-2.69

It appears that at some point the configure file has been
regenerated with autoconf-2.69b. This regenerates it with
autoconf-2.69.

libgfortran/ChangeLog:

* configure: Regenerate.

c++/98032 - add testcase

This adds another testcase for PR95719.

2021-04-30 Richard Biener <rguenther@suse.de>

PR c++/98032
* g++.dg/pr98032.C: New testcase.

(cherry picked from commit dfc70841eb0ca42637826177f329cf6c98ee00ad)

c++: Fix ICE with using and virtual function. [PR95719]

conversion_path points to the base where we found the using-declaration, not
where the function is actually a member; look up the actual base. And then
maybe look back to the derived class if the base is primary.

gcc/cp/ChangeLog:

PR c++/95719
* call.c (build_over_call): Look up the overrider in base_binfo.
* class.c (lookup_vfn_in_binfo): Look through BINFO_PRIMARY_P.

gcc/testsuite/ChangeLog:

PR c++/95719
* g++.dg/tree-ssa/final4.C: New test.

Use safe_dyn_cast instead of dyn_cast in find_loop_guard to fix PR92608.

2019-11-22 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>

PR tree-optimization/92608
* tree-ssa-loop-unswitch.c (find_loop_guard): Use safe_dyn_cast instead
of dyn_cast.

gcc/testsuite/
* gcc.dg/torture/pr92608.c: New test.

(cherry picked from commit b30e83f809b2aa65222eb969f8b4523e5e1961f2)

bootstrap/87338 - gcc 8.2 fails to bootstrap on ia64

This uses ASM_OUTPUT_DEBUG_LABEL instead of ASM_GENERATE_INTERNAL_LABEL
and ASM_OUTPUT_LABEL.

2019-05-21 James Clarke <jrtc27@jrtc27.com>

PR bootstrap/87338
* dwarf2out.c (dwarf2out_inline_entry): Use ASM_OUTPUT_DEBUG_LABEL
instead of ASM_GENERATE_INTERNAL_LABEL and ASM_OUTPUT_LABEL.

Daily bump.

libstdc++: Add missing 'inline' specifiers to net::ip functions [PR 100259]

libstdc++-v3/ChangeLog:

PR libstdc++/100259
* include/experimental/internet (net::ip::make_error_code)
(net::ip::make_error_condition, net::ip::make_network_v4)
(net::ip::operator==(const udp&, const udp&)): Add 'inline'.

(cherry picked from commit 3f4aa4579a6c03e0a0b0a6aec68aa5a301264d45)

libstdc++: Define __cpp_lib_constexpr_string macro

As noted in r11-1339-gb6ab9ecd550227684643b41e9e33a4d3466724d8 we define
a non-standard __cpp_lib_constexpr_char_traits feature test macro to
indicate support for P0426R1 and P1032R1. At some point last year the
__cpp_lib_constexpr_string macro was retconned to indicate support for
those papers. This adds the new macro (which we didn't previously
define, because it referred to P0980R1 "Making std::string constexpr"
which we don't support).

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (__cpp_lib_constexpr_string): Define.
* include/std/version (__cpp_lib_constexpr_string): Define.
* testsuite/21_strings/char_traits/requirements/constexpr_functions_c++17.cc:
Check for __cpp_lib_constexpr_string.
* testsuite/21_strings/char_traits/requirements/version.cc: New test.

(cherry picked from commit 3da80ed7efd582575e7850a403ce693ec882d087)

Daily bump.

i386: Fix atomic FP peepholes [PR100182]

64bit loads to/stores from x87 and SSE registers are atomic also on 32-bit
targets, so there is no need for additional atomic moves to a temporary
register.

Introduced load peephole2 patterns assume that there won't be any additional
loads from the load location outside the peepholed sequence and wrongly
removed the source location initialization.

OTOH, introduced store peephole2 patterns assume there won't be any additional
loads from the stored location outside the peepholed sequence and wrongly
removed the destination location initialization.  Note that we can't use plain
x87 FST instruction to initialize destination location because FST converts
the value to the double-precision format, changing bits during move.

The patch restores removed initializations in load and store patterns.
Additionally, plain x87 FST in store peephole2 patterns is prevented by
limiting the store operand source to SSE registers.

2021-04-23  Uroš Bizjak  <ubizjak@gmail.com>

gcc/
PR target/100182
* config/i386/sync.md (FILD_ATOMIC/FIST_ATOMIC FP load peephole2):
Copy operand 3 to operand 4.  Use sse_reg_operand
as operand 3 predicate.
(FILD_ATOMIC/FIST_ATOMIC FP load peephole2 with mem blockage): Ditto.
(LDX_ATOMIC/STX_ATOMIC FP load peephole2): Ditto.
(LDX_ATOMIC/LDX_ATOMIC FP load peephole2 with mem blockage): Ditto.
(FILD_ATOMIC/FIST_ATOMIC FP store peephole2):
Copy operand 1 to operand 0.
(FILD_ATOMIC/FIST_ATOMIC FP store peephole2 with mem blockage): Ditto.
(LDX_ATOMIC/STX_ATOMIC FP store peephole2): Ditto.
(LDX_ATOMIC/LDX_ATOMIC FP store peephole2 with mem blockage): Ditto.

gcc/testsuite/

PR target/100182
* gcc.target/i386/pr100182.c: New test.
* gcc.target/i386/pr71245-1.c (dg-final): Xfail scan-assembler-not.
* gcc.target/i386/pr71245-2.c (dg-final): Ditto.

(cherry picked from commit d2324a5ab3ff097864ae6828cb1db4dd013c70d1)

tree-optimization/99954 - fix loop distribution memcpy classification

This fixes bogus classification of a copy as memcpy.  We cannot use
plain dependence analysis to decide between memcpy and memmove when
it computes no dependence.  Instead we have to try harder later which
the patch does for the gcc.dg/tree-ssa/ldist-24.c testcase by resorting
to tree-affine to compute the difference between src and dest and
compare against the copy size.

2021-04-07  Richard Biener  <rguenther@suse.de>

PR tree-optimization/99954
* tree-loop-distribution.c: Include tree-affine.h.
(generate_memcpy_builtin): Try using tree-affine to prove
non-overlap.
(loop_distribution::classify_builtin_ldst): Always classify
as PKIND_MEMMOVE.

* gcc.dg/torture/pr99954.c: New testcase.

Daily bump.

PR fortran/100154 - ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131

Add appropriate static checks for the character and status arguments to
the GNU Fortran intrinsic extensions fget[c], fput[c]. Extend variable
check to allow a function reference having a data pointer result.

gcc/fortran/ChangeLog:

PR fortran/100154
* check.c (variable_check): Allow function reference having a data
pointer result.
(arg_strlen_is_zero): New function.
(gfc_check_fgetputc_sub): Add static check of character and status
arguments.
(gfc_check_fgetput_sub): Likewise.
* intrinsic.c (add_subroutines): Fix argument name for the
character argument to intrinsic subroutines fget[c], fput[c].

gcc/testsuite/ChangeLog:

PR fortran/100154
* gfortran.dg/pr100154.f90: New test.

(cherry picked from commit d0e7833b94953ba6b4a915150666969ad9fc66af)

Daily bump.

[PATCH] Backport fix for PR target/98952

The test in the PowerPC 32-bit trampoline support is backwards.  It aborts
if the trampoline size is greater than the expected size.  It should abort
when the trampoline size is less than the expected size.  I fixed the test
so the operands are reversed.  I then folded the load immediate into the
compare instruction.

I verified this by creating a 32-bit trampoline program and manually
changing the size of the trampoline to be 48 instead of 40.  The program
aborted with the larger size.  I updated this code and ran the test again
and it passed.

I added a test case that runs on PowerPC 32-bit Linux systems and it calls
the __trampoline_setup function with a larger buffer size than the
compiler uses.  The test is not run on 64-bit systems, since the function
__trampoline_setup is not called.  I also limited the test to just Linux
systems, in case trampolines are handled differently in other systems.

libgcc/
2021-04-26  Michael Meissner  <meissner@linux.ibm.com>

PR target/98952
* config/rs6000/tramp.S (__trampoline_setup, elfv1 #ifdef): Fix
trampoline size comparison in 32-bit by reversing test and
combining load immediate with compare.  Fix backported from trunk
change on 4/23, 886b6c1e8af502b69e3f318b9830b73b88215878.
(__trampoline_setup, elfv2 #ifdef): Fix trampoline size comparison
in 32-bit by reversing test and combining load immediate with
compare.

gcc/testsuite/
2021-04-26  Michael Meissner  <meissner@linux.ibm.com>

PR target/98952
* gcc.target/powerpc/pr98952.c: New test.  Test backported from
trunk change on 4/23, 886b6c1e8af502b69e3f318b9830b73b88215878.

Daily bump.

Check for matching CONST_VECTOR encodings [PR99929]

PR99929 is one of those “how did we get away with this for so long”
bugs: the equality routines weren't checking whether two variable-length
CONST_VECTORs had the same encoding.  This meant that:

   { 1, 0, 0, 0, 0, 0, ... }

would appear to be equal to:

   { 1, 0, 1, 0, 1, 0, ... }

since both are represented using the elements { 1, 0 }.

gcc/
PR rtl-optimization/99929
* rtl.h (same_vector_encodings_p): New function.
* cse.c (exp_equiv_p): Check that CONST_VECTORs have the same encoding.
* cselib.c (rtx_equal_for_cselib_1): Likewise.
* jump.c (rtx_renumbered_equal_p): Likewise.
* lra-constraints.c (operands_match_p): Likewise.
* reload.c (operands_match_p): Likewise.
* rtl.c (rtx_equal_p_cb, rtx_equal_p): Likewise.

(cherry picked from commit a87d3f964df31d4fbceb822c6d293e85c117d992)

aarch64: Tweak post-RA handling of CONST_INT moves [PR98136]

This PR is a regression caused by r8-5967, where we replaced
a call to aarch64_internal_mov_immediate in aarch64_add_offset
with a call to aarch64_force_temporary, which in turn uses the
normal emit_move_insn{,_1} routines.

The problem is that aarch64_add_offset can be called while
outputting a thunk, where we require all instructions to be
valid without splitting. However, the move expanders were
not splitting CONST_INT moves themselves.

I think the right fix is to make the move expanders work
even in this scenario, rather than require callers to handle
it as a special case.

gcc/
PR target/98136
* config/aarch64/aarch64.md (mov<mode>): Pass multi-instruction
CONST_INTs to aarch64_expand_mov_immediate when called after RA.

gcc/testsuite/
PR target/98136
* g++.dg/pr98136.C: New test.

(cherry picked from commit 48c79f054bf435051c95ee093c45a0f8c9de5b4e)

lra: Avoid cycling on certain subreg reloads [PR96796]

This PR is about LRA cycling for a reload of the form:

----------------------------------------------------------------------------
Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
      Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287
      Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288
  103: r203:SI=r288:SI<<0x1+r196:DI#0
      REG_DEAD r196:DI
    Inserting slow/invalid mem reload before:
  316: r287:DI=[r105:DI*0x8+r140:DI]
  317: r288:SI=r287:DI#0
----------------------------------------------------------------------------

The problem is with r287.  We rightly give it a broad starting class of
POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
However, we never make forward progress towards narrowing it down to
a specific choice of class (POINTER_REGS or FP_REGS).

I think in practice we rely on two things to narrow a reload pseudo's
class down to a specific choice:

(1) a restricted class is specified when the pseudo is created

    This happens for input address reloads, where the class is taken
    from the target's chosen base register class.  It also happens
    for simple REG reloads, where the class is taken from the chosen
    alternative's constraints.

(2) uses of the reload pseudo as a direct input operand

    In this case get_reload_reg tries to reuse the existing register
    and narrow its class, instead of creating a new reload pseudo.

However, neither occurs here.  As described above, r287 rightly
starts out with a wide choice of class, ultimately derived from
ALL_REGS, so we don't get (1).  And as the comments in the PR
explain, r287 is never used as an input reload, only the subreg is,
so we don't get (2):

----------------------------------------------------------------------------
         Choosing alt 13 in insn 317:  (0) r  (1) w {*movsi_aarch64}
      Creating newreg=291, assigning class FP_REGS to r291
  317: r288:SI=r291:SI
    Inserting insn reload before:
  320: r291:SI=r287:DI#0
----------------------------------------------------------------------------

IMO, in this case we should rely on the reload of r316 to narrow
down the class of r278.  Currently we do:

----------------------------------------------------------------------------
         Choosing alt 7 in insn 316:  (0) r  (1) m {*movdi_aarch64}
      Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289
  316: r289:DI=[r105:DI*0x8+r140:DI]
    Inserting insn reload after:
  318: r287:DI=r289:DI
---------------------------------------------------

i.e. we create a new pseudo register r289 and give *that* pseudo
GENERAL_REGS instead.  This is because get_reload_reg only narrows
down the existing class for OP_IN and OP_INOUT, not OP_OUT.

But if we have a reload pseudo in a reload instruction and have chosen
a specific class for the reload pseudo, I think we should simply install
it for OP_OUT reloads too, if the class is a subset of the existing class.
We will need to pick such a register whatever happens (for r289 in the
example above).  And as explained in the PR, doing this actually avoids
an unnecessary move via the FP registers too.

This backport is less aggressive than the trunk version, in that the new
code reuses the test for a reload move from in_class_p.  We will therefore
only narrow OP_OUT classes if the instruction is a register move or memory
load that was generated by LRA itself.

gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.

gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.

vect: Avoid generating out-of-range shifts [PR98302]

In this testcase we end up with:

  unsigned long long x = ...;
  char y = (char) (x << 37);

The overwidening pattern realised that only the low 8 bits
of x << 37 are needed, but then tried to turn that into:

  unsigned long long x = ...;
  char y = (char) x << 37;

which gives an out-of-range shift.  In this case y can simply
be replaced by zero, but as the comment in the patch says,
it's kind-of awkward to do that in the middle of vectorisation.

Most of the overwidening stuff is about keeping operations
as narrow as possible, which is important for vectorisation
but could be counter-productive for scalars (especially on
RISC targets).  In contrast, optimising y to zero in the above
feels like an independent optimisation that would benefit scalar
code and that should happen before vectorisation.

gcc/
PR tree-optimization/98302
* tree-vect-patterns.c (vect_determine_precisions_from_users): Make
sure that the precision remains greater than the shift count.

gcc/testsuite/
PR tree-optimization/98302
* gcc.dg/vect/pr98302.c: New test.

(cherry picked from commit 58a12b0eadac62e691fcf7325ab2bc2c93d46b61)

expr: Fix REDUCE_BIT_FIELD for constants [PR95694, PR96151]

This is yet another PR caused by constant integer rtxes not storing
a mode.  We were calling REDUCE_BIT_FIELD on a constant integer that
didn't fit in poly_int64, and then tripped the as_a<scalar_int_mode>
assert on VOIDmode.

AFAICT REDUCE_BIT_FIELD is always passed rtxes that have TYPE_MODE
(rather than some other mode) and it just fills in the redundant
sign bits of that TYPE_MODE value.  So it should be safe to get
the mode from the type instead of the rtx.  The patch does that
and asserts that the modes agree, where information is available.

That on its own is enough to fix the bug, but we might as well
extend the folding case to all constant integers, not just those
that fit poly_int64.

gcc/
PR middle-end/95694
* expr.c (expand_expr_real_2): Get the mode from the type rather
than the rtx, and assert that it is consistent with the mode of
the rtx (where known).  Optimize all constant integers, not just
those that can be represented in poly_int64.

gcc/testsuite/
PR middle-end/95694
* gcc.dg/pr95694.c: New test.

(cherry picked from commit 760df6d296b8fc59796f42dca5eb14012fbfa28b)

Daily bump.

c++: Fix folding of non-dependent BASELINKs [PR95468]

Here, the problem ultimately seems to be that tsubst_copy_and_build,
when called with empty args as we do during non-dependent expression
folding, doesn't touch BASELINKs at all: it delegates to tsubst_copy
which then immediately exits early due to the empty args. This means
that the CAST_EXPR int(1) in the BASELINK A::condition<int(1)> never
gets folded (as part of folding of the overall CALL_EXPR), which later
causes us to crash when performing overload resolution of the rebuilt
CALL_EXPR (which is still in terms of this templated BASELINK).

This doesn't happen when condition() is a namespace-scope function
because then condition<int(1)> is represented by a TEMPLATE_ID_EXPR
rather than by a BASELINK, which does get handled directly from
tsubst_copy_and_build.

This patch fixes this issue by having tsubst_copy_and_build handle
BASELINK directly rather than delegating to tsubst_copy, so that it
processes BASELINKs even when args is empty.

gcc/cp/ChangeLog:

PR c++/95468
* pt.c (tsubst_copy_and_build) <case BASELINK>: New case, copied
over from tsubst_copy.

gcc/testsuite/ChangeLog:

PR c++/95468
* g++.dg/template/non-dependent15.C: New test.

(cherry picked from commit 5bd7afb71fca3a5a6e9f8586d86903bae1849193)

c++: cxx_eval_vec_init after zero-initialization [PR96282]

In the first testcase below, expand_aggr_init_1 sets up t's default
constructor such that the ctor first zero-initializes the entire base b,
followed by calling b's default constructor, the latter of which just
default-initializes the array member b::m via a VEC_INIT_EXPR.

So upon constexpr evaluation of this latter VEC_INIT_EXPR, ctx->ctor is
nonempty due to the prior zero-initialization, and we proceed in
cxx_eval_vec_init to append new constructor_elts to the end of ctx->ctor
without first checking if a matching constructor_elt already exists.
This leads to ctx->ctor having two matching constructor_elts for each
index.

This patch fixes this issue by truncating a zero-initialized array
CONSTRUCTOR in cxx_eval_vec_init_1 before we begin appending array
elements to it. We propagate its zeroed out state during evaluation by
clearing CONSTRUCTOR_NO_CLEARING on each new appended aggregate element.

gcc/cp/ChangeLog:

PR c++/96282
* constexpr.c (cxx_eval_vec_init_1): Truncate ctx->ctor and
then clear CONSTRUCTOR_NO_CLEARING on each appended element
initializer if we're initializing a previously zero-initialized
array object.

gcc/testsuite/ChangeLog:

PR c++/96282
* g++.dg/cpp0x/constexpr-array26.C: New test.
* g++.dg/cpp0x/constexpr-array27.C: New test.

Co-authored-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit d21252de6c81ed236d8981d47b9a57dc4f1c6d57)

libstdc++: Disable test for non-gthreads targets [PR 100180]

The Networking TS code still requires std::mutex on this branch, so the
tests shouldn't run on targets without gthreads.

PR libstdc++/100180
* testsuite/experimental/net/internet/address/v6/members.cc:
Require gthreads.

(cherry picked from commit f0d22d31ceb1373f17045f2527ef2f2251d93be8)

c++: overload sets and placeholder return type [PR64194]

In the testcase below, template argument deduction for the call
g(id<int>) goes wrong because the functions in the overload set id<int>
each have a yet-undeduced auto return type, and this undeduced return
type makes try_one_overload fail to match up any of the overloads with
g's parameter type, leading to g's template argument going undeduced and
to the overload set going unresolved.

This patch fixes this issue by performing return type deduction via
instantiation before doing try_one_overload, in a manner similar to what
resolve_address_of_overloaded_function does.

gcc/cp/ChangeLog:

PR c++/64194
* pt.c (resolve_overloaded_unification): If the function
template specialization has a placeholder return type,
then instantiate it before attempting unification.

gcc/testsuite/ChangeLog:

PR c++/64194
* g++.dg/cpp1y/auto-fn60.C: New test.

(cherry picked from commit 2c58f5cadfac338a67723fd6e41c9097760c4a33)

testsuite/100176 - fix struct-layout-1_generate.c compile

With -Werror=return-type we run into compile fails complaining about
missing return stmts.

2021-04-21 Richard Biener <rguenther@suse.de>

PR testsuite/100176
* g++.dg/compat/struct-layout-1_generate.c: Add missing return.
* gcc.dg/compat/struct-layout-1_generate.c: Likewise.

(cherry picked from commit d8f953819e8f72e646f22c7803d79bd2b56d1e30)

Daily bump.

sanitizer: Fix asan against glibc 2.34 [PR100114]

As mentioned in the PR, SIGSTKSZ is no longer a compile time constant in
glibc 2.34 and later, so
static const uptr kAltStackSize = SIGSTKSZ * 4;
needs dynamic initialization, but is used by a function called indirectly
from .preinit_array and therefore before the variable is constructed.
This results in using 0 size instead and all asan instrumented programs
die with:
==91==ERROR: AddressSanitizer failed to allocate 0x0 (0) bytes of SetAlternateSignalStack (error code: 22)

Here is a cherry-pick from upstream to fix this.

2021-04-17 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/100114
* sanitizer_common/sanitizer_posix_libcdep.cc: Cherry-pick
llvm-project revisions 82150606fb11d28813ae6da1101f5bda638165fe
and b93629dd335ffee2fc4b9b619bf86c3f9e6b0023.

(cherry picked from commit 950bac27d63c1c2ac3a6ed867692d6a13f21feb3)

intl: Add --enable-host-shared support [PR100096]

As mentioned in the PR, building gcc with jit enabled and
--enable-host-shared doesn't work on NetBSD/i?86, as libgccjit.so.0
has text relocations.
The r0-125846-g459260ecf8b420b029601a664cdb21c185268ecb changes
added --enable-host-shared support to various libraries, but didn't
add it to intl/ subdirectory; on Linux it isn't really needed, because
all: all-no
all-no: #nothing
but on other OSes intl/libintl.a is built.

The following patch makes sure it is built with -fPIC when
--enable-host-shared is used.

2021-04-16 Jakub Jelinek <jakub@redhat.com>

PR jit/100096
* configure.ac: Add --enable-host-shared support.
* Makefile.in: Update copyright. Add @PICFLAG@ to CFLAGS.
* configure: Regenerated.

(cherry picked from commit a11f31102706e33f66b60367d6863613ab3bd051)

vectorizer: Remove dead scalar .COND_* calls from vectorized loops [PR99767]

The following testcase ICEs because disabling of DCE means there are dead
stmts in the loop (though, in theory they could become dead only shortly
before if-conv through some optimization), ifcvt which goes through all
stmts in the loop if-converts them into .COND_DIV etc. internal fn calls
in the copy of the loop meant for vectorization only, the loop is
successfully vectorized but the particular .COND_* call is not because
it isn't a live statement and the scalar .COND_* remains in the IL until
expansion where it ICEs because these ifns only support vectors and not
scalars.

These ifns are similar to .MASK_{LOAD,STORE} in this behavior.

One possible fix could be to expand scalar versions of them during
expansion, basically undoing what if-conv did to create them, i.e.
expand them as the lhs = else; if (mask) { lhs = statement; } or so.

For .MASK_LOAD we have code to replace them in vect_transform_loop already
though (not needed for .MASK_STORE, as stores should be always live
and thus always vectorized), so this patch instead replaces .COND_*
similarly to .MASK_LOAD in that loop, with the small difference
that lhs = .MASK_LOAD (...); is replaced by lhs = 0; while
lhs = .COND_* (..., else_arg); is replaced by lhs = else_arg.
The statement must be dead, otherwise it would be vectorized, so I think
it is not a big deal we don't turn it back into multiple basic blocks etc.
(and it might be not possible to do that at that point).

2021-04-16 Jakub Jelinek <jakub@redhat.com>

PR target/99767
* tree-vect-loop.c (vect_transform_loop): Don't remove just
dead scalar .MASK_LOAD calls, but also dead .COND_* calls - replace
them by their last argument.

* gcc.target/aarch64/pr99767.c: New test.

(cherry picked from commit 1730b5d6793127b1a47970f44d60da8082bab514)

c++: Fix up handling of structured bindings in extract_locals_r [PR99833]

The following testcase ICEs in tsubst_decomp_names because the assumptions
that the structured binding artificial var is followed in DECL_CHAIN by
the corresponding structured binding vars is violated.
I've tracked it to extract_locals* which is done for the constexpr
IF_STMT. extract_locals_r when it sees a DECL_EXPR adds that decl
into a hash set so that such decls aren't returned from extract_locals*,
but in the case of a structured binding that just means the artificial var
and not the vars corresponding to structured binding identifiers.
The following patch fixes it by pushing not just the artificial var
for structured bindings but also the other vars.

2021-04-16 Jakub Jelinek <jakub@redhat.com>

PR c++/99833
* pt.c (extract_locals_r): When handling DECL_EXPR of a structured
binding, add to data.internal also all corresponding structured
binding decls.

* g++.dg/cpp1z/pr99833.C: New test.

(cherry picked from commit 06d50ebc9fb2761ed2bdda5e76adb4d47a8ca983)

combine: Fix up expand_compound_operation [PR99905]

The following testcase is miscompiled on x86_64-linux.
expand_compound_operation is called on
(zero_extract:DI (mem/c:TI (reg/f:DI 16 argp) [3 i+0 S16 A128])
    (const_int 16 [0x10])
    (const_int 63 [0x3f]))
so mode is DImode, inner_mode is TImode, pos 63, len 16 and modewidth 64.

A couple of lines above the problematic spot we have:
  if (modewidth >= pos + len)
    {
      tem = gen_lowpart (mode, XEXP (x, 0));
where the code uses gen_lowpart and then shift left/right to extract it
in mode.  But the guarding condition is false - 64 >= 63 + 16
and so we enter the next condition, where the code shifts XEXP (x, 0)
right by pos and then adds AND.  It does so incorrectly though.
Given the modewidth < pos + len, inner_mode must be necessarily larger
than mode and XEXP (x, 0) has the innermode, but it was calling
simplify_shift_const with mode rather than inner_mode, which meant
inconsistent arguments to simplify_shift_const and in this case made
a DImode MEM shift out of it.

The following patch fixes it, by doing the shift in inner_mode properly
and then after the shift doing the lowpart subreg and masking already
in mode.

2021-04-13  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/99905
* combine.c (expand_compound_operation): If pos + len > modewidth,
perform the right shift by pos in inner_mode and then convert to mode,
instead of trying to simplify a shift of rtx with inner_mode by pos
as if it was a shift in mode.

* gcc.target/i386/pr99905.c: New test.

(cherry picked from commit c965254e5af9dc68444e0289250c393ae0cd6131)

combine: Don't fold away side-effects in simplify_and_const_int_1 [PR99830]

Here is an alternate patch for the PR99830 bug.
As discussed on IRC and in the PR, the reason why a (clobber:TI (const_int 0))
has been propagated into the debug insns is that it got optimized away
during simplification from the i3 instruction pattern.

And that happened because
simplify_and_const_int_1 (SImode, varop, 255)
with varop of
(ashift:SI (subreg:SI (and:TI (clobber:TI (const_int 0 [0]))
                              (const_int 255 [0xff])) 0)
           (const_int 16 [0x10]))
was called and through nonzero_bits determined that (whatever << 16) & 255
is const0_rtx.
It is, but if there are side-effects in varop and such clobbers are
considered as such, we shouldn't optimize those away.

2021-04-13  Jakub Jelinek  <jakub@redhat.com>

PR debug/99830
* combine.c (simplify_and_const_int_1): Don't optimize varop
away if it has side-effects.

* gcc.dg/pr99830.c: New test.

(cherry picked from commit 4ac7483ede91fef7cfd548ff6e30e46eeb9d95ae)

c: Avoid clobbering TREE_TYPE (error_mark_node) [PR99990]

The following testcase ICEs during error recovery, because finish_decl
overwrites TREE_TYPE (error_mark_node), which better should stay always
to be error_mark_node.

2021-04-10 Jakub Jelinek <jakub@redhat.com>

PR c/99990
* c-decl.c (finish_decl): Don't overwrite TREE_TYPE of
error_mark_node.

* gcc.dg/pr99990.c: New test.

(cherry picked from commit 91e076f3a66c1c9f6aa51e9d53d07803606e3bf1)

expand: Fix up LTO ICE with COMPOUND_LITERAL_EXPR [PR99849]

The gimplifier optimizes away COMPOUND_LITERAL_EXPRs, but they can remain
in the form of ADDR_EXPR of COMPOUND_LITERAL_EXPRs in static initializers.
By the TREE_STATIC check I meant to check that the underlying decl of
the compound literal is a global rather than automatic variable which
obviously can't be referenced in static initializers, but unfortunately
with LTO it might end up in another partition and thus be DECL_EXTERNAL
instead.

2021-04-10 Jakub Jelinek <jakub@redhat.com>

PR lto/99849
* expr.c (expand_expr_addr_expr_1): Test is_global_var rather than
just TREE_STATIC on COMPOUND_LITERAL_EXPR_DECLs.

* gcc.dg/lto/pr99849_0.c: New test.

(cherry picked from commit 2e57bc7eedb084869d17fe07b538d907b8fee819)

rtlanal: Another fix for VOIDmode MEMs [PR98601]

This is a sequel to the PR85022 changes, inline-asm can (unfortunately)
introduce VOIDmode MEMs and in PR85022 they have been changed so that
we don't pretend we know their size (as opposed to assuming they have
zero size).

This time we ICE in rtx_addr_can_trap_p_1 because it assumes that
all memory but BLKmode has known size.  The patch just treats VOIDmode
MEMs like BLKmode in that regard.  And, the STRICT_ALIGNMENT change
is needed because VOIDmode has GET_MODE_SIZE of 0 and we don't want to
check if something is a multiple of 0.

2021-04-10  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/98601
* rtlanal.c (rtx_addr_can_trap_p_1): Allow in assert unknown size
not just for BLKmode, but also for VOIDmode.  For STRICT_ALIGNMENT
unaligned_mems handle VOIDmode like BLKmode.

* gcc.dg/torture/pr98601.c: New test.

(cherry picked from commit e68ac8c2b46997af1464f2549ac520a192c928b1)

dse: Fix up hard reg conflict checking in replace_read [PR99863]

Since PR37922 fix RTL DSE has hard register conflict checking
in replace_read, so that if the replacement sequence sets (or typically just
clobbers) some hard register (usually condition codes) we verify that
hard register is not live.
Unfortunately, it compares the hard reg set clobbered/set by the sequence
(regs_set) against the currently live hard register set, but it then
emits the insn sequence not at the current insn position, but before
store_insn->insn.
So, we should not compare against the current live hard register set,
but against the hard register live set at the point of the store insn.
Fortunately, we already have that remembered in store_insn->fixed_regs_live.

In addition to bootstrapping/regtesting this patch on x86_64-linux and
i686-linux, I've also added statistics gathering and it seems the only
place where we end up rejecting the replace_read is the newly added
testcase (the PR37922 is no longer effective at that) and fixed_regs_live
has been always non-NULL at the if (store_insn->fixed_regs_live) spot.
Rather than having there an assert, I chose to just keep regs_set
as is, which means in that hypothetical case where fixed_regs_live wouldn't
be computed for some store we'd still accept sequences that don't
clobber/set any hard registers and just punt on those that clobber/set
those.

2021-04-03 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/99863
* dse.c (replace_read): Drop regs_live argument. Instead of
regs_live, use store_insn->fixed_regs_live if non-NULL,
otherwise punt if insns sequence clobbers or sets any hard
registers.

* gcc.target/i386/pr99863.c: New test.

(cherry picked from commit 7a2f91d413eb7a3eb0ba52c7ac9618a35addd12a)

c++: Fix ICE on PTRMEM_CST in lambda in inline var initializer [PR99790]

The following testcase ICEs (since the addition of inline var support),
because the lambda contains PTRMEM_CST but finish_function is called for the
lambda quite early during parsing it (from finish_lambda_function) when
the containing class is still incomplete. That means that during
genericization cplus_expand_constant keeps the PTRMEM_CST unmodified, but
later nothing lowers it when the class is finalized.
Using sizeof etc. on the class in such contexts is rejected by both g++ and
clang++, and when the PTRMEM_CST appears e.g. in static var initializers
rather than in functions, we handle it correctly because c_parse_final_cleanups
-> lower_var_init will handle those cplus_expand_constant when all classes
are already finalized.

The following patch fixes it by calling cplus_expand_constant again during
gimplification, as we are now unconditionally unit at a time, I'd think
everything that could be completed will be before we start gimplification.

2021-03-30 Jakub Jelinek <jakub@redhat.com>

PR c++/99790
* cp-gimplify.c (cp_gimplify_expr): Handle PTRMEM_CST.

* g++.dg/cpp1z/pr99790.C: New test.

(cherry picked from commit 7cdd30b43a63832d6f908b2dd64bd19a0817cd7b)

fold-const: Fix ICE in extract_muldiv_1 [PR99777]

extract_muldiv{,_1} is apparently only prepared to handle scalar integer
operations, the callers ensure it by only calling it if the divisor or
one of the multiplicands is INTEGER_CST and because neither multiplication
nor division nor modulo are really supported e.g. for pointer types, nullptr
type etc.  But the CASE_CONVERT handling doesn't really check if it isn't
a cast from some other type kind, so on the testcase we end up trying to
build MULT_EXPR in POINTER_TYPE which ICEs.  A few years ago Marek has
added ANY_INTEGRAL_TYPE_P checks to two spots, but the code uses
TYPE_PRECISION which means something completely different for vector types,
etc.
So IMNSHO we should just punt on conversions from non-integrals or
non-scalar integrals.

2021-03-29  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/99777
* fold-const.c (extract_muldiv_1): For conversions, punt on casts from
types other than scalar integral types.

* g++.dg/torture/pr99777.C: New test.

(cherry picked from commit afe9a630eae114665e77402ea083201c9d406e99)

dwarf2cfi: Defer queued register saves some more [PR99334]

On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
   0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
   5:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   9:   41 ff 72 f8             pushq  -0x8(%r10)
   d:   55                      push   %rbp
   e:   48 89 e5                mov    %rsp,%rbp
  11:   41 57                   push   %r15
  13:   41 56                   push   %r14
  15:   41 55                   push   %r13
  17:   41 54                   push   %r12
  19:   41 52                   push   %r10
  1b:   53                      push   %rbx
  1c:   48 83 ec 20             sub    $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address column: 16
  Augmentation data:     1b
  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_offset: r16 (rip) at cfa-8
  DW_CFA_nop
  DW_CFA_nop

00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
  DW_CFA_advance_loc: 5 to 0000000000000005
  DW_CFA_def_cfa: r10 (r10) ofs 0
  DW_CFA_advance_loc: 9 to 000000000000000e
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 13 to 000000000000001b
  DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
  DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
  DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
  DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
  DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that.  The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct.  So, either we'd need to temporarily
say:
  DW_CFA_advance_loc: 9 to 000000000000000e
  DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
  DW_CFA_advance_loc: 3 to 0000000000000011
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
  DW_CFA_advance_loc: 12 to 0000000000000011
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 10 to 000000000000001b

I've tried instead to deal with it through REG_FRAME_RELATED_EXPR
from the backend, but that failed miserably as explained in the PR,
dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to the
dynamic stack realignment using drap register that only the i386 backend
does right now, and by using REG_FRAME_RELATED_EXPR or REG_CFA* notes we
can't emulate those rules.  The following patch instead does the deferring
of the hard frame pointer save rule in dwarf2cfi.c Rule 18 handling and
emits it on the (set hfp sp) assignment that must appear shortly after it
and adds assertion that it is the case.

The difference before/after the patch on the assembly is:
--- pr99334.s~  2021-03-26 15:42:40.881749380 +0100
+++ pr99334.s   2021-03-26 17:38:05.729161910 +0100
@@ -11,8 +11,8 @@ _func_with_dwarf_issue_:
        andq    $-16, %rsp
        pushq   -8(%r10)
        pushq   %rbp
-       .cfi_escape 0x10,0x6,0x2,0x76,0
        movq    %rsp, %rbp
+       .cfi_escape 0x10,0x6,0x2,0x76,0
        pushq   %r15
        pushq   %r14
        pushq   %r13
i.e. does just what we IMHO need, after pushq %rbp %rbp
still contains parent's frame value and so the save rule doesn't
need to be overridden there, ditto at the start of the next insn
before the side-effect took effect, and we override it only after
it when %rbp already has the right value.

If some other target adds dynamic stack realignment in the future and
the offset 0 case wouldn't be true there, the code can be adjusted so that
it works on all the drap architectures, I'm pretty sure the code would
need other adjustments too.

For the rule 18 and for the (set hfp sp) after it we already have asserts
for the drap cases that check whether the code looks the way i?86/x86_64
emit it currently.

2021-03-26  Jakub Jelinek  <jakub@redhat.com>

PR debug/99334
* dwarf2out.h (struct dw_fde_node): Add rule18 member.
* dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling (set hfp sp)
assignment with drap_reg active, queue reg save for hfp with offset 0
and flush queued reg saves.  When handling a push with rule18,
defer queueing reg save for hfp and just assert the offset is 0.
(scan_trace): Assert that fde->rule18 is false.

(cherry picked from commit f5df18504c1790413f293bfb50d40faa7f1ea860)

c++: Diagnose bare parameter packs in bitfield widths [PR99745]

The following invalid tests ICE because we don't diagnose (and drop) bare
parameter packs in bitfield widths.

2021-03-25 Jakub Jelinek <jakub@redhat.com>

PR c++/99745
* decl2.c (grokbitfield): Diagnose bitfields containing bare parameter
packs and don't set DECL_BIT_FIELD_REPRESENTATIVE in that case.

* g++.dg/cpp0x/variadic181.C: New test.

(cherry picked from commit f8780caf07340f5d5e55cf5fb1b2be07cabab1ea)

c++: Diagnose references to void in structured bindings [PR99650]

We ICE on the following testcase, because std::tuple_element<...,...>::type
is void and for structured bindings we therefore need to create
void & or void && which is invalid. We created such REFERENCE_TYPE and
later ICEd in the middle-end.
The following patch fixes it by diagnosing that.

2021-03-23 Jakub Jelinek <jakub@redhat.com>

PR c++/99650
* decl.c (cp_finish_decomp): Diagnose void initializers when
using tuple_element and get.

* g++.dg/cpp1z/decomp55.C: New test.

(cherry picked from commit d5e379e3fe19362442b5d0ac608fb8ddf67fecd3)

dwarf2out: Fix debug info for 2 byte floats [PR99388]

Aarch64, ARM and a couple of other architectures have 16-bit floats, HFmode.
As can be seen e.g. on
void
foo (void)
{
  __fp16 a = 1.0;
  asm ("nop");
  a = 2.0;
  asm ("nop");
  a = 3.0;
  asm ("nop");
}
testcase, GCC mishandles this on the dwarf2out.c side by assuming all
floating point types have sizes in multiples of 4 bytes, so what GCC emits
is it says that e.g. the DW_OP_implicit_value will be 2 bytes but then
doesn't emit anything and so anything emitted after it is treated by
consumers as the value and then they get out of sync.
real_to_target which insert_float uses indeed fills it that way, but putting
into an array of long 32 bits each time, but for the half floats it puts
everything into the least significant 16 bits of the first long no matter
what endianity host or target has.

The following patch fixes it.  With the patch the -g -O2 -dA output changes
(in a cross without .uleb128 support):
        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x3c00  // fp or vector constant word 0
        .byte   0x7     // DW_LLE_start_end (*.LLST0)
        .8byte  .LVL1   // Location list begin address (*.LLST0)
        .8byte  .LVL2   // Location list end address (*.LLST0)
        .byte   0x4     // uleb128 0x4; Location expression size
        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x4000  // fp or vector constant word 0
        .byte   0x7     // DW_LLE_start_end (*.LLST0)
        .8byte  .LVL2   // Location list begin address (*.LLST0)
        .8byte  .LFE0   // Location list end address (*.LLST0)
        .byte   0x4     // uleb128 0x4; Location expression size
        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x4200  // fp or vector constant word 0
        .byte   0       // DW_LLE_end_of_list (*.LLST0)

Bootstrapped/regtested on x86_64-linux, aarch64-linux and
armv7hl-linux-gnueabi, ok for trunk?

I fear the CONST_VECTOR case is still broken, while HFmode elements of vectors
should be fine (it uses eltsize of the element sizes) and likewise SFmode could
be fine, DFmode vectors are emitted as two 32-bit ints regardless of endianity
and I'm afraid it can't be right on big-endian.  But I haven't been able to
create a testcase that emits a CONST_VECTOR, for e.g. unused vector vars
with constant operands we emit CONCATN during expansion and thus ...
DW_OP_*piece for each element of the vector and for
DW_TAG_call_site_parameter we give up (because we handle CONST_VECTOR only
in loc_descriptor, not mem_loc_descriptor).

2021-03-21  Jakub Jelinek  <jakub@redhat.com>

PR debug/99388
* dwarf2out.c (insert_float): Change return type from void to
unsigned, handle GET_MODE_SIZE (mode) == 2 and return element size.
(mem_loc_descriptor, loc_descriptor, add_const_value_attribute):
Adjust callers.

(cherry picked from commit d3dd3703f1d42b14c88b91e51a2a775fe00a2974)

c: Fix up -Wunused-but-set-* warnings for _Atomics [PR99588]

As the following testcases show, compared to -D_Atomic= case we have many
-Wunused-but-set-* warning false positives.
When an _Atomic variable/parameter is read, we call mark_exp_read on it in
convert_lvalue_to_rvalue, but build_atomic_assign does not.
For consistency with the non-_Atomic case where we mark_exp_read the lhs
for lhs op= ... but not for lhs = ..., this patch does that too.
But furthermore we need to pattern match the trees emitted by _Atomic store,
so that _Atomic store itself is not marked as being a variable read, but
when the result of the store is used, we mark it.

2021-03-19 Jakub Jelinek <jakub@redhat.com>

PR c/99588
* c-typeck.c (mark_exp_read): Recognize what build_atomic_assign
with modifycode NOP_EXPR produces and mark the _Atomic var as read
if found.
(build_atomic_assign): For modifycode of NOP_EXPR, use COMPOUND_EXPRs
rather than STATEMENT_LIST. Otherwise call mark_exp_read on lhs.
Set TREE_SIDE_EFFECTS on the TARGET_EXPR.

* gcc.dg/Wunused-var-5.c: New test.
* gcc.dg/Wunused-var-6.c: New test.

(cherry picked from commit b1fc1f1c4b2e9005c40ed476b067577da2d2ce84)

aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

The gcc.dg/declare-simd.c test does not emit a warning with
-mabi=ilp32.

2021-03-16 Christophe Lyon <christophe.lyon@linaro.org>

PR target/99542
gcc/testsuite/
* gcc.dg/declare-simd.c (fn2): Expect a warning only under lp64.

(cherry picked from commit d6300df5f2b9fafa07be4f974fef1ed810d0e7fd)

c++: Ensure correct destruction order of local statics [PR99613]

As mentioned in the PR, if end of two constructions of local statics
is strongly ordered, their destructors should be run in the reverse order.
As we run __cxa_guard_release before calling __cxa_atexit, it is possible
that we have two threads that access two local statics in the same order
for the first time, one thread wins the __cxa_guard_acquire on the first
one but is rescheduled in between the __cxa_guard_release and __cxa_atexit
calls, then the other thread is scheduled and wins __cxa_guard_acquire
on the second one and calls __cxa_quard_release and __cxa_atexit and only
afterwards the first thread calls its __cxa_atexit.  This means a variable
whose completion of the constructor strongly happened after the completion
of the other one will be destructed after the other variable is destructed.

The following patch fixes that by swapping the __cxa_guard_release and
__cxa_atexit calls.

2021-03-16  Jakub Jelinek  <jakub@redhat.com>

PR c++/99613
* decl.c (expand_static_init): For thread guards, call __cxa_atexit
before calling __cxa_guard_release rather than after it.  Formatting
fixes.

(cherry picked from commit 1703937a05b8b95bc29d2de292387dfd9eb7c9a3)

aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]

As the patch shows, there are several bugs in
aarch64_simd_clone_compute_vecsize_and_simdlen.
One is that unlike for function declarations that aren't definitions
it completely ignores argument types.  Such decls don't have DECL_ARGUMENTS,
but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like
the simd cloning code in the middle end does too.

Another problem is that it checks types of uniform arguments.  That is
unnecessary, uniform arguments are passed the way it normally is, it is
a scalar argument rather than vector, so there is no reason not to support
uniform argument of different size, or long double, structure etc.

2021-03-16  Jakub Jelinek  <jakub@redhat.com>

PR target/99542
* config/aarch64/aarch64.c
(aarch64_simd_clone_compute_vecsize_and_simdlen): If not a function
definition, walk TYPE_ARG_TYPES list if non-NULL for argument types
instead of DECL_ARGUMENTS.  Ignore types for uniform arguments.

* gcc.dg/gomp/pr99542.c: New test.
* gcc.dg/gomp/pr59669-2.c (bar): Don't expect a warning on aarch64.
* gcc.dg/gomp/simd-clones-2.c (setArray): Likewise.
* g++.dg/vect/simd-clone-7.cc (bar): Likewise.
* g++.dg/gomp/declare-simd-1.C (f37): Expect a different warning
on aarch64.
* gcc.dg/declare-simd.c (fn2): Expect a new warning on aarch64.

(cherry picked from commit 06589d2232abc92ac9bcb43e4a4ec64ead627752)

expand: Fix ICE in store_bit_field_using_insv [PR93235]

The following testcase ICEs on aarch64. The problem is that
op0 is (subreg:HI (reg:HF ...) 0) and because we can't create a SUBREG of a
SUBREG and aarch64 doesn't have HImode insv, only SImode insv,
store_bit_field_using_insv tries to create (subreg:SI (reg:HF ...) 0)
which is not valid for the target and so gen_rtx_SUBREG ICEs.

The following patch fixes it by punting if the to be created SUBREG
doesn't validate, callers of store_bit_field_using_insv can handle
the fallback.

2021-03-04 Jakub Jelinek <jakub@redhat.com>

PR middle-end/93235
* expmed.c (store_bit_field_using_insv): Return false of xop0 is a
SUBREG and a SUBREG to op_mode can't be created.

* gcc.target/aarch64/pr93235.c: New test.

(cherry picked from commit 510ff5def87c70836fdbf832228661ae28e524b6)

c++: Fix -fstrong-eval-order for operator &&, || and , [PR82959]

P0145R3 added
"However, the operands are sequenced in the order prescribed for the built-in
operator" rule for overloaded operator calls when using the operator syntax.
op_is_ordered follows that, but added just the overloaded operators
added in that paper. &&, || and comma operators had rules that
lhs is sequenced before rhs already in C++98.
The following patch adds those cases to op_is_ordered.

2021-03-03 Jakub Jelinek <jakub@redhat.com>

PR c++/82959
* call.c (op_is_ordered): Handle TRUTH_ANDIF_EXPR, TRUTH_ORIF_EXPR
and COMPOUND_EXPR.

* g++.dg/cpp1z/eval-order10.C: New test.

(cherry picked from commit 529e3b3402bd2a97b02318bd834df72815be5f0f)

c-family: Avoid ICE on va_arg [PR99324]

build_va_arg calls the middle-end mark_addressable, which e.g. requires that
cfun is non-NULL.  The following patch calls instead c_common_mark_addressable_vec
which is the c-family variant similarly to the FE c_mark_addressable and
cxx_mark_addressable, except that it doesn't error on addresses of register
variables.  As the taking of the address is artificial for the .VA_ARG
ifn and when that is lowered goes away, it is similar case to the vector
subscripting for which c_common_mark_addressable_vec has been added.

2021-03-03  Jakub Jelinek  <jakub@redhat.com>

PR c/99324
* c-common.c (build_va_arg): Call c_common_mark_addressable_vec
instead of mark_addressable.  Fix a comment typo -
neutrallly -> neutrally.

* gcc.c-torture/compile/pr99324.c: New test.

(cherry picked from commit 0e87dc86eb56f732a41af2590f0b807031003fbe)

c++: Fix operator() lookup in lambdas [PR95451]

During name lookup, name-lookup.c uses:
            if (!(!iter->type && HIDDEN_TYPE_BINDING_P (iter))
                && (bool (want & LOOK_want::HIDDEN_LAMBDA)
                    || !is_lambda_ignored_entity (iter->value))
                && qualify_lookup (iter->value, want))
              binding = iter->value;
Unfortunately as the following testcase shows, this doesn't work in
generic lambdas, where we on the auto b = ... lambda ICE and on the
auto d = lambda reject it even when it should be valid.  The problem
is that the binding doesn't have a FUNCTION_DECL with
LAMBDA_FUNCTION_P for the operator(), but an OVERLOAD with
TEMPLATE_DECL for such FUNCTION_DECL.

The following patch fixes that in is_lambda_ignored_entity, other
possibility would be to do that before calling is_lambda_ignored_entity
in name-lookup.c.

2021-02-26  Jakub Jelinek  <jakub@redhat.com>

PR c++/95451
* lambda.c (is_lambda_ignored_entity): Before checking for
LAMBDA_FUNCTION_P, use OVL_FIRST.  Drop FUNCTION_DECL check.

* g++.dg/cpp1y/lambda-generic-95451.C: New test.

(cherry picked from commit 8f9308936cf1df134d5aac1f890eb67266530ab5)

fold-const: Fix up ((1 << x) & y) != 0 folding for vectors [PR99225]

This optimization was written purely with scalar integers in mind,
can work fine even with vectors, but we can't use build_int_cst but
need to use build_one_cst instead.

2021-02-24 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/99225
* fold-const.c (fold_binary_loc) <case NE_EXPR>: In (x & (1 << y)) != 0
to ((x >> y) & 1) != 0 simplifications use build_one_cst instead of
build_int_cst (..., 1). Formatting fixes.

* gcc.c-torture/compile/pr99225.c: New test.

(cherry picked from commit 4de402ab60c54fff48cb7371644b024d10d7e5bb)

fold-const: Fix ICE in fold_read_from_constant_string on invalid code [PR99204]

fold_read_from_constant_string and expand_expr_real_1 have code to optimize
constant reads from string (tree vs. rtl).
If the STRING_CST array type has zero low bound, index is fold converted to
sizetype and so the compare_tree_int works fine, but if it has some other
low bound, it calls size_diffop_loc and that function from 2 sizetype
operands creates a ssizetype difference. expand_expr_real_1 then uses
tree_fits_uhwi_p + compare_tree_int and so works fine, but fold-const.c
only checked if index is INTEGER_CST and calls compare_tree_int, which means
for negative index it will succeed and result in UB in the compiler.

2021-02-23 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/99204
* fold-const.c (fold_read_from_constant_string): Check that
tree_fits_uhwi_p (index) rather than just that index is INTEGER_CST.

* gfortran.dg/pr99204.f90: New test.

(cherry picked from commit f53a9b563b5017af179f1fd900189c0ba83aa2ec)

libstdc++: Fix up constexpr std::char_traits<char>::compare [PR99181]

Because of LWG 467, std::char_traits<char>::lt compares the values
cast to unsigned char rather than char, so even when char is signed
we get unsigned comparision.  std::char_traits<char>::compare uses
__builtin_memcmp and that works the same, but during constexpr evaluation
we were calling __gnu_cxx::char_traits<char_type>::compare.  As
char_traits::lt is not virtual, __gnu_cxx::char_traits<char_type>::compare
used __gnu_cxx::char_traits<char_type>::lt rather than
std::char_traits<char>::lt and thus compared chars as signed if char is
signed.
This change fixes it by inlining __gnu_cxx::char_traits<char_type>::compare
into std::char_traits<char>::compare by hand, so that it calls the right
lt method.

2021-02-23  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/99181
* include/bits/char_traits.h (char_traits<char>::compare): For
constexpr evaluation don't call
__gnu_cxx::char_traits<char_type>::compare but do the comparison loop
directly.

* testsuite/21_strings/char_traits/requirements/char/99181.cc: New
test.

(cherry picked from commit 311c57f6d8f285d69e44bf94152c753900cb1a0a)

tree-cfg: Fix up gimple_merge_blocks FORCED_LABEL handling [PR99034]

The verifiers require that DECL_NONLOCAL or EH_LANDING_PAD_NR
labels are always the first label if there is more than one label.

When merging blocks, we don't honor that though.
On the following testcase, we try to merge blocks:
<bb 13> [count: 0]:
<L2>:
S::~S (&s);

and
<bb 15> [count: 0]:
<L0>:
resx 1

where <L2> is landing pad and <L0> is FORCED_LABEL. And the code puts
the FORCED_LABEL before the landing pad label, violating the verification
requirements.

The following patch fixes it by moving the FORCED_LABEL after the
DECL_NONLOCAL or EH_LANDING_PAD_NR label if it is the first label.

2021-02-19 Jakub Jelinek <jakub@redhat.com>

PR ipa/99034
* tree-cfg.c (gimple_merge_blocks): If bb a starts with eh landing
pad or non-local label, put FORCED_LABELs from bb b after that label
rather than before it.

* g++.dg/opt/pr99034.C: New test.

(cherry picked from commit 33be24d77d3d8f0c992eb344ce63f78e14cf753d)

c: Fix ICE with -fexcess-precision=standard [PR99136]

The following testcase ICEs on i686-linux, because c_finish_return wraps
c_fully_folded retval back into EXCESS_PRECISION_EXPR, but when the function
return type is void, we don't call convert_for_assignment on it that would
then be fully folded again, but just put the retval into RETURN_EXPR's
operand, so nothing removes it anymore and during gimplification we
ICE as EXCESS_PRECISION_EXPR is not handled.

This patch fixes it by not adding that EXCESS_PRECISION_EXPR in functions
returning void, the return value is ignored and all we need is evaluate any
side-effects of the expression.

2021-02-18 Jakub Jelinek <jakub@redhat.com>

PR c/99136
* c-typeck.c (c_finish_return): Don't wrap retval into
EXCESS_PRECISION_EXPR in functions that return void.

* gcc.dg/pr99136.c: New test.

(cherry picked from commit 3d7ce7ce6c03165ca1041b38e02428c925254968)

c++: Fix up build_zero_init_1 once more [PR99106]

My earlier build_zero_init_1 patch for flexible array members created
an empty CONSTRUCTOR. As the following testcase shows, that doesn't work
very well because the middle-end doesn't expect CONSTRUCTOR elements with
incomplete type (that the empty CONSTRUCTOR at the end of outer CONSTRUCTOR
had).

The following patch just doesn't add any CONSTRUCTOR for the flexible array
members, it doesn't seem to be needed.

2021-02-17 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/99106
* init.c (build_zero_init_1): For flexible array members just return
NULL_TREE instead of returning empty CONSTRUCTOR with non-complete
ARRAY_TYPE.

* g++.dg/ubsan/pr99106.C: New test.

(cherry picked from commit af868e89ec21340d1cafd26eaed356ce4b0104c3)

match.pd: Fix up A % (cast) (pow2cst << B) simplification [PR99079]

The (mod @0 (convert?@3 (power_of_two_cand@1 @2))) simplification
uses tree_nop_conversion_p (type, TREE_TYPE (@3)) condition, but I believe
it doesn't check what it was meant to check.  On convert?@3
TREE_TYPE (@3) is not the type of what it has been converted from, but
what it has been converted to, which needs to be (because it is operand
of normal binary operation) equal or compatible to type of the modulo
result and first operand - type.
I could fix that by using && tree_nop_conversion_p (type, TREE_TYPE (@1))
and be done with it, but actually most of the non-nop conversions are IMHO
ok and so we would regress those optimizations.
In particular, if we have say narrowing conversions (foo5 and foo6 in
the new testcase), I think we are fine, either the shift of the power of two
constant after narrowing conversion is still that power of two (or negation
of that) and then it will still work, or the result of narrowing conversion
is 0 and then we would have UB which we can ignore.
Similarly, widening conversions where the shift result is unsigned are fine,
or even widening conversions where the shift result is signed, but we sign
extend to a signed wider divisor, the problematic case of INT_MIN will
become x % (long long) INT_MIN and we can still optimize that to
x & (long long) INT_MAX.
What doesn't work is the case in the pr99079.c testcase, widening conversion
of a signed shift result to wider unsigned divisor, where if the shift
is negative, we end up with x % (unsigned long long) INT_MIN which is
x % 0xffffffff80000000ULL where the divisor is not a power of two and
we can't optimize that to x & 0x7fffffffULL.

So, the patch rejects only the single problematic case.

Furthermore, when the shift result is signed, we were introducing UB into
a program which previously didn't have one (well, left shift into the sign
bit is UB in some language/version pairs, but it is definitely valid in
C++20 - wonder if I shouldn't move the gcc.c-torture/execute/pr99079.c
testcase to g++.dg/torture/pr99079.C and use -std=c++20), by adding that
subtraction of 1, x % (1 << 31) in C++20 is well defined, but
x & ((1 << 31) - 1) triggers UB on the subtraction.
So, the patch performs the subtraction in the unsigned type if it isn't
wrapping.

2021-02-15  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/99079
* match.pd (A % (pow2pcst << N) -> A & ((pow2pcst << N) - 1)): Remove
useless tree_nop_conversion_p (type, TREE_TYPE (@3)) check.  Instead
require both type and TREE_TYPE (@1) to be integral types and either
type having smaller or equal precision, or TREE_TYPE (@1) being
unsigned type, or type being signed type.  If TREE_TYPE (@1)
doesn't have wrapping overflow, perform the subtraction of one in
unsigned type.

* gcc.dg/fold-modpow2-2.c: New test.
* gcc.c-torture/execute/pr99079.c: New test.

(cherry picked from commit 45de8afb2d534e3b38b4d1898686b20c29cc6a94)

c++: Fix zero initialization of flexible array members [PR99033]

array_type_nelts returns error_mark_node for type of flexible array members
and build_zero_init_1 was placing an error_mark_node into the CONSTRUCTOR,
on which e.g. varasm ICEs.  I think there is nothing erroneous on zero
initialization of flexible array members though, such arrays should simply
get no elements, like they do if such classes are constructed (everything
except when some larger initializer comes from an explicit initializer).

So, this patch handles [] arrays in zero initialization like [0] arrays
and fixes handling of the [0] arrays - the
tree_int_cst_equal (max_index, integer_minus_one_node) check
didn't do what it thought it would do, max_index is typically unsigned
integer (sizetype) and so it is never equal to a -1.

What the patch doesn't do and maybe would be desirable is if it returns
error_mark_node for other reasons let the recursive callers not stick that
into CONSTRUCTOR but return error_mark_node instead.  But I don't have a
testcase where that would be needed right now.

2021-02-11  Jakub Jelinek  <jakub@redhat.com>

PR c++/99033
* init.c (build_zero_init_1): Handle zero initialiation of
flexible array members like initialization of [0] arrays.
Use integer_minus_onep instead of comparison to integer_minus_one_node
and integer_zerop instead of comparison against size_zero_node.
Formatting fixes.

* g++.dg/ext/flexary38.C: New test.

(cherry picked from commit ea535f59b19f65e5b313c990ee6c194a7b055bd7)

varasm: Fix ICE with -fsyntax-only [PR99035]

My FE change from 2 years ago uses TREE_ASM_WRITTEN in -fsyntax-only
mode more aggressively to avoid "expanding" functions multiple times.
With -fsyntax-only nothing is really expanded, so I think it is acceptable
to adjust the assert and allow declare_weak at any time, with -fsyntax-only
we know it is during parsing only anyway.

2021-02-10 Jakub Jelinek <jakub@redhat.com>

PR c++/99035
* varasm.c (declare_weak): For -fsyntax-only, allow even
TREE_ASM_WRITTEN function decls.

* g++.dg/ext/weak6.C: New test.

(cherry picked from commit a964f494cd5a90f631b8c0c01777a9899e0351ce)

openmp: Temporarily disable into_ssa when gimplifying OpenMP reduction clauses [PR99007]

gimplify_scan_omp_clauses was already calling gimplify_expr with false as
last argument to make sure it is not an SSA_NAME, but as the testcases show,
that is not enough, SSA_NAME temporaries created during that gimplification
can be reused too and we can't allow SSA_NAMEs to be used across OpenMP
region boundaries, as we can only firstprivatize decls.

Fixed by temporarily disabling into_ssa.

2021-02-10 Jakub Jelinek <jakub@redhat.com>

PR middle-end/99007
* gimplify.c (gimplify_scan_omp_clauses): For MEM_REF on reductions,
temporarily disable gimplify_ctxp->into_ssa around gimplify_expr
calls.

* g++.dg/gomp/pr99007.C: New test.
* gcc.dg/gomp/pr99007-1.c: New test.
* gcc.dg/gomp/pr99007-2.c: New test.
* gcc.dg/gomp/pr99007-3.c: New test.

(cherry picked from commit deba6b20a3889aa23f0e4b3a5248de4172a0167d)

c++: Fix ICE with structured binding initialized to incomplete array [PR97878]

We ICE on the following testcase, for incomplete array a on auto [b] { a }; without
giving any kind of diagnostics, with auto [c] = a; during error-recovery.
The problem is that we get too far through check_initializer and e.g.
store_init_value -> constexpr stuff can't deal with incomplete array types.

As the type of the structured binding artificial variable is always deduced,
I think it is easiest to diagnose this early, even if they have array types
we'll need their deduced type to be complete rather than just its element
type.

2021-02-05 Jakub Jelinek <jakub@redhat.com>

PR c++/97878
* decl.c (check_array_initializer): For structured bindings, require
the array type to be complete.

* g++.dg/cpp1z/decomp54.C: New test.

(cherry picked from commit 8b7f2d3eae16dd629ae7ae40bb76f4bb0099f441)

ifcvt: Avoid ICEs trying to force_operand random RTL [PR97487]

As the testcase shows, RTL ifcvt can throw random RTL (whatever it found in
some insns) at expand_binop or expand_unop and expects it to do something
(and then will check if it created valid insns and punts if not).
These functions in the end if the operands don't match try to
copy_to_mode_reg the operands, which does
if (!general_operand (x, VOIDmode))
  x = force_operand (x, temp);
but, force_operand is far from handling all possible RTLs, it will ICE for
all more unusual RTL codes.  Basically handles just simple arithmetic and
unary RTL operations if they have an optab and
expand_simple_binop/expand_simple_unop ICE on others.

The following patch fixes it by adding some operand verification (whether
there is a hope that copy_to_mode_reg will succeed on those).  It is added
both to noce_emit_move_insn (not needed for this exact testcase,
that function simply tries to recog the insn as is and if it fails,
handles some simple binop/unop cases; the patch performs the verification
of their operands) and noce_try_sign_mask.

2021-02-03  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/97487
* ifcvt.c (noce_can_force_operand): New function.
(noce_emit_move_insn): Use it.
(noce_try_sign_mask): Likewise.  Formatting fix.

* gcc.dg/pr97487-1.c: New test.
* gcc.dg/pr97487-2.c: New test.

(cherry picked from commit 025a0ee3911c0866c69f841df24a558c7c8df0eb)

lra-constraints: Fix error-recovery for bad inline-asms [PR97971]

The following testcase has ice-on-invalid, it can't be reloaded, but we
shouldn't ICE the compiler because the user typed non-sense.

In current_insn_transform we have:
  if (process_alt_operands (reused_alternative_num))
    alt_p = true;

  if (check_only_p)
    return ! alt_p || best_losers != 0;

  /* If insn is commutative (it's safe to exchange a certain pair of
     operands) then we need to try each alternative twice, the second
     time matching those two operands as if we had exchanged them.  To
     do this, really exchange them in operands.

     If we have just tried the alternatives the second time, return
     operands to normal and drop through.  */

  if (reused_alternative_num < 0 && commutative >= 0)
    {
      curr_swapped = !curr_swapped;
      if (curr_swapped)
        {
          swap_operands (commutative);
          goto try_swapped;
        }
      else
        swap_operands (commutative);
    }

  if (! alt_p && ! sec_mem_p)
    {
      /* No alternative works with reloads??  */
      if (INSN_CODE (curr_insn) >= 0)
        fatal_insn ("unable to generate reloads for:", curr_insn);
      error_for_asm (curr_insn,
                     "inconsistent operand constraints in an %<asm%>");
      lra_asm_error_p = true;
...
and so handle inline asms there differently (and delete/nullify them after
this) - fatal_insn is only called for non-inline asm.
But in process_alt_operands we do:
                /* Both the earlyclobber operand and conflicting operand
                   cannot both be user defined hard registers.  */
                if (HARD_REGISTER_P (operand_reg[i])
                    && REG_USERVAR_P (operand_reg[i])
                    && operand_reg[j] != NULL_RTX
                    && HARD_REGISTER_P (operand_reg[j])
                    && REG_USERVAR_P (operand_reg[j]))
                  fatal_insn ("unable to generate reloads for "
                              "impossible constraints:", curr_insn);
and thus ICE even for inline-asms.

I think it is inappropriate to delete/nullify the insn in
process_alt_operands, as it could be done e.g. in the check_only_p mode,
so this patch just returns false in that case, which results in the
caller have alt_p false, and as inline asm isn't simple move, sec_mem_p
will be also false (and it isn't commutative either), so for check_only_p
it will suggests to the callers it isn't ok and otherwise will emit
error and delete/nullify the inline asm insn.

2021-02-03  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/97971
* lra-constraints.c (process_alt_operands): For inline asm, don't call
fatal_insn, but instead return false.

* gcc.target/i386/pr97971.c: New test.

(cherry picked from commit 4dd7141653b57f638fc32291245d57d4dcfa3813)

expand: Fix up find_bb_boundaries [PR98331]

When expansion emits some control flow insns etc. inside of a former GIMPLE
basic block, find_bb_boundaries needs to split it into multiple basic
blocks.
The code needs to ignore debug insns in decisions how many splits to do or
where in between some non-debug insns the split should be done, but it can
decide where to put debug insns if they can be kept and otherwise throws
them away (they can't stay outside of basic blocks).
On the following testcase, we end up in the bb from expander with
control flow insn
debug insns
barrier
some other insn
(the some other insn is effectively dead after __builtin_unreachable and
we'll optimize that out later).
Without debug insns, we'd do the split when encountering some other insn
and split after PREV_INSN (some other insn), i.e. after barrier (and the
splitting code then moves the barrier in between basic blocks).
But if there are debug insns, we actually split before the first debug insn
that appeared after the control flow insn, so after control flow insn,
and get a basic block that starts with debug insns and then has a barrier
in the middle that nothing moves it out of the bb. This leads to ICEs and
even if it wouldn't, different behavior from -g0.
The reason for treating debug insns that way is a different case, e.g.
control flow insn
debug insns
some other insn
or even
control flow insn
barrier
debug insns
some other insn
where splitting before the first such debug insn allows us to keep them
while otherwise we would have to drop them on the floor, and in those
situations we behave the same with -g0 and -g.

So, the following patch fixes it by resetting debug_insn not just when
splitting the blocks (it is set only after seeing a control flow insn and
before splitting for it if needed), but also when seeing a barrier,
which effectively means we always throw away debug insns after a control
flow insn and before following barrier if any, but there is no way around
that, control flow insn must be the last in the bb (BB_END) and BARRIER
after it, debug insns aren't allowed outside of bb.
We still handle the other cases fine (when there is no barrier or when
debug insns appear only after the barrier).

2021-01-29 Jakub Jelinek <jakub@redhat.com>

PR debug/98331
* cfgbuild.c (find_bb_boundaries): Reset debug_insn when seeing
a BARRIER.

* gcc.dg/pr98331.c: New test.

(cherry picked from commit ea0e1eaa30f42e108f6c716745347cc1dcfdc475)

c++: Fix up handling of register ... asm ("...") vars in templates [PR33661, PR98847]

As the testcase shows, for vars appearing in templates, we don't attach
the asm spec string to the pattern decls, nor pass it back to cp_finish_decl
during instantiation.

The following patch does that.

2021-01-28 Jakub Jelinek <jakub@redhat.com>

PR c++/33661
PR c++/98847
* decl.c (cp_finish_decl): For register vars with asmspec in templates
call set_user_assembler_name and set DECL_HARD_REGISTER.
* pt.c (tsubst_expr): When instantiating DECL_HARD_REGISTER vars,
pass asmspec_tree to cp_finish_decl.

* g++.target/i386/pr98847.C: New test.

(cherry picked from commit cf93f94b3498f3925895fb0bbfd4b64232b9987a)

aarch64: Fix up *aarch64_bfxilsi_uxtw [PR98853]

The https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html
patch that introduced this pattern claimed:
Would generate:

combine_balanced_int:
        bfxil   w0, w1, 0, 16
        uxtw    x0, w0
        ret

But with this patch generates:

combine_balanced_int:
        bfxil   w0, w1, 0, 16
        ret
and it is indeed what it should generate, but it doesn't do that,
it emits bfxil  x0, x1, 0, 16
instead which doesn't zero extend from 32 to 64 bits, but preserves
the bits from the destination register.

2021-01-27  Jakub Jelinek  <jakub@redhat.com>

PR target/98853
* config/aarch64/aarch64.md (*aarch64_bfxilsi_uxtw): Use
%w0, %w1 and %2 instead of %0, %1 and %2.

* gcc.c-torture/execute/pr98853-1.c: New test.
* gcc.c-torture/execute/pr98853-2.c: New test.

(cherry picked from commit 2a2c1e22c2501457608f12d5ab560caaca59c425)

aarch64: Tighten up checks for ubfix [PR98681]

The testcase in the patch doesn't assemble, because the instruction requires
that the penultimate operand (lsb) range is [0, 32] (or [0, 64]) and the last
operand's range is [1, 32 - lsb] (or [1, 64 - lsb]).
The INTVAL (shft_amnt) < GET_MODE_BITSIZE (mode) will accept the lsb operand
to be in range [MIN, 32] (or [MIN, 64]) and then we invoke UB in the
compiler and sometimes it will make it through.
The patch changes all the INTVAL uses in that function to UINTVAL,
which isn't strictly necessary, but can be done (e.g. after the
UINTVAL (shft_amnt) < GET_MODE_BITSIZE (mode) check we know it is not
negative and thus INTVAL (shft_amnt) and UINTVAL (shft_amnt) then behave the
same.  But, I had to add INTVAL (mask) > 0 check in that case, otherwise we
risk (hypothetically) emitting instruction that doesn't assemble.
The problem is with masks that have the MSB bit set, while the instruction
can handle those, e.g.
ubfiz w1, w0, 13, 19
will do
(w0 << 13) & 0xffffe000
in RTL we represent SImode constants with MSB set as negative HOST_WIDE_INT,
so it will actually be HOST_WIDE_INT_C (0xffffffffffffe000), and
the instruction uses %P3 to print the last operand, which calls
asm_fprintf (f, "%u", popcount_hwi (INTVAL (x)))
to print that.  But that will not print 19, but 51 instead, will include
there also all the copies of the sign bit.
Not supporting those masks with MSB set isn't a big loss though, they really
shouldn't appear normally, as both GIMPLE and RTL optimizations should
optimize those away (one isn't masking any bits off with such masks, so
just w0 << 13 will do too).

2021-01-26  Jakub Jelinek  <jakub@redhat.com>

PR target/98681
* config/aarch64/aarch64.c (aarch64_mask_and_shift_for_ubfiz_p):
Use UINTVAL (shft_amnt) and UINTVAL (mask) instead of INTVAL (shft_amnt)
and INTVAL (mask).  Add && INTVAL (mask) > 0 condition.

* gcc.c-torture/execute/pr98681.c: New test.

(cherry picked from commit fb09d7242a25971b275292332337a56b86637f2c)

rs6000: Fix up __m64 typedef in mmintrin.h [PR97301]

The x86 __m64 type is defined as:
/* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components. */
typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
and so matches the comment above it in that reads and stores through
pointers to __m64 can alias anything.
But in the rs6000 headers that is the case only for __m128, but not __m64.

The following patch adds that attribute, which fixes the
FAIL: gcc.target/powerpc/sse-movhps-1.c execution test
FAIL: gcc.target/powerpc/sse-movlps-1.c execution test
regressions that appeared when Honza improved ipa-modref.

2021-01-23 Jakub Jelinek <jakub@redhat.com>

PR testsuite/97301
* config/rs6000/mmintrin.h (__m64): Add __may_alias__ attribute.

(cherry picked from commit db9a3ce7b83ce3ed3e0ffe7eb7a918595640e161)

c++: Fix up ubsan false positives on references [PR95693]

Alex' 2 years old change to build_zero_init_1 to return NULL pointer with
reference type for references breaks the sanitizers, the assignment of NULL
to a reference typed member is then instrumented before it is overwritten
with a non-NULL address later on.
That change has been done to fix error recovery ICE during
process_init_constructor_record, where we:
          if (TYPE_REF_P (fldtype))
            {
              if (complain & tf_error)
                error ("member %qD is uninitialized reference", field);
              else
                return PICFLAG_ERRONEOUS;
            }
a few lines earlier, but then continue and ICE when build_zero_init returns
NULL.

The following patch reverts the build_zero_init_1 change and instead creates
the NULL with reference type constants during the error recovery.

The pr84593.C testcase Alex' change was fixing still works as before.

2021-01-22  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/95693
* init.c (build_zero_init_1): Revert the 2018-03-06 change to
return build_zero_cst for reference types.
* typeck2.c (process_init_constructor_record): Instead call
build_zero_cst here during error recovery instead of build_zero_init.

* g++.dg/ubsan/pr95693.C: New test.

(cherry picked from commit e5750f847158e7f9bdab770fd9c5fff58c5074d3)

match.pd: Replace incorrect simplifications into copysign [PR90248]

In the PR Andrew said he has implemented a simplification that has been
added to LLVM, but that actually is not true, what is in there are
X * (X cmp 0.0 ? +-1.0 : -+1.0) simplifications into +-abs(X)
but what has been added into GCC are (X cmp 0.0 ? +-1.0 : -+1.0)
simplifications into copysign(1, +-X) and then
X * copysign (1, +-X) into +-abs (X).
The problem is with the (X cmp 0.0 ? +-1.0 : -+1.0) simplifications,
they don't work correctly when X is zero.
E.g.
(X > 0.0 ? 1.0 : -1.0)
is -1.0 when X is either -0.0 or 0.0, but copysign will make it return
1.0 for 0.0 and -1.0 only for -0.0.
(X >= 0.0 ? 1.0 : -1.0)
is 1.0 when X is either -0.0 or 0.0, but copysign will make it return
still 1.0 for 0.0 and -1.0 for -0.0.
The simplifications were guarded on !HONOR_SIGNED_ZEROS, but as discussed in
the PR, that option doesn't mean that -0.0 will not ever appear as operand
of some operation, it is hard to guarantee that without compiler adding
canonicalizations of -0.0 to 0.0 after most of the operations and thus
making it very slow, but that the user asserts that he doesn't care if the result
of operations will be 0.0 or -0.0.  Not to mention that some of the
transformations are incorrect even for positive 0.0.

So, instead of those simplifications this patch recognizes patterns where
those ?: expressions are multiplied by X, directly into +-abs.
That works fine even for 0.0 and -0.0 (as long as we don't care about
whether the result is exactly 0.0 or -0.0 in those cases), because
whether the result of copysign is -1.0 or 1.0 doesn't matter when it is
multiplied by 0.0 or -0.0.

As a follow-up, maybe we should add the simplification mentioned in the PR,
in particular doing copysign by hand through
VIEW_CONVERT_EXPR <int, float_X> < 0 ? -float_constant : float_constant
into copysign (float_constant, float_X).  But I think that would need to be
done in phiopt.

2021-01-22  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/90248
* match.pd (X cmp 0.0 ? 1.0 : -1.0 -> copysign(1, +-X),
X cmp 0.0 ? -1.0 : +1.0 -> copysign(1, -+X)): Remove
simplifications.
(X * (X cmp 0.0 ? 1.0 : -1.0) -> +-abs(X),
X * (X cmp 0.0 ? -1.0 : 1.0) -> +-abs(X)): New simplifications.

* gcc.dg/tree-ssa/copy-sign-1.c: Don't expect any copysign
builtins.
* gcc.dg/pr90248.c: New test.

(cherry picked from commit dd92986ea6d2d363146e1726817a84910453fdc8)

c++: Fix up potential_constant_expression_1 FOR/WHILE_STMT handling [PR98672]

The following testcase is rejected even when it is valid.
The problem is that potential_constant_expression_1 doesn't have the
accurate *jump_target tracking cxx_eval_* has, and when the loop has
a condition that isn't guaranteed to be always true, the body isn't walked
at all.  That is mostly a correct conservative behavior, except that it
doesn't detect if there are any return statements in the body, which means
the loop might return instead of falling through to the next statement.
We already have code for return stmt discovery in code snippets we don't
try to evaluate for switches, so this patch reuses that for FOR_STMT
and WHILE_STMT bodies.

Note, I haven't touched FOR_EXPR, with statement expressions it could
have return stmts in it too, or it could have break or continue statements
that wouldn't bind to the current loop but to something outer.  That
case is clearly mishandled by potential_constant_expression_1 even
when the condition is missing or is always true, and it wouldn't surprise me
if cxx_eval_* didn't handle it right either, so I'm deferring that to
separate PR for later.  We'd need proper test coverage for all of that.

> Hmm, IF_STMT probably also needs to check the else clause, if the condition
> isn't a known constant.

You're right, I thought it was ok because it recurses with tf_none, but
if the then branch is potentially constant and only else returns, continues
or breaks, then as the enhanced testcase shows we were mishandling it too.

2021-01-21  Jakub Jelinek  <jakub@redhat.com>

PR c++/98672
* constexpr.c (check_for_return_continue_data): Add break_stmt member.
(check_for_return_continue): Also look for BREAK_STMT.  Handle
SWITCH_STMT by ignoring break_stmt from its body.
(potential_constant_expression_1) <case FOR_STMT>,
<case WHILE_STMT>: If the condition isn't constant true, check if
the loop body can contain a return stmt.
<case SWITCH_STMT>: Adjust check_for_return_continue_data initializer.
<case IF_STMT>: If recursion with tf_none is successful,
merge *jump_target from the branches - returns with highest priority,
breaks or continues lower.  If then branch is potentially constant and
doesn't return, check the else branch if it could return, break or
continue.

* g++.dg/cpp1y/constexpr-98672.C: New test.

(cherry picked from commit 8182cbe3fb2c2d20e8dff9d2476fb94046e560b3)