gcc.gnu.org Git - gcc.git/log

testsuite: [aarch64] Fix aarch64/advsimd-intrinsics/v{trn,uzp,zip}_half.c

Since r11-3402 (g:65c9878641cbe0ed898aa7047b7b994e9d4a5bb1), the
vtrn_half, vuzp_half and vzip_half started failing with

vtrn_half.c:76:17: error: redeclaration of 'vector_float64x2' with no linkage
vtrn_half.c:77:17: error: redeclaration of 'vector2_float64x2' with no linkage
vtrn_half.c:80:17: error: redeclaration of 'vector_res_float64x2' with no linkage

This is because r11-3402 now always declares float64x2 variables for
aarch64, leading to a duplicate declaration in these testcases.

The fix is simply to remove these now useless declarations.

These tests are skipped on arm*, so there is no impact on that target.

2020-09-25 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c: Remove
declarations of vector, vector2, vector_res for float64x2 type.
* gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vzip_half.c: Likewise.

(cherry picked from commit 8c775bf447e190024fa08c55e38db94dd013a393)

AArch64: Implement missing p128<->f64 reinterpret intrinsics

This patch implements the missing reinterprets to and from poly128_t and
float64x2_t.
I've plugged in the appropriate testing in the advsimd-intrinsics.exp
too.

Bootstrapped and tested on aarch64-none-linux-gnu.
Tested advsimd-intrinsics.exp on arm-none-eabi too to make sure arm
testing isn't affected.

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vreinterpretq_f64_p128,
vreinterpretq_p128_f64): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(clean_results): Add float64x2_t cleanup.
(DECL_VARIABLE_128BITS_VARIANTS): Add float64x2_t variable.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: Add
testing of vreinterpretq_f64_p128, vreinterpretq_p128_f64.

(cherry picked from commit 65c9878641cbe0ed898aa7047b7b994e9d4a5bb1)

AArch64: Implement missing vrndns_f32 intrinsic

This patch implements the missing vrndns_f32 intrinsic. This operates on a scalar float32_t value.
It can be mapped down to a __builtin_aarch64_frintnsf builtin.

This patch does that.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
PR target/71233
* config/aarch64/aarch64-simd-builtins.def (frintn): Use BUILTIN_VHSDF_HSDF
for modes. Remove explicit hf instantiation.
* config/aarch64/arm_neon.h (vrndns_f32): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/vrndns_f32_1.c: New test.

(cherry picked from commit 02b5377b3766804059b7824330d33d0e1cef2e5b)

AArch64: Implement missing _p64 intrinsics for vector permutes

This patch implements some missing vector permute intrinsics operating on poly64x2_t types.
They are implemented identically to their uint64x2_t brethren.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vtrn1q_p64, vtrn2q_p64, vuzp1q_p64,
vuzp2q_p64, vzip1q_p64, vzip2q_p64): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/trn_zip_p64_1.c: New test.

(cherry picked from commit e8e818399d70c5a5a3d30a54d305c6e2b92e2c66)

AArch64: Implement vldrq_p128 intrinsic

This patch implements the missing vldrq_p128 intrinsic that just loads from the appropriate pointer.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vldrq_p128): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/vldrq_p128_1.c: New test.

(cherry picked from commit f2868e4bcff2c7b882d01231f039459c00e59d7b)

AArch64: Implement vstrq_p128 intrinsic

This patch implements the missing vstrq_p128 intrinsic.
It just performs a store of the poly128_t argument to a memory location.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vstrq_p128): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/vstrq_p128_1.c: New test.

(cherry picked from commit d23ea1e865301cd45f14ccbdb0bca49251fde9e1)

AArch64: Implement missing vcls intrinsics on unsigned types

This patch implements some missing intrinsics that perform a CLS on unsigned SIMD types.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vcls_u8, vcls_u16, vcls_u32,
vclsq_u8, vclsq_u16, vclsq_u32): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/vcls_unsigned_1.c: New test.

(cherry picked from commit 30957092db46d8798e632feefb5df634488dbb33)

AArch64: Implement missing vceq*_p* intrinsics

This patch implements some missing vceq* intrinsics on poly types.
The behaviour is to produce the appropriate CMEQ instruction as for the unsigned types.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vceqq_p64, vceqz_p64, vceqzq_p64): Define.

gcc/testsuite/

PR target/71233
* gcc.target/aarch64/simd/vceq_poly_1.c: New test.

(cherry picked from commit d4703be185b422f637deebd3bb9222a41c8023d6)

AArch64: Implement poly-type vadd intrinsics

This implements the vadd[p]_p* intrinsics.
In terms of functionality they are aliases of veor operations on the relevant unsigned types.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vadd_p8, vadd_p16, vadd_p64, vaddq_p8,
vaddq_p16, vaddq_p64, vaddq_p128): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/vadd_poly_1.c: New test.

(cherry picked from commit fa9ad35dae03dcb20c4ccb50ba1b351a8ab77970)

Revert "Fortran : ICE in build_field PR95614"

This reverts commit e28cc38ac34cb4de31b983f817c6e5f7dde55e2c.

Daily bump.

optabs: Don't reuse target for multi-word expansions if it overlaps operand(s) [PR97073]

The following testcase is miscompiled on i686-linux, because
we try to expand a double-word bitwise logic operation with op0
being a (mem:DI u) and target (mem:DI u+4), i.e. partial overlap, and
thus end up with:
movl 4(%esp), %eax
andl u, %eax
movl %eax, u+4
! movl u+4, %eax optimized out
andl 8(%esp), %eax
movl %eax, u+8
rather than with the desired:
movl 4(%esp), %edx
movl 8(%esp), %eax
andl u, %edx
andl u+4, %eax
movl %eax, u+8
movl %edx, u+4
because the store of the first word to target overwrites the second word of
the operand.
expand_binop for this (and several similar places) already check for target
== op0 or target == op1, this patch just adds reg_overlap_mentioned_p calls
next to it.
Pedantically, at least for some of these it might be sufficient to force
a different target if there is overlap but target is not rtx_equal_p to
the operand (e.g. in this bitwise logical case, but e.g. not in the shift
cases where there is reordering), though that would go against the
preexisting target == op? checks and the rationale that REG_EQUAL notes in
that case isn't correct.

2020-09-27 Jakub Jelinek <jakub@redhat.com>

PR middle-end/97073
* optabs.c (expand_binop, expand_absneg_bit, expand_unop,
expand_copysign_bit): Check reg_overlap_mentioned_p between target
and operand(s) and if it returns true, force a pseudo as target.

* gcc.c-torture/execute/pr97073.c: New test.

(cherry picked from commit a4b31d5807f2bc67c8999b3d53369cf2a5c6e1ec)

Fortran  :  ICE in build_field PR95614

Local identifiers can not be the same as a module name.  Original
patch by Steve Kargl resulted in name clashes between common block
names and local identifiers.  A local identifier can be the same as
a global identier if that identifier represents a common.  The patch
was modified to allow global identifiers that represent a common
block.

2020-09-27  Steven G. Kargl  <kargl@gcc.gnu.org>
    Mark Eggleston  <markeggleston@gcc.gnu.org>

gcc/fortran/

PR fortran/95614
* decl.c (gfc_get_common): Use gfc_match_common_name instead
of match_common_name.
* decl.c (gfc_bind_idents): Use gfc_match_common_name instead
of match_common_name.
* match.c : Rename match_common_name to gfc_match_common_name.
* match.c (gfc_match_common): Use gfc_match_common_name instead
of match_common_name.
* match.h : Rename match_common_name to gfc_match_common_name.
* resolve.c (resolve_common_vars): Check each symbol in a
common block has a global symbol.  If there is a global symbol
issue an error if the symbol type is known as is not a common
block name.

2020-09-27  Mark Eggleston  <markeggleston@gcc.gnu.org>

gcc/testsuite/

PR fortran/95614
* gfortran.dg/pr95614_1.f90: New test.
* gfortran.dg/pr95614_2.f90: New test.

(cherry picked from commit e5a76af3a2f3324efc60b4b2778ffb29d5c377bc)

Daily bump.

AArch64: Implement Armv8.3-a complex arithmetic intrinsics

I'd like to backport some patches from Tamar in GCC 9 to GCC 8 that implement the complex arithmetic intrinsics for Advanced SIMD.
These should have been present in GCC 8 that gained support for Armv8.3-a.

There were 4 follow-up fixes that I've rolled into the one commit.

Bootstrapped and tested on aarch64-none-linux-gnu and arm-none-linux-gnueabihf on the GCC 8 branch.

gcc/
PR target/71233
* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers):
Add qualifier_lane_pair_index.
(emit-rtl.h): Include.
(TYPES_QUADOP_LANE_PAIR): New.
(aarch64_simd_expand_args): Use it.
(aarch64_simd_expand_builtin): Likewise.
(AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_laneq_builtin_datum): New.
(FCMLA_LANEQ_BUILTIN, AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE,
AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_lane_builtin_data,
aarch64_init_fcmla_laneq_builtins, aarch64_expand_fcmla_builtin): New.
(aarch64_init_builtins): Add aarch64_init_fcmla_laneq_builtins.
(aarch64_expand_buildin): Add AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V2SF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V2SF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V2SF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ2700_V2SF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V4HF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V4HF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V4HF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V4HF.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add __ARM_FEATURE_COMPLEX.
* config/aarch64/aarch64-simd-builtins.def (fcadd90, fcadd270, fcmla0, fcmla90,
fcmla180, fcmla270, fcmla_lane0, fcmla_lane90, fcmla_lane180, fcmla_lane270,
fcmla_laneq0, fcmla_laneq90, fcmla_laneq180, fcmla_laneq270,
fcmlaq_lane0, fcmlaq_lane90, fcmlaq_lane180, fcmlaq_lane270): New.
* config/aarch64/aarch64-simd.md (aarch64_fcmla_lane<rot><mode>,
aarch64_fcmla_laneq<rot>v4hf, aarch64_fcmlaq_lane<rot><mode>,aarch64_fcadd<rot><mode>,
aarch64_fcmla<rot><mode>): New.
* config/aarch64/arm_neon.h:
(vcadd_rot90_f16): New.
(vcaddq_rot90_f16): New.
(vcadd_rot270_f16): New.
(vcaddq_rot270_f16): New.
(vcmla_f16): New.
(vcmlaq_f16): New.
(vcmla_lane_f16): New.
(vcmla_laneq_f16): New.
(vcmlaq_lane_f16): New.
(vcmlaq_rot90_lane_f16): New.
(vcmla_rot90_laneq_f16): New.
(vcmla_rot90_lane_f16): New.
(vcmlaq_rot90_f16): New.
(vcmla_rot90_f16): New.
(vcmlaq_laneq_f16): New.
(vcmla_rot180_laneq_f16): New.
(vcmla_rot180_lane_f16): New.
(vcmlaq_rot180_f16): New.
(vcmla_rot180_f16): New.
(vcmlaq_rot90_laneq_f16): New.
(vcmlaq_rot270_laneq_f16): New.
(vcmlaq_rot270_lane_f16): New.
(vcmla_rot270_laneq_f16): New.
(vcmlaq_rot270_f16): New.
(vcmla_rot270_f16): New.
(vcmlaq_rot180_laneq_f16): New.
(vcmlaq_rot180_lane_f16): New.
(vcmla_rot270_lane_f16): New.
(vcadd_rot90_f32): New.
(vcaddq_rot90_f32): New.
(vcaddq_rot90_f64): New.
(vcadd_rot270_f32): New.
(vcaddq_rot270_f32): New.
(vcaddq_rot270_f64): New.
(vcmla_f32): New.
(vcmlaq_f32): New.
(vcmlaq_f64): New.
(vcmla_lane_f32): New.
(vcmla_laneq_f32): New.
(vcmlaq_lane_f32): New.
(vcmlaq_laneq_f32): New.
(vcmla_rot90_f32): New.
(vcmlaq_rot90_f32): New.
(vcmlaq_rot90_f64): New.
(vcmla_rot90_lane_f32): New.
(vcmla_rot90_laneq_f32): New.
(vcmlaq_rot90_lane_f32): New.
(vcmlaq_rot90_laneq_f32): New.
(vcmla_rot180_f32): New.
(vcmlaq_rot180_f32): New.
(vcmlaq_rot180_f64): New.
(vcmla_rot180_lane_f32): New.
(vcmla_rot180_laneq_f32): New.
(vcmlaq_rot180_lane_f32): New.
(vcmlaq_rot180_laneq_f32): New.
(vcmla_rot270_f32): New.
(vcmlaq_rot270_f32): New.
(vcmlaq_rot270_f64): New.
(vcmla_rot270_lane_f32): New.
(vcmla_rot270_laneq_f32): New.
(vcmlaq_rot270_lane_f32): New.
(vcmlaq_rot270_laneq_f32): New.
* config/aarch64/aarch64.h (TARGET_COMPLEX): New.
* config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
(FCADD, FCMLA): New.
(rot): New.
(FCMLA_maybe_lane): New.
* config/arm/types.md (neon_fcadd, neon_fcmla): New.

gcc/testsuite/
PR target/71233
* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache,
check_effective_target_arm_v8_3a_complex_neon_ok,
add_options_for_arm_v8_3a_complex_neon,
check_effective_target_arm_v8_3a_complex_neon_hw,
check_effective_target_vect_complex_rot_N): New.
* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: New test.

AArch64: Implement __rndr, __rndrrs intrinsics

This patch implements the recently published[1] __rndr and __rndrrs
intrinsics used to access the RNG in Armv8.5-A.
The __rndrrs intrinsics can be used to reseed the generator too.
They are guarded by the __ARM_FEATURE_RNG feature macro.
A quirk with these intrinsics is that they store the random number in
their pointer argument and return a status
code if the generation succeeded.

The instructions themselves write the CC flags indicating the success of
the operation that we can then read with a CSET.
Therefore this implementation makes use of the IGNORE indicator to the
builtin expand machinery to avoid generating
the CSET if its result is unused (the CC reg clobbering effect is still
reflected in the pattern).
I've checked that using unspec_volatile prevents undesirable CSEing of
the instructions.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

gcc/
PR target/71233
* config/aarch64/aarch64.md (UNSPEC_RNDR, UNSPEC_RNDRRS):
Define.
(aarch64_rndr): New define_insn.
(aarch64_rndrrs): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_RNG): Define.
(TARGET_RNG): Likewise.
(AARCH64_FL_RNG): Likewise.
* config/aarch64/aarch64-option-extensions.def (rng): Define.
* config/aarch64/aarch64-builtins.c (enum aarch64_builtins):
Add AARCH64_BUILTIN_RNG_RNDR, AARCH64_BUILTIN_RNG_RNDRRS.
(aarch64_init_rng_builtins): Define.
(aarch64_init_builtins): Call aarch64_init_rng_builtins.
(aarch64_expand_rng_builtin): Define.
(aarch64_expand_builtin): Use IGNORE argument, handle
RNG builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_RNG when TARGET_RNG.
* config/aarch64/arm_acle.h (__rndr, __rndrrs): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/rng_1.c: New test.

Daily bump.

rtl_data: Add sp_is_clobbered_by_asm

Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by asm statement.

gcc/

PR target/97032
* cfgexpand.c (expand_asm_stmt): Set sp_is_clobbered_by_asm to
true if the stack pointer is clobbered by asm statement.
* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
if the stack pointer is clobbered by asm statement.

gcc/testsuite/

PR target/97032
* gcc.target/i386/pr97032.c: New test.

(cherry picked from commit 453a20c65722719b9e2d84339f215e7ec87692dc)

Add support for __jcvt intrinsic

This patch implements the __jcvt ACLE intrinsic [1] that maps down to the FJCVTZS [2] instruction from Armv8.3-a.
No fancy mode iterators or nothing. Just a single builtin, UNSPEC and define_insn and the associate plumbing.
This patch also defines __ARM_FEATURE_JCVT to indicate when the intrinsic is available.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/fjcvtzs-floating-point-javascript-convert-to-signed-fixed-point-rounding-toward-zero

gcc/
PR target/71233
* config/aarch64/aarch64.md (UNSPEC_FJCVTZS): Define.
(aarch64_fjcvtzs): New define_insn.
* config/aarch64/aarch64.h (TARGET_JSCVT): Define.
* config/aarch64/aarch64-builtins.c (aarch64_builtins):
Add AARCH64_JSCVT.
(aarch64_init_builtins): Initialize __builtin_aarch64_jcvtzs.
(aarch64_expand_builtin): Handle AARCH64_JSCVT.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_JCVT where appropriate.
* config/aarch64/arm_acle.h (__jcvt): Define.
* doc/sourcebuild.texi (aarch64_fjcvtzs_hw) Document new
target supports option.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/jcvt_1.c: New test.
* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.

Co-Authored-By: Andrea Corallo <andrea.corallo@arm.com>
(cherry picked from commit e1d5d19ec4f84b67ac693fef5b2add7dc9cf056d)
(cherry picked from commit 2c62952f8160bdc8d4111edb34a4bc75096c1e05)
(cherry picked from commit d2b86e14c14020f3e119ab8f462e2a91bd7d46e5)
(cherry picked from commit 58ae77d3ba70a2b9ccc90a90f3f82cf46239d5f1)

AArch64: Update Armv8.4-a's FP16 FML intrinsics

This patch updates the Armv8.4-a FP16 FML intrinsics's suffixes from u32 to f16
to be more consistent with the naming convention for intrinsics.

The specifications for these intrinsics have not been published yet so we do
not need to maintain the old names.

The patch was created with the following script:

grep -lIE "(vfml[as].+)_u32" -r gcc/ | grep -iEv ".+Changelog.*" \
| xargs sed -i -E -e "s/(vfml[as].+)_u32/\1_f16/g"

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
vfmlalq_low_u32, vfmlslq_low_u32, vfmlal_high_u32, vfmlsl_high_u32,
vfmlalq_high_u32, vfmlslq_high_u32, vfmlal_lane_low_u32,
vfmlsl_lane_low_u32, vfmlal_laneq_low_u32, vfmlsl_laneq_low_u32,
vfmlalq_lane_low_u32, vfmlslq_lane_low_u32, vfmlalq_laneq_low_u32,
vfmlslq_laneq_low_u32, vfmlal_lane_high_u32, vfmlsl_lane_high_u32,
vfmlal_laneq_high_u32, vfmlsl_laneq_high_u32, vfmlalq_lane_high_u32,
vfmlslq_lane_high_u32, vfmlalq_laneq_high_u32, vfmlslq_laneq_high_u32):
Rename ...
(vfmlal_low_f16, vfmlsl_low_f16, vfmlalq_low_f16, vfmlslq_low_f16,
vfmlal_high_f16, vfmlsl_high_f16, vfmlalq_high_f16, vfmlslq_high_f16,
vfmlal_lane_low_f16, vfmlsl_lane_low_f16, vfmlal_laneq_low_f16,
vfmlsl_laneq_low_f16, vfmlalq_lane_low_f16, vfmlslq_lane_low_f16,
vfmlalq_laneq_low_f16, vfmlslq_laneq_low_f16, vfmlal_lane_high_f16,
vfmlsl_lane_high_f16, vfmlal_laneq_high_f16, vfmlsl_laneq_high_f16,
vfmlalq_lane_high_f16, vfmlslq_lane_high_f16, vfmlalq_laneq_high_f16,
vfmlslq_laneq_high_f16): ... To this.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/fp16_fmul_high.h (test_vfmlal_high_u32,
test_vfmlalq_high_u32, test_vfmlsl_high_u32, test_vfmlslq_high_u32):
Rename ...
(test_vfmlal_high_f16, test_vfmlalq_high_f16, test_vfmlsl_high_f16,
test_vfmlslq_high_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_lane_high.h (test_vfmlal_lane_high_u32,
tets_vfmlsl_lane_high_u32, test_vfmlal_laneq_high_u32,
test_vfmlsl_laneq_high_u32, test_vfmlalq_lane_high_u32,
test_vfmlslq_lane_high_u32, test_vfmlalq_laneq_high_u32,
test_vfmlslq_laneq_high_u32): Rename ...
(test_vfmlal_lane_high_f16, tets_vfmlsl_lane_high_f16,
test_vfmlal_laneq_high_f16, test_vfmlsl_laneq_high_f16,
test_vfmlalq_lane_high_f16, test_vfmlslq_lane_high_f16,
test_vfmlalq_laneq_high_f16, test_vfmlslq_laneq_high_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_lane_low.h (test_vfmlal_lane_low_u32,
test_vfmlsl_lane_low_u32, test_vfmlal_laneq_low_u32,
test_vfmlsl_laneq_low_u32, test_vfmlalq_lane_low_u32,
test_vfmlslq_lane_low_u32, test_vfmlalq_laneq_low_u32,
test_vfmlslq_laneq_low_u32): Rename ...
(test_vfmlal_lane_low_f16, test_vfmlsl_lane_low_f16,
test_vfmlal_laneq_low_f16, test_vfmlsl_laneq_low_f16,
test_vfmlalq_lane_low_f16, test_vfmlslq_lane_low_f16,
test_vfmlalq_laneq_low_f16, test_vfmlslq_laneq_low_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_low.h (test_vfmlal_low_u32,
test_vfmlalq_low_u32, test_vfmlsl_low_u32, test_vfmlslq_low_u32):
Rename ...
(test_vfmlal_low_f16, test_vfmlalq_low_f16, test_vfmlsl_low_f16,
test_vfmlslq_low_f16): ... To This.
* lib/target-supports.exp
(check_effective_target_arm_fp16fml_neon_ok_nocache): Update test.

(cherry picked from commit 9d04c986b6faed878dbcc86d2f9392a721a3936e)

Add missing AArch64 NEON instrinctics for Armv8.2-a to Armv8.4-a

This patch adds the missing neon intrinsics for all 128 bit vector Integer modes for the
three-way XOR and negate and xor instructions for Arm8.2-a to Armv8.4-a.

gcc/
PR target/71233
* config/aarch64/aarch64-simd.md (aarch64_eor3qv8hi): Change to
eor3q<mode>4.
(aarch64_bcaxqv8hi): Change to bcaxq<mode>4.
* config/aarch64/aarch64-simd-builtins.def (veor3q_u8, veor3q_u32,
veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8,
vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
vbcaxq_s64): New.
* config/aarch64/arm_neon.h: Likewise.
* config/aarch64/iterators.md (VQ_I): New.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/sha3.h (veor3q_u8, veor3q_u32,
veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8,
vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
vbcaxq_s64): New.
* gcc.target/aarch64/sha3_1.c: Likewise.
* gcc.target/aarch64/sha3_2.c: Likewise.
* gcc.target/aarch64/sha3_3.c: Likewise.

(cherry picked from commit d21052ebd7ac9d545a26dde3229c57f872c1d5f3)

aarch64: Add support for Neoverse V1 CPU

This patch backports the AArch64 support for Arm's Neoverse V1 CPU to
GCC 8.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Neoverse V1.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document support for Neoverse V1.

Daily bump.

[AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64.

gcc/
2018-08-31 Vlad Lazar <vlad.lazar@arm.com>

PR target/71233
* config/aarch64/arm_neon.h (vabsd_s64): New.
(vnegd_s64): Likewise.

gcc/testsuite/
2018-08-31 Vlad Lazar <vlad.lazar@arm.com>

PR target/71233
* gcc.target/aarch64/scalar_intrinsics.c (test_vnegd_s64): New.
* gcc.target/aarch64/vneg_s.c (RUN_TEST_SCALAR): New.
(test_vnegd_s64): Likewise.
* gcc.target/aarch64/vnegd_s64.c: New.
* gcc.target/aarch64/vabsd_s64.c: New.

(cherry picked from commit 66da5b53107962a1c115a9686f2220de27f276f7)

libstdc++: Use correct argument type for __use_alloc [PR 96803]

The _Tuple_impl constructor for allocator-extended construction from a
different tuple type uses the _Tuple_impl's own _Head type in the
__use_alloc test. That is incorrect, because the argument tuple could
have a different type. Using the wrong type might select the
leading-allocator convention when it should use the trailing-allocator
convention, or vice versa.

This backport includes the value category fix from r11-3348.

libstdc++-v3/ChangeLog:

PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl<U...>&)):
Replace parameter pack with a type parameter and a pack and pass
the first type to __use_alloc.
* testsuite/20_util/tuple/cons/96803.cc: New test.

(cherry picked from commit 5494edae83ad33c769bd1ebc98f0c492453a6417)

Daily bump.

Fortran: Avoid double-free with parse error (PR96041, PR93423)

gcc/fortran/

PR fortran/96041
PR fortran/93423
* decl.c (gfc_match_submod_proc): Avoid later double-free
in the error case.

(cherry picked from commit c12facd22881517127ebbe213d7ecc7fc1fcea4e)

PR fortran/93423 - ICE on invalid with argument list for module procedure

When recovering from an error, a NULL pointer dereference could occur.
Check for that situation and punt.

gcc/fortran/
PR fortran/93423
* resolve.c (resolve_symbol): Avoid NULL pointer dereference.

(cherry picked from commit b88744905a46be44ffa3c57d46080f601ae832b8)

Daily bump.

store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053]

As the testcases show, if we have something like:
  MEM <char[12]> [&b + 8B] = {};
  MEM[(short *) &b] = 5;
  _5 = *x_4(D);
  MEM <long long unsigned int> [&b + 2B] = _5;
  MEM[(char *)&b + 16B] = 88;
  MEM[(int *)&b + 20B] = 1;
then in sort_by_bitpos the stores are almost like in the given order,
except the first store is after the = _5; store.
We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
while the former INTEGER_CST, and we can't coalesce the = _5 store with
the = {} store because the former is MEM_REF, the latter INTEGER_CST.
But we happily coalesce the remaining 3 stores, which is wrong, because the
= _5; store overlaps those and is in between them in the program order.
We already have code to deal with similar cases in check_no_overlap, but we
deal only with the following stores in sort_by_bitpos order, not the earlier
ones.

The following patch checks also the earlier ones.  In coalesce_immediate_stores
it computes the first one that needs to be checked (all the ones whose
bitpos + bitsize is smaller or equal to merged_store->start don't need to be
checked and don't need to be checked even for any following attempts because
of the sort_by_bitpos sorting) and the end of that (that is the first store
in the merged_store).

2020-09-16  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/97053
* gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
START, FIRST_EARLIER and LAST_EARLIER arguments.  Return false if
any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
be merged store.
(imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
Adjust check_no_overlap caller.
(imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
and last_earlier variables, adjust them during iterations.  Adjust
check_no_overlap callers, call check_no_overlap even when extending
overlapping stores by extra INTEGER_CST stores.

* gcc.dg/store_merging_31.c: New test.
* gcc.dg/store_merging_32.c: New test.

(cherry picked from commit bd909071ac04e94f4b6f0baab64d0687ec55681d)

arm: Extend the PR94780 fix to arm

Essentially the same fix as for x86.

2020-04-29 Richard Sandiford <richard.sandiford@arm.com>

gcc/
* config/arm/arm-builtins.c (arm_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for the first assignments to
fenv_var and new_fenv_var.

(cherry picked from commit 1d7ead9cba91533291e0048d22b711ca124e19de)

Daily bump.

rs6000: Properly handle LE index munging in vec_shr (PR94710)

The PR shows the compiler crashing with -mvsx -mlittle -O0. This turns
out to be caused by a failure to make of the higher bits in an index
endian conversion.

2020-04-24 Segher Boessenkool <segher@kernel.crashing.org>

PR target/94710
* config/rs6000/vector.md (vec_shr_<mode> for VEC_L): Correct little
endian byteshift_val calculation.

(cherry picked from commit 9c725245beed2f056b67f5dc218fef6cb869c5f2)

Fix up ChangeLog entries.

dwarf2out: Fix up dwarf2out_next_real_insn caching [PR96729]

The addition of NOTE_INSN_BEGIN_STMT and NOTE_INSN_INLINE_ENTRY notes
reintroduced quadratic behavior into dwarf2out_var_location.
This function needs to know the next real instruction to which the var
location note applies, but the way final_scan_insn is called outside of
final.c main loop doesn't make it easy to look up the next real insn in
there (and for non-dwarf it is even useless).  Usually next real insn is
only a few notes away, but we can have hundreds of thousands of consecutive
notes only followed by a real insn.  dwarf2out_var_location to avoid the
quadratic behavior contains a cache, it remembers the next note and when it
is called again on that loc_note, it can use the previously computed
dwarf2out_next_real_insn result, rather than walking the insn chain once
again.  But, for NOTE_INSN_{BEGIN_STMT,INLINE_ENTRY} dwarf2out_var_location
is not called while the code puts into the cache those notes, which means if
we have e.g. in the worst case NOTE_INSN_VAR_LOCATION and
NOTE_INSN_BEGIN_STMT notes alternating, the cache is not really used.

The following patch fixes it by looking up the next NOTE_INSN_VAR_LOCATION
if any.  While the lookup could be perhaps done together with looking for
the next real insn once (e.g. in dwarf2out_next_real_insn or its copy),
there are other dwarf2out_next_real_insn callers which don't need/want that
behavior and if there are more than two NOTE_INSN_VAR_LOCATION notes
followed by the same real insn, we need to do that "find next
NOTE_INSN_VAR_LOCATION" walk anyway.

On the testcase from the PR this patch speeds it 2.8times, from 0m0.674s
to 0m0.236s (why it takes for the reporter more than 60s is unknown).

2020-08-26  Jakub Jelinek  <jakub@redhat.com>

PR debug/96729
* dwarf2out.c (dwarf2out_next_real_insn): Adjust function comment.
(dwarf2out_var_location): Look for next_note only if next_real is
non-NULL, in that case look for the first non-deleted
NOTE_INSN_VAR_LOCATION between loc_note and next_real, if any.

(cherry picked from commit ca1afa261d03c9343dff1208325f87d9ba69ec7a)

gimple: Ignore *0 = {CLOBBER} in path isolation [PR96722]

Clobbers of MEM_REF with NULL address are just fancy nops, something we just
ignore and don't emit any code for it (ditto for other clobbers), they just
mark end of life on something, so we shouldn't infer from those that there
is some UB.

2020-08-25 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/96722
* gimple.c (infer_nonnull_range): Formatting fix.
(infer_nonnull_range_by_dereference): Return false for clobber stmts.

* g++.dg/opt/pr96722.C: New test.

(cherry picked from commit a5b15fcb954ba63d58f0daa700281aba33b5f24a)

c: Fix -Wunused-but-set-* warning with _Generic [PR96571]

The following testcase shows various problems with -Wunused-but-set*
warnings and _Generic construct.  I think it is best to treat the selector
and the ignored expressions as (potentially) read, because when they are
parsed, the vars in there are already marked as TREE_USED.

2020-08-18  Jakub Jelinek  <jakub@redhat.com>

PR c/96571
* c-parser.c (c_parser_generic_selection): Change match_found from bool
to int, holding index of the match.  Call mark_exp_read on the selector
expression and on expressions other than the selected one.

* gcc.dg/Wunused-var-4.c: New test.

(cherry picked from commit 6d42cbe5ad7a7b46437f2576c9920e44dc14b386)

c-family: Fix ICE in get_atomic_generic_size [PR96545]

As the testcase shows, we would ICE if the type of the first argument of
various atomic builtins was pointer to (non-void) incomplete type, we would
assume that TYPE_SIZE_UNIT must be non-NULL.  This patch diagnoses it
instead.  And also changes the TREE_CODE != INTEGER_CST check to
!tree_fits_uhwi_p, as we use tree_to_uhwi after this and at least in theory
the int could be too large and not fit.

2020-08-11  Jakub Jelinek  <jakub@redhat.com>

PR c/96545
* c-common.c (get_atomic_generic_size): Require that first argument's
type points to a complete type and use tree_fits_uhwi_p instead of
just INTEGER_CST TREE_CODE check for the TYPE_SIZE_UNIT.

* c-c++-common/pr96545.c: New test.

(cherry picked from commit 7840b4dc05539cf5575b3e9ff57ff5f6c3da2cae)

openmp: Handle clauses with gimple sequences in convert_nonlocal_omp_clauses properly

If the walk_body on the various sequences of reduction, lastprivate and/or linear
clauses needs to create a temporary variable, we should declare that variable
in that sequence rather than outside, where it would need to be privatized inside of
the construct.

2020-08-08 Jakub Jelinek <jakub@redhat.com>

PR fortran/93553
* tree-nested.c (convert_nonlocal_omp_clauses): For
OMP_CLAUSE_REDUCTION, OMP_CLAUSE_LASTPRIVATE and OMP_CLAUSE_LINEAR
save info->new_local_var_chain around walks of the clause gimple
sequences and declare_vars if needed into the sequence.

2020-08-08 Tobias Burnus <tobias@codesourcery.com>

PR fortran/93553
* testsuite/libgomp.fortran/pr93553.f90: New test.

(cherry picked from commit 676b5525e8333005bdc1c596ed086f1da27a450f)

fix _mm512_{,mask_}cmp*_p[ds]_mask at -O0 [PR96174]

The _mm512_{,mask_}cmp_p[ds]_mask and also _mm_{,mask_}cmp_s[ds]_mask
intrinsics have an argument which must have a constant passed to it
and so use an inline version only for ifdef __OPTIMIZE__ and have
a #define for -O0. But the _mm512_{,mask_}cmp*_p[ds]_mask intrinsics
don't need a constant argument, they are essentially the first
set with the constant added to them implicitly based on the comparison
name, and so there is no #define version for them (correctly).
But their inline versions are defined in between the first and s[ds]
set and so inside of ifdef __OPTIMIZE__, which means that with -O0
they aren't defined at all.

This patch fixes that by moving those after the #ifdef __OPTIMIZE #else
use #define #endif block.

2020-07-15 Jakub Jelinek <jakub@redhat.com>

PR target/96174
* config/i386/avx512fintrin.h (_mm512_cmpeq_pd_mask,
_mm512_mask_cmpeq_pd_mask, _mm512_cmplt_pd_mask,
_mm512_mask_cmplt_pd_mask, _mm512_cmple_pd_mask,
_mm512_mask_cmple_pd_mask, _mm512_cmpunord_pd_mask,
_mm512_mask_cmpunord_pd_mask, _mm512_cmpneq_pd_mask,
_mm512_mask_cmpneq_pd_mask, _mm512_cmpnlt_pd_mask,
_mm512_mask_cmpnlt_pd_mask, _mm512_cmpnle_pd_mask,
_mm512_mask_cmpnle_pd_mask, _mm512_cmpord_pd_mask,
_mm512_mask_cmpord_pd_mask, _mm512_cmpeq_ps_mask,
_mm512_mask_cmpeq_ps_mask, _mm512_cmplt_ps_mask,
_mm512_mask_cmplt_ps_mask, _mm512_cmple_ps_mask,
_mm512_mask_cmple_ps_mask, _mm512_cmpunord_ps_mask,
_mm512_mask_cmpunord_ps_mask, _mm512_cmpneq_ps_mask,
_mm512_mask_cmpneq_ps_mask, _mm512_cmpnlt_ps_mask,
_mm512_mask_cmpnlt_ps_mask, _mm512_cmpnle_ps_mask,
_mm512_mask_cmpnle_ps_mask, _mm512_cmpord_ps_mask,
_mm512_mask_cmpord_ps_mask): Move outside of __OPTIMIZE__ guarded
section.

* gcc.target/i386/avx512f-vcmppd-3.c: New test.
* gcc.target/i386/avx512f-vcmpps-3.c: New test.

(cherry picked from commit 12d69dbfff9dd5ad4a30b20d1636f5cab6425e8c)

tree-cfg: Fix ICE with switch stmt to unreachable opt and forced labels [PR95857]

The following testcase ICEs, because during the cfg cleanup, we see:
  switch (i$e_11) <default: <L12> [33.33%], case -3: <lab2> [33.33%], case 0: <L10> [33.33%], case 2: <lab2> [33.33%]>
...
lab2:
  __builtin_unreachable ();
where lab2 is FORCED_LABEL.  The way it works, we go through the case labels
and when we reach the first one that points to gimple_seq_unreachable*
basic block, we remove the edge (if any) from the switch bb to the bb
containing the label and bbs reachable only through that edge we've just
removed.  Once we do that, we must throw away all other cases that use
the same label (or some other labels from the same bb we've removed the edge
to and the bb).  To avoid quadratic behavior, this is not done by walking
all remaining cases immediately before removing, but only when processing
them later.
For normal labels this works, fine, if the label is in a deleted bb, it will
have NULL label_to_block and we handle that case, or, if the unreachable bb
has some other edge to it, only the edge will be removed and not the bb,
and again, find_edge will not find the edge and we only remove the case.
And if a label would be to some other block, that other block wouldn't have
been removed earlier because there would be still an edge from the switch
block.
Now, FORCED_LABEL (and I think DECL_NONLOCAL too) break this, because
those labels aren't removed, but instead moved to some surrounding basic
block.  So, when we later process those, when their gimple_seq_unreachable*
basic block is removed, label_to_block will return some unrelated block
(in the testcase the switch bb), so we decide to keep the case which doesn't
seem to be unreachable, but we don't really have an edge from the switch
block to the block the label got moved to.

I thought first about punting in gimple_seq_unreachable* on
FORCED_LABEL/DECL_NONLOCAL labels, but that might penalize even code that
doesn't care, so this instead just makes sure that for
FORCED_LABEL/DECL_NONLOCAL labels that are being removed (and thus moved
randomly) we remember in a hash_set the fact that those labels should be
treated as removed for the purpose of the optimization, and later on
handle those labels that way.

2020-07-02  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/95857
* tree-cfg.c (group_case_labels_stmt): When removing an unreachable
base_bb, remember all forced and non-local labels on it and later
treat those as if they have NULL label_to_block.  Formatting fix.
Fix a comment typo.

* gcc.dg/pr95857.c: New test.

(cherry picked from commit 00f24f56732861d09a9716fa5b6b8a96c2289143)

c-family: Use TYPE_OVERFLOW_UNDEFINED instead of !TYPE_UNSIGNED in pointer_sum [PR95903]

For lp64 targets and int off ... ptr[off + 1]
is lowered in pointer_sum to *(ptr + ((sizetype) off + (sizetype) 1)).
That is fine when signed integer wrapping is undefined (and is not done
already if off has unsigned type), but changes behavior for -fwrapv, where
overflow is well defined.  Runtime test could be:
int
main ()
{
  char *p = __builtin_malloc (0x100000000UL);
  if (!p) return 0;
  char *q = p + 0x80000000UL;
  int o = __INT_MAX__;
  q[o + 1] = 1;
  if (q[-__INT_MAX__ - 1] != 1) __builtin_abort ();
  return 0;
}
with -fwrapv or so, not included in the testsuite because it requires 4GB
allocation (with some other test it would be enough to have something
slightly above 2GB, but still...).

2020-06-27  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/95903
gcc/c-family/
* c-common.c (pointer_int_sum): Use TYPE_OVERFLOW_UNDEFINED instead of
!TYPE_UNSIGNED check to see if we can apply distributive law and handle
smaller precision intop operands separately.
gcc/testsuite/
* c-c++-common/pr95903.c: New test.

(cherry picked from commit 37995960984ea2222346dd9d168d332cd6f7adf0)

c++: Try to complete decomp types [PR95328]

Two years ago Paolo has added the
  else if (processing_template_decl && !COMPLETE_TYPE_P (type))
    pedwarn (...);
lines into cp_finish_decomp.  For type dependent decl we punt much earlier,
but even for types which aren't type dependent COMPLETE_TYPE_P might be
false as this testcase shows, so this patch tries to complete_type first
(the reason for writing it that way is that it is then followed by another
else if and if complete_type returns error_mark_node, we shouldn't report
anything, as a bug should have been reported already.

2020-05-28  Jakub Jelinek  <jakub@redhat.com>

PR c++/95328
* decl.c (cp_finish_decomp): Call complete_type before checking
COMPLETE_TYPE_P.

* g++.dg/cpp1z/decomp53.C: New test.

(cherry picked from commit 3d8d5ddb539a5254c7ef83414377f4c74c7701d4)

openmp: Fix placement of 2nd+ preparation statement for PHIs in simd clone lowering [PR95108]

For normal stmts, preparation statements are inserted before the stmt, so if we need multiple,
they are in the correct order, but for PHIs we emit them after labels in the entry successor
bb, and we used to emit them in the reverse order that way.

2020-05-14 Jakub Jelinek <jakub@redhat.com>

PR middle-end/95108
* omp-simd-clone.c (struct modify_stmt_info): Add after_stmt member.
(ipa_simd_modify_stmt_ops): For PHIs, only add before first stmt in
entry block if info->after_stmt is NULL, otherwise add after that stmt
and update it after adding each stmt.
(ipa_simd_modify_function_body): Initialize info.after_stmt.

* gcc.dg/gomp/pr95108.c: New test.

(cherry picked from commit d0fb9ffc1b8f3b86bbdf0e915cec2136141b329b)

Fix -fcompare-debug issue in purge_dead_edges [PR95080]

The following testcase fails with -fcompare-debug, the bug used to be latent
since introduction of -fcompare-debug.
The loop at the start of purge_dead_edges behaves differently between -g0
and -g - if the last insn is a DEBUG_INSN, then it skips not just
DEBUG_INSNs but also NOTEs until it finds some other real insn (or bb head),
while with -g0 it will not skip any NOTEs, so if we have
real_insn
note
debug_insn // not present with -g0
then with -g it might remove useless REG_EH_REGION from real_insn, while
with -g0 it will not.

Yet another option would be not skipping NOTE_P in the loop; I couldn't find
in history rationale for why it is done.

2020-05-13 Jakub Jelinek <jakub@redhat.com>

PR debug/95080
* cfgrtl.c (purge_dead_edges): Skip over debug and note insns even
if the last insn is a note.

* g++.dg/opt/pr95080.C: New test.

(cherry picked from commit 18edc195442291525e04f0fa4d5ef972155117da)

c++: Avoid strict_aliasing_warning on dependent types or expressions [PR94951]

The following testcase gets a bogus warning during build_base_path,
when cp_build_indirect_ref* calls strict_aliasing_warning with a dependent
expression. IMHO calling get_alias_set etc. on dependent types feels wrong
to me, we should just defer the warnings in those cases until instantiation
and only handle the cases where neither type nor expr are dependent.

2020-05-06 Jakub Jelinek <jakub@redhat.com>

PR c++/94951
* typeck.c (cp_strict_aliasing_warning): New function.
(cp_build_indirect_ref_1, build_reinterpret_cast_1): Use
it instead of strict_aliasing_warning.

* g++.dg/warn/Wstrict-aliasing-bogus-tmpl.C: New test.

(cherry picked from commit d82414ebcf7716ea24688510594a2c464a105908)

riscv: Fix up riscv_atomic_assign_expand_fenv [PR94950]

Similarly to the fixes on many other targets, riscv needs to use TARGET_EXPR
to avoid having the create_tmp_var_raw temporaries without proper DECL_CONTEXT
and not mentioned in local decls.

2020-05-06 Jakub Jelinek <jakub@redhat.com>

PR target/94950
* config/riscv/riscv-builtins.c (riscv_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for first assignment to old_flags.

(cherry picked from commit 5454a13add37fa6a8eedbf9d2f6bdc63a7825e2c)

combine: Don't replace SET_SRC with REG_EQUAL note content if SET_SRC has side-effects [PR94873]

There were some discussions about whether REG_EQUAL notes are valid on insns with a single
set which contains auto-inc-dec side-effects in the SET_SRC and the majority thinks that
it should be valid. So, this patch fixes the combiner to punt in that case, because otherwise
the auto-inc-dec side-effects from the SET_SRC are lost.

2020-05-06 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/94873
* combine.c (combine_instructions): Don't optimize using REG_EQUAL
note if SET_SRC (set) has side-effects.

* gcc.dg/pr94873.c: New test.

(cherry picked from commit 8982e39b46b1e4a4b09022ddebd758b77ab73bac)

c: Fix ICE with _Atomic side-effect in nested fn param decls [PR94842]

If there are _Atomic side-effects in the parameter declarations
of non-nested function, when they are parsed, current_function_decl is
NULL, the create_artificial_label created labels during build_atomic* are
then adjusted by store_parm_decls through set_labels_context_r callback.
Unfortunately, if such thing happens in nested function parameter
declarations, while those decls are parsed current_function_decl is the
parent function (and am not sure it is a good idea to temporarily clear it,
some code perhaps should be aware it is in a nested function, or it can
refer to variables from the parent function etc.) and that means
store_param_decls through set_labels_context_r doesn't adjust anything.
As those labels are emitted in the nested function body rather than in the
parent, I think it is ok to override the context in those cases.

2020-04-30 Jakub Jelinek <jakub@redhat.com>

PR c/94842
* c-decl.c (set_labels_context_r): In addition to context-less
LABEL_DECLs adjust also LABEL_DECLs with context equal to
parent function if any.
(store_parm_decls): Adjust comment.

* gcc.dg/pr94842.c: New test.

(cherry picked from commit 61fb8963c22d91152a9c46a3512307bef3b3d7f7)

tilegx: Unbreak build

../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '1' (via 'I124MODE:n') or '4' (via 'I48MODE:n')
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '1' (via 'I124MODE:n') or '' (via 'I48MODE:n')
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '2' (via 'I124MODE:n') or '4' (via 'I48MODE:n')
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '2' (via 'I124MODE:n') or '' (via 'I48MODE:n')
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '4' (via 'I124MODE:n') or '' (via 'I48MODE:n')

The insn name already uses <I124MODE:n> explicitly, just the preparation
stmts don't, and as it creates a I124MODE lowpart subreg of a word mode
register, <I124MODE:n> seems obviously correct.

2020-05-02 Jakub Jelinek <jakub@redhat.com>

* config/tilegx/tilegx.md
(insn_stnt<I124MODE:n>_add<I48MODE:bitsuffix>): Use <I124MODE:n>
rather than just <n>.

(cherry picked from commit 0118d0397f94c307b76aa14abec99347a93da621)

x86: Fix -O0 remaining intrinsic macros [PR94832]

A few other macros seem to suffer from the same issue. What I've done was:
cat gcc/config/i386/*intrin.h | sed -e ':x /\\$/ { N; s/\\\n//g ; bx }' \
| grep '^[[:blank:]]*#[[:blank:]]*define[[:blank:]].*(' | sed 's/[ ]\+/ /g' \
> /tmp/macros
and then looking for regexps:
)[a-zA-Z]
) [a-zA-Z]
[a-zA-Z][-+*/%]
[a-zA-Z] [-+*/%]
[-+*/%][a-zA-Z]
[-+*/%] [a-zA-Z]
in the resulting file.

2020-04-29 Jakub Jelinek <jakub@redhat.com>

PR target/94832
* config/i386/avx512bwintrin.h (_mm512_alignr_epi8,
_mm512_mask_alignr_epi8, _mm512_maskz_alignr_epi8): Wrap macro operands
used in casts into parens.
* config/i386/avx512fintrin.h (_mm512_cvt_roundps_ph, _mm512_cvtps_ph,
_mm512_mask_cvt_roundps_ph, _mm512_mask_cvtps_ph,
_mm512_maskz_cvt_roundps_ph, _mm512_maskz_cvtps_ph,
_mm512_mask_cmp_epi64_mask, _mm512_mask_cmp_epi32_mask,
_mm512_mask_cmp_epu64_mask, _mm512_mask_cmp_epu32_mask,
_mm512_mask_cmp_round_pd_mask, _mm512_mask_cmp_round_ps_mask,
_mm512_mask_cmp_pd_mask, _mm512_mask_cmp_ps_mask): Likewise.
* config/i386/avx512vlbwintrin.h (_mm256_mask_alignr_epi8,
_mm256_maskz_alignr_epi8, _mm_mask_alignr_epi8, _mm_maskz_alignr_epi8,
_mm256_mask_cmp_epu8_mask): Likewise.
* config/i386/avx512vlintrin.h (_mm_mask_cvtps_ph, _mm_maskz_cvtps_ph,
_mm256_mask_cvtps_ph, _mm256_maskz_cvtps_ph): Likewise.
* config/i386/f16cintrin.h (_mm_cvtps_ph, _mm256_cvtps_ph): Likewise.
* config/i386/shaintrin.h (_mm_sha1rnds4_epu32): Likewise.

(cherry picked from commit 0c8217b16f307c3eedce8f22354714938613f701)

x86: Fix -O0 intrinsic *gather*/*scatter* macros [PR94832]

As reported in the PR, while most intrinsic -O0 macro argument uses
are properly wrapped in ()s or used in context where having a complex
expression passed as the argument doesn't pose a problem (e.g. when
macro argument use is in between commas, or between ( and comma, or
between comma and ) etc.), especially the gather/scatter macros don't do
this and if one passes to some macro e.g. x + y as argument, the
corresponding inline function would do cast on the argument, but
the macro does (int) ARG, then it is (int) x + y rather than (int) (x + y).

The following patch fixes those issues in *gather/*scatter*; additionally,
the AVX2 macros were passing incorrect mask of e.g.
(__v2df)_mm_set1_pd((double)(long long int) -1)
which is IMHO equivalent to
(__v2df){-1.0, -1.0}
when it really wants to pass __v2df vector with all bits set.
I've used what the inline functions use for those cases.

2020-04-29 Jakub Jelinek <jakub@redhat.com>

PR target/94832
* config/i386/avx2intrin.h (_mm_mask_i32gather_pd,
_mm256_mask_i32gather_pd, _mm_mask_i64gather_pd,
_mm256_mask_i64gather_pd, _mm_mask_i32gather_ps,
_mm256_mask_i32gather_ps, _mm_mask_i64gather_ps,
_mm256_mask_i64gather_ps, _mm_i32gather_epi64,
_mm_mask_i32gather_epi64, _mm256_i32gather_epi64,
_mm256_mask_i32gather_epi64, _mm_i64gather_epi64,
_mm_mask_i64gather_epi64, _mm256_i64gather_epi64,
_mm256_mask_i64gather_epi64, _mm_i32gather_epi32,
_mm_mask_i32gather_epi32, _mm256_i32gather_epi32,
_mm256_mask_i32gather_epi32, _mm_i64gather_epi32,
_mm_mask_i64gather_epi32, _mm256_i64gather_epi32,
_mm256_mask_i64gather_epi32): Surround macro parameter uses with
parens.
(_mm_i32gather_pd, _mm256_i32gather_pd, _mm_i64gather_pd,
_mm256_i64gather_pd, _mm_i32gather_ps, _mm256_i32gather_ps,
_mm_i64gather_ps, _mm256_i64gather_ps): Likewise. Don't use
as mask vector containing -1.0 or -1.0f elts, but instead vector
with all bits set using _mm*_cmpeq_p? with zero operands.
* config/i386/avx512fintrin.h (_mm512_i32gather_ps,
_mm512_mask_i32gather_ps, _mm512_i32gather_pd,
_mm512_mask_i32gather_pd, _mm512_i64gather_ps,
_mm512_mask_i64gather_ps, _mm512_i64gather_pd,
_mm512_mask_i64gather_pd, _mm512_i32gather_epi32,
_mm512_mask_i32gather_epi32, _mm512_i32gather_epi64,
_mm512_mask_i32gather_epi64, _mm512_i64gather_epi32,
_mm512_mask_i64gather_epi32, _mm512_i64gather_epi64,
_mm512_mask_i64gather_epi64, _mm512_i32scatter_ps,
_mm512_mask_i32scatter_ps, _mm512_i32scatter_pd,
_mm512_mask_i32scatter_pd, _mm512_i64scatter_ps,
_mm512_mask_i64scatter_ps, _mm512_i64scatter_pd,
_mm512_mask_i64scatter_pd, _mm512_i32scatter_epi32,
_mm512_mask_i32scatter_epi32, _mm512_i32scatter_epi64,
_mm512_mask_i32scatter_epi64, _mm512_i64scatter_epi32,
_mm512_mask_i64scatter_epi32, _mm512_i64scatter_epi64,
_mm512_mask_i64scatter_epi64): Surround macro parameter uses with
parens.
* config/i386/avx512pfintrin.h (_mm512_prefetch_i32gather_pd,
_mm512_prefetch_i32gather_ps, _mm512_mask_prefetch_i32gather_pd,
_mm512_mask_prefetch_i32gather_ps, _mm512_prefetch_i64gather_pd,
_mm512_prefetch_i64gather_ps, _mm512_mask_prefetch_i64gather_pd,
_mm512_mask_prefetch_i64gather_ps, _mm512_prefetch_i32scatter_pd,
_mm512_prefetch_i32scatter_ps, _mm512_mask_prefetch_i32scatter_pd,
_mm512_mask_prefetch_i32scatter_ps, _mm512_prefetch_i64scatter_pd,
_mm512_prefetch_i64scatter_ps, _mm512_mask_prefetch_i64scatter_pd,
_mm512_mask_prefetch_i64scatter_ps): Likewise.
* config/i386/avx512vlintrin.h (_mm256_mmask_i32gather_ps,
_mm_mmask_i32gather_ps, _mm256_mmask_i32gather_pd,
_mm_mmask_i32gather_pd, _mm256_mmask_i64gather_ps,
_mm_mmask_i64gather_ps, _mm256_mmask_i64gather_pd,
_mm_mmask_i64gather_pd, _mm256_mmask_i32gather_epi32,
_mm_mmask_i32gather_epi32, _mm256_mmask_i32gather_epi64,
_mm_mmask_i32gather_epi64, _mm256_mmask_i64gather_epi32,
_mm_mmask_i64gather_epi32, _mm256_mmask_i64gather_epi64,
_mm_mmask_i64gather_epi64, _mm256_i32scatter_ps,
_mm256_mask_i32scatter_ps, _mm_i32scatter_ps, _mm_mask_i32scatter_ps,
_mm256_i32scatter_pd, _mm256_mask_i32scatter_pd, _mm_i32scatter_pd,
_mm_mask_i32scatter_pd, _mm256_i64scatter_ps,
_mm256_mask_i64scatter_ps, _mm_i64scatter_ps, _mm_mask_i64scatter_ps,
_mm256_i64scatter_pd, _mm256_mask_i64scatter_pd, _mm_i64scatter_pd,
_mm_mask_i64scatter_pd, _mm256_i32scatter_epi32,
_mm256_mask_i32scatter_epi32, _mm_i32scatter_epi32,
_mm_mask_i32scatter_epi32, _mm256_i32scatter_epi64,
_mm256_mask_i32scatter_epi64, _mm_i32scatter_epi64,
_mm_mask_i32scatter_epi64, _mm256_i64scatter_epi32,
_mm256_mask_i64scatter_epi32, _mm_i64scatter_epi32,
_mm_mask_i64scatter_epi32, _mm256_i64scatter_epi64,
_mm256_mask_i64scatter_epi64, _mm_i64scatter_epi64,
_mm_mask_i64scatter_epi64): Likewise.

(cherry picked from commit 78cef09019cc9c80d1b39a49861f8827a2ee2e60)

rs6000: Fix rs6000_atomic_assign_expand_fenv [PR94826]

This is the rs6000 version of the earlier committed x86, aarch64 and arm
fixes, as create_tmp_var_raw is used because the C FE can call this outside
of function context, we need to make sure the first references to those
VAR_DECLs are through a TARGET_EXPR, so that it gets gimple_add_tmp_var
marked in whatever function it gets expanded in.  Without that DECL_CONTEXT
is NULL and the vars aren't added as local decls of the containing function.

2020-04-29  Jakub Jelinek  <jakub@redhat.com>

PR target/94826
* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for first assignment to
fenv_var, fenv_clear and old_fenv variables.  For fenv_addr
take address of TARGET_EXPR of fenv_var with void_node initializer.
Formatting fixes.

(cherry picked from commit c7137fcc7cbc1f1f14f9fed75adcc6bd8f1d418c)

pr94780.c fails with ICE on aarch64 [PR94820]

This is a simple fix for pr94820.
The PR was only fixed on i386, the same error was also reported on aarch64.
This function, because it is sometimes called even outside of function bodies, uses create_tmp_var_raw rather than create_tmp_var.
But in order for that to work, when first referenced, the VAR_DECLs need to appear in a TARGET_EXPR so that during gimplification
the var gets the right DECL_CONTEXT and is added to local decls. Without that, e.g. tree-nested.c ICEs on those.

2020-04-29 Haijian Zhang <z.zhanghaijian@huawei.com>

PR target/94820
* config/aarch64/aarch64-builtins.c
(aarch64_atomic_assign_expand_fenv): Use TARGET_EXPR instead of
MODIFY_EXPR for first assignment to fenv_cr, fenv_sr and
new_fenv_var.

(cherry picked from commit d81bc2af7d2700888e414eb5a322ff5f5b0df0bb)

tree: Fix up TREE_SIDE_EFFECTS on internal calls [PR94809]

On the following testcase, match.pd during GENERIC folding
changes the -1U / x < y into __imag__ .MUL_OVERFLOW (x, y),
but unfortunately unlike for normal calls nothing sets TREE_SIDE_EFFECTS on
the call. There is the process_call_operands function that non-internal
call creation calls and it is usable for internal calls too,
e.g. TREE_SIDE_EFFECTS is derived from checking whether the
call has side-effects (non-ECF_{CONST,PURE}; we have those for internal
calls) and from whether any of the arguments has TREE_SIDE_EFFECTS.

2020-04-28 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/94809
* tree.c (build_call_expr_internal_loc_array): Call
process_call_operands.

* gcc.c-torture/execute/pr94809.c: New test.

(cherry picked from commit 34f6b14ff33e0c64b3a4a1a2cd871df715d69151)

x86: Fix up ix86_atomic_assign_expand_fenv [PR94780]

This function, because it is sometimes called even outside of function
bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
for that to work, when first referenced, the VAR_DECLs need to appear in a
TARGET_EXPR so that during gimplification the var gets the right
DECL_CONTEXT and is added to local decls.  Without that, e.g. tree-nested.c
ICEs on those.

2020-04-27  Jakub Jelinek  <jakub@redhat.com>

PR target/94780
* config/i386/i386.c (ix86_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for first assignment to
sw_var, exceptions_var, mxcsr_orig_var and mxcsr_mod_var.

* gcc.dg/pr94780.c: New test.

(cherry picked from commit 9b8e9006bb35641865358e2df4f6b3ae185b239a)

c++: Avoid -Wreturn-type warning if a template fn calls noreturn template fn [PR94742]

finish_call_expr already has code to set current_function_returns_abnormally
if a template calls a noreturn function, but on the following testcase it
doesn't call a FUNCTION_DECL, but TEMPLATE_DECL instead, in which case
we didn't check noreturn at all and just assumed it could return.

2020-04-25 Jakub Jelinek <jakub@redhat.com>

PR c++/94742
* semantics.c (finish_call_expr): When looking if all overloads
are noreturn, use STRIP_TEMPLATE to look through TEMPLATE_DECLs.

* g++.dg/warn/Wreturn-type-12.C: New test.

(cherry picked from commit 4ff685a8705e8ee55fa86e75afb769ffb0975aea)

Shortcut identity VEC_PERM expansion [PR94710]

This PR is about the rs6000 backend emitting wrong assembly
for whole vector shift by 0, and while I think it is desirable
to fix the backend, I don't see a point why the expander should
try to emit that, whole vector shift by 0 is identity, we can just
return the operand.

2020-04-23 Jakub Jelinek <jakub@redhat.com>

PR target/94710
* optabs.c (expand_vec_perm_const): For shift_amt const0_rtx
just return v2.

(cherry picked from commit f51be2fb8653f81092f8158a0f0527275f86603b)

attribs: Don't diagnose attribute exclusions during error recovery [PR94705]

On the following testcase GCC ICEs, because last_decl is error_mark_node,
and diag_attr_exclusions assumes that if it is not NULL, it must be a decl.

The following patch just doesn't diagnose attribute exclusions if the
other decl is erroneous (and thus we've already reported errors for it).

2020-04-23 Jakub Jelinek <jakub@redhat.com>

PR c/94705
* attribs.c (decl_attribute): Don't diagnose attribute exclusions
if last_decl is error_mark_node or has such a TREE_TYPE.

* gcc.dg/pr94705.c: New test.

(cherry picked from commit e2a71816b4949225498bec37e947293aa7f5841b)

ubsan: Avoid -Wpadded warnings [PR94641]

-Wpadded warnings aren't really useful for the artificial types that GCC
lays out for ubsan.

2020-04-21 Jakub Jelinek <jakub@redhat.com>

PR c/94641
* stor-layout.c (place_field, finalize_record_size): Don't emit
-Wpadded warning on TYPE_ARTIFICIAL rli->t.
* ubsan.c (ubsan_get_type_descriptor_type,
ubsan_get_source_location_type, ubsan_create_data): Set
TYPE_ARTIFICIAL.
* asan.c (asan_global_struct): Likewise.

* c-c++-common/ubsan/pr94641.c: New test.

(cherry picked from commit 73f8e9dca5ff891ed19001b213fd1f6ce31417e3)

Fix -fcompare-debug issue in delete_insn_and_edges [PR94618]

delete_insn_and_edges calls purge_dead_edges whenever deleting the last insn
in a bb, whatever it is. If it called it only for mandatory last insns
in the basic block (that may not be followed by DEBUG_INSNs, dunno if that
is control_flow_insn_p or something more complex), that wouldn't be a
problem, but as it calls it on any last insn and can actually do something
in the bb, if such an insn is followed by one more more DEBUG_INSNs and
nothing else in the same bb, we don't call purge_dead_edges with -g and do
call it with -g0.

On the testcase, there are two reg-to-reg moves with REG_EH_REGION notes
(previously memory accesses but simplified and yet not optimized), and the
second is followed by DEBUG_INSNs; the second move is delete_insn_and_edges
and after removing it, for -g0 purge_dead_edges removes the REG_EH_REGION
from the now last insn in the bb (the first reg-to-reg move), while
for -g it isn't called and things diverge from that quickly on.

Fixed by calling purdge_dead_edges even if we remove the last real insn
followed only by DEBUG_INSNs in the same bb.

2020-04-17 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/94618
* cfgrtl.c (delete_insn_and_edges): Set purge not just when
insn is the BB_END of its block, but also when it is only followed
by DEBUG_INSNs in its block.

* g++.dg/opt/pr94618.C: New test.

(cherry picked from commit c41884a09206be0e21cad7eea71b9754daa969d4)

c++: Fix pasto in structured binding diagnostics [PR94571]

This snippet has been copied from the non-structured binding declaration
parsing later in the function, and while for non-structured bindings
it can be followed by comma or semicolon, structured bindings may be
only followed by semicolon.

Or, do we want to have a different message for the case when there is
a comma (and keep this corrected one only if there is something else)
that would explain better what is the bug (or add a fix-it hint)?
Marek said in the PR that clang++ reports
error: decomposition declaration must be the only declaration in its group

There is another thing Marek noted (though, something for different spot),
that diagnostic for auto x(1), [e,f] = test2; could also use a clearer
wording like the above (or a fix-it hint), but the question is if we should
assume [ after , as a structured binding or if we should do some tentative
parsing first to figure out if it looks like a structured binding.

2020-04-16 Jakub Jelinek <jakub@redhat.com>

PR c++/94571
* parser.c (cp_parser_simple_declaration): Fix up a pasto in
diagnostics.

* g++.dg/cpp1z/decomp51.C: New test.

(cherry picked from commit e4658c7dbbe88f742c96e5f58ee4a6d549d642ca)

vect: Fix up lowering of TRUNC_MOD_EXPR by negative constant [PR94524]

The first testcase below is miscompiled, because for the division part
of the lowering we canonicalize negative divisors to their absolute value
(similarly how expmed.c canonicalizes it), but when multiplying the division
result back by the VECTOR_CST, we use the original constant, which can
contain negative divisors.

Fixed by computing ABS_EXPR of the VECTOR_CST.  Unfortunately, fold-const.c
doesn't support const_unop (ABS_EXPR, VECTOR_CST) and I think it is too late
in GCC 10 cycle to add it now.

Furthermore, while modulo by most negative constant happens to return the
right value, it does that only by invoking UB in the IL, because
we then expand division by that 1U+INT_MAX and say for INT_MIN % INT_MIN
compute the division as -1, and then multiply by INT_MIN, which is signed
integer overflow.  We in theory could do the computation in unsigned vector
types instead, but is it worth bothering.  People that are doing % INT_MIN
are either testing for standard conformance, or doing something wrong.
So, I've also added punting on % INT_MIN, both in vect lowering and vect
pattern recognition (we punt already for / INT_MIN).

2020-04-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/94524
* tree-vect-generic.c (expand_vector_divmod): If any elt of op1 is
negative for signed TRUNC_MOD_EXPR, multiply with absolute value of
op1 rather than op1 itself at the end.  Punt for signed modulo by
most negative constant.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Punt for signed
modulo by most negative constant.

* gcc.c-torture/execute/pr94524-1.c: New test.
* gcc.c-torture/execute/pr94524-2.c: New test.

(cherry picked from commit f52eb4f988992d393c69ee4ab76f236dced80e36)

i386: Don't use AVX512F integral masks for V*TImode [PR94438]

The ix86_get_mask_mode hook uses int mask for 512-bit vectors or 128/256-bit
vectors with AVX512VL (that is correct), and only for V*[SD][IF]mode if not
AVX512BW (also correct), but with AVX512BW it would stop checking the
elem_size altogether and pretend the hw has masking support for V*TImode
etc., which it doesn't. That can lead to various ICEs later on.

2020-04-08 Jakub Jelinek <jakub@redhat.com>

PR target/94438
* config/i386/i386.c (ix86_get_mask_mode): Only use int mask for elem_size
1, 2, 4 and 8.

* gcc.target/i386/avx512bw-pr94438.c: New test.
* gcc.target/i386/avx512vlbw-pr94438.c: New test.

(cherry picked from commit 8bf5faa9c463f0d53ffe835ba03d4502edfb959d)

c++: Further fix for -fsanitize=vptr [PR94325]

For -fsanitize=vptr, we insert a NULL store into the vptr instead of just
adding a CLOBBER of this. build_clobber_this makes the CLOBBER conditional
on in_charge (implicit) parameter whenever CLASSTYPE_VBASECLASSES, but when
adding this conditionalization to the -fsanitize=vptr code in PR87095,
I wanted it to catch some more cases when the class has CLASSTYPE_VBASECLASSES,
but the vptr is still not shared with something else, otherwise the
sanitization would be less effective.
The following testcase shows that the chosen test that CLASSTYPE_PRIMARY_BINFO
is non-NULL and has BINFO_VIRTUAL_P set wasn't sufficient,
the D class has still sizeof(D) == sizeof(void*) and thus contains just
a single vptr, but while in B::~B() this results in the vptr not being
cleared, in C::~C() this condition isn't true, as CLASSTYPE_PRIMARY_BINFO
in that case is B and is not BINFO_VIRTUAL_P, so it clears the vptr, but the
D::~D() dtor after invoking C::~C() invokes A::~A() with an already cleared
vptr, which is then reported.
The following patch is just a shot in the dark, keep looking through
CLASSTYPE_PRIMARY_BINFO until we find BINFO_VIRTUAL_P, but it works on the
existing testcase as well as this new one.

2020-04-08 Jakub Jelinek <jakub@redhat.com>

PR c++/94325
* decl.c (begin_destructor_body): For CLASSTYPE_VBASECLASSES class
dtors, if CLASSTYPE_PRIMARY_BINFO is non-NULL, but not BINFO_VIRTUAL_P,
look at CLASSTYPE_PRIMARY_BINFO of its BINFO_TYPE if it is not
BINFO_VIRTUAL_P, and so on.

* g++.dg/ubsan/vptr-15.C: New test.

(cherry picked from commit 4cf6b06cb5b02c053738e2975e3b7a4b3c577401)

i386: Fix V{64QI,32HI}mode constant permutations [PR94509]

The following testcases are miscompiled, because expand_vec_perm_pshufb
incorrectly thinks it can use vpshufb instruction for the permutations
when it can't.
The
          if (vmode == V32QImode)
            {
              /* vpshufb only works intra lanes, it is not
                 possible to shuffle bytes in between the lanes.  */
              for (i = 0; i < nelt; ++i)
                if ((d->perm[i] ^ i) & (nelt / 2))
                  return false;
            }
intra-lane check which is correct has been copied and adjusted for 64-byte
modes into:
          if (vmode == V64QImode)
            {
              /* vpshufb only works intra lanes, it is not
                 possible to shuffle bytes in between the lanes.  */
              for (i = 0; i < nelt; ++i)
                if ((d->perm[i] ^ i) & (nelt / 4))
                  return false;
            }
which is not correct, because 64-byte modes have 4 lanes rather than just
two and the above is only testing that the permutation grabs even lane elts
from even lanes and odd lane elts from odd lanes, but not that they are
from the same 256-bit half.

The following patch fixes it by using 3 * nelt / 4 instead of nelt / 4,
so we actually check the most significant 2 bits rather than just one.

2020-04-07  Jakub Jelinek  <jakub@redhat.com>

PR target/94509
* config/i386/i386.c (expand_vec_perm_pshufb): Fix the check
for inter-lane permutation for 64-byte modes.

* gcc.target/i386/avx512bw-pr94509-1.c: New test.
* gcc.target/i386/avx512bw-pr94509-2.c: New test.

(cherry picked from commit 14192f1ed48cb3982b1b3c794e0f313835d0cdcd)

aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]

The following testcase ICEs on aarch64 apparently since the introduction of
the aarch64 port.  The reason is that the {ashl,ashr,lshr}<mode>3 expanders
completely unnecessarily FAIL; if operands[2] is something other than
a CONST_INT or REG or MEM and the middle-end code can't cope with the
pattern giving up in these cases.  All the expanders use general_operand
predicate for the shift amount operand, but then have just a special case
for CONST_INT (if in-bound, emit an immediate shift, otherwise force into
REG), or MEM (force into REG), or REG (that is the case it handles).
In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid
general_operand.
I don't see any reason what is magic about MEMs that it should be forced
into REG and others like SUBREGs that it shouldn't, there isn't even a
reason to check for !REG_P because force_reg will do nothing if the operand
is already a REG, and otherwise can handle general_operand just fine.

2020-04-07  Jakub Jelinek  <jakub@redhat.com>

PR target/94488
* config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3,
ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT.
Assume it is a REG after that instead of testing it and doing FAIL
otherwise.  Formatting fix.

* gcc.c-torture/compile/pr94488.c: New test.

(cherry picked from commit 7f3ac38b3c765d49a46f65f1e5e9a812fb1da49c)

debug: Improve debug info of c++14 deduced return type [PR94459]

On the following testcase, in gdb ptype S<long>::m1 prints long as return
type, but all the other methods show void instead.
PR53756 added code to add_type_attribute if the return type is
auto/decltype(auto), but we actually should look through references,
pointers and qualifiers.
Haven't included there DW_TAG_atomic_type, because I think at least ATM
one can't use that in C++.  Not sure about DW_TAG_array_type or what else
could be deduced.

> http://eel.is/c++draft/dcl.spec.auto#3 says it has to appear as a
> decl-specifier.
>
> http://eel.is/c++draft/temp.deduct.type#8 lists the forms where a template
> argument can be deduced.
>
> Looks like you are missing arrays, pointers to members, and function return
> types.

2020-04-04  Hannes Domani  <ssbssa@yahoo.de>
    Jakub Jelinek  <jakub@redhat.com>

PR debug/94459
* dwarf2out.c (gen_subprogram_die): Look through references, pointers,
arrays, pointer-to-members, function types and qualifiers when
checking if in-class DIE had an 'auto' or 'decltype(auto)' return type
to emit type again on definition.

* g++.dg/debug/pr94459.C: New test.

Co-Authored-By: Hannes Domani <ssbssa@yahoo.de>
(cherry picked from commit b5039b7259e64a92f5c077fe4d023556d6b12550)

i386: Fix vph{add,subs?}[wd] 256-bit AVX2 RTL patterns [PR94460]

The following testcase is miscompiled, because the AVX2 patterns don't
describe correctly what the insn does.  E.g. vphaddd with %ymm* operands
(the second pattern) instruction as per:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_hadd_epi32&expand=2941
does { a0+a1, a2+a3, b0+b1, b2+b3, a4+a5, a6+a7, b4+b5, b6+b7 }
but our RTL pattern did
     { a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7 }
where the first and last 64 bits are the same and two middle 64 bits
swapped.
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_hadd_epi16&expand=2939
similarly, insn does:
     { a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7,
       a8+a9, a10+a11, a12+a13, a14+a15, b8+b9, b10+b11, b12+b13, b14+b15 }
but RTL pattern did
     { a0+a1, a2+a3, a4+a5, a6+a7, a8+a9, a10+a11, a12+a13, a14+a15,
       b0+b1, b2+b3, b4+b5, b6+b7, b8+b9, b10+b11, b12+b13, b14+b15 }
again, first and last 64 bits are the same and the two middle 64 bits
swapped.

2020-04-03  Jakub Jelinek  <jakub@redhat.com>

PR target/94460
* config/i386/sse.md (avx2_ph<plusminus_mnemonic>wv16hi3,
avx2_ph<plusminus_mnemonic>dv8si3): Fix up RTL pattern to do
second half of first lane from first lane of second operand and
first half of second lane from second lane of first operand.

* gcc.target/i386/avx2-pr94460.c: New test.

(cherry picked from commit dbff1829843180dc2a6c8ce5ce7883146b5cf083)

objsz: Don't call replace_uses_by on SSA_NAME_OCCURS_IN_ABNORMAL_PHI [PR94423]

The following testcase ICEs because the objsz pass calls replace_uses_by
on SSA_NAME_OCCURS_IN_ABNORMAL_PHI SSA_NAME.  The following patch instead
of that calls replace_call_with_value, which will turn it into
  xyz_123(ab) = 234;

2020-04-01  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/94423
* tree-object-size.c (pass_object_sizes::execute): Don't call
replace_uses_by for SSA_NAME_OCCURS_IN_ABNORMAL_PHI lhs, instead
call replace_call_with_value.

* gcc.dg/ubsan/pr94423.c: New test.

(cherry picked from commit 4486a537f14bc3b05ac552c3cbe18e540e397ed7)

fold-const: Fix division folding with vector operands [PR94412]

The following testcase is miscompiled since 4.9, we treat unsigned
vector types as if they were signed and "optimize" negations across it.

2020-03-31 Marc Glisse <marc.glisse@inria.fr>
Jakub Jelinek <jakub@redhat.com>

PR middle-end/94412
* fold-const.c (fold_binary_loc) <case TRUNC_DIV_EXPR>: Use
ANY_INTEGRAL_TYPE_P instead of INTEGRAL_TYPE_P.

* gcc.c-torture/execute/pr94412.c: New test.

Co-authored-by: Marc Glisse <marc.glisse@inria.fr>
(cherry picked from commit 8f99f9e6ccec167a5ba67dcc08e6c14948595b82)

Fix vextract* masked patterns [PR93069]

The AVX512F documentation clearly states that in instructions where the
destination is a memory only merging-masking is possible, not zero-masking,
and the assembler enforces that.

The testcase in this patch fails to assemble because of
Error: unsupported masking for `vextracti32x8'
on
vextracti32x8 $0x0, %zmm1, -64(%rsp){%k1}{z}
For the vector extraction patterns, we apparently have 7 *_maskm patterns
that only accept memory destinations and rtx_equal_p merge-masking source
for it, 7 *<mask_name> corresponding patterns that allow memory destination
only for the non-masked cases (through <store_mask_constraint>), then 2
*<mask_name> patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which
do allow memory destination even for masked cases and are the cause of the
testsuite failure, because we must not allow C constraint if the destination
is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf
VI4F_256), which has another issue (for which I don't have a testcase
though), where if it would match zero-masking with register destination,
it wouldn't emit the needed {z} into assembly.
The attached patch fixes those 3 issues only, perhaps more suitable for
backporting.

2020-03-30 Jakub Jelinek <jakub@redhat.com>

PR target/93069
* config/i386/sse.md (vec_extract_lo_<mode><mask_name>): Use
<store_mask_constraint> instead of m in output operand constraint.
(vec_extract_hi_<mode><mask_name>): Use <mask_operand2> instead of
%{%3%}.

* gcc.target/i386/avx512vl-pr93069.c: New test.
* gcc.dg/vect/pr93069.c: New test.

(cherry picked from commit 57e276f3e304ef92483763ee1028e5b3e1345e0f)

reassoc: Fix -fcompare-debug bug in reassociate_bb [PR94329]

The following testcase FAILs with -fcompare-debug, because reassociate_bb
mishandles the case when the last stmt in a bb has zero uses.  In that case
reassoc_remove_stmt (like gsi_remove) moves the iterator to the next stmt,
i.e. gsi_end_p is true, which means the code sets the iterator back to
gsi_last_bb.  The problem is that the for loop does gsi_prev on that before
handling the next statement, which means the former penultimate stmt, now
last one, is not processed by reassociate_bb.
Now, with -g, if there is at least one debug stmt at the end of the bb,
reassoc_remove_stmt moves the iterator to that following debug stmt and we
just do gsi_prev and continue with the former penultimate non-debug stmt,
now last non-debug stmt.

The following patch fixes that by not doing the gsi_prev in this case; there
are too many continue; cases, so I didn't want to copy over the gsi_prev to
all of them, so this patch uses a bool for that instead.  The second
gsi_end_p check isn't needed anymore, because when we don't do the
undesirable gsi_prev after gsi = gsi_last_bb, the loop !gsi_end_p (gsi)
condition will catch the removal of the very last stmt from a bb.

2020-03-28  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/94329
* tree-ssa-reassoc.c (reassociate_bb): When calling reassoc_remove_stmt
on the last stmt in a bb, make sure gsi_prev isn't done immediately
after gsi_last_bb.

* gfortran.dg/pr94329.f90: New test.

(cherry picked from commit aa9c08ef97f4df1ebb1fc8d72f2e7f9f8c1045c2)

varasm: Fix output_constructor where a RANGE_EXPR index needs to skip some elts [PR94303]

The following testcase is miscompiled, because output_constructor doesn't
output the initializer correctly.  The FE creates {[1...2] = 9} in this
case, and we emit .long 9; long 9; .zero 8 instead of the expected
.zero 8; .long 9; .long 9.  If the CONSTRUCTOR is {[1] = 9, [2] = 9},
output_constructor_regular_field has code to notice that the current
location (local->total_bytes) is smaller than the location we want to write
to (1*sizeof(elt)) and will call assemble_zeros to skip those.  But
RANGE_EXPRs are handled by a different function which didn't do this,
so for RANGE_EXPRs we emitted them properly only if local->total_bytes
was always equal to the location where the RANGE_EXPR needs to start.

2020-03-25  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/94303
* varasm.c (output_constructor_array_range): If local->index
RANGE_EXPR doesn't start at the current location in the constructor,
skip needed number of bytes using assemble_zeros or assert we don't
go backwards.

PR middle-end/94303
* g++.dg/torture/pr94303.C: New test.

(cherry picked from commit 56407bab53a514ffcd6ac011965cebdc5eb3ef54)

if-conv: Delete dead stmts backwards in ifcvt_local_dce [PR94283]

> > This patch caused:
> >
> > gcc /home/marxin/Programming/gcc/gcc/testsuite/gcc.c-torture/compile/990625-2.c -O3 -g -fno-tree-dce -c
> > during GIMPLE pass: ifcvt
> > /home/marxin/Programming/gcc/gcc/testsuite/gcc.c-torture/compile/990625-2.c: In function ‘broken030599’:
> > /home/marxin/Programming/gcc/gcc/testsuite/gcc.c-torture/compile/990625-2.c:2:1: internal compiler error: Segmentation fault
>
> Likely
>
>   /* Delete dead statements.  */
>   gsi = gsi_start_bb (bb);
>   while (!gsi_end_p (gsi))
>     {
>
> needs to instead work back-to-front for debug stmt adjustment to work

Indeed, that seems to work.

2020-03-25  Richard Biener  <rguenther@suse.de>
    Jakub Jelinek  <jakub@redhat.com>

PR debug/94283
* tree-if-conv.c (ifcvt_local_dce): Delete dead statements backwards.

* gcc.dg/pr94283.c: New test.

Co-authored-by: Richard Biener <rguenther@suse.de>
(cherry picked from commit 8ea7970c4968517fb73f42bcca40d316adacf215)

if-conv: Fix -fcompare-debug bugs in ifcvt_local_dce [PR94283]

The following testcase shows -fcompare-debug bugs in ifcvt_local_dce,
where the decisions what statements are needed is based also on debug stmt
operands, which is wrong.
So, this patch makes sure to never add debug stmt to the worklist, or never
add an assign to worklist just because it is used in a debug stmt in another
bb.

2020-03-24  Jakub Jelinek  <jakub@redhat.com>

PR debug/94283
* tree-if-conv.c (ifcvt_local_dce): For gimple debug stmts, just set
GF_PLF_2, but don't add them to worklist.  Don't add an assigment to
worklist or set GF_PLF_2 just because it is used in a debug stmt in
another bb.  Formatting improvements.

* gcc.target/i386/pr94283.c: New test.

(cherry picked from commit 4dcfd4e56b0d22af12750372f3e0b49249b1d473)

c++: Fix up handling of captured vars in lambdas in OpenMP clauses [PR93931]

Without the parser.c change we were ICEing on the testcase, because while the
uses of the captured vars inside of the constructs were replaced with capture
proxy decls, we didn't do that for decls in OpenMP clauses.

With that fixed, we don't ICE anymore, but the testcase is miscompiled and FAILs
at runtime.  This is because the capture proxy decls have DECL_VALUE_EXPR and
during gimplification we were gimplifying those to their DECL_VALUE_EXPRs.
That is fine for shared vars, but for privatized ones we must not do that.
So that is what the cp-gimplify.c changes do.  Had to add a DECL_CONTEXT check
before calling is_capture_proxy because some VAR_DECLs don't have DECL_CONTEXT
set (yet) and is_capture_proxy relies on that being non-NULL always.

2020-03-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/93931
* parser.c (cp_parser_omp_var_list_no_open): Call process_outer_var_ref
on outer_automatic_var_p decls.
* cp-gimplify.c (cxx_omp_disregard_value_expr): Return true also for
capture proxy decls.

* testsuite/libgomp.c++/pr93931.C: New test.

(cherry picked from commit 484206967f958fc47827a71654fe52a98adc95cb)

phiopt: Avoid -fcompare-debug bug in phiopt [PR94211]

Two years ago, I've added support for up to 2 simple preparation statements
in value_replacement, but the
-      && estimate_num_insns (assign, &eni_time_weights)
+      && estimate_num_insns (bb_seq (middle_bb), &eni_time_weights)
change, meant that we compute the cost of all those statements rather than
just the single assign that has been the single supported non-debug
statement in the bb before, doesn't do what I thought would do, gimple_seq
is just gimple * and thus it can't be really overloaded depending on whether
we pass a single gimple * or a whole sequence.  Which means in the last
two years it doesn't count all the statements, but only the first one.
With -g that happens to be a DEBUG_STMT, or it could be e.g. the first
preparation statement which could be much cheaper than the actual assign.

2020-03-19  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/94211
* tree-ssa-phiopt.c (value_replacement): Use estimate_num_insns_seq
instead of estimate_num_insns for bb_seq (middle_bb).  Rename
emtpy_or_with_defined_p variable to empty_or_with_defined_p, adjust
all uses.

* gcc.dg/pr94211.c: New test.

(cherry picked from commit 8db876e9c045c57d2dc5bd08a6e250f822efaad0)

c: Handle C_TYPE_INCOMPLETE_VARS even for ENUMERAL_TYPEs [PR94172]

The following testcases ICE, because they contain extern variable
declarations with incomplete enum types that is later completed and after
that those variables are accessed.  The ICEs are because the vars then may have
incorrect DECL_MODE etc., e.g. in the first case the var has SImode
DECL_MODE (the guessed mode for the enum), but the enum then actually has
DImode because its enumerators don't fit into unsigned int.

The following patch fixes it by using C_TYPE_INCOMPLETE_VARS not just on
incomplete struct/union types, but also incomplete enum types.
TYPE_VFIELD can't be used as it is TYPE_MIN_VALUE on ENUMERAL_TYPE,
thankfully TYPE_LANG_SLOT_1 has been used in the C FE only on
FUNCTION_TYPEs.

2020-03-17  Jakub Jelinek  <jakub@redhat.com>

PR c/94172
* c-tree.h (C_TYPE_INCOMPLETE_VARS): Define to TYPE_LANG_SLOT_1
instead of TYPE_VFIELD, and support it on {RECORD,UNION,ENUMERAL}_TYPE.
(TYPE_ACTUAL_ARG_TYPES): Check that it is only used on FUNCTION_TYPEs.
* c-decl.c (pushdecl): Push C_TYPE_INCOMPLETE_VARS also to
ENUMERAL_TYPEs.
(finish_incomplete_vars): New function, moved from finish_struct.  Use
relayout_decl instead of layout_decl.
(finish_struct): Remove obsolete comment about C_TYPE_INCOMPLETE_VARS
being TYPE_VFIELD.  Use finish_incomplete_vars.
(finish_enum): Clear C_TYPE_INCOMPLETE_VARS.  Call
finish_incomplete_vars.
* c-typeck.c (c_build_qualified_type): Clear C_TYPE_INCOMPLETE_VARS
also on ENUMERAL_TYPEs.

* gcc.dg/pr94172-1.c: New test.
* gcc.dg/pr94172-2.c: New test.

(cherry picked from commit 87ce34fa00cd6b87452d747235da40dfe5b6e00f)

c++: Fix parsing of invalid enum specifiers [PR90995]

The testcase shows some accepts-invalid (the ones without alignas) and
ice-on-invalid-code (the ones with alignas) cases.
If the enum doesn't have an underlying type and is not a definition,
the caller retries to parse it as elaborated type specifier.
E.g. for enum struct S s it will then pedwarn that elaborated type specifier
shouldn't have the struct/class keywords.
The problem is if the enum specifier is not followed by { when it has
underlying type.  In that case we have already called
cp_parser_parse_definitely to end the tentative parsing started at the
beginning of cp_parser_enum_specifier.  But the
cp_parser_error (parser, "expected %<;%> or %<{%>");
doesn't emit any error because the whole function is called from yet another
tentative parse and the caller starts parsing the elaborated type
specifier where the cp_parser_enum_specifier stopped (i.e. after the
underlying type token(s)).  The ultimate caller than commits the tentative
parsing (and even if it wouldn't, it wouldn't know what kind of error
to report).  I think after seeing enum {,struct,class} : type not being
followed by { or ;, there is no reason not to report it right away, as it
can't be valid C++, which is what the patch does.  Not sure if we shouldn't
also return error_mark_node instead of NULL_TREE, so that the caller doesn't
try to parse it as elaborated type specifier (the patch doesn't do that
right now).

Furthermore, while reading the code, I've noticed that
parser->colon_corrects_to_scope_p is saved and set to false at the start
of the function, but not restored back in some cases.  Don't have a testcase
where this would be a problem, but it just seems wrong.  Either we can in
the two spots replace return NULL_TREE; with { type = NULL_TREE; goto out; }
or we could perhaps abuse warning_sentinel or create a special class with
dtor to clean the flag up.

And lastly, I've fixed some formatting issues in the function while reading
it.

2020-03-17  Jakub Jelinek  <jakub@redhat.com>

PR c++/90995
* parser.c (cp_parser_enum_specifier): Use temp_override for
parser->colon_corrects_to_scope_p, replace goto out with return.
If scoped enum or enum with underlying type is not followed by
{ or ;, call cp_parser_commit_to_tentative_parse before calling
cp_parser_error and make sure to return error_mark_node instead of
NULL_TREE.  Formatting fixes.

* g++.dg/cpp0x/enum40.C: New test.

(cherry picked from commit 980a7a0be5a114e285c49ab05ac70881e4f27fc3)

tree-inline: Fix a -fcompare-debug issue in the inliner [PR94167]

The following testcase fails with -fcompare-debug.  The problem is that
bar is marked as address_taken only with -g and not without.
I've tracked it down to insert_init_stmt calling gimple_regimplify_operands
even on DEBUG_STMTs.  That function will just insert normal stmts before
the DEBUG_STMT if the DEBUG_STMT operand isn't gimple val or invariant.
While DCE will turn those statements into debug temporaries, it can cause
differences in SSA_NAMEs and more importantly, the ipa references are
generated from those before the DCE happens.
On the testcase, the DEBUG_STMT value is (int)bar.

We could generate DEBUG_STMTs with debug temporaries instead, but I fail to
see the reason to do that, DEBUG_STMTs allow other expressions and all we
want to ensure is that the expressions aren't too large (arbitrarily
complex), but during inlining/function versioning I don't see why something
would queue a DEBUG_STMT with arbitrarily complex expressions in there.

2020-03-16  Jakub Jelinek  <jakub@redhat.com>

PR debug/94167
* tree-inline.c (insert_init_stmt): Don't gimple_regimplify_operands
DEBUG_STMTs.

* gcc.dg/pr94167.c: New test.

(cherry picked from commit 378e830538afd4a02e41674cc9161fa59b5e09a9)

tree-nested: Fix handling of *reduction clauses with C array sections [PR93566]

tree-nested.c didn't handle C array sections in {,task_,in_}reduction clauses.

2020-03-14 Jakub Jelinek <jakub@redhat.com>

PR middle-end/93566
* tree-nested.c (convert_nonlocal_omp_clauses,
convert_local_omp_clauses): Handle {,in_,task_}reduction clauses
with C/C++ array sections.

* testsuite/libgomp.c/pr93566.c: New test.

(cherry picked from commit a8fc40fd551a60a97efbfe3fee08721accd80964)

aarch64: Fix another bug in aarch64_add_offset_1 [PR94121]

> I'm getting this ICE with -mabi=ilp32:
>
> during RTL pass: fwprop1
> /opt/gcc/gcc-20200312/gcc/testsuite/gcc.dg/pr94121.c: In function 'bar':
> /opt/gcc/gcc-20200312/gcc/testsuite/gcc.dg/pr94121.c:16:1: internal compiler error: in decompose, at rtl.h:2279

That is a preexisting issue, caused by another bug in the same function.
When mode is SImode and moffset is 0x80000000 (or anything else with the
bit 31 set), we need to sign-extend it.

2020-03-13 Jakub Jelinek <jakub@redhat.com>

PR target/94121
* config/aarch64/aarch64.c (aarch64_add_offset_1): Use gen_int_mode
instead of GEN_INT.

(cherry picked from commit c2f836c413b1e9ae45598338b4a2ecd33bd926fb)

maintainer-scripts: Fix up gcc_release without -l, where mkdir was using umask 077 after migration

2020-03-12 Jakub Jelinek <jakub@redhat.com>

* gcc_release (upload_files): Without -l, pass -m 755 to the mkdir
command invoked through ssh.

(cherry picked from commit 3739894d0cfc88b6d84134b827f33b31d646d32a)

doc: Fix up ASM_OUTPUT_ALIGNED_DECL_LOCAL description

When looking into PR94134, I've noticed bugs in the
ASM_OUTPUT_ALIGNED_DECL_LOCAL documentation.  varasm.c has:
  #if defined ASM_OUTPUT_ALIGNED_DECL_LOCAL
    unsigned int align = symtab_node::get (decl)->definition_alignment ();
    ASM_OUTPUT_ALIGNED_DECL_LOCAL (asm_out_file, decl, name,
                                   size, align);
    return true;
  #elif defined ASM_OUTPUT_ALIGNED_LOCAL
    unsigned int align = symtab_node::get (decl)->definition_alignment ();
    ASM_OUTPUT_ALIGNED_LOCAL (asm_out_file, name, size, align);
    return true;
  #else
    ASM_OUTPUT_LOCAL (asm_out_file, name, size, rounded);
    return false;
  #endif
and the ASM_OUTPUT_ALIGNED_LOCAL documentation properly mentions:
Like @code{ASM_OUTPUT_LOCAL} and mentions the same macro in another place.
The ASM_OUTPUT_ALIGNED_DECL_LOCAL description mentions non-existing macros
ASM_OUTPUT_ALIGNED_DECL and ASM_OUTPUT_DECL instead of the right ones
ASM_OUTPUT_ALIGNED_LOCAL and ASM_OUTPUT_LOCAL.

2020-03-12  Jakub Jelinek  <jakub@redhat.com>

* doc/tm.texi.in (ASM_OUTPUT_ALIGNED_DECL_LOCAL): Change
ASM_OUTPUT_ALIGNED_DECL in description to ASM_OUTPUT_ALIGNED_LOCAL
and ASM_OUTPUT_DECL to ASM_OUTPUT_LOCAL.
* doc/tm.texi: Regenerated.

(cherry picked from commit 9a8af207d7d03149a438185a2a0c50eeeb96a402)

tree-dse: Fix mem* head trimming if call has lhs [PR94130]

As the testcase shows, if DSE decides to head trim {mem{set,cpy,move},strncpy}
and the call has lhs, it is incorrect to leave the lhs as is, because it
will then point to the adjusted address (base + head_trim) instead of the
original base.
The following patch fixes that by dropping the lhs of the call and assigning
lhs the original base in a following statement.

2020-03-12 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/94130
* tree-ssa-dse.c: Include gimplify.h.
(increment_start_addr): If stmt has lhs, drop the lhs from call and
set it after the call to the original value of the first argument.
Formatting fixes.
(decrement_count): Formatting fix.

* gcc.c-torture/execute/pr94130.c: New test.

(cherry picked from commit a545ffafa380fa958393e1dfbf7f5f8f129bc5cf)

pdp11: Fix handling of common (local and global) vars [PR94134]

As mentioned in the PR, the generic code decides to put the a variable into
lcomm_section, which is a NOSWITCH section and thus the generic code doesn't
switch into a particular section before using
ASM_OUTPUT{_ALIGNED{,_DECL}_}_LOCAL, on many targets that results just in
.lcomm (or for non-local .comm) directives which don't need a switch to some
section, other targets put switch_to_section (bss_section) at the start of
that macro.
pdp11 doesn't do that (and doesn't have bss_section), and so emits the
lcomm/comm variables in whatever section is current (it has only .text/.data
and for DEC assembler rodata).

The following patch fixes that by putting it always into data section, and
additionally avoids emitting an empty line in the assembly for the lcomm
vars.

2020-03-11 Jakub Jelinek <jakub@redhat.com>

PR target/94134
* config/pdp11/pdp11.c (pdp11_asm_output_var): Call switch_to_section
at the start to switch to data section. Don't print extra newline if
.globl directive has not been emitted.

* gcc.c-torture/execute/pr94134.c: New test.

(cherry picked from commit f1125cf88ac0c97d819e4f81d556fbcd1161270e)

aarch64: Fix ICE in aarch64_add_offset_1 [PR94121]

abs_hwi asserts that the argument is not HOST_WIDE_INT_MIN and as the
(invalid) testcase shows, the function can be called with such an offset.
The following patch is IMHO minimal fix, absu_hwi unlike abs_hwi allows even
that value and will return (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MIN
in that case.  The function then uses moffset in two spots which wouldn't
care if the value is (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MIN or
HOST_WIDE_INT_MIN and wouldn't accept it (!moffset and
aarch64_uimm12_shift (moffset)), then in one spot where the signedness of
moffset does matter and using unsigned is the right thing -
moffset < 0x1000000 - and finally has code which will handle even this
value right; the assembler doesn't really care for DImode immediates if
        mov     x1, -9223372036854775808
or
        mov     x1, 9223372036854775808
is used and similarly it doesn't matter if we add or sub it in DImode.

2020-03-11  Jakub Jelinek  <jakub@redhat.com>

PR target/94121
* config/aarch64/aarch64.c (aarch64_add_offset_1): Use absu_hwi
instead of abs_hwi, change moffset type to unsigned HOST_WIDE_INT.

* gcc.dg/pr94121.c: New test.

(cherry picked from commit a644079a702a6228df2ffaace1d88a5f74e4bb9f)

dfp: Fix decimal_to_binary [PR94111]

As e.g. decimal_from_decnumber shows, the REAL_VALUE_TYPE representation
contains a decimal128 embedded in ->sig only if it is rvc_normal, for
other kinds like rvc_inf or rvc_nan, ->sig is ignored and everything is
contained in the REAL_VALUE_TYPE flags (cl, sign, signalling and decimal).
decimal_to_binary which is used when folding a decimal{32,64,128} constant
to a binary floating point type ignores this and thus folds infinities and
NaNs into +0.0.
The following patch fixes that by only doing that for rvc_normal.
Similarly to the binary to decimal folding, it goes through a string, in
order to e.g. deal with canonical NaN mantissas, or binary float formats
that don't support infinities and/or NaNs.

2020-03-11 Jakub Jelinek <jakub@redhat.com>

PR middle-end/94111
* dfp.c (decimal_to_binary): Only use decimal128ToString if from->cl
is rvc_normal, otherwise use real_to_decimal to print the number to
string.

* gcc.dg/dfp/pr94111.c: New test.

(cherry picked from commit 343c467ccdc24edb9acd7c60d54914d9656ab499)

ldist: Further fixes for -ftrapv [PR94114]

As the testcase shows, arithmetics that for -ftrapv would need multiple
basic blocks can show up not just in nb_bytes expressions where we
are calling rewrite_to_non_trapping_overflow for a while already,
but also in the pointer expression to the start of the region.
While the testcase covers just the first hunk and I've failed to create
a testcase for the latter, it is at least in theory possible too, so I've
adjusted that hunk too.

2020-03-11 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/94114
* tree-loop-distribution.c (generate_memset_builtin): Call
rewrite_to_non_trapping_overflow even on mem.
(generate_memcpy_builtin): Call rewrite_to_non_trapping_overflow even
on dest and src.

* gcc.dg/pr94114.c: New test.

(cherry picked from commit 2fd27691f213f2e808626c4cd492b00c801a00fa)

print-rtl: Fix printing of CONST_STRING in DEBUG_INSNs [PR93399]

The following testcase fails to assemble, as CONST_STRING in the DEBUG_INSNs
is printed as is, so if it contains \n and/or \r, we are in trouble:
        .loc 1 14 3
        # DEBUG haystack => [si]
        # DEBUG needle => "
"
In the gimple dumps we print those (STRING_CSTs) as
  # DEBUG haystack => D#1
  # DEBUG needle => "\n"
so this patch uses what we use in tree printing for the CONST_STRINGs too.

2020-03-05  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/93399
* tree-pretty-print.h (pretty_print_string): Declare.
* tree-pretty-print.c (pretty_print_string): Remove forward
declaration, no longer static.  Change nbytes parameter type
from unsigned to size_t.
* print-rtl.c (print_value) <case CONST_STRING>: Use
pretty_print_string and for shrink way too long strings.

* gcc.dg/pr93399.c: New test.

(cherry picked from commit e0d6777cda966b04fc47d544c09839c4fa94343c)

inliner: Copy DECL_BY_REFERENCE in copy_decl_to_var [PR93888]

In the following testcase we emit wrong debug info for the karg
parameter in the DW_TAG_inlined_subroutine into main.
The problem is that the karg PARM_DECL is DECL_BY_REFERENCE and thus
in the IL has const K & type, but in the source just const K.
When the function is inlined, we create a VAR_DECL for it, but don't
set DECL_BY_REFERENCE, so when emitting DW_AT_location, we treat it like
a const K & typed variable, but it has DW_AT_abstract_origin which has
just the const K type and thus the debugger thinks the variable has
const K type.

Fixed by copying the DECL_BY_REFERENCE flag. Not doing it in
copy_decl_for_dup_finish, because copy_decl_no_change already copies
that flag through copy_node and in copy_result_decl_to_var it is
undesirable, as we handle DECL_BY_REFERENCE in that case instead
by changing the type.

2020-03-04 Jakub Jelinek <jakub@redhat.com>

PR debug/93888
* tree-inline.c (copy_decl_to_var): Copy DECL_BY_REFERENCE flag.

* g++.dg/guality/pr93888.C: New test.

(cherry picked from commit d2a810ee83e2952bf351498cecf8f5db28860a24)

i386: Fix some -O0 avx2intrin.h and xopintrin.h intrinsic macros [PR94046]

As the testcases show, the macros we have for -O0 for intrinsics that require
constant argument(s) should first cast the argument to the type the -O1+
inline uses and afterwards to whatever type e.g. a builtin needs.
The PR reported one which violated this, and I've grepped for all double-casts
and grepped out from that meaningful casts where the __m{128,256,512}{,d,i}
first cast is cast to same sized __v* type and has the same kind of element
type (float, double, integral). These 7 macros were using different casts,
and I've double checked them against the inline function types.

2020-03-05 Jakub Jelinek <jakub@redhat.com>

PR target/94046
* config/i386/avx2intrin.h (_mm_mask_i32gather_ps): Fix first cast of
SRC and MASK arguments to __m128 from __m128d.
(_mm256_mask_i32gather_ps): Fix first cast of MASK argument to __m256
from __m256d.
(_mm_mask_i64gather_ps): Fix first cast of MASK argument to __m128
from __m128d.
* config/i386/xopintrin.h (_mm_permute2_pd): Fix first cast of C
argument to __m128i from __m128d.
(_mm256_permute2_pd): Fix first cast of C argument to __m256i from
__m256d.
(_mm_permute2_ps): Fix first cast of C argument to __m128i from __m128.
(_mm256_permute2_ps): Fix first cast of C argument to __m256i from
__m256.

* g++.dg/ext/pr94046-1.C: New test.
* g++.dg/ext/pr94046-2.C: New test.

(cherry picked from commit 07d52e63d999a0a10c7598c34c48365a357d3d5a)

explow: Fix ICE caused by plus_constant [PR94002]

The following testcase ICEs in cross to riscv64-linux.  The problem is
that we have a DImode integral constant (that doesn't fit into SImode),
which is pushed into a constant pool and later access just the first half of
it using a MEM.  When plus_constant is called on such a MEM, if the constant
has mode, we verify the mode, but if it doesn't, we don't and ICE later on
when we think the CONST_INT is a valid SImode constant.

2020-03-03  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/94002
* explow.c (plus_constant): Punt if cst has VOIDmode and
get_pool_mode is different from mode.

* gcc.dg/pr94002.c: New test.

(cherry picked from commit e913d4f4771e04d4254bf6c0e720fec5e324a898)

Daily bump.

[PATCH, rs6000] Fix vector long long subtype (PR96139)

Hi,
This corrects an issue with the powerpc vector long long subtypes.
As reported by SjMunroe, when building some code with -Wall, and
attempting to print an element of a "long long vector" with a
long long printf format string, we will report an error because
the vector sub-type was improperly defined as int.

When defining a V2DI_type_node we use a TARGET_POWERPC64 ternary to
define the V2DI_type_node with "vector long" or "vector long long".
We also need to specify the proper sub-type when we define the type.

Due to some file renames, This is a backport and rework of both
    [PATCH, rs6000] Fix vector long long subtype (PR96139)
and
    [PATCH, rs6000] Testsuite fixup pr96139 tests

PR target/96139

gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_init_builtin): Update V2DI_type_node
  and unsigned_V2DI_type_node definitions.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr96139-a.c: New test.
* gcc.target/powerpc/pr96139-b.c: New test.
* gcc.target/powerpc/pr96139-c.c: New test.