Jason Merrill [Fri, 12 Feb 2021 03:01:19 +0000 (22:01 -0500)]
cgraph: flatten and same_body aliases [PR96078]
The patch for PR92372 made us start warning about a flatten attribute on an
alias. But in the case of C++ 'tor base/complete variants, the user didn't
create the alias. If the alias target also has the attribute, the alias
points to a flattened function, so we shouldn't warn.
gcc/ChangeLog:
PR c++/96078
* cgraphunit.c (process_function_and_variable_attributes): Don't
warn about flatten on an alias if the target also has it.
* cgraph.h (symtab_node::get_alias_target_tree): New.
gcc/testsuite/ChangeLog:
PR c++/96078
* g++.dg/ext/attr-flatten1.C: New test.
Jason Merrill [Thu, 25 Feb 2021 21:47:53 +0000 (16:47 -0500)]
c++: Fix class NTTP constness handling [PR98810]
Here, when substituting still-dependent args into an alias template, we see
a non-const type because the default argument is non-const, and is not a
template parm object because it's still dependent.
gcc/cp/ChangeLog:
PR c++/98810
* pt.c (tsubst_copy) [VIEW_CONVERT_EXPR]: Add const
to a class non-type template argument that needs it.
gcc/testsuite/ChangeLog:
PR c++/98810
* g++.dg/cpp2a/nontype-class-defarg1.C: New test.
Eric Botcazou [Wed, 3 Mar 2021 11:25:03 +0000 (12:25 +0100)]
Fix ICE with pathologically large frames
gcc/
PR target/99234
* config/i386/i386.c (ix86_compute_frame_layout): For a SEH target,
point back the hard frame pointer to its default location when the
frame is larger than SEH_MAX_FRAME_SIZE.
Richard Biener [Wed, 20 Jan 2021 07:48:34 +0000 (08:48 +0100)]
tree-optimization/98758 - fix integer arithmetic in data-ref analysis
This fixes some int arithmetic issues and a bogus truncation.
2021-01-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/98758
* tree-data-ref.c (int_divides_p): Use lambda_int arguments.
(lambda_matrix_right_hermite): Avoid undefinedness with
signed integer abs and multiplication.
(analyze_subscript_affine_affine): Use lambda_int.
Richard Biener [Wed, 13 Jan 2021 08:43:52 +0000 (09:43 +0100)]
tree-optimization/98640 - fix bogus sign-extension with VN
VN tried to express a sign extension from int to long of
a trucated quantity with a plain conversion but that loses the
truncation. Since there's no single operand doing truncate plus
sign extend (there was a proposed SEXT_EXPR to do that at some
point mapping to RTL sign_extract) don't bother to appropriately
model this with two ops (which the VN insert machinery doesn't
handle and which is unlikely to CSE fully).
2021-01-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/98640
* tree-ssa-sccvn.c (visit_nary_op): Do not try to
handle plus or minus from a truncated operand to be
sign-extended.
This fixes a double-counting in the reduction cost when vectorizing
the reduction through the regular vectorizable_* functions.
2021-01-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/98526
* tree-vect-loop.c (vect_model_reduction_cost): Remove costing
of the actual reduction op for the regular case.
(vectorizable_reduction): Cost the stmts
vect_transform_reduction produces here.
Richard Biener [Thu, 19 Nov 2020 08:06:50 +0000 (09:06 +0100)]
tree-optimization/97897 - complex lowering on abnormal edges
This fixes complex lowering to not put constants into abnormal
edge PHI values by making sure abnormally used SSA names are
VARYING in its propagation lattice.
2020-11-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/97897
* tree-complex.c (complex_propagate::visit_stmt): Make sure
abnormally used SSA names are VARYING.
(complex_propagate::visit_phi): Likewise.
Eric Botcazou [Tue, 2 Mar 2021 16:58:46 +0000 (17:58 +0100)]
Fix PR ada/99095
This is a regression present on the mainline and 10 branch, where we fail
to make the bounds explicit for the return value of a function returning
an unconstrained array of a limited record type.
gcc/ada/
PR ada/99095
* sem_ch8.adb (Check_Constrained_Object): Restrict again the special
optimization for limited types to non-array types except in the case
of an extended return statement.
gcc/testsuite/
* gnat.dg/limited5.adb: New test.
Kito Cheng [Tue, 7 Jul 2020 08:20:53 +0000 (16:20 +0800)]
RISC-V: Implement __builtin_thread_pointer
RISC-V has a dedicate register for thread pointer which is specified in psABI
doc, so we could support __builtin_thread_pointer in straightforward way.
Note: clang/llvm was supported __builtin_thread_pointer for RISC-V port
recently.
- https://reviews.llvm.org/rGaabc24acf0d5f8677bd22fe9c108581e07c3e180
Richard Earnshaw [Mon, 22 Feb 2021 15:00:53 +0000 (15:00 +0000)]
arm: force use of r4 for __gnu_cmse_nonsecure_call when !FPCXT [PR99271]
Commit r10-6017 relaxed the constraint on thumb2 calls to
__gnu_cmse_nonsecure_call to allow any register for the call address.
Although the initial code expansion continues to use r4 with the FPCXT
extension is not enabled, the change was unsafe because subsequent
optimizations could use the additional freedom to change which
register was being used.
To fix this we need to split the output patterns in the machine
description to use distinct recognizers: one with the additional
freedom when FPCXT is enabled an another that retains the original
restrictions when the extension is not available.
gcc:
PR target/99271
* config/arm/thumb2.md (nonsecure_call_reg_thumb2_fpcxt): New pattern.
(nonsecure_call_value_reg_thumb2_fpcxt): Likewise.
(nonsecure_call_reg_thumb2): Restrict to using r4 for the callee
address and disable when the FPCXT is not available.
(nonsecure_call_value_reg_thumb2): Likewise.
gcc/testsuite:
* gcc.target/arm/cmse/cmse-18.c: New test.
Eric Botcazou [Mon, 1 Mar 2021 06:53:05 +0000 (07:53 +0100)]
Fix wrong result for 1.0/3.0 at -O2 -fno-omit-frame-pointer -frounding-math
This wrong-code PR for the C++ compiler on x86-64/Windows is a regression
in GCC 9 and later, but the underlying issue has probably been there since
SEH was implemented and is exposed by this comment in config/i386/winnt.c:
/* SEH records saves relative to the "current" stack pointer, whether
or not there's a frame pointer in place. This tracks the current
stack pointer offset from the CFA. */
HOST_WIDE_INT sp_offset;
That's not what the (current) Microsoft documentation says; instead it says:
/* SEH records offsets relative to the lowest address of the fixed stack
allocation. If there is no frame pointer, these offsets are from the
stack pointer; if there is a frame pointer, these offsets are from the
value of the stack pointer when the frame pointer was established, i.e.
the frame pointer minus the offset in the .seh_setframe directive. */
That's why the implementation is correct only under the condition that the
frame pointer be established *after* the fixed stack allocation; as a matter
of fact, that's clearly the model underpinning SEH, but is the opposite of
what is done e.g. on Linux.
However the issue is mostly papered over in practice because:
1. SEH forces use_fast_prologue_epilogue to false, which in turns forces
save_regs_using_mov to false, so the general regs are always pushed when
they need to be saved, which eliminates the offset computation for them.
2. As soon as a frame is larger than 240 bytes, the frame pointer is fixed
arbitrarily to 128 bytes above the stack pointer, which of course requires
that it be established after the fixed stack allocation.
So you need a small frame clobbering one of the call-saved XMM registers in
order to generate wrong SEH unwind info.
The attached fix makes sure that the frame pointer is always established
after the fixed stack allocation by pointing it at or below the lowest used
register save area, i.e. the SSE save area, and removing the special early
saves in the prologue; the end result is a uniform prologue sequence for
SEH whatever the frame size. And it avoids a discrepancy between cases
where the number of saved general regs is even and cases where it is odd.
gcc/
PR target/99234
* config/i386/i386.c (ix86_compute_frame_layout): For a SEH target,
point the hard frame pointer to the SSE register save area instead
of the general register save area. Perform only minimal adjustment
for small frames if it is initially not correctly aligned.
(ix86_expand_prologue): Remove early saves for a SEH target.
* config/i386/winnt.c (struct seh_frame_state): Document constraint.
gcc/testsuite/
* g++.dg/eh/seh-xmm-unwind.C: New test.
Jason Merrill [Fri, 26 Feb 2021 10:45:02 +0000 (05:45 -0500)]
c++: Allow GNU attributes before lambda -> [PR90333]
In my 9.3/10 patch for 90333 I allowed attributes between [] and (), and
after the trailing return type, but not in the place that GCC 8 expected
them, and we've gotten several bug reports about that. So let's allow them
there, as well.
gcc/cp/ChangeLog:
PR c++/90333
* parser.c (cp_parser_lambda_declarator_opt): Accept GNU attributes
between () and ->.
gcc/testsuite/ChangeLog:
PR c++/90333
* g++.dg/ext/attr-lambda3.C: New test.
Jason Merrill [Fri, 12 Feb 2021 00:45:22 +0000 (19:45 -0500)]
c++: variadic lambda template and empty pack [PR97246]
In get<0>, Is is empty, so the first parameter pack of the lambda is empty,
but after the fix for PR94546 we were wrongly associating it with the
partial instantiation of 'v'.
Substrings were not reduced early enough for use in initializations,
such as DATA statements. Add an early simplification for substrings
with constant starting and ending points.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_resolve_substring): Add prototype.
* primary.c (match_string_constant): Simplify substrings with
constant starting and ending points.
* resolve.c: Rename resolve_substring to gfc_resolve_substring.
(gfc_resolve_ref): Use renamed function gfc_resolve_substring.
gcc/testsuite/ChangeLog:
* substr_10.f90: New test.
* substr_9.f90: New test.
Christophe Lyon [Wed, 24 Feb 2021 15:51:52 +0000 (15:51 +0000)]
arm: Fix CMSE support detection in libgcc (PR target/99157)
As discussed in the PR, the Makefile fragment lacks a double '$' to
get the return-code from GCC invocation, resulting is CMSE support
missing from multilibs.
I checked that the simple patch proposed in the PR fixes the problem.
2021-02-23 Christophe Lyon <christophe.lyon@linaro.org>
Hau Hsu <hsuhau617@gmail.com>
PR target/99157
libgcc/
* config/arm/t-arm: Fix cmse support detection.
Paul Thomas [Tue, 23 Feb 2021 19:29:04 +0000 (19:29 +0000)]
Fortran: Fix for class defined operators [PR99124].
2021-02-23 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/99124
* resolve.c (resolve_fl_procedure): Include class results in
the test for F2018, C15100.
* trans-array.c (get_class_info_from_ss): Do not use the saved
descriptor to obtain the class expression for variables. Use
gfc_get_class_from_expr instead.
gcc/testsuite/
PR fortran/99124
* gfortran.dg/class_defined_operator_2.f03 : New test.
* gfortran.dg/elemental_result_2.f90 : New test.
* gfortran.dg/class_assign_4.f90: Correct the non-conforming
elemental function with an allocatable result with an operator
interface with array dummies and result.
PR target/85074
* config/pa/pa.c (TARGET_ASM_CAN_OUTPUT_MI_THUNK): Define as
hook_bool_const_tree_hwi_hwi_const_tree_true.
(pa_asm_output_mi_thunk): Add support for nonzero vcall_offset.
Kyrylo Tkachov [Fri, 19 Feb 2021 09:02:58 +0000 (09:02 +0000)]
aarch64: Introduce prefer_advsimd_autovec to GCC 10
This patch introduces the prefer_advsimd_autovec internal tune flag that's already available on the GCC 8 and 9 branches.
It allows a CPU tuning to specify that it prefers Advanced SIMD for autovectorisation rather than SVE.
In GCC 10 onwards this can be easily adjusted through the aarch64_autovec_preference param in the options override hook.
The neoversev1_tunings struct makes use of this tuning flag
Bootstrapped and tested on aarch64-none-linux-gnu.
Confirmed that an --param aarch64-autovec-preference can override the CPU setting if the user really wishes to.
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (prefer_advsimd_autovec): Define.
* config/aarch64/aarch64.c (neoversev1_tunings): Use it.
(aarch64_override_options_internal): Adjust aarch64_autovec_preference
param when prefer_advsimd_autovec is enabled.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/advsimd_autovec_only_1.c: New test.
Patrick Palka [Wed, 17 Feb 2021 14:55:42 +0000 (09:55 -0500)]
c++: Revert EXPR_LOCATION change to build_aggr_init_expr [PR96997]
My change in r10-7718 to make build_aggr_init_expr set EXPR_LOCATION
(mimicking build_target_expr) causes the debuginfo regression PR96997.
Given that this change is mostly independent of the rest of the commit,
and that the only fallout of reverting it is a less accurate error
message location in a testcase introduced in the same commit, it seems
the best way forward is to just revert this part of the commit.
Steve Kargl [Fri, 12 Feb 2021 21:04:14 +0000 (13:04 -0800)]
libgfortran: Fix PR95647 by changing the interfaces of operators .eq. and .ne.
The FE converts the old school .eq. to ==,
and then tracks the ==. The module starts with == and so it does not
properly overload the .eq. Reversing the interfaces fixes this.
2021-02-12 Steve Kargl <sgk@troutmask.apl.washington.edu>
libgfortran/ChangeLog:
PR libfortran/95647
* ieee/ieee_arithmetic.F90: Flip interfaces of operators .eq. to
== and .ne. to /= .
gcc/testsuite/ChangeLog:
PR libfortran/95647
* gfortran.dg/ieee/ieee_12.f90: New test.
Jason Merrill [Mon, 8 Feb 2021 22:04:03 +0000 (17:04 -0500)]
c++: generic lambda, fn* conv, empty class [PR98326]
Here, in the thunk returned from the captureless lambda conversion to
pointer-to-function, we try to pass through invisible reference parameters
by reference, without doing a copy. The empty class copy optimization was
messing that up.
Marek Polacek [Tue, 9 Feb 2021 20:17:48 +0000 (15:17 -0500)]
c++: Endless loop with targ deduction in member tmpl [PR95888]
My r10-7007 patch tweaked tsubst not to reduce the template level of
template parameters when tf_partial. That caused infinite looping in
is_specialization_of: we ended up with a class template specialization
whose TREE_TYPE (CLASSTYPE_TI_TEMPLATE (t)) == t, so the second for
loop in is_specialization_of never finished.
There's a lot going on in this test, but essentially: the template fn
here has two template parameters, we call it with one explicitly
provided, the other one has to be deduced. So we'll find ourselves
in fn_type_unification which uses tf_partial when tsubsting the
*explicit* template arguments into the function type. That leads to
tsubstituting the return type, C<T>. C is a member template; its
most general template is
we figure out (tsubst_template_args) that the template argument list
is <int, int>. They come from different levels, one comes from B<int>,
the other one from fn<int>.
So now we lookup_template_class to see if we have C<int, int>. We
do the
/* This is a full instantiation of a member template. Find
the partial instantiation of which this is an instance. */
TREE_VEC_LENGTH (arglist)--;
// arglist is now <int>, not <int, int>
found = tsubst (gen_tmpl, arglist, complain, NULL_TREE);
TREE_VEC_LENGTH (arglist)++;
magic which is looking for the partial instantiation, in this case,
that would be template<class V> struct B<int>::C. Note we're still
in a tf_partial context! So the tsubst_template_args in the tsubst
(which tries to substitute <int> into <U, V>) returns <int, V>, but
V's template level hasn't been reduced! After tsubst_template_args,
tsubst_template_decl looks to see if we already have this specialization:
// t = template_decl C
// full_args = <int, V>
spec = retrieve_specialization (t, full_args, hash);
but doesn't find the one we created a while ago, when processing
B<int> b; in the test, because V's levels don't match. Whereupon
tsubst_template_decl creates a new TEMPLATE_DECL, one that leads to
the infinite looping problem.
Fixed by using tf_none when looking for an existing partial instantiation.
It also occurred to me that I should be able to trigger a similar
problem with 'auto', since r10-7007 removed an is_auto check. And lo,
I constructed deduce10.C which exhibits the same issue with pre-r10-7007
compilers. This patch fixes that problem as well. I'm ecstatic.
gcc/cp/ChangeLog:
PR c++/95888
* pt.c (lookup_template_class_1): Pass tf_none to tsubst when looking
for the partial instantiation.
gcc/testsuite/ChangeLog:
PR c++/95888
* g++.dg/template/deduce10.C: New test.
* g++.dg/template/deduce9.C: New test.
Eric Botcazou [Thu, 11 Feb 2021 23:16:49 +0000 (00:16 +0100)]
Fix -freorder-blocks-and-partition glitch with Windows SEH
Since GCC 8, the -freorder-blocks-and-partition pass can split a function
into hot and cold parts, thus generating 2 CIEs for a single function in
DWARF for exception purposes and doing an equivalent trick for Windows SEH.
Now the Windows system unwinder is picky when it comes to the boundary
between an active EH region and the end of the function and, therefore,
a nop may need to be added in specific cases.
gcc/
* config/i386/winnt.c (i386_pe_seh_unwind_emit): When switching to
the cold section, emit a nop before the directive if the previous
active instruction can throw.
Paul Thomas [Thu, 11 Feb 2021 13:24:50 +0000 (13:24 +0000)]
Fortran: Fix calls to associate name typebound subroutines [PR98897].
2021-02-11 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/98897
* match.c (gfc_match_call): Include associate names as possible
entities with typebound subroutines. The target needs to be
resolved for the type.
gcc/testsuite/
PR fortran/98897
* gfortran.dg/typebound_call_32.f90: New test.
Eric Botcazou [Tue, 9 Feb 2021 18:49:18 +0000 (19:49 +0100)]
Fix miscompilation of Python on HP-PA/Linux
This is the miscompilation of Python at -O2 on HP-PA/Linux present
on the mainline and 10 branch, caused by the presence of a call to
__builtin_unreachable () in the middle of a heavily branchy code,
which confuses the reorg pass.
gcc/
PR rtl-optimization/96015
* reorg.c (skip_consecutive_labels): Minor comment tweaks.
(relax_delay_slots): When deleting a jump to the next active
instruction over a barrier, first delete the barrier if the
jump is the only way to reach the target label.
Jason Merrill [Thu, 4 Feb 2021 02:56:59 +0000 (21:56 -0500)]
c++: No aggregate CTAD with explicit dguide [PR98802]
In my implementation of P2082R1 I missed this piece: the aggregate deduction
candidate is not generated if the class has user-written deduction guides.
gcc/cp/ChangeLog:
PR c++/98802
* pt.c (deduction_guides_for): Add any_dguides_p parm.
(do_class_deduction): No aggregate guide if any_dguides_p.
gcc/testsuite/ChangeLog:
PR c++/98802
* g++.dg/cpp1z/class-deduction78.C: New test.
Richard Biener [Fri, 29 Jan 2021 15:02:36 +0000 (16:02 +0100)]
rtl-optimization/98863 - tame i386 specific RPAD pass
This removes analyzing DF with expensive problems which we do not
use at all and which somehow cause 5GB of memory to leak. Instead
just do a defered rescan of added insns.
by clearing DF_DEFER_INSN_RESCAN after calling df_process_deferred_rescans,
so that it doesn't leak into following unprepared passes that expect
non-deferred rescans.
2021-02-03 Richard Biener <rguenther@suse.de>
Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/98863
* config/i386/i386-features.c (remove_partial_avx_dependency):
Do not perform DF analysis.
(pass_data_remove_partial_avx_dependency): Remove
TODO_df_finish.
Eric Botcazou [Wed, 3 Feb 2021 10:38:04 +0000 (11:38 +0100)]
Fix regression with partial rep clause on variant record type
It can yield an incorrect layout when there is a partial representation
clause on a discriminated record type with a variant part.
gcc/ada/
* gcc-interface/decl.c (components_to_record): If the first component
with rep clause is the _Parent field with variable size, temporarily
set it aside when computing the internal layout of the REP part again.
* gcc-interface/utils.c (finish_record_type): Revert to taking the
maximum when merging sizes for all record types with rep clause.
(merge_sizes): Put SPECIAL parameter last and adjust recursive calls.
Eric Botcazou [Wed, 3 Feb 2021 10:11:26 +0000 (11:11 +0100)]
Assorted LTO fixes for Ada
This polishes a few rough edges visible in LTO mode.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Array_Type>: Make the
two fields of the fat pointer type addressable, and do not make the
template type read-only.
<E_Record_Type>: If the type has discriminants mark it as may_alias.
* gcc-interface/utils.c (make_dummy_type): Likewise.
(build_dummy_unc_pointer_types): Likewise.
Richard Biener [Fri, 29 Jan 2021 09:23:40 +0000 (10:23 +0100)]
rtl-optimization/98144 - tame REE memory usage
This changes the REE dataflow to change the explicit all-ones
starting solution to be implicit via a visited flag, removing
the need to initially start with fully populated bitmaps for
all basic-blocks. That reduces peak memory use when compiling
the RTL checking enabled insn-extract.c testcase from PR98144
from 6GB to less than 2GB.
2021-01-29 Richard Biener <rguenther@suse.de>
PR rtl-optimization/98144
* df.h (df_mir_bb_info): Add con_visited member.
* df-problems.c (df_mir_alloc): Initialize con_visited,
do not fully populate IN and OUT.
(df_mir_reset): Likewise.
(df_mir_confluence_0): Set con_visited.
(df_mir_confluence_n): Properly handle implicitely
fully populated IN and OUT as designated by con_visited
and update con_visited accordingly.
Patrick Palka [Mon, 1 Feb 2021 15:27:45 +0000 (10:27 -0500)]
c++: Fix ICE from verify_ctor_sanity [PR98295]
In this testcase we're crashing during constexpr evaluation of the
ARRAY_REF b[0] as part of evaluation of the lambda's by-copy capture of b
(which is encoded as a VEC_INIT_EXPR<b>). Since A's constexpr default
constructor is not yet defined, b's initialization is not actually
constant, but because A is an empty type, evaluation of b from
cxx_eval_array_ref is successful and yields an empty CONSTRUCTOR.
And since this CONSTRUCTOR is empty, we {}-initialize the desired array
element, and end up crashing from verify_ctor_sanity during evaluation
of this initializer because we updated new_ctx.ctor without updating
new_ctx.object: the former now has type A[3] and the latter is still the
target of a TARGET_EXPR for b[0][0] created from cxx_eval_vec_init
(and so has type A).
This patch fixes this by setting new_ctx.object appropriately at the
same time that we set new_ctx.ctor from cxx_eval_array_reference.
gcc/cp/ChangeLog:
PR c++/98295
* constexpr.c (cxx_eval_array_reference): Also set
new_ctx.object when setting new_ctx.ctor.
gcc/testsuite/ChangeLog:
PR c++/98295
* g++.dg/cpp0x/constexpr-98295.C: New test.
Marek Polacek [Fri, 29 Jan 2021 16:29:25 +0000 (11:29 -0500)]
c++: Improve sorry for __builtin_has_attribute [PR98355]
__builtin_has_attribute doesn't work in templates yet (bug 92104), so
in r11-471 I added a sorry. But that only caught type-dependent
expressions and we also want to sorry on value-dependent expressions.
This patch uses uses_template_parms, but guarded with p_t_d, because
u_t_p sets p_t_d and then v_d_e_p considers variables with reference
types value-dependent, which breaks builtin-has-attribute-6.c.
This is a regression and I also plan to apply this to gcc-10.
gcc/cp/ChangeLog:
PR c++/98355
* parser.c (cp_parser_has_attribute_expression): Use
uses_template_parms instead of type_dependent_expression_p.
gcc/testsuite/ChangeLog:
PR c++/98355
* g++.dg/ext/builtin-has-attribute2.C: New test.
Patrick Palka [Wed, 5 Aug 2020 19:05:30 +0000 (15:05 -0400)]
c++: cxx_eval_vec_init after zero-initialization [PR96282]
In the first testcase below, expand_aggr_init_1 sets up t's default
constructor such that the ctor first zero-initializes the entire base b,
followed by calling b's default constructor, the latter of which just
default-initializes the array member b::m via a VEC_INIT_EXPR.
So upon constexpr evaluation of this latter VEC_INIT_EXPR, ctx->ctor is
nonempty due to the prior zero-initialization, and we proceed in
cxx_eval_vec_init to append new constructor_elts to the end of ctx->ctor
without first checking if a matching constructor_elt already exists.
This leads to ctx->ctor having two matching constructor_elts for each
index.
This patch fixes this issue by truncating a zero-initialized array
CONSTRUCTOR in cxx_eval_vec_init_1 before we begin appending array
elements to it. We propagate its zeroed out state during evaluation by
clearing CONSTRUCTOR_NO_CLEARING on each new appended aggregate element.
gcc/cp/ChangeLog:
PR c++/96282
* constexpr.c (cxx_eval_vec_init_1): Truncate ctx->ctor and
then clear CONSTRUCTOR_NO_CLEARING on each appended element
initializer if we're initializing a previously zero-initialized
array object.
gcc/testsuite/ChangeLog:
PR c++/96282
* g++.dg/cpp0x/constexpr-array26.C: New test.
* g++.dg/cpp0x/constexpr-array27.C: New test.
* g++.dg/cpp2a/constexpr-init18.C: New test.
PR target/96307
* gcc.dg/pr96307.c: New.
* gcc.target/riscv/pr96260.c: Move this test case from here to ...
* gcc.dg/pr96260.c: ... here.
* gcc.target/riscv/pr91441.c: Move this test case from here to ...
* gcc.dg/pr91441.c: ... here.
* lib/target-supports.exp (check_effective_target_no_fsanitize_address):
New proc.
Jakub Jelinek [Fri, 29 Jan 2021 09:30:09 +0000 (10:30 +0100)]
expand: Fix up find_bb_boundaries [PR98331]
When expansion emits some control flow insns etc. inside of a former GIMPLE
basic block, find_bb_boundaries needs to split it into multiple basic
blocks.
The code needs to ignore debug insns in decisions how many splits to do or
where in between some non-debug insns the split should be done, but it can
decide where to put debug insns if they can be kept and otherwise throws
them away (they can't stay outside of basic blocks).
On the following testcase, we end up in the bb from expander with
control flow insn
debug insns
barrier
some other insn
(the some other insn is effectively dead after __builtin_unreachable and
we'll optimize that out later).
Without debug insns, we'd do the split when encountering some other insn
and split after PREV_INSN (some other insn), i.e. after barrier (and the
splitting code then moves the barrier in between basic blocks).
But if there are debug insns, we actually split before the first debug insn
that appeared after the control flow insn, so after control flow insn,
and get a basic block that starts with debug insns and then has a barrier
in the middle that nothing moves it out of the bb. This leads to ICEs and
even if it wouldn't, different behavior from -g0.
The reason for treating debug insns that way is a different case, e.g.
control flow insn
debug insns
some other insn
or even
control flow insn
barrier
debug insns
some other insn
where splitting before the first such debug insn allows us to keep them
while otherwise we would have to drop them on the floor, and in those
situations we behave the same with -g0 and -g.
So, the following patch fixes it by resetting debug_insn not just when
splitting the blocks (it is set only after seeing a control flow insn and
before splitting for it if needed), but also when seeing a barrier,
which effectively means we always throw away debug insns after a control
flow insn and before following barrier if any, but there is no way around
that, control flow insn must be the last in the bb (BB_END) and BARRIER
after it, debug insns aren't allowed outside of bb.
We still handle the other cases fine (when there is no barrier or when
debug insns appear only after the barrier).
2021-01-29 Jakub Jelinek <jakub@redhat.com>
PR debug/98331
* cfgbuild.c (find_bb_boundaries): Reset debug_insn when seeing
a BARRIER.
Jakub Jelinek [Thu, 28 Jan 2021 15:13:11 +0000 (16:13 +0100)]
c++: Fix up handling of register ... asm ("...") vars in templates [PR33661, PR98847]
As the testcase shows, for vars appearing in templates, we don't attach
the asm spec string to the pattern decls, nor pass it back to cp_finish_decl
during instantiation.
The following patch does that.
2021-01-28 Jakub Jelinek <jakub@redhat.com>
PR c++/33661
PR c++/98847
* decl.c (cp_finish_decl): For register vars with asmspec in templates
call set_user_assembler_name and set DECL_HARD_REGISTER.
* pt.c (tsubst_expr): When instantiating DECL_HARD_REGISTER vars,
pass asmspec_tree to cp_finish_decl.
Jakub Jelinek [Wed, 27 Jan 2021 19:35:21 +0000 (20:35 +0100)]
aarch64: Fix up *aarch64_bfxilsi_uxtw [PR98853]
The https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html
patch that introduced this pattern claimed:
Would generate:
combine_balanced_int:
bfxil w0, w1, 0, 16
uxtw x0, w0
ret
But with this patch generates:
combine_balanced_int:
bfxil w0, w1, 0, 16
ret
and it is indeed what it should generate, but it doesn't do that,
it emits bfxil x0, x1, 0, 16
instead which doesn't zero extend from 32 to 64 bits, but preserves
the bits from the destination register.
2021-01-27 Jakub Jelinek <jakub@redhat.com>
PR target/98853
* config/aarch64/aarch64.md (*aarch64_bfxilsi_uxtw): Use
%w0, %w1 and %2 instead of %0, %1 and %2.
* gcc.c-torture/execute/pr98853-1.c: New test.
* gcc.c-torture/execute/pr98853-2.c: New test.
Jakub Jelinek [Tue, 26 Jan 2021 13:48:26 +0000 (14:48 +0100)]
aarch64: Tighten up checks for ubfix [PR98681]
The testcase in the patch doesn't assemble, because the instruction requires
that the penultimate operand (lsb) range is [0, 32] (or [0, 64]) and the last
operand's range is [1, 32 - lsb] (or [1, 64 - lsb]).
The INTVAL (shft_amnt) < GET_MODE_BITSIZE (mode) will accept the lsb operand
to be in range [MIN, 32] (or [MIN, 64]) and then we invoke UB in the
compiler and sometimes it will make it through.
The patch changes all the INTVAL uses in that function to UINTVAL,
which isn't strictly necessary, but can be done (e.g. after the
UINTVAL (shft_amnt) < GET_MODE_BITSIZE (mode) check we know it is not
negative and thus INTVAL (shft_amnt) and UINTVAL (shft_amnt) then behave the
same. But, I had to add INTVAL (mask) > 0 check in that case, otherwise we
risk (hypothetically) emitting instruction that doesn't assemble.
The problem is with masks that have the MSB bit set, while the instruction
can handle those, e.g.
ubfiz w1, w0, 13, 19
will do
(w0 << 13) & 0xffffe000
in RTL we represent SImode constants with MSB set as negative HOST_WIDE_INT,
so it will actually be HOST_WIDE_INT_C (0xffffffffffffe000), and
the instruction uses %P3 to print the last operand, which calls
asm_fprintf (f, "%u", popcount_hwi (INTVAL (x)))
to print that. But that will not print 19, but 51 instead, will include
there also all the copies of the sign bit.
Not supporting those masks with MSB set isn't a big loss though, they really
shouldn't appear normally, as both GIMPLE and RTL optimizations should
optimize those away (one isn't masking any bits off with such masks, so
just w0 << 13 will do too).
2021-01-26 Jakub Jelinek <jakub@redhat.com>
PR target/98681
* config/aarch64/aarch64.c (aarch64_mask_and_shift_for_ubfiz_p):
Use UINTVAL (shft_amnt) and UINTVAL (mask) instead of INTVAL (shft_amnt)
and INTVAL (mask). Add && INTVAL (mask) > 0 condition.
Jakub Jelinek [Mon, 25 Jan 2021 09:03:40 +0000 (10:03 +0100)]
fold: Fix up strn{case,}cmp folding [PR98771]
As mentioned in the PR, the compiler behaves differently during strncmp
and strncasecmp folding between 32-bit and 64-bit hosts targeting 64-bit
target. I think that is highly undesirable.
The culprit is the host_size_t_cst_p predicate that is used by
fold_const_call, which punts if the target size_t constants don't fit into
host size_t. This patch gets rid of that behavior, instead it punts the
same when it doesn't fit into uhwi.
The predicate was used for strncmp and strncasecmp folding and for bcmp, memcmp and
memchr folding.
The constant is in all cases compared to 0, we can do that whether it fits
into size_t or unsigned HOST_WIDE_INT, then it is used in s2 <= s0 or
s2 <= s1 comparisons where s0 and s1 already have uhwi type and represent
the sizes of the objects.
The important difference is for strn{,case}cmp folding, we pass that s2
value as the last argument to the host functions comparing the c_getstr
results. If s2 fits into size_t, then my patch makes no difference,
but if it is larger, we know the 2 c_getstr objects need to fit into the
host address space, so larger s2 should just act essentially as strcmp
or strcasecmp; as none of those objects can occupy 100% of the address
space, using MIN (SIZE_MAX, s2) achieves that.
2021-01-25 Jakub Jelinek <jakub@redhat.com>
PR testsuite/98771
* fold-const-call.c (host_size_t_cst_p): Renamed to ...
(size_t_cst_p): ... this. Check and store unsigned HOST_WIDE_INT
value rather than host size_t.
(fold_const_call): Change type of s2 from size_t to
unsigned HOST_WIDE_INT. Use size_t_cst_p instead of
host_size_t_cst_p. For strncmp calls, pass MIN (s2, SIZE_MAX)
instead of s2 as last argument.
Jakub Jelinek [Sat, 23 Jan 2021 08:41:58 +0000 (09:41 +0100)]
rs6000: Fix up __m64 typedef in mmintrin.h [PR97301]
The x86 __m64 type is defined as:
/* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components. */
typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
and so matches the comment above it in that reads and stores through
pointers to __m64 can alias anything.
But in the rs6000 headers that is the case only for __m128, but not __m64.
The following patch adds that attribute, which fixes the
FAIL: gcc.target/powerpc/sse-movhps-1.c execution test
FAIL: gcc.target/powerpc/sse-movlps-1.c execution test
regressions that appeared when Honza improved ipa-modref.
Jakub Jelinek [Fri, 22 Jan 2021 18:03:23 +0000 (19:03 +0100)]
c++: Fix up ubsan false positives on references [PR95693]
Alex' 2 years old change to build_zero_init_1 to return NULL pointer with
reference type for references breaks the sanitizers, the assignment of NULL
to a reference typed member is then instrumented before it is overwritten
with a non-NULL address later on.
That change has been done to fix error recovery ICE during
process_init_constructor_record, where we:
if (TYPE_REF_P (fldtype))
{
if (complain & tf_error)
error ("member %qD is uninitialized reference", field);
else
return PICFLAG_ERRONEOUS;
}
a few lines earlier, but then continue and ICE when build_zero_init returns
NULL.
The following patch reverts the build_zero_init_1 change and instead creates
the NULL with reference type constants during the error recovery.
The pr84593.C testcase Alex' change was fixing still works as before.
2021-01-22 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/95693
* init.c (build_zero_init_1): Revert the 2018-03-06 change to
return build_zero_cst for reference types.
* typeck2.c (process_init_constructor_record): Instead call
build_zero_cst here during error recovery instead of build_zero_init.
Jakub Jelinek [Fri, 22 Jan 2021 10:50:18 +0000 (11:50 +0100)]
match.pd: Replace incorrect simplifications into copysign [PR90248]
In the PR Andrew said he has implemented a simplification that has been
added to LLVM, but that actually is not true, what is in there are
X * (X cmp 0.0 ? +-1.0 : -+1.0) simplifications into +-abs(X)
but what has been added into GCC are (X cmp 0.0 ? +-1.0 : -+1.0)
simplifications into copysign(1, +-X) and then
X * copysign (1, +-X) into +-abs (X).
The problem is with the (X cmp 0.0 ? +-1.0 : -+1.0) simplifications,
they don't work correctly when X is zero.
E.g.
(X > 0.0 ? 1.0 : -1.0)
is -1.0 when X is either -0.0 or 0.0, but copysign will make it return
1.0 for 0.0 and -1.0 only for -0.0.
(X >= 0.0 ? 1.0 : -1.0)
is 1.0 when X is either -0.0 or 0.0, but copysign will make it return
still 1.0 for 0.0 and -1.0 for -0.0.
The simplifications were guarded on !HONOR_SIGNED_ZEROS, but as discussed in
the PR, that option doesn't mean that -0.0 will not ever appear as operand
of some operation, it is hard to guarantee that without compiler adding
canonicalizations of -0.0 to 0.0 after most of the operations and thus
making it very slow, but that the user asserts that he doesn't care if the result
of operations will be 0.0 or -0.0. Not to mention that some of the
transformations are incorrect even for positive 0.0.
So, instead of those simplifications this patch recognizes patterns where
those ?: expressions are multiplied by X, directly into +-abs.
That works fine even for 0.0 and -0.0 (as long as we don't care about
whether the result is exactly 0.0 or -0.0 in those cases), because
whether the result of copysign is -1.0 or 1.0 doesn't matter when it is
multiplied by 0.0 or -0.0.
As a follow-up, maybe we should add the simplification mentioned in the PR,
in particular doing copysign by hand through
VIEW_CONVERT_EXPR <int, float_X> < 0 ? -float_constant : float_constant
into copysign (float_constant, float_X). But I think that would need to be
done in phiopt.
Jakub Jelinek [Fri, 22 Jan 2021 10:42:03 +0000 (11:42 +0100)]
on ARRAY_REFs sign-extend offsets only from sizetype's precision [PR98255]
As discussed in the PR, the problem here is that the routines changed in
this patch sign extend the difference of index and low_bound from the
precision of the index, so e.g. when index is unsigned int and contains
value -2U, we treat it as index -2 rather than 0x00000000fffffffeU on 64-bit
arches.
On the other hand, get_inner_reference which is used during expansion, does:
if (! integer_zerop (low_bound))
index = fold_build2 (MINUS_EXPR, TREE_TYPE (index),
index, low_bound);
offset = size_binop (PLUS_EXPR, offset,
size_binop (MULT_EXPR,
fold_convert (sizetype, index),
unit_size));
which effectively requires that either low_bound is constant 0 and then
index in ARRAY_REFs can be arbitrary type which is then sign or zero
extended to sizetype, or low_bound is something else and then index and
low_bound must have compatible types and it is still converted afterwards to
sizetype and from there then a few lines later:
expr.c- if (poly_int_tree_p (offset))
expr.c- {
expr.c: poly_offset_int tem = wi::sext (wi::to_poly_offset (offset),
expr.c- TYPE_PRECISION (sizetype));
The following patch makes those routines match what get_inner_reference is
doing.
2021-01-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/98255
* tree-dfa.c (get_ref_base_and_extent): For ARRAY_REFs, sign
extend index - low_bound from sizetype's precision rather than index
precision.
(get_addr_base_and_unit_offset_1): Likewise.
* tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Likewise.
* gimple-fold.c (fold_const_aggregate_ref_1): Likewise.
Jakub Jelinek [Thu, 21 Jan 2021 16:20:24 +0000 (17:20 +0100)]
c++: Fix up potential_constant_expression_1 FOR/WHILE_STMT handling [PR98672]
The following testcase is rejected even when it is valid.
The problem is that potential_constant_expression_1 doesn't have the
accurate *jump_target tracking cxx_eval_* has, and when the loop has
a condition that isn't guaranteed to be always true, the body isn't walked
at all. That is mostly a correct conservative behavior, except that it
doesn't detect if there are any return statements in the body, which means
the loop might return instead of falling through to the next statement.
We already have code for return stmt discovery in code snippets we don't
try to evaluate for switches, so this patch reuses that for FOR_STMT
and WHILE_STMT bodies.
Note, I haven't touched FOR_EXPR, with statement expressions it could
have return stmts in it too, or it could have break or continue statements
that wouldn't bind to the current loop but to something outer. That
case is clearly mishandled by potential_constant_expression_1 even
when the condition is missing or is always true, and it wouldn't surprise me
if cxx_eval_* didn't handle it right either, so I'm deferring that to
separate PR for later. We'd need proper test coverage for all of that.
> Hmm, IF_STMT probably also needs to check the else clause, if the condition
> isn't a known constant.
You're right, I thought it was ok because it recurses with tf_none, but
if the then branch is potentially constant and only else returns, continues
or breaks, then as the enhanced testcase shows we were mishandling it too.
2021-01-21 Jakub Jelinek <jakub@redhat.com>
PR c++/98672
* constexpr.c (check_for_return_continue_data): Add break_stmt member.
(check_for_return_continue): Also look for BREAK_STMT. Handle
SWITCH_STMT by ignoring break_stmt from its body.
(potential_constant_expression_1) <case FOR_STMT>,
<case WHILE_STMT>: If the condition isn't constant true, check if
the loop body can contain a return stmt.
<case SWITCH_STMT>: Adjust check_for_return_continue_data initializer.
<case IF_STMT>: If recursion with tf_none is successful,
merge *jump_target from the branches - returns with highest priority,
breaks or continues lower. If then branch is potentially constant and
doesn't return, check the else branch if it could return, break or
continue.
Jason Merrill [Fri, 22 Jan 2021 18:17:10 +0000 (13:17 -0500)]
c++: [[no_unique_address]] in empty base [PR98463]
In this testcase, cxx_eval_store_expression got confused trying to build up
CONSTRUCTORs for initializing a subobject because the subobject is a member
of an empty base. In C++14 mode and below we don't build FIELD_DECLs for
empty bases, so the CONSTRUCTOR skipped the empty base, and treated the
member as a member of the derived class, which breaks.
Fixed by recognizing this situation and giving up on trying to build a
CONSTRUCTOR for the inner target at that point; since it doesn't have any
data, we don't need to actually store anything.