Martin Jambor [Mon, 3 Aug 2020 16:13:00 +0000 (18:13 +0200)]
Removal of HSA offloading from gcc and libgomp
This patch removes the generation of HSAIL from the compiler, the HSA
offloading plugin from libgomp and the associated testsuite tests and
infrastructure bits from the respective testsuites.
Apart from removal of the obvious files, I removed bits that I found
by searching for HSA related terms and by re-tracing my steps and
looking at the patches that introduced HSA in the first place. I did
not remove everything these patches brought in, for example:
- the mechanism to pass offload-target specific info from the application to
the offloading plugin - but the same mechanism is also used to
communicate number of teams and the thread limit to all offload targets.
- run_func hook in gomp_device_descr stays too, although now it is
not used. If some future offload target would like the ability to
refuse to offload some functions, it can use it. It is easy to
remove as a follow-up if it is considered clutter, though.
- configure options --with-hsa-runtime=PATH, -with-hsa-runtime-include=PATH
and --with-hsa-runtime-lib=PATH rmeain because GCN uses them too.
- Surprisingly, GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES (a constant
from gomp-constants.h) appears in the source of the amdgcn libgomp
plugin, although I tend to think that code path is not ever used
and this patch certainly removes it from the compiler.
Nevertheless, it seems it has potential value beyond HSAIL and so
I've kept it, it can of course always be easily removed in the
future of GCN folk abandon it too.
- I assume constants OFFLOAD_TARGET_TYPE_HSA and GOMP_DEVICE_HSA
need to stay indefinitely too just so that no future offload
target picks that number.
- I have kept dg-require-effective-target
offload_device_nonshared_as requirement of thests which have it.
It is quite probable I missed some small HSA artifacts but those
should be easy to remove later as we find them.
include/ChangeLog:
2020-07-24 Martin Jambor <mjambor@suse.cz>
* gomp-constants.h (GOMP_VERSION_HSA): Remove.
gcc/ChangeLog:
2020-07-24 Martin Jambor <mjambor@suse.cz>
* hsa-brig-format.h: Moved to brig/brigfrontend.
* hsa-brig.c: Removed.
* hsa-builtins.def: Likewise.
* hsa-common.c: Likewise.
* hsa-common.h: Likewise.
* hsa-dump.c: Likewise.
* hsa-gen.c: Likewise.
* hsa-regalloc.c: Likewise.
* ipa-hsa.c: Likewise.
* omp-grid.c: Likewise.
* omp-grid.h: Likewise.
* Makefile.in (BUILTINS_DEF): Remove hsa-builtins.def.
(OBJS): Remove hsa-common.o, hsa-gen.o, hsa-regalloc.o, hsa-brig.o,
hsa-dump.o, ipa-hsa.c and omp-grid.o.
(GTFILES): Removed hsa-common.c and omp-expand.c.
* builtins.def: Remove processing of hsa-builtins.def.
(DEF_HSA_BUILTIN): Remove.
* common.opt (flag_disable_hsa): Remove.
(-Whsa): Ignore.
* config.in (ENABLE_HSA): Removed.
* configure.ac: Removed handling configuration for hsa offloading.
(ENABLE_HSA): Removed.
* configure: Regenerated.
* doc/install.texi (--enable-offload-targets): Remove hsa from the
example.
(--with-hsa-runtime): Reword to reference any HSA run-time, not
specifically HSA offloading.
* doc/invoke.texi (Option Summary): Remove -Whsa.
(Warning Options): Likewise.
(Optimize Options): Remove hsa-gen-debug-stores.
* doc/passes.texi (Regular IPA passes): Remove section on IPA HSA
pass.
* gimple-low.c (lower_stmt): Remove GIMPLE_OMP_GRID_BODY case.
* gimple-pretty-print.c (dump_gimple_omp_for): Likewise.
(dump_gimple_omp_block): Likewise.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.c (walk_gimple_stmt): Likewise.
* gimple.c (gimple_build_omp_grid_body): Removed function.
(gimple_copy): Remove GIMPLE_OMP_GRID_BODY case.
* gimple.def (GIMPLE_OMP_GRID_BODY): Removed.
* gimple.h (gf_mask): Removed GF_OMP_PARALLEL_GRID_PHONY,
OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY,
GF_OMP_FOR_GRID_INTRA_GROUP, GF_OMP_FOR_GRID_GROUP_ITER and
GF_OMP_TEAMS_GRID_PHONY. Renumbered GF_OMP_FOR_KIND_SIMD and
GF_OMP_TEAMS_HOST.
(gimple_build_omp_grid_body): Removed declaration.
(gimple_has_substatements): Remove GIMPLE_OMP_GRID_BODY case.
(gimple_omp_for_grid_phony): Removed.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_for_grid_intra_group): Likewise.
(gimple_omp_for_grid_intra_group): Likewise.
(gimple_omp_for_grid_group_iter): Likewise.
(gimple_omp_for_set_grid_group_iter): Likewise.
(gimple_omp_parallel_grid_phony): Likewise.
(gimple_omp_parallel_set_grid_phony): Likewise.
(gimple_omp_teams_grid_phony): Likewise.
(gimple_omp_teams_set_grid_phony): Likewise.
(CASE_GIMPLE_OMP): Remove GIMPLE_OMP_GRID_BODY case.
* lto-section-in.c (lto_section_name): Removed hsa.
* lto-streamer.h (lto_section_type): Removed LTO_section_ipa_hsa.
* lto-wrapper.c (compile_images_for_offload_targets): Remove special
handling of hsa.
* omp-expand.c: Do not include hsa-common.h and gt-omp-expand.h.
(parallel_needs_hsa_kernel_p): Removed.
(grid_launch_attributes_trees): Likewise.
(grid_launch_attributes_trees): Likewise.
(grid_create_kernel_launch_attr_types): Likewise.
(grid_insert_store_range_dim): Likewise.
(grid_get_kernel_launch_attributes): Likewise.
(get_target_arguments): Remove code passing HSA grid sizes.
(grid_expand_omp_for_loop): Remove.
(grid_arg_decl_map): Likewise.
(grid_remap_kernel_arg_accesses): Likewise.
(grid_expand_target_grid_body): Likewise.
(expand_omp): Remove call to grid_expand_target_grid_body.
(omp_make_gimple_edges): Remove GIMPLE_OMP_GRID_BODY case.
* omp-general.c: Do not include hsa-common.h.
(omp_maybe_offloaded): Do not check for HSA offloading.
(omp_context_selector_matches): Likewise.
* omp-low.c: Do not include hsa-common.h and omp-grid.h.
(build_outer_var_ref): Remove handling of GIMPLE_OMP_GRID_BODY.
(scan_sharing_clauses): Remove handling of OMP_CLAUSE__GRIDDIM_.
(scan_omp_parallel): Remove handling of the phoney variant.
(check_omp_nesting_restrictions): Remove handling of
GIMPLE_OMP_GRID_BODY and GF_OMP_FOR_KIND_GRID_LOOP.
(scan_omp_1_stmt): Remove handling of GIMPLE_OMP_GRID_BODY.
(lower_omp_for_lastprivate): Remove handling of gridified loops.
(lower_omp_for): Remove phony loop handling.
(lower_omp_taskreg): Remove phony construct handling.
(lower_omp_teams): Likewise.
(lower_omp_grid_body): Removed.
(lower_omp_1): Remove GIMPLE_OMP_GRID_BODY case.
(execute_lower_omp): Do not call omp_grid_gridify_all_targets.
* opts.c (common_handle_option): Do not handle hsa when processing
OPT_foffload_.
* params.opt (hsa-gen-debug-stores): Remove.
* passes.def: Remove pass_ipa_hsa and pass_gen_hsail.
* timevar.def: Remove TV_IPA_HSA.
* toplev.c: Do not include hsa-common.h.
(compile_file): Do not call hsa_output_brig.
* tree-core.h (enum omp_clause_code): Remove OMP_CLAUSE__GRIDDIM_.
(tree_omp_clause): Remove union field dimension.
* tree-nested.c (convert_nonlocal_omp_clauses): Remove the
OMP_CLAUSE__GRIDDIM_ case.
(convert_local_omp_clauses): Likewise.
* tree-pass.h (make_pass_gen_hsail): Remove declaration.
(make_pass_ipa_hsa): Likewise.
* tree-pretty-print.c (dump_omp_clause): Remove GIMPLE_OMP_GRID_BODY
case.
* tree.c (omp_clause_num_ops): Remove the element corresponding to
OMP_CLAUSE__GRIDDIM_.
(omp_clause_code_name): Likewise.
(walk_tree_1): Remove GIMPLE_OMP_GRID_BODY case.
* tree.h (OMP_CLAUSE__GRIDDIM__DIMENSION): Remove.
(OMP_CLAUSE__GRIDDIM__SIZE): Likewise.
(OMP_CLAUSE__GRIDDIM__GROUP): Likewise.
gcc/fortran/ChangeLog:
2020-07-24 Martin Jambor <mjambor@suse.cz>
* f95-lang.c (gfc_init_builtin_functions): Remove processing of
hsa-builtins.def.
gcc/brig/ChangeLog:
2020-07-24 Martin Jambor <mjambor@suse.cz>
* brigfrontend/brig-util.h (hsa_type_packed_p): Declared.
* brigfrontend/brig-util.cc (hsa_type_packed_p): Moved here from
removed gcc/hsa-common.c.
Bu Le [Mon, 3 Aug 2020 15:38:46 +0000 (16:38 +0100)]
aarch64: Add support for unpacked sub [PR96366]
The test case bb-slp-20.c in the gcc testsuit will cause an
ICE in the expand pass due to the lack of a pattern for
subtraction of the VNx2SI mode. This patch solve this problem
by adding support for unpacked sub.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (sub<mode>3): Add support for
unpacked vectors.
MSP430: Don't pass redundant -md option to the assembler
The MSP430 GAS option "-md" is supposed to indicate that the CRT startup
code should copy data from ROM to RAM at startup. However, this option
has no effect; GAS handles the related behaviour automatically.
gcc/ChangeLog:
* config/msp430/msp430.h (ASM_SPEC): Don't pass on "-md" option.
SMS is performed before reload, and each insn in SMS schedule uses
pseudo-register. After reload, regrename pass try to adjust the hard
registers with def/use chain created by build_def_use. For now, regrename
pass isn't aware of VLIW bundles created by SMS, it may updated a register
which may not be really unused, which will causes invalid VLIW bundles.
Before the final schedule, we recheck the validation of VLIW bundles and
reschedule the conflicted insns to avoid the above issue. Rescheduling
the conflicted insns will destroy SMS schedule of the kernel loop, which
would be harmful to performance.
2020-08-03 Yunde Zhong <zhongyunde@huawei.com>
gcc/
PR rtl-optimization/95696
* regrename.c (regrename_analyze): New param include_all_block_p
with default value TRUE. If set to false, avoid disrupting SMS
schedule.
* regrename.h (regrename_analyze): Adjust prototype.
Richard Biener [Mon, 3 Aug 2020 13:05:37 +0000 (15:05 +0200)]
lto/96385 - avoid unused global UNDEFs in debug objects
Unused global UNDEFs can have side-effects in some circumstances so
the following patch avoids them by treating them the same as other
to be discarded DEFs - make them local.
2020-08-03 Richard Biener <rguenther@suse.de>
PR lto/96385
libiberty/
* simple-object-elf.c
(simple_object_elf_copy_lto_debug_sections): Localize global
UNDEFs and reuse the prevailing name.
Qian Jianhua [Mon, 3 Aug 2020 13:01:40 +0000 (14:01 +0100)]
aarch64: Add A64FX machine model
This patch add support for Fujitsu A64FX, as the first step of adding
A64FX machine model.
A64FX is used in FUJITSU Supercomputer PRIMEHPC FX1000,
PRIMEHPC FX700, and supercomputer Fugaku.
The official microarchitecture information of A64FX can be read at
https://github.com/fujitsu/A64FX.
2020-08-03 Qian jianhua <qianjh@cn.fujitsu.com>
gcc/
* config/aarch64/aarch64-cores.def (a64fx): New core.
* config/aarch64/aarch64-tune.md: Regenerated.
* config/aarch64/aarch64.c (a64fx_prefetch_tune, a64fx_tunings): New.
* doc/invoke.texi: Add a64fx to the list.
Roger Sayle [Mon, 3 Aug 2020 12:15:58 +0000 (13:15 +0100)]
PR rtl-optimization 61494: Preserve x-0.0 with HONOR_SNANS.
The following patch avoids simplifying x-0.0 to x when -fsignaling-nans
is specified, which resolves PR rtl-optimization 61494. Indeed, running
the test program attached to that PR now reports no failures.
2020-08-02 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/61494
* simplify-rtx.c (simplify_binary_operation_1) [MINUS]: Don't
simplify x - 0.0 with -fsignaling-nans.
Roger Sayle [Mon, 3 Aug 2020 12:10:45 +0000 (13:10 +0100)]
genmatch: Avoid unused parameter warnings in generated code.
This patch silences a number of unused parameter warnings whilst
compiling both generic-match.c and gimple-match.c. The problem is
that multiple (polymorphic) functions are generated for generic_simplify
and gimple_simplify, each handling tree codes with a specific number
of children. Currently, there are no simplifications for tree codes
with four or five children, leading to functions with "empty" bodies
and unused function arguments. This patch detects those cases, and
generates stub functions (with anonymous arguments) to silence these
warnings.
2020-08-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* genmatch.c (decision_tree::gen): Emit stub functions for
tree code operand counts that have no simplifications.
(main): Correct comment typo.
Jonathan Wakely [Mon, 3 Aug 2020 09:38:44 +0000 (10:38 +0100)]
tree-optimization: Fix typos in comments
The only two changes which aren't obvious are s/dirified/specified/ and
s/edirially/especially/ which appear to be caused by a s/spec/dir/ edit
that went too far.
Tamar Christina [Mon, 3 Aug 2020 11:03:17 +0000 (12:03 +0100)]
AArch64: Fix hwasan failure in readline.
My previous fix added an unchecked call to fgets in the new function readline.
fgets can fail when there's an error reading the file in which case it returns
NULL. It also returns NULL when the next character is EOF.
The EOF case is already covered by the existing code but the error case isn't.
This fixes it by returning the empty string on error.
Also I now use strnlen instead of strlen to make sure we never read outside the
buffer.
This was flagged by Matthew Malcomson during his hwasan work.
gcc/ChangeLog:
* config/aarch64/driver-aarch64.c (readline): Check return value fgets.
Richard Biener [Mon, 3 Aug 2020 08:30:49 +0000 (10:30 +0200)]
mark match.pd ! not implemented on GENERIC
This makes us error when the ! operator modifier is encountered
when not targeting GIMPLE.
2020-08-03 Richard Biener <rguenther@suse.de>
* genmatch.c (parser::gimple): New.
(parser::parser): Initialize gimple flag member.
(parser::parse_expr): Error on ! operator modifier when
not targeting GIMPLE.
(main): Pass down gimple flag to parser ctor.
Moves no frame access error to own function, adding use of it for both
when get_framedecl() cannot find a path to the outer function frame, and
guarding get_decl_tree() from recursively calling itself.
gcc/d/ChangeLog:
PR d/96254
* d-codegen.cc (error_no_frame_access): New.
(get_frame_for_symbol): Use fdparent name in error message.
(get_framedecl): Replace call to assert with error.
* d-tree.h (error_no_frame_access): Declare.
* decl.cc (get_decl_tree): Detect recursion and error.
gcc/testsuite/ChangeLog:
PR d/96254
* gdc.dg/pr96254a.d: New test.
* gdc.dg/pr96254b.d: New test.
Multi-range implementation for value_range (irange).
Implement class irange, a generic multi-range implementation for
value ranges. This class is API compatible with value_range, and is meant
to seamlessly coexist with it.
gcc/ChangeLog:
* Makefile.in (GTFILES): Move value-range.h up.
* gengtype-lex.l: Set yylval to handle GTY markers on templates.
* ipa-cp.c (initialize_node_lattices): Call value_range
constructor.
(ipcp_propagate_stage): Use in-place new so value_range construct
is called.
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Use std
vec instead of GCC's vec<>.
(evaluate_properties_for_edge): Adjust for std vec.
(ipa_fn_summary_t::duplicate): Same.
(estimate_ipcp_clone_size_and_time): Same.
* ipa-prop.c (ipa_get_value_range): Use in-place new for
value_range.
* ipa-prop.h (struct GTY): Remove class keyword for m_vr.
* range-op.cc (empty_range_check): Rename to...
(empty_range_varying): ...this and adjust for varying.
(undefined_shift_range_check): Adjust for irange.
(range_operator::wi_fold): Same.
(range_operator::fold_range): Adjust for irange. Special case
single pairs for performance.
(range_operator::op1_range): Adjust for irange.
(range_operator::op2_range): Same.
(value_range_from_overflowed_bounds): Same.
(value_range_with_overflow): Same.
(create_possibly_reversed_range): Same.
(range_true): Same.
(range_false): Same.
(range_true_and_false): Same.
(get_bool_state): Adjust for irange and tweak for performance.
(operator_equal::fold_range): Adjust for irange.
(operator_equal::op1_range): Same.
(operator_equal::op2_range): Same.
(operator_not_equal::fold_range): Same.
(operator_not_equal::op1_range): Same.
(operator_not_equal::op2_range): Same.
(build_lt): Same.
(build_le): Same.
(build_gt): Same.
(build_ge): Same.
(operator_lt::fold_range): Same.
(operator_lt::op1_range): Same.
(operator_lt::op2_range): Same.
(operator_le::fold_range): Same.
(operator_le::op1_range): Same.
(operator_le::op2_range): Same.
(operator_gt::fold_range): Same.
(operator_gt::op1_range): Same.
(operator_gt::op2_range): Same.
(operator_ge::fold_range): Same.
(operator_ge::op1_range): Same.
(operator_ge::op2_range): Same.
(operator_plus::wi_fold): Same.
(operator_plus::op1_range): Same.
(operator_plus::op2_range): Same.
(operator_minus::wi_fold): Same.
(operator_minus::op1_range): Same.
(operator_minus::op2_range): Same.
(operator_min::wi_fold): Same.
(operator_max::wi_fold): Same.
(cross_product_operator::wi_cross_product): Same.
(operator_mult::op1_range): New.
(operator_mult::op2_range): New.
(operator_mult::wi_fold): Adjust for irange.
(operator_div::wi_fold): Same.
(operator_exact_divide::op1_range): Same.
(operator_lshift::fold_range): Same.
(operator_lshift::wi_fold): Same.
(operator_lshift::op1_range): New.
(operator_rshift::op1_range): New.
(operator_rshift::fold_range): Adjust for irange.
(operator_rshift::wi_fold): Same.
(operator_cast::truncating_cast_p): Abstract out from
operator_cast::fold_range.
(operator_cast::fold_range): Adjust for irange and tweak for
performance.
(operator_cast::inside_domain_p): Abstract out from fold_range.
(operator_cast::fold_pair): Same.
(operator_cast::op1_range): Use abstracted methods above. Adjust
for irange and tweak for performance.
(operator_logical_and::fold_range): Adjust for irange.
(operator_logical_and::op1_range): Same.
(operator_logical_and::op2_range): Same.
(unsigned_singleton_p): New.
(operator_bitwise_and::remove_impossible_ranges): New.
(operator_bitwise_and::fold_range): New.
(wi_optimize_and_or): Adjust for irange.
(operator_bitwise_and::wi_fold): Same.
(set_nonzero_range_from_mask): New.
(operator_bitwise_and::simple_op1_range_solver): New.
(operator_bitwise_and::op1_range): Adjust for irange.
(operator_bitwise_and::op2_range): Same.
(operator_logical_or::fold_range): Same.
(operator_logical_or::op1_range): Same.
(operator_logical_or::op2_range): Same.
(operator_bitwise_or::wi_fold): Same.
(operator_bitwise_or::op1_range): Same.
(operator_bitwise_or::op2_range): Same.
(operator_bitwise_xor::wi_fold): Same.
(operator_bitwise_xor::op1_range): New.
(operator_bitwise_xor::op2_range): New.
(operator_trunc_mod::wi_fold): Adjust for irange.
(operator_logical_not::fold_range): Same.
(operator_logical_not::op1_range): Same.
(operator_bitwise_not::fold_range): Same.
(operator_bitwise_not::op1_range): Same.
(operator_cst::fold_range): Same.
(operator_identity::fold_range): Same.
(operator_identity::op1_range): Same.
(class operator_unknown): New.
(operator_unknown::fold_range): New.
(class operator_abs): Adjust for irange.
(operator_abs::wi_fold): Same.
(operator_abs::op1_range): Same.
(operator_absu::wi_fold): Same.
(class operator_negate): Same.
(operator_negate::fold_range): Same.
(operator_negate::op1_range): Same.
(operator_addr_expr::fold_range): Same.
(operator_addr_expr::op1_range): Same.
(pointer_plus_operator::wi_fold): Same.
(pointer_min_max_operator::wi_fold): Same.
(pointer_and_operator::wi_fold): Same.
(pointer_or_operator::op1_range): New.
(pointer_or_operator::op2_range): New.
(pointer_or_operator::wi_fold): Adjust for irange.
(integral_table::integral_table): Add entries for IMAGPART_EXPR
and POINTER_DIFF_EXPR.
(range_cast): Adjust for irange.
(build_range3): New.
(range3_tests): New.
(widest_irange_tests): New.
(multi_precision_range_tests): New.
(operator_tests): New.
(range_tests): New.
* range-op.h (class range_operator): Adjust for irange.
(range_cast): Same.
* tree-vrp.c (range_fold_binary_symbolics_p): Adjust for irange and
tweak for performance.
(range_fold_binary_expr): Same.
(masked_increment): Change to extern.
* tree-vrp.h (masked_increment): New.
* tree.c (cache_wide_int_in_type_cache): New function abstracted
out from wide_int_to_tree_1.
(wide_int_to_tree_1): Cache 0, 1, and MAX for pointers.
* value-range-equiv.cc (value_range_equiv::deep_copy): Use kind
method.
(value_range_equiv::move): Same.
(value_range_equiv::check): Adjust for irange.
(value_range_equiv::intersect): Same.
(value_range_equiv::union_): Same.
(value_range_equiv::dump): Same.
* value-range.cc (irange::operator=): Same.
(irange::maybe_anti_range): New.
(irange::copy_legacy_range): New.
(irange::set_undefined): Adjust for irange.
(irange::swap_out_of_order_endpoints): Abstract out from set().
(irange::set_varying): Adjust for irange.
(irange::irange_set): New.
(irange::irange_set_anti_range): New.
(irange::set): Adjust for irange.
(value_range::set_nonzero): Move to header file.
(value_range::set_zero): Move to header file.
(value_range::check): Rename to...
(irange::verify_range): ...this.
(value_range::num_pairs): Rename to...
(irange::legacy_num_pairs): ...this, and adjust for irange.
(value_range::lower_bound): Rename to...
(irange::legacy_lower_bound): ...this, and adjust for irange.
(value_range::upper_bound): Rename to...
(irange::legacy_upper_bound): ...this, and adjust for irange.
(value_range::equal_p): Rename to...
(irange::legacy_equal_p): ...this.
(value_range::operator==): Move to header file.
(irange::equal_p): New.
(irange::symbolic_p): Adjust for irange.
(irange::constant_p): Same.
(irange::singleton_p): Same.
(irange::value_inside_range): Same.
(irange::may_contain_p): Same.
(irange::contains_p): Same.
(irange::normalize_addresses): Same.
(irange::normalize_symbolics): Same.
(irange::legacy_intersect): Same.
(irange::legacy_union): Same.
(irange::union_): Same.
(irange::intersect): Same.
(irange::irange_union): New.
(irange::irange_intersect): New.
(subtract_one): New.
(irange::invert): Adjust for irange.
(dump_bound_with_infinite_markers): New.
(irange::dump): Adjust for irange.
(debug): Add irange versions.
(range_has_numeric_bounds_p): Adjust for irange.
(vrp_val_max): Move to header file.
(vrp_val_min): Move to header file.
(DEFINE_INT_RANGE_GC_STUBS): New.
(DEFINE_INT_RANGE_INSTANCE): New.
* value-range.h (class irange): New.
(class int_range): New.
(class value_range): Rename to a instantiation of int_range.
(irange::legacy_mode_p): New.
(value_range::value_range): Remove.
(irange::kind): New.
(irange::num_pairs): Adjust for irange.
(irange::type): Adjust for irange.
(irange::tree_lower_bound): New.
(irange::tree_upper_bound): New.
(irange::type): Adjust for irange.
(irange::min): Same.
(irange::max): Same.
(irange::varying_p): Same.
(irange::undefined_p): Same.
(irange::zero_p): Same.
(irange::nonzero_p): Same.
(irange::supports_type_p): Same.
(range_includes_zero_p): Same.
(gt_ggc_mx): New.
(gt_pch_nx): New.
(irange::irange): New.
(int_range::int_range): New.
(int_range::operator=): New.
(irange::set): Moved from value-range.cc and adjusted for irange.
(irange::set_undefined): Same.
(irange::set_varying): Same.
(irange::operator==): Same.
(irange::lower_bound): Same.
(irange::upper_bound): Same.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
(irange::normalize_min_max): New.
(vrp_val_max): Move from value-range.cc.
(vrp_val_min): Same.
* vr-values.c (vr_values::get_lattice_entry): Call value_range
constructor.
to more complex partial bit population algorithm. Due to presence
of uninitialized bits gcc started injecting extra debug entries
in seemigly arbitrary locations and started failing stage2/stage3
bootstrap comparison.
valgrind detected unilitialized bits as:
Conditional jump or move depends on uninitialised value(s)
at 0xDBED3B: vt_find_locations() (var-tracking.c:7230)
by 0xDBF2FB: variable_tracking_main_1() (var-tracking.c:10519)
...
Uninitialised value was created by a heap allocation
at 0x483779F: malloc (vg_replace_malloc.c:307)
by 0x14EE80B: xmalloc (xmalloc.c:147)
by 0x14911F9: sbitmap_alloc(unsigned int) (sbitmap.c:51)
...
The fix explicitly initializes 'in_pending' bitmap with zeros.
2020-08-02 Sergei Trofimovich <siarheit@google.com>
gcc/
PR bootstrap/96404
* var-tracking.c (vt_find_locations): Fully initialize
all 'in_pending' bits.
Paul Thomas [Sun, 2 Aug 2020 09:57:59 +0000 (10:57 +0100)]
This patch fixes PR96320. See the explanatory comment in the testcase.
2020-08-01 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR target/96320
* interface.c (gfc_check_dummy_characteristics): If a module
procedure arrives with assumed shape in the interface and
deferred shape in the procedure itself, update the latter and
copy the lower bounds.
gcc/testsuite/
PR target/96320
* gfortran.dg/module_procedure_4.f90 : New test.
Paul Thomas [Sun, 2 Aug 2020 09:35:36 +0000 (10:35 +0100)]
This patch fixes PR96325. See the explanatory comment in the testcase.
2020-08-02 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/96325
* primary.c (gfc_match_varspec): In the case that a component
reference is added to an intrinsic type component, emit the
error message in this function.
One of the problems in this PR was that if we had:
vector_type1 array[] = { vector_value1 };
process_init_element would only treat vector_value1 as initialising
a vector_type1 if they had the same TYPE_MAIN_VARIANT. This has
several problems:
(1) It gives confusing error messages if the vector types are
incompatible. (Tested by gcc.dg/pr96377-1.c.)
(2) It means that we reject code that should be valid with
-flax-vector-conversions. (Tested by gcc.dg/pr96377-2.c.)
(3) On arm and aarch64 targets, it means that we reject some
initializers that mix Advanced SIMD and standard GNU vectors.
These vectors have traditionally had different TYPE_MAIN_VARIANTs
because they have different mangling schemes. (Tested by
gcc.dg/pr96377-[3-6].c.)
(4) It means that we reject SVE initializers that should be valid.
(Tested by gcc.target/aarch64/sve/gnu_vectors_[34].c.)
because applying the binary operator to arm_neon_value1 strips
the "Advanced SIMD type" attributes that were added in that patch.
Stripping the attributes is problematic for other reasons though,
so that still needs to be fixed separately.
gcc/c/
PR c/96377
* c-typeck.c (process_init_element): Split test for whether to
recurse into a record, union or array into...
(initialize_elementwise_p): ...this new function. Don't recurse
into a vector type if the initialization value is also a vector.
This test fails for mmix for (almost) the same reason it would fail
for e.g. mipsel-elf: the end-condition of the loop tests against a
register set to a constant, and that register is (one of) the
"unexpected IV" moved out of the loop "without introducing a new
temporary register" and making the dump contain more than one
"Decided", causing a non-matching loop2 dump. The test should
probably have been restricted to just the original target for which a
problem was observed to be fixed.
RISC-V/libgcc: Reduce the size of RV64 millicode by 6 bytes
Rewrite code sequences throughout the 64-bit RISC-V `__riscv_save_*'
routines replacing `li t1, -48', `li t1, -64', and `li t1, -80',
instructions, which do not have a compressed encoding, respectively with
`li t1, 3', `li t1, 4', and `li t1, 4', which do, and then adjusting the
remaining code accordingly observing that `sub sp, sp, t1' takes the
same amount of space as an `slli t1, t1, 4'/`add sp, sp, t1' instruction
pair does, again due to the use of compressed encodings, saving 6 bytes
total.
This change does increase code size by 4 bytes for RISC-V processors
lacking the compressed instruction set, however their users couldn't
care about the code size or they would have chosen an implementation
that does have the compressed instructions, wouldn't they?
libgcc/
* config/riscv/save-restore.S [__riscv_xlen == 64]
(__riscv_save_10, __riscv_save_8, __riscv_save_6, __riscv_save_4)
(__riscv_save_2): Replace negative immediates used for the final
stack pointer adjustment with positive ones, right-shifted by 4.
François Dumont [Tue, 21 Jan 2020 06:18:08 +0000 (07:18 +0100)]
libstdc++: Fix and improve std::vector<bool> implementation.
Do not consider allocator noexcept qualification for vector<bool> move
constructor.
Improve swap performance using TBAA like in main vector implementation. Bypass
_M_initialize_dispatch/_M_assign_dispatch in post-c++11 modes.
libstdc++-v3/ChangeLog:
* include/bits/stl_bvector.h
[_GLIBCXX_INLINE_VERSION](_Bvector_impl_data::_M_start): Define as
_Bit_type*.
(_Bvector_impl_data(const _Bvector_impl_data&)): Default.
(_Bvector_impl_data(_Bvector_impl_data&&)): Delegate to latter.
(_Bvector_impl_data::operator=(const _Bvector_impl_data&)): Default.
(_Bvector_impl_data::_M_move_data(_Bvector_impl_data&&)): Use latter.
(_Bvector_impl_data::_M_reset()): Likewise.
(_Bvector_impl_data::_M_swap_data): New.
(_Bvector_impl::_Bvector_impl(_Bvector_impl&&)): Implement explicitely.
(_Bvector_impl::_Bvector_impl(_Bit_alloc_type&&, _Bvector_impl&&)): New.
(_Bvector_base::_Bvector_base(_Bvector_base&&, const allocator_type&)):
New, use latter.
(vector::vector(vector&&, const allocator_type&, true_type)): New, use
latter.
(vector::vector(vector&&, const allocator_type&, false_type)): New.
(vector::vector(vector&&, const allocator_type&)): Use latters.
(vector::vector(const vector&, const allocator_type&)): Adapt.
[__cplusplus >= 201103](vector::vector(_InputIt, _InputIt,
const allocator_type&)): Use _M_initialize_range.
(vector::operator[](size_type)): Use iterator operator[].
(vector::operator[](size_type) const): Use const_iterator operator[].
(vector::swap(vector&)): Add assertions on allocators. Use _M_swap_data.
[__cplusplus >= 201103](vector::insert(const_iterator, _InputIt,
_InputIt)): Use _M_insert_range.
(vector::_M_initialize(size_type)): Adapt.
[__cplusplus >= 201103](vector::_M_initialize_dispatch): Remove.
[__cplusplus >= 201103](vector::_M_insert_dispatch): Remove.
* python/libstdcxx/v6/printers.py (StdVectorPrinter._iterator): Stop
using start _M_offset.
(StdVectorPrinter.to_string): Likewise.
* testsuite/23_containers/vector/bool/allocator/swap.cc: Adapt.
* testsuite/23_containers/vector/bool/cons/noexcept_move_construct.cc:
Add check.
Jakub Jelinek [Fri, 31 Jul 2020 21:08:00 +0000 (23:08 +0200)]
c++: Use error_at rather than warning_at for missing return in constexpr functions [PR96182]
For C++11 we already emit an error if a constexpr function doesn't contain
a return statement, because in C++11 that is the only thing it needs to
contain, but for C++14 we would normally issue a -Wreturn-type warning.
As mentioned by Jonathan, such constexpr functions are invalid, no
diagnostics required, because there doesn't exist any arguments for
which it would result in valid constant expression.
This raises it to an error in such cases. The !LAMBDA_TYPE_P case
is to avoid error on g++.dg/pr81194.C where the user didn't write
constexpr anywhere and the operator() is compiler generated.
2020-07-31 Jakub Jelinek <jakub@redhat.com>
PR c++/96182
* decl.c (finish_function): In constexpr functions use for C++14 and
later error instead of warning if no return statement is present and
diagnose it regardless of warn_return_type. Move the warn_return_type
diagnostics earlier in the function.
Jonathan Wakely [Fri, 31 Jul 2020 18:58:03 +0000 (19:58 +0100)]
libstdc++: Avoid using __float128 in strict modes
libstdc++-v3/ChangeLog:
* testsuite/26_numerics/numbers/float128.cc: Check
__STRICT_ANSI__ before using __float128.
* testsuite/std/concepts/concepts.lang/concept.arithmetic/floating_point.cc:
Likewise.
Jonathan Wakely [Fri, 31 Jul 2020 18:58:02 +0000 (19:58 +0100)]
libstdc++: Add dg-require-effective-target to std::span assert tests
The current dg directives say that the tests can run for any standard
mode, but should fail for C++20. What we want is that they only run for
C++20, and are always expected to fail.
Jonathan Wakely [Fri, 31 Jul 2020 18:58:02 +0000 (19:58 +0100)]
libstdc++: Use c++NN_only effective target to tests
Some tests really are only intended for a specific -std mode, so add a
target selector to make that explicit.
Also reorder the dg-do directives to come after the dg-options ones, so
that the target selector in the dg-do directive is applied after the
dg-options that sets the -std option.
libstdc++-v3/ChangeLog:
* testsuite/20_util/reference_wrapper/83427.cc: Adjust
effective-target to specific language mode only.
* testsuite/24_iterators/headers/iterator/range_access_c++11.cc:
Likewise.
* testsuite/24_iterators/headers/iterator/range_access_c++14.cc:
Likewise.
* testsuite/24_iterators/headers/iterator/synopsis_c++11.cc:
Likewise.
* testsuite/24_iterators/headers/iterator/synopsis_c++14.cc:
Likewise.
* testsuite/26_numerics/valarray/69116.cc:
Likewise.
* testsuite/30_threads/headers/condition_variable/std_c++0x_neg.cc:
Remove whitespace at end of file.
* testsuite/30_threads/headers/future/std_c++0x_neg.cc:
Likewise.
Roger Sayle [Thu, 30 Jul 2020 09:42:06 +0000 (11:42 +0200)]
nvptx: Define TARGET_TRULY_NOOP_TRUNCATION to false
Many thanks to Richard Biener for approving the midde-end
patch that cleared the way for this one. This nvptx patch
defines the target hook TARGET_TRULY_NOOP_TRUNCATION to
false, indicating that integer truncations require explicit
instructions. nvptx.c already defines TARGET_MODES_TIEABLE_P
and TARGET_CAN_CHANGE_MODE_CLASS to false, and as (previously)
documented that may require TARGET_TRULY_NOOP_TRUNCATION to
be defined likewise.
This patch decreases the number of unexpected failures in
the testsuite by 10, and increases the number of expected
passes by 4, including these previous FAILs/ICEs:
gcc.c-torture/compile/opout.c
gcc.dg/torture/pr79125.c
gcc.dg/tree-ssa/pr92085-1.c
Unfortunately there is one testsuite failure that used to
pass gcc.target/nvptx/v2si-cvt.c, but this isn't an ICE or
incorrect code. This regression has been filed as PR96403,
and the failing scan-assembler directives have been replaced
by a reference to the PR.
This patch has been tested on nvptx-none hosted on
x86_64-pc-linux-gnu with "make" and "make check" with
fewer ICEs and no wrong code regressions.
2020-07-31 Roger Sayle <roger@nextmovesoftware.com>
Tom de Vries <tdevries@suse.de>
Jonathan Wakely [Fri, 31 Jul 2020 16:51:00 +0000 (17:51 +0100)]
libstdc++: Adjust tests that give different results in C++20
libstdc++-v3/ChangeLog:
* testsuite/20_util/is_aggregate/value.cc: Adjust for changes to
definition of aggregates in C++20.
* testsuite/20_util/optional/requirements.cc: Adjust for
defaulted comparisons in C++20.
Jonathan Wakely [Fri, 31 Jul 2020 16:51:00 +0000 (17:51 +0100)]
libstdc++: Add -Wno-deprecated for tests that warn in C++20
libstdc++-v3/ChangeLog:
* testsuite/20_util/tuple/78939.cc: Suppress warnings about
deprecation of volatile-qualified structured bindings in C++20.
* testsuite/20_util/variable_templates_for_traits.cc: Likewise
for deprecation of is_pod in C++20
d: Split up the grouped compilable and runnable tests.
The majority of tests in runnable are really compilable/ICE tests, and
have have dg-do adjusted where necessary. Tests that had a dependency
on Phobos have also been reproduced and reduced with all imports
stripped from the test.
The end result is a collection of tests that only check the compiler bug
that was being fixed, rather than the library, and a reduction in time
spent running all tests.
gcc/testsuite/ChangeLog:
* gdc.dg/compilable.d: Removed.
* gdc.dg/gdc108.d: New test.
* gdc.dg/gdc115.d: New test.
* gdc.dg/gdc121.d: New test.
* gdc.dg/gdc122.d: New test.
* gdc.dg/gdc127.d: New test.
* gdc.dg/gdc131.d: New test.
* gdc.dg/gdc133.d: New test.
* gdc.dg/gdc141.d: New test.
* gdc.dg/gdc142.d: New test.
* gdc.dg/gdc15.d: New test.
* gdc.dg/gdc17.d: New test.
* gdc.dg/gdc170.d: New test.
* gdc.dg/gdc171.d: New test.
* gdc.dg/gdc179.d: New test.
* gdc.dg/gdc183.d: New test.
* gdc.dg/gdc186.d: New test.
* gdc.dg/gdc187.d: New test.
* gdc.dg/gdc19.d: New test.
* gdc.dg/gdc191.d: New test.
* gdc.dg/gdc194.d: New test.
* gdc.dg/gdc196.d: New test.
* gdc.dg/gdc198.d: New test.
* gdc.dg/gdc200.d: New test.
* gdc.dg/gdc204.d: New test.
* gdc.dg/gdc210.d: New test.
* gdc.dg/gdc212.d: New test.
* gdc.dg/gdc213.d: New test.
* gdc.dg/gdc218.d: New test.
* gdc.dg/gdc223.d: New test.
* gdc.dg/gdc231.d: New test.
* gdc.dg/gdc239.d: New test.
* gdc.dg/gdc24.d: New test.
* gdc.dg/gdc240.d: New test.
* gdc.dg/gdc241.d: New test.
* gdc.dg/gdc242a.d: New test.
* gdc.dg/gdc242b.d: New test.
* gdc.dg/gdc248.d: New test.
* gdc.dg/gdc250.d: New test.
* gdc.dg/gdc251.d: New test.
* gdc.dg/gdc253a.d: New test.
* gdc.dg/gdc253b.d: New test.
* gdc.dg/gdc255.d: New test.
* gdc.dg/gdc256.d: New test.
* gdc.dg/gdc261.d: New test.
* gdc.dg/gdc27.d: New test.
* gdc.dg/gdc273.d: New test.
* gdc.dg/gdc280.d: New test.
* gdc.dg/gdc284.d: New test.
* gdc.dg/gdc285.d: New test.
* gdc.dg/gdc286.d: New test.
* gdc.dg/gdc300.d: New test.
* gdc.dg/gdc309.d: New test.
* gdc.dg/gdc31.d: New test.
* gdc.dg/gdc35.d: New test.
* gdc.dg/gdc36.d: New test.
* gdc.dg/gdc37.d: New test.
* gdc.dg/gdc4.d: New test.
* gdc.dg/gdc43.d: New test.
* gdc.dg/gdc47.d: New test.
* gdc.dg/gdc51.d: New test.
* gdc.dg/gdc57.d: New test.
* gdc.dg/gdc66.d: New test.
* gdc.dg/gdc67.d: New test.
* gdc.dg/gdc71.d: New test.
* gdc.dg/gdc77.d: New test.
* gdc.dg/imports/gdc239.d: Remove phobos dependency.
* gdc.dg/imports/gdc241a.d: Updated imports.
* gdc.dg/imports/gdc241b.d: Likewise.
* gdc.dg/imports/gdc251a.d: Likewise.
* gdc.dg/imports/gdc253.d: Rename to...
* gdc.dg/imports/gdc253a.d: ...this.
* gdc.dg/imports/gdc253b.d: New.
* gdc.dg/imports/gdc36.d: New.
* gdc.dg/imports/runnable.d: Removed.
* gdc.dg/link.d: Removed.
* gdc.dg/runnable.d: Removed.
* gdc.dg/runnable2.d: Removed.
* gdc.dg/simd.d: Remove phobos dependency.
For 32-bit btr(), BIT_NOT_EXPR was being generated twice, something that
was not seen with the 64-bit variant. Removed the second call to fix
the generated code.
Richard Biener [Thu, 30 Jul 2020 09:46:43 +0000 (11:46 +0200)]
debug/96383 - emit debug info for used external functions
This makes sure to emit full declaration DIEs including
formal parameters for used external functions. This helps
debugging when debug information of the external entity is
not available and also helps external tools cross-checking
ABI compatibility which was the bug reporters use case.
For cc1 this affects debug information size as follows:
and DWARF compression via DWZ can only shave off minor bits
here.
Previously we emitted no DIEs for external functions at all
unless they were referenced via DW_TAG_GNU_call_site which
for some GCC revs caused a regular DIE to appear and since
GCC 4.9 only a stub without formal parameters. This means
at -O0 we did not emit any DIE for external functions
but with optimization we emitted stubs.
2020-07-30 Richard Biener <rguenther@suse.de>
PR debug/96383
* langhooks-def.h (lhd_finalize_early_debug): Declare.
(LANG_HOOKS_FINALIZE_EARLY_DEBUG): Define.
(LANG_HOOKS_INITIALIZER): Amend.
* langhooks.c: Include cgraph.h and debug.h.
(lhd_finalize_early_debug): Default implementation from
former code in finalize_compilation_unit.
* langhooks.h (lang_hooks::finalize_early_debug): Add.
* cgraphunit.c (symbol_table::finalize_compilation_unit):
Call the finalize_early_debug langhook.
gcc/c-family/
* c-common.h (c_common_finalize_early_debug): Declare.
* c-common.c: Include debug.h.
(c_common_finalize_early_debug): finalize_early_debug langhook
implementation generating debug for extern declarations.
gcc/c/
* c-objc-common.h (LANG_HOOKS_FINALIZE_EARLY_DEBUG):
Define to c_common_finalize_early_debug.
gcc/cp/
* cp-objcp-common.h (LANG_HOOKS_FINALIZE_EARLY_DEBUG):
Define to c_common_finalize_early_debug.
gcc/testsuite/
* gcc.dg/debug/dwarf2/pr96383-1.c: New testcase.
* gcc.dg/debug/dwarf2/pr96383-2.c: Likewise.
libstdc++-v3/
* testsuite/20_util/assume_aligned/3.cc: Use -g0.
to make the simplification only apply in case both plus operations
in the result end up simplified to a simple operand.
2020-07-31 Richard Biener <rguenther@suse.de>
* genmatch.c (expr::force_leaf): Add and initialize.
(expr::gen_transform): Honor force_leaf by passing
NULL as sequence argument to maybe_push_res_to_seq.
(parser::parse_expr): Allow ! marker on result expression
operations.
* doc/match-and-simplify.texi: Amend.
Kewen Lin [Fri, 31 Jul 2020 12:49:39 +0000 (07:49 -0500)]
vect: Don't consider branch costs if no peeled iterations
Currently vectorizer cost modeling counts branch taken costs for
prologue and epilogue if the number of iterations is unknown.
But it isn't sensible if there are no peeled iterations.
This patch is to guard them under peel_iters_prologue > 0 or
peel_iters_epilogue > 0.
Bootstrapped/regtested on powerpc64le-linux-gnu and aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider branch
taken costs for prologue and epilogue if they don't exist.
(vect_estimate_min_profitable_iters): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model
change.
Richard Biener [Fri, 31 Jul 2020 06:41:56 +0000 (08:41 +0200)]
middle-end/96369 - fix missed short-circuiting during range folding
This makes the special case of constant evaluated LHS for a
short-circuiting or/and explicit rather than doing range
merging and eventually exposing a side-effect that shouldn't be
evaluated.
Richard Biener [Fri, 24 Jul 2020 11:44:09 +0000 (13:44 +0200)]
Improve var-tracking dataflow iteration order
This builds upon the rev_post_order_and_mark_dfs_back_seme improvements
and makes vt_find_locations iterate over the dataflow problems for
each toplevel SCC separately, improving memory locality and avoiding
to process nodes after the SCC before the SCC itself stabilized.
On the asan_interceptors.cc testcase this for example reduces the
number of visited blocks from 3751 to 2867. For stage3-gcc
this reduces the number of visited blocks by ~4%.
2020-07-28 Richard Biener <rguenther@suse.de>
PR debug/78288
* var-tracking.c (vt_find_locations): Use
rev_post_order_and_mark_dfs_back_seme and separately iterate
over toplevel SCCs.
Richard Biener [Mon, 20 Jul 2020 11:38:16 +0000 (13:38 +0200)]
Compute RPO with adjacent SCC members, expose toplevel SCC extents
This produces a more optimal RPO order for iteration processing
by making sure that SCC exits are processed before SCC members
themselves.. This avoids iterating blocks unrelated to the current
iteration for RPO VN and has the chance to improve code-generation
for the non-iterative mode of RPO VN. The patch also exposes toplevel
SCCs and gets rid of the ad-hoc max_rpo computation in RPO VN.
For simplicity it also removes the odd reverse ordering of the RPO
array returned from rev_post_order_and_mark_dfs_back_seme.
Overall reduction in the number of visited blocks isn't spectacular
for bootstrap (~2.5%) but single cases see up to a 10% reduction.
The same function can be used to optimize var-tracking iteration order
as seen in the followup.
2020-07-28 Richard Biener <rguenther@suse.de>
* cfganal.h (rev_post_order_and_mark_dfs_back_seme): Adjust
prototype.
* cfganal.c (rpoamdbs_bb_data): New struct with pre BB data.
(tag_header): New helper.
(cmp_edge_dest_pre): Likewise.
(rev_post_order_and_mark_dfs_back_seme): Compute SCCs,
find SCC exits and perform a DFS walk with extra edges to
compute a RPO with adjacent SCC members when requesting an
iteration optimized order and populate the toplevel SCC array.
* tree-ssa-sccvn.c (do_rpo_vn): Remove ad-hoc computation
of max_rpo and fill it in from SCC extent info instead.
Patrick Palka [Fri, 31 Jul 2020 02:21:41 +0000 (22:21 -0400)]
c++: decl_constant_value and unsharing [PR96197]
In the testcase from the PR we're seeing excessive memory use (> 5GB)
during constexpr evaluation, almost all of which is due to the call to
decl_constant_value in the VAR_DECL/CONST_DECL branch of
cxx_eval_constant_expression. We reach here every time we evaluate an
ARRAY_REF of a constexpr VAR_DECL, and from there decl_constant_value
makes an unshared copy of the VAR_DECL's initializer. But unsharing
here is unnecessary because callers of cxx_eval_constant_expression
already unshare its result when necessary.
To fix this excessive unsharing, this patch adds a new defaulted
parameter unshare_p to decl_really_constant_value and
decl_constant_value so that callers can control whether to unshare.
As a simplification, we can also move the call to unshare_expr in
constant_value_1 outside of the loop, since doing unshare_expr on a
DECL_P is a no-op.
Now that we no longer unshare the result of decl_constant_value and
decl_really_constant_value from cxx_eval_constant_expression, memory use
during constexpr evaluation for the testcase from the PR falls from ~5GB
to 15MB according to -ftime-report.
gcc/cp/ChangeLog:
PR c++/96197
* constexpr.c (cxx_eval_constant_expression) <case CONST_DECL>:
Pass false to decl_constant_value and decl_really_constant_value
so that they don't unshare their result.
* cp-tree.h (decl_constant_value): New declaration with an added
bool parameter.
(decl_really_constant_value): Add bool parameter defaulting to
true to existing declaration.
* init.c (constant_value_1): Add bool parameter which controls
whether to unshare the initializer before returning. Call
unshare_expr at most once.
(scalar_constant_value): Pass true to constant_value_1's new
bool parameter.
(decl_really_constant_value): Add bool parameter and forward it
to constant_value_1.
(decl_constant_value): Likewise, but instead define a new
overload with an added bool parameter.
gcc/testsuite/ChangeLog:
PR c++/96197
* g++.dg/cpp1y/constexpr-array8.C: New test.
d: Fix associative array literals that don't have alignment holes filled
Associative array literal keys with alignment holes are now filled using
memset() prior to usage, with LTR evaluation of side-effects enforced.
gcc/d/ChangeLog:
PR d/96152
* d-codegen.cc (build_array_from_exprs): New function.
* d-tree.h (build_array_from_exprs): Declare.
* expr.cc (ExprVisitor::visit (AssocArrayLiteralExp *)): Use
build_array_from_exprs to generate key and value arrays.
Both intrinsics did not handle the case where the va_list object comes
from a ref parameter.
gcc/d/ChangeLog:
PR d/96140
* intrinsics.cc (expand_intrinsic_vaarg): Handle ref parameters as
arguments to va_arg().
(expand_intrinsic_vastart): Handle ref parameters as arguments to
va_start().
Jonathan Wakely [Thu, 30 Jul 2020 19:58:09 +0000 (20:58 +0100)]
libstdc++: Make COW string use allocator_traits for nested types
When compiled as C++20 the COW std::string fails due to assuming that
the allocator always defines size_type and difference_type. That has
been incorrect since C++11, but we got away with it for specializations
using std::allocator until those members were removed in C++20.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (size_type, difference_type):
Use allocator_traits to obtain the allocator's size_type and
difference_type.
Jonathan Wakely [Thu, 30 Jul 2020 19:55:56 +0000 (20:55 +0100)]
libstdc++: Check _GLIBCXX_USE_C99_STDLIB for strtof and strtold
On broken systems we only have strtod, not strtof and strtold. Just use
strtod for all types, even though that will produce incorrect results in
some cases.
Similarly, if _GLIBCXX_USE_C99_MATH is not defined then std::isinf won't
be declared. Just refer to it unqualified, which should find the C
library's isinf macro if that hasn't been #undef'd by <cmath>.
libstdc++-v3/ChangeLog:
* src/c++17/floating_from_chars.cc (from_chars_impl): Use
isinf unqualified.
[!_GLIBCXX_USE_C99_STDLIB]: Use strtod for float and long
double.
Jonathan Wakely [Thu, 30 Jul 2020 17:41:47 +0000 (18:41 +0100)]
libstdc++: Fix tests using wrong allocator type
libstdc++-v3/ChangeLog:
* testsuite/23_containers/unordered_multiset/cons/noexcept_default_construct.cc:
Use allocator with the correct value type.
* testsuite/23_containers/unordered_set/cons/noexcept_default_construct.cc:
Likewise.
d: Inline bounds checking for simple array assignments.
This optimizes the code generation of simple array assignments, inlining
the array bounds checking code so there is no reliance on the library
routine _d_arraycopy(), which also deals with postblit and copy
constructors for non-trivial arrays.
d: Refactor use of built-in memcmp/memcpy/memset into helper functions.
Generating calls to memset, memcpy, and memcmp is frequent enough that
it becomes beneficial to put them into their own routine. All parts of
the front-end have been updated to call the new helper functions instead
of doing it themselves.
d: Move private functions out of ExprVisitor into local statics
None of these functions need access to the context pointer of the
visitor class, so have been made free standing.
gcc/d/ChangeLog:
* expr.cc (needs_postblit): Move out of ExprVisitor as a static
function. Update all callers.
(needs_dtor): Likewise.
(lvalue_p): Likewise.
(binary_op): Likewise.
(binop_assignment): Likewise.
H.J. Lu [Wed, 15 Jul 2020 13:16:01 +0000 (06:16 -0700)]
Require CET support only for the final GCC build
With --enable-cet, require CET support only for the final GCC build.
Don't enable CET without CET support for non-bootstrap build, in stage1
nor for build support.
config/
PR bootstrap/96202
* cet.m4 (GCC_CET_HOST_FLAGS): Don't enable CET without CET
support in stage1 nor for build support.
Jonathan Wakely [Thu, 30 Jul 2020 11:23:55 +0000 (12:23 +0100)]
libstdc++: cv bool can't be an integer-like type (LWG 3467)
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h (__detail::__cv_bool): New
helper concept.
(__detail::__integral_nonbool): Likewise.
(__detail::__is_integer_like): Use __integral_nonbool.
* testsuite/std/ranges/access/lwg3467.cc: New test.
Jonathan Wakely [Thu, 30 Jul 2020 11:23:54 +0000 (12:23 +0100)]
libstdc++: Make testsuite usable with -fno-exceptions
Previously it was not possible to add -fno-exceptions to the testsuite
flags, because some files that are compiled by the v3-build_support
procedure failed with exceptions disabled.
This adjusts those files to still compile without exceptions (with
degraded functionality in some cases).
The sole testcase that explicitly checks for -fno-exceptions has also
been adjusted to use the more robust exceptions_enabled effective-target
keyword from gcc/testsuite/lib/target-supports.exp.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/vector/bool/72847.cc: Use the
exceptions_enabled effective-target keyword instead of
checking for an explicit -fno-exceptions option.
* testsuite/util/testsuite_abi.cc (examine_symbol): Remove
redundant try-catch.
* testsuite/util/testsuite_allocator.h [!__cpp_exceptions]:
Do not define check_allocate_max_size and memory_resource.
* testsuite/util/testsuite_containers.h: Replace comment with
#error if wrong standard dialect used.
* testsuite/util/testsuite_shared.cc: Likewise.
d: Refactor matching and lowering of intrinsic functions.
Intrinsics are now matched explicitly, rather than through a common
alias where there are multiple overrides for a common intrinsic.
Where there is a corresponding DECL_FUNCTION_CODE, that is now stored in
the D intrinsic array. All run-time std.math intrinsics have been
removed, as the library implementation already forwards to core.math.
gcc/d/ChangeLog:
* d-tree.h (DEF_D_INTRINSIC): Rename second argument from A to B.
* intrinsics.cc (intrinsic_decl): Add built_in field.
(DEF_D_INTRINSIC): Rename second argument from ALIAS to BUILTIN.
(maybe_set_intrinsic): Handle new intrinsic codes.
(expand_intrinsic_bt): Likewise.
(expand_intrinsic_checkedint): Likewise.
(expand_intrinsic_bswap): Remove.
(expand_intrinsic_sqrt): Remove.
(maybe_expand_intrinsic): Group together intrinsic cases that map
directly to gcc built-ins.
* intrinsics.def (DEF_D_BUILTIN): Rename second argument from A to B.
Update all callers to pass equivalent DECL_FUNCTION_CODE.
(DEF_CTFE_BUILTIN): Likewise.
(STD_COS): Remove intrinsic.
(STD_FABS): Remove intrinsic.
(STD_LDEXP): Remove intrinsic.
(STD_RINT): Remove intrinsic.
(STD_RNDTOL): Remove intrinsic.
(STD_SIN): Remove intrinsic.
(STD_SQRTF): Remove intrinsic.
(STD_SQRT): Remove intrinsic.
(STD_SQRTL): Remove intrinsic.
Richard Biener [Thu, 30 Jul 2020 08:24:42 +0000 (10:24 +0200)]
tree-optimization/96370 - make reassoc expr rewrite more robust
In the face of the more complex tricks in reassoc with respect
to negate processing it can happen that the expression rewrite
is fooled to recurse on a leaf and pick up a bogus expression
code. The following patch makes the expression rewrite more
robust in providing the expression code to it directly since
it is the same for all operations in a chain.
2020-07-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/96370
* tree-ssa-reassoc.c (rewrite_expr_tree): Add operation
code parameter and use it instead of picking it up from
the stmt that is being rewritten.
(reassociate_bb): Pass down the operation code.
Roger Sayle [Thu, 30 Jul 2020 08:23:38 +0000 (10:23 +0200)]
nvptx: Provide vec_set<mode> and vec_extract<vmode><mode> patterns
This patch provides standard vec_extract and vec_set patterns to the
nvptx backend, to extract an element from a PTX vector and set an
element of a PTX vector respectively. PTX vectors (I hesitate to
call them SIMD vectors) may contain up to four elements, so vector
modes up to size four are supported by this patch even though the
nvptx backend currently only allows V2SI and V2DI, i.e. two out
of the ten possible vector modes.
As an example of the improvement, the following C function:
typedef int __v2si __attribute__((__vector_size__(8)));
int foo (__v2si arg) { return arg[0]+arg[1]; }
I've implemented these getters and setters as their own instructions
instead of attempting the much more intrusive patch of changing the
backend's definition of register_operand. Given the limited utility
of PTX vectors, I'm not convinced that attempting to support them as
operands in every instruction would be worth the effort involved.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with "make" and "make check" with no new regressions.
2020-07-15 Roger Sayle <roger@nextmovesoftware.com>
Tom de Vries <tdevries@suse.de>
gcc/ChangeLog:
* config/nvptx/nvptx.md (nvptx_vector_index_operand): New predicate.
(VECELEM): New mode attribute for a vector's uppercase element mode.
(Vecelem): New mode attribute for a vector's lowercase element mode.
(*vec_set<mode>_0, *vec_set<mode>_1, *vec_set<mode>_2)
(*vec_set<mode>_3): New instructions.
(vec_set<mode>): New expander to generate one of the above insns.
(vec_extract<mode><Vecelem>): New instruction.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/v2si-vec-set-extract.c: New test.
Patrick Palka [Thu, 30 Jul 2020 02:06:44 +0000 (22:06 -0400)]
c++: overload sets and placeholder return type [PR64194]
In the testcase below, template argument deduction for the call
g(id<int>) goes wrong because the functions in the overload set id<int>
each have a yet-undeduced auto return type, and this undeduced return
type makes try_one_overload fail to match up any of the overloads with
g's parameter type, leading to g's template argument going undeduced and
to the overload set going unresolved.
This patch fixes this issue by performing return type deduction via
instantiation before doing try_one_overload, in a manner similar to what
resolve_address_of_overloaded_function does.
gcc/cp/ChangeLog:
PR c++/64194
* pt.c (resolve_overloaded_unification): If the function
template specialization has a placeholder return type,
then instantiate it before attempting unification.
gcc/testsuite/ChangeLog:
PR c++/64194
* g++.dg/cpp1y/auto-fn60.C: New test.
Patrick Palka [Thu, 30 Jul 2020 02:06:41 +0000 (22:06 -0400)]
c++: alias_ctad_tweaks and constrained dguide [PR95486]
In the below testcase, we're ICEing from alias_ctad_tweaks ultimately
because the implied deduction guide for X's user-defined constructor
already has constraints associated with it. We then carry over these
constraints to 'fprime', the overlying deduction guide for the alias
template Y, via tsubst_decl from alias_ctad_tweaks. Later in
alias_ctad_tweaks we call get_constraints followed by set_constraints
without doing remove_constraints in between, which triggers the !found
assert in set_constraints.
This patch fixes this issue by adding an intervening call to
remove_constraints.
gcc/cp/ChangeLog:
PR c++/95486
* pt.c (alias_ctad_tweaks): Call remove_constraints before
calling set_constraints.
gcc/testsuite/ChangeLog:
PR c++/95486
* g++.dg/cpp2a/class-deduction-alias3.C: New test.
Patrick Palka [Thu, 30 Jul 2020 02:06:36 +0000 (22:06 -0400)]
c++: abbreviated function template friend matching [PR96106]
In the below testcase, duplicate_decls wasn't merging the tsubsted
friend declaration for 'void add(auto)' with its definition, because
reduce_template_parm_level (during tsubst_friend_function) lost the
DECL_VIRTUAL_P flag on the auto's invented template parameter, which
caused template_heads_equivalent_p to deem the two template heads as not
equivalent in C++20 mode.
This patch makes reduce_template_parm_level carry over the
DECL_VIRTUAL_P flag from the original TEMPLATE_PARM_DECL.
gcc/cp/ChangeLog:
PR c++/96106
* pt.c (reduce_template_parm_level): Propagate DECL_VIRTUAL_P
from the original TEMPLATE_PARM_DECL to the new lowered one.
gcc/testsuite/ChangeLog:
PR c++/96106
* g++.dg/concepts/abbrev7.C: New test.