Richard Biener [Thu, 12 Apr 2018 12:27:14 +0000 (12:27 +0000)]
re PR lto/85371 (Compiling code with -g -flto gives an ICE on darwin after revision r259317)
2018-04-12 Richard Biener <rguenther@suse.de>
PR lto/85371
* dwarf2out.c (init_sections_and_labels): Use debug_line_section[_label]
for the early LTO debug to properly generate references to it
during DIE emission. Do not re-use that for the skeleton for
split-dwarf.
(dwarf2out_early_finish): Likewise.
Jakub Jelinek [Thu, 12 Apr 2018 11:17:23 +0000 (13:17 +0200)]
re PR target/85328 (accessing ymm16 with non-avx512 instruction form)
PR target/85328
* config/i386/sse.md
(<mask_codefor>avx512dq_vextract<shuffletype>64x2_1<mask_name> split,
<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name> split,
vec_extract_lo_<mode><mask_name> split, vec_extract_lo_v32hi,
vec_extract_lo_v64qi): For non-AVX512VL if input is xmm16+ reg
and output is a reg, avoid creating invalid lowpart subreg, but
instead split into a 512-bit move. Don't split if not AVX512VL,
input is xmm16+ reg and output is a mem.
(vec_extract_lo_<mode><mask_name>, vec_extract_lo_v32hi,
vec_extract_lo_v64qi): Don't require split if not AVX512VL, input is
xmm16+ reg and output is a mem.
Andreas Krebbel [Thu, 12 Apr 2018 09:14:57 +0000 (09:14 +0000)]
IBM Z: Spectre: Prevent thunk CFI from being emitted with -fno-dwarf2-cfi-asm
The CFI magic we emit as part of the indirect branch thunks in order to
have somewhat sane unwind information must not be emitted with
-fno-dwarf2-cfi-asm.
gcc/ChangeLog:
2018-04-12 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
* config/s390/s390.c (s390_output_indirect_thunk_function): Check
also for flag_dwarf2_cfi_asm.
gcc/testsuite/ChangeLog:
2018-04-12 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Jakub Jelinek [Thu, 12 Apr 2018 08:39:50 +0000 (10:39 +0200)]
re PR rtl-optimization/85342 (ICE: SIGSEGV in copyprop_hardreg_forward_1 (regcprop.c:995) with -O2 -mavx512vl)
PR rtl-optimization/85342
* regcprop.c (copyprop_hardreg_forward_1): Remove replaced array, use
a bool scalar var inside of the loop instead. Don't try to update
recog_data.operand after failed apply_change_group.
Tom de Vries [Thu, 12 Apr 2018 07:17:29 +0000 (07:17 +0000)]
[nvptx] Fix handling of extern var with flexible array member
2018-04-12 Tom de Vries <tom@codesourcery.com>
PR target/85296
* config/nvptx/nvptx.c (flexible_array_member_type_p): New function.
(nvptx_assemble_decl_begin): Add undefined param. Declare undefined
array with flexible array member as array without given dimension.
(nvptx_assemble_undefined_decl): Set nvptx_assemble_decl_begin call
argument for undefined param to true.
re PR target/85321 (Missing documentation and option misc for ppc64le)
2018-04-11 Aaron Sawdey <acsawdey@linux.ibm.com>
PR target/85321
* doc/invoke.texi (RS/6000 and PowerPC Options): Document options
-mcall- and -mtraceback=. Remove options -mabi=spe and -mabi=no-spe
from PowerPC section.
* config/rs6000/sysv4.opt (mcall-): Improve help text.
* config/rs6000/rs6000.opt (mblock-compare-inline-limit=): Trim
help text that is too long.
* config/rs6000/rs6000.opt (mblock-compare-inline-loop-limit=): Trim
help text that is too long.
* config/rs6000/rs6000.opt (mstring-compare-inline-limit=): Trim
help text that is too long.
Martin Jambor [Wed, 11 Apr 2018 13:30:53 +0000 (15:30 +0200)]
Improve IPA-CP handling of self-recursive calls
2018-04-11 Martin Jambor <mjambor@suse.cz>
PR ipa/84149
* ipa-cp.c (propagate_vals_across_pass_through): Expand comment.
(cgraph_edge_brings_value_p): New parameter dest_val, check if it is
not the same as the source val.
(cgraph_edge_brings_value_p): New parameter.
(gather_edges_for_value): Pass destination value to
cgraph_edge_brings_value_p.
(perhaps_add_new_callers): Likewise.
(get_info_about_necessary_edges): Likewise and exclude values brought
only by self-recursive edges.
(create_specialized_node): Redirect only clones of self-calling edges.
(self_recursive_pass_through_p): New function.
(find_more_scalar_values_for_callers_subset): Use it.
(find_aggregate_values_for_callers_subset): Likewise.
(known_aggs_to_agg_replacement_list): Removed.
(decide_whether_version_node): Re-calculate known constants for all
remaining context clones.
Richard Biener [Wed, 11 Apr 2018 13:05:35 +0000 (13:05 +0000)]
re PR lto/85339 (With early LTO debug the early DWARF misses line-info)
2018-04-11 Richard Biener <rguenther@suse.de>
PR lto/85339
* dwarf2out.c (dwarf2out_finish): Remove DW_AT_stmt_list attribute
from early DWARF output.
(dwarf2out_early_finish): Output line info unconditionally into
early DWARF and add reference to it.
Jakub Jelinek [Wed, 11 Apr 2018 11:37:01 +0000 (13:37 +0200)]
re PR target/85281 (Assembler messages: Error: operand size mismatch for `vpbroadcastb' with -mavx512bw -masm=intel)
PR target/85281
* config/i386/sse.md (iptr): Add V16SFmode and V8DFmode cases.
(<avx512>_vec_dup<mode><mask_name>): Use a single pattern for modes
other than V2DFmode using iptr mode attribute.
(<avx512>_vec_dup<mode><mask_name>): Use iptr mode attribute.
Jakub Jelinek [Wed, 11 Apr 2018 10:22:36 +0000 (12:22 +0200)]
re PR rtl-optimization/85302 (ICE in size_of_loc_descr, at dwarf2out.c:1771 on i686-linux-gnu)
PR debug/85302
* dwarf2out.c (skip_loc_list_entry): Don't call size_of_locs if
SIZEP is NULL.
(output_loc_list): Pass address of a dummy size variable even in the
locview handling loop.
(index_location_lists): Add comment on why skip_loc_list_entry can't
call size_of_locs.
Instruction pattern for setting the FPSCR expects the input value to be
in a register. However, __builtin_arm_set_fpscr expander does not ensure
that this is the case and as a result GCC ICEs when the builtin is
called with a constant literal.
This commit fixes the builtin to force the input value into a register.
It also removes the unneeded volatile in the existing fpscr test and
fixes the function prototype.
2018-04-11 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
PR target/85261
* config/arm/arm-builtins.c (arm_expand_builtin): Force input operand
into register.
gcc/testsuite/
PR target/85261
* config/arm/arm-builtins.c (arm_expand_builtin): Force input operand
into register.
rs6000: Fix stack clash for big residuals (PR85287)
The stack clash protection code had a logic error in how it decided
whether to put the final update size in a register, or to emit it
directly in an insn. This fixes it. It also tidies some surrounding
code.
PR target/85287
* gcc/config/rs6000/rs6000.md (allocate_stack): Put the residual size
for stack clash protection in a register whenever we need it to be in
a register.
rs6000: Enable -fasynchronous-unwind-tables by default
To find out where on-entry register values live at any point in a
program, GDB currently tries to parse the executable code.
This does not work very well, for example it gets confused if some
accesses to the stack use the frame pointer (r31) and some use the
stack pointer (r1). A symptom is that backtraces can be cut short.
This patch enables -fasynchronous-unwind-tables by default for rs6000,
which causes us to emit DWARF unwind tables for all functions, solving
these problems.
This does not do anything for sub-targets without DWARF, and applies only
to ELF sub-targets for now.
It increases executable size, but only modestly, and does not change
memory use, only the disk image.
* common/config/rs6000/rs6000-common.c (rs6000_option_init_struct):
Enable -fasynchronous-unwind-tables by default if OBJECT_FORMAT_ELF.
This updates the help text for some options to mention the allowed
values for -mXX=XX.
PR target/85321
* config/rs6000/rs6000.opt (mtraceback=): Show the allowed values in
the help text.
(mlong-double-): Ditto.
* config/rs6000/sysv4.opt (msdata=): Ditto.
(mtls-size=): Ditto.
rs6000-c.c (altivec_overloaded_builtins): Remove erroneous entries for "vector int vec_ldl (int...
gcc/ChangeLog:
2018-04-10 Kelvin Nilsen <kelvin@gcc.gnu.org>
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
erroneous entries for
"vector int vec_ldl (int, long int *)", and
"vector unsigned int vec_ldl (int, unsigned long int *)".
Add comments and entries for
"vector bool char vec_ldl (int, bool char *)",
"vector bool short vec_ldl (int, bool short *)",
"vector bool int vec_ldl (int, bool int *)",
"vector bool long long vec_ldl (int, bool long long *)",
"vector pixel vec_ldl (int, pixel *)",
"vector long long vec_ldl (int, long long *)",
"vector unsigned long long vec_ldl (int, unsigned long long *)".
* config/rs6000/rs6000.c (rs6000_init_builtins): Initialize new
type tree bool_long_long_type_node and correct definition of
bool_V2DI_type_node to make reference to this new type tree.
(rs6000_mangle_type): Replace erroneous reference to
bool_long_type_node with bool_long_long_type_node.
* config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add
comments to emphasize sign distinctions for char and int types and
replace RS6000_BTI_bool_long constant with
RS6000_BTI_bool_long_long constant. Also add comment to restrict
use of RS6000_BTI_pixel.
(bool_long_type_node): Remove this macro definition.
(bool_long_long_type_node): New macro definition.
gcc/testsuite/ChangeLog:
2018-04-10 Kelvin Nilsen <kelvin@gcc.gnu.org>
* gcc.target/powerpc/vec-ldl-1.c: New test.
* gcc.dg/vmx/ops-long-1.c: Correct test programs to reflect
corrections to ABI implementation.
Jakub Jelinek [Tue, 10 Apr 2018 15:31:57 +0000 (17:31 +0200)]
re PR rtl-optimization/85300 (ICE in exact_int_to_float_conversion_p, at simplify-rtx.c:895)
PR rtl-optimization/85300
* combine.c (subst): Handle subst of CONST_SCALAR_INT_P new_rtx also
into FLOAT and UNSIGNED_FLOAT like ZERO_EXTEND, return a CLOBBER if
simplify_unary_operation fails.
David Malcolm [Tue, 10 Apr 2018 14:37:09 +0000 (14:37 +0000)]
Show pertinent parameter (PR c++/85110)
gcc/cp/ChangeLog:
PR c++/85110
* call.c (get_fndecl_argument_location): Make non-static.
* cp-tree.h (get_fndecl_argument_location): New decl.
* typeck.c (convert_for_assignment): When complaining due to
conversions for an argument, show the location of the parameter
within the decl.
gcc/testsuite/ChangeLog:
PR c++/85110
* g++.dg/cpp1z/direct-enum-init1.C: Update for the cases
where we now show the pertinent parameter.
* g++.dg/diagnostic/aka2.C: Likewise.
* g++.dg/diagnostic/param-type-mismatch-2.C: Likewise.
Jonathan Wakely [Tue, 10 Apr 2018 14:36:09 +0000 (15:36 +0100)]
PR libstdc++/85222 allow catching iostream errors as gcc4-compatible ios::failure
Define a new exception type derived from std::ios::failure[abi:cxx11]
which also aggregates an object of the gcc4-compatible ios::failure
type. Make __throw_ios_failure throw this new type for iostream errors
that raise exceptions. Provide custom type info for the new type so that
it can be caught by handlers for the gcc4-compatible ios::failure type
as well as handlers for ios::failure[abi:cxx11] and its bases.
PR libstdc++/85222
* src/c++11/Makefile.am [ENABLE_DUAL_ABI]: Add special rules for
cxx11-ios_failure.cc to rewrite type info for __ios_failure.
* src/c++11/Makefile.in: Regenerate.
* src/c++11/cxx11-ios_failure.cc (__ios_failure, __iosfail_type_info):
New types.
[_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here.
* src/c++11/ios.cc (__throw_ios_failure): Remove definition.
* src/c++98/ios_failure.cc (__construct_ios_failure)
(__destroy_ios_failure, is_ios_failure_handler): New functions.
[!_GLIBCXX_USE_DUAL_ABI] (__throw_ios_failure): Define here.
* testsuite/27_io/ios_base/failure/dual_abi.cc: New.
* testsuite/27_io/basic_ios/copyfmt/char/1.cc: Revert changes to
handler types, to always catch std::ios_base::failure.
* testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/ios_base/storage/2.cc: Likewise.
Jakub Jelinek [Tue, 10 Apr 2018 12:37:36 +0000 (14:37 +0200)]
re PR target/85177 (wrong code with -O -fno-tree-ccp -fno-tree-sra -mavx512f)
PR target/85177
PR target/85255
* config/i386/sse.md
(<extract_type>_vinsert<shuffletype><extract_suf>_mask): Fix
computation of the VEC_MERGE selector from mask.
(<extract_type>_vinsert<shuffletype><extract_suf>_1<mask_name>):
Fix decoding of the VEC_MERGE selector into mask.
* gcc.target/i386/avx512f-pr85177.c: New test.
* gcc.target/i386/avx512f-pr85255.c: New test.
Add missing cases to vect_get_smallest_scalar_type (PR 85286)
In this PR we used WIDEN_SUM_EXPR to vectorise:
short i, y;
int sum;
[...]
for (i = x; i > 0; i--)
sum += y;
with 4 ints and 8 shorts per vector. The problem was that we set
the VF based only on the ints, then calculated the number of vector
copies based on the shorts, giving 4/8. Previously that led to
ncopies==0, but after r249897 we pick it up as an ICE.
In this particular case we could vectorise the reduction by setting
ncopies based on the output type rather than the input type, but it
doesn't seem worth adding a special "optimisation" for such a
pathological case. I think it's really an instance of the more general
problem that we can't vectorise using combinations of (say) 64-bit and
128-bit vectors on targets that support both.
2018-04-10 Richard Sandiford <richard.sandiford@linaro.org>
final_1 already sets insn_current_address for each instruction, making
it possible to use some of the address functions in final.c during
assembly generation. This patch also sets insn_last_address, since
as the comment says, we can treat final as a shorten_branches pass that
does nothing. It's then possible to use insn_current_reference_address
during final as well.
This is needed for the aarch64.md definitions of far_branch to work:
This value (tested only during final) uses the difference between
the INSN_ADDRESSES of operand 2 and insn_current_reference_address
to calculate a conservatively-correct estimate of the branch distance.
It takes into account the worst-case gap due to alignment, whereas
a direct comparison of INSN_ADDRESSES would give an unreliable,
optimistic result.
2018-04-10 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* final.c (final_1): Set insn_last_address as well as
insn_current_address.
[explow] PR target/85173: validize memory before passing it on to target probe_stack
In this PR the expansion code emits an invalid memory address for the stack probe, which the backend fails to recognise.
The address is created explicitly in anti_adjust_stack_and_probe_stack_clash in explow.c and passed down to gen_probe_stack
without any validation in emit_stack_probe.
This patch fixes the ICE by calling validize_mem on the memory location before passing it down to the target.
Jakub pointed out that we also want to create valid addresses for the probe_stack_address case, so this patch
creates an expand operand and legitimizes it before passing it down to the probe_stack_address expander.
This patch passes bootstrap and testing on arm-none-linux-gnueabihf and aarch64-none-linux-gnu
and ppc64le-redhat-linux on gcc112 in the compile farm.
PR target/85173
* explow.c (emit_stack_probe): Call validize_mem on memory location
before passing it to gen_probe_stack. Create address operand and
legitimize it for the probe_stack_address case.
Jan Hubicka [Tue, 10 Apr 2018 06:33:38 +0000 (08:33 +0200)]
re PR lto/85078 (LTO ICE: tree check: expected tree that contains 'decl minimal' structure, have 'identifier_node' in decl_mangling_context, at cp/mangle.c:878)
PR lto/85078
* ipa-devirt.c (rebuild_type_inheritance-hash): New.
* ipa-utils.h (rebuild_type_inheritance-hash): Declare.
* tree.c (free_lang_data_in_type): Fix handling of binfos;
walk basetypes.
(free_lang_data): Rebuild type inheritance graph.
* g++.dg/torture/pr85078.C: New.
Jakub Jelinek [Mon, 9 Apr 2018 19:48:48 +0000 (21:48 +0200)]
re PR c++/85194 (ICE with structured binding in broken for-loop)
PR c++/85194
* parser.c (cp_parser_simple_declaration): For structured bindings,
if *maybe_range_for_decl is NULL after parsing it, set it to
error_mark_node.
Jan Hubicka [Mon, 9 Apr 2018 16:33:51 +0000 (18:33 +0200)]
re PR rtl-optimization/84058 (RTL partitioning fixup should drag very small blocks back to hot partition)
PR rtl/84058
* cfgcleanup.c (try_forward_edges): Do not give up on crossing
jumps; choose last target that matches the criteria (i.e.
no partition changes for non-crossing jumps).
* cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic
support for redirecting crossing jumps to non-crossing.
Richard Biener [Mon, 9 Apr 2018 13:27:33 +0000 (13:27 +0000)]
re PR tree-optimization/85284 (Loop miscompilation starting with r238367)
2018-04-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/85284
* tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions):
Only use the niter constraining form of simple_iv when the exit
is always executed.
* sel-sched-ir.c (has_dependence_note_mem_dep): Take into account the
correct producer for the insn.
(tidy_control_flow): Fixup seqnos in case of debug insns.
* gcc.dg/pr80463.c: New test.
* g++.dg/pr80463.C: Likewise.
* gcc.dg/pr83972.c: Likewise.
re PR rtl-optimization/83530 (ICE in reset_sched_cycles_in_current_ebb, at sel-sched.c:7150)
PR rtl-optimization/83530
* sel-sched.c (force_next_insn): New global variable.
(remove_insn_for_debug): When force_next_insn is true, also leave only
next insn in the ready list.
(sel_sched_region): When the region wasn't scheduled, make another pass
over it with force_next_insn set to 1.
Kito Cheng [Sun, 8 Apr 2018 08:31:52 +0000 (08:31 +0000)]
[NDS32] Implement n8 pipeline.
gcc/
* config.gcc (nds32*-*-*): Check that n6/n8/s8 are valid to --with-cpu.
* config/nds32/nds32-n8.md: New file.
* config/nds32/nds32-opts.h (nds32_cpu_type): Add CPU_N6 and CPU_N8.
* config/nds32/nds32-pipelines-auxiliary.c: Implementation for n8
pipeline.
* config/nds32/nds32-protos.h: More declarations for n8 pipeline.
* config/nds32/nds32-utils.c: More implementations for n8 pipeline.
* config/nds32/nds32.md (pipeline_model): Add n8.
* config/nds32/nds32.opt (mcpu): Support n8 pipeline cpus.
* config/nds32/pipelines.md: Include n8 settings.
Kito Cheng [Sun, 8 Apr 2018 08:12:19 +0000 (08:12 +0000)]
[NDS32] Implement n9 pipeline.
gcc/
* config.gcc (nds32*): Add nds32-utils.o into extra_objs.
* config/nds32/nds32-n9-2r1w.md: New file.
* config/nds32/nds32-n9-3r2w.md: New file.
* config/nds32/nds32-opts.h (nds32_cpu_type, nds32_mul_type,
nds32_register_ports): New or modify for cpu n9.
* config/nds32/nds32-pipelines-auxiliary.c: Implementation for n9
pipeline.
* config/nds32/nds32-protos.h: More declarations for n9 pipeline.
* config/nds32/nds32-utils.c: New file.
* config/nds32/nds32.h (TARGET_PIPELINE_N9, TARGET_PIPELINE_SIMPLE,
TARGET_MUL_SLOW): Define.
* config/nds32/nds32.md (pipeline_model): New attribute.
* config/nds32/nds32.opt (mcpu, mconfig-mul, mconfig-register-ports):
New options that support cpu n9.
* config/nds32/pipelines.md: Include n9 settings.
* config/nds32/t-nds32 (nds32-utils.o): Add dependency.
Thomas Koenig [Sat, 7 Apr 2018 23:52:03 +0000 (23:52 +0000)]
re PR middle-end/82976 (Error: non-trivial conversion at assignment since r254526)
2018-04-07 Thomas Koenig <tkoenig@gcc.gnu.org>
Andrew Pinski <pinskia@gcc.gnu.org>
PR middle-end/82976
* match.pd: Use constant_boolean_node of correct type instead of
boolean_true_node or boolean_false_node for simplifying
pointer comparisons to zero.
2018-04-07 Thomas Koenig <tkoenig@gcc.gnu.org>
PR middle-end/82976
* gfortran.dg/realloc_on_assign_16a.f90: New test.
Co-Authored-By: Andrew Pinski <pinskia@gcc.gnu.org>
From-SVN: r259212
Jakub Jelinek [Sat, 7 Apr 2018 07:20:42 +0000 (09:20 +0200)]
re PR tree-optimization/85257 (wrong code with -O -fno-tree-ccp and reading zeroed vector member)
PR tree-optimization/85257
* fold-const.c (native_encode_vector): If not all elts could fit
and off is -1, return 0 rather than offset.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Pass
(offseti - offset2) / BITS_PER_UNIT as 4th argument to
native_encode_expr. Verify len * BITS_PER_UNIT >= maxsizei. Don't
adjust buffer in native_interpret_expr call.