Andrew pointed out that I did not document the new architecture extension
flag I added for the RcPc extension. This was intentional, as enabling the
rcpc extension does not change GCC code generation, and is just
an assembler flag. But for completeness, here is documentation for the
new option.
gcc/
2017-07-03 James Greenhalgh <james.greenhalgh@arm.com>
* doc/invoke.texi (rcpc architecture extension): Document it.
Richard Biener [Mon, 3 Jul 2017 13:44:13 +0000 (13:44 +0000)]
re PR tree-optimization/60510 (SLP blocks loop vectorization (with reduction))
2017-07-03 Richard Biener <rguenther@suse.de>
PR tree-optimization/60510
* tree-vect-loop.c (vect_create_epilog_for_reduction): Pass in
the scalar reduction PHI and use it.
(vectorizable_reduction): Properly guard the single_defuse_cycle
path for non-SLP reduction chains where we cannot use it.
Rework reduc_def/index and vector type deduction. Rework
vector operand gathering during reduction op code-gen.
* tree-vect-slp.c (vect_analyze_slp): For failed SLP reduction
chains dissolve the chain and leave it to non-SLP reduction
handling.
Add a helper for getting the overall alignment of a DR
This combines the information from previous patches to give a guaranteed
alignment for the DR as a whole. This should be a bit safer than using
base_element_aligned, since that only really took the base into account
(not the init or offset).
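As a rough illustration of how such a helper can combine the pieces, here is a minimal standalone sketch. It is not the code added to tree-data-ref.c: the struct, its field names and dr_alignment_sketch are hypothetical stand-ins, and all values are assumed to be byte counts with power-of-two alignments.

```c
#include <stdint.h>

/* Hypothetical stand-in for the alignment-related parts of a data
   reference's innermost behavior.  All values are in bytes.  */
struct dr_behavior_sketch
{
  unsigned int base_alignment;    /* guaranteed alignment of the base */
  unsigned int base_misalignment; /* base address modulo base_alignment */
  uint64_t init;                  /* constant byte offset from the base */
  int has_offset;                 /* nonzero if a variable offset exists */
  unsigned int offset_alignment;  /* guaranteed alignment of that offset */
  int has_step;                   /* nonzero if the step is not zero */
  unsigned int step_alignment;    /* guaranteed alignment of the step */
};

static unsigned int
min_uint (unsigned int a, unsigned int b)
{
  return a < b ? a : b;
}

/* Return an alignment in bytes that every address described by DRB
   is guaranteed to have.  */
unsigned int
dr_alignment_sketch (const struct dr_behavior_sketch *drb)
{
  /* Start with the alignment of base + init.  */
  unsigned int alignment = drb->base_alignment;
  uint64_t misalignment = drb->base_misalignment + drb->init;
  if (misalignment != 0)
    {
      /* The lowest set bit of the misalignment bounds the alignment.  */
      uint64_t low_bit = misalignment & (~misalignment + 1);
      if (low_bit < alignment)
        alignment = (unsigned int) low_bit;
    }

  /* Cap the result by the alignment of the offset and of the step.  */
  if (drb->has_offset)
    alignment = min_uint (alignment, drb->offset_alignment);
  if (drb->has_step)
    alignment = min_uint (alignment, drb->step_alignment);
  return alignment;
}
```

The point of the sketch is that the base alone is not enough: a 16-byte-aligned base with an init of 4 only guarantees 4-byte alignment for the reference as a whole.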
2017-07-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-data-ref.h (dr_alignment): Declare.
* tree-data-ref.c (dr_alignment): New function.
* tree-vectorizer.h (dataref_aux): Remove base_element_aligned.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't
set it.
* tree-vect-stmts.c (vectorizable_store): Use dr_alignment.
This patch records the base alignment and misalignment in
innermost_loop_behavior, to avoid the second-guessing that was
previously done in vect_compute_data_ref_alignment. It also makes
vect_analyze_data_refs use dr_analyze_innermost, instead of having an
almost-copy of the same code.
I wasn't sure whether the alignments should be measured in bits
(for consistency with most other interfaces) or in bytes (for consistency
with DR_ALIGNED_TO, now DR_OFFSET_ALIGNMENT, and with *_ptr_info_alignment).
I went for bytes because:
- I think in practice most consumers are going to want bytes.
E.g. using bytes avoids having to mix TYPE_ALIGN and TYPE_ALIGN_UNIT
in vect_compute_data_ref_alignment.
- It means that any bit-level paranoia is dealt with when building
the innermost_loop_behavior and doesn't get pushed down to consumers.
2017-07-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-data-ref.h (innermost_loop_behavior): Add base_alignment
and base_misalignment fields.
(DR_BASE_ALIGNMENT, DR_BASE_MISALIGNMENT): New macros.
* tree-data-ref.c: Include builtins.h.
(dr_analyze_innermost): Set up the new innermost_loop_behavior fields.
* tree-vectorizer.h (STMT_VINFO_DR_BASE_ALIGNMENT): New macro.
(STMT_VINFO_DR_BASE_MISALIGNMENT): Likewise.
* tree-vect-data-refs.c: Include tree-cfg.h.
(vect_compute_data_ref_alignment): Use the new innermost_loop_behavior
fields instead of calculating an alignment here.
(vect_analyze_data_refs): Use dr_analyze_innermost. Dump the new
innermost_loop_behavior fields.
A later patch adds base alignment information to innermost_loop_behavior.
After that, the only remaining piece of alignment information that wasn't
immediately obvious was the step alignment. Adding that allows a minor
simplification to vect_compute_data_ref_alignment, and also potentially
improves the handling of variable strides for outer loop vectorisation.
A later patch will also use it to give the alignment of the DR as a whole.
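The step-alignment test can be sketched in isolation as follows. This is an illustration only, not the vectorizer's code: both function names are hypothetical, the step is given as a byte magnitude, and vector_alignment is assumed to be a nonzero power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the largest power of two that divides STEP, or 0 if STEP is 0.  */
uint64_t
step_alignment_sketch (uint64_t step)
{
  /* Isolate the lowest set bit.  */
  return step & (~step + 1);
}

/* Return true if advancing an address by STEP bytes on each iteration
   leaves its misalignment relative to VECTOR_ALIGNMENT unchanged.  */
bool
step_preserves_misalignment_sketch (uint64_t step, uint64_t vector_alignment)
{
  return step_alignment_sketch (step) % vector_alignment == 0;
}
```

For a 16-byte vector alignment, a step of 48 preserves the misalignment (its alignment is 16), while a step of 24 does not (its alignment is only 8).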
2017-07-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-data-ref.h (innermost_loop_behavior): Add a step_alignment
field.
(DR_STEP_ALIGNMENT): New macro.
* tree-vectorizer.h (STMT_VINFO_DR_STEP_ALIGNMENT): Likewise.
* tree-data-ref.c (dr_analyze_innermost): Initalize step_alignment.
(create_data_ref): Print it.
* tree-vect-stmts.c (vectorizable_load): Use the step alignment
to tell whether the step preserves vector (mis)alignment.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Likewise.
Move the check for an integer step and generalise to all INTEGER_CST.
(vect_analyze_data_refs): Set DR_STEP_ALIGNMENT when setting DR_STEP.
Print the outer step alignment.
This patch renames DR_ALIGNED_TO to DR_OFFSET_ALIGNMENT, to avoid
confusion with the upcoming DR_BASE_ALIGNMENT. Nothing needed the
value as a tree, and the value is clipped to BIGGEST_ALIGNMENT
(maybe it should be MAX_OFILE_ALIGNMENT?) so we might as well use
an unsigned int instead.
2017-07-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-data-ref.h (innermost_loop_behavior): Replace aligned_to
with offset_alignment.
(DR_ALIGNED_TO): Delete.
(DR_OFFSET_ALIGNMENT): New macro.
* tree-vectorizer.h (STMT_VINFO_DR_ALIGNED_TO): Delete.
(STMT_VINFO_DR_OFFSET_ALIGNMENT): New macro.
* tree-data-ref.c (dr_analyze_innermost): Update after above changes.
(create_data_ref): Likewise.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Likewise.
(vect_analyze_data_refs): Likewise.
* tree-if-conv.c (if_convertible_loop_p_1): Use memset before
creating dummy innermost behavior.
Use innermost_loop_behavior for outer loop vectorisation
This patch replaces the individual stmt_vinfo dr_* fields with
an innermost_loop_behavior, so that the changes in later patches
get picked up automatically. It also adds a helper function for
getting the behavior of a data reference wrt the vectorised loop.
2017-07-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vectorizer.h (_stmt_vec_info): Replace individual dr_*
fields with dr_wrt_vec_loop.
(STMT_VINFO_DR_BASE_ADDRESS, STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET)
(STMT_VINFO_DR_STEP, STMT_VINFO_DR_ALIGNED_TO): Update accordingly.
(STMT_VINFO_DR_WRT_VEC_LOOP): New macro.
(vect_dr_behavior): New function.
(vect_create_addr_base_for_vector_ref): Remove loop parameter.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
vect_dr_behavior. Use a step_preserves_misalignment_p boolean to
track whether the step preserves the misalignment.
(vect_create_addr_base_for_vector_ref): Remove loop parameter.
Use vect_dr_behavior.
(vect_setup_realignment): Update call accordingly.
(vect_create_data_ref_ptr): Likewise. Use vect_dr_behavior.
* tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Update
call to vect_create_addr_base_for_vector_ref.
(vect_create_cond_for_align_checks): Likewise.
* tree-vect-patterns.c (vect_recog_bool_pattern): Copy
STMT_VINFO_DR_WRT_VEC_LOOP as a block.
(vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-stmts.c (compare_step_with_zero): Use vect_dr_behavior.
(new_stmt_vec_info): Remove redundant zeroing.
The existing code in arm/bpabi.h was quite fragile and relied on matching
specific CPU and/or architecture names. The introduction of the option
format for -mcpu and -march broke that in a way that would be non-trivial
to fix by updating the list. The hook in that file was always a pain,
as every newly added CPU required an update here as well
(easy to miss).
I've fixed that problem once and for all by adding a new callback into
the driver to select the correct BE8 behaviour. This uses features in
the ISA capabilities list to select whether or not to use BE8 format
during linking.
I also noticed that if the user happened to pass both -mbig-endian and
-mlittle-endian on the command line then the linker spec rules would
get somewhat confused and potentially do the wrong thing. I've fixed that
by marking these options as opposites in the option descriptions. The
driver will now automatically suppress overridden options leading to the
correct desired behavior.
Whilst fixing this I noticed a couple of anomalous cases in the
existing BE8 support: we were not generating BE8 format for ARMv6 or
ARMv7-R targets. While the ARMv6 status was probably deliberate at
the time, this is probably not a good idea in the long term as the
alternative, BE32, has been deprecated by ARM. After discussion with
a couple of colleagues I've decided to change this, but to then add an
option to restore the existing behaviour at the user's option. So
this patch introduces two new options (opposites) -mbe8 and -mbe32.
This is a quiet behavior change, so I'll add a comment to the release
notes shortly.
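The selection logic described above might look roughly like the sketch below. It is not the arm_be8_option implementation: the enum, the function name and the way an explicit -mbe8 or -mbe32 overrides the ISA default are assumptions made for illustration. The only detail taken from outside this message is that the linker flag for BE8 output is --be8.

```c
#include <stdbool.h>

/* Hypothetical encoding of the user's -mbe8 / -mbe32 choice.  */
enum be_override_sketch
{
  BE_DEFAULT,     /* neither option given */
  BE_FORCE_BE8,   /* -mbe8 */
  BE_FORCE_BE32   /* -mbe32 */
};

/* Return the linker flag to add, or "" for none.  BIG_ENDIAN is the
   endianness left after the driver has dropped overridden options;
   ISA_HAS_BE8 says whether the selected ISA carries the BE8 feature bit.  */
const char *
be8_link_flag_sketch (bool big_endian, bool isa_has_be8,
                      enum be_override_sketch override)
{
  if (!big_endian)
    return "";          /* BE8 only matters for big-endian links */
  if (override == BE_FORCE_BE8)
    return "--be8";
  if (override == BE_FORCE_BE32)
    return "";          /* keep the older BE32 format */
  return isa_has_be8 ? "--be8" : "";
}
```

Keying the default on a feature bit rather than on CPU or architecture names is what removes the need to touch this code whenever a new CPU is added.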
* common/config/arm/arm-common.c (arm_be8_option): New function.
* config/arm/arm-isa.h (isa_feature): Add new feature bit isa_bit_be8.
(ISA_ARMv6): Add isa_bit_be8.
* config/arm/arm.h (arm_be8_option): Add prototype.
(BE8_SPEC_FUNCTION): New define.
(EXTRA_SPEC_FUNCTIONS): Add BE8_SPEC_FUNCTION.
* config/arm/arm.opt (mbig-endian): Mark as Negative of mlittle-endian.
(mlittle-endian): Similarly.
(mbe8, mbe32): New options.
* config/arm/bpabi.h (BE8_LINK_SPEC): Call arm_be8_option.
* doc/invoke.texi (ARM Options): Document -mbe8 and -mbe32.
Jan Hubicka [Mon, 3 Jul 2017 12:42:07 +0000 (14:42 +0200)]
tree-cfgcleanup.c (want_merge_blocks_p): New function.
* tree-cfgcleanup.c (want_merge_blocks_p): New function.
(cleanup_tree_cfg_bb): Use it.
* profile-count.h (profile_count::of_for_merging, profile_count::merge):
New functions.
* tree-cfg.c (gimple_merge_blocks): Use profile_count::merge.
PR sanitizer/81040
* g++.dg/asan/function-argument-1.C: New test.
* g++.dg/asan/function-argument-2.C: New test.
* g++.dg/asan/function-argument-3.C: New test.
2017-07-03 Martin Liska <mliska@suse.cz>
Martin Liska [Mon, 3 Jul 2017 09:26:31 +0000 (11:26 +0200)]
Make stack epilogue more efficient
2017-07-03 Martin Liska <mliska@suse.cz>
* asan.c (asan_emit_stack_protection): Unpoison just red zones
and shadow memory of auto variables which are subject of
use-after-scope sanitization.
(asan_expand_mark_ifn): Add do set only when is_poison.
dr_analyze_innermost had a "struct loop *nest" parameter that acted
like a boolean. This was added in r179161, with the idea that a
null nest selected BB-level analysis rather than loop analysis.
The handling seemed strange though. If the DR was part of a loop,
we still tried to express the base and offset values as IVs, potentially
giving a nonzero step. If that failed for any reason, we'd revert to
using the original base and offset, just as we would if we hadn't asked
for an IV in the first place.
It seems more natural to use the !in_loop handling whenever nest is null
and always set the step to zero. This actually enables one more SLP
opportunity in bb-slp-pr65935.c.
I checked out r179161 and tried the patch there. The test case added
in that revision still passes, so I don't think there was any particular
need to check simple_iv.
2017-06-28 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
parameter with a "loop" parameter and use it instead of the
loop containing DR_STMT. Don't check simple_iv when doing
BB analysis. Describe the two analysis modes in the comment.
gcc/testsuite/
* gcc.dg/vect/bb-slp-pr65935.c: Expect SLP to be used in main
as well.
Tom de Vries [Mon, 3 Jul 2017 07:21:34 +0000 (07:21 +0000)]
Don't tail-merge blocks from different loops
2017-07-03 Tom de Vries <tom@codesourcery.com>
PR tree-optimization/81192
* tree-ssa-tail-merge.c (same_succ_hash): Use bb->loop_father->num in
hash.
(same_succ::equal): Don't find bbs to be equal if bb->loop_father
differs.
(find_same_succ_bb): Remove obsolete test on bb->loop_father->latch.
This patch splits the auto-generated inline functions out of
insn-modes.h and puts them in a new header file, insn-modes-inline.h.
It also makes coretypes.h include these files directly, rather than
indirectly via machmode.h. This in turn allows insn-modes-inline.h
and machmode.h to come later in the include list, after wide-int.h.
This is useful for later patches.
insn-modes.h itself still needs to come first, since it provides
configuration information like MAX_BITSIZE_MODE_ANY_INT, which is
used to control the size of a wide_int.
The patch also makes the generator files include machmode.h
via coretypes.h. Previously they did it by more indirect means.
Finally, the patch makes wide-int-print.h available via coretypes.h
too. There didn't seem to be any reason to force only the print
routines to be included directly, and it would be painful to extend
that approach to the SVE patches.
[Based on the code ARM contributed in branches/ARM/sve-branch@242100]
2017-07-02 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* Makefile.in (MACHMODE_H): Remove insn-modes.h
(CORETYPES_H): New define.
(MOSTLYCLEANFILES): Add insn-modes-inline.h.
(insn-modes-inline.h, s-modes-inline-h): New rules.
(generated_files): Add insn-modes-inline.h.
(RTL_BASE_H, TREE_CORE_H): Use CORETYPES_H instead of coretypes.h.
(build/gensupport.o, build/ggc-none.o, build/print-rtl.o): Likewise.
(build/read-md.o, build/read-rtl.o, build/rtl.o): Likewise.
(build/vec.o, build/hash-table.o, build/inchash.o): Likewise.
(build/gencondmd.o, build/genattr.o, build/genattr-common.o): Likewise.
(build/genattrtab.o, build/genautomata.o, build/gencheck.o): Likewise.
(build/gencodes.o, build/genconditions.o): Likewise.
(build/genconfig.o, build/genconstants.o, build/genemit.o): Likewise.
(build/genenums.o, build/genextract.o, build/genflags.o): Likewise.
(build/gentarget-def.o, build/genmddeps.o, build/genopinit.o)
(build/genoutput.o, build/genpeep.o, build/genpreds.o): Likewise.
(build/genrecog.o, build/genmddump.o, build/genmatch.o): Likewise.
(build/gencfn-macros.o, build/gcov-iov.o): Likewise.
* coretypes.h: Include everything up to real.h for generators.
Include insn-modes.h first. Include wide-int-print.h after
wide-int.h. Include insn-modes-inline.h and then machmode.h.
* machmode.h: Don't include insn-modes.h here.
* function-tests.c: Remove includes of signop.h, machmode.h,
double-int.h and wide-int.h.
* rtl.h: Likewise.
* gcc-rich-location.c: Remove includes of machmode.h, double-int.h
and wide-int.h.
* optc-save-gen.awk: Likewise.
* gencheck.c (BITS_PER_UNIT): Delete dummy definition.
* godump.c: Remove include of wide-int-print.h.
* pretty-print.h: Likewise.
* wide-int-print.cc: Likewise.
* wide-int.cc: Likewise.
* hash-map-tests.c: Remove include of signop.h.
* hash-set-tests.c: Likewise.
* rtl-tests.c: Likewise.
* mkconfig.sh: Remove include of machmode.h.
* genmodes.c (emit_insn_modes_h): Split emission of inline functions
into...
(emit_insn_modes_inline_h): ...this new function. Emit the code
into an insn-modes-inline.h header file, adding appropriate
include guards and end comments.
(emit_insn_modes_c_header): Remove include of machmode.h.
(emit_min_insn_modes_c_header): Include coretypes.h rather than
machmode.h.
(main): Handle -i flag and call emit_insn_modes_inline_h when
it is passed.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r249881
Make tree-ssa-strlen.c handle partial unterminated strings
tree-ssa-strlen.c looks for cases in which a string is built up using
operations like:
memcpy (a, "foo", 4);
memcpy (a + 3, "bar", 4);
int x = strlen (a);
As a side-effect, it optimises the non-final memcpys so that they don't
include the nul terminator.
However, after removing some "& ~0x1"s from tree-ssa-dse.c, the DSE pass
does this optimisation itself (because it can tell that later memcpys
overwrite the terminators). The strlen pass wasn't able to handle these
pre-optimised calls in the same way as the unoptimised ones.
This patch adds support for tracking unterminated strings.
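For reference, the pre-optimised form that the strlen pass now has to cope with looks like the following self-contained example. The function name is hypothetical and the caller is assumed to supply a buffer of at least 7 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Build "foobar" in A, with the first copy already shortened so that it
   no longer writes a nul terminator, and return the resulting length.  */
size_t
build_and_measure_sketch (char *a)
{
  memcpy (a, "foo", 3);      /* three nonzero characters, no terminator */
  memcpy (a + 3, "bar", 4);  /* three more characters plus the terminator */
  return strlen (a);         /* 6 */
}
```

After the first call the pass knows only that a[0..2] are nonzero, which is the "unterminated string" state that the new tracking has to represent.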
[Based on the code ARM contributed in branches/ARM/sve-branch@246236]
2017-07-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-ssa-strlen.c (strinfo): Rename the length field to
nonzero_chars. Add a full_string_p field.
(compare_nonzero_chars, zero_length_string_p): New functions.
(get_addr_stridx): Add an offset_out parameter.
Use compare_nonzero_chars.
(get_stridx): Update accordingly. Use compare_nonzero_chars.
(new_strinfo): Update after above changes to strinfo.
(set_endptr_and_length): Set full_string_p.
(get_string_length): Update after above changes to strinfo.
(unshare_strinfo): Update call to new_strinfo.
(maybe_invalidate): Likewise.
(get_stridx_plus_constant): Change off to unsigned HOST_WIDE_INT.
Use compare_nonzero_chars and zero_string_p. Treat nonzero_chars
as a uhwi instead of an shwi. Update after above changes to
strinfo and new_strinfo.
(zero_length_string): Assert that chainsi contains full strings.
Use zero_length_string_p. Update call to new_strinfo.
(adjust_related_strinfos): Update after above changes to strinfo.
Copy full_string_p from origsi.
(adjust_last_stmt): Use zero_length_string_p.
(handle_builtin_strlen): Update after above changes to strinfo and
new_strinfo. Install the lhs as the string length if the previous
entry didn't describe a full string.
(handle_builtin_strchr): Update after above changes to strinfo
and new_strinfo.
(handle_builtin_strcpy): Likewise.
(handle_builtin_strcat): Likewise.
(handle_builtin_malloc): Likewise.
(handle_pointer_plus): Likewise.
(handle_builtin_memcpy): Likewise. Track nonzero characters
that aren't necessarily followed by a nul terminator.
(handle_char_store): Likewise.
that the length of p1 and p2 can be calculated by converting the
second strcpy to:
tmp = stpcpy (p2, q)
and then doing tmp - p1 for p1 and tmp - p2 for p2. This is delayed
until we know whether we actually need it. Then:
char *p3 = strchr (p2, '\0');
forces us to calculate the length of p2 in this way. At this point
we had three related strinfos:
p1: delayed length, calculated from tmp = stpcpy (p2, q)
p2: known length, tmp - p2
p3: known length, 0
After:
memcpy (p3, "x", 2);
we use adjust_related_strinfos to add 1 to each length. However,
that didn't do anything for delayed lengths because:
else if (si->stmt != NULL)
/* Delayed length computation is unaffected. */
;
So after the memcpy we had:
p1: delayed length, calculated from tmp = stpcpy (p2, q)
p2: known length, tmp - p2 + 1
p3: known length, 1
where the length of p1 was no longer correct.
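The relationship between the three lengths can be checked with a small standalone program fragment. The setup below is an assumption, since the start of the sequence is not shown here: p1 is taken to hold some prefix, with p2 pointing at its terminator. copy_return_end is a hypothetical stand-in for stpcpy, and the caller is assumed to supply a buffer large enough for the prefix, q, one extra character and the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for stpcpy: copy SRC to DST and return a pointer to the
   terminating nul written into DST.  */
static char *
copy_return_end (char *dst, const char *src)
{
  size_t n = strlen (src);
  memcpy (dst, src, n + 1);
  return dst + n;
}

struct lengths_sketch
{
  size_t p1_before, p2_before;          /* lengths before the memcpy */
  size_t p1_after, p2_after, p3_after;  /* lengths after the memcpy */
};

struct lengths_sketch
related_lengths_sketch (char *buf, const char *prefix, const char *q)
{
  struct lengths_sketch r;
  char *p1 = buf;
  char *p2 = copy_return_end (p1, prefix); /* p2 = end of p1's string */
  char *tmp = copy_return_end (p2, q);     /* tmp = stpcpy (p2, q) */
  r.p1_before = (size_t) (tmp - p1);       /* delayed length of p1 */
  r.p2_before = (size_t) (tmp - p2);       /* known length of p2 */

  char *p3 = tmp;                          /* p3 = strchr (p2, '\0') */
  memcpy (p3, "x", 2);                     /* appends one character */

  r.p1_after = strlen (p1);
  r.p2_after = strlen (p2);
  r.p3_after = strlen (p3);
  return r;
}
```

Every related string grows by exactly one character, so a length of p1 still derived from tmp - p1 alone is one too small after the memcpy.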
2017-05-16 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/80769
* tree-ssa-strlen.c (strinfo): Document that "stmt" is also used
for malloc and calloc. Document the new invariant that all related
strinfos have delayed lengths or none do.
(verify_related_strinfos): Move earlier in file.
(set_endptr_and_length): New function, split out from...
(get_string_length): ...here. Also set the lengths of related
strinfos.
(zero_length_string): Assert that chainsi has known (rather than
delayed) lengths.
(adjust_related_strinfos): Likewise.
gcc/testsuite/
PR tree-optimization/80769
* gcc.dg/strlenopt-31.c: New test.
* gcc.dg/strlenopt-31g.c: Likewise.
- one memory reference guaranteed a high base alignment, when considering
that reference in isolation. This meant that we could calculate the
vector misalignment for its DR at compile time.
- the other memory reference only guaranteed a low base alignment,
when considering that reference in isolation. We therefore couldn't
calculate the vector misalignment for its DR at compile time.
- when looking at the values of the two addresses as a pair (rather
than the memory references), it was obvious that they had the same
misalignment, whatever that misalignment happened to be.
This is working as designed, so the patch restricts the assert to cases
in which both addresses have a compile-time misalignment.
In the test case this looks like a missed opportunity. Both references
are unconditional, so it should be possible to use the highest of the
available base alignment guarantees when analyzing each reference.
A later patch does this, but the problem would still remain for
conditional references.
2017-07-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/81136
* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Only
assert that two references with the same misalignment have the same
compile-time misalignment if those compile-time misalignments
are known.
gcc/testsuite/
PR tree-optimization/81136
* gcc.dg/vect/pr81136.c: New test.
A tree type dump currently doesn't print the attributes. Since we have
so many now, and they do many interesting things, dumping them can be
useful. So dump them by default for tree type dumps.
gcc/:
2017-07-01 Andi Kleen <ak@linux.intel.com>
* print-tree.c (print_node): Print all attributes.
Jakub Jelinek [Sat, 1 Jul 2017 10:11:16 +0000 (12:11 +0200)]
re PR sanitizer/81262 (verify_flow_info failed for asmgoto test-case with -fsanitize=undefined)
PR sanitizer/81262
* bb-reorder.c (fix_up_fall_thru_edges): Move variable declarations to
the right scopes, make sure cond_jump isn't preserved between multiple
iterations. Search for fallthru edge whenever there are 3+ edges and
use find_fallthru_edge for it.
Jakub Jelinek [Sat, 1 Jul 2017 08:16:27 +0000 (10:16 +0200)]
re PR sanitizer/81262 (verify_flow_info failed for asmgoto test-case with -fsanitize=undefined)
PR sanitizer/81262
* bb-reorder.c (fix_up_fall_thru_edges): Move variable declarations to
the right scopes, make sure cond_jump isn't preserved between multiple
iterations. Search for fallthru edge whenever there are 3+ edges and
use find_fallthru_edge for it.
* gcc.c-torture/compile/pr81262.c: New test.
* g++.dg/ubsan/pr81262.C: New test.
Richard Earnshaw [Fri, 30 Jun 2017 16:36:57 +0000 (16:36 +0000)]
[rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
Many parallel set insns are of the form of a single set that also sets
the condition code flags. In this case the cost of such an insn is
normally the cost of the part that doesn't set the flags, since
updating the condition flags is simply a side effect.
At present all such insns are treated as having unknown cost (i.e. 0)
and combine assumes that such insns are infinitely more expensive than
any other insn sequence with a non-zero cost.
This patch addresses this problem by allowing insn_rtx_cost to ignore
the condition setting part of a PARALLEL iff there is exactly one
comparison set and one non-comparison set. If the only set operation
is a comparison we still use that as the basis of the insn cost.
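The rule can be sketched outside the compiler as follows. This is not the rtlanal.c code: set_sketch and parallel_cost_sketch are hypothetical, each element stands for one SET inside the PARALLEL, and the sketch assumes that any other shape keeps the unknown cost of 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one SET inside a PARALLEL.  */
struct set_sketch
{
  bool is_comparison;  /* true if this set computes the condition flags */
  int cost;            /* cost of this set taken on its own */
};

/* Return the cost to use for a PARALLEL made of the N sets in SETS,
   or 0 if the cost is unknown.  */
int
parallel_cost_sketch (const struct set_sketch *sets, size_t n)
{
  const struct set_sketch *set = NULL;
  const struct set_sketch *comparison = NULL;

  for (size_t i = 0; i < n; i++)
    {
      if (sets[i].is_comparison)
        {
          if (comparison)
            return 0;  /* more than one comparison: unknown */
          comparison = &sets[i];
        }
      else
        {
          if (set)
            return 0;  /* more than one ordinary set: unknown */
          set = &sets[i];
        }
    }

  /* If the only set is a comparison, cost the insn by that.  */
  if (!set)
    set = comparison;
  return set ? set->cost : 0;
}
```

With one ordinary set of cost 8 and one comparison of cost 4, the result is 8: the flag update is treated as a free side effect.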
* rtlanal.c (insn_rtx_cost): If a parallel contains exactly one
comparison set and one other set, use the cost of the non-comparison
set.
Peter Bergner [Fri, 30 Jun 2017 16:04:08 +0000 (11:04 -0500)]
tree-cfg.c (group_case_labels_stmt): Merge scanning and compressing loops.
* tree-cfg.c (group_case_labels_stmt): Merge scanning and compressing
loops. Remove now unneeded calls to gimple_switch_set_label() that
just set removed labels to NULL_TREE.
Aldy Hernandez [Fri, 30 Jun 2017 15:36:41 +0000 (15:36 +0000)]
tree-ssanames.c (set_range_info_raw): Abstract from ...
* tree-ssanames.c (set_range_info_raw): Abstract from ...
(set_range_info): ...here. Only call set_range_info_raw if domain
is useful.
(set_nonzero_bits): Call set_range_info_raw.
* tree-ssanames.h (set_range_info_raw): New.
testsuite/
* gcc.dg/Walloca-14.c: Adapt test to recognize new complaint of
unbounded use.
Jakub Jelinek [Fri, 30 Jun 2017 14:52:24 +0000 (16:52 +0200)]
re PR target/81225 (ICE with -mavx512ifma -O3 -ffloat-store)
PR target/81225
* config/i386/sse.md (vec_extract_lo_<mode><mask_name>): For
V8FI, V16FI and VI8F_256 iterators, use <store_mask_predicate> instead
of nonimmediate_operand and <store_mask_constraint> instead of m for
the input operand. For V8FI iterator, always split if input is a MEM.
For V16FI and V8SF_256 iterators, don't test if both operands are MEM
if <mask_applied>. For VI4F_256 iterator, use <store_mask_predicate>
instead of register_operand and <store_mask_constraint> instead of v for
the input operand. Make sure both operands aren't MEMs if not
<mask_applied>.
Richard Biener [Fri, 30 Jun 2017 13:19:29 +0000 (13:19 +0000)]
tree-vect-slp.c (vect_slp_analyze_node_operations): Only analyze the first scalar stmt.
2017-06-30 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_slp_analyze_node_operations): Only
analyze the first scalar stmt. Move vector type computation
for the BB case here from ...
* tree-vect-stmts.c (vect_analyze_stmt): ... here. Guard
live operation processing in the SLP case properly.
Nathan Sidwell [Fri, 30 Jun 2017 13:11:01 +0000 (13:11 +0000)]
call.c (build_new_method_call_1): Use constructor_name to get ctor name.
* call.c (build_new_method_call_1): Use constructor_name to get
ctor name. Move argument processing earlier to merge cdtor
handling blocks.
* decl.c (grokfndecl): Cdtors have special names.
* method.c (implicitly_declare_fn): Likewise. Simplify flag setting.
* pt.c (check_explicit_specialization): Cdtor name is already
special.
* search.c (class_method_index_for_fn): Likewise.
* g++.dg/plugin/decl-plugin-test.C: Expect special ctor name.
Andreas Krebbel [Fri, 30 Jun 2017 06:45:51 +0000 (06:45 +0000)]
S/390: Adjust to the recent branch probability changes.
This fixes the bootstrap failure triggered by the recent changes to
branch probabilities: emit_cmp_and_jump_insns no longer accepts
integers as a branch probability.
gcc/ChangeLog:
2017-06-30 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
* config/s390/s390.c (s390_expand_setmem): Adjust to the new data
type for branch probabilities.
Julian Brown [Fri, 30 Jun 2017 03:58:48 +0000 (03:58 +0000)]
aarch64-fusion-pairs.def: Add ALU_BRANCH entry.
2017-06-29 Julian Brown <julian@codesourcery.com>
Naveen H.S <Naveen.Hurugalawadi@cavium.com>
* config/aarch64/aarch64-fusion-pairs.def: Add ALU_BRANCH entry.
* config/aarch64/aarch64.c (AARCH64_FUSE_ALU_BRANCH): New fusion type.
(thunderx2t99_tunings): Set AARCH64_FUSE_ALU_BRANCH flag.
(aarch_macro_fusion_pair_p): Add support for AARCH64_FUSE_ALU_BRANCH.
* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Push the
check for CC usage into AARCH64_FUSE_CMP_BRANCH.
* config/i386/i386.c (ix86_macro_fusion_pair_p): Push the check for
CC usage from generic code to here.
* sched-deps.c (sched_macro_fuse_insns): Move the condition for
CC usage into the target macros.
* config/rs6000/rs6000.c (toc_relative_expr_p): Make tocrel_base
and tocrel_offset be pointer args rather than implicitly using
static versions.
(legitimate_constant_pool_address_p, rs6000_emit_move,
const_load_sequence_p, adjust_vperm): Add local tocrel_base and
tocrel_offset and use in toc_relative_expr_p call.
(print_operand, print_operand_address): Use static tocrel_base_oac
and tocrel_offset_oac.
(rs6000_output_addr_const_extra): Use static tocrel_base_oac and
tocrel_offset_oac.
Eric Botcazou [Thu, 29 Jun 2017 19:21:25 +0000 (19:21 +0000)]
expr.c (expand_expr): When testing for unaligned objects...
* expr.c (expand_expr) <normal_inner_ref>: When testing for unaligned
objects, take into account only the alignment of 'op0' and 'mode1' if
'op0' is a MEM.
Steve Ellcey [Thu, 29 Jun 2017 18:20:14 +0000 (18:20 +0000)]
ccmp.c (ccmp_tree_comparison_p): New function.
2017-06-29 Steve Ellcey <sellcey@cavium.com>
* ccmp.c (ccmp_tree_comparison_p): New function.
(ccmp_candidate_p): Update to use above function.
(get_compare_parts): New function.
(expand_ccmp_next): Update to use new functions.
(expand_ccmp_expr_1): Take tree arg instead of gimple, update to use
new functions.
(expand_ccmp_expr): Pass tree instead of gimple to expand_ccmp_expr_1,
take mode as argument.
* ccmp.h (expand_ccmp_expr): Add mode as argument.
* expr.c (expand_expr_real_1): Pass mode as argument.
Nathan Sidwell [Thu, 29 Jun 2017 18:20:13 +0000 (18:20 +0000)]
re PR c++/81247 (ICE on invalid C++ code with malformed namespace declaration: in do_push_nested_namespace, at cp/name-lookup.c:6002)
PR c++/81247
* parser.c (cp_parser_namespace_definition): Immediately close the
namespace if there's no open-brace.
* name-lookup.c (do_pushdecl): Reset OLD when pushing into new
namespace.
In the combine dump file, at the start there is a list of the RTL cost
of every insn. The only thing listed about the insns is the UID though.
To make it more useful, this patch prints the insn itself as well (in
slim format).
* combine.c (combine_instructions): Print insns to dump_file, together
with their costs.
Nathan Sidwell [Thu, 29 Jun 2017 14:49:46 +0000 (14:49 +0000)]
cp-tree.h (THIS_NAME, [...]): Delete.
* cp-tree.h (THIS_NAME, IN_CHARGE_NAME, VTBL_PTR_TYPE,
VTABLE_DELTA_NAME, VTABLE_PFN_NAME): Delete.
* decl.c (initialize_predefined_identifiers): Name cdtor special
names consistently. Use literals for above deleted defines.
(cxx_init_decl_processing): Use literal for vtbl_ptr_type name,