vtrn_half.c:76:17: error: redeclaration of 'vector_float64x2' with no linkage
vtrn_half.c:77:17: error: redeclaration of 'vector2_float64x2' with no linkage
vtrn_half.c:80:17: error: redeclaration of 'vector_res_float64x2' with no linkage
This is because r11-3402 now always declares float64x2 variables for
aarch64, leading to a duplicate declaration in these testcases.
The fix is simply to remove these now useless declarations.
These tests are skipped on arm*, so there is no impact on that target.
This patch implements the missing reinterprets to and from poly128_t and
float64x2_t.
I've plugged in the appropriate testing in the advsimd-intrinsics.exp
too.
Bootstrapped and tested on aarch64-none-linux-gnu.
Tested advsimd-intrinsics.exp on arm-none-eabi too to make sure arm
testing isn't affected.
This patch implements the missing vrndns_f32 intrinsic. This operates on a scalar float32_t value.
It can be mapped down to a __builtin_aarch64_frintnsf builtin.
This patch does that.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/71233
* config/aarch64/aarch64-simd-builtins.def (frintn): Use BUILTIN_VHSDF_HSDF
for modes. Remove explicit hf instantiation.
* config/aarch64/arm_neon.h (vrndns_f32): Define.
gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/vrndns_f32_1.c: New test.
AArch64: Implement missing _p64 intrinsics for vector permutes
This patch implements some missing vector permute intrinsics operating on poly64x2_t types.
They are implemented identically to their uint64x2_t brethren.
Bootstrapped and tested on aarch64-none-linux-gnu.
This patch implements some missing vceq* intrinsics on poly types.
The behaviour is to produce the appropriate CMEQ instruction as for the unsigned types.
Bootstrapped and tested on aarch64-none-linux-gnu.
Jakub Jelinek [Sun, 27 Sep 2020 21:18:26 +0000 (23:18 +0200)]
optabs: Don't reuse target for multi-word expansions if it overlaps operand(s) [PR97073]
The following testcase is miscompiled on i686-linux, because
we try to expand a double-word bitwise logic operation with op0
being a (mem:DI u) and target (mem:DI u+4), i.e. partial overlap, and
thus end up with:
movl 4(%esp), %eax
andl u, %eax
movl %eax, u+4
! movl u+4, %eax optimized out
andl 8(%esp), %eax
movl %eax, u+8
rather than with the desired:
movl 4(%esp), %edx
movl 8(%esp), %eax
andl u, %edx
andl u+4, %eax
movl %eax, u+8
movl %edx, u+4
because the store of the first word to target overwrites the second word of
the operand.
expand_binop for this (and several similar places) already check for target
== op0 or target == op1, this patch just adds reg_overlap_mentioned_p calls
next to it.
Pedantically, at least for some of these it might be sufficient to force
a different target if there is overlap but target is not rtx_equal_p to
the operand (e.g. in this bitwise logical case, but e.g. not in the shift
cases where there is reordering), though that would go against the
preexisting target == op? checks and the rationale that REG_EQUAL notes in
that case isn't correct.
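The hazard described above can be modelled in plain C by treating the
double-word operation as two single-word operations over an array. This is
only a sketch: and2_naive and and2_safe are hypothetical names, not code
from optabs.c, and the temporaries in and2_safe merely stand in for forcing
a fresh pseudo as the target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a double-word AND expanded one word at a time.
   dst, a and b each point at two 32-bit words; dst may overlap b.  */
void and2_naive(uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
    assert(dst && a && b);
    dst[0] = a[0] & b[0];   /* if dst == b + 1, this overwrites b[1] ...  */
    dst[1] = a[1] & b[1];   /* ... which is only read here.  */
}

/* Same operation, but both words are computed before anything is stored,
   so a partially overlapping destination cannot clobber an operand.  */
void and2_safe(uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
    assert(dst && a && b);
    uint32_t lo = a[0] & b[0];
    uint32_t hi = a[1] & b[1];
    dst[0] = lo;
    dst[1] = hi;
}
```

Calling either function with b = u and dst = u + 1 on a three-word array u
mirrors the (mem:DI u) operand and (mem:DI u+4) target from the testcase;
only and2_safe leaves the second result word derived from the original u[1].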
2020-09-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/97073
* optabs.c (expand_binop, expand_absneg_bit, expand_unop,
expand_copysign_bit): Check reg_overlap_mentioned_p between target
and operand(s) and if it returns true, force a pseudo as target.
Mark Eggleston [Thu, 11 Jun 2020 13:33:51 +0000 (14:33 +0100)]
Fortran : ICE in build_field PR95614
Local identifiers can not be the same as a module name. Original
patch by Steve Kargl resulted in name clashes between common block
names and local identifiers. A local identifier can be the same as
a global identifier if that identifier represents a common. The patch
was modified to allow global identifiers that represent a common
block.
2020-09-27 Steven G. Kargl <kargl@gcc.gnu.org>
Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/fortran/
PR fortran/95614
* decl.c (gfc_get_common): Use gfc_match_common_name instead
of match_common_name.
* decl.c (gfc_bind_idents): Use gfc_match_common_name instead
of match_common_name.
* match.c : Rename match_common_name to gfc_match_common_name.
* match.c (gfc_match_common): Use gfc_match_common_name instead
of match_common_name.
* match.h : Rename match_common_name to gfc_match_common_name.
* resolve.c (resolve_common_vars): Check each symbol in a
common block has a global symbol. If there is a global symbol
issue an error if the symbol type is known and is not a common
block name.
2020-09-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/95614
* gfortran.dg/pr95614_1.f90: New test.
* gfortran.dg/pr95614_2.f90: New test.
I'd like to backport some patches from Tamar in GCC 9 to GCC 8 that implement the complex arithmetic intrinsics for Advanced SIMD.
These should have been present in GCC 8, which gained support for Armv8.3-a.
There were 4 follow-up fixes that I've rolled into the one commit.
Bootstrapped and tested on aarch64-none-linux-gnu and arm-none-linux-gnueabihf on the GCC 8 branch.
Kyrylo Tkachov [Mon, 21 Oct 2019 10:52:05 +0000 (10:52 +0000)]
AArch64: Implement __rndr, __rndrrs intrinsics
This patch implements the recently published[1] __rndr and __rndrrs
intrinsics used to access the RNG in Armv8.5-A.
The __rndrrs intrinsics can be used to reseed the generator too.
They are guarded by the __ARM_FEATURE_RNG feature macro.
A quirk with these intrinsics is that they store the random number in
their pointer argument and return a status code indicating whether the
generation succeeded.
The instructions themselves write the CC flags indicating the success of
the operation, which we can then read with a CSET.
Therefore this implementation makes use of the IGNORE indicator to the
builtin expand machinery to avoid generating the CSET if its result is
unused (the CC reg clobbering effect is still reflected in the pattern).
I've checked that using unspec_volatile prevents undesirable CSEing of
the instructions.
H.J. Lu [Mon, 14 Sep 2020 15:52:27 +0000 (08:52 -0700)]
rtl_data: Add sp_is_clobbered_by_asm
Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by asm statement.
gcc/
PR target/97032
* cfgexpand.c (expand_asm_stmt): Set sp_is_clobbered_by_asm to
true if the stack pointer is clobbered by asm statement.
* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
if the stack pointer is clobbered by asm statement.
gcc/testsuite/
PR target/97032
* gcc.target/i386/pr97032.c: New test.
This patch implements the __jcvt ACLE intrinsic [1] that maps down to the FJCVTZS [2] instruction from Armv8.3-a.
No fancy mode iterators or anything. Just a single builtin, UNSPEC and define_insn and the associated plumbing.
This patch also defines __ARM_FEATURE_JCVT to indicate when the intrinsic is available.
gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/jcvt_1.c: New test.
* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.
Tamar Christina [Mon, 21 May 2018 10:33:30 +0000 (10:33 +0000)]
Add missing AArch64 NEON intrinsics for Armv8.2-a to Armv8.4-a
This patch adds the missing NEON intrinsics for all 128-bit vector integer modes for the
three-way XOR and negate and xor instructions for Armv8.2-a to Armv8.4-a.
Jonathan Wakely [Tue, 22 Sep 2020 08:39:33 +0000 (09:39 +0100)]
libstdc++: Use correct argument type for __use_alloc [PR 96803]
The _Tuple_impl constructor for allocator-extended construction from a
different tuple type uses the _Tuple_impl's own _Head type in the
__use_alloc test. That is incorrect, because the argument tuple could
have a different type. Using the wrong type might select the
leading-allocator convention when it should use the trailing-allocator
convention, or vice versa.
This backport includes the value category fix from r11-3348.
libstdc++-v3/ChangeLog:
PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl<U...>&)):
Replace parameter pack with a type parameter and a pack and pass
the first type to __use_alloc.
* testsuite/20_util/tuple/cons/96803.cc: New test.
Jakub Jelinek [Wed, 16 Sep 2020 07:42:33 +0000 (09:42 +0200)]
store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053]
As the testcases show, if we have something like:
MEM <char[12]> [&b + 8B] = {};
MEM[(short *) &b] = 5;
_5 = *x_4(D);
MEM <long long unsigned int> [&b + 2B] = _5;
MEM[(char *)&b + 16B] = 88;
MEM[(int *)&b + 20B] = 1;
then in sort_by_bitpos the stores are almost like in the given order,
except the first store is after the = _5; store.
We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
while the former INTEGER_CST, and we can't coalesce the = _5 store with
the = {} store because the former is MEM_REF, the latter INTEGER_CST.
But we happily coalesce the remaining 3 stores, which is wrong, because the
= _5; store overlaps those and is in between them in the program order.
We already have code to deal with similar cases in check_no_overlap, but we
deal only with the following stores in sort_by_bitpos order, not the earlier
ones.
The following patch checks also the earlier ones. In coalesce_immediate_stores
it computes the first one that needs to be checked (all the ones whose
bitpos + bitsize is smaller or equal to merged_store->start don't need to be
checked and don't need to be checked even for any following attempts because
of the sort_by_bitpos sorting) and the end of that (that is the first store
in the merged_store).
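To see why coalescing the three remaining stores is wrong, the store
sequence can be replayed on a byte buffer. The sketch below is an
illustration, not the PR testcase: the function names are hypothetical, and
it assumes 2-byte short, 4-byte int and that the merged group is emitted
where its last store was, i.e. after the = _5; store.

```c
#include <assert.h>
#include <string.h>

enum { BUF_SIZE = 24 };

/* The stores from the GIMPLE dump, in program order.
   x8 points at the 8 bytes that were loaded into _5.  */
void stores_in_program_order(unsigned char *b, const unsigned char *x8)
{
    const short five = 5;
    const int one = 1;
    assert(b && x8);
    memset(b + 8, 0, 12);              /* MEM <char[12]> [&b + 8B] = {};  */
    memcpy(b, &five, sizeof five);     /* MEM[(short *) &b] = 5;  */
    memcpy(b + 2, x8, 8);              /* MEM <long long unsigned int> [&b + 2B] = _5;  */
    b[16] = 88;                        /* MEM[(char *)&b + 16B] = 88;  */
    memcpy(b + 20, &one, sizeof one);  /* MEM[(int *)&b + 20B] = 1;  */
}

/* Assumed model of the miscompile: = {}, = 88 and = 1 are coalesced and
   emitted together after the = _5; store.  */
void stores_wrongly_merged(unsigned char *b, const unsigned char *x8)
{
    const short five = 5;
    const int one = 1;
    assert(b && x8);
    memcpy(b, &five, sizeof five);
    memcpy(b + 2, x8, 8);
    memset(b + 8, 0, 12);              /* wipes bytes 8..9 written by = _5;  */
    b[16] = 88;
    memcpy(b + 20, &one, sizeof one);
}
```

With x8 filled with 0xFF, bytes 8 and 9 of the buffer end up as 0xFF in
program order but as 0 in the wrongly merged order, which is the overlap
with an earlier store in sort_by_bitpos order that check_no_overlap now has
to catch.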
2020-09-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97053
* gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
START, FIRST_EARLIER and LAST_EARLIER arguments. Return false if
any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
be merged store.
(imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
Adjust check_no_overlap caller.
(imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
and last_earlier variables, adjust them during iterations. Adjust
check_no_overlap callers, call check_no_overlap even when extending
overlapping stores by extra INTEGER_CST stores.
* gcc.dg/store_merging_31.c: New test.
* gcc.dg/store_merging_32.c: New test.
2020-04-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/arm/arm-builtins.c (arm_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for the first assignments to
fenv_var and new_fenv_var.
rs6000: Properly handle LE index munging in vec_shr (PR94710)
The PR shows the compiler crashing with -mvsx -mlittle -O0. This turns
out to be caused by a failure to mask off the higher bits in an index
endian conversion.
Jakub Jelinek [Wed, 26 Aug 2020 08:30:15 +0000 (10:30 +0200)]
dwarf2out: Fix up dwarf2out_next_real_insn caching [PR96729]
The addition of NOTE_INSN_BEGIN_STMT and NOTE_INSN_INLINE_ENTRY notes
reintroduced quadratic behavior into dwarf2out_var_location.
This function needs to know the next real instruction to which the var
location note applies, but the way final_scan_insn is called outside of
final.c main loop doesn't make it easy to look up the next real insn in
there (and for non-dwarf it is even useless). Usually next real insn is
only a few notes away, but we can have hundreds of thousands of consecutive
notes only followed by a real insn. To avoid the quadratic behavior,
dwarf2out_var_location contains a cache: it remembers the next note, and when
it is called again on that loc_note, it can use the previously computed
dwarf2out_next_real_insn result rather than walking the insn chain once
again. But dwarf2out_var_location is not called for
NOTE_INSN_{BEGIN_STMT,INLINE_ENTRY} notes, while the code does put those
notes into the cache, which means that if we have e.g. in the worst case
NOTE_INSN_VAR_LOCATION and NOTE_INSN_BEGIN_STMT notes alternating, the cache
is not really used.
The following patch fixes it by looking up the next NOTE_INSN_VAR_LOCATION
if any. While the lookup could be perhaps done together with looking for
the next real insn once (e.g. in dwarf2out_next_real_insn or its copy),
there are other dwarf2out_next_real_insn callers which don't need/want that
behavior and if there are more than two NOTE_INSN_VAR_LOCATION notes
followed by the same real insn, we need to do that "find next
NOTE_INSN_VAR_LOCATION" walk anyway.
On the testcase from the PR this patch speeds it up 2.8 times, from 0m0.674s
to 0m0.236s (why it takes for the reporter more than 60s is unknown).
2020-08-26 Jakub Jelinek <jakub@redhat.com>
PR debug/96729
* dwarf2out.c (dwarf2out_next_real_insn): Adjust function comment.
(dwarf2out_var_location): Look for next_note only if next_real is
non-NULL, in that case look for the first non-deleted
NOTE_INSN_VAR_LOCATION between loc_note and next_real, if any.
Jakub Jelinek [Tue, 25 Aug 2020 11:49:40 +0000 (13:49 +0200)]
gimple: Ignore *0 = {CLOBBER} in path isolation [PR96722]
Clobbers of MEM_REF with NULL address are just fancy nops, something we just
ignore and don't emit any code for it (ditto for other clobbers), they just
mark end of life on something, so we shouldn't infer from those that there
is some UB.
Jakub Jelinek [Tue, 18 Aug 2020 05:51:58 +0000 (07:51 +0200)]
c: Fix -Wunused-but-set-* warning with _Generic [PR96571]
The following testcase shows various problems with -Wunused-but-set*
warnings and _Generic construct. I think it is best to treat the selector
and the ignored expressions as (potentially) read, because when they are
parsed, the vars in there are already marked as TREE_USED.
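A small C11 example of the situation, written for illustration and not taken
from the PR testcase: sel is assigned but only appears as the _Generic
selector, and unused_branch only appears in an association that is not
selected. Treating both as (potentially) read, as described above, would
keep -Wunused-but-set-variable quiet for them. The function name is
hypothetical.

```c
#include <assert.h>

/* Returns 1 because the selector has type int.  The selector expression
   is never evaluated, and the double association is not the chosen one.  */
int classify_selector(void)
{
    int sel;
    int unused_branch;
    int r;

    sel = 0;
    unused_branch = 7;
    r = _Generic(sel, int: 1, double: unused_branch, default: 0);
    assert(r == 1);
    return r;
}
```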
2020-08-18 Jakub Jelinek <jakub@redhat.com>
PR c/96571
* c-parser.c (c_parser_generic_selection): Change match_found from bool
to int, holding index of the match. Call mark_exp_read on the selector
expression and on expressions other than the selected one.
Jakub Jelinek [Tue, 11 Aug 2020 14:46:49 +0000 (16:46 +0200)]
c-family: Fix ICE in get_atomic_generic_size [PR96545]
As the testcase shows, we would ICE if the type of the first argument of
various atomic builtins was pointer to (non-void) incomplete type, we would
assume that TYPE_SIZE_UNIT must be non-NULL. This patch diagnoses it
instead. And also changes the TREE_CODE != INTEGER_CST check to
!tree_fits_uhwi_p, as we use tree_to_uhwi after this and at least in theory
the int could be too large and not fit.
2020-08-11 Jakub Jelinek <jakub@redhat.com>
PR c/96545
* c-common.c (get_atomic_generic_size): Require that first argument's
type points to a complete type and use tree_fits_uhwi_p instead of
just INTEGER_CST TREE_CODE check for the TYPE_SIZE_UNIT.
Jakub Jelinek [Sat, 8 Aug 2020 09:10:30 +0000 (11:10 +0200)]
openmp: Handle clauses with gimple sequences in convert_nonlocal_omp_clauses properly
If the walk_body on the various sequences of reduction, lastprivate and/or linear
clauses needs to create a temporary variable, we should declare that variable
in that sequence rather than outside, where it would need to be privatized inside of
the construct.
2020-08-08 Jakub Jelinek <jakub@redhat.com>
PR fortran/93553
* tree-nested.c (convert_nonlocal_omp_clauses): For
OMP_CLAUSE_REDUCTION, OMP_CLAUSE_LASTPRIVATE and OMP_CLAUSE_LINEAR
save info->new_local_var_chain around walks of the clause gimple
sequences and declare_vars if needed into the sequence.
Jakub Jelinek [Wed, 15 Jul 2020 09:34:44 +0000 (11:34 +0200)]
fix _mm512_{,mask_}cmp*_p[ds]_mask at -O0 [PR96174]
The _mm512_{,mask_}cmp_p[ds]_mask and also _mm_{,mask_}cmp_s[ds]_mask
intrinsics have an argument which must have a constant passed to it
and so use an inline version only for ifdef __OPTIMIZE__ and have
a #define for -O0. But the _mm512_{,mask_}cmp*_p[ds]_mask intrinsics
don't need a constant argument, they are essentially the first
set with the constant added to them implicitly based on the comparison
name, and so there is no #define version for them (correctly).
But their inline versions are defined in between the first and s[ds]
set and so inside of ifdef __OPTIMIZE__, which means that with -O0
they aren't defined at all.
This patch fixes that by moving those after the #ifdef __OPTIMIZE__ #else
use #define #endif block.
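The header layout problem can be shown in miniature. Every name below
(cmp_impl, cmp_generic, cmp_lt, cmp_eq, PRED_*) is hypothetical and none of
this is the real avx512fintrin.h; it only sketches the pattern of an inline
function when optimizing, a macro at -O0, and named variants placed after
the #endif so that they exist at every optimization level.

```c
#include <assert.h>

enum { PRED_EQ = 0, PRED_LT = 1 };

static inline int cmp_impl(int a, int b, int pred)
{
    assert(pred == PRED_EQ || pred == PRED_LT);
    return pred == PRED_EQ ? a == b : a < b;
}

/* Stands in for the intrinsics that need a constant predicate argument:
   an inline function when optimizing, a macro otherwise.  */
#ifdef __OPTIMIZE__
static inline int cmp_generic(int a, int b, const int pred)
{
    return cmp_impl(a, b, pred);
}
#else
#define cmp_generic(a, b, pred) cmp_impl((a), (b), (pred))
#endif

/* The named variants supply the predicate themselves, so they need no
   macro form.  Defined after the #endif they are available with and
   without __OPTIMIZE__; inside the #ifdef branch they would be missing
   at -O0.  */
static inline int cmp_lt(int a, int b)
{
    return cmp_generic(a, b, PRED_LT);
}

static inline int cmp_eq(int a, int b)
{
    return cmp_generic(a, b, PRED_EQ);
}
```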
Jakub Jelinek [Thu, 2 Jul 2020 09:38:20 +0000 (11:38 +0200)]
tree-cfg: Fix ICE with switch stmt to unreachable opt and forced labels [PR95857]
The following testcase ICEs, because during the cfg cleanup, we see:
switch (i$e_11) <default: <L12> [33.33%], case -3: <lab2> [33.33%], case 0: <L10> [33.33%], case 2: <lab2> [33.33%]>
...
lab2:
__builtin_unreachable ();
where lab2 is FORCED_LABEL. The way it works, we go through the case labels
and when we reach the first one that points to gimple_seq_unreachable*
basic block, we remove the edge (if any) from the switch bb to the bb
containing the label and bbs reachable only through that edge we've just
removed. Once we do that, we must throw away all other cases that use
the same label (or some other labels from the same bb we've removed the edge
to and the bb). To avoid quadratic behavior, this is not done by walking
all remaining cases immediately before removing, but only when processing
them later.
For normal labels this works fine: if the label is in a deleted bb, it will
have NULL label_to_block and we handle that case, or, if the unreachable bb
has some other edge to it, only the edge will be removed and not the bb,
and again, find_edge will not find the edge and we only remove the case.
And if a label would be to some other block, that other block wouldn't have
been removed earlier because there would be still an edge from the switch
block.
Now, FORCED_LABEL (and I think DECL_NONLOCAL too) break this, because
those labels aren't removed, but instead moved to some surrounding basic
block. So, when we later process those, when their gimple_seq_unreachable*
basic block is removed, label_to_block will return some unrelated block
(in the testcase the switch bb), so we decide to keep the case which doesn't
seem to be unreachable, but we don't really have an edge from the switch
block to the block the label got moved to.
I thought first about punting in gimple_seq_unreachable* on
FORCED_LABEL/DECL_NONLOCAL labels, but that might penalize even code that
doesn't care, so this instead just makes sure that for
FORCED_LABEL/DECL_NONLOCAL labels that are being removed (and thus moved
randomly) we remember in a hash_set the fact that those labels should be
treated as removed for the purpose of the optimization, and later on
handle those labels that way.
2020-07-02 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/95857
* tree-cfg.c (group_case_labels_stmt): When removing an unreachable
base_bb, remember all forced and non-local labels on it and later
treat those as if they have NULL label_to_block. Formatting fix.
Fix a comment typo.
Jakub Jelinek [Sat, 27 Jun 2020 10:38:23 +0000 (12:38 +0200)]
c-family: Use TYPE_OVERFLOW_UNDEFINED instead of !TYPE_UNSIGNED in pointer_sum [PR95903]
For lp64 targets and int off ... ptr[off + 1]
is lowered in pointer_sum to *(ptr + ((sizetype) off + (sizetype) 1)).
That is fine when signed integer wrapping is undefined (and is not done
already if off has unsigned type), but changes behavior for -fwrapv, where
overflow is well defined. Runtime test could be:
int
main ()
{
char *p = __builtin_malloc (0x100000000UL);
if (!p) return 0;
char *q = p + 0x80000000UL;
int o = __INT_MAX__;
q[o + 1] = 1;
if (q[-__INT_MAX__ - 1] != 1) __builtin_abort ();
return 0;
}
with -fwrapv or so, not included in the testsuite because it requires 4GB
allocation (with some other test it would be enough to have something
slightly above 2GB, but still...).
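The difference between the two index computations can also be checked
without any large allocation. The following sketch is not the testsuite
file; the function names are hypothetical, int64_t stands in for a 64-bit
sizetype interpreted as a signed offset, and the wrap is spelled out through
unsigned arithmetic so that the example itself has no undefined behaviour
(the conversion back to int relies on GCC's documented modulo behaviour).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Index as the source expression computes it under -fwrapv:
   off + 1 wraps in int first and is only then widened.  */
int64_t index_wrap_then_widen(int off)
{
    assert(sizeof (int) < sizeof (int64_t));
    int wrapped = (int) ((unsigned) off + 1u);
    return (int64_t) wrapped;
}

/* Index after distributing the widening over the addition, i.e. the
   (sizetype) off + (sizetype) 1 form.  */
int64_t index_widen_then_add(int off)
{
    return (int64_t) off + 1;
}
```

For off equal to INT_MAX the first function yields INT_MIN and the second
yields INT_MAX + 1 as a 64-bit value, so the two forms address different
elements; for any off below INT_MAX they agree.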
2020-06-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/95903
gcc/c-family/
* c-common.c (pointer_int_sum): Use TYPE_OVERFLOW_UNDEFINED instead of
!TYPE_UNSIGNED check to see if we can apply distributive law and handle
smaller precision intop operands separately.
gcc/testsuite/
* c-c++-common/pr95903.c: New test.
Jakub Jelinek [Thu, 28 May 2020 21:40:54 +0000 (23:40 +0200)]
c++: Try to complete decomp types [PR95328]
Two years ago Paolo added the
else if (processing_template_decl && !COMPLETE_TYPE_P (type))
pedwarn (...);
lines into cp_finish_decomp. For type dependent decl we punt much earlier,
but even for types which aren't type dependent COMPLETE_TYPE_P might be
false as this testcase shows, so this patch tries to complete_type first
(the reason for writing it that way is that it is then followed by another
else if and if complete_type returns error_mark_node, we shouldn't report
anything, as a bug should have been reported already).
2020-05-28 Jakub Jelinek <jakub@redhat.com>
PR c++/95328
* decl.c (cp_finish_decomp): Call complete_type before checking
COMPLETE_TYPE_P.
Jakub Jelinek [Thu, 14 May 2020 07:51:05 +0000 (09:51 +0200)]
openmp: Fix placement of 2nd+ preparation statement for PHIs in simd clone lowering [PR95108]
For normal stmts, preparation statements are inserted before the stmt, so if we need multiple,
they are in the correct order, but for PHIs we emit them after labels in the entry successor
bb, and we used to emit them in the reverse order that way.
2020-05-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/95108
* omp-simd-clone.c (struct modify_stmt_info): Add after_stmt member.
(ipa_simd_modify_stmt_ops): For PHIs, only add before first stmt in
entry block if info->after_stmt is NULL, otherwise add after that stmt
and update it after adding each stmt.
(ipa_simd_modify_function_body): Initialize info.after_stmt.
Jakub Jelinek [Wed, 13 May 2020 09:22:37 +0000 (11:22 +0200)]
Fix -fcompare-debug issue in purge_dead_edges [PR95080]
The following testcase fails with -fcompare-debug, the bug used to be latent
since introduction of -fcompare-debug.
The loop at the start of purge_dead_edges behaves differently between -g0
and -g - if the last insn is a DEBUG_INSN, then it skips not just
DEBUG_INSNs but also NOTEs until it finds some other real insn (or bb head),
while with -g0 it will not skip any NOTEs, so if we have
real_insn
note
debug_insn // not present with -g0
then with -g it might remove useless REG_EH_REGION from real_insn, while
with -g0 it will not.
Yet another option would be not skipping NOTE_P in the loop; I couldn't find
in history rationale for why it is done.
2020-05-13 Jakub Jelinek <jakub@redhat.com>
PR debug/95080
* cfgrtl.c (purge_dead_edges): Skip over debug and note insns even
if the last insn is a note.
Jakub Jelinek [Wed, 6 May 2020 21:38:13 +0000 (23:38 +0200)]
c++: Avoid strict_aliasing_warning on dependent types or expressions [PR94951]
The following testcase gets a bogus warning during build_base_path,
when cp_build_indirect_ref* calls strict_aliasing_warning with a dependent
expression. IMHO calling get_alias_set etc. on dependent types is wrong;
we should just defer the warnings in those cases until instantiation
and only handle the cases where neither type nor expr are dependent.
2020-05-06 Jakub Jelinek <jakub@redhat.com>
PR c++/94951
* typeck.c (cp_strict_aliasing_warning): New function.
(cp_build_indirect_ref_1, build_reinterpret_cast_1): Use
it instead of strict_aliasing_warning.
* g++.dg/warn/Wstrict-aliasing-bogus-tmpl.C: New test.
Jakub Jelinek [Wed, 6 May 2020 07:40:33 +0000 (09:40 +0200)]
riscv: Fix up riscv_atomic_assign_expand_fenv [PR94950]
Similarly to the fixes on many other targets, riscv needs to use TARGET_EXPR
to avoid having the create_tmp_var_raw temporaries without proper DECL_CONTEXT
and not mentioned in local decls.
2020-05-06 Jakub Jelinek <jakub@redhat.com>
PR target/94950
* config/riscv/riscv-builtins.c (riscv_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for first assignment to old_flags.
Jakub Jelinek [Wed, 6 May 2020 07:31:19 +0000 (09:31 +0200)]
combine: Don't replace SET_SRC with REG_EQUAL note content if SET_SRC has side-effects [PR94873]
There were some discussions about whether REG_EQUAL notes are valid on insns with a single
set which contains auto-inc-dec side-effects in the SET_SRC and the majority thinks that
it should be valid. So, this patch fixes the combiner to punt in that case, because otherwise
the auto-inc-dec side-effects from the SET_SRC are lost.
2020-05-06 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/94873
* combine.c (combine_instructions): Don't optimize using REG_EQUAL
note if SET_SRC (set) has side-effects.
Jakub Jelinek [Thu, 30 Apr 2020 19:48:30 +0000 (21:48 +0200)]
c: Fix ICE with _Atomic side-effect in nested fn param decls [PR94842]
If there are _Atomic side-effects in the parameter declarations
of non-nested function, when they are parsed, current_function_decl is
NULL, the create_artificial_label created labels during build_atomic* are
then adjusted by store_parm_decls through set_labels_context_r callback.
Unfortunately, if such a thing happens in nested function parameter
declarations, while those decls are parsed current_function_decl is the
parent function (and I am not sure it is a good idea to temporarily clear it,
some code perhaps should be aware it is in a nested function, or it can
refer to variables from the parent function etc.) and that means
store_parm_decls through set_labels_context_r doesn't adjust anything.
As those labels are emitted in the nested function body rather than in the
parent, I think it is ok to override the context in those cases.
2020-04-30 Jakub Jelinek <jakub@redhat.com>
PR c/94842
* c-decl.c (set_labels_context_r): In addition to context-less
LABEL_DECLs adjust also LABEL_DECLs with context equal to
parent function if any.
(store_parm_decls): Adjust comment.
Jakub Jelinek [Sat, 2 May 2020 10:09:04 +0000 (12:09 +0200)]
tilegx: Unbreak build
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '1' (via 'I124MODE:n') or '4' (via 'I48MODE:n')
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '1' (via 'I124MODE:n') or '' (via 'I48MODE:n')
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '2' (via 'I124MODE:n') or '4' (via 'I48MODE:n')
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '2' (via 'I124MODE:n') or '' (via 'I48MODE:n')
../../gcc/config/tilegx/tilegx.md:4109:1: ambiguous attribute 'n'; could be '4' (via 'I124MODE:n') or '' (via 'I48MODE:n')
The insn name already uses <I124MODE:n> explicitly, just the preparation
stmts don't, and as it creates a I124MODE lowpart subreg of a word mode
register, <I124MODE:n> seems obviously correct.
2020-05-02 Jakub Jelinek <jakub@redhat.com>
* config/tilegx/tilegx.md
(insn_stnt<I124MODE:n>_add<I48MODE:bitsuffix>): Use <I124MODE:n>
rather than just <n>.
Jakub Jelinek [Wed, 29 Apr 2020 15:31:26 +0000 (17:31 +0200)]
x86: Fix -O0 remaining intrinsic macros [PR94832]
A few other macros seem to suffer from the same issue. What I've done was:
cat gcc/config/i386/*intrin.h | sed -e ':x /\\$/ { N; s/\\\n//g ; bx }' \
| grep '^[[:blank:]]*#[[:blank:]]*define[[:blank:]].*(' | sed 's/[ ]\+/ /g' \
> /tmp/macros
and then looking for regexps:
)[a-zA-Z]
) [a-zA-Z]
[a-zA-Z][-+*/%]
[a-zA-Z] [-+*/%]
[-+*/%][a-zA-Z]
[-+*/%] [a-zA-Z]
in the resulting file.
As reported in the PR, while most intrinsic -O0 macro argument uses
are properly wrapped in ()s or used in context where having a complex
expression passed as the argument doesn't pose a problem (e.g. when
macro argument use is in between commas, or between ( and comma, or
between comma and ) etc.), especially the gather/scatter macros don't do
this and if one passes to some macro e.g. x + y as argument, the
corresponding inline function would do cast on the argument, but
the macro does (int) ARG, then it is (int) x + y rather than (int) (x + y).
The following patch fixes those issues in *gather/*scatter*; additionally,
the AVX2 macros were passing incorrect mask of e.g.
(__v2df)_mm_set1_pd((double)(long long int) -1)
which is IMHO equivalent to
(__v2df){-1.0, -1.0}
when it really wants to pass __v2df vector with all bits set.
I've used what the inline functions use for those cases.
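Both problems can be reproduced with ordinary C, without the intrinsic
headers. In this sketch CAST_BAD and CAST_GOOD are hypothetical macros that
only mimic the unparenthesized and parenthesized argument use, and
all_bits_set inspects the object representation of a double to show that
-1.0 is not an all-ones mask.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macros, not taken from the intrinsic headers.  */
#define CAST_BAD(arg)  ((int) arg)     /* (int) x + y  */
#define CAST_GOOD(arg) ((int) (arg))   /* (int) (x + y)  */

/* Returns 1 if the two macro forms give different values for x + y.  */
int casts_disagree(double x, double y)
{
    return CAST_BAD(x + y) != CAST_GOOD(x + y);
}

/* Returns 1 if every byte of the object representation of d is 0xFF.  */
int all_bits_set(double d)
{
    unsigned char bytes[sizeof d];
    size_t i;

    memcpy(bytes, &d, sizeof d);
    for (i = 0; i < sizeof d; i++)
        if (bytes[i] != 0xFF)
            return 0;
    return 1;
}

/* Returns 1 if (double) (long long int) -1 fails to be an all-ones mask.  */
int converted_mask_is_wrong(void)
{
    double m = (double) (long long int) -1;

    assert(m == -1.0);
    return !all_bits_set(m);
}
```

With x = y = 1.5, CAST_BAD yields 2.5 (the cast applies to x only) while
CAST_GOOD yields 3.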
This is the rs6000 version of the earlier committed x86, aarch64 and arm
fixes. As create_tmp_var_raw is used because the C FE can call this outside
of function context, we need to make sure the first references to those
VAR_DECLs are through a TARGET_EXPR, so that it gets gimple_add_tmp_var
marked in whatever function it gets expanded in. Without that DECL_CONTEXT
is NULL and the vars aren't added as local decls of the containing function.
2020-04-29 Jakub Jelinek <jakub@redhat.com>
PR target/94826
* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for first assignment to
fenv_var, fenv_clear and old_fenv variables. For fenv_addr
take address of TARGET_EXPR of fenv_var with void_node initializer.
Formatting fixes.
This is a simple fix for pr94820.
The PR was only fixed on i386; the same error was also reported on aarch64.
This function, because it is sometimes called even outside of function bodies, uses create_tmp_var_raw rather than create_tmp_var.
But in order for that to work, when first referenced, the VAR_DECLs need to appear in a TARGET_EXPR so that during gimplification
the var gets the right DECL_CONTEXT and is added to local decls. Without that, e.g. tree-nested.c ICEs on those.
PR target/94820
* config/aarch64/aarch64-builtins.c
(aarch64_atomic_assign_expand_fenv): Use TARGET_EXPR instead of
MODIFY_EXPR for first assignment to fenv_cr, fenv_sr and
new_fenv_var.
Jakub Jelinek [Tue, 28 Apr 2020 09:26:56 +0000 (11:26 +0200)]
tree: Fix up TREE_SIDE_EFFECTS on internal calls [PR94809]
On the following testcase, match.pd during GENERIC folding
changes the -1U / x < y into __imag__ .MUL_OVERFLOW (x, y),
but unfortunately unlike for normal calls nothing sets TREE_SIDE_EFFECTS on
the call. There is the process_call_operands function that non-internal
call creation calls and it is usable for internal calls too,
e.g. TREE_SIDE_EFFECTS is derived from checking whether the
call has side-effects (non-ECF_{CONST,PURE}; we have those for internal
calls) and from whether any of the arguments has TREE_SIDE_EFFECTS.
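The equivalence that match.pd relies on can be checked at the source level.
The internal call .MUL_OVERFLOW has no source syntax, so this sketch uses
__builtin_mul_overflow, the GCC/Clang builtin, as a stand-in; the two
function names are hypothetical and x is assumed to be non-zero.

```c
#include <assert.h>
#include <limits.h>

/* Source-level form: true when x * y does not fit in unsigned int.
   -1U is UINT_MAX; x must be non-zero because of the division.  */
int overflows_by_division(unsigned x, unsigned y)
{
    assert(x != 0);
    return UINT_MAX / x < y;
}

/* Builtin form: the return value is the overflow flag, which is what
   __imag__ selects from the internal call's complex result.  */
int overflows_by_builtin(unsigned x, unsigned y)
{
    unsigned product;

    return __builtin_mul_overflow(x, y, &product);
}
```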
Jakub Jelinek [Mon, 27 Apr 2020 19:14:52 +0000 (21:14 +0200)]
x86: Fix up ix86_atomic_assign_expand_fenv [PR94780]
This function, because it is sometimes called even outside of function
bodies, uses create_tmp_var_raw rather than create_tmp_var. But in order
for that to work, when first referenced, the VAR_DECLs need to appear in a
TARGET_EXPR so that during gimplification the var gets the right
DECL_CONTEXT and is added to local decls. Without that, e.g. tree-nested.c
ICEs on those.
2020-04-27 Jakub Jelinek <jakub@redhat.com>
PR target/94780
* config/i386/i386.c (ix86_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for first assignment to
sw_var, exceptions_var, mxcsr_orig_var and mxcsr_mod_var.
Jakub Jelinek [Fri, 24 Apr 2020 22:11:35 +0000 (00:11 +0200)]
c++: Avoid -Wreturn-type warning if a template fn calls noreturn template fn [PR94742]
finish_call_expr already has code to set current_function_returns_abnormally
if a template calls a noreturn function, but on the following testcase it
doesn't call a FUNCTION_DECL, but TEMPLATE_DECL instead, in which case
we didn't check noreturn at all and just assumed it could return.
2020-04-25 Jakub Jelinek <jakub@redhat.com>
PR c++/94742
* semantics.c (finish_call_expr): When looking if all overloads
are noreturn, use STRIP_TEMPLATE to look through TEMPLATE_DECLs.
Jakub Jelinek [Thu, 23 Apr 2020 19:57:50 +0000 (21:57 +0200)]
Shortcut identity VEC_PERM expansion [PR94710]
This PR is about the rs6000 backend emitting wrong assembly
for whole vector shift by 0, and while I think it is desirable
to fix the backend, I don't see why the expander should
try to emit that, whole vector shift by 0 is identity, we can just
return the operand.
2020-04-23 Jakub Jelinek <jakub@redhat.com>
PR target/94710
* optabs.c (expand_vec_perm_const): For shift_amt const0_rtx
just return v2.
Jakub Jelinek [Fri, 17 Apr 2020 08:33:27 +0000 (10:33 +0200)]
Fix -fcompare-debug issue in delete_insn_and_edges [PR94618]
delete_insn_and_edges calls purge_dead_edges whenever deleting the last insn
in a bb, whatever it is. If it called it only for mandatory last insns
in the basic block (that may not be followed by DEBUG_INSNs, dunno if that
is control_flow_insn_p or something more complex), that wouldn't be a
problem, but as it calls it on any last insn and can actually do something
in the bb, if such an insn is followed by one or more DEBUG_INSNs and
nothing else in the same bb, we don't call purge_dead_edges with -g and do
call it with -g0.
On the testcase, there are two reg-to-reg moves with REG_EH_REGION notes
(previously memory accesses but simplified and yet not optimized), and the
second is followed by DEBUG_INSNs; the second move is delete_insn_and_edges
and after removing it, for -g0 purge_dead_edges removes the REG_EH_REGION
from the now last insn in the bb (the first reg-to-reg move), while
for -g it isn't called and things diverge from that quickly on.
Fixed by calling purge_dead_edges even if we remove the last real insn
followed only by DEBUG_INSNs in the same bb.
2020-04-17 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/94618
* cfgrtl.c (delete_insn_and_edges): Set purge not just when
insn is the BB_END of its block, but also when it is only followed
by DEBUG_INSNs in its block.
Jakub Jelinek [Thu, 16 Apr 2020 05:19:57 +0000 (07:19 +0200)]
c++: Fix pasto in structured binding diagnostics [PR94571]
This snippet has been copied from the non-structured binding declaration
parsing later in the function, and while for non-structured bindings
it can be followed by comma or semicolon, structured bindings may be
only followed by semicolon.
Or, do we want to have a different message for the case when there is
a comma (and keep this corrected one only if there is something else)
that would explain better what is the bug (or add a fix-it hint)?
Marek said in the PR that clang++ reports
error: decomposition declaration must be the only declaration in its group
There is another thing Marek noted (though, something for different spot),
that diagnostic for auto x(1), [e,f] = test2; could also use a clearer
wording like the above (or a fix-it hint), but the question is if we should
assume [ after , as a structured binding or if we should do some tentative
parsing first to figure out if it looks like a structured binding.
2020-04-16 Jakub Jelinek <jakub@redhat.com>
PR c++/94571
* parser.c (cp_parser_simple_declaration): Fix up a pasto in
diagnostics.
Jakub Jelinek [Wed, 8 Apr 2020 19:22:05 +0000 (21:22 +0200)]
vect: Fix up lowering of TRUNC_MOD_EXPR by negative constant [PR94524]
The first testcase below is miscompiled, because for the division part
of the lowering we canonicalize negative divisors to their absolute value
(similarly how expmed.c canonicalizes it), but when multiplying the division
result back by the VECTOR_CST, we use the original constant, which can
contain negative divisors.
Fixed by computing ABS_EXPR of the VECTOR_CST. Unfortunately, fold-const.c
doesn't support const_unop (ABS_EXPR, VECTOR_CST) and I think it is too late
in GCC 10 cycle to add it now.
Furthermore, while modulo by most negative constant happens to return the
right value, it does that only by invoking UB in the IL, because
we then expand division by that 1U+INT_MAX and say for INT_MIN % INT_MIN
compute the division as -1, and then multiply by INT_MIN, which is signed
integer overflow. We in theory could do the computation in unsigned vector
types instead, but is it worth bothering? People that are doing % INT_MIN
are either testing for standard conformance, or doing something wrong.
So, I've also added punting on % INT_MIN, both in vect lowering and vect
pattern recognition (we punt already for / INT_MIN).
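The multiply-back problem can be shown outside the vectorizer with a minimal
scalar sketch in C. The helper names below (abs_divisor, mod_lowered_buggy,
mod_lowered_fixed) are hypothetical and only model the arithmetic described
above; they are not the tree-vect-generic.c code. The divisor INT_MIN is
rejected up front, mirroring the punt.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: absolute value of a divisor that is known to be
   neither 0 nor INT_MIN (the latter is the case the patch punts on).  */
static int abs_divisor(int d)
{
  assert(d != 0 && d != INT_MIN);
  return d < 0 ? -d : d;
}

/* Models the miscompiled lowering: the division uses |d|, but the
   multiplication back uses the original, possibly negative, d.  */
int mod_lowered_buggy(int x, int d)
{
  int q = x / abs_divisor(d);
  return x - q * d;
}

/* Models the fix: multiply back by |d| as well.  */
int mod_lowered_fixed(int x, int d)
{
  int q = x / abs_divisor(d);
  return x - q * abs_divisor(d);
}
```

For x = 7 and d = -3 the buggy variant computes 7 - 2 * -3, while the fixed
one agrees with the C % operator, whose result takes the sign of the dividend.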
2020-04-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94524
* tree-vect-generic.c (expand_vector_divmod): If any elt of op1 is
negative for signed TRUNC_MOD_EXPR, multiply with absolute value of
op1 rather than op1 itself at the end. Punt for signed modulo by
most negative constant.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Punt for signed
modulo by most negative constant.
* gcc.c-torture/execute/pr94524-1.c: New test.
* gcc.c-torture/execute/pr94524-2.c: New test.
Jakub Jelinek [Wed, 8 Apr 2020 16:24:12 +0000 (18:24 +0200)]
i386: Don't use AVX512F integral masks for V*TImode [PR94438]
The ix86_get_mask_mode hook uses int mask for 512-bit vectors or 128/256-bit
vectors with AVX512VL (that is correct), and only for V*[SD][IF]mode if not
AVX512BW (also correct), but with AVX512BW it would stop checking the
elem_size altogether and pretend the hw has masking support for V*TImode
etc., which it doesn't. That can lead to various ICEs later on.
2020-04-08 Jakub Jelinek <jakub@redhat.com>
PR target/94438
* config/i386/i386.c (ix86_get_mask_mode): Only use int mask for elem_size
1, 2, 4 and 8.
* gcc.target/i386/avx512bw-pr94438.c: New test.
* gcc.target/i386/avx512vlbw-pr94438.c: New test.
Jakub Jelinek [Wed, 8 Apr 2020 13:30:16 +0000 (15:30 +0200)]
c++: Further fix for -fsanitize=vptr [PR94325]
For -fsanitize=vptr, we insert a NULL store into the vptr instead of just
adding a CLOBBER of this. build_clobber_this makes the CLOBBER conditional
on in_charge (implicit) parameter whenever CLASSTYPE_VBASECLASSES, but when
adding this conditionalization to the -fsanitize=vptr code in PR87095,
I wanted it to catch some more cases when the class has CLASSTYPE_VBASECLASSES,
but the vptr is still not shared with something else, otherwise the
sanitization would be less effective.
The following testcase shows that the chosen test that CLASSTYPE_PRIMARY_BINFO
is non-NULL and has BINFO_VIRTUAL_P set wasn't sufficient,
the D class has still sizeof(D) == sizeof(void*) and thus contains just
a single vptr, but while in B::~B() this results in the vptr not being
cleared, in C::~C() this condition isn't true, as CLASSTYPE_PRIMARY_BINFO
in that case is B and is not BINFO_VIRTUAL_P, so it clears the vptr, but the
D::~D() dtor after invoking C::~C() invokes A::~A() with an already cleared
vptr, which is then reported.
The following patch is just a shot in the dark, keep looking through
CLASSTYPE_PRIMARY_BINFO until we find BINFO_VIRTUAL_P, but it works on the
existing testcase as well as this new one.
2020-04-08 Jakub Jelinek <jakub@redhat.com>
PR c++/94325
* decl.c (begin_destructor_body): For CLASSTYPE_VBASECLASSES class
dtors, if CLASSTYPE_PRIMARY_BINFO is non-NULL, but not BINFO_VIRTUAL_P,
look at CLASSTYPE_PRIMARY_BINFO of its BINFO_TYPE if it is not
BINFO_VIRTUAL_P, and so on.
The following testcases are miscompiled, because expand_vec_perm_pshufb
incorrectly thinks it can use vpshufb instruction for the permutations
when it can't.
The
if (vmode == V32QImode)
{
/* vpshufb only works intra lanes, it is not
possible to shuffle bytes in between the lanes. */
for (i = 0; i < nelt; ++i)
if ((d->perm[i] ^ i) & (nelt / 2))
return false;
}
intra-lane check which is correct has been copied and adjusted for 64-byte
modes into:
if (vmode == V64QImode)
{
/* vpshufb only works intra lanes, it is not
possible to shuffle bytes in between the lanes. */
for (i = 0; i < nelt; ++i)
if ((d->perm[i] ^ i) & (nelt / 4))
return false;
}
which is not correct, because 64-byte modes have 4 lanes rather than just
two and the above is only testing that the permutation grabs even lane elts
from even lanes and odd lane elts from odd lanes, but not that they are
from the same 256-bit half.
The following patch fixes it by using 3 * nelt / 4 instead of nelt / 4,
so we actually check the most significant 2 bits rather than just one.
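As an illustration only, the two masks can be compared with a small
standalone C model of the check. The function names perm_stays_in_lane and
demo_v64qi_check are hypothetical, and the permutation used in the demo is
an assumed example, not one of the new testcases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every selected byte comes from the same 128-bit lane
   as its destination.  lane_mask selects the index bits that identify
   the lane.  */
bool perm_stays_in_lane(const unsigned char *perm, size_t nelt,
                        unsigned lane_mask)
{
  for (size_t i = 0; i < nelt; ++i)
    if ((perm[i] ^ i) & lane_mask)
      return false;
  return true;
}

/* Hypothetical demo: identity permutation of 64 bytes, except that
   byte 0 is taken from byte 32, i.e. from lane 2 into lane 0.  */
bool demo_v64qi_check(unsigned lane_mask)
{
  unsigned char perm[64];
  for (size_t i = 0; i < 64; ++i)
    perm[i] = (unsigned char) i;
  perm[0] = 32;
  return perm_stays_in_lane(perm, 64, lane_mask);
}
```

With nelt / 4 (16) as the mask, only bit 4 of the index is tested and the
cross-lane element slips through; with 3 * nelt / 4 (48), bits 4 and 5 are
both tested and the permutation is rejected.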
2020-04-07 Jakub Jelinek <jakub@redhat.com>
PR target/94509
* config/i386/i386.c (expand_vec_perm_pshufb): Fix the check
for inter-lane permutation for 64-byte modes.
* gcc.target/i386/avx512bw-pr94509-1.c: New test.
* gcc.target/i386/avx512bw-pr94509-2.c: New test.
The following testcase ICEs on aarch64 apparently since the introduction of
the aarch64 port. The reason is that the {ashl,ashr,lshr}<mode>3 expanders
completely unnecessarily FAIL; if operands[2] is something other than
a CONST_INT or REG or MEM and the middle-end code can't cope with the
pattern giving up in these cases. All the expanders use general_operand
predicate for the shift amount operand, but then have just a special case
for CONST_INT (if in-bound, emit an immediate shift, otherwise force into
REG), or MEM (force into REG), or REG (that is the case it handles).
In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid
general_operand.
I don't see what is magic about MEMs that they should be forced
into REG while others like SUBREGs shouldn't be; there isn't even a
reason to check for !REG_P, because force_reg will do nothing if the operand
is already a REG, and otherwise can handle general_operand just fine.
2020-04-07 Jakub Jelinek <jakub@redhat.com>
PR target/94488
* config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3,
ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT.
Assume it is a REG after that instead of testing it and doing FAIL
otherwise. Formatting fix.
Jakub Jelinek [Tue, 7 Apr 2020 19:01:40 +0000 (21:01 +0200)]
debug: Improve debug info of c++14 deduced return type [PR94459]
On the following testcase, in gdb ptype S<long>::m1 prints long as return
type, but all the other methods show void instead.
PR53756 added code to add_type_attribute if the return type is
auto/decltype(auto), but we actually should look through references,
pointers and qualifiers.
Haven't included there DW_TAG_atomic_type, because I think at least ATM
one can't use that in C++. Not sure about DW_TAG_array_type or what else
could be deduced.
> http://eel.is/c++draft/dcl.spec.auto#3 says it has to appear as a
> decl-specifier.
>
> http://eel.is/c++draft/temp.deduct.type#8 lists the forms where a template
> argument can be deduced.
>
> Looks like you are missing arrays, pointers to members, and function return
> types.
2020-04-04 Hannes Domani <ssbssa@yahoo.de>
Jakub Jelinek <jakub@redhat.com>
PR debug/94459
* dwarf2out.c (gen_subprogram_die): Look through references, pointers,
arrays, pointer-to-members, function types and qualifiers when
checking if in-class DIE had an 'auto' or 'decltype(auto)' return type
to emit type again on definition.
The following testcase is miscompiled, because the AVX2 patterns don't
describe correctly what the insn does. E.g. vphaddd with %ymm* operands
(the second pattern) instruction as per:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_hadd_epi32&expand=2941
does { a0+a1, a2+a3, b0+b1, b2+b3, a4+a5, a6+a7, b4+b5, b6+b7 }
but our RTL pattern did
{ a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7 }
where the first and last 64 bits are the same and two middle 64 bits
swapped.
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_hadd_epi16&expand=2939
similarly, insn does:
{ a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7,
a8+a9, a10+a11, a12+a13, a14+a15, b8+b9, b10+b11, b12+b13, b14+b15 }
but RTL pattern did
{ a0+a1, a2+a3, a4+a5, a6+a7, a8+a9, a10+a11, a12+a13, a14+a15,
b0+b1, b2+b3, b4+b5, b6+b7, b8+b9, b10+b11, b12+b13, b14+b15 }
again, first and last 64 bits are the same and the two middle 64 bits
swapped.
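For reference, the per-lane behaviour of the 256-bit vphaddd described above
can be written as a scalar C model. This is only a sketch of the element
order quoted from the intrinsics guide; hadd_epi32_256_model and
hadd_demo_element are hypothetical names, and the input values are assumed
for illustration.

```c
#include <assert.h>

/* Scalar model of 256-bit vphaddd: each 128-bit lane of the result holds
   the pairwise sums of a's lane followed by the pairwise sums of b's
   lane.  */
void hadd_epi32_256_model(const int a[8], const int b[8], int out[8])
{
  for (int lane = 0; lane < 2; ++lane)
    {
      const int *al = a + 4 * lane;
      const int *bl = b + 4 * lane;
      int *ol = out + 4 * lane;
      ol[0] = al[0] + al[1];
      ol[1] = al[2] + al[3];
      ol[2] = bl[0] + bl[1];
      ol[3] = bl[2] + bl[3];
    }
}

/* Hypothetical demo: returns result element idx for a = {0..7} and
   b = {10..17}.  */
int hadd_demo_element(int idx)
{
  int a[8], b[8], out[8];
  assert(idx >= 0 && idx < 8);
  for (int i = 0; i < 8; ++i)
    {
      a[i] = i;
      b[i] = 10 + i;
    }
  hadd_epi32_256_model(a, b, out);
  return out[idx];
}
```

Elements 2 and 3 of the result come from b and elements 4 and 5 from a,
which is exactly the middle part the old RTL pattern had swapped.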
2020-04-03 Jakub Jelinek <jakub@redhat.com>
PR target/94460
* config/i386/sse.md (avx2_ph<plusminus_mnemonic>wv16hi3,
avx2_ph<plusminus_mnemonic>dv8si3): Fix up RTL pattern to do
second half of first lane from first lane of second operand and
first half of second lane from second lane of first operand.
Jakub Jelinek [Tue, 7 Apr 2020 19:01:06 +0000 (21:01 +0200)]
objsz: Don't call replace_uses_by on SSA_NAME_OCCURS_IN_ABNORMAL_PHI [PR94423]
The following testcase ICEs because the objsz pass calls replace_uses_by
on SSA_NAME_OCCURS_IN_ABNORMAL_PHI SSA_NAME. The following patch instead
of that calls replace_call_with_value, which will turn it into
xyz_123(ab) = 234;
Jakub Jelinek [Tue, 7 Apr 2020 19:00:28 +0000 (21:00 +0200)]
Fix vextract* masked patterns [PR93069]
The AVX512F documentation clearly states that in instructions where the
destination is a memory only merging-masking is possible, not zero-masking,
and the assembler enforces that.
The testcase in this patch fails to assemble because of
Error: unsupported masking for `vextracti32x8'
on
vextracti32x8 $0x0, %zmm1, -64(%rsp){%k1}{z}
For the vector extraction patterns, we apparently have 7 *_maskm patterns
that only accept memory destinations and rtx_equal_p merge-masking source
for it, 7 *<mask_name> corresponding patterns that allow memory destination
only for the non-masked cases (through <store_mask_constraint>), then 2
*<mask_name> patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which
do allow memory destination even for masked cases and are the cause of the
testsuite failure, because we must not allow C constraint if the destination
is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf
VI4F_256), which has another issue (for which I don't have a testcase
though), where if it would match zero-masking with register destination,
it wouldn't emit the needed {z} into assembly.
The attached patch fixes those 3 issues only, perhaps more suitable for
backporting.
2020-03-30 Jakub Jelinek <jakub@redhat.com>
PR target/93069
* config/i386/sse.md (vec_extract_lo_<mode><mask_name>): Use
<store_mask_constraint> instead of m in output operand constraint.
(vec_extract_hi_<mode><mask_name>): Use <mask_operand2> instead of
%{%3%}.
* gcc.target/i386/avx512vl-pr93069.c: New test.
* gcc.dg/vect/pr93069.c: New test.
Jakub Jelinek [Sat, 28 Mar 2020 09:21:52 +0000 (10:21 +0100)]
reassoc: Fix -fcompare-debug bug in reassociate_bb [PR94329]
The following testcase FAILs with -fcompare-debug, because reassociate_bb
mishandles the case when the last stmt in a bb has zero uses. In that case
reassoc_remove_stmt (like gsi_remove) moves the iterator to the next stmt,
i.e. gsi_end_p is true, which means the code sets the iterator back to
gsi_last_bb. The problem is that the for loop does gsi_prev on that before
handling the next statement, which means the former penultimate stmt, now
last one, is not processed by reassociate_bb.
Now, with -g, if there is at least one debug stmt at the end of the bb,
reassoc_remove_stmt moves the iterator to that following debug stmt and we
just do gsi_prev and continue with the former penultimate non-debug stmt,
now last non-debug stmt.
The following patch fixes that by not doing the gsi_prev in this case; there
are too many continue; cases, so I didn't want to copy over the gsi_prev to
all of them, so this patch uses a bool for that instead. The second
gsi_end_p check isn't needed anymore, because when we don't do the
undesirable gsi_prev after gsi = gsi_last_bb, the loop !gsi_end_p (gsi)
condition will catch the removal of the very last stmt from a bb.
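The iterator problem can be modelled on a plain array. The sketch below is
an assumption-laden stand-in, not the reassociate_bb code: negative entries
play the role of statements with zero uses, walk_backwards and walk_demo are
hypothetical names, and the fixed flag corresponds to the bool mentioned
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Walk a[0..n-1] from the back, deleting negative entries, and return
   how many surviving entries were visited.  */
size_t walk_backwards(int *a, size_t n, bool fixed)
{
  size_t visited = 0;
  size_t len = n;
  ptrdiff_t i = (ptrdiff_t) len - 1;

  while (i >= 0)
    {
      bool do_prev = true;
      if (a[i] < 0)
        {
          size_t pos = (size_t) i;
          /* Removal leaves the iterator on the following element.  */
          memmove(a + pos, a + pos + 1, (len - pos - 1) * sizeof *a);
          --len;
          if (pos == len)
            {
              /* We removed the last element: reset to the new last.  */
              i = (ptrdiff_t) len - 1;
              if (fixed)
                do_prev = false;  /* do not step past the new last one */
            }
        }
      else
        ++visited;
      if (do_prev)
        --i;
    }
  return visited;
}

/* Hypothetical demo on { 1, 2, -3 }: the last entry gets removed.  */
size_t walk_demo(bool fixed)
{
  int a[] = { 1, 2, -3 };
  return walk_backwards(a, 3, fixed);
}
```

Without the flag the former penultimate entry is skipped, so only one
entry is visited; with it both surviving entries are processed.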
2020-03-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94329
* tree-ssa-reassoc.c (reassociate_bb): When calling reassoc_remove_stmt
on the last stmt in a bb, make sure gsi_prev isn't done immediately
after gsi_last_bb.
Jakub Jelinek [Tue, 7 Apr 2020 18:59:37 +0000 (20:59 +0200)]
varasm: Fix output_constructor where a RANGE_EXPR index needs to skip some elts [PR94303]
The following testcase is miscompiled, because output_constructor doesn't
output the initializer correctly. The FE creates {[1...2] = 9} in this
case, and we emit .long 9; .long 9; .zero 8 instead of the expected
.zero 8; .long 9; .long 9. If the CONSTRUCTOR is {[1] = 9, [2] = 9},
output_constructor_regular_field has code to notice that the current
location (local->total_bytes) is smaller than the location we want to write
to (1*sizeof(elt)) and will call assemble_zeros to skip those. But
RANGE_EXPRs are handled by a different function which didn't do this,
so for RANGE_EXPRs we emitted them properly only if local->total_bytes
was always equal to the location where the RANGE_EXPR needs to start.
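A minimal sketch of the skipping logic, assuming 4-byte elements and a small
in-memory buffer instead of assembler output. The struct out_state and the
put_* helpers are hypothetical stand-ins; only the role of total_bytes is
taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the output state; total_bytes plays the role
   of local->total_bytes.  */
struct out_state
{
  unsigned char buf[64];
  size_t total_bytes;
};

static void put_zeros(struct out_state *s, size_t n)
{
  assert(s->total_bytes + n <= sizeof s->buf);
  memset(s->buf + s->total_bytes, 0, n);
  s->total_bytes += n;
}

static void put_u32(struct out_state *s, uint32_t v)
{
  assert(s->total_bytes + sizeof v <= sizeof s->buf);
  memcpy(s->buf + s->total_bytes, &v, sizeof v);
  s->total_bytes += sizeof v;
}

/* Emit [lo ... hi] = v, first skipping forward to lo if needed.  */
void put_range(struct out_state *s, size_t lo, size_t hi, uint32_t v)
{
  size_t start = lo * sizeof v;
  assert(start >= s->total_bytes);   /* never go backwards */
  put_zeros(s, start - s->total_bytes);
  for (size_t i = lo; i <= hi; ++i)
    put_u32(s, v);
}

/* Hypothetical demo: lay out a 4-element array initialized with
   {[1 ... 2] = 9} and read back element idx.  */
uint32_t range_demo_element(size_t idx)
{
  struct out_state s = { {0}, 0 };
  uint32_t v;
  assert(idx < 4);
  put_range(&s, 1, 2, 9);
  put_zeros(&s, 4 * sizeof v - s.total_bytes);  /* trailing padding */
  memcpy(&v, s.buf + idx * sizeof v, sizeof v);
  return v;
}
```

Dropping the put_zeros call in put_range reproduces the reported symptom in
this model: the two 9s land at the start of the buffer.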
2020-03-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/94303
* varasm.c (output_constructor_array_range): If local->index
RANGE_EXPR doesn't start at the current location in the constructor,
skip needed number of bytes using assemble_zeros or assert we don't
go backwards.
PR middle-end/94303
* g++.dg/torture/pr94303.C: New test.
Jakub Jelinek [Tue, 7 Apr 2020 18:57:37 +0000 (20:57 +0200)]
if-conv: Fix -fcompare-debug bugs in ifcvt_local_dce [PR94283]
The following testcase shows -fcompare-debug bugs in ifcvt_local_dce,
where the decisions what statements are needed is based also on debug stmt
operands, which is wrong.
So, this patch makes sure to never add debug stmt to the worklist, or never
add an assign to worklist just because it is used in a debug stmt in another
bb.
2020-03-24 Jakub Jelinek <jakub@redhat.com>
PR debug/94283
* tree-if-conv.c (ifcvt_local_dce): For gimple debug stmts, just set
GF_PLF_2, but don't add them to worklist. Don't add an assignment to
worklist or set GF_PLF_2 just because it is used in a debug stmt in
another bb. Formatting improvements.
Jakub Jelinek [Thu, 19 Mar 2020 11:22:47 +0000 (12:22 +0100)]
c++: Fix up handling of captured vars in lambdas in OpenMP clauses [PR93931]
Without the parser.c change we were ICEing on the testcase, because while the
uses of the captured vars inside of the constructs were replaced with capture
proxy decls, we didn't do that for decls in OpenMP clauses.
With that fixed, we don't ICE anymore, but the testcase is miscompiled and FAILs
at runtime. This is because the capture proxy decls have DECL_VALUE_EXPR and
during gimplification we were gimplifying those to their DECL_VALUE_EXPRs.
That is fine for shared vars, but for privatized ones we must not do that.
So that is what the cp-gimplify.c changes do. Had to add a DECL_CONTEXT check
before calling is_capture_proxy because some VAR_DECLs don't have DECL_CONTEXT
set (yet) and is_capture_proxy relies on that being non-NULL always.
2020-03-19 Jakub Jelinek <jakub@redhat.com>
PR c++/93931
* parser.c (cp_parser_omp_var_list_no_open): Call process_outer_var_ref
on outer_automatic_var_p decls.
* cp-gimplify.c (cxx_omp_disregard_value_expr): Return true also for
capture proxy decls.
Jakub Jelinek [Thu, 19 Mar 2020 09:24:16 +0000 (10:24 +0100)]
phiopt: Avoid -fcompare-debug bug in phiopt [PR94211]
Two years ago, I've added support for up to 2 simple preparation statements
in value_replacement, but the
- && estimate_num_insns (assign, &eni_time_weights)
+ && estimate_num_insns (bb_seq (middle_bb), &eni_time_weights)
change, meant to compute the cost of all those statements rather than
just the single assign that had been the only supported non-debug
statement in the bb before, doesn't do what I thought it would do: gimple_seq
is just gimple * and thus it can't really be overloaded depending on whether
we pass a single gimple * or a whole sequence. Which means that for the last
two years it hasn't counted all the statements, but only the first one.
With -g that happens to be a DEBUG_STMT, or it could be e.g. the first
preparation statement which could be much cheaper than the actual assign.
2020-03-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94211
* tree-ssa-phiopt.c (value_replacement): Use estimate_num_insns_seq
instead of estimate_num_insns for bb_seq (middle_bb). Rename
emtpy_or_with_defined_p variable to empty_or_with_defined_p, adjust
all uses.
Jakub Jelinek [Tue, 17 Mar 2020 21:32:34 +0000 (22:32 +0100)]
c: Handle C_TYPE_INCOMPLETE_VARS even for ENUMERAL_TYPEs [PR94172]
The following testcases ICE, because they contain extern variable
declarations with incomplete enum types that is later completed and after
that those variables are accessed. The ICEs are because the vars then may have
incorrect DECL_MODE etc., e.g. in the first case the var has SImode
DECL_MODE (the guessed mode for the enum), but the enum then actually has
DImode because its enumerators don't fit into unsigned int.
The following patch fixes it by using C_TYPE_INCOMPLETE_VARS not just on
incomplete struct/union types, but also incomplete enum types.
TYPE_VFIELD can't be used as it is TYPE_MIN_VALUE on ENUMERAL_TYPE,
thankfully TYPE_LANG_SLOT_1 has been used in the C FE only on
FUNCTION_TYPEs.
2020-03-17 Jakub Jelinek <jakub@redhat.com>
PR c/94172
* c-tree.h (C_TYPE_INCOMPLETE_VARS): Define to TYPE_LANG_SLOT_1
instead of TYPE_VFIELD, and support it on {RECORD,UNION,ENUMERAL}_TYPE.
(TYPE_ACTUAL_ARG_TYPES): Check that it is only used on FUNCTION_TYPEs.
* c-decl.c (pushdecl): Push C_TYPE_INCOMPLETE_VARS also to
ENUMERAL_TYPEs.
(finish_incomplete_vars): New function, moved from finish_struct. Use
relayout_decl instead of layout_decl.
(finish_struct): Remove obsolete comment about C_TYPE_INCOMPLETE_VARS
being TYPE_VFIELD. Use finish_incomplete_vars.
(finish_enum): Clear C_TYPE_INCOMPLETE_VARS. Call
finish_incomplete_vars.
* c-typeck.c (c_build_qualified_type): Clear C_TYPE_INCOMPLETE_VARS
also on ENUMERAL_TYPEs.
* gcc.dg/pr94172-1.c: New test.
* gcc.dg/pr94172-2.c: New test.
Jakub Jelinek [Tue, 17 Mar 2020 20:21:16 +0000 (21:21 +0100)]
c++: Fix parsing of invalid enum specifiers [PR90995]
The testcase shows some accepts-invalid (the ones without alignas) and
ice-on-invalid-code (the ones with alignas) cases.
If the enum doesn't have an underlying type and is not a definition,
the caller retries to parse it as elaborated type specifier.
E.g. for enum struct S s it will then pedwarn that elaborated type specifier
shouldn't have the struct/class keywords.
The problem is if the enum specifier is not followed by { when it has
underlying type. In that case we have already called
cp_parser_parse_definitely to end the tentative parsing started at the
beginning of cp_parser_enum_specifier. But the
cp_parser_error (parser, "expected %<;%> or %<{%>");
doesn't emit any error because the whole function is called from yet another
tentative parse and the caller starts parsing the elaborated type
specifier where the cp_parser_enum_specifier stopped (i.e. after the
underlying type token(s)). The ultimate caller then commits the tentative
parsing (and even if it wouldn't, it wouldn't know what kind of error
to report). I think after seeing enum {,struct,class} : type not being
followed by { or ;, there is no reason not to report it right away, as it
can't be valid C++, which is what the patch does. Not sure if we shouldn't
also return error_mark_node instead of NULL_TREE, so that the caller doesn't
try to parse it as elaborated type specifier (the patch doesn't do that
right now).
Furthermore, while reading the code, I've noticed that
parser->colon_corrects_to_scope_p is saved and set to false at the start
of the function, but not restored back in some cases. Don't have a testcase
where this would be a problem, but it just seems wrong. Either we can in
the two spots replace return NULL_TREE; with { type = NULL_TREE; goto out; }
or we could perhaps abuse warning_sentinel or create a special class with
dtor to clean the flag up.
And lastly, I've fixed some formatting issues in the function while reading
it.
2020-03-17 Jakub Jelinek <jakub@redhat.com>
PR c++/90995
* parser.c (cp_parser_enum_specifier): Use temp_override for
parser->colon_corrects_to_scope_p, replace goto out with return.
If scoped enum or enum with underlying type is not followed by
{ or ;, call cp_parser_commit_to_tentative_parse before calling
cp_parser_error and make sure to return error_mark_node instead of
NULL_TREE. Formatting fixes.
Jakub Jelinek [Mon, 16 Mar 2020 08:03:59 +0000 (09:03 +0100)]
tree-inline: Fix a -fcompare-debug issue in the inliner [PR94167]
The following testcase fails with -fcompare-debug. The problem is that
bar is marked as address_taken only with -g and not without.
I've tracked it down to insert_init_stmt calling gimple_regimplify_operands
even on DEBUG_STMTs. That function will just insert normal stmts before
the DEBUG_STMT if the DEBUG_STMT operand isn't gimple val or invariant.
While DCE will turn those statements into debug temporaries, it can cause
differences in SSA_NAMEs and more importantly, the ipa references are
generated from those before the DCE happens.
On the testcase, the DEBUG_STMT value is (int)bar.
We could generate DEBUG_STMTs with debug temporaries instead, but I fail to
see the reason to do that, DEBUG_STMTs allow other expressions and all we
want to ensure is that the expressions aren't too large (arbitrarily
complex), but during inlining/function versioning I don't see why something
would queue a DEBUG_STMT with arbitrarily complex expressions in there.
Jakub Jelinek [Fri, 13 Mar 2020 10:33:16 +0000 (11:33 +0100)]
aarch64: Fix another bug in aarch64_add_offset_1 [PR94121]
> I'm getting this ICE with -mabi=ilp32:
>
> during RTL pass: fwprop1
> /opt/gcc/gcc-20200312/gcc/testsuite/gcc.dg/pr94121.c: In function 'bar':
> /opt/gcc/gcc-20200312/gcc/testsuite/gcc.dg/pr94121.c:16:1: internal compiler error: in decompose, at rtl.h:2279
That is a preexisting issue, caused by another bug in the same function.
When mode is SImode and moffset is 0x80000000 (or anything else with the
bit 31 set), we need to sign-extend it.
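What sign-extending a 32-bit value into a wider host integer means can be
sketched in portable C. The function name sext_low32 is hypothetical; it
only illustrates the arithmetic and is not how gen_int_mode is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of v into a 64-bit signed value, the form
   in which a 32-bit constant with bit 31 set is expected to be held.  */
int64_t sext_low32(uint64_t v)
{
  int64_t low = (int64_t) (v & 0xffffffffu);
  /* Flip bit 31 and subtract it again: values with bit 31 set become
     negative, all others are unchanged.  */
  return (low ^ 0x80000000LL) - 0x80000000LL;
}
```

For 0x80000000 this yields -2147483648 rather than the positive
2147483648 that a plain zero-extended constant would carry.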
2020-03-13 Jakub Jelinek <jakub@redhat.com>
PR target/94121
* config/aarch64/aarch64.c (aarch64_add_offset_1): Use gen_int_mode
instead of GEN_INT.
Jakub Jelinek [Thu, 12 Mar 2020 08:35:30 +0000 (09:35 +0100)]
doc: Fix up ASM_OUTPUT_ALIGNED_DECL_LOCAL description
When looking into PR94134, I've noticed bugs in the
ASM_OUTPUT_ALIGNED_DECL_LOCAL documentation. varasm.c has:
#if defined ASM_OUTPUT_ALIGNED_DECL_LOCAL
unsigned int align = symtab_node::get (decl)->definition_alignment ();
ASM_OUTPUT_ALIGNED_DECL_LOCAL (asm_out_file, decl, name,
size, align);
return true;
#elif defined ASM_OUTPUT_ALIGNED_LOCAL
unsigned int align = symtab_node::get (decl)->definition_alignment ();
ASM_OUTPUT_ALIGNED_LOCAL (asm_out_file, name, size, align);
return true;
#else
ASM_OUTPUT_LOCAL (asm_out_file, name, size, rounded);
return false;
#endif
and the ASM_OUTPUT_ALIGNED_LOCAL documentation properly mentions:
Like @code{ASM_OUTPUT_LOCAL} and mentions the same macro in another place.
The ASM_OUTPUT_ALIGNED_DECL_LOCAL description mentions non-existing macros
ASM_OUTPUT_ALIGNED_DECL and ASM_OUTPUT_DECL instead of the right ones
ASM_OUTPUT_ALIGNED_LOCAL and ASM_OUTPUT_LOCAL.
2020-03-12 Jakub Jelinek <jakub@redhat.com>
* doc/tm.texi.in (ASM_OUTPUT_ALIGNED_DECL_LOCAL): Change
ASM_OUTPUT_ALIGNED_DECL in description to ASM_OUTPUT_ALIGNED_LOCAL
and ASM_OUTPUT_DECL to ASM_OUTPUT_LOCAL.
* doc/tm.texi: Regenerated.
Jakub Jelinek [Thu, 12 Mar 2020 08:34:00 +0000 (09:34 +0100)]
tree-dse: Fix mem* head trimming if call has lhs [PR94130]
As the testcase shows, if DSE decides to head trim {mem{set,cpy,move},strncpy}
and the call has lhs, it is incorrect to leave the lhs as is, because it
will then point to the adjusted address (base + head_trim) instead of the
original base.
The following patch fixes that by dropping the lhs of the call and assigning
lhs the original base in a following statement.
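In source terms, the effect of the fix can be sketched as follows. This is
an illustrative model with hypothetical function names and an assumed
16-byte buffer, not the tree-ssa-dse.c transformation itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Models the fixed transformation: the call is issued on the trimmed
   region without using its result, and the original base is what the
   caller gets back.  */
void *memset_head_trimmed(void *base, int c, size_t n, size_t head_trim)
{
  assert(head_trim <= n);
  memset((char *) base + head_trim, c, n - head_trim);
  return base;
}

/* Hypothetical demo: the first 4 bytes are overwritten later, so the
   head of the memset is dead and can be trimmed.  */
bool head_trim_keeps_base(void)
{
  char buf[16];
  void *p = memset_head_trimmed(buf, 0, sizeof buf, 4);
  memcpy(buf, "abcd", 4);  /* the later store that made the head dead */
  return p == (void *) buf && buf[4] == 0 && buf[15] == 0;
}

/* Leaving the lhs on the trimmed call yields base + head_trim.  */
bool naive_trim_changes_result(void)
{
  char buf[16];
  void *p = memset(buf + 4, 0, sizeof buf - 4);
  return p != (void *) buf;
}
```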
2020-03-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94130
* tree-ssa-dse.c: Include gimplify.h.
(increment_start_addr): If stmt has lhs, drop the lhs from call and
set it after the call to the original value of the first argument.
Formatting fixes.
(decrement_count): Formatting fix.
Jakub Jelinek [Wed, 11 Mar 2020 17:35:13 +0000 (18:35 +0100)]
pdp11: Fix handling of common (local and global) vars [PR94134]
As mentioned in the PR, the generic code decides to put the a variable into
lcomm_section, which is a NOSWITCH section and thus the generic code doesn't
switch into a particular section before using
ASM_OUTPUT{,_ALIGNED{,_DECL}}_LOCAL, on many targets that results just in
.lcomm (or for non-local .comm) directives which don't need a switch to some
section, other targets put switch_to_section (bss_section) at the start of
that macro.
pdp11 doesn't do that (and doesn't have bss_section), and so emits the
lcomm/comm variables in whatever section is current (it has only .text/.data
and for DEC assembler rodata).
The following patch fixes that by putting it always into data section, and
additionally avoids emitting an empty line in the assembly for the lcomm
vars.
2020-03-11 Jakub Jelinek <jakub@redhat.com>
PR target/94134
* config/pdp11/pdp11.c (pdp11_asm_output_var): Call switch_to_section
at the start to switch to data section. Don't print extra newline if
.globl directive has not been emitted.
Jakub Jelinek [Wed, 11 Mar 2020 09:54:22 +0000 (10:54 +0100)]
aarch64: Fix ICE in aarch64_add_offset_1 [PR94121]
abs_hwi asserts that the argument is not HOST_WIDE_INT_MIN and as the
(invalid) testcase shows, the function can be called with such an offset.
The following patch is IMHO minimal fix, absu_hwi unlike abs_hwi allows even
that value and will return (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MIN
in that case. The function then uses moffset in two spots which wouldn't
care if the value is (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MIN or
HOST_WIDE_INT_MIN and wouldn't accept it (!moffset and
aarch64_uimm12_shift (moffset)), then in one spot where the signedness of
moffset does matter and using unsigned is the right thing -
moffset < 0x1000000 - and finally has code which will handle even this
value right; the assembler doesn't really care for DImode immediates if
mov x1, -9223372036854775808
or
mov x1, 9223372036854775808
is used and similarly it doesn't matter if we add or sub it in DImode.
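The difference between a signed and an unsigned absolute value at the most
negative input can be sketched with fixed-width types. The name
abs_unsigned64 is hypothetical and int64_t merely stands in for
HOST_WIDE_INT here.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value computed in the unsigned type, so that the most
   negative input is representable in the result.  */
uint64_t abs_unsigned64(int64_t x)
{
  /* Negating after the conversion is modular arithmetic, not signed
     overflow.  */
  return x >= 0 ? (uint64_t) x : -(uint64_t) x;
}
```

A signed variant would have to compute -INT64_MIN, which overflows; the
unsigned one returns 9223372036854775808 for that input.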
2020-03-11 Jakub Jelinek <jakub@redhat.com>
PR target/94121
* config/aarch64/aarch64.c (aarch64_add_offset_1): Use absu_hwi
instead of abs_hwi, change moffset type to unsigned HOST_WIDE_INT.
Jakub Jelinek [Wed, 11 Mar 2020 08:33:52 +0000 (09:33 +0100)]
dfp: Fix decimal_to_binary [PR94111]
As e.g. decimal_from_decnumber shows, the REAL_VALUE_TYPE representation
contains a decimal128 embedded in ->sig only if it is rvc_normal, for
other kinds like rvc_inf or rvc_nan, ->sig is ignored and everything is
contained in the REAL_VALUE_TYPE flags (cl, sign, signalling and decimal).
decimal_to_binary which is used when folding a decimal{32,64,128} constant
to a binary floating point type ignores this and thus folds infinities and
NaNs into +0.0.
The following patch fixes that by only doing that for rvc_normal.
Similarly to the binary to decimal folding, it goes through a string, in
order to e.g. deal with canonical NaN mantissas, or binary float formats
that don't support infinities and/or NaNs.
2020-03-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/94111
* dfp.c (decimal_to_binary): Only use decimal128ToString if from->cl
is rvc_normal, otherwise use real_to_decimal to print the number to
string.
Jakub Jelinek [Wed, 11 Mar 2020 08:32:22 +0000 (09:32 +0100)]
ldist: Further fixes for -ftrapv [PR94114]
As the testcase shows, arithmetics that for -ftrapv would need multiple
basic blocks can show up not just in nb_bytes expressions where we
are calling rewrite_to_non_trapping_overflow for a while already,
but also in the pointer expression to the start of the region.
While the testcase covers just the first hunk and I've failed to create
a testcase for the latter, it is at least in theory possible too, so I've
adjusted that hunk too.
2020-03-11 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94114
* tree-loop-distribution.c (generate_memset_builtin): Call
rewrite_to_non_trapping_overflow even on mem.
(generate_memcpy_builtin): Call rewrite_to_non_trapping_overflow even
on dest and src.
Jakub Jelinek [Thu, 5 Mar 2020 08:12:44 +0000 (09:12 +0100)]
print-rtl: Fix printing of CONST_STRING in DEBUG_INSNs [PR93399]
The following testcase fails to assemble, as CONST_STRING in the DEBUG_INSNs
is printed as is, so if it contains \n and/or \r, we are in trouble:
.loc 1 14 3
# DEBUG haystack => [si]
# DEBUG needle => "
"
In the gimple dumps we print those (STRING_CSTs) as
# DEBUG haystack => D#1
# DEBUG needle => "\n"
so this patch uses what we use in tree printing for the CONST_STRINGs too.
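A minimal sketch of the kind of escaping involved, assuming a standalone
helper: escape_string below is hypothetical and is not GCC's
pretty_print_string, it only illustrates why control characters have to be
escaped before a string is printed into an assembler comment.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy the first NBYTES of STR into OUT (of size
   OUTSZ), replacing control characters, quotes and backslashes with C
   style escapes so the result always stays on one line.  Output that
   does not fit is truncated.  Returns the number of characters written,
   not counting the terminating NUL.  */
static size_t
escape_string (char *out, size_t outsz, const char *str, size_t nbytes)
{
  size_t pos = 0;

  for (size_t i = 0; i < nbytes; i++)
    {
      unsigned char c = (unsigned char) str[i];
      const char *esc = NULL;
      char buf[5];

      switch (c)
        {
        case '\n': esc = "\\n"; break;
        case '\r': esc = "\\r"; break;
        case '\t': esc = "\\t"; break;
        case '"':  esc = "\\\""; break;
        case '\\': esc = "\\\\"; break;
        default:
          if (c < 0x20 || c == 0x7f)
            {
              /* Other control characters become a 3-digit octal escape.  */
              snprintf (buf, sizeof buf, "\\%03o", (unsigned) c);
              esc = buf;
            }
          break;
        }

      if (esc)
        {
          for (; *esc && pos + 1 < outsz; esc++)
            out[pos++] = *esc;
        }
      else if (pos + 1 < outsz)
        out[pos++] = (char) c;
    }

  if (outsz > 0)
    out[pos] = '\0';
  return pos;
}
```

Passing a string that consists of a single newline yields the two characters
backslash and n, matching the form shown in the gimple dump above.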
2020-03-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/93399
* tree-pretty-print.h (pretty_print_string): Declare.
* tree-pretty-print.c (pretty_print_string): Remove forward
declaration, no longer static. Change nbytes parameter type
from unsigned to size_t.
* print-rtl.c (print_value) <case CONST_STRING>: Use
pretty_print_string and shrink way too long strings.
Jakub Jelinek [Wed, 4 Mar 2020 11:59:04 +0000 (12:59 +0100)]
inliner: Copy DECL_BY_REFERENCE in copy_decl_to_var [PR93888]
In the following testcase we emit wrong debug info for the karg
parameter in the DW_TAG_inlined_subroutine into main.
The problem is that the karg PARM_DECL is DECL_BY_REFERENCE and thus
in the IL has const K & type, but in the source just const K.
When the function is inlined, we create a VAR_DECL for it, but don't
set DECL_BY_REFERENCE, so when emitting DW_AT_location, we treat it like
a const K & typed variable, but it has DW_AT_abstract_origin which has
just the const K type and thus the debugger thinks the variable has
const K type.
Fixed by copying the DECL_BY_REFERENCE flag. Not doing it in
copy_decl_for_dup_finish, because copy_decl_no_change already copies
that flag through copy_node and in copy_result_decl_to_var it is
undesirable, as we handle DECL_BY_REFERENCE in that case instead
by changing the type.
Jakub Jelinek [Thu, 5 Mar 2020 18:44:42 +0000 (19:44 +0100)]
i386: Fix some -O0 avx2intrin.h and xopintrin.h intrinsic macros [PR94046]
As the testcases show, the macros we have for -O0 for intrinsics that require
constant argument(s) should first cast the argument to the type the -O1+
inline uses and afterwards to whatever type e.g. a builtin needs.
The PR reported one macro that violated this, so I've grepped for all double-casts
and filtered out the meaningful ones, where the __m{128,256,512}{,d,i}
first cast is followed by a cast to a same-sized __v* type with the same kind of
element type (float, double, integral). The 7 remaining macros were using different
casts, and I've double-checked them against the inline function types.
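The double-cast rule can be illustrated with a scalar analogy. Everything
in this sketch is hypothetical (builtin_scale, scale_inline and both macros
are invented names, and the wrong cast is deliberately exaggerated); the real
macros cast between vector types such as __m128 and __v4sf.

```c
/* Stand-in for a builtin that wants a double and a constant.  */
static double
builtin_scale (double x, int imm)
{
  return x * imm;
}

/* Form used when optimizing: an inline wrapper whose parameter type
   (float here) defines what callers are allowed to pass.  */
static inline double
scale_inline (float x, const int imm)
{
  return builtin_scale ((double) x, imm);
}

/* Form used at -O0: a macro, so that IMM stays a constant expression.
   The first cast matches the inline function's parameter type, the
   second cast matches what the builtin needs.  */
#define SCALE_MACRO(X, IMM) \
  builtin_scale ((double) (float) (X), (int) (IMM))

/* Buggy variant: the first cast uses a type the inline function does
   not take, so the macro and the inline form can disagree.  */
#define SCALE_MACRO_BAD(X, IMM) \
  builtin_scale ((double) (long) (X), (int) (IMM))
```

Here SCALE_MACRO (2.5, 2) and scale_inline (2.5, 2) both give 5.0, while
SCALE_MACRO_BAD (2.5, 2) gives 4.0 because the argument is truncated first.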
2020-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/94046
* config/i386/avx2intrin.h (_mm_mask_i32gather_ps): Fix first cast of
SRC and MASK arguments to __m128 from __m128d.
(_mm256_mask_i32gather_ps): Fix first cast of MASK argument to __m256
from __m256d.
(_mm_mask_i64gather_ps): Fix first cast of MASK argument to __m128
from __m128d.
* config/i386/xopintrin.h (_mm_permute2_pd): Fix first cast of C
argument to __m128i from __m128d.
(_mm256_permute2_pd): Fix first cast of C argument to __m256i from
__m256d.
(_mm_permute2_ps): Fix first cast of C argument to __m128i from __m128.
(_mm256_permute2_ps): Fix first cast of C argument to __m256i from
__m256.
* g++.dg/ext/pr94046-1.C: New test.
* g++.dg/ext/pr94046-2.C: New test.
Jakub Jelinek [Tue, 3 Mar 2020 09:42:34 +0000 (10:42 +0100)]
explow: Fix ICE caused by plus_constant [PR94002]
The following testcase ICEs in cross to riscv64-linux. The problem is
that we have a DImode integral constant (that doesn't fit into SImode),
which is pushed into a constant pool, and later we access just the first half of
it using a MEM. When plus_constant is called on such a MEM, if the constant
has a mode, we verify the mode, but if it doesn't, we don't and ICE later on
when we think the CONST_INT is a valid SImode constant.
2020-03-03 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/94002
* explow.c (plus_constant): Punt if cst has VOIDmode and
get_pool_mode is different from mode.
Will Schmidt [Wed, 16 Sep 2020 16:21:04 +0000 (11:21 -0500)]
[PATCH, rs6000] Fix vector long long subtype (PR96139)
Hi,
This corrects an issue with the powerpc vector long long subtypes.
As reported by SjMunroe, when building some code with -Wall, and
attempting to print an element of a "long long vector" with a
long long printf format string, we will report an error because
the vector sub-type was improperly defined as int.
When defining a V2DI_type_node we use a TARGET_POWERPC64 ternary to
define the V2DI_type_node with "vector long" or "vector long long".
We also need to specify the proper sub-type when we define the type.
Due to some file renames, this is a backport and rework of both
[PATCH, rs6000] Fix vector long long subtype (PR96139)
and
[PATCH, rs6000] Testsuite fixup pr96139 tests
PR target/96139
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_init_builtin): Update V2DI_type_node
and unsigned_V2DI_type_node definitions.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr96139-a.c: New test.
* gcc.target/powerpc/pr96139-b.c: New test.
* gcc.target/powerpc/pr96139-c.c: New test.