liuhongt [Wed, 4 Aug 2021 08:03:58 +0000 (16:03 +0800)]
Support cond_{smax,smin,umax,umin} for vector integer modes under AVX512.
gcc/ChangeLog:
* config/i386/sse.md (cond_<code><mode>): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/cond_op_maxmin_b-1.c: New test.
* gcc.target/i386/cond_op_maxmin_b-2.c: New test.
* gcc.target/i386/cond_op_maxmin_d-1.c: New test.
* gcc.target/i386/cond_op_maxmin_d-2.c: New test.
* gcc.target/i386/cond_op_maxmin_q-1.c: New test.
* gcc.target/i386/cond_op_maxmin_q-2.c: New test.
* gcc.target/i386/cond_op_maxmin_ub-1.c: New test.
* gcc.target/i386/cond_op_maxmin_ub-2.c: New test.
* gcc.target/i386/cond_op_maxmin_ud-1.c: New test.
* gcc.target/i386/cond_op_maxmin_ud-2.c: New test.
* gcc.target/i386/cond_op_maxmin_uq-1.c: New test.
* gcc.target/i386/cond_op_maxmin_uq-2.c: New test.
* gcc.target/i386/cond_op_maxmin_uw-1.c: New test.
* gcc.target/i386/cond_op_maxmin_uw-2.c: New test.
* gcc.target/i386/cond_op_maxmin_w-1.c: New test.
* gcc.target/i386/cond_op_maxmin_w-2.c: New test.
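A loop of roughly the following shape is the kind of code a conditional max/min expander is meant to help with. This is an illustrative sketch and is not copied from the new tests; the function name cond_smax and its parameters are hypothetical. With AVX512 the vectorizer may be able to implement the guarded update as a single masked max instruction rather than a max followed by a separate blend.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: where mask[i] is nonzero, a[i] receives
// max(b[i], c[i]); elsewhere it receives the fallback value b[i].
void cond_smax(std::int32_t *a, const std::int32_t *b,
               const std::int32_t *c, const std::int32_t *mask,
               std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      if (mask[i])
        a[i] = b[i] > c[i] ? b[i] : c[i];
      else
        a[i] = b[i];
    }
}
```

Whether a given loop is actually vectorized this way depends on the options used (for example the AVX512 subset enabled) and on the vectorizer's cost model.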
gcc/testsuite/ChangeLog:
PR analyzer/101570
* gcc.dg/analyzer/asm-x86-1.c: New test.
* gcc.dg/analyzer/asm-x86-lp64-1.c: New test.
* gcc.dg/analyzer/asm-x86-lp64-2.c: New test.
* gcc.dg/analyzer/pr101570.c: New test.
* gcc.dg/analyzer/torture/asm-x86-linux-array_index_mask_nospec.c:
New test.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c: New
test.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c: New
test.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid.c: New test.
* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c: New
test.
* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr.c: New test.
* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c:
New test.
* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-reduced.c:
New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
H.J. Lu [Tue, 3 Aug 2021 13:17:22 +0000 (06:17 -0700)]
x86: Update STORE_MAX_PIECES
Update STORE_MAX_PIECES to allow 16/32/64 bytes only if inter-unit move
is enabled since vec_duplicate enabled by inter-unit move is used to
implement store_by_pieces of 16/32/64 bytes.
gcc/
PR target/101742
* config/i386/i386.h (STORE_MAX_PIECES): Allow 16/32/64 bytes
only if TARGET_INTER_UNIT_MOVES_TO_VEC is true.
gcc/testsuite/
PR target/101742
* gcc.target/i386/pr101742a.c: New test.
* gcc.target/i386/pr101742b.c: Likewise.
H.J. Lu [Wed, 4 Aug 2021 13:15:04 +0000 (06:15 -0700)]
x86: Avoid stack realignment when copying data with SSE register
To avoid stack realignment, call ix86_gen_scratch_sse_rtx to get a
scratch SSE register to copy data with an SSE register from one
memory location to another.
gcc/
PR target/101772
* config/i386/i386-expand.c (ix86_expand_vector_move): Call
ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
data with SSE register from one memory location to another.
gcc/testsuite/
PR target/101772
* gcc.target/i386/eh_return-2.c: New test.
Andreas Krebbel [Wed, 4 Aug 2021 16:40:10 +0000 (18:40 +0200)]
IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vector merge
This patch implements the TARGET_VECTORIZE_VEC_PERM_CONST in the IBM Z
backend. The initial implementation only exploits the vector merge
instruction but there is more to come.
gcc/ChangeLog:
* config/s390/s390.c (MAX_VECT_LEN): Define macro.
(struct expand_vec_perm_d): Define struct.
(expand_perm_with_merge): New function.
(vectorize_vec_perm_const_1): New function.
(s390_vectorize_vec_perm_const): New function.
(TARGET_VECTORIZE_VEC_PERM_CONST): Define target macro.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/perm-merge.c: New test.
* gcc.target/s390/vector/vec-types.h: New test.
Andreas Krebbel [Wed, 4 Aug 2021 16:40:10 +0000 (18:40 +0200)]
IBM Z: Remove redundant V_HW_64 mode iterator.
gcc/ChangeLog:
* config/s390/vector.md (V_HW_64): Remove mode iterator.
(*vec_load_pair<mode>): Use V_HW_2 instead of V_HW_64.
* config/s390/vx-builtins.md
(vec_scatter_element<V_HW_2:mode>_SI): Use V_HW_2 instead of
V_HW_64.
Andreas Krebbel [Wed, 4 Aug 2021 16:40:09 +0000 (18:40 +0200)]
IBM Z: Get rid of vec merge unspec
This patch gets rid of the unspecs we were using for the vector merge
instruction and replaces them with generic RTX.
gcc/ChangeLog:
* config/s390/s390-modes.def: Add more vector modes to support
concatenation of two vectors.
* config/s390/s390-protos.h (s390_expand_merge_perm_const): Add
prototype.
(s390_expand_merge): Likewise.
* config/s390/s390.c (s390_expand_merge_perm_const): New function.
(s390_expand_merge): New function.
* config/s390/s390.md (UNSPEC_VEC_MERGEH, UNSPEC_VEC_MERGEL):
Remove constant definitions.
* config/s390/vector.md (V_HW_2): Add mode iterators.
(VI_HW_4, V_HW_4): Rename VI_HW_4 to V_HW_4.
(vec_2x_nelts, vec_2x_wide): New mode attributes.
(*vmrhb, *vmrlb, *vmrhh, *vmrlh, *vmrhf, *vmrlf, *vmrhg, *vmrlg):
New pattern definitions.
(vec_widen_umult_lo_<mode>, vec_widen_umult_hi_<mode>)
(vec_widen_smult_lo_<mode>, vec_widen_smult_hi_<mode>)
(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf, vec_unpacks_lo_v2df)
(vec_unpacks_hi_v2df): Adjust expanders to emit non-unspec RTX for
vec merge.
* config/s390/vx-builtins.md (V_HW_4): Remove mode iterator. Now
in vector.md.
(vec_mergeh<mode>, vec_mergel<mode>): Use s390_expand_merge to
emit vec merge pattern.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c:
Instead of vpdi with 0 and 5, vmrlg and vmrhg are used now.
* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: Likewise.
* gcc.target/s390/zvector/vec-types.h: New test.
* gcc.target/s390/zvector/vec_merge.c: New test.
Jonathan Wright [Mon, 19 Jul 2021 09:19:30 +0000 (10:19 +0100)]
aarch64: Don't include vec_select high-half in SIMD multiply cost
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can select the top or bottom half of the operand registers. This
selection does not change the cost of the underlying instruction and
this should be reflected by the RTL cost function.
This patch adds RTL tree traversal in the Neon multiply cost function
to match vec_select high-half of its operands. This traversal
prevents the cost of the vec_select from being added into the cost of
the multiply - meaning that these instructions can now be emitted in
the combine pass as they are no longer deemed prohibitively
expensive.
gcc/ChangeLog:
2021-07-19 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64.c (aarch64_strip_extend_vec_half):
Define.
(aarch64_rtx_mult_cost): Traverse RTL tree to prevent cost of
vec_select high-half from being added into Neon multiply
cost.
* rtlanal.c (vec_series_highpart_p): Define.
* rtlanal.h (vec_series_highpart_p): Declare.
Jonathan Wright [Mon, 19 Jul 2021 13:01:52 +0000 (14:01 +0100)]
aarch64: Don't include vec_select element in SIMD multiply cost
The Neon multiply/multiply-accumulate/multiply-subtract instructions
can take various forms - multiplying full vector registers of values
or multiplying one vector by a single element of another. Regardless
of the form used, these instructions have the same cost, and this
should be reflected by the RTL cost function.
This patch adds RTL tree traversal in the Neon multiply cost function
to match the vec_select used by the lane-referencing forms of the
instructions already mentioned. This traversal prevents the cost of
the vec_select from being added into the cost of the multiply -
meaning that these instructions can now be emitted in the combine
pass as they are no longer deemed prohibitively expensive.
gcc/ChangeLog:
2021-07-19 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64.c (aarch64_strip_duplicate_vec_elt):
Define.
(aarch64_rtx_mult_cost): Traverse RTL tree to prevent
vec_select cost from being added into Neon multiply cost.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vmul_element_cost.c: New test.
vect: Tweak comparisons with existing epilogue loops
This patch uses a more accurate scalar iteration estimate when
comparing the epilogue of a constant-iteration loop with a candidate
replacement epilogue.
In the testcase, the patch prevents a 1-to-3-element SVE epilogue
from seeming better than a 64-bit Advanced SIMD epilogue.
gcc/
* tree-vect-loop.c (vect_better_loop_vinfo_p): Detect cases in
which old_loop_vinfo is an epilogue loop that handles a constant
number of iterations.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_12.c: New test.
After vect_analyze_loop has successfully analysed a loop for
one base vector mode B1, it considers using subsequent base vector
modes to vectorise an epilogue. However, for VECT_COMPARE_COSTS,
a later mode B2 might turn out to be better than B1 was. Initially
this comparison will be between an epilogue loop (for B2) and a main
loop (for B1). However, in r11-6458 I'd added code to reanalyse the
B2 epilogue loop as a main loop, partly for correctness and partly
for better costing.
This can lead to a situation in which we think that the B2 epilogue
loop was better than the B1 main loop, but that the B2 main loop is
not better than the B1 main loop. There was no dump message to say
that this had happened, which made it look like B2 had still won.
gcc/
* tree-vect-loop.c (vect_analyze_loop): Print a dump message
when a reanalyzed loop fails to be cheaper than the current
main loop.
Fix debug info for ignored decls at start of assembly
Ignored function decls that are compiled at the start of
the assembly have bogus line numbers until the first .file
directive, as reported in PR101575.
The corresponding binutils bug report is
https://sourceware.org/bugzilla/show_bug.cgi?id=28149
The workaround for this issue is to emit a dummy .file
directive before the first function is compiled, unless
another .file directive was already emitted previously.
Richard Biener [Thu, 29 Jul 2021 12:14:48 +0000 (14:14 +0200)]
Add emulated gather capability to the vectorizer
This adds a gather vectorization capability to the vectorizer
without target support by decomposing the offset vector, doing
scalar loads and then building a vector from the result. This
is aimed mainly at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the gather.
Note it's difficult to avoid vectorizing the offset load, but in
some cases later passes can turn the vector load + extract into
scalar loads, see the followup patch.
On SPEC CPU 2017 510.parest_r this improves runtime from 250s
to 219s on a Zen2 CPU which has its native gather instructions
disabled (using those the runtime instead increases to 254s)
using -Ofast -march=znver2 [-flto]. It turns out the critical
loops in this benchmark all perform gather operations.
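The decomposition described above can be pictured at source level as follows. This is a hand-written sketch for a two-lane vector, not the code the vectorizer generates; the function names and the reduction used as "the rest of the loop" are assumptions made for illustration.

```cpp
#include <cstddef>

// Source loop: an indexed (gather) load feeding a reduction.
double gather_sum(const double *data, const int *idx, std::size_t n)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    sum += data[idx[i]];
  return sum;
}

// Emulated gather, two lanes at a time: take the offsets apart,
// do one scalar load per lane, then build the "vector" from the results.
double gather_sum_emulated(const double *data, const int *idx, std::size_t n)
{
  double lane[2] = {0.0, 0.0};
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2)
    {
      int off0 = idx[i];                       // decompose the offset vector
      int off1 = idx[i + 1];
      double v[2] = {data[off0], data[off1]};  // scalar loads + construct
      lane[0] += v[0];                         // remainder of the loop body
      lane[1] += v[1];                         // stays in vector form
    }
  double sum = lane[0] + lane[1];
  for (; i < n; ++i)                           // scalar epilogue
    sum += data[idx[i]];
  return sum;
}
```

The two functions can differ in floating-point rounding because the additions are reassociated, which is one reason such loops are typically only vectorized under options like -Ofast.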
2021-07-30 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_check_gather_scatter):
Include widening conversions only when the result is
still handled by native gather or the current offset
size does not already match the data size.
Also succeed analysis in case there's no native support,
noted by an IFN_LAST ifn and a NULL decl.
(vect_analyze_data_refs): Always consider gathers.
* tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
Test for no IFN gather rather than decl gather.
* tree-vect-stmts.c (vect_model_load_cost): Pass in the
gather-scatter info and cost emulated gathers accordingly.
(vect_truncate_gather_scatter_offset): Properly test for
no IFN gather.
(vect_use_strided_gather_scatters_p): Likewise.
(get_load_store_type): Handle emulated gathers and their
restrictions.
(vectorizable_load): Likewise. Emulate them by extracting
scalar offsets, doing scalar loads and a vector construct.
* gcc.target/i386/vect-gather-1.c: New testcase.
* gfortran.dg/vect/vect-8.f90: Adjust.
H.J. Lu [Tue, 3 Aug 2021 13:17:22 +0000 (06:17 -0700)]
by_pieces: Pass MAX_PIECES to op_by_pieces_d
Pass MAX_PIECES to op_by_pieces_d::op_by_pieces_d for move, store and
compare.
PR target/101742
* expr.c (op_by_pieces_d::op_by_pieces_d): Add a max_pieces
argument to set m_max_size.
(move_by_pieces_d): Pass MOVE_MAX_PIECES to op_by_pieces_d.
(store_by_pieces_d): Pass STORE_MAX_PIECES to op_by_pieces_d.
(compare_by_pieces_d): Pass COMPARE_MAX_PIECES to op_by_pieces_d.
Roger Sayle [Wed, 4 Aug 2021 13:19:14 +0000 (14:19 +0100)]
Fold (X<<C1)^(X<<C2) to a multiplication when possible.
The easiest way to motivate these additions to match.pd is with the
following example:
unsigned int foo(unsigned char i) {
return i | (i<<8) | (i<<16) | (i<<24);
}
which mainline with -O2 on x86_64 currently generates:
foo: movzbl %dil, %edi
movl %edi, %eax
movl %edi, %edx
sall $8, %eax
sall $16, %edx
orl %edx, %eax
orl %edi, %eax
sall $24, %edi
orl %edi, %eax
ret
but with this patch now becomes:
foo: movzbl %dil, %eax
imull $16843009, %eax, %eax
ret
Interestingly, this transformation is already applied when using
addition, allowing synth_mult to select an optimal sequence, but
not when using the equivalent bit-wise ior or xor operators.
The solution is to use tree_nonzero_bits to check that the
potentially non-zero bits of each operand don't overlap, which
ensures that BIT_IOR_EXPR and BIT_XOR_EXPR produce the same
results as PLUS_EXPR, which effectively generalizes the old
fold_plusminus_mult_expr. Technically, the transformation
is to canonicalize (X*C1)|(X*C2) and (X*C1)^(X*C2) to
X*(C1+C2) where X and X<<C are considered special cases.
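The equivalence behind the example can be checked exhaustively for an 8-bit input. The sketch below is a standalone check, not part of the patch; the function names are made up for illustration. The shifted copies of the byte occupy disjoint bit ranges, so OR behaves like addition, and 1 + (1<<8) + (1<<16) + (1<<24) is 16843009.

```cpp
#include <cstdint>

// Shift-and-or form, widened to unsigned first so that every shift is
// done in 32-bit unsigned arithmetic.
std::uint32_t splat_or(std::uint8_t i)
{
  std::uint32_t x = i;
  return x | (x << 8) | (x << 16) | (x << 24);
}

// Multiplication form: the constant is 0x01010101 == 16843009.
std::uint32_t splat_mul(std::uint8_t i)
{
  return static_cast<std::uint32_t>(i) * UINT32_C(16843009);
}

// Returns true if both forms agree for every possible byte value.
bool splat_forms_agree()
{
  for (unsigned v = 0; v < 256; ++v)
    if (splat_or(static_cast<std::uint8_t>(v))
        != splat_mul(static_cast<std::uint8_t>(v)))
      return false;
  return true;
}
```

The same reasoning fails as soon as the operands can share a set bit, which is why the fold needs the non-overlap check on the possibly-nonzero bits.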
2021-08-04 Roger Sayle <roger@nextmovesoftware.com>
Marc Glisse <marc.glisse@inria.fr>
gcc/ChangeLog
* match.pd (bit_ior, bit_xor): Canonicalize (X*C1)|(X*C2) and
(X*C1)^(X*C2) as X*(C1+C2), and related variants, using
tree_nonzero_bits to ensure that operands are bit-wise disjoint.
gcc/testsuite/ChangeLog
* gcc.dg/fold-ior-4.c: New test.
Jonathan Wakely [Tue, 3 Aug 2021 19:50:52 +0000 (20:50 +0100)]
libstdc++: Add [[nodiscard]] to sequence containers
... and container adaptors.
This adds the [[nodiscard]] attribute to functions with no side-effects
for the sequence containers and their iterators, and the debug versions
of those containers, and the container adaptors.
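The effect on user code, and the "cast result to void" adjustment made to several tests, can be sketched with a stand-in type. Box and use_box are hypothetical names; the real change applies the attribute to library members such as empty() and size().

```cpp
// Hypothetical stand-in for a container with a side-effect-free member.
struct Box
{
  int value = 0;
  [[nodiscard]] bool empty() const { return value == 0; }
};

int use_box(const Box &b)
{
  // b.empty();        // would warn under -Wunused-result: result ignored
  (void) b.empty();    // explicit cast to void: intentional discard, no warning
  return b.empty() ? 0 : b.value;   // normal use of the result
}
```

Tests that discard such results on purpose either cast to void as above or are compiled with -Wno-unused-result.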
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/forward_list.h: Add [[nodiscard]] to functions
with no side-effects.
* include/bits/stl_bvector.h: Likewise.
* include/bits/stl_deque.h: Likewise.
* include/bits/stl_list.h: Likewise.
* include/bits/stl_queue.h: Likewise.
* include/bits/stl_stack.h: Likewise.
* include/bits/stl_vector.h: Likewise.
* include/debug/deque: Likewise.
* include/debug/forward_list: Likewise.
* include/debug/list: Likewise.
* include/debug/safe_iterator.h: Likewise.
* include/debug/vector: Likewise.
* include/std/array: Likewise.
* testsuite/23_containers/array/creation/3_neg.cc: Use
-Wno-unused-result.
* testsuite/23_containers/array/debug/back1_neg.cc: Cast result
to void.
* testsuite/23_containers/array/debug/back2_neg.cc: Likewise.
* testsuite/23_containers/array/debug/front1_neg.cc: Likewise.
* testsuite/23_containers/array/debug/front2_neg.cc: Likewise.
* testsuite/23_containers/array/debug/square_brackets_operator1_neg.cc:
Likewise.
* testsuite/23_containers/array/debug/square_brackets_operator2_neg.cc:
Likewise.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.
* testsuite/23_containers/deque/cons/clear_allocator.cc: Cast
result to void.
* testsuite/23_containers/deque/debug/invalidation/4.cc:
Likewise.
* testsuite/23_containers/deque/types/1.cc: Use
-Wno-unused-result.
* testsuite/23_containers/list/types/1.cc: Cast result to void.
* testsuite/23_containers/priority_queue/members/7161.cc:
Likewise.
* testsuite/23_containers/queue/members/7157.cc: Likewise.
* testsuite/23_containers/vector/59829.cc: Likewise.
* testsuite/23_containers/vector/ext_pointer/types/1.cc:
Likewise.
* testsuite/23_containers/vector/ext_pointer/types/2.cc:
Likewise.
* testsuite/23_containers/vector/types/1.cc: Use
-Wno-unused-result.
Richard Biener [Fri, 30 Jul 2021 09:06:50 +0000 (11:06 +0200)]
Rewrite more vector loads to scalar loads
This teaches forwprop to rewrite more vector loads that are only
used in BIT_FIELD_REFs as scalar loads. This provides the
remaining uplift to SPEC CPU 2017 510.parest_r on Zen 2 which
has CPU gathers disabled.
In particular vector load + vec_unpack + bit-field-ref is turned
into (extending) scalar loads which avoids costly XMM/GPR
transitions. To not conflict with vector load + bit-field-ref
+ vector constructor matching to vector load + shuffle the
extended transform is only done after vector lowering.
2021-07-30 Richard Biener <rguenther@suse.de>
* tree-ssa-forwprop.c (pass_forwprop::execute): Split
out code to decompose vector loads ...
(optimize_vector_load): ... here. Generalize it to
handle intermediate widening and TARGET_MEM_REF loads
and apply it to loads with a supported vector mode as well.
Richard Biener [Wed, 4 Aug 2021 09:42:41 +0000 (11:42 +0200)]
tree-optimization/101756 - avoid vectorizing boolean MAX reductions
The following avoids vectorizing MIN/MAX reductions on bools which,
when ending up as vector(2) <signed-boolean:64>, would need to be
adjusted because of the sign change. The fix instead avoids any
reduction vectorization where the result isn't compatible
with the original scalar type since we don't compensate for that
either.
2021-08-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/101756
* tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Make sure
the result of the reduction epilogue is compatible with the original
scalar result.
Jakub Jelinek [Wed, 4 Aug 2021 09:53:48 +0000 (11:53 +0200)]
c++: Fix up #pragma omp declare {simd,variant} and acc routine parsing
When parsing default arguments, we need to temporarily clear parser->omp_declare_simd
and parser->oacc_routine, otherwise it can clash with further declarations
inside of e.g. lambdas inside of those default arguments.
2021-08-04 Jakub Jelinek <jakub@redhat.com>
PR c++/101759
* parser.c (cp_parser_default_argument): Temporarily override
parser->omp_declare_simd and parser->oacc_routine to NULL.
* g++.dg/gomp/pr101759.C: New test.
* g++.dg/goacc/pr101759.C: New test.
liuhongt [Wed, 4 Aug 2021 02:50:28 +0000 (10:50 +0800)]
Refine predicate of peephole2 to general_reg_operand. [PR target/101743]
The define_peephole2 added by r12-2640-gf7bf03cf69ccb7dc
should only work on general registers. Since x86 also supports
mov instructions between GPRs, SSE registers and mask registers,
limit the peephole2 predicate to general_reg_operand.
gcc/ChangeLog:
PR target/101743
* config/i386/i386.md (peephole2): Refine predicate from
register_operand to general_reg_operand.
This makes tail recursion optimization produce a loop structure
manually rather than relying on loop fixup. That also allows the
loop to be marked as finite (it would eventually blow the stack
if it were not).
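At source level, the loop that tail recursion elimination builds corresponds to the rewrite below. This is only a picture of the idea: the pass itself works on GIMPLE, and the function names here are invented for the example.

```cpp
// Tail-recursive form: the recursive call is the last action performed.
unsigned long sum_to(unsigned long n, unsigned long acc)
{
  if (n == 0)
    return acc;
  return sum_to(n - 1, acc + n);
}

// Equivalent loop: the call becomes an update of the parameters followed
// by a jump back to the start of the function body.
unsigned long sum_to_loop(unsigned long n, unsigned long acc)
{
  for (;;)
    {
      if (n == 0)
        return acc;
      acc += n;
      n -= 1;
    }
}
```

The loop must terminate for the same reason the recursion must: an unbounded recursion would eventually exhaust the stack, which is what justifies marking the loop as finite.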
2021-08-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/101769
* tree-tailcall.c (eliminate_tail_call): Add the created loop
for the first recursion and return it via the new output parameter.
(optimize_tail_call): Pass through new output param.
(tree_optimize_tail_calls_1): After creating all latches,
add the created loop to the loop tree. Do not mark loops for fixup.
* gcc.target/i386/cond_op_fma_double-1.c: New test.
* gcc.target/i386/cond_op_fma_double-2.c: New test.
* gcc.target/i386/cond_op_fma_float-1.c: New test.
* gcc.target/i386/cond_op_fma_float-2.c: New test.
Cherry Mui [Tue, 3 Aug 2021 23:35:55 +0000 (19:35 -0400)]
compiler: support new language constructs in escape analysis
Previous CLs add new language constructs in Go 1.17, specifically,
unsafe.Add, unsafe.Slice, and conversion from a slice to a pointer
to an array. This CL handles them in the escape analysis.
At the point of the escape analysis, unsafe.Add and unsafe.Slice
are still builtin calls, so just handle them in data flow.
Conversion from a slice to a pointer to an array has already been
lowered to a combination of compound expression, conditional
expression and slice info expressions, so handle them in the
escape analysis.
compile, runtime: make selectnbrecv return two values
The only difference between selectnbrecv and selectnbrecv2 is that the
latter sets the input pointer value from the second return value of chanrecv.
So by making selectnbrecv return two values from chanrecv, we can get
rid of selectnbrecv2; the compiler can now call only selectnbrecv and
generate simpler code.
This is the gofrontend version of https://golang.org/cl/292890.
* create_gcov tool doesn't currently support dwarf 5 so I made a change in profopt.exp
to pass -gdwarf-4 when compiling the binary to profile.
* I updated the invocation of create_gcov in profopt.exp to pass -gcov_version=2.
I recently made a change to create_gcov to support version 2:
https://github.com/google/autofdo/pull/117 .
* I removed useless -o perf.data from the invocation of gcc-auto-profile in
target-supports.exp.
These changes contribute to fixing PR gcov-profile/71672.
gcc/testsuite/ChangeLog:
* lib/profopt.exp: Pass -gdwarf-4 when compiling test to profile; pass -gcov_version=2.
* lib/target-supports.exp: Remove unnecessary -o perf.data passed to gcc-auto-profile.
indir-call-prof-2.c has -fno-early-inlining but AutoFDO can't work without
early inlining (it needs to match the inlining of the profiled binary).
I changed profopt.exp to always pass -fearly-inlining for AutoFDO.
With that change the indirect call inlining in indir-call-prof-2.c happens in the early inliner
so I changed the dg-final-use-autofdo.
Contributes to fixing PR gcov-profile/71672
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/indir-call-prof-2.c: Fix dg-final-use-autofdo.
* lib/profopt.exp: Pass -fearly-inlining when compiling with AutoFDO.
* Changed several tests to use -fdump-ipa-afdo-optimized instead of -fdump-ipa-afdo
in dg-options so that the expected output can be found
* Increased the number of iterations in several tests so that perf can have
enough sampling events
Contributes to fixing PR gcov-profile/71672.
gcc/testsuite/ChangeLog:
* g++.dg/tree-prof/indir-call-prof.C: Fix options, increase the number of iterations.
* g++.dg/tree-prof/morefunc.C: Fix options, increase the number of iterations.
* g++.dg/tree-prof/reorder.C: Fix options, increase the number of iterations.
* gcc.dg/tree-prof/indir-call-prof-2.c: Fix options, increase the number of iterations.
* gcc.dg/tree-prof/indir-call-prof.c: Fix options.
Paul A. Clarke [Tue, 23 Feb 2021 01:20:48 +0000 (19:20 -0600)]
rs6000: Add test for _mm_minpos_epu16
Copy the test for _mm_minpos_epu16 from
gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
a few adjustments:
- Adjust the dejagnu directives for powerpc platform.
- Make the data not be monotonically increasing,
such that some of the returned values are not
always the first value (index 0).
- Create a list of input data testing various scenarios
including more than one minimum value and different
orders and indices of the minimum value.
- Fix a masking issue where the index was being truncated
to 2 bits instead of 3 bits, which wasn't found because
all of the returned indices were 0 with the original
generated data.
- Support big-endian.
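A scalar reference for the operation being tested helps explain the masking fix. The sketch below models the documented behaviour of the instruction (minimum of eight unsigned 16-bit values in bits 15:0, index of its first occurrence in bits 18:16); the type and function names are hypothetical and this is not the code from the test.

```cpp
#include <cstdint>

struct MinPos
{
  std::uint16_t value;   // smallest element
  std::uint16_t index;   // position of its first occurrence, 0..7
};

// Find the minimum of eight unsigned 16-bit values and where it occurs.
MinPos minpos_u16(const std::uint16_t (&v)[8])
{
  MinPos r = {v[0], 0};
  for (std::uint16_t i = 1; i < 8; ++i)
    if (v[i] < r.value)   // strict '<' keeps the lowest index on ties
      {
        r.value = v[i];
        r.index = i;
      }
  return r;
}

// Pack as value in bits 15:0 and index in bits 18:16. The index needs
// three bits (mask 0x7); a two-bit mask (0x3) would fold 4..7 onto 0..3.
std::uint32_t pack_minpos(MinPos r)
{
  return static_cast<std::uint32_t>(r.value)
         | (static_cast<std::uint32_t>(r.index & 0x7) << 16);
}
```

With input data whose minimum always sits at index 0, a two-bit and a three-bit mask give the same answer, which is how the truncation went unnoticed.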
2021-08-03 Paul A. Clarke <pc@us.ibm.com>
gcc/testsuite
* gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
gcc/testsuite/gcc.target/i386, adjust dg directives to suit,
make more robust.
Jonathan Wakely [Tue, 3 Aug 2021 14:03:44 +0000 (15:03 +0100)]
libstdc++: Suppress redundant definitions of inline variables
In C++17 the out-of-class definitions for static constexpr variables are
redundant, because they are implicitly inline. This change avoids
"redundant redeclaration" warnings from -Wsystem-headers -Wdeprecated.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
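The pattern can be sketched outside the library as follows. Limits and address_of_max are hypothetical names, and the guard shown here tests __cplusplus purely as an assumption for the example; the library may use a different macro for the same purpose.

```cpp
struct Limits
{
  static constexpr int max_value = 42;   // implicitly inline since C++17
};

#if __cplusplus < 201703L
// Before C++17 an odr-used static constexpr member needs an out-of-class
// definition; in C++17 and later this line would be redundant.
constexpr int Limits::max_value;
#endif

// Taking the address odr-uses the member, so a definition must exist.
const int *address_of_max()
{
  return &Limits::max_value;
}
```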
libstdc++-v3/ChangeLog:
* include/bits/random.tcc (linear_congruential_engine): Do not
define static constexpr members when they are implicitly inline.
* include/std/ratio (ratio, __ratio_multiply, __ratio_divide)
(__ratio_add, __ratio_subtract): Likewise.
* include/std/type_traits (integral_constant): Likewise.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.
This adds a partial specialization of allocator_traits, similar to what
was already done for std::allocator. This means that most uses of
polymorphic_allocator via the traits can avoid the metaprogramming
overhead needed to deduce the properties from polymorphic_allocator.
In addition, I'm changing polymorphic_allocator::delete_object to invoke
the destructor (or pseudo-destructor) directly, rather than calling
allocator_traits::destroy, which calls polymorphic_allocator::destroy
(which is deprecated). This is observable if a user has specialized
allocator_traits<polymorphic_allocator<Foo>> and expects to see its
destroy member function called. I consider explicit specializations of
allocator_traits to be wrong-headed, and this use case seems unnecessary
to support. So delete_object just invokes the destructor directly.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/std/memory_resource (polymorphic_allocator::delete_object):
Call destructor directly instead of using destroy.
(allocator_traits<polymorphic_allocator<T>>): Define partial
specialization.
Jonathan Wakely [Mon, 2 Aug 2021 17:35:42 +0000 (18:35 +0100)]
libstdc++: Deprecate std::random_shuffle for C++14
The std::random_shuffle algorithm was deprecated in C++14 and removed in
C++17. This adds the deprecated attribute for C++14 and later, so
that users are warned they should not be using it in those dialects.
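For code that still calls std::random_shuffle, the usual replacement is std::shuffle with an explicit random number engine. The helper below is a minimal sketch; shuffled_copy and its seed parameter are names chosen for the example.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Return a shuffled copy of v using an explicitly seeded engine.
std::vector<int> shuffled_copy(std::vector<int> v, unsigned seed)
{
  std::mt19937 gen(seed);                   // engine owned by the caller's code
  std::shuffle(v.begin(), v.end(), gen);    // available since C++11
  return v;
}
```

Passing the engine explicitly also makes the result reproducible for a given seed on a given implementation, which the rand()-based overload of std::random_shuffle does not offer.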
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* doc/xml/manual/evolution.xml: Document deprecation.
* doc/html/*: Regenerate.
* include/bits/c++config (_GLIBCXX14_DEPRECATED): Define.
(_GLIBCXX14_DEPRECATED_SUGGEST): Define.
* include/bits/stl_algo.h (random_shuffle): Deprecate for C++14
and later.
* testsuite/25_algorithms/headers/algorithm/synopsis.cc: Adjust
for C++11 and C++14 changes to std::random_shuffle and
std::shuffle.
* testsuite/25_algorithms/random_shuffle/1.cc: Add options to
use deprecated algorithms.
* testsuite/25_algorithms/random_shuffle/59603.cc: Likewise.
* testsuite/25_algorithms/random_shuffle/moveable.cc: Likewise.
* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/2.cc:
Likewise.
* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/pod.cc:
Likewise.
Jonathan Wakely [Mon, 2 Aug 2021 22:55:18 +0000 (23:55 +0100)]
libstdc++: Add testsuite proc for testing deprecated features
This change adds options to tests that explicitly use deprecated
features, so that -D_GLIBCXX_USE_DEPRECATED=0 can be used to run the
rest of the testsuite. The tests that explicitly/intentionally use
deprecated features will still be able to use them, but they can be
disabled for the majority of tests.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
Jonathan Wakely [Mon, 2 Aug 2021 17:34:19 +0000 (18:34 +0100)]
libstdc++: Reduce header dependencies in <regex>
This reduces the size of <regex> a little. This is one of the largest
and slowest headers in the library.
By using <bits/stl_algobase.h> and <bits/stl_algo.h> instead of
<algorithm> we don't need to parse all the parallel algorithms and
std::ranges:: algorithms that are not needed by <regex>. Similarly, by
using <bits/stl_tree.h> and <bits/stl_map.h> instead of <map> we don't
need to parse the definition of std::multimap.
The _State_info type is not movable or copyable, so it doesn't need to use
std::unique_ptr<bool[]> to manage a bitset; we can just delete it in the
destructor. It would use a lot less space if we used a bitset instead,
but that would be an ABI break. We could do it for the versioned
namespace, but this patch doesn't do so. For future reference, using
vector<bool> would work, but would increase sizeof(_State_info) by two
pointers, because it's three times as large as unique_ptr<bool[]>. We
can't use std::bitset because the length isn't constant. We want a
bitset with a non-constant but fixed length.
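The idea of "a bitset with a non-constant but fixed length" held in a plain array of bool can be sketched like this. VisitedSet is a hypothetical class written for illustration and is not the library's _State_info.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-length set of flags whose length is chosen at construction time.
// The type is neither copyable nor movable, so a raw array released in
// the destructor is enough; no smart pointer is required.
class VisitedSet
{
public:
  explicit VisitedSet(std::size_t n)
    : flags_(new bool[n]()), size_(n) {}   // () value-initializes to false

  ~VisitedSet() { delete[] flags_; }

  VisitedSet(const VisitedSet &) = delete;
  VisitedSet &operator=(const VisitedSet &) = delete;

  // Mark position i as visited and report whether it already was.
  bool test_and_set(std::size_t i)
  {
    assert(i < size_);
    bool old = flags_[i];
    flags_[i] = true;
    return old;
  }

  std::size_t size() const { return size_; }

private:
  bool *flags_;
  std::size_t size_;
};
```

Deleting the copy operations is what makes the raw pointer safe here: there is never a second owner that could free the array twice.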
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex_executor.h (_State_info): Replace
unique_ptr<bool[]> with array of bool.
* include/bits/regex_executor.tcc: Likewise.
* include/bits/regex_scanner.tcc: Replace std::strchr with
__builtin_strchr.
* include/std/regex: Replace standard headers with smaller
internal ones.
* testsuite/28_regex/traits/char/lookup_classname.cc: Include
<string.h> for strlen.
* testsuite/28_regex/traits/char/lookup_collatename.cc:
Likewise.
Jonathan Wakely [Mon, 2 Aug 2021 16:12:52 +0000 (17:12 +0100)]
libstdc++: Avoid using std::unique_ptr in <locale>
std::wstring_convert and std::wbuffer_convert types are not copyable or
movable, and store a plain pointer without a deleter. That means a much
simpler type that just uses delete in its destructor can be used instead
of std::unique_ptr.
That avoids including and parsing all of <bits/unique_ptr.h> in every
header that includes <locale>. It also avoids instantiating
unique_ptr<C> and std::tuple<C*, default_delete<C>> when the conversion
utilities are used.
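A minimal RAII owner of the kind described might look like the sketch below. ScopedPtr and Counted are hypothetical names; the library's __detail::_Scoped_ptr may differ in interface and detail.

```cpp
// Owns a single heap object and deletes it on destruction. It has no
// custom deleter and no copy or move operations, which keeps it far
// smaller to parse and instantiate than std::unique_ptr.
template<typename T>
class ScopedPtr
{
public:
  explicit ScopedPtr(T *p = nullptr) noexcept : ptr_(p) {}
  ~ScopedPtr() { delete ptr_; }

  ScopedPtr(const ScopedPtr &) = delete;
  ScopedPtr &operator=(const ScopedPtr &) = delete;

  T *operator->() const noexcept { return ptr_; }
  T &operator*() const noexcept { return *ptr_; }
  explicit operator bool() const noexcept { return ptr_ != nullptr; }

private:
  T *ptr_;
};

// Helper used only to observe construction and destruction.
struct Counted
{
  static int live;
  Counted() { ++live; }
  ~Counted() { --live; }
};
int Counted::live = 0;
```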
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/locale_conv.h (__detail::_Scoped_ptr): Define new
RAII class template.
(wstring_convert, wbuffer_convert): Use __detail::_Scoped_ptr
instead of unique_ptr.
This patch adds an option to tune for Neoverse cores that have
a total vector bandwidth of 512 bits (4x128 for Advanced SIMD
and a vector-length-dependent equivalent for SVE). This is intended
to be a compromise between tuning aggressively for a single core like
Neoverse V1 (which can be too narrow) and tuning for AArch64 cores
in general (which can be too wide).
-mcpu=neoverse-512tvb is equivalent to -mcpu=neoverse-v1
-mtune=neoverse-512tvb.
gcc/
* doc/invoke.texi: Document -mtune=neoverse-512tvb and
-mcpu=neoverse-512tvb.
* config/aarch64/aarch64-cores.def (neoverse-512tvb): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.c (neoverse512tvb_sve_vector_cost)
(neoverse512tvb_sve_issue_info, neoverse512tvb_vec_issue_info)
(neoverse512tvb_vector_cost, neoverse512tvb_tunings): New structures.
(aarch64_adjust_body_cost_sve): Handle -mtune=neoverse-512tvb.
(aarch64_adjust_body_cost): Likewise.
aarch64: Restrict issue heuristics to inner vector loop
The AArch64 vector costs try to take issue rates into account.
However, when vectorising an outer loop, we lumped the inner
and outer operations together, which is somewhat meaningless.
This patch restricts the heuristic to the inner loop.
gcc/
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Only
record issue information for operations that occur in the
innermost loop.
The issue-based vector costs currently assume that a multiply-add
sequence can be implemented using a single instruction. This is
generally true for scalars (which have a 4-operand instruction)
and SVE (which allows the output to be tied to any input).
However, for Advanced SIMD, multiplying two values and adding
an invariant will end up being a move and an MLA.
The only target to use the issue-based vector costs is Neoverse V1,
which would generally prefer SVE in this case anyway. I therefore
don't have a self-contained testcase. However, the distinction
becomes more important with a later patch.
gcc/
* config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags
parameter. Detect cases in which an Advanced SIMD MLA would almost
certainly require a MOV.
(aarch64_count_ops): Update accordingly.
When the vectoriser scalarises a strided store, it counts one
scalar_store for each element plus one vec_to_scalar extraction
for each element. However, extracting element 0 is free on AArch64,
so it should have zero cost.
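The accounting change can be illustrated with a toy cost function for a single vector of n elements. The struct, the function names and the cost values used with them are hypothetical and do not come from any tuning table.

```cpp
// Hypothetical per-operation costs.
struct StoreCosts
{
  unsigned scalar_store;    // cost of one scalar store
  unsigned vec_to_scalar;   // cost of extracting one element
};

// Previous accounting: every element pays for a store and an extraction.
unsigned strided_store_cost_old(unsigned n, StoreCosts c)
{
  return n * (c.scalar_store + c.vec_to_scalar);
}

// Adjusted accounting: extracting element 0 is treated as free, so only
// n - 1 extractions are charged.
unsigned strided_store_cost_new(unsigned n, StoreCosts c)
{
  if (n == 0)
    return 0;
  return n * c.scalar_store + (n - 1) * c.vec_to_scalar;
}
```

For four elements with a store cost of 1 and an extraction cost of 2, the estimate drops from 12 to 10.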
I don't have a testcase that requires this for existing -mtune
options, but it becomes more important with a later patch.
gcc/
* config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
function, split out from...
(aarch64_detect_vector_stmt_subtype): ...here.
(aarch64_add_stmt_cost): Treat extracting element 0 as free.
This patch adds tuning fields for the total cost of a gather load
instruction. Until now, we've costed them as one scalar load
per element instead. Those scalar_load-based values are also
what the patch uses to fill in the new fields for existing
cost structures.
gcc/
* config/aarch64/aarch64-protos.h (sve_vec_cost):
Add gather_load_x32_cost and gather_load_x64_cost.
* config/aarch64/aarch64.c (generic_sve_vector_cost)
(a64fx_sve_vector_cost, neoversev1_sve_vector_cost): Update
accordingly, using the values given by the scalar_load * number
of elements calculation that we used previously.
(aarch64_detect_vector_stmt_subtype): Use the new fields.
This patch splits the SVE-specific part of aarch64_adjust_body_cost
out into its own subroutine, so that a future patch can call it
more than once. I wondered about using a lambda to avoid having
to pass all the arguments, but in the end this way seemed clearer.
gcc/
* config/aarch64/aarch64.c (aarch64_adjust_body_cost_sve): New
function, split out from...
(aarch64_adjust_body_cost): ...here.
aarch64: Add a simple fixed-point class for costing
This patch adds a simple fixed-point class for holding fractional
cost values. It can exactly represent the reciprocal of any
single-vector SVE element count (including the non-power-of-2 ones).
This means that it can also hold 1/N for all N in [1, 16], which should
be enough for the various *_per_cycle fields.
For now the assumption is that the number of possible reciprocals
is fixed at compile time and so the class should always be able
to hold an exact value.
The class uses a uint64_t to hold the fixed-point value, which means
that it can hold any scaled uint32_t cost. Normally we don't worry
about overflow when manipulating raw uint32_t costs, but just to be
on the safe side, the class uses saturating arithmetic for all
operations.
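A minimal sketch of such a class follows. It is not the contents of the new header: the name fixed_cost and the scale factor are assumptions. The scale used here is the least common multiple of 1..16, which is one way to make 1/N exact for every N in [1, 16].

```cpp
#include <cassert>
#include <cstdint>

// Assumed scale: lcm(1..16) = 720720, so COST_SCALE / N is exact for
// every N in [1, 16].
constexpr uint64_t COST_SCALE = 720720;

// Hypothetical fixed-point cost.  The stored integer is the real value
// multiplied by COST_SCALE.
class fixed_cost
{
public:
  // Any whole uint32_t cost fits: 2^32 * 720720 is below 2^64.
  constexpr fixed_cost (uint32_t whole = 0) : m_value (whole * COST_SCALE) {}

  // Exact reciprocal of N, valid for N in [1, 16].
  static fixed_cost reciprocal (uint32_t n)
  {
    assert (n >= 1 && n <= 16);
    return from_raw (COST_SCALE / n);
  }

  // Saturating addition: clamp to the largest representable value.
  fixed_cost operator+ (fixed_cost other) const
  {
    uint64_t sum = m_value + other.m_value;
    return from_raw (sum < m_value ? UINT64_MAX : sum);
  }

  // Saturating subtraction: clamp at zero.
  fixed_cost operator- (fixed_cost other) const
  {
    return from_raw (m_value > other.m_value ? m_value - other.m_value : 0);
  }

  // Saturating multiplication by an integer count.
  fixed_cost operator* (uint32_t n) const
  {
    if (n != 0 && m_value > UINT64_MAX / n)
      return from_raw (UINT64_MAX);
    return from_raw (m_value * n);
  }

  bool operator== (fixed_cost other) const { return m_value == other.m_value; }
  bool operator< (fixed_cost other) const { return m_value < other.m_value; }

  // Round up to a whole number of cost units.
  uint64_t ceil () const
  {
    return m_value / COST_SCALE + (m_value % COST_SCALE != 0);
  }

private:
  static fixed_cost from_raw (uint64_t raw)
  {
    fixed_cost c;
    c.m_value = raw;
    return c;
  }

  uint64_t m_value;
};
```

In this sketch reciprocal (3) * 3 compares equal to fixed_cost (1), which a binary floating-point value could not guarantee.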
As far as the changes to the cost routines themselves go:
- The changes to aarch64_add_stmt_cost and its subroutines are
just laying groundwork for future patches; no functional change
intended.
- The changes to aarch64_adjust_body_cost mean that we now
take fractional differences into account.
gcc/
* config/aarch64/fractional-cost.h: New file.
* config/aarch64/aarch64.c: Include <algorithm> (indirectly)
and cost_fraction.h.
(vec_cost_fraction): New typedef.
(aarch64_detect_scalar_stmt_subtype): Use it for statement costs.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_sve_adjust_stmt_cost, aarch64_adjust_stmt_cost): Likewise.
(aarch64_estimate_min_cycles_per_iter): Use vec_cost_fraction
for cycle counts.
(aarch64_adjust_body_cost): Likewise.
(aarch64_test_cost_fraction): New function.
(aarch64_run_selftests): Call it.
aarch64: Turn sve_width tuning field into a bitmask
The tuning structures have an sve_width field that specifies the
number of bits in an SVE vector (or SVE_NOT_IMPLEMENTED if not
applicable). This patch turns the field into a bitmask so that
it can specify multiple widths at the same time. For now we
always treat the minimum width as the likely width.
An alternative would have been to add extra fields, which would
have coped correctly with non-power-of-2 widths. However,
we're very far from supporting constant non-power-of-2 vectors
in GCC, so I think the non-power-of-2 case will in reality always
have to be hidden behind VLA.
gcc/
* config/aarch64/aarch64-protos.h (tune_params::sve_width): Turn
into a bitmask.
* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Update
accordingly.
(aarch64_estimated_poly_value): Likewise. Use the least significant
set bit for the minimum and likely values. Use the most significant
set bit for the maximum value.
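The bit selection described in the ChangeLog entry can be sketched as follows. The function names are hypothetical, and the sketch assumes that each width in bits is a power of two, so that a set of widths can be formed by ORing the widths together.

```cpp
#include <cassert>

// Smallest width in the mask: isolate the least significant set bit.
inline unsigned
min_sve_width (unsigned width_mask)
{
  return width_mask & (~width_mask + 1u);
}

// Largest width in the mask: clear low set bits until one bit remains.
inline unsigned
max_sve_width (unsigned width_mask)
{
  while (width_mask & (width_mask - 1u))
    width_mask &= width_mask - 1u;
  return width_mask;
}
```

For a mask of 256 | 512, the minimum (and likely) width is 256 and the maximum is 512.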
liuhongt [Tue, 3 Aug 2021 05:22:11 +0000 (13:22 +0800)]
Add cond_add/sub/mul for vector integer modes.
gcc/ChangeLog:
* config/i386/sse.md (cond_<insn><mode>): New expander.
(cond_mul<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/cond_op_addsubmul_d-1.c: New test.
* gcc.target/i386/cond_op_addsubmul_d-2.c: New test.
* gcc.target/i386/cond_op_addsubmul_q-1.c: New test.
* gcc.target/i386/cond_op_addsubmul_q-2.c: New test.
* gcc.target/i386/cond_op_addsubmul_w-1.c: New test.
* gcc.target/i386/cond_op_addsubmul_w-2.c: New test.
Mosè Giordano [Fri, 18 Jun 2021 23:46:44 +0000 (23:46 +0000)]
Fix bashism in `libsanitizer/configure.tgt'
Appending to a string variable with `+=' is a bashism and does not work in
strict POSIX shells like dash. As a result, the extra compilation flags are not
set correctly. This patch replaces the `+=' syntax with a simple string
interpolation to append to the `EXTRA_CXXFLAGS' variable.
libsanitizer/ChangeLog
PR sanitizer/101111
* configure.tgt: Fix bashism in setting of `EXTRA_CXXFLAGS'.
Jakub Jelinek [Tue, 3 Aug 2021 10:44:17 +0000 (12:44 +0200)]
analyzer: Fix ICE on MD builtin [PR101721]
The following testcase ICEs because DECL_FUNCTION_CODE asserts the builtin
is BUILT_IN_NORMAL, but it sees a backend (MD) builtin instead.
The FE, normal and MD builtin numbers overlap, so one should always
check what kind of builtin it is before looking at specific codes.
On the other side, region-model.cc has:
if (fndecl_built_in_p (callee_fndecl, BUILT_IN_NORMAL)
&& gimple_builtin_call_types_compatible_p (call, callee_fndecl))
switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
which IMO should use DECL_FUNCTION_CODE instead, since it has already
checked that this is a normal builtin...
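The rule "check the kind of builtin before looking at its code" can be illustrated with a self-contained model. Nothing here is GCC's API: builtin_class, fn_decl and both code constants are hypothetical stand-ins, chosen so that a normal builtin and an MD builtin share the same number.

```cpp
#include <cassert>

// Hypothetical model of the builtin classes whose code numbers overlap.
enum class builtin_class { not_builtin, frontend, machine, normal };

struct fn_decl
{
  builtin_class cls;  // which numbering space CODE belongs to
  unsigned code;      // only meaningful together with CLS
};

// Two unrelated builtins that happen to share a number (assumed values).
constexpr unsigned NORMAL_MALLOC_CODE = 7;
constexpr unsigned MD_EXAMPLE_CODE = 7;

// Check the class first; only then is the code safe to interpret.
inline bool
known_allocator_p (const fn_decl &fn)
{
  if (fn.cls != builtin_class::normal)
    return false;
  switch (fn.code)
    {
    case NORMAL_MALLOC_CODE:
      return true;
    default:
      return false;
    }
}
```

Without the class check, the MD builtin in this model would be mistaken for the allocator because the two codes are equal.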
2021-08-03 Jakub Jelinek <jakub@redhat.com>
PR analyzer/101721
* sm-malloc.cc (known_allocator_p): Only check DECL_FUNCTION_CODE on
BUILT_IN_NORMAL builtins.
Kewen Lin [Tue, 3 Aug 2021 03:12:00 +0000 (22:12 -0500)]
tree-cfg: Fix typos on dloop in move_sese_region_to_fn
As mentioned in [1], there is one pre-existing issue that predates
the refactoring of FOR_EACH_LOOP_FN. The macro always sets the
given LOOP to NULL at the end of the iteration unless there is
an early break inside. There is no early break here, so dloop
is NULL once the loop finishes. It is still NULL after the
refactoring.
I tried to debug the test case gcc.dg/graphite/pr83359.c
with commit 555758de90074 (also reproduced the ICE with
555758de90074~), and noticed the compilation of the test
case only covers the hunk:
It doesn't touch the if-condition hunk that increments
"moved_orig_loop_num[dloop->orig_loop_num]". So the
following hunk, guarded with
if (moved_orig_loop_num[orig_loop_num] == 2)
which dereferences dloop, doesn't get executed. That
explains why the problem wasn't exposed before.
Looking at the code that uses dloop, I think it's a
copy-and-paste typo: the modified assertion code uses the
same wording as the condition check above. In that context,
the expected original loop number has already been assigned
to the variable orig_loop_num, extracted from arg0 of the
call IFN_LOOP_DIST_ALIAS.
* gcc.target/i386/cond_op_addsubmuldiv_double-1.c: New test.
* gcc.target/i386/cond_op_addsubmuldiv_double-2.c: New test.
* gcc.target/i386/cond_op_addsubmuldiv_float-1.c: New test.
* gcc.target/i386/cond_op_addsubmuldiv_float-2.c: New test.
Patrick Palka [Mon, 2 Aug 2021 19:30:15 +0000 (15:30 -0400)]
libstdc++: Add missing std::move to ranges::copy/move/reverse_copy [PR101599]
In passing, this also renames the template parameter _O2 to _Out2 in
ranges::partition_copy and uglifies two of its function parameters,
out_true and out_false.
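Why the std::move matters in such a return statement can be shown with a reduced example. It is a sketch, not the libstdc++ code: move_only_sink, in_out_result and copy_ints are hypothetical names standing in for a move-only output iterator, the algorithm's result type and the algorithm.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical move-only output sink that appends to a vector.
class move_only_sink
{
public:
  explicit move_only_sink (std::vector<int> &dest) : m_dest (&dest) {}
  move_only_sink (move_only_sink &&) = default;
  move_only_sink &operator= (move_only_sink &&) = default;
  move_only_sink (const move_only_sink &) = delete;
  move_only_sink &operator= (const move_only_sink &) = delete;

  void put (int value) { m_dest->push_back (value); }

private:
  std::vector<int> *m_dest;
};

// Hypothetical result type pairing the final input and output positions.
template<typename In, typename Out>
struct in_out_result
{
  In in;
  Out out;
};

template<typename Out>
in_out_result<const int *, Out>
copy_ints (const int *first, const int *last, Out out)
{
  for (; first != last; ++first)
    out.put (*first);
  // OUT initialises a member of the result rather than being the returned
  // object itself, so it is not moved implicitly.  Without std::move this
  // line would try to copy OUT and fail to compile for a move-only type.
  return {first, std::move (out)};
}
```

Calling copy_ints with a move_only_sink compiles only because of the explicit std::move in the return statement.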
PR libstdc++/101599
libstdc++-v3/ChangeLog:
* include/bits/ranges_algo.h (__reverse_copy_fn::operator()):
Add missing std::move in return statement.
(__partition_copy_fn::operator()): Rename template parameter
_O2 to _Out2. Uglify function parameters out_true and out_false.
* include/bits/ranges_algobase.h (__copy_or_move): Add missing
std::move to recursive call that unwraps a __normal_iterator
output iterator.
* testsuite/25_algorithms/copy/constrained.cc (test06): New test.
* testsuite/25_algorithms/move/constrained.cc (test05): New test.
Patrick Palka [Mon, 2 Aug 2021 19:30:13 +0000 (15:30 -0400)]
libstdc++: Fix up implementation of LWG 3533 [PR101589]
In r12-569 I accidentally applied the LWG 3533 change for
elements_view::iterator::base to elements_view::base instead.
This patch corrects this, and also applies the corresponding LWG 3533
change to lazy_split_view::inner-iter::base now that we implement P2210.
PR libstdc++/101589
libstdc++-v3/ChangeLog:
* include/std/ranges (lazy_split_view::_InnerIter::base): Make
the const& overload unconstrained and return a const reference
as per LWG 3533. Make unconditionally noexcept.
(elements_view::base): Revert accidental r12-569 change.
(elements_view::_Iterator::base): Make the const& overload
unconstrained and return a const reference as per LWG 3533.
Make unconditionally noexcept.
H.J. Lu [Mon, 2 Aug 2021 17:01:47 +0000 (10:01 -0700)]
x86: Also pass -mno-avx to sw-1.c for ia32
Also pass -mno-avx to sw-1.c for ia32 since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
the stack.
* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
H.J. Lu [Mon, 2 Aug 2021 17:01:46 +0000 (10:01 -0700)]
x86: Update piecewise move and store
We can use TImode/OImode/XImode integers for piecewise move and store.
1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define MOVE_MAX to the maximum number of bytes we can move from memory
to memory in one reasonably fast instruction. The difference between
MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX must be a constant,
independent of compiler options, since it is used in reload.h to define
struct target_reload and MOVE_MAX can vary, depending on compiler options.
3. When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed since a vector register spill isn't
required for piecewise move and store. Since stack_realign_needed is
set to true by checking stack_alignment_estimated set by pseudo vector
register usage, we also need to check stack_realign_needed to eliminate
the frame pointer.
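The distinction in points 1 and 2 can be sketched as a compile-time bound next to an option-dependent value. This is an illustration only: target_opts, move_max_bytes and reload_buffer are hypothetical, and the conditions used to pick a width are assumptions rather than the actual definition in i386.h.

```cpp
#include <cassert>

// Compile-time upper bound, usable for sizing static data structures.
constexpr unsigned MAX_MOVE_MAX_BYTES = 64;

// Hypothetical view of the options that affect the usable register width.
struct target_opts
{
  bool use_512bit;      // XImode (64-byte) integers are usable
  bool use_256bit;      // OImode (32-byte) integers are usable
  bool use_128bit;      // TImode (16-byte) integers are usable
  unsigned word_bytes;  // fallback: size of a general-purpose register
};

// Option-dependent value: the widest integer the enabled registers hold.
inline unsigned
move_max_bytes (const target_opts &opts)
{
  if (opts.use_512bit)
    return 64;
  if (opts.use_256bit)
    return 32;
  if (opts.use_128bit)
    return 16;
  return opts.word_bytes;
}

// An array bound must be a constant, so it uses the fixed upper bound
// rather than the option-dependent value.
struct reload_buffer
{
  unsigned char bytes[MAX_MOVE_MAX_BYTES];
};
```

Whatever the options, move_max_bytes never exceeds MAX_MOVE_MAX_BYTES, which is what lets the constant size a structure shared by all configurations.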
gcc/
* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
check stack_realign_needed for stack realignment.
(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
than the largest integer supported by vector register.
* config/i386/i386.h (MAX_MOVE_MAX): New. Set to 64.
(MOVE_MAX): Set to bytes of the largest integer supported by
vector register.
(STORE_MAX_PIECES): New.
gcc/testsuite/
* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-24.c: Likewise.
* gcc.target/i386/pr90773-25.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
XMM movd to store 4 bytes.
* gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
YMM registers.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Expect YMM registers.
* gcc.target/i386/pr100865-10b.c: Likewise.
H.J. Lu [Mon, 2 Aug 2021 17:01:46 +0000 (10:01 -0700)]
x86: Avoid stack realignment when copying data
To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.
gcc/
* config/i386/i386-expand.c (ix86_expand_vector_move): Call
ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
data from one memory location to another.
Aldy Hernandez [Mon, 2 Aug 2021 13:12:30 +0000 (15:12 +0200)]
Remove --param=threader-iterative.
This was meant to be an internal construct, but I see folks are using
it and submitting PRs against it. Let's just remove this to avoid
further confusion.
Tom de Vries [Wed, 28 Jul 2021 13:44:54 +0000 (15:44 +0200)]
[gcc/doc] Improve nonnull attribute documentation
Improve nonnull attribute documentation in a number of ways:
Reorganize discussion of effects into:
- effects for calls to functions with nonnull-marked parameters, and
- effects for function definitions with nonnull-marked parameters.
This makes it clear that -fno-delete-null-pointer-checks has no effect for
optimizations based on nonnull-marked parameters in function definitions
(see PR100404).
Patrick Palka [Mon, 2 Aug 2021 13:59:56 +0000 (09:59 -0400)]
c++: Improve memory usage of subsumption [PR100828]
Constraint subsumption is implemented in two steps. The first step
computes the disjunctive (or conjunctive) normal form of one of the
constraints, and the second step verifies that each clause in the
decomposed form implies the other constraint. Performing these two
steps separately is problematic because in the first step the DNF/CNF
can be exponentially larger than the original constraint, and by
computing it ahead of time we'd have to keep all of it in memory.
This patch fixes this exponential blowup in memory usage by interleaving
the two steps, so that as soon as we decompose one clause we check
implication for it. In turn, memory usage during subsumption is now
worst case linear in the size of the constraints rather than
exponential, and so we can safely remove the hard limit of 16 clauses
without introducing runaway memory usage on some inputs. (Note the
_time_ complexity of subsumption is still exponential in the worst case.)
In order for this to work we need to make formula::branch() insert the
copy of the current clause directly after the current clause rather than
at the end of the list, so that we fully decompose a clause shortly
after creating it. Otherwise we'd end up accumulating exponentially
many (partially decomposed) clauses in memory anyway.
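The interleaving can be modelled on a toy formula representation. The sketch below is not the code in logic.cc: expr, clause and check_each_clause are hypothetical, and the check callback stands in for the implication test. It shows the two ingredients named above, branching by inserting the copy directly after the current clause and erasing each clause as soon as it has been checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <list>
#include <set>
#include <vector>

// Hypothetical expression tree: an atom, or a binary AND/OR node.
struct expr
{
  enum kind_t { ATOM, AND, OR } kind;
  int atom;                // atom id, used when kind == ATOM
  const expr *lhs, *rhs;   // operands, used for AND and OR
};

struct clause
{
  std::vector<const expr *> pending;  // terms still to decompose
  std::set<int> atoms;                // atoms already in this clause
};

// Enumerate the DNF clauses of E one at a time.  Each finished clause is
// passed to CHECK and erased immediately.  MAX_LIVE records the peak number
// of clauses held in memory.  Returns false as soon as a check fails.
inline bool
check_each_clause (const expr &e,
                   const std::function<bool (const std::set<int> &)> &check,
                   std::size_t &max_live)
{
  std::list<clause> work;
  work.push_back (clause{{&e}, {}});
  max_live = 1;
  while (!work.empty ())
    {
      auto cur = work.begin ();
      if (cur->pending.empty ())
        {
          // Fully decomposed: check it now and free it.
          if (!check (cur->atoms))
            return false;
          work.erase (cur);
          continue;
        }
      const expr *term = cur->pending.back ();
      cur->pending.pop_back ();
      switch (term->kind)
        {
        case expr::ATOM:
          cur->atoms.insert (term->atom);
          break;
        case expr::AND:
          cur->pending.push_back (term->rhs);
          cur->pending.push_back (term->lhs);
          break;
        case expr::OR:
          {
            // Branch: place the copy directly after the current clause.
            auto copy = work.insert (std::next (cur), *cur);
            cur->pending.push_back (term->lhs);
            copy->pending.push_back (term->rhs);
          }
          break;
        }
      max_live = std::max (max_live, work.size ());
    }
  return true;
}
```

For (a || b) && (c || d) && (e || f), this model checks all eight clauses while never holding more than four at once.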
PR c++/100828
gcc/cp/ChangeLog:
* logic.cc (formula::formula): Use emplace_back instead of
push_back.
(formula::branch): Insert a copy of m_current directly after
m_current instead of at the end of the list.
(formula::erase): Define.
(decompose_formula): Remove.
(decompose_antecedents): Remove.
(decompose_consequents): Remove.
(derive_proofs): Remove.
(max_problem_size): Remove.
(diagnose_constraint_size): Remove.
(subsumes_constraints_nonnull): Rewrite directly in terms of
decompose_clause and derive_proof, interleaving decomposition
with implication checking. Remove limit on constraint complexity.
Use formula::erase to free the current clause before moving on to
the next one.
Roger Sayle [Mon, 2 Aug 2021 12:27:53 +0000 (13:27 +0100)]
Optimize x ? bswap(x) : 0 in tree-ssa-phiopt
Many thanks again to Jakub Jelinek for a speedy fix for PR 101642.
Interestingly, that test case "bswap16(x) ? : x" also reveals a
missed optimization opportunity. The resulting "x ? bswap(x) : 0"
can be further simplified to just bswap(x).
Conveniently, tree-ssa-phiopt.c already recognizes/optimizes the
related "x ? popcount(x) : 0", so this patch simply makes that
transformation more general, additionally handling bswap, parity,
ffs and clrsb. All of the required infrastructure is already
present thanks to Jakub previously adding support for clz/ctz.
To reflect this generalization, the name of the function is changed
from cond_removal_in_popcount_clz_ctz_pattern to the hopefully
equally descriptive cond_removal_in_builtin_zero_pattern.
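The source-level patterns involved look roughly like the pairs below. The function names are hypothetical, and the sketch relies on the GCC builtins __builtin_bswap32, __builtin_parity and __builtin_ffs. Each guarded form returns the same value as the direct call because the builtin already yields 0 for a zero argument. clrsb is left out because its value at zero is not zero.

```cpp
#include <cassert>
#include <cstdint>

// Guarded forms: the shape "x ? builtin (x) : 0".
inline uint32_t bswap_guarded (uint32_t x) { return x ? __builtin_bswap32 (x) : 0; }
inline int parity_guarded (unsigned x) { return x ? __builtin_parity (x) : 0; }
inline int ffs_guarded (int x) { return x ? __builtin_ffs (x) : 0; }

// Direct forms: what the guarded forms can be simplified to.
inline uint32_t bswap_direct (uint32_t x) { return __builtin_bswap32 (x); }
inline int parity_direct (unsigned x) { return __builtin_parity (x); }
inline int ffs_direct (int x) { return __builtin_ffs (x); }
```

Comparing the two forms on a handful of inputs, including zero, is a quick way to convince oneself that the condition is redundant.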
2021-08-02 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* tree-ssa-phiopt.c (cond_removal_in_builtin_zero_pattern):
Renamed from cond_removal_in_popcount_clz_ctz_pattern.
Add support for BSWAP, FFS, PARITY and CLRSB builtins.
(tree_ssa_phiopt_worker): Update call to function above.
gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/phi-opt-25.c: New test case.
Jason Merrill [Fri, 30 Jul 2021 20:49:03 +0000 (16:49 -0400)]
c++: ICE on anon struct with base [PR96636]
pinski pointed out that my recent change to reject anonymous structs with
bases was relevant to this PR. But we still ICEd after giving that error;
this fixes the ICE.
PR c++/96636
gcc/cp/ChangeLog:
* decl.c (fixup_anonymous_aggr): Clear TYPE_NEEDS_CONSTRUCTING
after error.