Jakub Jelinek [Thu, 10 May 2018 17:40:28 +0000 (19:40 +0200)]
re PR c++/85662 ("error: non-constant condition for static assertion" from __builtin_offsetof in C++)
PR c++/85662
* c-common.h (fold_offsetof_1): Removed.
(fold_offsetof): Add TYPE argument defaulted to size_type_node and
CTX argument defaulted to ERROR_MARK.
* c-common.c (fold_offsetof_1): Renamed to ...
(fold_offsetof): ... this. Remove wrapper function. Add TYPE
argument, convert the pointer constant to TYPE and use size_binop
with PLUS_EXPR instead of fold_build_pointer_plus if type is not
a pointer type. Adjust recursive calls.
* c-fold.c (c_fully_fold_internal): Use fold_offsetof rather than
fold_offsetof_1, pass TREE_TYPE (expr) as TYPE to it and drop the
fold_convert_loc.
* c-typeck.c (build_unary_op): Use fold_offsetof rather than
fold_offsetof_1, pass argtype as TYPE to it and drop the
fold_convert_loc.
* cp-gimplify.c (cp_fold): Use fold_offsetof rather than
fold_offsetof_1, pass TREE_TYPE (x) as TYPE to it and drop the
fold_convert.
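The following illustrates the kind of expression PR c++/85662 concerns; it is not the PR's own testcase. It uses `__builtin_offsetof` with an array-subscripted member in a constant expression. With a compiler that contains this fix, each call is expected to fold to a `size_t` constant, so the static assertion should compile. The struct `S` is hypothetical.

```cpp
#include <cstddef>

// Hypothetical aggregate used only for illustration.
struct S {
  int head;
  long v[4];
};

// __builtin_offsetof accepts an array-subscripted member designator (a GNU
// extension that Clang also supports). With the fix, each use is expected to
// fold to a size_t constant that is usable in a constant expression.
constexpr std::size_t v0_offset = __builtin_offsetof(S, v[0]);
constexpr std::size_t v2_offset = __builtin_offsetof(S, v[2]);

static_assert(v2_offset - v0_offset == 2 * sizeof(long),
              "array elements are laid out contiguously");
```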
Jakub Jelinek [Thu, 10 May 2018 07:38:24 +0000 (09:38 +0200)]
re PR tree-optimization/85699 (gcc.dg/nextafter-2.c fail)
PR tree-optimization/85699
* gcc.dg/nextafter-1.c (NO_LONG_DOUBLE): Define if not defined. Use
!NO_LONG_DOUBLE instead of __LDBL_MANT_DIG__ != 106.
* gcc.dg/nextafter-2.c: Include stdlib.h. For glibc < 2.24 define
NO_LONG_DOUBLE to 1 before including nextafter-1.c.
Eric Botcazou [Thu, 10 May 2018 07:36:38 +0000 (07:36 +0000)]
re PR c++/85400 (invalid Local Dynamic TLS relaxation for symbol defined in method)
PR c++/85400
cp/
* decl2.c (adjust_var_decl_tls_model): New static function.
(comdat_linkage): Call it on a variable.
(maybe_make_one_only): Likewise.
c-family/
* c-attribs.c (handle_visibility_attribute): Do not set no_add_attrs.
go/build, cmd/go: update to match recent changes to gc
Several recent changes to the gc version of cmd/go improve the
gofrontend support. These changes are partially copies of existing
gofrontend differences, and partially new code. This CL makes the
gofrontend match the upstream code.
The changes included here come from:
https://golang.org/cl/111575
https://golang.org/cl/111595
https://golang.org/cl/111635
https://golang.org/cl/111636
For the record, the following recent gc changes are based on code
already present in the gofrontend repo:
https://golang.org/cl/110915
https://golang.org/cl/111615
For the record, a gc change, partially based on earlier gofrontend
work, also with new gc code, was already copied to gofrontend repo in
CL 111099:
https://golang.org/cl/111097
This moves the generated list of standard library packages from
cmd/go/internal/load to go/build.
Paolo Carlini [Wed, 9 May 2018 16:19:09 +0000 (16:19 +0000)]
re PR c++/85713 (ICE in dependent_type_p, at cp/pt.c:24582 on valid code)
/cp
2018-05-09 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/85713
Revert:
2018-05-08 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/84588
* parser.c (cp_parser_parameter_declaration_list): When the
entire parameter-declaration-list is erroneous maybe call
abort_fully_implicit_template.
/testsuite
2018-05-09 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/85713
Revert:
2018-05-08 Paolo Carlini <paolo.carlini@oracle.com>
Paolo Carlini [Wed, 9 May 2018 16:17:36 +0000 (16:17 +0000)]
re PR c++/85713 (ICE in dependent_type_p, at cp/pt.c:24582 on valid code)
/cp
2018-05-09 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/85713
Revert:
2018-05-08 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/84588
* parser.c (cp_parser_parameter_declaration_list): When the
entire parameter-declaration-list is erroneous maybe call
abort_fully_implicit_template.
/testsuite
2018-05-09 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/85713
Revert:
2018-05-08 Paolo Carlini <paolo.carlini@oracle.com>
Jonathan Wakely [Wed, 9 May 2018 13:28:11 +0000 (14:28 +0100)]
Make std::function tolerate semantically non-CopyConstructible objects
To satisfy the CopyConstructible requirement a callable object stored in
a std::function must behave the same when copied from a const or
non-const source. If copying a non-const object doesn't produce an
equivalent copy then the behaviour is undefined. But we can make our
std::function more tolerant of such objects by ensuring we always copy
from a const lvalue.
Additionally use an if constexpr statement in the _M_get_pointer
function to avoid unnecessary instantiations in the discarded branch.
* include/bits/std_function.h (_Base_manager::_M_get_pointer):
Use constexpr if in C++17 mode.
(_Base_manager::_M_clone(_Any_data&, const _Any_data&, true_type)):
Copy from const object.
* testsuite/20_util/function/cons/non_copyconstructible.cc: New.
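The idea behind the `_M_clone` change can be sketched outside libstdc++. When a callable's copy constructor behaves differently for const and non-const sources, copying from a const lvalue always selects the faithful overload. `Odd`, `clone_from_const` and `clone_naive` are hypothetical names, and this is not libstdc++'s implementation.

```cpp
// A deliberately pathological callable: copying from a non-const lvalue does
// not produce an equivalent object, so it is not semantically
// CopyConstructible.
struct Odd {
  int tag;
  constexpr explicit Odd(int t) : tag(t) {}
  constexpr Odd(const Odd& o) : tag(o.tag) {}  // faithful copy
  constexpr Odd(Odd&) : tag(-1) {}             // "copy" that loses state
  constexpr int operator()() const { return tag; }
};

// Copying from a const lvalue, as the patched _M_clone does, selects the
// const copy constructor.
template <typename F>
constexpr F clone_from_const(const F& src) { return F(src); }

// Copying from a non-const lvalue selects Odd(Odd&) instead.
template <typename F>
constexpr F clone_naive(F& src) { return F(src); }

constexpr int tag_after_const_clone() { Odd o(7); return clone_from_const(o)(); }
constexpr int tag_after_naive_clone() { Odd o(7); return clone_naive(o)(); }
```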
Richard Biener [Wed, 9 May 2018 13:04:00 +0000 (13:04 +0000)]
tree-vect-slp.c (vect_bb_slp_scalar_cost): Fill a cost vector.
2018-05-09 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Fill a cost
vector.
(vect_bb_vectorization_profitable_p): Adjust. Compute
actual scalar cost using the cost vector and the add_stmt_cost
machinery.
rs6000: Give an argument to every REG_CFA_REGISTER (PR85645)
The one for the prologue mflr did not have any value set, which means
use the SET that is in the insn pattern. This works fine, except when
some late pass decides to replace the SET_SRC -- this changes the
meaning of the REG_CFA_REGISTER! Such passes should not do these
things, but let's be more explicit here, for extra robustness. It
could be argued that this defaulting is a design misfeature (it does
not save much space either).
PR rtl-optimization/85645
* config/rs6000/rs6000.c (rs6000_emit_prologue_components): Put a SET
in the REG_CFA_REGISTER note for LR, don't leave it empty.
In the testcase for PR85645 we do a pretty dumb placement of the
prologue/epilogue for the LR component: we place an epilogue for LR
before a control flow split where one of the branches clobbers LR
eventually, and the other does not. The branch that does clobber it
will need a prologue again some time later. Because saving and
restoring LR is a two-step process (it needs to be moved via a GPR),
the backend emits CFI directives so that we get correct unwind
information. But neither regcprop nor regrename properly handles
such CFI directives, leading to ICEs.
Now, neither of the two branches needs to have LR restored at all,
because both of the branches end up in an infinite loop.
This patch makes spread_components return a boolean saying if anything
was changed, and if so, it is called again. This obviously terminates
(there is a finite number of basic blocks, each with a finite number
of components, and spread_components can only assign more components
to a block, never fewer). I also instrumented the code, and on a
bootstrap+regtest spread_components made changes a maximum of two
times. Interestingly, though, it made changes on two iterations in
a third of the cases in which it did anything at all!
PR rtl-optimization/85645
* shrink-wrap.c (spread_components): Return a boolean saying if
anything was changed.
(try_shrink_wrapping_separate): Iterate spread_components until
nothing changes anymore.
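The termination argument above (components are only ever added, and there are finitely many) is the usual one for a monotone fixed-point iteration. The toy model below mirrors only the "iterate until nothing changes" loop. Blocks carry a bitset of components, and each pass copies a successor's components into its predecessor. `spread_once` and `spread_to_fixpoint` are hypothetical. GCC's `spread_components` uses different propagation rules.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Components = std::bitset<8>;                 // one bit per component
using Edge = std::pair<std::size_t, std::size_t>;  // (pred, succ)

// One pass: every predecessor inherits the components of its successor.
// Returns true if any block gained a component.
bool spread_once(std::vector<Components>& blocks, const std::vector<Edge>& edges) {
  bool changed = false;
  for (const Edge& e : edges) {
    assert(e.first < blocks.size() && e.second < blocks.size());
    Components merged = blocks[e.first] | blocks[e.second];
    if (merged != blocks[e.first]) {
      blocks[e.first] = merged;
      changed = true;
    }
  }
  return changed;
}

// Repeat until a pass changes nothing. Bits are only ever set, never cleared,
// so there can be at most (number of blocks) * 8 changing passes.
int spread_to_fixpoint(std::vector<Components>& blocks, const std::vector<Edge>& edges) {
  int changing_passes = 0;
  while (spread_once(blocks, edges))
    ++changing_passes;
  return changing_passes;
}
```

With edges processed in the order (0,1), (1,2), a component starting in block 2 needs two changing passes to reach block 0. This is similar in spirit to the "two iterations" observation above.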
regrename: Don't rename the dest of a REG_CFA_REGISTER (PR85645)
We should never change the destination of a REG_CFA_REGISTER, just
like for insns with a REG_CFA_RESTORE, because we need to have the
same call frame information on all branches that join. It is very
doubtful that renaming the scratch registers used for prologue/epilogue
will help anything either.
PR rtl-optimization/85645
* regrename.c (build_def_use): Also kill the chains that include the
destination of a REG_CFA_REGISTER note.
Changing a SET that has a REG_CFA_REGISTER note is wrong if we are
changing the SET_DEST, or if the REG_CFA_REGISTER has nil as its
argument, and maybe some other cases. It's never really useful to
propagate into such an instruction, so let's just bail whenever we
see such a note.
PR rtl-optimization/85645
* regcprop.c (copyprop_hardreg_forward_1): Don't propagate into an
insn that has a REG_CFA_REGISTER note.
We build up the input to IFN_STORE_LANES one vector at a time.
In RTL, each of these vector assignments becomes a write to
subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
eventual input to the store lanes instruction. The problem is
that RTL isn't very good at tracking liveness when things are
initialised piecemeal by subregs, so R tends to end up being
live on all paths from the entry block to the store. This in
turn leads to unnecessary spilling around calls, as well as to
excess register pressure in vector loops.
This patch adds gimple clobbers to indicate the liveness of the
IFN_STORE_LANES variable and makes sure that gimple clobbers are
expanded to rtl clobbers where useful. For consistency it also
uses clobbers to mark the point at which an IFN_LOAD_LANES
variable is no longer needed.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* cfgexpand.c (expand_clobber): New function.
(expand_gimple_stmt_1): Use it.
* tree-vect-stmts.c (vect_clobber_variable): New function,
split out from...
(vectorizable_simd_clone_call): ...here.
(vectorizable_store): Emit a clobber either side of an
IFN_STORE_LANES sequence.
(vectorizable_load): Emit a clobber after an IFN_LOAD_LANES sequence.
gcc/testsuite/
* gcc.target/aarch64/store_lane_spill_1.c: New test.
* gcc.target/aarch64/sve/store_lane_spill_1.c: Likewise.
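For reference, a loop of the following shape writes two input streams into alternating elements of one array. On AArch64 the vectoriser may implement such a store with ST2, which is where IFN_STORE_LANES comes in. Whether it actually does depends on the target options and the cost model. The function name is hypothetical, and this is not one of the new test cases.

```cpp
#include <cassert>
#include <cstddef>

// Interleave a and b into out: out[2*i] = a[i], out[2*i+1] = b[i].
// A vectoriser may group the two stores into one store-lanes operation.
void interleave(int* out, const int* a, const int* b, std::size_t n) {
  assert(n == 0 || (out && a && b));
  for (std::size_t i = 0; i < n; ++i) {
    out[2 * i] = a[i];
    out[2 * i + 1] = b[i];
  }
}
```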
Eric Botcazou [Wed, 9 May 2018 07:58:29 +0000 (07:58 +0000)]
re PR rtl-optimization/85638 (build failure for Ada runtime with SJLJ exceptions on x86)
PR rtl-optimization/85638
* bb-reorder.c: Include common/common-target.h.
(create_forwarder_block): New function extracted from...
(fix_up_crossing_landing_pad): ...here. Rename into...
(dw2_fix_up_crossing_landing_pad): ...this.
(sjlj_fix_up_crossing_landing_pad): New function.
(find_rarely_executed_basic_blocks_and_crossing_edges): In SJLJ mode,
call sjlj_fix_up_crossing_landing_pad if there are incoming EH edges
from both partitions and exit the loop after one iteration.
Jason Merrill [Wed, 9 May 2018 02:08:52 +0000 (22:08 -0400)]
PR c++/85706 - class deduction under decltype
* pt.c (for_each_template_parm_r): Handle DECLTYPE_TYPE. Clear
*walk_subtrees whether or not we walked into the operand.
(type_uses_auto): Only look at deduced contexts.
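The example below illustrates the construct named in the PR title: class template argument deduction inside `decltype`, in a dependent (template) context. It is not the PR's testcase. `make_twin` is a hypothetical name, and the code requires `-std=c++17` or later.

```cpp
#include <type_traits>
#include <utility>

// CTAD (C++17) inside decltype in a trailing return type that depends on T:
// std::pair(t, t) deduces std::pair<T, T>.
template <typename T>
auto make_twin(T t) -> decltype(std::pair(t, t)) {
  return std::pair(t, t);
}

static_assert(std::is_same_v<decltype(make_twin(1)), std::pair<int, int>>,
              "deduced through decltype");
```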
Kelvin Nilsen [Wed, 9 May 2018 00:37:35 +0000 (00:37 +0000)]
revert: extend.texi (PowerPC Built-in Functions): Rename this subsection.
2018-05-08 Kelvin Nilsen <kelvin@gcc.gnu.org>
Revert:
* doc/extend.texi (PowerPC Built-in Functions): Rename this
subsection.
(Basic PowerPC Built-in Functions): The new name of the
subsection previously known as "PowerPC Built-in Functions".
(Basic PowerPC Built-in Functions Available on all Configurations):
New subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.05): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.06): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.07): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 3.0): New
subsubsection.
Paolo Carlini [Tue, 8 May 2018 19:35:10 +0000 (19:35 +0000)]
re PR c++/84588 (internal compiler error: Segmentation fault (contains_struct_check()))
/cp
2018-05-08 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/84588
* parser.c (cp_parser_parameter_declaration_list): When the
entire parameter-declaration-list is erroneous maybe call
abort_fully_implicit_template.
/testsuite
2018-05-08 Paolo Carlini <paolo.carlini@oracle.com>
Kelvin Nilsen [Tue, 8 May 2018 17:29:52 +0000 (17:29 +0000)]
extend.texi (PowerPC Built-in Functions): Rename this subsection.
gcc/ChangeLog:
2018-05-08 Kelvin Nilsen <kelvin@gcc.gnu.org>
* doc/extend.texi (PowerPC Built-in Functions): Rename this
subsection.
(Basic PowerPC Built-in Functions): The new name of the
subsection previously known as "PowerPC Built-in Functions".
(Basic PowerPC Built-in Functions Available on all Configurations):
New subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.05): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.06): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.07): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 3.0): New
subsubsection.
Jakub Jelinek [Tue, 8 May 2018 16:17:34 +0000 (18:17 +0200)]
re PR target/85683 (GCC 8 stopped using RMW (Read Modify Write) instructions on x86[_64])
PR target/85683
* config/i386/i386.md: Add peepholes for mem {+,-,&,|,^}= x; mem != 0
after cmpelim optimization.
* gcc.target/i386/pr49095.c: Add -masm=att to dg-options. Add
scan-assembler-times checking that except for [fh]*xor other functions
don't use any load instructions.
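Here is a minimal example of the pattern these peepholes target: a read-modify-write of memory whose result is then compared with zero. On x86-64 the hope is for a single memory-destination `add` followed by a flags-based test, rather than a separate load, operation and store. The exact code depends on the GCC version and options (inspect with `-O2 -S`). The function name is hypothetical.

```cpp
#include <cassert>

// *mem += x; then report whether the updated value is non-zero.
// The flags set by a memory-destination add already encode "result == 0",
// so no separate load or test instruction is strictly needed.
bool add_and_test(int* mem, int x) {
  assert(mem != nullptr);
  *mem += x;
  return *mem != 0;
}
```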
Jonathan Wakely [Tue, 8 May 2018 13:05:04 +0000 (14:05 +0100)]
PR libstdc++/85672 #undef _GLIBCXX_USE_FLOAT128 when not supported
Restore the behaviour in GCC 8 and earlier where _GLIBCXX_USE_FLOAT128
is not defined when configure detects support is missing. This avoids
having three states where the macro is either 1, 0, or undefined.
PR libstdc++/85672
* include/Makefile.am [!ENABLE_FLOAT128]: Change c++config.h entry
to #undef _GLIBCXX_USE_FLOAT128 instead of defining it to zero.
* include/Makefile.in: Regenerate.
* include/bits/c++config (_GLIBCXX_USE_FLOAT128): Move definition
within conditional block.
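A three-state macro is awkward because `#ifdef` and `#if` disagree about a macro defined to 0. The macro names below are hypothetical stand-ins for the configure outcomes, not libstdc++'s own.

```cpp
// Hypothetical stand-ins: one macro defined to 0, one left undefined.
#define FEATURE_DISABLED 0
// FEATURE_UNKNOWN is intentionally not defined.

// #ifdef only asks "is it defined?", so a macro defined to 0 still passes.
#ifdef FEATURE_DISABLED
constexpr bool disabled_passes_ifdef = true;
#else
constexpr bool disabled_passes_ifdef = false;
#endif

// #if evaluates the value; an undefined identifier evaluates as 0.
#if FEATURE_DISABLED
constexpr bool disabled_passes_if = true;
#else
constexpr bool disabled_passes_if = false;
#endif

#if FEATURE_UNKNOWN
constexpr bool unknown_passes_if = true;
#else
constexpr bool unknown_passes_if = false;
#endif
```

When the macro is either defined to 1 or left undefined, as after this change, `#ifdef` and `#if` give the same answer.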
Jakub Jelinek [Tue, 8 May 2018 12:16:19 +0000 (14:16 +0200)]
re PR target/85572 (faster code for absolute value of __v2di)
PR target/85572
* config/i386/i386.c (ix86_expand_sse2_abs): Handle E_V2DImode and
E_V4DImode.
* config/i386/sse.md (abs<mode>2): Use VI_AVX2 iterator instead of
VI1248_AVX512VL_AVX512BW. Handle V2DImode and V4DImode if not
TARGET_AVX512VL using ix86_expand_sse2_abs. Formatting fixes.
* g++.dg/other/sse2-pr85572-1.C: New test.
* g++.dg/other/sse2-pr85572-2.C: New test.
* g++.dg/other/sse4-pr85572-1.C: New test.
* g++.dg/other/avx2-pr85572-1.C: New test.
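SSE2 has no 64-bit element `abs`, so it has to be synthesised. A common branchless identity is `abs(x) = (x ^ m) - m`, where `m = x >> 63` is all ones in negative lanes. The sketch below shows it with the GCC/Clang vector extension. It illustrates the arithmetic only, not necessarily the sequence `ix86_expand_sse2_abs` emits. SSE2 also lacks a 64-bit arithmetic right shift, so real code has to obtain the mask another way.

```cpp
#include <cstdint>

// Two 64-bit lanes, using the GCC/Clang vector extension.
typedef std::int64_t v2di __attribute__((vector_size(16)));
typedef std::uint64_t v2du __attribute__((vector_size(16)));

// Element-wise abs: m is all ones in negative lanes and zero otherwise, so
// (x ^ m) - m negates exactly the negative lanes. The subtraction is done on
// unsigned lanes so that INT64_MIN wraps instead of overflowing.
inline v2du abs_v2di(v2di x) {
  const v2di shift = {63, 63};
  const v2di m = x >> shift;  // arithmetic shift of signed lanes
  return (v2du)(x ^ m) - (v2du)m;
}

// Scalar reference for a single lane, using the same identity.
constexpr std::uint64_t abs_lane(std::int64_t x) {
  const std::int64_t m = x >> 63;
  return static_cast<std::uint64_t>(x ^ m) - static_cast<std::uint64_t>(m);
}
```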
[arm] PR target/85658 Fix operator precedence errors in parsecpu.awk
There are a number of places in parsecpu.awk where I've managed to get
the operator precedence between ! and 'in' incorrect (! binds more
tightly). In most cases this just makes a consistency test
ineffective, but in a few cases it means we fail to correctly diagnose
errors by the user (for example, when passing an invalid cpu or
architecture name to configure). This patch fixes all the cases I
could find, based on searching for all uses of the two operators in
the same expression. The tweak to the API of check_fpu is to bring it
into line with the other check functions - it now returns the result
rather than printing it directly. The caller now does the printing,
in the same way that the chkarch and chkcpu commands do.
PR target/85658
* config/arm/parsecpu.awk (check_cpu): Fix operator precedence.
(check_arch): Likewise.
(check_fpu): Return the result rather than printing it.
(end arch): Fix operator precedence.
(end cpu): Likewise.
(END): Print the result from check_fpu.
This patch adds SVE patterns that combine a PTRUE-predicated
comparison with a separate AND. The main benefit is for
optimising ANDs with the loop predicate, as in the testcase.
However, one of the potential drawbacks is that it triggers
even for cases in which two naturally-parallel comparisons
are ANDed together. Whether that's a win or a loss will
depend on the schedule, but it has the potential to be a win
more often than a loss.
The combine patterns are undeniably ugly. One way of getting
around them would be to allow 1->1 "splits" when combining
2 instructions, as well as 1->2 splits when combining more
than 2 instructions (although that wouldn't really be a split).
Another would be to have a way of defining target-specific
rtx simplifications. branches/ARM/sve-branch has a prototype
implementation of that, but it would need some clean-up before being
ready to submit. It would also be good to make it closer to the
match.pd style.
Until then, I think what the combine patterns are doing is the
"correct" implementation given the current infrastructure.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_6.c: Do not expect any ANDs.
XFAIL the BIC test.
* gcc.target/aarch64/sve/vcond_7.c: New test.
* gcc.target/aarch64/sve/vcond_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r260031
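The scalar loop below is the kind the first paragraph has in mind. When vectorised for SVE (for example with `-O3 -march=armv8.2-a+sve`), the comparison result must be combined with the loop predicate before it can guard the load of `src[i]`. That AND is what the new patterns aim to fold into the predicated comparison. The function name is hypothetical, this is not one of the vcond_7 tests, and the generated code depends on the compiler version.

```cpp
#include <cassert>
#include <cstddef>

// Copy src[i] into dst[i] wherever a[i] < b[i]. Under SVE the compare yields
// a predicate that is ANDed with the loop predicate to guard the load.
void copy_if_less(int* dst, const int* a, const int* b, const int* src,
                  std::size_t n) {
  assert(n == 0 || (dst && a && b && src));
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] < b[i])
      dst[i] = src[i];
}
```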
This patch rewrites the SVE comparison handling so that it uses
UNSPEC_MERGE_PTRUE for comparisons that are known to be predicated
on a PTRUE, for consistency with other patterns. Specific unspecs
are then only needed for truly predicated floating-point comparisons,
such as those used in the expansion of UNEQ for flag_trapping_math.
The patch also makes sure that the comparison expanders attach
a REG_EQUAL note to instructions that use UNSPEC_MERGE_PTRUE,
so passes can use that as an alternative to the unspec pattern.
(This happens automatically for optabs. The problem was that
this code emits instruction patterns directly.)
No specific benefit on its own, but it lays the groundwork for
the next patch.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/aarch64/iterators.md (UNSPEC_COND_LO, UNSPEC_COND_LS)
(UNSPEC_COND_HI, UNSPEC_COND_HS, UNSPEC_COND_UO): Delete.
(SVE_INT_CMP, SVE_FP_CMP): New code iterators.
(cmp_op, sve_imm_con): New code attributes.
(SVE_COND_INT_CMP, imm_con): Delete.
(cmp_op): Remove above unspecs from int attribute.
* config/aarch64/aarch64-sve.md (*vec_cmp<cmp_op>_<mode>): Rename
to...
(*cmp<cmp_op><mode>): ...this. Use UNSPEC_MERGE_PTRUE instead of
comparison-specific unspecs.
(*vec_cmp<cmp_op>_<mode>_ptest): Rename to...
(*cmp<cmp_op><mode>_ptest): ...this and adjust likewise.
(*vec_cmp<cmp_op>_<mode>_cc): Rename to...
(*cmp<cmp_op><mode>_cc): ...this and adjust likewise.
(*vec_fcm<cmp_op><mode>): Rename to...
(*fcm<cmp_op><mode>): ...this and adjust likewise.
(*vec_fcmuo<mode>): Rename to...
(*fcmuo<mode>): ...this and adjust likewise.
(*pred_fcm<cmp_op><mode>): New pattern.
* config/aarch64/aarch64.c (aarch64_emit_unop, aarch64_emit_binop)
(aarch64_emit_sve_ptrue_op, aarch64_emit_sve_ptrue_op_cc): New
functions.
(aarch64_unspec_cond_code): Remove handling of LTU, GTU, LEU, GEU
and UNORDERED.
(aarch64_gen_unspec_cond, aarch64_emit_unspec_cond): Delete.
(aarch64_emit_sve_predicated_cond): New function.
(aarch64_expand_sve_vec_cmp_int): Use aarch64_emit_sve_ptrue_op_cc.
(aarch64_emit_unspec_cond_or): Replace with...
(aarch64_emit_sve_or_conds): ...this new function. Use
aarch64_emit_sve_ptrue_op for the individual comparisons and
aarch64_emit_binop to OR them together.
(aarch64_emit_inverted_unspec_cond): Replace with...
(aarch64_emit_sve_inverted_cond): ...this new function. Use
aarch64_emit_sve_ptrue_op for the comparison and
aarch64_emit_unop to invert the result.
(aarch64_expand_sve_vec_cmp_float): Update after the above
changes. Use aarch64_emit_sve_ptrue_op for native comparisons.
sve/vcond_6.c was effectively testing a three-input logical operation,
since the result of BINOP needed to be ANDed with the loop predicate
before loading src[i]. This patch makes it really test a binary
operation instead. A later patch will add (and optimise) the
three-operand case.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_6.c (LOOP): Unconditionally
load from src[i].
Thomas Koenig [Tue, 8 May 2018 07:47:19 +0000 (07:47 +0000)]
re PR fortran/54613 ([F08] Add FINDLOC plus support MAXLOC/MINLOC with KIND=/BACK=)
2018-05-08 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/54613
* check.c (gfc_check_minmaxloc): Remove error for BACK not being
implemented. Use gfc_logical_4_kind for BACK.
* simplify.c (min_max_choose): Add optional argument back_val.
Handle it.
(simplify_minmaxloc_to_scalar): Add argument back_val. Pass
back_val to min_max_choose.
(simplify_minmaxloc_to_nodim): Likewise.
(simplify_minmaxloc_to_array): Likewise.
(gfc_simplify_minmaxloc): Add argument back, handle it.
Pass back_val to specific simplification functions.
(gfc_simplify_minloc): Remove ATTRIBUTE_UNUSED from argument back,
pass it on to gfc_simplify_minmaxloc.
(gfc_simplify_maxloc): Likewise.
* trans-intrinsic.c (gfc_conv_intrinsic_minmaxloc): Adjust
comment. If BACK is true, use greater than or equal (or less than or
equal) instead of greater than (or less than). Mark the condition of
having found a value which exceeds the limit as unlikely.
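The `trans-intrinsic.c` change comes down to swapping a strict comparison for a non-strict one. Scanning forward, `>` keeps the first maximum, while `>=` keeps the last, which is what `BACK=.TRUE.` asks for. The sketch below is hypothetical. Unlike Fortran's 1-based MAXLOC, it uses 0-based indices, and it does no NaN handling.

```cpp
#include <cassert>
#include <cstddef>

// Index of the maximum of v[0..n). With back == false the first maximum
// wins (strict >); with back == true the last one wins (>=).
std::size_t maxloc(const int* v, std::size_t n, bool back) {
  assert(n > 0 && v != nullptr);
  std::size_t best = 0;
  for (std::size_t i = 1; i < n; ++i)
    if (back ? v[i] >= v[best] : v[i] > v[best])
      best = i;
  return best;
}
```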
Jason Merrill [Mon, 7 May 2018 23:50:16 +0000 (19:50 -0400)]
PR c++/85646 - lambda visibility.
* decl2.c (determine_visibility): Don't mess with template arguments
from the containing scope.
(vague_linkage_p): Check DECL_ABSTRACT_P before looking at a 'tor
thunk.
Luis Machado [Mon, 7 May 2018 15:47:14 +0000 (15:47 +0000)]
re PR bootstrap/85681 (r259995 breaks bootstrap on x86_64-*-freebsd)
2018-05-07 Luis Machado <luis.machado@linaro.org>
PR bootstrap/85681
Revert:
2018-05-07 Luis Machado <luis.machado@linaro.org>
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<prefetch_dynamic_strides>: New const bool field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
prefetch_dynamic_strides.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to false.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_DYNAMIC_STRIDES.
* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
prefetch-dynamic-strides setting.
2018-05-07 Luis Machado <luis.machado@linaro.org>
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<minimum_stride>: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
minimum_stride field.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
stride is constant and is below the minimum stride threshold.
Luis Machado [Mon, 7 May 2018 14:12:54 +0000 (14:12 +0000)]
Introduce prefetch-dynamic-strides option.
The following patch adds an option to control software prefetching of memory
references with non-constant/unknown strides.
Currently we prefetch these references if the pass thinks there is benefit to
doing so. But, since this is all based on heuristics, it's not always the case
that we end up with better performance.
For Falkor there is also the problem of conflicts with the hardware prefetcher,
so we need to be more conservative in terms of what we issue software prefetch
hints for.
This also aligns GCC with what LLVM does for Falkor.
Similarly to the previous patch, the defaults guarantee no change in behavior
for other targets and architectures.
2018-05-07 Luis Machado <luis.machado@linaro.org>
gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<prefetch_dynamic_strides>: New const bool field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
prefetch_dynamic_strides.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to false.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_DYNAMIC_STRIDES.
* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
prefetch-dynamic-strides setting.
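The check this option adds can be modelled as a veto applied before the existing heuristics: a reference whose stride is not a compile-time constant is only considered when the tuning allows it. `dynamic_stride_allowed` is a hypothetical name. Per the ChangeLog, the real check lives in `should_issue_prefetch_p` and is driven by `PARAM_PREFETCH_DYNAMIC_STRIDES`.

```cpp
// Returns false when the reference must not be prefetched because its stride
// is dynamic and the tuning disabled dynamic-stride prefetching. True only
// means "not vetoed here"; the other heuristics still decide.
constexpr bool dynamic_stride_allowed(bool stride_is_constant,
                                      bool prefetch_dynamic_strides) {
  return stride_is_constant || prefetch_dynamic_strides;
}
```

Per the ChangeLog, the qdf24xx (Falkor) tuning sets `prefetch_dynamic_strides` to false, while the other tunings are meant to keep their previous behaviour.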
Luis Machado [Mon, 7 May 2018 14:08:55 +0000 (14:08 +0000)]
Introduce prefetch-minimum stride option
This patch adds a new option to control the minimum stride, for a memory
reference, above which the loop prefetch pass may issue software prefetch
hints. There are two motivations:
* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.
* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.
I've noticed a considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.
The default settings should guarantee no changes for existing targets. Those
targets are free to tweak the settings as necessary.
2018-05-07 Luis Machado <luis.machado@linaro.org>
Introduce option to limit software prefetching to known constant
strides above a specific threshold with the goal of preventing
conflicts with a hardware prefetcher.
gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<minimum_stride>: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
minimum_stride field.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
stride is constant and is below the minimum stride threshold.
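The new early exit in `should_issue_prefetch_p` can be modelled as a simple filter. A constant stride below the threshold is left to the hardware prefetcher. The struct and function names are hypothetical. Two details are assumptions of this sketch rather than statements about GCC: it compares the stride's magnitude, so negative strides are handled, and it treats a non-positive threshold as "no limit".

```cpp
// Hypothetical description of a memory reference's stride.
struct StrideInfo {
  bool is_constant;  // stride known at compile time?
  long step;         // bytes between successive accesses, if constant
};

// False when a constant stride is below the threshold (leave it to the
// hardware prefetcher); true means "not vetoed by this check".
constexpr bool passes_minimum_stride(StrideInfo ref, long minimum_stride) {
  if (!ref.is_constant || minimum_stride <= 0)
    return true;
  const long magnitude = ref.step < 0 ? -ref.step : ref.step;
  return magnitude >= minimum_stride;
}
```

With the qdf24xx value of 2048 from the ChangeLog, a 64-byte stride would be filtered out and a 4096-byte stride kept.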
Tom de Vries [Mon, 7 May 2018 11:33:45 +0000 (11:33 +0000)]
[openacc, testsuite] Allow installed testing of libgomp to find gomp-constants.h
2018-05-07 Tom de Vries <tom@codesourcery.com>
PR testsuite/85677
* testsuite/lib/libgomp.exp (libgomp_init): Move inclusion of top-level
include directory in ALWAYS_CFLAGS out of $blddir != "" condition.
re PR fortran/85507 (ICE in gfc_dep_resolver, at fortran/dependency.c:2258)
gcc/fortran/ChangeLog:
2018-05-06 Andre Vehreschild <vehre@gcc.gnu.org>
PR fortran/85507
* dependency.c (gfc_dep_resolver): Revert looking at coarray dimension
introduced by r259385.
* trans-intrinsic.c (conv_caf_send): Always report a dependency for
same variables in coarray assignments.
Roland McGrath [Sat, 5 May 2018 23:35:25 +0000 (23:35 +0000)]
PR other/77609: Let the assembler choose ELF section types for miscellaneous named sections
gcc/
PR other/77609
* varasm.c (default_section_type_flags): Set SECTION_NOTYPE for
any section for which we don't know a specific type it should have,
regardless of name. Previously this was done only for the exact
names ".init_array", ".fini_array", and ".preinit_array".
(default_elf_asm_named_section): Add comment about
relationship with default_section_type_flags and SECTION_NOTYPE.
(get_section): Don't consider it a type conflict if one side has
SECTION_NOTYPE and the other doesn't, as long as neither has the
SECTION_BSS et al used in the default_section_type_flags logic.