]> gcc.gnu.org Git - gcc.git/log
gcc.git
23 months agoopenmp: Improve handling of nested OpenMP metadirectives in C and C++
Kwok Cheung Yeung [Fri, 18 Feb 2022 19:00:57 +0000 (19:00 +0000)]
openmp: Improve handling of nested OpenMP metadirectives in C and C++

This patch fixes a misparsing issue when encountering code like:

  #pragma omp metadirective when {<selector_set>={...}: A)
    #pragma omp metadirective when (<selector_set>={...}: B)

When called for the first metadirective, analyze_metadirective_body would
stop just before the colon in the second metadirective because it naively
assumes that the '}' marks the end of a code block.

The assertion for clauses to end parsing at the same point is now disabled
if a parse error has occurred during the parsing of the clause, since some
tokens may not be consumed if a parse error cuts parsing short.

2022-02-18  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/c/
* c-parser.cc (c_parser_omp_construct): Move handling of
PRAGMA_OMP_METADIRECTIVE from here...
(c_parser_pragma): ...to here.
(analyze_metadirective_body): Check that the bracket nesting level
is also zero before stopping the adding of tokens on encountering a
close brace.
(c_parser_omp_metadirective): Modify function signature and update.
Do not assert on remaining tokens if there has been a parse error.

gcc/cp/
* parser.cc (cp_parser_omp_construct): Move handling of
PRAGMA_OMP_METADIRECTIVE from here...
(cp_parser_pragma): ...to here.
(analyze_metadirective_body): Check that the bracket
nesting level is also zero before stopping the adding of tokens on
encountering a close brace.
(cp_parser_omp_metadirective): Modify function signature and update.
Do not assert on remaining tokens if there has been a parse error.

gcc/testsuite/
* c-c++-common/gomp/metadirective-1.c (f): Add test for
improperly nested metadirectives.

23 months agoopenmp: More Fortran front-end fixes for metadirectives
Kwok Cheung Yeung [Fri, 11 Feb 2022 15:42:50 +0000 (15:42 +0000)]
openmp: More Fortran front-end fixes for metadirectives

This adds a check for declarative OpenMP directives in metadirective
variants (already present in the C/C++ front-ends), and fixes an
ICE when an empty metadirective (i.e. just '!$omp metadirective')
is presented.

2022-02-11  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/fortran/
* gfortran.h (is_omp_declarative_stmt): New.
* openmp.cc (match_omp_metadirective): Reject declarative OpenMP
directives with 'sorry'.
* parse.cc (parse_omp_metadirective_body): Check that state stack head
is non-null before dereferencing.
(is_omp_declarative_stmt): New.

gcc/testsuite/
* gfortran.dg/gomp/metadirective-2.f90 (main): Test empty
metadirective.

23 months agoopenmp: Eliminate non-matching metadirective variants early in Fortran front-end
Kwok Cheung Yeung [Fri, 11 Feb 2022 11:20:18 +0000 (11:20 +0000)]
openmp: Eliminate non-matching metadirective variants early in Fortran front-end

This patch checks during parsing if a metadirective selector is both
resolvable and non-matching - if so, it is removed from further
consideration.  This is both more efficient, and avoids spurious
syntax errors caused by considering combinations of selectors that
lead to invalid combinations of OpenMP directives, when that
combination would never arise in the first place.

This exposes another bug - when metadirectives that are not of the
begin-end variety are nested, we might have to drill up through
multiple layers of the state stack to reach the state for the
next statement.  This is now fixed.

2022-02-11  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* omp-general.cc (DELAY_METADIRECTIVES_AFTER_LTO): Check that cfun is
non-null before derefencing.

gcc/fortran/
* decl.cc (gfc_match_end): Search for first previous state that is not
COMP_OMP_METADIRECTIVE.
* gfortran.h (gfc_skip_omp_metadirective_clause): Add prototype.
* openmp.cc (match_omp_metadirective): Skip clause if
result of gfc_skip_omp_metadirective_clause is true.
* trans-openmp.cc (gfc_trans_omp_set_selector): Add argument and
disable expression conversion if false.
(gfc_skip_omp_metadirective_clause): New.

gcc/testsuite/
* gfortran.dg/gomp/metadirective-8.f90: New.

23 months agoopenmp: Add warning when functions containing metadirectives with 'construct={target...
Kwok Cheung Yeung [Fri, 28 Jan 2022 13:56:33 +0000 (13:56 +0000)]
openmp: Add warning when functions containing metadirectives with 'construct={target}' called directly

void f(void)
{
  #pragma omp metadirective \
    when (construct={target}: A) \
    default (B)
    ...
}
...
{
  #pragma omp target
    f(); // Target call

  f(); // Local call
}

With the OpenMP 5.0/5.1 specifications, we would expect A to be selected in
the metadirective when the target call is made, but B when f is called
directly outside of a target context.  However, since GCC does not have
separate copies of f for local and target calls, and the construct selector
is static, it must be resolved one way or the other at compile-time (currently
in the favour of selecting A), which may be unexpected behaviour.

This patch attempts to detect the above situation, and will emit a warning
if found.

2022-01-28  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* gimplify.cc (gimplify_omp_metadirective): Mark offloadable functions
containing metadirectives with 'construct={target}' in the selector.
* omp-general.cc (omp_has_target_constructor_p): New.
* omp-general.h (omp_has_target_constructor_p): New prototype.
* omp-low.cc (lower_omp_1): Emit warning if marked functions called
outside of a target context.

gcc/testsuite/
* c-c++-common/gomp/metadirective-4.c (main): Add expected warning.
* gfortran.dg/gomp/metadirective-4.f90 (test): Likewise.

libgomp/
* testsuite/libgomp.c-c++-common/metadirective-2.c (main): Add
expected warning.
* testsuite/libgomp.fortran/metadirective-2.f90 (test): Likewise.

23 months agoopenmp: Add support for 'target_device' context selector set
Kwok Cheung Yeung [Tue, 25 Jan 2022 19:50:08 +0000 (11:50 -0800)]
openmp: Add support for 'target_device' context selector set

2022-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* builtin-types.def (BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR): New
type.
* omp-builtins.def (BUILT_IN_GOMP_EVALUATE_TARGET_DEVICE): New builtin.
* omp-general.cc (omp_context_selector_matches): Handle 'target_device'
selector set.
(omp_dynamic_cond): Generate expression tree for 'target_device'
selector set.
(omp_context_compute_score): Handle selectors in 'target_device' set.

gcc/c/
* c-parser.cc (omp_target_device_selectors): New.
(c_parser_omp_context_selector): Accept 'target_device' selector set.
Treat 'device_num' selector as expression.
(c_parser_omp_context_selector_specification): Handle 'target_device'
selector set.

gcc/cp/
* parser.cc (omp_target_device_selectors): New.
(cp_parser_omp_context_selector): Accept 'target_device' selector set.
Treat 'device_num' selector as expression.
(cp_parser_omp_context_selector_specification): Handle 'target_device'
selector set.

gcc/fortran/
* openmp.cc (omp_target_device_selectors): New.
(gfc_match_omp_context_selector): Accept 'target_device' selector set.
Treat 'device_num' selector as expression.
(gfc_match_omp_context_selector_specification): Handle 'target_device'
selector set.
* types.def (BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR): New type.

gcc/testsuite/
* c-c++-common/gomp/metadirective-7.c: New.
* gfortran.dg/gomp/metadirective-7.f90: New.

libgomp/
* Makefile.am (libgomp_la_SOURCES): Add selector.c.
* Makefile.am: Regenerate.
* config/gcn/selector.c: New.
* config/linux/selector.c: New.
* config/linux/x86/selector.c: New.
* config/nvptx/selector.c: New.
* libgomp-plugin.h (GOMP_OFFLOAD_evaluate_device): New.
* libgomp.h (struct gomp_device_descr): Add evaluate_device_func field.
* libgomp.map (GOMP_5.1): Add GOMP_evaluate_target_device.
* libgomp_g.h (GOMP_evaluate_current_device): New.
(GOMP_evaluate_target_device): New.
* oacc-host.c (host_evaluate_device): New.
(host_openacc_exec): Initialize evaluate_device_func field to
host_evaluate_device.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_evaluate_device): New.
* plugin/plugin-nvptx.c (struct ptx_device): Add compute_major and
compute_minor fields.
(nvptx_open_device): Read compute capability information from device.
(CHECK_ISA): New macro.
(GOMP_OFFLOAD_evaluate_device): New.
* selector.c: New.
* target.c (GOMP_evaluate_target_device): New.
(gomp_load_plugin_for_device): Load evaulate_device plugin function.
* testsuite/libgomp.c-c++-common/metadirective-5.c: New testcase.
* testsuite/libgomp.fortran/metadirective-5.f90: New testcase.

23 months agoopenmp: Metadirective fixes
Kwok Cheung Yeung [Tue, 25 Jan 2022 19:40:58 +0000 (11:40 -0800)]
openmp: Metadirective fixes

Fix regressions introduced by block/statement skipping.

If user condition selector is constant, do not return it as a dynamic
selector.

2022-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/c/
* c-parser.cc (c_parser_skip_to_end_of_block_or_statement): Track
bracket depth separately from nesting depth.

gcc/cp/
* parser.cc (cp_parser_skip_to_end_of_statement): Revert.
(cp_parser_skip_to_end_of_block_or_statement): Track bracket depth
separately from nesting depth.

gcc/
* omp-general.cc (omp_dynamic_cond): Do not return user condition if
constant.

23 months agoopenmp: Add testcases for metadirectives
Kwok Cheung Yeung [Tue, 25 Jan 2022 19:32:08 +0000 (11:32 -0800)]
openmp: Add testcases for metadirectives

2022-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/gomp/metadirective-1.c: New.
* c-c++-common/gomp/metadirective-2.c: New.
* c-c++-common/gomp/metadirective-3.c: New.
* c-c++-common/gomp/metadirective-4.c: New.
* c-c++-common/gomp/metadirective-5.c: New.
* c-c++-common/gomp/metadirective-6.c: New.
* gcc.dg/gomp/metadirective-1.c: New.
* gfortran.dg/gomp/metadirective-1.f90: New.
* gfortran.dg/gomp/metadirective-2.f90: New.
* gfortran.dg/gomp/metadirective-3.f90: New.
* gfortran.dg/gomp/metadirective-4.f90: New.
* gfortran.dg/gomp/metadirective-5.f90: New.
* gfortran.dg/gomp/metadirective-6.f90: New.

libgomp/
* testsuite/libgomp.c-c++-common/metadirective-1.c: New.
* testsuite/libgomp.c-c++-common/metadirective-2.c: New.
* testsuite/libgomp.c-c++-common/metadirective-3.c: New.
* testsuite/libgomp.c-c++-common/metadirective-4.c: New.
* testsuite/libgomp.fortran/metadirective-1.f90: New.
* testsuite/libgomp.fortran/metadirective-2.f90: New.
* testsuite/libgomp.fortran/metadirective-3.f90: New.
* testsuite/libgomp.fortran/metadirective-4.f90: New.

23 months agoopenmp, fortran: Add Fortran support for parsing metadirectives
Kwok Cheung Yeung [Tue, 25 Jan 2022 19:24:55 +0000 (11:24 -0800)]
openmp, fortran: Add Fortran support for parsing metadirectives

This adds support for parsing OpenMP metadirectives in the Fortran front end.

2022-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* omp-general.cc (omp_check_context_selector): Revert string length
check.
(omp_context_name_list_prop): Likewise.

gcc/fortran/
* decl.cc (gfc_match_end): Handle COMP_OMP_METADIRECTIVE and
COMP_OMP_BEGIN_METADIRECTIVE.
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_METADIRECTIVE.
(show_code_node): Handle EXEC_OMP_METADIRECTIVE.
* gfortran.h (enum gfc_statement): Add ST_OMP_METADIRECTIVE,
ST_OMP_BEGIN_METADIRECTIVE and ST_OMP_END_METADIRECTIVE.
(struct gfc_omp_metadirective_clause): New structure.
(gfc_get_omp_metadirective_clause): New macro.
(struct gfc_st_label): Add omp_region field.
(enum gfc_exec_op): Add EXEC_OMP_METADIRECTIVE.
(struct gfc_code): Add omp_metadirective_clauses field.
(gfc_free_omp_metadirective_clauses): New prototype.
(match_omp_directive): New prototype.
* io.cc (format_asterisk): Initialize omp_region field.
* match.h (gfc_match_omp_begin_metadirective): New prototype.
(gfc_match_omp_metadirective): New prototype.
* openmp.cc (gfc_match_omp_eos): Match ')' in context selectors.
(gfc_free_omp_metadirective_clauses): New.
(gfc_match_omp_clauses): Remove context_selector argument.  Rely on
gfc_match_omp_eos to match end of clauses.
(match_omp): Remove extra argument to gfc_match_omp_clauses.
(gfc_match_omp_context_selector): Remove extra argument to
gfc_match_omp_clauses.  Set gfc_matching_omp_context_selector
before call to gfc_match_omp_clauses and reset after.
(gfc_match_omp_context_selector_specification): Modify to take a
gfc_omp_set_selector** argument.
(gfc_match_omp_declare_variant): Pass set_selectors to
gfc_match_omp_context_selector_specification.
(match_omp_metadirective): New.
(gfc_match_omp_begin_metadirective): New.
(gfc_match_omp_metadirective): New.
(resolve_omp_metadirective): New.
(gfc_resolve_omp_directive): Handle EXEC_OMP_METADIRECTIVE.
* parse.cc (gfc_matching_omp_context_selector): New variable.
(gfc_in_metadirective_body): New variable.
(gfc_omp_region_count): New variable.
(decode_omp_directive): Match 'begin metadirective',
'end metadirective' and 'metadirective'.
(match_omp_directive): New.
(case_omp_structured_block): New.
(case_omp_do): New.
(gfc_ascii_statement): Handle metadirective statements.
(gfc_omp_end_stmt): New.
(parse_omp_do): Delegate to gfc_omp_end_stmt.
(parse_omp_structured_block): Delegate to gfc_omp_end_stmt. Handle
ST_OMP_END_METADIRECTIVE.
(parse_omp_metadirective_body): New.
(parse_executable): Delegate to case_omp_structured_block and
case_omp_do.  Return after one statement if compiling regular
metadirective.  Handle metadirective statements.
(gfc_parse_file): Reset gfc_omp_region_count,
gfc_in_metadirective_body and gfc_matching_omp_context_selector.
* parse.h (enum gfc_compile_state): Add COMP_OMP_METADIRECTIVE and
COMP_OMP_BEGIN_METADIRECTIVE.
(gfc_omp_end_stmt): New prototype.
(gfc_matching_omp_context_selector): New declaration.
(gfc_in_metadirective_body): New declaration.
(gfc_omp_region_count): New declaration.
* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_METADIRECTIVE.
* st.cc (gfc_free_statement): Handle EXEC_OMP_METADIRECTIVE.
* symbol.cc (compare_st_labels): Take omp_region into account.
(gfc_get_st_labels): Incorporate omp_region into label.
* trans-decl.cc (gfc_get_label_decl): Add omp_region into translated
label name.
* trans-openmp.cc (gfc_trans_omp_directive): Handle
EXEC_OMP_METADIRECTIVE.
(gfc_trans_omp_set_selector): Hoist code from...
(gfc_trans_omp_declare_variant): ...here.
(gfc_trans_omp_metadirective): New.
* trans-stmt.h (gfc_trans_omp_metadirective): New prototype.
* trans.cc (trans_code): Handle EXEC_OMP_METADIRECTIVE.

23 months agoopenmp: Add C++ support for parsing metadirectives
Kwok Cheung Yeung [Tue, 25 Jan 2022 19:01:53 +0000 (11:01 -0800)]
openmp: Add C++ support for parsing metadirectives

This adds support for parsing OpenMP metadirectives in the C++ front end.

2022-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/cp/
* parser.cc (cp_parser_skip_to_end_of_statement): Handle parentheses.
(cp_parser_skip_to_end_of_block_or_statement): Likewise.
(cp_parser_omp_context_selector): Add extra argument.  Allow
non-constant expressions.
(cp_parser_omp_context_selector_specification): Add extra argument and
propagate to cp_parser_omp_context_selector.
(analyze_metadirective_body): New.
(cp_parser_omp_metadirective): New.
(cp_parser_omp_construct): Handle PRAGMA_OMP_METADIRECTIVE.
(cp_parser_pragma): Handle PRAGMA_OMP_METADIRECTIVE.

23 months agoopenmp: Add support for streaming metadirectives and resolving them after LTO
Kwok Cheung Yeung [Tue, 25 Jan 2022 18:49:44 +0000 (10:49 -0800)]
openmp: Add support for streaming metadirectives and resolving them after LTO

This patch adds support for streaming metadirective Gimple statements during
LTO, and adds a metadirective expansion pass that runs after LTO.  This is
required for metadirectives with selectors that can only be resolved from
within the accel compiler.

2022-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* Makefile.in (OBJS): Add omp-expand-metadirective.o.
* gimple-streamer-in.cc (input_gimple_stmt): Add case for
GIMPLE_OMP_METADIRECTIVE.  Handle metadirective labels.
* gimple-streamer-out.cc (output_gimple_stmt): Likewise.
* omp-expand-metadirective.cc: New.
* passes.def: Add pass_omp_expand_metadirective.
* tree-pass.h (make_pass_omp_expand_metadirective): New prototype.

23 months agoopenmp: Add support for resolving metadirectives during parsing and Gimplification
Kwok Cheung Yeung [Tue, 25 Jan 2022 18:42:50 +0000 (10:42 -0800)]
openmp: Add support for resolving metadirectives during parsing and Gimplification

This adds support for resolving metadirectives according to the OpenMP 5.1
specification.  The variants are sorted by score, then gathered into a list
of dynamic replacement candidates.  The metadirective is then expanded into
a sequence of 'if..else' statements to test the dynamic selector and execute
the variant if the selector is satisfied.

If any of the selectors in the list are unresolvable, GCC will give up on
resolving the metadirective and try again later.

2022-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* gimplify.cc (expand_omp_metadirective): New.
* omp-general.cc: Include tree-pretty-print.h.
(DELAY_METADIRECTIVES_AFTER_LTO): New macro.
(omp_context_selector_matches): Delay resolution of selectors.  Allow
non-constant expressions.
(omp_dynamic_cond): New.
(omp_dynamic_selector_p): New.
(sort_variant): New.
(omp_get_dynamic_candidates): New.
(omp_resolve_metadirective): New.
(omp_resolve_metadirective): New.
* omp-general.h (struct omp_metadirective_variant): New.
(omp_resolve_metadirective): New prototype.

gcc/c-family/
* c-omp.cc (c_omp_expand_metadirective_r): New.
(c_omp_expand_metadirective): New.

23 months agoopenmp: Add middle-end support for metadirectives
Kwok Cheung Yeung [Tue, 25 Jan 2022 18:36:59 +0000 (10:36 -0800)]
openmp: Add middle-end support for metadirectives

This adds a new Gimple statement type GIMPLE_OMP_METADIRECTIVE, which
represents the metadirective in Gimple. In high Gimple, the statement
contains the body of the directive variants, whereas in low Gimple, it
only contains labels to the bodies.

This patch adds support for converting metadirectives from tree to Gimple
form, and handling of the Gimple form (Gimple lowering, OpenMP lowering
and expansion, inlining, SSA handling etc).

Metadirectives should be resolved before they reach the back-end, otherwise
the compiler will crash as GCC does not know how to convert metadirective
Gimple statements to RTX.

2022-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* gimple-low.cc (lower_omp_metadirective): New.
(lower_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
* gimple-pretty-print.cc (dump_gimple_omp_metadirective): New.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_METADIRECTIVE.
* gimple-walk.cc (walk_gimple_op): Handle GIMPLE_OMP_METADIRECTIVE.
(walk_gimple_stmt): Likewise.
* gimple.cc (gimple_alloc_omp_metadirective): New.
(gimple_build_omp_metadirective): New.
(gimple_build_omp_metadirective_variant): New.
* gimple.def (GIMPLE_OMP_METADIRECTIVE): New.
(GIMPLE_OMP_METADIRECTIVE_VARIANT): New.
* gimple.h (gomp_metadirective_variant): New.
(gomp_metadirective): New.
(is_a_helper <gomp_metadirective *>::test): New.
(is_a_helper <gomp_metadirective_variant *>::test): New.
(is_a_helper <const gomp_metadirective *>::test): New.
(is_a_helper <const gomp_metadirective_variant *>::test): New.
(gimple_alloc_omp_metadirective): New prototype.
(gimple_build_omp_metadirective): New prototype.
(gimple_build_omp_metadirective_variant): New prototype.
(gimple_has_substatements): Add GIMPLE_OMP_METADIRECTIVE case.
(gimple_has_ops): Add GIMPLE_OMP_METADIRECTIVE.
(gimple_omp_metadirective_label): New.
(gimple_omp_metadirective_set_label): New.
(gimple_omp_metadirective_variants): New.
(gimple_omp_metadirective_set_variants): New.
(CASE_GIMPLE_OMP): Add GIMPLE_OMP_METADIRECTIVE.
* gimplify.cc (is_gimple_stmt): Add OMP_METADIRECTIVE.
(expand_omp_metadirective): New.
(gimplify_omp_metadirective): New.
(gimplify_expr): Add case for OMP_METADIRECTIVE.
* gsstruct.def (GSS_OMP_METADIRECTIVE): New.
(GSS_OMP_METADIRECTIVE_VARIANT): New.
* omp-expand.cc (build_omp_regions_1): Handle GIMPLE_OMP_METADIRECTIVE.
(omp_make_gimple_edges): Likewise.
* omp-low.cc (struct omp_context): Add next_clone field.
(new_omp_context): Initialize next_clone field.
(clone_omp_context): New.
(delete_omp_context): Delete clone contexts.
(scan_omp_metadirective): New.
(scan_omp_1_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
(lower_omp_metadirective): New.
(lower_omp_1): Handle GIMPLE_OMP_METADIRECTIVE.
* tree-cfg.cc (cleanup_dead_labels): Handle GIMPLE_OMP_METADIRECTIVE.
(gimple_redirect_edge_and_branch): Likewise.
* tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
(estimate_num_insns): Likewise.
* tree-pretty-print.cc (dump_generic_node): Handle OMP_METADIRECTIVE.
* tree-ssa-operands.cc (parse_ssa_operands): Handle
GIMPLE_OMP_METADIRECTIVE.

23 months agoopenmp: Add C support for parsing metadirectives
Kwok Cheung Yeung [Tue, 25 Jan 2022 18:31:19 +0000 (10:31 -0800)]
openmp: Add C support for parsing metadirectives

This patch implements parsing for the OpenMP metadirective introduced in
OpenMP 5.0.  Metadirectives are parsed into an OMP_METADIRECTIVE node,
with the variant clauses forming a chain accessible via
OMP_METADIRECTIVE_CLAUSES.  Each clause contains the context selector
and tree for the variant.

User conditions in the selector are now permitted to be non-constant when
used in metadirectives as specified in OpenMP 5.1.

2021-01-25  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* omp-general.cc (omp_context_selector_matches): Add extra argument.
(omp_resolve_metadirective): New stub function.
* omp-general.h (struct omp_metadirective_variant): New.
(omp_context_selector_matches): Add extra argument.
(omp_resolve_metadirective): New prototype.
* tree.def (OMP_METADIRECTIVE): New.
* tree.h (OMP_METADIRECTIVE_CLAUSES): New macro.

gcc/c/
* c-parser.cc (c_parser_skip_to_end_of_block_or_statement): Handle
parentheses in statement.
(c_parser_omp_metadirective): New prototype.
(c_parser_omp_context_selector): Add extra argument.  Allow
non-constant expressions.
(c_parser_omp_context_selector_specification): Add extra argument and
propagate it to c_parser_omp_context_selector.
(analyze_metadirective_body): New.
(c_parser_omp_metadirective): New.
(c_parser_omp_construct): Handle PRAGMA_OMP_METADIRECTIVE.

gcc/c-family/
* c-common.h (enum c_omp_directive_kind): Add C_OMP_DIR_META.
(c_omp_expand_metadirective): New prototype.
* c-gimplify.cc (genericize_omp_metadirective_stmt): New.
(c_genericize_control_stmt): Handle OMP_METADIRECTIVE tree nodes.
* c-omp.cc (omp_directives): Classify metadirectives as C_OMP_DIR_META.
(c_omp_expand_metadirective): New stub function.
* c-pragma.cc (omp_pragmas): Add entry for metadirective.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_METADIRECTIVE.

23 months agolibgomp, nvptx: Add fallback for PTX versions lower than 4.1
Kwok Cheung Yeung [Wed, 22 Jun 2022 17:54:44 +0000 (18:54 +0100)]
libgomp, nvptx: Add fallback for PTX versions lower than 4.1

Avoid using the dynamic_smem_size register if the PTX version does not
support it.

This patch should be included when the 'libgomp, nvptx: low-latency memory
allocator' patch is upstreamed.

2022-06-21  Kwok Cheung Yeung  <kcy@codesourcery.com>

libgomp/
* config/nvptx/team.c (gomp_nvptx_main): Initialize shared_pool_size
to zero.  Do not use dynamic_smem_size register if PTX version lower
than 4.1.

23 months agolibgomp, nvptx: low-latency memory allocator
Andrew Stubbs [Fri, 3 Dec 2021 17:46:41 +0000 (17:46 +0000)]
libgomp, nvptx: low-latency memory allocator

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the minimum version
requirement is now bumped to 4.1 (still old at this point).

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
Implement fall-backs for predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC.
Implement fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm varaible.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.

23 months agolibgomp, nvptx: Update bundled CUDA header file
Kwok Cheung Yeung [Wed, 22 Jun 2022 14:43:05 +0000 (07:43 -0700)]
libgomp, nvptx: Update bundled CUDA header file

This updates the bundled cuda.h header file to include some new API calls and
constants that are now used in the code.

This patch should be included when the "libgomp, nvptx: low-latency memory
allocator" or "openmp: Add support for 'target_device' context selector set"
patches are upstreamed.

2022-06-21  Kwok Cheung Yeung  <kcy@codesourcery.com>

include/
* cuda/cuda.h (CUdevice_attribute): Add definitions for
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
(CUmemAttach_flags): New.
(CUpointer_attribute): New.
(cuMemAllocManaged): New prototype.
(cuPointerGetAttribute): New prototype.

libgomp/
* plugin/cuda-lib.def (cuMemAllocManaged): Add new call.
(cuPointerGetAttribute): Likewise.

23 months agoifcvt: Don't introduce trapping or faulting reads in noce_try_sign_mask [PR106032]
Jakub Jelinek [Tue, 21 Jun 2022 09:40:16 +0000 (11:40 +0200)]
ifcvt: Don't introduce trapping or faulting reads in noce_try_sign_mask [PR106032]

noce_try_sign_mask as documented will optimize
  if (c < 0)
    x = t;
  else
    x = 0;
into x = (c >> bitsm1) & t;
The optimization is done if either t is unconditional
(e.g. for
  x = t;
  if (c >= 0)
    x = 0;
) or if it is cheap.  We already check that t doesn't have side-effects,
but if t is conditional, we need to punt also if it may trap or fault,
as we make it unconditional.

I've briefly skimmed other noce_try* optimizations and didn't find one that
would suffer from the same problem.

2022-06-21  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/106032
* ifcvt.cc (noce_try_sign_mask): Punt if !t_unconditional, and
t may_trap_or_fault_p, even if it is cheap.

* gcc.c-torture/execute/pr106032.c: New test.

(cherry picked from commit a0c30fe3b888f20215f3e040d21b62b603804ca9)

23 months agoexpand: Fix up expand_cond_expr_using_cmove [PR106030]
Jakub Jelinek [Tue, 21 Jun 2022 09:38:59 +0000 (11:38 +0200)]
expand: Fix up expand_cond_expr_using_cmove [PR106030]

If expand_cond_expr_using_cmove can't find a cmove optab for a particular
mode, it tries to promote the mode and perform the cmove in the promoted
mode.

The testcase in the patch ICEs on arm because in that case we pass temp which
has the promoted mode (SImode) as target to expand_operands where the
operands have the non-promoted mode (QImode).
Later on the function uses paradoxical subregs:
  if (GET_MODE (op1) != mode)
    op1 = gen_lowpart (mode, op1);

  if (GET_MODE (op2) != mode)
    op2 = gen_lowpart (mode, op2);
to change the operand modes.

The following patch fixes it by passing NULL_RTX as target if it has
promoted mode.

2022-06-21  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/106030
* expr.cc (expand_cond_expr_using_cmove): Pass NULL_RTX instead of
temp to expand_operands if mode has been promoted.

* gcc.c-torture/compile/pr106030.c: New test.

(cherry picked from commit 2df1df945fac85d7b3d084001414a66a2709d8fe)

23 months agolibgomp: Fix up target-31.c test [PR106045]
Jakub Jelinek [Tue, 21 Jun 2022 15:51:08 +0000 (17:51 +0200)]
libgomp: Fix up target-31.c test [PR106045]

The i variable is used inside of the parallel in:
      #pragma omp simd safelen(32) private (v)
      for (i = 0; i < 64; i++)
        {
          v = 3 * i;
          ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
        }
where i is predetermined linear (so while inside of the body
it is safe, private per SIMD lane var) the final value is written to
the shared variable, and in:
      for (i = 0; i < 64; i++)
        if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
          #pragma omp atomic write
            err = 1;
which is a normal loop and so it isn't in any way privatized there.
So we have a data race, fixed by adding private (i) clause to the
parallel.

2022-06-21  Jakub Jelinek  <jakub@redhat.com>
    Paul Iannetta  <piannetta@kalrayinc.com>

PR libgomp/106045
* testsuite/libgomp.c/target-31.c: Add private (i) clause.

(cherry picked from commit 85d613da341b76308edea48359a5dbc7061937c4)

23 months agoloongarch: exclude LARCH_PROLOGUE_TEMP from SIBCALL_REGS [PR 106096]
Xi Ruoyao [Tue, 28 Jun 2022 08:00:14 +0000 (16:00 +0800)]
loongarch: exclude LARCH_PROLOGUE_TEMP from SIBCALL_REGS [PR 106096]

The epilogue may clobber LARCH_PROLOGUE_TEMP ($r13/$t1), so it cannot be
used for sibcalls.

gcc/ChangeLog:

PR target/106096
* config/loongarch/loongarch.h (REG_CLASS_CONTENTS): Exclude
$r13 from SIBCALL_REGS.
* config/loongarch/loongarch.cc (loongarch_regno_to_class):
Change $r13 to JIRL_REGS.

gcc/testsuite/ChangeLog:

PR target/106096
* g++.target/loongarch/loongarch.exp: New test support file.
* g++.target/loongarch/pr106096.C: New test.

(cherry picked from commit 020b7d98589bbc928b5a66b1ed56b42af8791355)

23 months agolibgomp: fix typo in mold linker detection
Martin Liska [Tue, 28 Jun 2022 08:34:56 +0000 (10:34 +0200)]
libgomp: fix typo in mold linker detection

libgomp/ChangeLog:

* acinclude.m4: Fix typo in mold linker detection.
* Makefile.in: Regenerate.
* configure: Regenerate.

(cherry picked from commit 6835baee7196eeb03f24f6e650c0d37cbb295647)

23 months agoDaily bump.
GCC Administrator [Tue, 28 Jun 2022 00:19:29 +0000 (00:19 +0000)]
Daily bump.

2 years agoDaily bump.
GCC Administrator [Mon, 27 Jun 2022 00:19:05 +0000 (00:19 +0000)]
Daily bump.

2 years agoDaily bump.
GCC Administrator [Sun, 26 Jun 2022 00:18:48 +0000 (00:18 +0000)]
Daily bump.

2 years agoDaily bump.
GCC Administrator [Sat, 25 Jun 2022 00:19:16 +0000 (00:19 +0000)]
Daily bump.

2 years agoopenacc: Adjust test expectations to new "kernels" handling
Frederik Harwath [Tue, 16 Nov 2021 15:22:48 +0000 (16:22 +0100)]
openacc: Adjust test expectations to new "kernels" handling

Adjust tests to changed expectations with the new Graphite-based
"kernels" handling.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr84955-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90: Adjust.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90: Removed.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/acc-icf.c: Adjust.
* c-c++-common/goacc/cache-3-1.c: Adjust.
* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c: Adjust.
* c-c++-common/goacc/classify-kernels.c: Adjust.
* c-c++-common/goacc/classify-serial.c: Adjust.
* c-c++-common/goacc/if-clause-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-1.c: Adjust.
* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Adjust.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Adjust.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Adjust.
* c-c++-common/goacc/kernels-loop-3.c: Adjust.
* c-c++-common/goacc/loop-2-kernels.c: Adjust.
* c-c++-common/goacc/nested-reductions-2-parallel.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Adjust.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c: Adjust.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
* c-c++-common/goacc/routine-1.c: Adjust.
* c-c++-common/goacc/routine-level-of-parallelism-2.c: Adjust.
* c-c++-common/goacc/routine-nohost-1.c: Adjust.
* c-c++-common/goacc/uninit-copy-clause.c: Adjust.
* gcc.dg/goacc/loop-processing-1.c: Adjust.
* gcc.dg/goacc/nested-function-1.c: Adjust.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust.
* gfortran.dg/goacc/classify-kernels.f95: Adjust.
* gfortran.dg/goacc/classify-parallel.f95: Adjust.
* gfortran.dg/goacc/classify-routine.f95: Adjust.
* gfortran.dg/goacc/classify-serial.f95: Adjust.
* gfortran.dg/goacc/common-block-3.f90: Adjust.
* gfortran.dg/goacc/gang-static.f95: Adjust.
* gfortran.dg/goacc/kernels-decompose-1.f95: Adjust.
* gfortran.dg/goacc/kernels-decompose-2.f95: Adjust.
* gfortran.dg/goacc/kernels-loop-2.f95: Adjust.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Adjust.
* gfortran.dg/goacc/kernels-loop-inner.f95: Adjust.
* gfortran.dg/goacc/kernels-loop.f95: Adjust.
* gfortran.dg/goacc/kernels-tree.f95: Adjust.
* gfortran.dg/goacc/loop-2-kernels.f95: Adjust.
* gfortran.dg/goacc/loop-auto-transfer-2.f90: Adjust.
* gfortran.dg/goacc/loop-auto-transfer-3.f90: Adjust.
* gfortran.dg/goacc/loop-auto-transfer-4.f90: Adjust.
* gfortran.dg/goacc/nested-function-1.f90: Adjust.
* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Adjust.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Adjust.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95: Adjust.
* gfortran.dg/goacc/routine-module-mod-1.f90: Adjust.
* gfortran.dg/goacc/uninit-copy-clause.f95: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Removed.

2 years agotilegx: Fix infinite loop in gen-mul-tables generator
Iain Buclaw [Wed, 22 Jun 2022 17:11:20 +0000 (19:11 +0200)]
tilegx: Fix infinite loop in gen-mul-tables generator

Since around GCC 10, the condition `j < (INTMAX_MAX / 10)' will get
optimized into `j != 922337203685477580', which will result in an
infinite loop for certain inputs of `j'.

Copy the condition already used by the -DTILEPRO generator code, which
doesn't fall into this trap.

gcc/ChangeLog:

* config/tilepro/gen-mul-tables.cc (tilegx_emit): Adjust loop
condition to avoid overflow.

(cherry picked from commit c0ad48527c314a1e9354b7c26718b56ed4abc92c)

2 years agoc++: constexpr folding in unevaluated context [PR105931]
Patrick Palka [Thu, 23 Jun 2022 20:36:43 +0000 (16:36 -0400)]
c++: constexpr folding in unevaluated context [PR105931]

Changing the type of N from int to unsigned in decltype82.C (from
r13-986-g0ecb6b906f215e) reveals another spot where we perform constexpr
evaluation in an unevaluated context for sake of warnings, this time
from the call to shorten_compare in cp_build_binary_op, which calls
fold_for_warn.

We could (and probably should) suppress the shorten_compare warnings
when in an unevaluated context, but there's probably other callers of
fold_for_warn that are similarly affected.  So this patch takes the
approach of directly suppressing fold_for_warn when in an unevaluated
context.

PR c++/105931

gcc/cp/ChangeLog:

* expr.cc (fold_for_warn): Don't fold when in an unevaluated
context.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype82a.C: New test.

(cherry picked from commit b00b95198e6720eb23a2618870d67800f6180fdd)

2 years agoDaily bump.
GCC Administrator [Fri, 24 Jun 2022 00:18:59 +0000 (00:18 +0000)]
Daily bump.

2 years agoc++: anon union designated init [PR105925]
Jason Merrill [Thu, 23 Jun 2022 20:04:02 +0000 (16:04 -0400)]
c++: anon union designated init [PR105925]

This testcase was failing because CONSTRUCTOR_IS_DESIGNATED_INIT wasn't
getting set on the introduced CONSTRUCTOR for the anonymous union, and
build_aggr_conv uses that flag to decide whether to pay attention to the
indexes of the CONSTRUCTOR.  So set the flag when we see a designator rather
than relying on copying it from another CONSTRUCTOR.

PR c++/105925

gcc/cp/ChangeLog:

* decl.cc (reshape_init_array_1): Set
CONSTRUCTOR_IS_DESIGNATED_INIT here.
(reshape_init_class): And here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig26.C: New test.

2 years agoc++: -Waddress and value-dependent expr [PR105885]
Jason Merrill [Thu, 23 Jun 2022 03:50:23 +0000 (23:50 -0400)]
c++: -Waddress and value-dependent expr [PR105885]

We already suppress various warnings for code that would be tautological if
written directly, but not when it's the result of template substitution.  It
seems we need to do this for -Waddress as well.

PR c++/105885

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy_and_build): Also suppress -Waddress for
comparison of dependent operands.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-if37.C: New test.

2 years agoipa-icf: skip variables with body_removed
Martin Liska [Wed, 18 May 2022 13:07:53 +0000 (15:07 +0200)]
ipa-icf: skip variables with body_removed

Similarly to cgraph_nodes, it may happen that body_removed is set
during merging of symbols.

PR ipa/105600

gcc/ChangeLog:

* ipa-icf.cc (sem_item_optimizer::filter_removed_items):
Skip variables with body_removed.

(cherry picked from commit 31ce821a790caec8a2849dd67a9847e78a33d14c)

2 years agotree-object-size: Don't let error_mark_node escape for ADDR_EXPR [PR105736]
Siddhesh Poyarekar [Tue, 21 Jun 2022 06:45:07 +0000 (12:15 +0530)]
tree-object-size: Don't let error_mark_node escape for ADDR_EXPR [PR105736]

The addr_expr computation does not check for error_mark_node before
returning the size expression.  This used to work in the constant case
because the conversion to uhwi would end up causing it to return
size_unknown, but that won't work for the dynamic case.

Modify the control flow to explicitly return size_unknown if the offset
computation returns an error_mark_node.

gcc/ChangeLog:

PR tree-optimization/105736
* tree-object-size.cc (addr_object_size): Return size_unknown
when object offset computation returns an error.

gcc/testsuite/ChangeLog:

PR tree-optimization/105736
* gcc.dg/builtin-dynamic-object-size-0.c (TV4): New struct.
(val3): New variable.
(test_pr105736): New test.
(main): Call it.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
(cherry picked from commit 70454c50b4592fe6876ecca13268264e395e058f)

2 years agoc++: dependence of baselink [PR105964]
Jason Merrill [Wed, 22 Jun 2022 22:19:11 +0000 (18:19 -0400)]
c++: dependence of baselink [PR105964]

helper<token>::c isn't dependent just because we haven't deduced its return
type yet.  type_dependent_expression_p already knows how to deal with that
for bare FUNCTION_DECL, but needs to learn to look through a BASELINK.

PR c++/105964

gcc/cp/ChangeLog:

* pt.cc (type_dependent_expression_p): Look through BASELINK.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype-auto21.C: New test.

2 years agoc++: class scope function lookup [PR105908]
Jason Merrill [Wed, 22 Jun 2022 18:57:21 +0000 (14:57 -0400)]
c++: class scope function lookup [PR105908]

In r12-1273 for PR91706, I removed the code in get_class_binding that
stripped BASELINK.  This testcase demonstrates that we still need to strip
it in outer_binding before putting the overload set in IDENTIFIER_BINDING,
for compatibility with bindings added directly for declarations.

PR c++/105908

gcc/cp/ChangeLog:

* name-lookup.cc (outer_binding): Strip BASELINK.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/trailing16.C: New test.

2 years agoaarch64: Revert bogus fix for PR105254
Richard Sandiford [Wed, 15 Jun 2022 10:12:51 +0000 (11:12 +0100)]
aarch64: Revert bogus fix for PR105254

In f2ebf2d98efe0ac2314b58cf474f44cb8ebd5244 I'd forced the
chosen unroll factor to be a factor of the VF, in order to
work around an exact_div ICE in PR105254.  This was completely
bogus -- clearly I didn't look in enough detail at why we ended
up with an unrolled VF that wasn't a multiple of the UF.

Kewen has since fixed the bug properly for PR105940, so this
patch reverts my earlier attempt.  Sorry for the stupidity.

gcc/
PR tree-optimization/105254
PR tree-optimization/105940

Revert:

* config/aarch64/aarch64.cc
(aarch64_vector_costs::determine_suggested_unroll_factor): Take a
loop_vec_info as argument.  Restrict the unroll factor to values
that divide the VF.
(aarch64_vector_costs::finish_cost): Update call accordingly.

gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_14.c: New test.

(cherry picked from commit 2636660b6f35423e0cfbf53bfad5c5fed6ae6471)

2 years agovect: Move suggested_unroll_factor applying [PR105940]
Kewen Lin [Tue, 14 Jun 2022 05:57:01 +0000 (00:57 -0500)]
vect: Move suggested_unroll_factor applying [PR105940]

As PR105940 shown, when rs6000 port tries to assign
m_suggested_unroll_factor by 4 or so, there will be ICE on:

  exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
             loop_vinfo->suggested_unroll_factor);

In function vect_analyze_loop_2, the current place of
suggested_unroll_factor applying can't guarantee it's
applied for all cases.  As the case shows, vectorizer
could retry with SLP forced off, the vf is reset by
saved_vectorization_factor which isn't applied with
suggested_unroll_factor before.  It means it can end
up with one vf which neglects suggested_unroll_factor.
I think it's off design, we should move the applying
of suggested_unroll_factor after start_over.

PR tree-optimization/105940

gcc/ChangeLog:

* tree-vect-loop.cc (vect_analyze_loop_2): Move the place of
applying suggested_unroll_factor after start_over.

(cherry picked from commit f907cf4c07cf51863dadbe90894e2ae3382bada5)

2 years agoDaily bump.
GCC Administrator [Thu, 23 Jun 2022 00:19:39 +0000 (00:19 +0000)]
Daily bump.

2 years agoDaily bump.
GCC Administrator [Wed, 22 Jun 2022 00:19:10 +0000 (00:19 +0000)]
Daily bump.

2 years agoi386: Disallow sibcall for calling ifunc functions with PIC register
H.J. Lu [Tue, 14 Jun 2022 15:20:16 +0000 (08:20 -0700)]
i386: Disallow sibcall for calling ifunc functions with PIC register

Disallow siball when calling ifunc functions with PIC register so that
PIC register can be restored.

gcc/

PR target/105960
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
false if PIC register is used when calling ifunc functions.

gcc/testsuite/

PR target/105960
* gcc.target/i386/pr105960.c: New test.

(cherry picked from commit fe9765c0b97e6b4ce2cd226631d329fc05ba2aa5)

2 years agographite: Accept loops without data references
Frederik Harwath [Tue, 16 Nov 2021 15:22:29 +0000 (16:22 +0100)]
graphite: Accept loops without data references

It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops.  Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions.  If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.

gcc/ChangeLog:

* graphite-scop-detection.cc (scop_detection::harmful_loop_in_region):
Remove check for loops without data references.

2 years agographite: Adjust scop loop-nest choice
Frederik Harwath [Tue, 16 Nov 2021 15:21:57 +0000 (16:21 +0100)]
graphite: Adjust scop loop-nest choice

The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP.  The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region.  The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop.  This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops.  This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.

gcc/ChangeLog:

        * graphite-scop-detection.cc (scop_context_loop): New function.
        (build_alias_set): Use scop_context_loop instead of find_common_loop.
        * graphite-isl-ast-to-gimple.cc (graphite_regenerate_ast_isl): Likewise.
        * graphite.h (scop_context_loop): New declaration.

2 years agographite: Tune parameters for OpenACC use
Frederik Harwath [Tue, 16 Nov 2021 15:21:42 +0000 (16:21 +0100)]
graphite: Tune parameters for OpenACC use

The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes.  Furthermore,
exceeding the limits does not alwas lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions.  The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite.  Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.

gcc/ChangeLog:

        * graphite-optimize-isl.cc (optimize_isl): Adjust
param_max_isl_operations value for OpenACC functions and add
special warnings if value gets exceeded.

* graphite-scop-detection.cc (build_scops): Likewise for
param_graphite_max_arrays_per_scop.

gcc/testsuite/ChangeLog:

        * gcc.dg/goacc/graphite-parameter-1.c: New test.
        * gcc.dg/goacc/graphite-parameter-2.c: New test.

2 years agoopenacc: Disable pass_pre on outlined functions analyzed by Graphite
Frederik Harwath [Tue, 16 Nov 2021 15:20:56 +0000 (16:20 +0100)]
openacc: Disable pass_pre on outlined functions analyzed by Graphite

The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.

gcc/ChangeLog:

        * tree-ssa-pre.cc (insert): Skip any insertions in OpenACC
functions that might be processed by Graphite.

2 years agoopenacc: Handle internal function calls in pass_lim
Frederik Harwath [Tue, 16 Nov 2021 15:20:41 +0000 (16:20 +0100)]
openacc: Handle internal function calls in pass_lim

The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.  The
pass does not know how to handle the OpenACC internal function calls
which was not necessary until recently when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop.  This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process.  Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.

gcc/ChangeLog:
        * passes.def: Set restrict_oacc_hoisting to true for the early
pass_lim instance.
* tree-ssa-loop-im.cc (movement_possibility): Add
restrict_oacc_hoisting flag to function; restrict movement if set.
(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
        (gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
calls.
        (loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
pass it on.
        (pass_lim::execute): Pass on new flags.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to
loop_invariant_motion_in_fun.

2 years agoopenacc: Warn about "independent" "kernels" loops with data-dependences
Frederik Harwath [Tue, 16 Nov 2021 15:20:15 +0000 (16:20 +0100)]
openacc: Warn about "independent" "kernels" loops with data-dependences

This commit concerns loops in OpenACC "kernels" region that have been marked
up with an explicit "independent" clause by the user, but for which Graphite
found data dependences.  A discussion on the private internal OpenACC mailing
list suggested that warning the user about the dependences woud be a more
acceptable solution than reverting the user's decision. This behavior is
implemented by the present commit.

gcc/ChangeLog:

        * common.opt: Add flag Wopenacc-false-independent.
        * omp-offload.cc (oacc_loop_warn_if_false_independent): New function.
        (oacc_loop_fixed_partitions): Call from here.

2 years agoopenacc: Add runtime alias checking for OpenACC kernels
Andrew Stubbs [Tue, 16 Nov 2021 15:19:53 +0000 (16:19 +0100)]
openacc: Add runtime alias checking for OpenACC kernels

This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite.  The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls.  If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.

gcc/ChangeLog:

* graphite-isl-ast-to-gimple.cc: Include internal-fn.h.
(graphite_oacc_analyze_scop): Implement runtime alias checks.
* omp-expand.cc (expand_oacc_for): Add an additional "noalias" parameter
to GOACC_LOOP internal calls, and initialise it to integer_one_node.
* omp-offload.cc (oacc_xform_loop): Integrate the runtime alias check
into the GOACC_LOOP expansion.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.

2 years agoopenacc: Add data optimization pass
Andrew Stubbs [Tue, 16 Nov 2021 15:19:23 +0000 (16:19 +0100)]
openacc: Add data optimization pass

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private".  The pass is restricted
"kernels" regions but could be extended to other types of regions.

gcc/ChangeLog:

        * Makefile.in: Add pass.
        * doc/gimple.texi: TODO.
        * gimple-walk.cc (walk_gimple_seq_mod): Adjust for backward walking.
        * gimple-walk.h (struct walk_stmt_info): Add field.
        * passes.def: Add new pass.
        * tree-pass.h (make_pass_omp_data_optimize): New declaration.
        * omp-data-optimize.cc: New file.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Expect optimization messages.
        * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
        * c-c++-common/goacc/uninit-copy-clause.c: Likewise.
        * gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
        * c-c++-common/goacc/omp_data_optimize-1.c: New test.
        * g++.dg/goacc/omp_data_optimize-1.C: New test.
        * gfortran.dg/goacc/omp_data_optimize-1.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2 years agoAdd function for printing a single OMP_CLAUSE
Frederik Harwath [Tue, 16 Nov 2021 15:18:02 +0000 (16:18 +0100)]
Add function for printing a single OMP_CLAUSE

Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs.  The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.

gcc/ChangeLog:

        * tree-pretty-print.cc (print_omp_clause_to_str): Add new function.
        * tree-pretty-print.h (print_omp_clause_to_str): Add declaration.

2 years agoopenacc: Remove unused partitioning in "kernels" regions
Frederik Harwath [Tue, 16 Nov 2021 15:17:48 +0000 (16:17 +0100)]
openacc: Remove unused partitioning in "kernels" regions

With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.

This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.

gcc/ChangeLog:

        * omp-offload.cc (oacc_remove_unused_partitioning): New function
        for removing partitioning that is not used by any loop.
        (oacc_validate_dims): Call oacc_remove_unused_partitioning and
        enable warnings about unused partitioning.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
        expectations.

2 years agoopenacc: Add further kernels tests
Frederik Harwath [Tue, 16 Nov 2021 15:17:15 +0000 (16:17 +0100)]
openacc: Add further kernels tests

Add some copies of tests to continue covering the old "parloops"-based
"kernels" implementation - until it gets removed from GCC - and
add further tests for the new Graphite-based implementation.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90:
New test.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
New test.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
New test.
* c-c++-common/goacc/kernels-decompose-1-parloops.c: New test.
* c-c++-common/goacc/kernels-reduction-parloops.c: New test.
* c-c++-common/goacc/loop-auto-reductions.c: New test.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c:
New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c:
New test.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
New test.
* gfortran.dg/goacc/kernels-conversion.f95: New test.
* gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test.
* gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops.f95: New test.
* gfortran.dg/goacc/kernels-reductions.f90: New test.

2 years agoopenacc: Add "can_be_parallel" flag info to "graph" dumps
Frederik Harwath [Tue, 16 Nov 2021 15:16:47 +0000 (16:16 +0100)]
openacc: Add "can_be_parallel" flag info to "graph" dumps

gcc/ChangeLog:

* graph.cc (oacc_get_fn_attrib): New declaration.
(find_loop_location): New declaration.
(draw_cfg_nodes_for_loop): Print value of the
can_be_parallel flag at the top of loops in OpenACC
functions.

2 years agoopenacc: Use Graphite for dependence analysis in "kernels" regions
Frederik Harwath [Tue, 16 Nov 2021 15:16:22 +0000 (16:16 +0100)]
openacc: Use Graphite for dependence analysis in "kernels" regions

This commit changes the handling of OpenACC "kernels" to use Graphite
for dependence analysis. To this end, it first introduces a new
internal representation for "kernels" regions which should be analyzed
by Graphite in pass_omp_oacc_kernels_decompose.  This is now the
default for all "kernels" regions, but the old handling is still
available through the command line parameter
"--param=openacc_kernels=decompose-parloops".  The handling of this
new region type in the omp lowering and omp offloading passes follows
the existing handling for "parallel" regions.  This replaces the
specialized handling for "kernels" regions that was previously used
and which was in limited in many ways.

Graphite is adjusted to be able to analyze the OpenACC functions that
get outlined from the "kernels" regions. It is enabled to handle the
internal function calls that contain information about OpenACC
constructs. In some places where function calls would be rejected by
Graphite, those calls need to be ignored. In other places, information
about the loop step, bounds etc. needs to be extracted from the
calls. The goal is to enable an analysis of the original loop
parameters although the omp lowering and expansion steps have already
modified the loop structure.  Some parallelization-enabling constructs
such as OpenACC "reduction" and "private"/"firstprivate" clauses must
be recognized and the data-dependences must be adjusted to reflect the
semantics of those constructs.  The data-dependence analysis step in
Graphite has so far been tied to the code generation step.  This
commit introduces a separate data-dependence analysis step that avoids
the code generation.  This is necessary because adjusting the code
generation to create a correct OpenACC loop structure would require
very considerable effort and the goal of this commit is to implement
the dependence analysis only. The ability to use Graphite for
dependence analysis without its code generation might be of
independent interest, but it is so far used for OpenACC purposes
only. In general, all changes to Graphite try to avoid affecting other
uses of Graphite as much as possible.

gcc/ChangeLog:

* Makefile.in: Add graphite-oacc.o
* cfgloop.cc (alloc_loop): Set can_be_parallel_valid_p to false.
* cfgloop.h: Add can_be_parallel_valid_p field.
* cfgloopmanip.cc (copy_loop_info): Add assert.
* config/nvptx/nvptx.cc (nvptx_goacc_reduction_setup):
* doc/invoke.texi: Adjust param openacc-kernels description.
* doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
* flag-types.h (enum openacc_kernels):Add
OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
* gimple-pretty-print.cc (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
* gimple.h (enum gf_mask): Add
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and
widen GF_OMP_TARGET_KIND_MASK.
(is_gimple_omp_oacc): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(is_gimple_omp_offloaded): Likewise.
* gimplify.cc (gimplify_omp_for): Enable reduction localization
for "kernels" regions.
(gimplify_omp_workshare): Likewise.
* graphite-dependences.cc (scop_get_reads_and_writes): Handle
"kills" and "reduction" PDRs.
(apply_schedule_on_deps): Add dump output for intermediate
steps of the dependence computation to enable understanding
of unexpected dependences.
(carries_deps): Likewise.
(scop_get_dependences): Handle "kill" operations and add dump
output.
* graphite-isl-ast-to-gimple.cc (visit_schedule_loop_node): New function.
(graphite_oacc_analyze_scop): New function.
* graphite-optimize-isl.cc (optimize_isl): Remove "static" and
add argument to identify OpenACC use; don't fail on unchanged
schedule in this case.
* graphite-poly.cc (new_poly_dr): Handle "kills".
(print_pdr): Likewise.
(new_gimple_poly_bb): Likewise.
(free_gimple_poly_bb): Likewise.
(new_scop): Handle "reduction", "private", and "firstprivate"
hash sets.
(free_scop): Likewise.
(print_isl_space): New function.
(debug_isl_space): New function.
* graphite-scop-detection.cc (scop_detection::can_represent_loop):
Don't fail if niter is 0 in OpenACC functions.
(scop_detection::add_scop): Don't reject regions with only one
loop in OpenACC functions.
(ignored_oacc_internal_call_p): New function.
(scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
(stmt_has_side_effects): Ignore internal OpenACC function calls.
(add_write): Likewise.
(add_read): Likewise.
(add_kill): New function.
(add_kills): New function.
(add_oacc_kills): New function.
(try_generate_gimple_bb): Kill false dependences for OpenACC
"private"/"firstprivate" vars.
(gather_bbs::gather_bbs): Determin OpenACC
"private"/"firstprivate" vars in region.
(gather_bbs::before_dom_children): Add assert.
(determine_openacc_reductions): New function.
(build_scops): Determine OpenACC "reduction" vars in SCoP.
* graphite-sese-to-poly.cc (oacc_ifn_call_extract): New declaration.
(oacc_internal_call_p): New function.
(build_poly_dr): Ignore internal OpenACC function calls,
handle "reduction" refs.
(build_poly_sr): Likewise; handle "kill" operations.
* graphite.cc (graphite_transform_loops): Accept functions with
only a single loop.
(oacc_enable_graphite_p): New function.
(gate_graphite_transforms): Enable pass on OpenACC functions.
* graphite.h (enum poly_dr_type): Add PDR_KILL.
(struct poly_dr): Add "is_reduction" field.
(new_poly_dr): Add argument to declaration.
(pdr_kill_p): New function.
(print_isl_space): New declaration.
(debug_isl_space): New declaration.
(struct scop): Add fields "reductions_vars",
"oacc_firstprivate_vars", and "oacc_private_scalars".
(optimize_isl): New declaration.
(graphite_oacc_analyze_scop): New declaration.
* internal-fn.cc (expand_UNIQUE): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE
* internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE
* omp-expand.cc (struct omp_region): Adjust comment.
(expand_omp_taskloop_for_inner):
(expand_omp_for): Add asserts about expected "kernels" region types.
(mark_loops_in_oacc_kernels_region): Likewise.
(expand_omp_target): Likewise; handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(build_omp_regions_1): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
Likewise.
(omp_make_gimple_edges): Likewise.
* omp-general.cc (oacc_get_kernels_attrib): New function.
(oacc_get_fn_dim_size): Allow argument to be NULL.
* omp-general.h (oacc_get_kernels_attrib): New declaration.
* omp-low.cc (struct omp_context): Add fields
"oacc_firstprivate_vars" and "oacc_private_scalars".
(was_originally_oacc_kernels): New function.
(is_oacc_kernels):
(is_oacc_kernels_decomposed_graphite_part): New function.
(new_omp_context): Allocate "oacc_first_private_vars" and
"oacc_private_scalars" ...
(delete_omp_context): ... and free from here.
(oacc_record_firstprivate_var_clauses): New function.
(oacc_record_private_scalars): New function.
(scan_sharing_clauses): Call functions to record "private"
scalars and "firstprivate" variables.
(check_oacc_kernel_gwv): Add assert.
(ctx_in_oacc_kernels_region): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(scan_omp_for): Likewise.
(check_omp_nesting_restrictions): Likewise.
(lower_oacc_head_mark): Likewise.
(lower_omp_for): Likewise.
(lower_omp_target): Create "private" and "firstprivate" marker
call statements.
(lower_oacc_head_tail): Adjust "private" and "firstprivate"
marker calls.
(lower_oacc_reductions): Emit "private" and "firstprivate"
 marker call statements.
(make_oacc_firstprivate_vars_marker): New function.
(make_oacc_private_scalars_marker): New function.
* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to
region using the new "kernels" handling.
(make_region_seq): Adjust default region type for new
"kernels" handling; no more exceptions, let Graphite handle everything.
(make_region_loop_nest): Likewise; add dump output and assert.
(adjust_nested_loop_clauses): Stop creating "auto" clauses if
loop has "independent", "gang" etc.
(transform_kernels_loop_clauses): Likewise.
* omp-offload.cc (oacc_extract_loop_call): New function.
(oacc_loop_get_cfg_loop): New function.
(can_be_parallel_str): New function.
(oacc_loop_can_be_parallel_p): New function.
(oacc_parallel_kernels_graphite_fun_p): New function.
(oacc_parallel_fun_p): New function.
(oacc_loop_transform_auto_into_independent): New function, ...
(oacc_loop_fixed_partitions): ... called from here to transfer
the result of Graphite's analysis to the loop.
(execute_oacc_loop_designation): Handle "oacc
functions with "parallel_kernels_graphite" attribute.
(execute_oacc_device_lower): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
* omp-offload.h (oacc_extract_loop_call): Add declaration.
* params.opt: Add "param=openacc-kernels" value "decompose-parloops".
* sese.cc (scalar_evolution_in_region): "Redirect" SCEV
analysis to outer loop for IFN_GOACC_LOOP calls.
* sese.h: Add field "kill_scalar_refs".
* tree-chrec.cc (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR
like CASE_CONVERT.
* tree-data-ref.cc (dump_data_reference): Include
DR_BASE_ADDRESS and DR_OFFSET in dump output.
(get_references_in_stmt): Don't reject OpenACC internal function
calls.
(graphite_find_data_references_in_stmt): Remove unused variable.
* tree-parloops.cc (pass_parallelize_loops::execute): Disable
pass with the new kernels handling, enable if requested explicitly.
* tree-scalar-evolution.cc (set_scev_analyze_openacc_calls):
Set flag to enable the analysis of internal OpenACC function
calls (use for Graphite only).
(oacc_call_analyzable_p): New function.
(oacc_ifn_call_extract): New function.
(oacc_simplify): New function.
(add_to_evolution): Simplify OpenACC internal function calls
if applicable.
(follow_ssa_edge_binary): Likewise.
(follow_ssa_edge_expr): Likewise.
(follow_copies_to_constant): Likewise.
(analyze_initial_condition): Likewise.
(interpret_loop_phi): Likewise.
(interpret_gimple_call): New function.
(interpret_rhs_expr): Likewise.
(instantiate_scev_name): Likewise.
(analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions.
(expression_expensive_p): Consider internal OpenACC calls to
be cheap.
* tree-scalar-evolution.h (set_scev_analyze_openacc_calls):
New declaration.
(oacc_call_analyzable_p): New declaration.
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Mark
lhs of internal OpenACC function calls necessary.
* tree-ssa-ifcombine.c (recognize_if_then_else):
* tree-ssa-loop-niter.cc (oacc_call_analyzable_p):
(oacc_ifn_call_extract): New declaration.
(interpret_gimple_call): New delcaration.
(expand_simple_operations): Handle internal OpenACC function calls.
* tree-ssa-loop.cc (gate_oacc_kernels): Disable for new
"kernels" handling.
* graphite-oacc.cc: New file.
* graphite-oacc.h: New file.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/classify-kernels.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Removed.
* c-c++-common/goacc/kernels-reduction.c: Removed.
* gfortran.dg/goacc/loop-auto-transfer-2.f90: New test.
* gfortran.dg/goacc/loop-auto-transfer-3.f90: New test.
* gfortran.dg/goacc/loop-auto-transfer-4.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2 years agographite: Add runtime alias checking
Frederik Harwath [Tue, 16 Nov 2021 15:15:08 +0000 (16:15 +0100)]
graphite: Add runtime alias checking

Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict".  This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime.  Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.

TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

        * common.opt: Add fgraphite-runtime-alias-checks.
        * graphite-isl-ast-to-gimple.cc
        (generate_alias_cond): New function.
        (graphite_regenerate_ast_isl): Use from here.
        * graphite-poly.cc (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
        * graphite-scop-detection.cc (dr_defs_outside_region): New function.
        (dr_well_analyzed_for_runtime_alias_check_p): New function.
        (graphite_runtime_alias_check_p): New function.
        (build_alias_set): Record unhandled alias ddrs for later alias check
        creation if flag_graphite_runtime_alias_checks is true instead
        of failing.
        * graphite.h (struct scop): Add field unhandled_alias_ddrs.
        * sese.h (has_operands_from_region_p): New function.

gcc/testsuite/ChangeLog:

        * gcc.dg/graphite/alias-1.c: New test.

2 years agoMove compute_alias_check_pairs to tree-data-ref.c
Frederik Harwath [Tue, 16 Nov 2021 15:14:41 +0000 (16:14 +0100)]
Move compute_alias_check_pairs to tree-data-ref.c

Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.

gcc/ChangeLog:
* tree-loop-distribution.cc (data_ref_segment_size): Remove function.
(latch_dominated_by_data_ref): Likewise.
(compute_alias_check_pairs): Likewise.

* tree-data-ref.cc (data_ref_segment_size): New function,
copied from tree-loop-distribution.c
(compute_alias_check_pairs): Likewise.
(latch_dominated_by_data_ref): Likewise.

* tree-data-ref.h (compute_alias_check_pairs): New declaration.

2 years agoFix branch prediction dump message
Frederik Harwath [Tue, 16 Nov 2021 15:13:51 +0000 (16:13 +0100)]
Fix branch prediction dump message

Instead of, for instance, "Loop got predicted 1 to iterate 10 times"
the message should be "Loop 1 got predicted to iterate 10 times".

gcc/ChangeLog:

* predict.cc (pass_profile::execute): Fix dump message.

2 years agographite: Fix minor mistakes in comments
Frederik Harwath [Tue, 16 Nov 2021 15:13:03 +0000 (16:13 +0100)]
graphite: Fix minor mistakes in comments

gcc/ChangeLog:

* graphite-sese-to-poly.cc (build_poly_sr_1): Fix a typo and
          a reference to a variable which does not exist.
* graphite-isl-ast-to-gimple.cc (gsi_insert_earliest): Fix typo
          in comment.

2 years agographite: Rename isl_id_for_ssa_name
Frederik Harwath [Tue, 16 Nov 2021 15:12:23 +0000 (16:12 +0100)]
graphite: Rename isl_id_for_ssa_name

The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name.  It also
explains the prefix "P_" for those names in the ISL representation.

gcc/ChangeLog:

* graphite-sese-to-poly.cc (isl_id_for_ssa_name): Rename to ...
  (isl_id_for_parameter): ... this new function name.
  (build_scop_context): Adjust function use.

2 years agographite: Extend SCoP detection dump output
Frederik Harwath [Tue, 16 Nov 2021 15:11:21 +0000 (16:11 +0100)]
graphite: Extend SCoP detection dump output

Extend dump output to make understanding why Graphite rejects to
include a loop in a SCoP easier (for GCC developers).

gcc/ChangeLog:

        * graphite-scop-detection.cc (scop_detection::can_represent_loop):
Output reason for failure to dump file.
        (scop_detection::harmful_loop_in_region): Likewise.
        (scop_detection::graphite_can_represent_expr): Likewise.
        (scop_detection::stmt_has_simple_data_refs_p): Likewise.
        (scop_detection::stmt_simple_for_scop_p): Likewise.
(print_sese_loop_numbers): New function.
        (scop_detection::add_scop): Use from here to print loops in
rejected SCoP.

2 years agoopenacc: Move pass_oacc_device_lower after pass_graphite
Frederik Harwath [Tue, 16 Nov 2021 15:07:34 +0000 (16:07 +0100)]
openacc: Move pass_oacc_device_lower after pass_graphite

The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performancewise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions.  To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.

gcc/ChangeLog:

* omp-general.cc (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-offload.cc (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.cc (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations.  Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.cc (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.cc (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.cc (pass_phiprop::clone): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* c-c++-common/goacc/device-lowering-no-optimization.c: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2 years agoFortran: delinearize multi-dimensional array accesses
Sandra Loosemore [Tue, 16 Nov 2021 15:09:51 +0000 (16:09 +0100)]
Fortran: delinearize multi-dimensional array accesses

The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations.  It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends.  This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.

        gcc/
        * expr.cc (get_inner_reference): Handle NOP_EXPR like
        VIEW_CONVERT_EXPR.

        gcc/fortran/
        * lang.opt (-param=delinearize=): New.
        * trans-array.cc (get_class_array_vptr): New, split from...
        (build_array_ref): ...here.
        (get_array_lbound, get_array_ubound): New, split from...
        (gfc_conv_array_ref): ...here.  Additional code refactoring
        plus support for delinearization of the array access.

        gcc/testsuite/
        * gfortran.dg/assumed_type_2.f90: Adjust patterns.
        * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
        * gfortran.dg/graphite/block-3.f90: Remove xfails.
        * gfortran.dg/graphite/block-4.f90: Likewise.
        * gfortran.dg/inline_matmul_24.f90: Adjust patterns.
        * gfortran.dg/no_arg_check_2.f90: Likewise.
        * gfortran.dg/pr32921.f: Likewise.
        * gfortran.dg/reassoc_4.f: Disable delinearization for this test.

Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
2 years agoFix gimple_debug_cfg declaration
Frederik Harwath [Tue, 16 Nov 2021 15:08:40 +0000 (16:08 +0100)]
Fix gimple_debug_cfg declaration

Silence a warning. The argument type did not match the definition.

gcc/ChangeLog:

* tree-cfg.h (gimple_debug_cfg): Change argument type from int
to dump_flags_t.

2 years agotestsuite/libgomp.oacc-fortran/: Add -Wopenacc-parallelism
Tobias Burnus [Thu, 21 Oct 2021 07:28:57 +0000 (09:28 +0200)]
testsuite/libgomp.oacc-fortran/: Add -Wopenacc-parallelism

The following testcases expect the -Wopenacc-parallelism warning output
but did fail as not compiled with that warning; solution: add it.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: Compile
with -Wopenacc-parallelism.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: Likewise.

2 years agogomp/target-device-ancestor-*.f90: Fix testcase of OG11
Tobias Burnus [Thu, 14 Oct 2021 07:29:35 +0000 (09:29 +0200)]
gomp/target-device-ancestor-*.f90: Fix testcase of OG11

Contrary to GCC 12 mainline, OG11 defers the error for
'omp requires reverse_offload' until runtime (via libgomp).
Update the testcases accordingly.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/target-device-ancestor-2.f90: Remove dg-error
for the requires-reverse_offload sorry.
* gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise.
* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.

2 years agoOpenMP: Fix target device ancestor tests according to reverse_offload.
Marcel Vollweiler [Fri, 24 Sep 2021 15:32:53 +0000 (08:32 -0700)]
OpenMP: Fix target device ancestor tests according to reverse_offload.

This patch removes the expectation that 'requires reverse_offload' is
unsupported from some 'target device ancester' tests which were introduced in
commit 03be3cfeef7b3811acb6c4a8da2fc5c1e25d3e4c. This is necessary since
commit f5bfc65f9a6e1f69b17d3740d043d2fbda339e05 changed the behaviour for
reverse_offload.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-device-ancestor-2.c: Remove message for
unsupported reverse offload.
* c-c++-common/gomp/target-device-ancestor-3.c: Likewise.
* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.

2 years agoopenacc: fix ICE for non-decl expression in non-contiguous array base-pointer
Chung-Lin Tang [Thu, 19 Aug 2021 08:17:02 +0000 (16:17 +0800)]
openacc: fix ICE for non-decl expression in non-contiguous array base-pointer

Currently, we do not support cases like struct-members as the base-pointer
for an OpenACC non-contiguous array. Mark such cases as unsupported in the
C/C++ front-ends, instead of ICEing on them.

gcc/c/ChangeLog:

* c-typeck.cc (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.

gcc/cp/ChangeLog:

* semantics.cc (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.

2 years agolibgomp amdgcn: Fix issues with dynamic OpenMP thread scaling
Andrew Stubbs [Tue, 3 Aug 2021 12:45:35 +0000 (13:45 +0100)]
libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling

libgomp/ChangeLog:

* config/gcn/bar.h (gomp_barrier_init): Limit thread count to the
actual physical number.
* config/gcn/team.c (gomp_team_start): Don't attempt to set up
threads that do not exist.

2 years ago[og11] OpenMP/OpenACC: Move array_ref/indirect_ref handling code out of extract_base_...
Julian Brown [Thu, 3 Jun 2021 13:47:00 +0000 (06:47 -0700)]
[og11] OpenMP/OpenACC: Move array_ref/indirect_ref handling code out of extract_base_bit_offset

At Richard Biener's suggestion, this patch undoes the following patch:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571712.html

and moves the stripping of ARRAY_REFS/INDIRECT_REFS out of
extract_base_bit_offset and back into the (two) call sites of the
function. The difference between the two ways of looking through these
nodes comes down to (I think) what processing has been done on the
clause in question already: in the case where BASE_REF is non-NULL,
we are processing an OMP_CLAUSE_DECL for the first time. Conversely,
when BASE_REF is NULL, we are processing a node from the sorted list
that is being constructed after a GOMP_MAP_STRUCT node.

2021-06-07  Julian Brown  <julian@codesourcery.com>

gcc/
* gimplify.cc (extract_base_bit_offset): Don't look through ARRAY_REFs or
INDIRECT_REFs here.
(build_struct_group): Reinstate previous behaviour for handling
ARRAY_REFs/INDIRECT_REFs.

2 years ago[og11] Rework indirect struct handling for OpenACC in gimplify.c
Julian Brown [Tue, 18 May 2021 17:22:56 +0000 (10:22 -0700)]
[og11] Rework indirect struct handling for OpenACC in gimplify.c

This patch reworks indirect struct handling in gimplify.c (i.e. for
struct components mapped with "mystruct->a[0:n]", "mystruct->b", etc.),
for OpenACC.  The key observation leading to these changes was that
component mappings of references-to-structures is already implemented
and working, and indirect struct component handling via a pointer can
work quite similarly.  That lets us remove some earlier, special-case
handling for mapping indirect struct component accesses for OpenACC,
which required the pointed-to struct to be manually mapped before the
indirect component mapping.

With this patch, you can map struct components directly (e.g. an array
slice "mystruct->a[0:n]") just like you can map a non-indirect struct
component slice ("mystruct.a[0:n]"). Both references-to-pointers (with
the former syntax) and references to structs (with the latter syntax)
work now.

For Fortran class pointers, we no longer re-use GOMP_MAP_TO_PSET for the
class metadata (the structure that points to the class data and vptr)
-- it is instead treated as any other struct.

For C++, the struct handling also works for class members ("this->foo"),
without having to explicitly map "this[:1]" first.

For OpenACC, we permit chained indirect component references
("mystruct->a->b[0:n]"), though only the last part of such mappings will
trigger an attach/detach operation.  To properly use such a construct
on the target, you must still manually map "mystruct->a[:1]" first --
but there's no need to map "mystruct[:1]" explicitly before that.

This version of the patch avoids altering code paths for OpenMP,
where possible.

2021-06-02  Julian Brown  <julian@codesourcery.com>

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Don't create GOMP_MAP_TO_PSET
mappings for class metadata, nor GOMP_MAP_POINTER mappings for
POINTER_TYPE_P decls.

gcc/
* gimplify.cc (extract_base_bit_offset): Add BASE_IND and OPENMP
parameters.  Handle pointer-typed indirect references for OpenACC
alongside reference-typed ones.
(strip_components_and_deref, aggregate_base_p): New functions.
(build_struct_group): Add pointer type indirect ref handling,
including chained references, for OpenACC.  Also handle references to
structs for OpenACC.  Conditionalise bits for OpenMP only where
appropriate.
(gimplify_scan_omp_clauses): Rework pointer-type indirect structure
access handling to work more like the reference-typed handling for
OpenACC only.
* omp-low.cc (scan_sharing_clauses): Handle pointer-type indirect struct
references, and references to pointers to structs also.

gcc/testsuite/
* g++.dg/goacc/member-array-acc.C: New test.
* g++.dg/gomp/member-array-omp.C: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test.
* testsuite/libgomp.oacc-c++/deep-copy-17.C: New test.

2 years ago[og11] Refactor struct lowering for OpenACC/OpenMP in gimplify.c
Julian Brown [Tue, 18 May 2021 17:08:22 +0000 (10:08 -0700)]
[og11] Refactor struct lowering for OpenACC/OpenMP in gimplify.c

This patch is a second attempt at refactoring struct component mapping
handling for OpenACC/OpenMP during gimplification, after the patch I
posted here:

  https://gcc.gnu.org/pipermail/gcc-patches/2018-November/510503.html

And improved here, post-review:

  https://gcc.gnu.org/pipermail/gcc-patches/2019-November/533394.html

This patch goes further, in that the struct-handling code is outlined
into its own function (to create the "GOMP_MAP_STRUCT" node and the
sorted list of nodes immediately following it, from a set of mappings
of components of a given struct or derived type). I've also gone through
the list-handling code and attempted to add comments documenting how it
works to the best of my understanding, and broken out a couple of helper
functions in order to (hopefully) have the code self-document better also.

2021-06-02  Julian Brown  <julian@codesourcery.com>

gcc/
* gimplify.cc (insert_struct_comp_map): Refactor function into...
(build_struct_comp_nodes): This new function.  Remove list handling
and improve self-documentation.
(insert_node_after, move_node_after, move_nodes_after,
move_concat_nodes_after): New helper functions.
(build_struct_group): New function to build up GOMP_MAP_STRUCT node
groups to map struct components. Outlined from...
(gimplify_scan_omp_clauses): Here.  Call above function.

2 years ago[og11] Unify ARRAY_REF/INDIRECT_REF stripping code in extract_base_bit_offset
Julian Brown [Mon, 19 Apr 2021 13:24:41 +0000 (06:24 -0700)]
[og11] Unify ARRAY_REF/INDIRECT_REF stripping code in extract_base_bit_offset

For historical reasons, it seems that extract_base_bit_offset
unnecessarily used two different ways to strip ARRAY_REF/INDIRECT_REF
nodes from component accesses. I verified that the two ways of performing
the operation gave the same results across the whole testsuite (and
several additional benchmarks).

The code was like this since an earlier "mechanical" refactoring by me,
first posted here:

  https://gcc.gnu.org/pipermail/gcc-patches/2018-November/510503.html

It was never clear to me if there was an important semantic
difference between the two ways of stripping the base before calling
get_inner_reference, but it appears that there is not, so one can go away.

2021-06-02  Julian Brown  <julian@codesourcery.com>

gcc/
* gimplify.cc (extract_base_bit_offset): Unify ARRAY_REF/INDIRECT_REF
stripping code in first call/subsequent call cases.

2 years ago[og11] Rewrite GOMP_MAP_ATTACH_DETACH mappings unconditionally
Julian Brown [Tue, 18 May 2021 17:10:12 +0000 (10:10 -0700)]
[og11] Rewrite GOMP_MAP_ATTACH_DETACH mappings unconditionally

It never makes sense for a GOMP_MAP_ATTACH_DETACH mapping to survive
beyond gimplify.c, so this patch rewrites such mappings to GOMP_MAP_ATTACH
or GOMP_MAP_DETACH unconditionally (rather than checking for a list
of types of OpenACC or OpenMP constructs), in cases where it hasn't
otherwise been done already in the preceding code.

2021-06-02  Julian Brown  <julian@codesourcery.com>

gcc/
* gimplify.cc (gimplify_scan_omp_clauses): Simplify condition
for changing GOMP_MAP_ATTACH_DETACH to GOMP_MAP_ATTACH or
GOMP_MAP_DETACH.

2 years agoc-c++-common/gomp/map-6.c: Fix dg-error due to mapping changes
Tobias Burnus [Fri, 14 May 2021 11:41:52 +0000 (13:41 +0200)]
c-c++-common/gomp/map-6.c: Fix dg-error due to mapping changes

OpenMP 5 relaxed some repetition rules such that some double-use
warnings no longer occur; that patch is not yet on mainline.

gcc/testsuite/
* c-c++-common/gomp/map-6.c: Remove two dg-error.

2 years agoAdd -Wopenacc-parallelism to tests only in OG11
Kwok Cheung Yeung [Fri, 30 Apr 2021 17:18:13 +0000 (10:18 -0700)]
Add -Wopenacc-parallelism to tests only in OG11

2021-04-30  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Add
-Wopenacc-parallelism option.

2 years agoUpdate expected messages in data-clause-1 tests
Kwok Cheung Yeung [Thu, 29 Apr 2021 22:06:38 +0000 (15:06 -0700)]
Update expected messages in data-clause-1 tests

The patch 'Merge non-contiguous array support patches' handles one of the
non-contiguous cases such that it is no longer an error.

2021-04-29  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/data-clause-1.c (foo): Remove expected message.
* g++.dg/goacc/data-clause-1.C (foo): Remove expected message.

2 years agoUpdate expected messages in kernels-decompose-2 tests
Kwok Cheung Yeung [Thu, 29 Apr 2021 21:38:15 +0000 (14:38 -0700)]
Update expected messages in kernels-decompose-2 tests

This changes expected messages that differ between mainline and OG11.  On
OG10, these messages were added in the patch:

081a01963ca8 Update expected messages, errors and warnings for "kernels" tests

2021-04-29  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c (main): Update expected
messages.
* gfortran.dg/goacc/kernels-decompose-2.f95 (main): Update expected
messages.

2 years agoFix is_oacc_parallel_or_serial for kernel regions
Kwok Cheung Yeung [Wed, 7 Apr 2021 19:49:31 +0000 (12:49 -0700)]
Fix is_oacc_parallel_or_serial for kernel regions

2021-04-07  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* omp-low.cc (is_oacc_parallel_or_serial): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED and
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE.

2 years agoUpdate expected messages in OpenACC tests
Kwok Cheung Yeung [Wed, 7 Apr 2021 18:48:50 +0000 (11:48 -0700)]
Update expected messages in OpenACC tests

This updates the types of messages expected in the test, and the '-fopt-info'
option used to request them.  The phrasing of the expected messages has also
changed somewhat and has been adjusted to match.

2021-04-07  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c:
Update additional options and expected messages.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.

2 years ago[WIP] OpenMP 5.0: requires directive: workaround to fix libgomp IntelMIC plugin build
Thomas Schwinge [Wed, 3 Mar 2021 21:37:58 +0000 (22:37 +0100)]
[WIP] OpenMP 5.0: requires directive: workaround to fix libgomp IntelMIC plugin build

Fix-up for og10 commit c2e4a17adc0989f216c7fc3f93f150c66adba23a "OpenMP 5.0:
requires directive".

The GCC offloading target configurations don't build/use
'crtoffloadbegin.o'/'crtoffloadtable.o'/'crtoffloadend.o'
('libgcc/offloadstuff.c'), but the libgomp IntelMIC plugin still does link
against libgomp, and the latter unconditionally refers to
'__requires_mask_table', '__requires_mask_table_end':

    make[3]: Entering directory '[...]/build-gcc-offload-x86_64-intelmicemul-linux-gnu/x86_64-intelmicemul-linux-gnu/liboffloadmic/plugin'
    [...]/build-gcc-offload-x86_64-intelmicemul-linux-gnu/./gcc/xg++ [...] -loffloadmic_target -lcoi_device -lgomp -rdynamic ../ofldbegin.o offload_target_main.o ../ofldend.o -o offload_target_main
    ./../../libgomp/.libs/libgomp.so: undefined reference to `__requires_mask_table_end'
    ./../../libgomp/.libs/libgomp.so: undefined reference to `__requires_mask_table'
    collect2: error: ld returned 1 exit status
    Makefile:806: recipe for target 'offload_target_main' failed
    make[3]: *** [offload_target_main] Error 1

I have not researched what a proper fix would look like.

libgomp/
* target.c (__requires_mask_table, __requires_mask_table_end): Add
'__attribute__((weak))'.

2 years agoDWARF: late code range fixup
Andrew Stubbs [Thu, 4 Mar 2021 23:12:17 +0000 (23:12 +0000)]
DWARF: late code range fixup

Ensure that the parent DWARF subprograms of offload kernel functions have a
code range, and are therefore not discarded by GDB.  This is only necessary
when the parent function does not actually exist in the final binary, which is
commonly the case within the offload device's binary.

This patch replaces 808bdf1bb29 and fdcb23540a2.  It should be squashed with
those before being posted upstream.

gcc/

* dwarf2out.cc (notional_parents_list): New file variable.
(gen_subprogram_die): Record offload kernel functions in
notional_parents_list.
(fixup_notional_parents): New function.
(dwarf2out_finish): Call fixup_notional_parents.
(dwarf2out_c_finalize): Reset notional_parents_list.

2 years agoopenmp: Scale type precision of collapsed iterator variable
Kwok Cheung Yeung [Mon, 1 Mar 2021 22:15:30 +0000 (14:15 -0800)]
openmp: Scale type precision of collapsed iterator variable

This sets the type precision of the collapsed iterator variable to the
sum of the precision of the collapsed loop variables, up to a maximum of
sizeof(long long) (i.e. 64-bits).

2021-03-01  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/
* omp-expand.cc (expand_oacc_for): Convert .tile variable to
diff_type before multiplying.
* omp-general.cc (omp_extract_for_data): Use accumulated precision
of all collapsed for-loops as precision of iteration variable, up
to the precision of a long long.

libgomp/
* testsuite/libgomp.c-c++-common/collapse-4.c: New.
* testsuite/libgomp.fortran/collapse5.f90: New.

2 years agoAllow static constexpr fields in mappable types for C++
Chung-Lin Tang [Wed, 3 Mar 2021 14:39:10 +0000 (22:39 +0800)]
Allow static constexpr fields in mappable types for C++

This patch is a merge of:
https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg01246.html

Static members in general disqualify a C++ class from being target mappable,
but static constexprs are inline optimized away, so should not interfere.

OpenMP 5.0 in general lifts the static member limitation, so this
patch will probably further adjusted later.

2021-03-03  Chung-Lin Tang  <cltang@codesourcery.com>

gcc/cp/ChangeLog:

* decl2.cc (cp_omp_mappable_type_1): Allow fields with
DECL_DECLARED_CONSTEXPR_P to be mapped.

gcc/testsuite/ChangeLog:

* g++.dg/goacc/static-constexpr-1.C: New test.
* g++.dg/gomp/static-constexpr-1.C: New test.

2 years agonvptx: remove erroneous stack deletion
Andrew Stubbs [Tue, 23 Feb 2021 21:35:08 +0000 (21:35 +0000)]
nvptx: remove erroneous stack deletion

The stacks are not supposed to be deleted every time memory is allocated, only
when there is insufficient memory.  The unconditional call here seems to be in
error, and is causing a costly reallocation of the stacks before every launch.

libgomp/

* plugin/plugin-nvptx.c (GOMP_OFFLOAD_alloc): Remove early call to
nvptx_stacks_free.

2 years agoDWARF: fix ICE caused by offload debug fix
Andrew Stubbs [Fri, 26 Feb 2021 11:40:06 +0000 (11:40 +0000)]
DWARF: fix ICE caused by offload debug fix

This should be squashed with 808bdf1bb29 and fdcb23540a2 to go to mainline.

gcc/

* dwarf2out.cc (gen_subprogram_die): Replace existing low/high PC
attributes, rather than ICE.

2 years agoOpenMP 5.0: requires directive
Chung-Lin Tang [Tue, 2 Feb 2021 12:34:01 +0000 (20:34 +0800)]
OpenMP 5.0: requires directive

This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563393.html

This patch completes more of the reverse_offload, unified_address, and
unified_shared_memory clauses for the OpenMP 5.0 requires directive,
including runtime verification of the offload target.
(currently no offload devices actually support above features, only
warning messages are emitted)

This may possibly reverted/updated when a final patch is approved
for mainline.

2021-02-02  Chung-Lin Tang  <cltang@codesourcery.com>

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Set
OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
"omp declare target" attribute.
(c_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
omp_requires_mask.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_declaration): Set
OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
"omp declare target" attribute.
(cp_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
omp_requires_mask.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.

gcc/fortran/ChangeLog:

* openmp.cc (gfc_check_omp_requires): Fix REVERSE_OFFLOAD typo.
(gfc_match_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.
* parse.cc ("tree.h"): Add include.
("omp-general.h"): Likewise.
(gfc_parse_file): Add code to merge omp_requires to omp_requires_mask.

gcc/ChangeLog:

* omp-offload.cc (omp_finish_file): Add code to create OpenMP requires
mask variable in .gnu.gomp_requires section if needed.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/requires-4.c: Remove prune of "not supported yet".
* gfortran.dg/gomp/requires-4.f90: Fix REVERSE_OFFLOAD typo.
* gfortran.dg/gomp/requires-8.f90: Likewise.

include/ChangeLog:

* gomp-constants.h (GOMP_REQUIRES_UNIFIED_ADDRESS): New symbol.
(GOMP_REQUIRES_UNIFIED_SHARED_MEMORY): Likewise.
(GOMP_REQUIRES_REVERSE_OFFLOAD): Likewise.

libgcc/ChangeLog:

* offloadstuff.c (__requires_mask_table): New symbol to mark start of
.gnu.gomp_requires section.
(__requires_mask_table_end): New symbol to mark end of
.gnu.gomp_requires section.

libgomp/ChangeLog:

* libgomp-plugin.h (GOMP_OFFLOAD_supported_features): New declaration.
* libgomp.h (struct gomp_device_descr): New 'supported_features_func'
plugin hook field.
* oacc-host.c (host_supported_features): New host hook function.
(host_dispatch): Initialize 'supported_features_func' host hook.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_supported_features): New function.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_supported_features): Likewise.
* target.c (<stdio.h>): Add include of standard header.
(gomp_requires_mask): New static variable.
(__requires_mask_table): New declaration.
(__requires_mask_table_end): Likewise.
(gomp_load_plugin_for_device): Add loading of 'supported_features' hook.
(gomp_target_init): Add code to summarize .gnu._gomp_requires section
mask values, emit error if inconsistency found.

* testsuite/libgomp.c-c++-common/requires-1.c: New test.
* testsuite/libgomp.c-c++-common/requires-1-aux.c: New file linked with
above test.
* testsuite/libgomp.c-c++-common/requires-2.c: New test.
* testsuite/libgomp.c-c++-common/requires-2-aux.c: New file linked with
above test.

liboffloadmic/ChangeLog:

* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_supported_features):
New function.

2 years agoOpenMP 5.0: Allow multiple clauses mapping same variable
Chung-Lin Tang [Mon, 1 Feb 2021 11:16:47 +0000 (03:16 -0800)]
OpenMP 5.0: Allow multiple clauses mapping same variable

This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562081.html

This patch now allows multiple clauses on the same construct to map
the same variable, which was not valid in OpenMP 4.5, but now allowed
in 5.0.

This may possibly reverted/updated when a final patch is approved
for mainline.

2021-02-01  Chung-Lin Tang  <cltang@codesourcery.com>

gcc/cp/ChangeLog:

* semantics.cc (finish_omp_clauses):  Adjust to allow duplicate
mapped variables for OpenMP.

gcc/ChangeLog:

* omp-low.cc (install_var_field): Add new 'tree key_expr = NULL_TREE'
default parameter. Set splay-tree lookup key to key_expr instead of
var if key_expr is non-NULL. Adjust call to install_parm_decl.
Update comments.
(scan_sharing_clauses): Use clause tree expression as splay-tree key
for map/to/from and OpenACC firstprivate cases when installing the
variable field into the send/receive record type.
(maybe_lookup_field_in_outer_ctx): Add code to search through
construct clauses instead of entirely based on splay-tree lookup.
(lower_oacc_reductions): Adjust to find map-clause of reduction
variable, then create receiver-ref.
(lower_omp_target): Adjust to lookup var field using clause expression.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/clauses-2.c: Adjust testcase.

2 years agoCorrect fix offload dwarf info
Andrew Stubbs [Sat, 16 Jan 2021 15:18:07 +0000 (15:18 +0000)]
Correct fix offload dwarf info

The previous patch wasn't quite right, apparently.  Somehow the behaviour
changed after another clean build?  This tweak fixes it.

This patch should be squashed with fdcb23540a2 to go to mainline.

gcc/ChangeLog:

* dwarf2out.cc (gen_subprogram_die): Check offload attributes only.

2 years agoDWARF address space for variables
Andrew Stubbs [Fri, 15 Jan 2021 11:26:46 +0000 (11:26 +0000)]
DWARF address space for variables

Add DWARF address class attributes for variables that exist outside the
generic address space.  In particular, this is the case for gang-private
variables in OpenACC offload kernels.

gcc/ChangeLog:

* dwarf2out.cc (add_location_or_const_value_attribute): Set
DW_AT_address_class, if appropriate.

2 years agoFix offload dwarf info
Andrew Stubbs [Sun, 6 Dec 2020 19:23:55 +0000 (19:23 +0000)]
Fix offload dwarf info

Add a notional code range to the notional parent function of offload kernel
functions.  This is enough to prevent GDB discarding them.

gcc/ChangeLog:

* dwarf2out.cc (gen_subprogram_die): Add high/low_pc attributes for
parents of offload kernels.

2 years ago[og10] vect: Add target hook to prefer gather/scatter instructions
Julian Brown [Wed, 25 Nov 2020 17:08:01 +0000 (09:08 -0800)]
[og10] vect: Add target hook to prefer gather/scatter instructions

For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.

The attached patch adds a target hook to override the "single_element_p"
heuristic in the function as a target hook, and activates it for GCN. This
allows much better code to be generated for affected loops.

2021-01-13  Julian Brown  <julian@codesourcery.com>

gcc/
* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
documentation hook.
* doc/tm.texi: Regenerate.
* target.def (prefer_gather_scatter): Add target hook under vectorizer.
* tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
gather/scatter instructions to scalar/elementwise fallback.
* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
hook.

2 years ago[og10] openacc: Adjust loop lowering for AMD GCN
Julian Brown [Fri, 6 Nov 2020 23:17:29 +0000 (15:17 -0800)]
[og10] openacc: Adjust loop lowering for AMD GCN

This patch adjusts OpenACC loop lowering in the AMD GCN target compiler
in such a way that the autovectorizer can vectorize the "vector" dimension
of those loops in more cases.

Rather than generating "SIMT" code that executes a scalar instruction
stream for each lane of a vector in lockstep, for GCN we model the GPU
like a typical CPU, with separate instructions to operate on scalar and
vector data. That means that unlike other offload targets, we rely on
the autovectorizer to handle the innermost OpenACC parallelism level,
which is "vector".

Because of this, the OpenACC builtin functions to return the current
vector lane and the vector width return 0 and 1 respectively, despite
the native vector width being 64 elements wide.

This allows generated code to work with our chosen compilation model,
but the way loops are lowered in omp-offload.c:oacc_xform_loop does not
understand the discrepancy between logical (OpenACC) and physical vector
sizes correctly. That means that if a loop is partitioned over e.g. the
worker AND vector dimensions, we actually lower with unit vector size --
meaning that if we then autovectorize, we end up trying to vectorize
over the "worker" dimension rather than the vector one! Then, because
the number of workers is not fixed at compile time, that means the
autovectorizer has a hard time analysing the loop and thus vectorization
often fails entirely.

We can fix this by deducing the true vector width in oacc_xform_loop,
and using that when we are on a "non-SIMT" offload target. We can then
rearrange how loops are lowered in that function so that the loop form
fed to the autovectorizer is more amenable to vectorization -- namely,
the innermost step is set to process each loop iteration sequentially.

For some benchmarks, allowing vectorization to succeed leads to quite
impressive performance improvements -- I've observed between 2.5x and
40x on one machine/GPU combination.

The low-level builtins available to user code (__builtin_goacc_parlevel_id
and __builtin_goacc_parlevel_size) continue to return 0/1 respectively
for the vector dimension for AMD GCN, even if their containing loop is
vectorized -- that's a quirk that we might possibly want to address at
some later date.

Only non-"chunking" loops are handled at present. "Chunking" loops are
still lowered as before.

2021-01-13  Julian Brown  <julian@codesourcery.com>

gcc/
* omp-offload.cc (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter.
Add overloaded wrapper for previous arguments & behaviour.
(oacc_xform_loop): Lower vector loops to iterate a multiple of
omp_max_vf times over contiguous steps on non-SIMT targets.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Adjust for loop
lowering changes.
* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.

2 years agodwarf: Multi-register CFI address support
Andrew Stubbs [Mon, 27 Jul 2020 09:55:22 +0000 (10:55 +0100)]
dwarf: Multi-register CFI address support

Add support for architectures such as AMD GCN, in which the pointer size is
larger than the register size.  This allows the CFI information to include
multi-register locations for the stack pointer, frame pointer, and return
address.

Note that this uses a newly proposed DWARF operator DW_OP_LLVM_piece_end,
which is currently only recognized by the ROCGDB debugger from AMD.  The exact
name and encoding for this operator is subject to change if and when the DWARF
standard accepts it.

gcc/ChangeLog:

* dwarf2cfi.cc (get_cfa_from_loc_descr): Support register spans
with DW_OP_piece and DW_OP_LLVM_piece_end.
* dwarf2out.cc (build_cfa_loc): Support register spans.

include/ChangeLog:

* dwarf2.def (DW_OP_LLVM_piece_end): New extension operator.

2 years agoRelax some restrictions on the loop bound in kernels loop annotation.
Sandra Loosemore [Sun, 30 Aug 2020 19:15:23 +0000 (12:15 -0700)]
Relax some restrictions on the loop bound in kernels loop annotation.

OpenACC loop semantics require that the loop bound be computable
before entering the loop, rather than the C/C++ semantics where the
end test is evaluated on every iteration.  Formerly the kernels loop
annotater permitted only constants and variables not modified in the
loop body in the loop bound expression.  This patch relaxes those
restrictions somewhat to allow many forms of expressions involving
such constants and variables, including calls to constant functions.

2020-08-30  Sandra Loosemore  <sandra@codesourcery.com>

gcc/c-family/
* c-omp.cc (end_test_ok_for_annotation_r): New.
(end_test_ok_for_annotation): New.
(check_and_annotate_for_loop): Use the new helper function.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-21.c: New.
* c-c++-common/goacc/kernels-loop-annotation-22.c: New.

2 years agoClean up loop variable extraction in OpenACC kernels loop annotation.
Sandra Loosemore [Sun, 30 Aug 2020 19:15:23 +0000 (12:15 -0700)]
Clean up loop variable extraction in OpenACC kernels loop annotation.

The code for identifying annotatable loops in OpenACC kernels regions
previously looked for the loop variable as the left-hand side of the
comparison in the loop end test.  However, front end optimizations
sometimes switch the sense of the comparison, making this method
unreliable.  In particular, it's ambiguous when both operands to the
end test comparison are local variables.

This patch reorders the loop processing to identify the loop variable
from the initializer, rather than the end test. The processing of the
end test then just checks that one of the operands to the comparison
matches the variable appearing in the initializer.  Much of the patch
is code refactoring, moving the initializer analysis out of
annotate_for_loop to check_and_annotate_for_loop so it can be
performed earlier.

2020-08-30  Sandra Loosemore  <sandra@codesourcery.com>

gcc/c-family/
* c-omp.cc (annotate_for_loop): Move initializer processing...
(check_and_annotate_for_loop): ... to here.  Allow the loop
variable as either operand to the condition.

2 years agoFix patterns in Fortran tests for kernels loop annotation.
Sandra Loosemore [Sun, 23 Aug 2020 05:43:57 +0000 (22:43 -0700)]
Fix patterns in Fortran tests for kernels loop annotation.

Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file.  Now the "auto" clause appears first, instead of after "private".

2020-08-23   Sandra Loosemore  <sandra@codesourcery.com>

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update
expected output.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.

2 years agoPermit calls to builtins and intrinsics in kernels loops.
Sandra Loosemore [Sun, 23 Aug 2020 01:23:26 +0000 (18:23 -0700)]
Permit calls to builtins and intrinsics in kernels loops.

This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body.  Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction.  If
any turn out to be problematical, we can add on here to recognize
them, or in the processing of the "auto" annotations.

2020-08-22  Sandra Loosemore  <sandra@codesourcery.com>

gcc/c-family/
* c-omp.cc (annotate_loops_in_kernels_regions): Test for
calls to builtins.

gcc/fortran/
* openmp.cc (check_expr_for_invalid_calls): Check for intrinsic
functions.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-20.c: New.
* gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.

2 years agoUpdate dg-* in gfortran.dg/gomp/pr67500.f90
Tobias Burnus [Fri, 21 Aug 2020 11:28:06 +0000 (13:28 +0200)]
Update dg-* in gfortran.dg/gomp/pr67500.f90

Contrary to GCC 11, OG10 uses an error instead of a warning,
cf. commit 271c7fef548a86676d304b1eb2be5c0d47280bd6.

gcc/testsuite/
        * gfortran.dg/gomp/pr67500.f90: Change dg-warning to
        dg-error.

2 years agoAnnotate inner loops in "acc kernels loop" directives (Fortran).
Sandra Loosemore [Thu, 20 Aug 2020 02:24:43 +0000 (19:24 -0700)]
Annotate inner loops in "acc kernels loop" directives (Fortran).

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior in Fortran.

2020-08-19  Sandra Loosemore  <sandra@codesourcery.com>

gcc/fortran/
* openmp.cc (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
* gfortran.dg/goacc/combined-directives.f90: Adjust expected
patterns.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95:
Likewise.

2 years agoAnnotate inner loops in "acc kernels loop" directives (C/C++).
Sandra Loosemore [Thu, 20 Aug 2020 02:18:57 +0000 (19:18 -0700)]
Annotate inner loops in "acc kernels loop" directives (C/C++).

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior for C and C++.

2020-08-19  Sandra Loosemore  <sandra@codesourcery.com>

gcc/c-family/
* c-omp.cc (annotate_loops_in_kernels_regions): Process inner
loops in combined "acc kernels loop" directives.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-18.c: New.
* c-c++-common/goacc/kernels-loop-annotation-19.c: New.
* c-c++-common/goacc/combined-directives.c: Adjust expected
patterns.

2 years agoAdd a "combined" flag for "acc kernels loop" etc directives.
Sandra Loosemore [Thu, 20 Aug 2020 02:13:55 +0000 (19:13 -0700)]
Add a "combined" flag for "acc kernels loop" etc directives.

2020-08-19  Sandra Loosemore  <sandra@codesourcery.com>

gcc/
* tree.h (OACC_LOOP_COMBINED): New.

gcc/c/
* c-parser.cc (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/cp/
* parser.cc (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_do): Add combined parameter,
use it to set OACC_LOOP_COMBINED.  Update all call sites.

This page took 0.131954 seconds and 5 git commands to generate.