This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables


[ was: Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta ]

On 16/12/15 17:02, Tom de Vries wrote:
On 10/12/15 14:14, Tom de Vries wrote:
[ copy-pasting-with-quote from
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some
reason I didn't get this email ]

On Thu, 3 Dec 2015, Tom de Vries wrote:
The flag is set here in expand_omp_target:
...
12682         /* Prevent IPA from removing child_fn as unreachable,
                 since there are no
12683            refs from the parent function to child_fn in offload
                 LTO mode.  */
12684         if (ENABLE_OFFLOADING)
12685           cgraph_node::get (child_fn)->mark_force_output ();
...


How are there no refs from the "parent"?  Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?

That descriptor is the offload table, which is emitted in
omp_finish_file. The function iterates over vectors offload_vars and
offload_funcs.

[ I would guess there's a one-on-one correspondance between
symtab_node::offloadable and membership of either offload_vars or
offload_funcs. ]

I think the above needs sorting out in somw way, making the refs
explicit rather than implicit via force_output.

I've tried an approach where I add a test for node->offloadable next to
each test for node->force_output, except for the test in the nonlocal_p
def in ipa_pta_execute. But I didn't (yet) manage to make that work.

I guess setting forced_by_abi instead would also mean child_fn is not
removed
as unreachable, while still allowing optimizations:
...
  /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
     to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
     symbols promoted to static and it does not inhibit
     optimization.  */
  unsigned forced_by_abi : 1;
...

But I suspect that other optimizations (than ipa-pta) might break
things.

How so?

Probably it's more accurate to say that I do not understand the
difference very well between force_output and force_by_abi, and what is
the class of optimizations enabled by using forced_by_abi instead of
force_output.'

Essentially we have two situations:
- in the host compiler, there is no need for the forced_output flag,
  and it inhibits optimization
- in the accelerator compiler, it (or some equivalent) is needed

Actually, things are slightly more complicated, I realize now. There's
also the distinction between:
- symbols declared as offloadable in the source code, and
- symbols create by the compiler and marked offloadable

I wonder if setting the force_output flag only when streaming the
bytecode for
offloading would work. That way, it wouldn't be set in the host
compiler,
while being set in the accelerator compiler.

Yeah, that was my original thinking btw.

FTR, I've tried that approach, as attached. It fixed the
goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also
using an accelerator configuration) without any regressions.

How about this patch?


Ping.

Thanks,
- Tom

We remove the setting of force_output when:
- encountering offloadable symbols in the frontend, or
- creating offloadable symbols in expand-omp.

Instead, we set force_output in input_offload_tables.

This is an improvement because:
- it moves the force_output setting to a single location
- it does the force_output setting ALAP

Thanks,
- Tom

0008-Mark-symbols-in-offload-tables-with-force_output-in-read_offload_tables.patch


Mark symbols in offload tables with force_output in read_offload_tables

2015-12-15  Tom de Vries<tom@codesourcery.com>

	* c-parser.c (c_parser_oacc_declare, c_parser_omp_declare_target): Don't
	set force_output.

	* parser.c (cp_parser_oacc_declare, cp_parser_omp_declare_target): Don't
	set force_output.

	* omp-low.c (expand_omp_target): Don't set force_output.
	* varpool.c (varpool_node::get_create): Same.
	* lto-cgraph.c (input_offload_tables): Mark entries in offload_vars and
	offload_funcs with force_output.

---
  gcc/c/c-parser.c | 10 ++--------
  gcc/cp/parser.c  | 10 ++--------
  gcc/lto-cgraph.c |  9 +++++++++
  gcc/omp-low.c    |  5 -----
  gcc/varpool.c    |  1 -
  5 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 124c30b..6e6f4b8 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13527,10 +13527,7 @@ c_parser_oacc_declare (c_parser *parser)
  		    {
  		      g->have_offload = true;
  		      if (is_a <varpool_node *> (node))
-			{
-			  vec_safe_push (offload_vars, decl);
-			  node->force_output = 1;
-			}
+			vec_safe_push (offload_vars, decl);
  		    }
  		}
  	    }
@@ -16412,10 +16409,7 @@ c_parser_omp_declare_target (c_parser *parser)
  		{
  		  g->have_offload = true;
  		  if (is_a <varpool_node *> (node))
-		    {
-		      vec_safe_push (offload_vars, t);
-		      node->force_output = 1;
-		    }
+		    vec_safe_push (offload_vars, t);
  		}
  	    }
  	}
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a420cf1..340cc4a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -35091,10 +35091,7 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
  		    {
  		      g->have_offload = true;
  		      if (is_a <varpool_node *> (node))
-			{
-			  vec_safe_push (offload_vars, decl);
-			  node->force_output = 1;
-			}
+			vec_safe_push (offload_vars, decl);
  		    }
  		}
  	    }
@@ -35631,10 +35628,7 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
  		{
  		  g->have_offload = true;
  		  if (is_a <varpool_node *> (node))
-		    {
-		      vec_safe_push (offload_vars, t);
-		      node->force_output = 1;
-		    }
+		    vec_safe_push (offload_vars, t);
  		}
  	    }
  	}
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 62e5454..cdaee41 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1911,6 +1911,11 @@ input_offload_tables (void)
  	      tree fn_decl
  		= lto_file_decl_data_get_fn_decl (file_data, decl_index);
  	      vec_safe_push (offload_funcs, fn_decl);
+
+	      /* Prevent IPA from removing fn_decl as unreachable, since there
+		 may be no refs from the parent function to child_fn in offload
+		 LTO mode.  */
+	      cgraph_node::get (fn_decl)->mark_force_output ();
  	    }
  	  else if (tag == LTO_symtab_variable)
  	    {
@@ -1918,6 +1923,10 @@ input_offload_tables (void)
  	      tree var_decl
  		= lto_file_decl_data_get_var_decl (file_data, decl_index);
  	      vec_safe_push (offload_vars, var_decl);
+
+	      /* Prevent IPA from removing var_decl as unused, since there
+		 may be no refs to var_decl in offload LTO mode.  */
+	      varpool_node::get (var_decl)->force_output = 1;
  	    }
  	  else
  	    fatal_error (input_location,
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e1d7c09..a76d8fc 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12813,11 +12813,6 @@ expand_omp_target (struct omp_region *region)
  	assign_assembler_name_if_neeeded (child_fn);
        cgraph_edge::rebuild_edges ();

-      /* Prevent IPA from removing child_fn as unreachable, since there are no
-	 refs from the parent function to child_fn in offload LTO mode.  */
-      if (ENABLE_OFFLOADING)
-	cgraph_node::get (child_fn)->mark_force_output ();
-
        /* Some EH regions might become dead, see PR34608.  If
  	 pass_cleanup_cfg isn't the first pass to happen with the
  	 new child, these dead EH edges might cause problems.
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 5e4fcbf..5ed65e5 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -158,7 +158,6 @@ varpool_node::get_create (tree decl)
  	  g->have_offload = true;
  	  if (!in_lto_p)
  	    vec_safe_push (offload_vars, decl);
-	  node->force_output = 1;
  	}
      }






Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]