This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6]))


Hi!

On Fri, 16 Sep 2016 10:59:16 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
> On Fri, Sep 16, 2016 at 9:05 AM, Thomas Schwinge
> <thomas@codesourcery.com> wrote:
> > On Thu, 08 Sep 2016 13:43:30 +0200, I wrote:
> >> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
> >> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge <thomas@codesourcery.com> wrote:
> >> > > As I noted in <https://gcc.gnu.org/PR77458>:
> >> > >
> >> > >     As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx types", nvptx
> >> > >     offloading is broken, ICEs in LTO stream-in.  Probably some kind of data-type
> >> > >     mismatch that is not visible with Intel MIC offloading (using the same data
> >> > >     types) but explodes with nvptx.  I'm having a look.
> >
> >> > [...] preload_common_nodes.  This is carefully crafted to _not_ diverge by
> >> > frontend (!) it wasn't even designed to cope with global trees being present
> >> > or not dependent on target (well, because the target is always the
> >> > same! mind you!)
> >>
> >> Scary.  ;-/
> >>
> >> > Now -- in theory it should deal with NULLs just fine (resulting in
> >> > error_mark_node), but it can diverge when there are additional
> >> > compount types (like vectors, complex
> >> > or array or record types) whose element types are not in the set of
> >> > global trees.
> >> > The complex _FloatN types would be such a case given they appear before their
> >> > components.  That mixes up the ordering at least.
> >>
> >> ACK, but that's also an issue for "regular" float/complex float, which
> >> also is in "reverse" order.  But that's "fixed" by the recursion in
> >> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
> >> COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.
> >
> > So, that mechanism does work, but what's going wrong is the following:
> > with differing target vs. offload target, we potentially (and actually
> > do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and
> > _FloatNx types.  That is, for nvptx, a few of these don't exist (so,
> > NULL_TREE), and so it follows their complex variants also don't exist,
> > and thus the recursion that I just mentioned for complex types is no
> > longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we
> > get an offset (of two, in this specific case), and consequently streaming
> > explodes, for example, as soon as it hits a forward-reference (due to
> > looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when it
> > should be looking for tree 183 (nvptx lto1 view).
> >
> >> (I've
> >> put "fixed" in quotes -- doesn't that recursion mean that we're thus
> >> putting "complex float", "float", [...], "float" (again) into the cache?
> >> Anyway that's for later...)
> >
> > Maybe it would make sense to do this tree streaming in two passes: first
> > build a set of what we actually need, and then stream that, without
> > duplicates.  (Or, is also for these "pickled" trees their order relevant,
> > so that one tree may only refer to other earlier but not later ones?
> > Anyway, we could still remember the set of trees already streamed, and
> > avoid the double streaming I described?)
> >
> > So I now understand that due to the stream format, the integer tree IDs
> > (cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have to
> > make that work for x86_64 target vs. nvptx offload target being
> > different.  (I'm pondering some ideas about how to rework that integer
> > tree ID generation.)
> >
> > (I have not digested completely yet what the implications are for the
> > skipping we're doing for some trees in preload_common_nodes), but here is
> > a first patch that at least gets us rid of the immediate problem.  (I'll
> > fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to
> > gcc/tree-core.h, OK?)  Will such a patch be OK for trunk, at least for
> > now?
> 
> Humm ... do we anywhere compare to those global trees by pointer equivalence?

I have not verified that.  Does GCC permit/forbid that on a case-by-case
basis -- which seems very fragile to me?

> If so then it breaks LTO support for those types.

OK, I think I understand that -- but I do have a "lto_stream_offload_p"
conditional in my code changes, so these changes should only affect the
offloading stream, in my understanding?

> I think forcing proper ordering so that we can assert that at the
> point we'd like
> to recurse the nodes we recurse for are already in the cache would be a better
> fix.

ACK.  That's what I'd planned to look into as a next step.

> This might need some additional global trees in case components do not
> explicitely exist

That's what I was afraid of: for example, I can't tell if it holds for
all GCC configurations (back ends), that complex types' component types
will always match one of the already existing global trees?  (I can
certainly imagine some "strange" ISAs/data types breaking this assertion
-- no idea whether GCC is to support such things.)  The conservative fix
for that appears to be to add a new global tree for every complex types'
component type -- which in "sane" configurations/ISAs would always match
one of the already existing ones...

Or, of course, we need to change the tree cache ID format, so that
they're not allocated using a monotonically increasing step-one counter
-- explicitly allowing the cc1/lto1s (etc.) to differ?  Conceptually a
bit like (but differently) that we allow the mode table to differ,
streamer_mode_table.

> and it might need re-ordering of a few globals.
> Can you try if all hell breaks lose if you change the recursions to instead do
> 
>    gcc_assert (streamer_tree_cache_lookup (cache, <node-to-recurse-on>, &tem));
> 
> ?

Yes, ;-) because in certain cases when we reach this code,
cache->node_map is NULL, and so streamer_tree_cache_lookup can't be
called.  Anyway, something like the following?  (Likely, something needs
to be done to handle more effectively the "!cache->node_map" case,
instead of the linear search.)  But such a patch would again solve the
problem only for complex types, so far, and only if that complex
type/component type concern raised above is actually moot.  Anyway, with
the following "step II" patch, offloading appears to work (but I have not
tested the _FloatN, _FloatNx types), and no breakage in lto.exp either,
but only lightly tested so far.

commit cad8fb0754797927ecdd32210d5f981872635b1a
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Fri Sep 16 14:58:06 2016 +0200

    [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types, step II
---
 gcc/tree-core.h     | 31 +++++++++++++++++--------------
 gcc/tree-streamer.c | 23 ++++++++++++++++-------
 2 files changed, 33 insertions(+), 21 deletions(-)

diff --git gcc/tree-core.h gcc/tree-core.h
index 8b3e5cc..0d4653a 100644
--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -553,20 +553,6 @@ enum tree_index {
   TI_BOOLEAN_FALSE,
   TI_BOOLEAN_TRUE,
 
-  TI_COMPLEX_INTEGER_TYPE,
-  TI_COMPLEX_FLOAT_TYPE,
-  TI_COMPLEX_DOUBLE_TYPE,
-  TI_COMPLEX_LONG_DOUBLE_TYPE,
-
-  TI_COMPLEX_FLOAT16_TYPE,
-  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
-  TI_COMPLEX_FLOAT32_TYPE,
-  TI_COMPLEX_FLOAT64_TYPE,
-  TI_COMPLEX_FLOAT128_TYPE,
-  TI_COMPLEX_FLOAT32X_TYPE,
-  TI_COMPLEX_FLOAT64X_TYPE,
-  TI_COMPLEX_FLOAT128X_TYPE,
-
   TI_FLOAT_TYPE,
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
@@ -596,6 +582,23 @@ enum tree_index {
 			     - TI_FLOATN_NX_TYPE_FIRST		\
 			     + 1)
 
+  /* Put the complex types after their component types, so that in (sequential)
+     tree streaming we can assert that their component types have already been
+     handled (see tree-streamer.c:record_common_node).  */
+  TI_COMPLEX_INTEGER_TYPE,
+  TI_COMPLEX_FLOAT_TYPE,
+  TI_COMPLEX_DOUBLE_TYPE,
+  TI_COMPLEX_LONG_DOUBLE_TYPE,
+
+  TI_COMPLEX_FLOAT16_TYPE,
+  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
+  TI_COMPLEX_FLOAT32_TYPE,
+  TI_COMPLEX_FLOAT64_TYPE,
+  TI_COMPLEX_FLOAT128_TYPE,
+  TI_COMPLEX_FLOAT32X_TYPE,
+  TI_COMPLEX_FLOAT64X_TYPE,
+  TI_COMPLEX_FLOAT128X_TYPE,
+
   TI_FLOAT_PTR_TYPE,
   TI_DOUBLE_PTR_TYPE,
   TI_LONG_DOUBLE_PTR_TYPE,
diff --git gcc/tree-streamer.c gcc/tree-streamer.c
index 061d831..2a5538e 100644
--- gcc/tree-streamer.c
+++ gcc/tree-streamer.c
@@ -278,9 +278,23 @@ record_common_node (struct streamer_tree_cache_d *cache, tree node)
   streamer_tree_cache_append (cache, node, cache->nodes.length ());
 
   if (POINTER_TYPE_P (node)
-      || TREE_CODE (node) == COMPLEX_TYPE
       || TREE_CODE (node) == ARRAY_TYPE)
     record_common_node (cache, TREE_TYPE (node));
+  else if (TREE_CODE (node) == COMPLEX_TYPE)
+    {
+      /* Assert that complex types' component types have already been handled
+	 (and we thus don't need to recurse here).  See PR lto/77458.  */
+      if (cache->node_map)
+	gcc_assert (streamer_tree_cache_lookup (cache, TREE_TYPE (node), NULL));
+      else
+	{
+	  gcc_assert (cache->nodes.exists ());
+	  bool found = false;
+	  for (unsigned i = 0; !found && i < cache->nodes.length (); ++i)
+	    found = true;
+	  gcc_assert (found);
+	}
+    }
   else if (TREE_CODE (node) == RECORD_TYPE)
     {
       /* The FIELD_DECLs of structures should be shared, so that every
@@ -333,12 +347,7 @@ preload_common_nodes (struct streamer_tree_cache_d *cache)
 	&& (!lto_stream_offload_p
 	    || (i != TI_VA_LIST_TYPE
 		&& i != TI_VA_LIST_GPR_COUNTER_FIELD
-		&& i != TI_VA_LIST_FPR_COUNTER_FIELD
-		/* Likewise for the _FloatN and _FloatNx types.  */
-		&& (i < TI_FLOATN_NX_TYPE_FIRST
-		    || i > TI_FLOATN_NX_TYPE_LAST)
-		&& (i < TI_COMPLEX_FLOATN_NX_TYPE_FIRST
-		    || i > /* TODO */ TI_COMPLEX_FLOAT128X_TYPE))))
+		&& i != TI_VA_LIST_FPR_COUNTER_FIELD)))
       record_common_node (cache, global_trees[i]);
 }
 

Here is again the proposed temporary band-aid patch, which at least
unbreaks trunk for different-architecture offloading (without claiming
support for using the _FloatN, _FloatNx types across the offloading
boundary, admittedly):

> > commit d2045bd028104be2ede5ae1d5d7b9395e67a8180
> > Author: Thomas Schwinge <thomas@codesourcery.com>
> > Date:   Thu Sep 15 21:56:16 2016 +0200
> >
> >     [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types
> >
> >         gcc/
> >         PR lto/77458
> >         * tree-streamer.c (preload_common_nodes): Skip _FloatN and
> >         _FloatNx types if offloading.
> > ---
> >  gcc/tree-streamer.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git gcc/tree-streamer.c gcc/tree-streamer.c
> > index 7ea7096..061d831 100644
> > --- gcc/tree-streamer.c
> > +++ gcc/tree-streamer.c
> > @@ -333,7 +333,12 @@ preload_common_nodes (struct streamer_tree_cache_d *cache)
> >         && (!lto_stream_offload_p
> >             || (i != TI_VA_LIST_TYPE
> >                 && i != TI_VA_LIST_GPR_COUNTER_FIELD
> > -               && i != TI_VA_LIST_FPR_COUNTER_FIELD)))
> > +               && i != TI_VA_LIST_FPR_COUNTER_FIELD
> > +               /* Likewise for the _FloatN and _FloatNx types.  */
> > +               && (i < TI_FLOATN_NX_TYPE_FIRST
> > +                   || i > TI_FLOATN_NX_TYPE_LAST)
> > +               && (i < TI_COMPLEX_FLOATN_NX_TYPE_FIRST
> > +                   || i > /* TODO */ TI_COMPLEX_FLOAT128X_TYPE))))
> >        record_common_node (cache, global_trees[i]);
> >  }


Grüße
 Thomas


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]