This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Check canonical types in verify_type

From: Richard Biener <rguenther at suse dot de>
To: Jan Hubicka <hubicka at ucw dot cz>
Cc: gcc-patches at gcc dot gnu dot org
Date: Fri, 22 May 2015 09:59:49 +0200 (CEST)
Subject: Re: Check canonical types in verify_type
Authentication-results: sourceware.org; auth=none
References: <20150518051512 dot GA63788 at kam dot mff dot cuni dot cz> <alpine dot LSU dot 2 dot 11 dot 1505180943190 dot 18702 at zhemvz dot fhfr dot qr> <20150518150328 dot GE63788 at kam dot mff dot cuni dot cz> <alpine dot LSU dot 2 dot 11 dot 1505191033440 dot 18702 at zhemvz dot fhfr dot qr> <20150521171113 dot GA8821 at kam dot mff dot cuni dot cz>

On Thu, 21 May 2015, Jan Hubicka wrote:

> > > Hmm, I see, interesting hack.  For the first part of comment, I see that
> > > qualifiers needs to be ignored, but I do not see why we put
> > > short * and int * pointers to same class.
> > 
> > For the reason that people are very lazy.  For example GCC has code
> > punning void ** and void * and void * to Type * (and vice versa).  People
> > just don't expect pointers to be in different alias sets (and there is
> > little gained with doing that).
> 
> I think this observation may be out of date.  Removing that code (punting all
> pointer to ptr_type_node) increase number of TBAA disambiguations for firefox
> by 20%. It also drops number of queries so in reality the number is higher.
> 
> So it seems that it may be worth to track this in more sensible way.  This was
> tested on LTO where most pointer alias sets are te same anyway, so the increase
> should be bigger in non-LTO build. I will gather some stats.
> 
> The issue is that modern C++ code packs everything into instances, puts pointers
> to instances everywhere and as wel inline everything into one glob, we could
> track down a lot of data if we did not get lost in pointers.
> 
> We may want to have flag disabling that and we may want to throw away some
> precision without throwing away all.
> 
> I tried to just trhow away the qualifier from pointer (i.e. make pointer to
> qualified type to be globbed to pointer to unqualified type). This seems to
> work in a way bootstrapping GCC with one bug (in ipa-icf) and passing
> testsuite.

Well, see the big comment before the alias.c pointer type handling.
This is what the C family frontends did before in its get_alias_set
langhook:

      t1 = build_type_no_quals (t);
      if (t1 != t)
        return get_alias_set (t1);

where build_type_no_quals strips qualifiers recursively from t and
pointed-to-types.

> > > The complete/incomplete type fun with LTO can be solved for C++ but indeed I
> > > see why pointer to incomplete type needs to be considered compatible with every
> > > other pointer to structure.  Can't this be dealt with by adding correct edges
> > > into the aliasing dag?
> > 
> > I don't think so, without the dag degenerating.  We've had this mess
> > before (complete vs. incomplete structs and so).  It didn't work very 
> > well.
> 
> Can you point me to the code/issues, please?

It was the old type merging code.  I can spot it in the 4.5 code,
gimple.c:gimple_types_compatible_p where in the POINTER_TYPE
case it does

            && compare_type_names_p (TYPE_MAIN_VARIANT (TREE_TYPE (t1)),
                                     TYPE_MAIN_VARIANT (TREE_TYPE (t2)), 
true))
          {
            /* Replace the pointed-to incomplete type with the
               complete one.
               ???  This simple name-based merging causes at least some
               of the ICEs in canonicalizing FIELD_DECLs during stmt
               read.  For example in GCC we have two different struct deps
               and we mismatch the use in struct cpp_reader in sched-int.h
               vs. mkdeps.c.  Of course the whole exercise is for TBAA
               with structs which contain pointers to incomplete types
               in one unit and to complete ones in another.  So we
               probably should merge these types only with more context.  
*/
            if (COMPLETE_TYPE_P (TREE_TYPE (t2)))
              TREE_TYPE (t1) = TREE_TYPE (t2);
            else
              TREE_TYPE (t2) = TREE_TYPE (t1);

where that of course wrecks the tree graph quite a bit (and it is
dependent on the order of TUs).  Luckily we now have SCC based
streaming and merging, not trying to "complete" types to get more
merging done.

> > 
> > I don't remember if I added a testcase, so no.  It's just very clear to me
> > that the look of pointer members may be very different (consider 
> > reference vs. pointer or pointer to array vs pointer to element).
> 
> Yep, depending on how much of cross-lanugage datastructures we want to permit.
> We could definitely unpeel those differences just like I did with qualifiers.
> However this trick is kind of weird because it does not propagate up to aggregates
> buildt from these.
> 
> In a way some of this sould probably better be handled at canonical type
> calculation time if we really want to permit mixing up aggregates with those
> differences.
> 
> For example I think it would make sense to look throught ARRAY_TYPEs when
> handling determining pointer canonical type (so pointer to array and pointer to
> element are the same).  We also may ignore TREE_CODE match for POINTER_TYPE_P
> if we really do have languages that mixes that.

Another case (I've seen this in fortran vs. C) is:

 int i;

vs.

 common/i/a(1)

thus inter-operability of int[1] and int.  A similar issue (with alias
sets used for accesses) is

struct X { int trailing[1]; };

and code doing

 struct X *p;
 int (*a[16]) = p->trailing;

and

 (*a)[i] vs. p->trailing[i]

(or in the middle-end with frontends that can emit array assignments).

> reference types are used by Ada and Fortran (not Java), so it depends on how
> these interface to C. I suppose you are right that these ought to be compatible
> with C pointers.  I will dig into respective language standards.

It's not so much about language standards but existing practice - few
language standards define interoperability and even if they do not
very many follow the well-defined ways (see fortran which has bind_c
but also tons of existing code that doesn't care and just adapts to
mangling used by various fortran compilers - see SPEC CPU2006 code)

> > 
> > > > > Other issue I run into is that for Ada bootstrap we have variadic type whose
> > > > > canonical types are having different temporary set as size.  I think this is
> > > > > valid and perhaps gimple_canonical_types_compatible_p should consider
> > > > > variadic arrays to be compatible with any array of compatible type?
> > > > 
> > > > Those are all local types and thus the strict equality compare should be
> > > > ok?  Not sure if we can do in C sth like
> > > > 
> > > > void foo (int n)
> > > > {
> > > >   struct { int i; int a[n]; } a;
> > > >   struct { int i; int a[n]; } b;
> > > > }
> > > > 
> > > > and if we'd have to assign the same alias-sets to 'a' and 'b'.
> > > 
> > > No idea here either. I wonder if the types are intedned to be TBAA compatible
> > > across two calls of the function.  In that case we may introduce multiple copies
> > > of body by early inlining already that may be a problem.
> > 
> > No idea.  Well, the canonical type machinery was supposed to be 
> > conservative but here we're obviously not 100% conservative.
> 
> I will look into producing a testcases and lets see.

I'd say that only aggregate assignments might be a problem (component
assignment should pun down to field alias-sets), and you should already
get GIMPLE verifier fails if we ever end up with aggregate assignments
where TYPE_CANONICAL doesn't match.

> > > > > I am not quite convinced we get variadic types right at LTO time, because
> > > > > they bypass canonical type calculation anyway and their canonical type
> > > > > is set by TYPE_MAIN_VARIANT in lto_read_body_or_constructor which I think
> > > > > is not safe.  I will look for a testcase.
> > > > 
> > > > That is because if they are streamed locally they do not enter type
> > > > merging, but they still go via gimple_register_canonical_type, so I'm not
> > > > sure where you see they always get their main variant as canonical type.
> > > 
> > > I tought these are handled by:
> > >       /* And fixup types we streamed locally.  */
> > >         {
> > >           struct streamer_tree_cache_d *cache = data_in->reader_cache;
> > >           unsigned len = cache->nodes.length ();
> > >           unsigned i;
> > >           for (i = len; i-- > from;)
> > >             {
> > >               tree t = streamer_tree_cache_get_tree (cache, i);
> > >               if (t == NULL_TREE)
> > >                 continue;
> > > 
> > >               if (TYPE_P (t))
> > >                 {
> > >                   gcc_assert (TYPE_CANONICAL (t) == NULL_TREE);
> > >                   TYPE_CANONICAL (t) = TYPE_MAIN_VARIANT (t);
> > >                   if (TYPE_MAIN_VARIANT (t) != t)
> > >                     {
> > >                       gcc_assert (TYPE_NEXT_VARIANT (t) == NULL_TREE);
> > >                       TYPE_NEXT_VARIANT (t)
> > >                         = TYPE_NEXT_VARIANT (TYPE_MAIN_VARIANT (t));
> > >                       TYPE_NEXT_VARIANT (TYPE_MAIN_VARIANT (t)) = t;
> > >                     }
> > >                 }
> > >             }
> > >         }
> > > 
> > > which does not do registering.
> > 
> > That's your code, not mine - but yes, it looks like local types are
> > streamed when we stream the trees refering to them.  And the canonical
> 
> No, it is not mine ;) I only added the code copying fields to variants
> and then reverted it per your request, so i suppose SVN blame me :).
>  Probably the code comes from original LTO branch.

;)

> > type hash is probably teared down at this point.  The code should
> 
> Yep, but we can also keep it and just add the types - it is incremental
> in its nature.

Sure.  But local types are usually variadic for which the canonical
merging is somewhat broken (as you noticed).

> > probably call lto_fixup_prevailing_type to also fix 
> > pointer-to/reference-to chains (and it will deal with main variants
> > as well).  IMHO the canonical type setting here is ok (during inlining
> > when we copy variable-length types we should end up sharing TYPE_CANONICAL
> > IMHO).  AFAICS we do, via copy_node which doesn't reset TYPE_CANONICAL.
> 
> Yep, I think canonical types of arrays are merged by C FE if their TREE_TYPE
> and TYPE_DOMAIN match and then they are just copied by inliner.
> So variable length arrays from different copies of an inline functions are
> considered compatible without LTO, while with LTO it depends on early
> inliing... 

Hmm, true.

> I am also bit worried about canonical types for C++ where these are
> not always defined by their TYPE_MAIN_VARIANTs. I will look into this
> in detail.

Yeah - but your type verifier should now catch these, correct?

Richard.

References:
- Check canonical types in verify_type
  - From: Jan Hubicka
- Re: Check canonical types in verify_type
  - From: Richard Biener
- Re: Check canonical types in verify_type
  - From: Jan Hubicka
- Re: Check canonical types in verify_type
  - From: Richard Biener
- Re: Check canonical types in verify_type
  - From: Jan Hubicka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]