This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: RFC/RFHelp: c-decl.c rewrite - almost but not quite

From: Geoff Keating <geoffk at geoffk dot org>
To: Zack Weinberg <zack at codesourcery dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: 31 Mar 2004 18:12:03 -0800
Subject: Re: RFC/RFHelp: c-decl.c rewrite - almost but not quite
References: <1079475817.11024.ezmlm@gcc.gnu.org><47BA338C-77B1-11D8-B0EB-000A957D89DA@apple.com><87r7vrye8c.fsf@egil.codesourcery.com><2492B048-784F-11D8-AF30-000A957D89DA@apple.com><87r7vqx5b2.fsf@codesourcery.com>

Zack Weinberg <zack@codesourcery.com> writes:

> Robert Bowdidge <bowdidge@apple.com> writes:
> 
> > On Mar 17, 2004, at 2:12 AM, Zack Weinberg wrote:
> >> Well, since this is (hopefully) the final data structure, I intended
> >> that the comments in c-decl.c should explain it adequately.  Can you
> >> look at those and tell me where they're inadequate?
> >
> > What I'm hoping for is a summary paragraph saying what these 4000
> > lines of changes mean at a high level -- what were you changing, and
> > why -- just so I have some hope of asking the right questions about
> > the patch.
> 
> Okay.  I *was* hoping this was clear from the file itself ... at
> least, the what of the new, and why the various features are
> necessary.  I'll try to go over it here, and hopefully we can converge
> on some better commentary.
> 
> > As far as I can tell, it appears that you've added the c_binding
> > structure to the name lookup and scoping code.  So how do the
> > c_binding structure, c_scope structure, and lookup code interact?
> > From the bits of removed code, it looks like struct c_scope still
> > exists, but that most of the lookup questions get done on the
> > c_binding structure rather than c_scope structure, and this changed a
> > lot of the commonly-called functions.   It also looks like several
> > functions were renamed (pushlevel -> push_scope, poplevel->pop_scope).
> > Any particular reason for this?  Are you trying to make the C and C++
> > code more analogous?  Is struct c_binding intended to resemble struct
> > cxx_binding?
> 
> First the metaobservations: I didn't code this with any reference to
> the C++ front end, because C++ has different and much more complicated
> rules.  Despite that, I may have ended up with a similar data
> structure, but it wasn't intentional.  The nomenclature changes are
> because I wanted to use the same terminology that the C standard does
> (i.e. "scope" not "binding level"); also, it guaranteed that I had
> flushed out all callers of the old pushlevel/poplevel.
> 
> The new data structure works like this.  In C, an identifier can have
> meaning in three different namespaces simultaneously: the namespace of
> symbols, the namespace of tags, and the namespace of labels.
> (Ignoring reserved words and macros, which are handled elsewhere.)  So
> there are three pointers in struct lang_identifier.  Formerly they
> pointed to DECLs; now they point to c_binding structures.  This is the
> core change.  The reason for it is, a DECL node only has one
> TREE_CHAIN pointer.  So if you need to keep track of the visibility of
> a DECL in multiple different scopes simultaneously, which we do, you
> have a choice of copying the entire DECL, or using a data structure
> external to the DECL to do the job.  The old code took the copying
> approach; this was bad because it violated the basic assumption made
> elsewhere in the compiler that there is exactly one DECL node for each
> assembly-level symbol.  Hence all the bugs.

I am a little bit concerned about part of this approach.  It's
possible to have distinct declarations for the same object in the same
translation unit that are not the same (and so cannot be represented
by the same DECL in the front-end).  In the backend, you can collapse
them, but in the front-end you'll need to keep them distinct so that
you know when to issue diagnostics (and possibly for correctness in
other ways, I would have to think about that).

You might think this would be horrible, and *it is*, but it's valid C.

For instance:

int foo_A (void) {
  struct baz;
  struct bar {
    struct baz *x;
  };
  extern struct bar *foo;
  return foo->x == 0;
}

int foo_B (void) {
  struct baz {
     int y;
  };
  struct bar {
    struct baz *x;
  };
  extern struct bar *foo;
  return foo->x->y + 1 + foo_A();
}

And if you add to the above this:

int foo_C (void) {
  struct baz;
  struct bar {
    struct baz *x;
  };
  extern struct bar *foo;
  return foo->x->y + 1 + foo_A();   // diagnostic required here
}

you need to have a diagnostic because although all the 'struct baz'
declarations are compatible with the type of the actual variable, they
are not the same in this file and in foo_C, 'struct baz' is incomplete
and does not have a 'y' field.  This works with gcc 3.3, and doesn't
work with HEAD; I get:

/Network/Servers/cauchy/homes/thorin/gkeating/t.c: In function `foo_B':
/Network/Servers/cauchy/homes/thorin/gkeating/t.c:17: error: conflicting types for 'foo'
/Network/Servers/cauchy/homes/thorin/gkeating/t.c:6: error: previous declaration of 'foo' was here

Note that the C standard in 6.7 only says:

  All declarations in the same scope that refer to the same object or
  function shall specify compatible types.

These declarations are not in the same scope, so this doesn't apply.


Arguably, with -fno-strict-aliasing, it should be possible to even do:

void bat_A (void) {
  extern long long x;
  x = 1;
}
double bat_B (void) {
  extern double x;
  return x;
}

although that's too horrible for even me to support, and so I don't
care if we drop it in 3.5.  (Works in 3.3, though!)

> > You also mention that "the new data structure is somewhat more memory
> > intensive than strictly necessary."  Ok, which data structure do you
> > mean (c_binding?), and why do you think it's more memory intensive?
> > Why might this be bad?  Why was it necessary?
> 
> The combination of struct c_binding and IDENTIFIER_NODE is what I was
> thinking of.  We have to allocate something like 40 bytes to represent
> an identifier bound to a single DECL, which is by far the most common
> case.  I have some ideas for how to improve this - e.g. using just a
> single chain of bindings, and you have to look for the one you want -
> but I am not sure they're actually performance wins.

I am also concerned about the performance impact.  Allocating and
filling in memory is known to be a performance problem in GCC: GCC does
too much of it.  Did you do any performance measurements of this
patch?

> How's that?

That was a very helpful description!  Thanks.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

Follow-Ups:
- Re: RFC/RFHelp: c-decl.c rewrite - almost but not quite
  - From: Zack Weinberg

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]