Patch for c/13801

Zack Weinberg <zack@codesourcery.com>
Tue Aug 3 02:55:00 GMT 2004


"Joseph S. Myers" <jsm@polyomino.org.uk> writes:

> On Mon, 2 Aug 2004, Zack Weinberg wrote:
>
>> You cannot do it this way.  At end of compilation, the decls must have
>> types which reflect all the details we have amassed from all
>> declarations seen -- possibly across multiple translation units.
>
> The decls in the external scope do - the type stored there is the
> composite of all the types seen, so the decls get that type when we
> pop back to the external scope.

Consider 

file1.c {
  int n[5] = { 1, 2, 3, 4, 5 };
}
file2.c {
  extern int n[];

  int foo() { return sizeof(n); }
}

As is, we compile foo without complaint if file1.c appears first on
the command line, for exactly the same reason that we get the nested-
refinement case wrong.  And I find this testcase to be a far more
persuasive argument that the bug needs fixing, because - unlike the
single-file case - people actually do make the above mistake in real
code.  (Show me *one* non-testsuite example of a nested refinement and
I'll change my tune.)
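
For comparison - and this is my own sketch, not something from the
testsuite or from Joseph's patch - the single-translation-unit
analogue of the above already gets rejected, and the IMA case ought to
behave the same way:

  extern int n[];
  int foo(void) { return sizeof(n); }  /* error: sizeof applied to an
                                           incomplete type */
  int n[5] = { 1, 2, 3, 4, 5 };

At the point of the sizeof, n still has incomplete type; the
definition further down does not retroactively complete it.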

So you gotta handle IMA (inter-module analysis).  What does your patch
do with the above?  I can certainly see how one might get this right in
the framework of your patch, but it's not clear to me that you
considered this.

> I do not in fact expect problems arising from issues with static
> declarations with types refined in inner scopes, but if such arise
> (and the testsuite showed none)
...
> Nested block scope declarations can't have initializers, so if there are
> problems there then we should discard the initializer found at block
> scope.
>
> In the case of an incomplete array that, being still incomplete at
> the end of the translation unit, gets implicitly completed to one
> element: if the type is instead completed incompatibly in a block (to
> a size bigger than 1), then the code has compile-time undefined
> behavior (of which diagnosis would be nice).

I don't like these "should"s and "would be nice"s.  Not encountering
any problems in the test suite is probably an indicator that none of
the above scenarios occur in the test suite - I would recommend you
write the code to generate the appropriate diagnostics, and then write
additional test cases to make sure they all trigger and don't crash
afterward.  Note that problems in this domain have historically shown
up in the debug info generators - test all of the possible debug
formats before saying there are no bugs.
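
To be concrete about the kind of test I mean - the following is only a
sketch of the shape, and since the diagnostic does not exist yet the
comment marks where the dg-warning or dg-error directive would go once
its wording is settled:

  /* { dg-do compile } */
  /* File-scope tentative definition with incomplete array type,
     completed incompatibly (relative to the implicit one-element
     completion) by a block-scope declaration.  */

  int a[];                 /* tentative definition, incomplete type */

  void
  f (void)
  {
    extern int a[5];       /* completes the type in an inner scope;
                              the dg marker goes here */
  }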

There's at least one scenario you didn't consider, which is where a
variable has incomplete type because it's declared using a struct tag
that has only been defined in an inner scope, e.g.

  struct foo;
  function() { struct foo { int a, b; }; }
  struct foo array[5];

Currently we get this right but only by accident, and I suspect
further that IMA nightmares are lurking in this area (the struct-tag
handling code in c-decl.c is mostly untouched by my rewrite).

[... leaving aside things addressed above ...]

> typedef int IA[];
> typedef int A5[5];
> typedef int A10[10];
>
> A10 array10;
>
> A5 *ap;
> void
> f (void)
> {
>   int ap;
>   {
>     extern IA *ap;
>     ap = &array10;
>   }
> }

> (Whether this is valid isn't entirely clear, but there's a good case
> that it is; certainly the pointers to differently sized arrays must
> have the same representation and alignment requirements.  There's
> even a plausible argument that it is strictly conforming where the
> equivalent code with a pointer cast instead of the shadowing
> declaration wouldn't be, because so little about pointer casts is
> defined in ISO C.)

Even if it is arguably valid, I consider this code sufficiently
dubious style to warrant a diagnostic.  Furthermore I think it's
another example of code that we needn't worry too much about because
no one ever writes code like that in real life.
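
(For reference, the cast-based equivalent Joseph alludes to would look
roughly like this - my reconstruction, reusing his typedefs:

  A5 *ap;
  void
  f (void)
  {
    ap = (A5 *) &array10;  /* int (*)[10] converted to int (*)[5] by
                              an explicit cast */
  }

and it's that conversion about which, as he says, so little is
defined.)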

> Unmodified compiler, timings of three runs:
>
> user    3m59.300s
> sys     0m4.500s
>
> user    3m57.290s
> sys     0m4.360s
>
> user    3m56.600s
> sys     0m4.570s
>
> Modified compiler, timings of three runs:
>
> user    3m51.380s
> sys     0m4.500s
>
> user    3m51.130s
> sys     0m4.690s
>
> user    3m51.620s
> sys     0m4.220s
>
> Both configured, installed and tested the same way, the only variable
> changed being the patch applied.

So I ran the numbers (paired two-tailed t-test) and indeed I get about
2% probability of its being a chance effect.  Fine.  BUT YOU SHOULD
HAVE DONE THAT.  I do not appreciate having to do the analysis for
you.  (If you did do this analysis, I apologize for shouting - but why
the hell didn't you say so the first time?)
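
For the record, here is roughly what that calculation looks like - a
sketch only, pairing the runs in the order listed, and reading the
p-value off a t-table for 2 degrees of freedom rather than computing
it:

  #include <stdio.h>
  #include <math.h>

  int
  main (void)
  {
    /* user times in seconds: unmodified vs. modified compiler */
    double before[3] = { 239.300, 237.290, 236.600 };
    double after[3]  = { 231.380, 231.130, 231.620 };
    double d[3], mean = 0.0, var = 0.0;
    int i, n = 3;

    for (i = 0; i < n; i++)
      mean += (d[i] = before[i] - after[i]);
    mean /= n;
    for (i = 0; i < n; i++)
      var += (d[i] - mean) * (d[i] - mean);
    var /= n - 1;

    /* t comes out around 7.4; with 2 degrees of freedom the two-tailed
       p-value is roughly 0.02 - hence the ~2% figure above.  */
    printf ("t = %.2f\n", mean / sqrt (var / n));
    return 0;
  }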

More to the point, with an unanticipated effect like this, I expect to
see some investigation of why.  On a 32-bit system you increased the
size of the structure from 20 to 24 bytes, and there's no special GC
bucket for 20-byte structures so they're already going into the
24-byte bucket.  So this isn't a cache locality effect.  And your code
means c-decl.c is doing strictly more work.  So we have no
explanation, and it's your job to come up with one.

I will add that I had a patch in mind that would have cut the size of
the structure down to 16 bytes, and I would prefer a solution that
doesn't interfere with that.

> It's not just a matter of the decl having incomplete type, any type
> nested within it might be incomplete.  See the testcase examples of
> pointers to incomplete types and functions returning such pointers.
> It can also apply to function parameter types (and so to whether
> function calls with pointer to array arguments get diagnosed,
> similar to the example above of valid code).  Effectively, the type
> of a decl at all points in compiling and constraint checking should
> be the type the standard says it has at that point in the source,
> rather than duplicating checks for special cases everywhere, just as
> the proper solution to bit-field problems was to ensure that
> bit-fields have the proper types everywhere in the compiler and fix
> the fallout from such types as it was found.
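
To make the parameter-type point concrete (my example, not one of
Joseph's): whether the call below draws a diagnostic depends entirely
on which type the decl has at the call site:

  void take5 (int (*p)[5]);

  extern int a[];          /* incomplete here: int (*)[] is compatible
                              with int (*)[5], so no diagnostic is
                              required for the call */
  void
  f (void)
  {
    take5 (&a);
  }

  int a[10];               /* but if the completed composite type were
                              used at the call instead, int (*)[10]
                              vs. int (*)[5] would be a constraint
                              violation */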

I don't disagree with any of this in principle - but I do continue to
say that you can't willy-nilly reduce the amount of information
carried by the type of a decl, even if you are going to put it back
before end of file.  Not unless you are prepared to decouple the C
front end completely from the next stage in the pipeline - and by that
I mean the next stage gets handed one pointer to a data structure
representing the entire compilation unit, after the front end is
completely done.  We are closer to that than we used to be, but we are
still nowhere near it.  As long as the language-independent compiler
is handed decl nodes before the front end is done, your approach is
not safe.

You may be wondering why I'm being so hard on you and this patch, and
the reason is I spent a year and a half of my life coming up with what
I consider to be the least-bad implementation of C declaration
handling that's feasible in the present state of the compiler, and I
still get flak because it's not perfect.  And anything that goes wrong
with it I will get the blame for.  So I'm going to be very, very picky
when it comes to further changes to that code.

zw


