PATCH: change splay_tree_key & splay_tree_value types to void*

Thu Aug 31 10:28:00 GMT 2000

>>>>> "Greg" == Greg McGary <greg@mcgary.org> writes:

    Greg> weeks ago, and look forward to more of your informed
    Greg> opinions.  I'm glad to hear you're a supporter of this work,
    Greg> and am sure you'll continue to show your support through rapid
    Greg> response to BP patches!  8^)

:-)  Not likely in the short term.  I'm very busy over the next few
weeks, but CodeSourcery has scheduled me to work on GCC a lot after
that point.

    >> I would suggest you *do* go with the permissive thing here,
    >> unless the user specifically switches on a maximum paranoia
    >> flag.

    Greg> What exactly do you mean by "almost bug"?  Do you mean a

I mean, basically, anything that is not likely to actually cause bad
output or bad behavior in the program.  Many technically invalid
practices don't actually cause trouble on most systems; casting from
`int' to `void *' is not portable, but harmless on many 32-bit
systems.  Only some users will care about this -- many will not.
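
To make the portability point concrete, here is a minimal sketch of
storing an integer in a pointer-sized slot.  The helper names
`key_from_int' and `int_from_key' are hypothetical (they are not from
splay-tree.h), and going through `intptr_t' is just one common way to
avoid a width mismatch where that type is available; the result of the
integer-to-pointer conversion is still implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  */

/* Widen the int to intptr_t first so the conversion to a pointer
   never involves two types of different size.  */
void *key_from_int(int n)
{
    return (void *) (intptr_t) n;
}

/* Convert back, checking that the stored value fits in an int.  */
int int_from_key(void *key)
{
    intptr_t v = (intptr_t) key;
    assert(v >= INT_MIN && v <= INT_MAX);
    return (int) v;
}
```

A pointer produced this way has no object behind it, so a checker has
no bounds to attach to it.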

    Greg> Besides those attributable to K&Risms such as lack of
    Greg> prototypes and pointer/int casts, what other varieties of
    Greg> "almost bugs" stand out in your memory?  

Another one was trailing arrays:

  struct S { int i [1]; };

It's not strictly allowed by C/C++ to allocate a bigger object and
index through `i' with indices of 1 or more, but the idiom is common.
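
A minimal sketch of the trailing-array idiom, using the `struct S'
above.  `make_s' and `fill_and_sum' are hypothetical names chosen for
the example.  A checker that takes the declared bound of `i' literally
would flag every access past the first element, even though the
allocation has room for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct S { int i[1]; };

/* Allocate room for `n' ints behind a struct declared with one.  */
struct S *make_s(size_t n)
{
    assert(n >= 1);
    return malloc(offsetof(struct S, i) + n * sizeof(int));
}

/* Store 0..n-1 into the over-allocated array and sum it back out.  */
int fill_and_sum(struct S *s, size_t n)
{
    int sum = 0;
    for (size_t k = 0; k < n; k++)
        s->i[k] = (int) k;
    for (size_t k = 0; k < n; k++)
        sum += s->i[k];
    return sum;
}
```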

  struct S { char c[3]; char d[4]; };

If you have a pointer into `c' you can't use it to index into `d' --
but people can, and do, and sometimes that's OK.  

    >> It's hard to do source-assisted run-time error-checking in
    >> programs that aren't typesafe -- and programs that cast
    >> integers to pointers and back are included in that category.

    Greg> Exactly.  I think of BPs as a testing strategy that is
    Greg> available for programs that have already achieved a standard
    Greg> of type safety through rigorous use of function prototypes
    Greg> and proper casting. 

Most programs don't exist in a vacuum.  One of the key things is working
with uninstrumented code, i.e., libraries that are not compiled with
BP enabled.  So, the `char* -> int -> char*' example I gave above is
just shorthand for a call from uninstrumented code into an
instrumented function:

     // Uninstrumented code (compiled without BPs)
     extern void f(char *);
     void g(char *s) { f(s); }
     --
     // Instrumented code (compiled with BPs)
     #include <string.h>
     void f(char *s) { strcmp (s, s); }

This will again lead to a pointer that doesn't have known bounds.

    >> For maximum win, our "pointer descriptors" (which contained not
    >> only bound information, but also type information) were
    >> recoverable -- if there was no descriptor available for a
    >> pointer, and you had certain switches on, the run-time system
    >> (which also controlled malloc, in order to do leak-detection,
    >> etc.) would go looking for a pointer descriptor by using the
    >> pointer's address.

    Greg> Very cool in concept, but likely very slow at runtime and
    Greg> difficult to implement, I expect.

Actually not -- pointer descriptors were stored in a hash table, and
we only looked them up when we lost the bounds.  It took something
like 50 cycles to look them up, I think.
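
As a rough sketch of the idea (not the original implementation), a
descriptor table keyed by address might look like the following.  The
names `descriptor_register' and `descriptor_lookup', the table size,
and the hash are all assumptions made for the example, and it only
finds pointers to the start of an object; interior pointers would need
a range lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024   /* power of two; arbitrary for the sketch */

struct descriptor {
    const void *base;     /* start of the object, NULL if slot is empty */
    size_t size;          /* extent in bytes */
};

static struct descriptor table[TABLE_SIZE];

static size_t slot_for(const void *p)
{
    /* Drop low bits that are usually zero because of alignment.  */
    return (size_t) (((uintptr_t) p >> 4) & (TABLE_SIZE - 1));
}

/* Record bounds for an object; returns 0 if the table is full.  */
int descriptor_register(const void *base, size_t size)
{
    assert(base != NULL);
    size_t start = slot_for(base);
    for (size_t probe = 0; probe < TABLE_SIZE; probe++) {
        struct descriptor *d = &table[(start + probe) & (TABLE_SIZE - 1)];
        if (d->base == NULL || d->base == base) {
            d->base = base;
            d->size = size;
            return 1;
        }
    }
    return 0;
}

/* Recover bounds from the address alone; NULL if nothing is known.  */
const struct descriptor *descriptor_lookup(const void *base)
{
    size_t start = slot_for(base);
    for (size_t probe = 0; probe < TABLE_SIZE; probe++) {
        const struct descriptor *d = &table[(start + probe) & (TABLE_SIZE - 1)];
        if (d->base == NULL)
            return NULL;          /* hit an empty slot: not registered */
        if (d->base == base)
            return d;
    }
    return NULL;
}
```

The lookup only runs when the bounds have been lost, so its cost is
paid on the uncommon path.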

    >> I guess my overall point is that I bet this problem will exist
    >> in user packages.  splay-tree.h isn't really part of GCC --
    >> it's just a library.  If that library causes false positives
    >> then so will other libraries; I don't think splay-tree.h is
    >> that unique a beast.

    Greg> No, not uniquely flawed, but neither is it uniquely
    Greg> unfixable.  I proposed three fixes already, and submitted

There's nothing technically wrong with your patches -- but I think
you're not quite approaching things correctly.  The point is that
splay-tree.h isn't broken -- it just makes checking harder.

My point is that BPs should deal with this situation gracefully.  Even
very clean code will make use of libraries and other code that are not
necessarily as clean.  BPs should be lenient in this case -- or at the
very least have a mode in which they are lenient, and my opinion is
that the lenient mode should be the default.

Note that I think `char*' could be handled differently from `struct X
*'.  A pointer of type char * with unknown bounds probably points to a
string; a `struct X *' with unknown bounds probably points to a single
struct X.  (Here `probably' means statistically speaking.)
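
A minimal sketch of that per-type guess.  The function names and the
stand-in `struct X' are hypothetical, and the guesses are heuristics
only: calling `strlen' on a `char *' that is not actually a string can
itself run off the end of the object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct X { int a; double b; };   /* stand-in for any struct type */

/* Guess for a `char *' with unknown bounds: a NUL-terminated string,
   so the usable extent is the string plus its terminator.  */
size_t guess_extent_char(const char *p)
{
    assert(p != NULL);
    return strlen(p) + 1;
}

/* Guess for a `struct X *' with unknown bounds: exactly one object.  */
size_t guess_extent_struct_x(const struct X *p)
{
    assert(p != NULL);
    return sizeof *p;
}
```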

    Greg> Honest question: do you think there can or should be a
    Greg> difference of philosophy between community-driven GCC and
    Greg> profit-driven Centerline regarding accommodation for crufty
    Greg> code?  (i.e., Economically, I expect that Centerline
    Greg> couldn't afford to look down its nose at customers with
    Greg> desperate need to debug large legacy codebases.  OTOH, I think
    Greg> GCC can afford to do that.  OTOH, maybe CygHat and/or
    Greg> Codesourcery would like to earn some dollars giving the
    Greg> customer what it craves? 8^)

:-) 

I really don't think what you suggest is true.  I think that even new
code relies on old crufty code, and I think that new C/C++ code will
always play some non-portable platform-dependent tricks.

To some extent, because our error-checkers did type-related checks as
well as bounds-related checks, they were inherently noisier.  On the
other hand, I honestly think that the right way to build an
error-checker is with a default mode that is lenient; users then
gradually crank up the strictness to get to a less lenient mode as
they fix the bugs found by the first mode.  (This is like compiler
warning options: you might not start with -Wall if you had never
compiled the code with GCC before.)  

    Greg> Anyway, I'd like you to indulge me for now and let me pursue
    Greg> my fantasy of BPs for grownups and see how far that gets us.
    Greg> My ultimate fantasy is to build and run a complete Linux
    Greg> distribution with BPs wall-to-wall at the user-code level
    Greg> (*maybe* the kernel too, someday).  IMO, bringing all of
    Greg> GNU/Linux code up to strict BP correctness is a worthier effort
    Greg> than adding a pile of tricky code and options to dumb BPs
    Greg> down for crufty legacy systems.

I think you should do both.  I'm trying to emphasize how important it
is to have the leniency.  

Basically, when you see a pointer you don't know anything about
(because bounds information has been lost), I think you should be
optimistic, not pessimistic.
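
A minimal sketch of what "optimistic by default" could mean in a
check routine.  The `bounded_ptr' layout, the `bp_mode' values, and
`bp_check' are hypothetical names for the example and are not taken
from the BP patches.

```c
#include <assert.h>
#include <stddef.h>

enum bp_mode { BP_LENIENT, BP_STRICT };   /* hypothetical modes */

struct bounded_ptr {
    const char *value;
    const char *low;    /* NULL low/high means the bounds were lost */
    const char *high;   /* one past the last valid byte */
};

/* Returns 1 if an access of `len' bytes at p.value should be allowed.  */
int bp_check(struct bounded_ptr p, size_t len, enum bp_mode mode)
{
    assert(p.value != NULL);
    if (p.low == NULL || p.high == NULL)
        return mode == BP_LENIENT;   /* optimistic unless told otherwise */
    return p.value >= p.low
        && p.value <= p.high
        && len <= (size_t) (p.high - p.value);
}
```

With known bounds the two modes behave identically; they differ only
for pointers whose bounds have been lost.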

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com