PATCH: change splay_tree_key & splay_tree_value types to void*

Wed Aug 30 01:42:00 GMT 2000

Mark Mitchell <mark@codesourcery.com> writes:

> I'm glad to see you working on this bounded-pointer stuff.  I don't
> know if I've mentioned it before, but I did a lot of work on run-time
> error-checking tools at CenterLine.  I'm going to love using your
> work.

Yes, I first saw mention of your CenterLine roots a few weeks ago, and
look forward to more of your informed opinions.  I'm glad to hear
you're a supporter of this work, and am sure you'll continue to show
your support through rapid response to BP patches!  8^)

> I would suggest you *do* go with the permissive thing here, unless the
> user specifically switches on a maximum paranoia flag.

At this time, I'm rather uncomfortable about the whole notion of
compromising with the enemy of bad code.  See below.

> The biggest problem with our error-checkers was always that they
> found too many almost bugs -- even when we tried to make them
> really, really lenient.

What exactly do you mean by "almost bug"?  Do you mean questionable
programming practices that are legit for K&R, but deprecated for C89
and possibly outlawed for C99?  I call those real bugs.  While it is
true that you can persuade GCC to tolerate many such bugs and generate
an executable regardless, I still think they're real bugs that ought to
be fixed.  Moreover, I think that the promise of using BPs provides
added incentive to fix them.  IMO, the offending code should be fixed,
rather than BPs made more lenient.

Besides those attributable to K&Risms such as lack of prototypes and
pointer/int casts, what other varieties of "almost bugs" stand out in
your memory?  I'm particularly interested in things that might trip
bounds checking in otherwise clean code.  I'll work very hard to make
BPs flawless for clean code.  I don't want to work very hard (yet) to
dumb-down BPs so that they sort-of work for dirty code.

> For example, this K&R code:
> 
>   foo_p(s) { 
>     return strcmp (s, "foo") == 0;
>   }
> 
> will otherwise likely give a false positive.

Ugh!  I really don't want to work very hard to accommodate this
sort of code for BPs... at least not yet.
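
For contrast, here is a minimal sketch of how the quoted fragment
might look once it is prototyped.  The parameter type `const char *'
and the explicit `int' return type are assumptions; the K&R original
leaves both implicit, so `s' defaults to `int' there.

```c
#include <string.h>

/* Prototyped version of the K&R fragment quoted above.  Because the
   parameter is declared as a pointer, the argument never passes
   through an implicit int on its way to strcmp.  */
int
foo_p (const char *s)
{
  return strcmp (s, "foo") == 0;
}
```

With a declaration like this in scope, a caller's pointer argument
keeps its pointer type end to end, which is the property bounds
checking depends on.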

> Another strategy would be to use a special bounds token that says
> "integer converted to pointer" and interpret it strictly in most
> places, but leniently inside the string functions.  I'm actually
> serious about this -- ugly as it seems.  I think it might be the best
> strategy.  Special-casing certain library functions was definitely
> done in CenterLine's tools.

Since CenterLine was struggling to make a profit in a tough business,
I expect that it needed to accept challenges that I hope GCC+glibc can
avoid.
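
To make the quoted proposal concrete, here is a rough sketch of a
"special bounds token" with a strict check and a lenient check.
Everything in it is hypothetical: the `sketch_' names, the struct
layout, and the flag are illustrative assumptions, not the actual BP
representation in GCC nor anything from CenterLine's tools.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded pointer: a value plus the bounds of its object.  */
struct sketch_bp
{
  char *value;
  char *low;         /* first valid byte */
  char *high;        /* one past the last valid byte */
  int from_integer;  /* nonzero: the "integer converted to pointer" token */
};

/* Build a bounded pointer for an object of SIZE bytes at OBJ.  */
struct sketch_bp
sketch_bp_from_object (char *obj, size_t size)
{
  struct sketch_bp bp = { obj, obj, obj + size, 0 };
  return bp;
}

/* Build a pointer from an integer; its bounds are unknown.  */
struct sketch_bp
sketch_bp_from_int (uintptr_t n)
{
  struct sketch_bp bp = { (char *) n, NULL, NULL, 1 };
  return bp;
}

/* Strict policy: a token pointer always fails the check.  */
int
sketch_check_strict (struct sketch_bp bp, size_t n)
{
  if (bp.from_integer)
    return 0;
  return bp.value >= bp.low && bp.value <= bp.high
	 && n <= (size_t) (bp.high - bp.value);
}

/* Lenient policy (e.g. inside string functions): a token pointer is
   let through unchecked; real bounds are still enforced.  */
int
sketch_check_lenient (struct sketch_bp bp, size_t n)
{
  return bp.from_integer ? 1 : sketch_check_strict (bp, n);
}
```

The sketch shows the cost of the lenient path: any access made
through a token pointer is simply not checked at all.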

> It's hard to do source-assisted run-time error-checking in programs
> that aren't typesafe -- and programs that cast integers to pointers
> and back are included in that category.

Exactly.  I think of BPs as a testing strategy that is available for
programs that have already achieved a standard of type safety through
rigorous use of function prototypes and proper casting.  A program
that hasn't yet achieved some minimal standard of conformance to
modern C practice just isn't ready yet, and there's much that can be
accomplished with static analysis tools, such as GCC's warnings.

> For maximum win, our "pointer
> descriptors" (which contained not only bound information, but also
> type information) were recoverable -- if there was no descriptor
> available for a pointer, and you had certain switches on, the run-time
> system (which also controlled malloc, in order to do leak-detection,
> etc.) would go looking for a pointer descriptor by using the pointer's
> address.

Very cool in concept, but likely very slow at runtime and difficult to
implement, I expect.
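
As a rough illustration of where the runtime cost would come from,
here is a minimal sketch of recovering a descriptor from an address
alone.  The table, the `sketch_' names, and the linear scan are all
assumptions made for illustration; nothing here reflects how
CenterLine's run-time system was actually built.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_OBJECTS 64

/* Hypothetical descriptor: bounds only (no type information).  */
struct sketch_descriptor
{
  const char *base;  /* first byte of the object */
  size_t size;       /* object size in bytes */
};

static struct sketch_descriptor sketch_table[SKETCH_MAX_OBJECTS];
static size_t sketch_count;

/* Record an object, as an allocator hook might.  Returns 0 on
   success, -1 if the table is full.  */
int
sketch_register (const void *base, size_t size)
{
  if (sketch_count == SKETCH_MAX_OBJECTS)
    return -1;
  sketch_table[sketch_count].base = (const char *) base;
  sketch_table[sketch_count].size = size;
  sketch_count++;
  return 0;
}

/* Find the descriptor of the object containing P, or NULL if no
   registered object contains it.  The scan is linear in the number
   of registered objects.  */
const struct sketch_descriptor *
sketch_recover (const void *p)
{
  uintptr_t addr = (uintptr_t) p;
  size_t i;

  for (i = 0; i < sketch_count; i++)
    {
      uintptr_t lo = (uintptr_t) sketch_table[i].base;
      if (addr >= lo && addr - lo < sketch_table[i].size)
	return &sketch_table[i];
    }
  return NULL;
}
```

A real implementation would presumably need a faster index than a
linear scan, plus bookkeeping on every allocation and free, which is
where the speed and complexity concerns come from.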

> I guess my overall point is that I bet this problem will exist in user
> packages.  splay-tree.h isn't really part of GCC -- it's just a
> library.  If that library causes false positives then so will other
> libraries; I don't think splay-tree.h is that unique a beast.

No, not uniquely flawed, but neither is it uniquely unfixable.  I
proposed three fixes already, and submitted patches for two of them.
The patch with `#if __BOUNDED_POINTERS__' should be a good interim
solution that allows full checking of GCC.  A more involved fix that
involves a union and works with any architecture can come later, if
desired.  Just because C lets you get away with indiscriminate
pointer/int casts doesn't mean that I think we should tolerate such
practices.
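
The two approaches can be sketched roughly as follows.  This is not
the submitted patch: the `sketch_' names are hypothetical, and the
choice of `unsigned long' for the non-BP case is an assumption.  Only
the `__BOUNDED_POINTERS__' macro name comes from the patch described
above.

```c
/* Interim approach: choose the key type per build.  With BPs enabled
   the key is a real pointer, so its bounds travel with it.  */
#ifdef __BOUNDED_POINTERS__
typedef void *sketch_key;
#else
typedef unsigned long sketch_key;
#endif

/* Union approach: one slot that holds either a pointer or an
   integer, so neither is ever cast to the other.  The caller must
   read back the same member it stored.  */
union sketch_slot
{
  void *ptr;
  unsigned long num;
};

/* Store a pointer in a slot.  */
union sketch_slot
sketch_from_ptr (void *p)
{
  union sketch_slot s;
  s.ptr = p;
  return s;
}

/* Store an integer in a slot.  */
union sketch_slot
sketch_from_num (unsigned long n)
{
  union sketch_slot s;
  s.num = n;
  return s;
}
```

The union costs callers an explicit choice of member, but it removes
the dependence on pointers and integers having the same size.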

I ought to put this in the BP project page:

	BPs are a grownup tool for grownup programs.  Your program is
	grownup when it substantially passes static analysis checkers
	such as GCC with lots of warnings, or better yet, something
	like lint.  If it's not grownup, help it to grow by using
	static analysis until it is clean.  After that, the program
	can graduate to dynamic bounds checking.

	Once suitable optimizations are implemented in GCC, BPs
	promise to perform well enough (in time & space) for
	production environments, including embedded, and your
	discipline to make the program clean will be amply rewarded.

Harsh?  Doubtlessly to someone who must maintain a zillion lines of
cruft and lacks the time, skill or will to make the program clean
enough, and is praying for a magic-bullet tool that will make all the
crashes go away.

Now, for some perspective: this is my attitude today, while I struggle
to get basic functionality integrated into GCC, move on to the C++
front-end, implement necessary optimizations to bring overhead under
50%, etc...  Once all of that is done, and BPs are a reliable and fast
tool for grownup programs, I might be willing to soften and see what
can be done to make reasonable compromises to accommodate cruftier
code.  (Actually, come to think of it, I have already done more of
that than I wanted in order to pass the bulk of c-torture tests.  8^)

Honest question: do you think there can or should be a difference of
philosophy between community-driven GCC and profit-driven CenterLine
regarding accommodation for crufty code?  (i.e., economically, I
expect that CenterLine couldn't afford to look down its nose at
customers with a desperate need to debug large legacy codebases.
OTOH, I think GCC can afford to do that.  Then again, maybe CygHat
and/or CodeSourcery would like to earn some dollars giving the
customer what it craves? 8^)

Have a look at the BP project page section on packages tested
( http://gcc.gnu.org/projects/bp/main.html#testbpuse ).  These are all
grownup GNU packages that required little or no special treatment for
BPs, and BPs found some real bounds-violation bugs.  As one would
expect with heavily tested and widely deployed packages, the bugs were
inconsequential, mostly by luck: had they shown any user-visible
symptoms, they would have been diagnosed and fixed before BPs came
along.

Anyway, I'd like you to indulge me for now and let me pursue my
fantasy of BPs for grownups and see how far that gets us.  My ultimate
fantasy is to build and run a complete Linux distribution with BPs
wall-to-wall at the user-code level (*maybe* the kernel too, someday).
IMO, bringing all of GNU/Linux code up to strict BP correctness is a
worthier effort than adding a pile of tricky code and options to dumb
BPs down for crufty legacy systems.

Greg