Pointer & array bounds checking (was Re: Code Generation)

Greg McGary gkm@eng.ascend.com
Wed Jun 9 13:33:00 GMT 1999


Per Bothner <bothner@cygnus.com> writes:

> > Anyway, Purify for free as a side-effect of egcs would be very sweet.
> 
> Careful.  Purify has restrive (and enforced) patents.
> What we need is array bounds checking - without using the purify
> parented techniques.

Can someone tell us exactly what Purify has patented?  I vaguely
recall hearing that all of their patents apply to techniques for
instrumenting object code.  The original checker operated on assembler
code, and `gcc -fcheck-memory-usage' operates on C source code.

> The Checker implementation does not I believe violate any patents.
> However, I don't think it is the right approach.  It uses "fat
> pointers" (i.e. each pointer is represented by a two- or three-word
> set of pointers).

You are not talking about Checker.  You must be talking about what I
call "bounded pointers", or some similar bounds checking proposal.  I
have edited the remaining quotes from your message to use the phrase
"BOUNDS CHECKING" rather than "Checker".

> This breaks binary compatibility, which makes it very inconvenient.

There are satisfactory ways of preserving binary compatibility with
fat pointers, so they are not "very inconvenient".

> A better approach uses "thin" (normal) pointers in function parameters,
> results, globals, and structure fields, but uses "fat" pointers
> *internally* in a function.  Thin pointers are coerced to fat
> pointers as needed by looking them up in a run-time table.

IMO, this is not a better approach because (1) it introduces excessive
runtime overhead and (2) the binary compatibility problems might not
be significant, but when they are, they can be handled in other ways.

Regarding (1), the primary advantage to fat pointers is that they are
runtime efficient, and therefore usable in real environments.  Rather
than restricting fat pointers to be internal to functions only, it
would be much better to broaden the use of fat pointers to entire
subsystems, or entire applications--large collections of cooperating
code that are all fat-pointer compatible.  It is very wasteful for a
fat-pointer aware function to strip the pointer bounds of arguments
passed to another fat-pointer aware function, and force the called
function to do an expensive lookup operation in order to reconstitute
the fat pointer.

Regarding (2), binary compatibility is not significant if all code
(application, libraries, system-call interfaces) is compiled with BPs.
A very cool goal is to build a completely BP'ed GNU/Linux system.  In
cases where binary compatibility is important, BP'ed functions can
interface with non-BP'ed functions through automatically-generated
thunks.  All BP'ed functions can have their return-type and argument
signatures mangled into the function name.  At link time, if a
function calls a mangled name, but only the unmangled name is defined,
then we generate a thunk to translate from fat to thin pointer args
and from thin to fat pointer return value.  Conversely, if a function
calls an unmangled name but only the mangled name is defined, we
generate a thunk to do the reverse translation.  Handling global data
is somewhat tricker.  I've been thinking lately that if we can mangle
function names, then why not mangle struct and global data names as
well?  In that case, we won't translate between fat & thin pointers,
but rather fail to link if the definer and user disagree about the
boundedness of pointers.  It will then be up to the programmer to
qualify the decls of types and data that contain pointers that should
remain thin.

> This is not a efficient or clean or simple as fat pointers, but it
> preserves binary compatibility, without which BOUNDS CHECKING is
> little more than a toy.

OTOH, without runtime efficiency, bounds checking is a toy because you
can't use it in production systems that are latency and
throughput sensitive.  With optimizations to remove redundant checks,
I am confident that code compiled with bounded pointers can run with
only 10%-25% runtime overhead.  You should be able to run a completely
BP'ed system on today's hardware with the same performance as a
non-BP'ed system running on last year's hardware.

> If someone want to work on an improved binary-compatible BOUNDS CHECKING,
> I have some ideas.  I'd be happy in sharing these ideas - if there
> actually is a chance someone might actually take on such a project.

I am in the midst of negotiating with my primary consulting client
(Ascend, soon to be Lucent) a schedule for implementing fine-grained
bounds checking using bounded (a/k/k fat) pointers in egcs.  I'd be
happy to discuss this at any length.

Greg


More information about the Gcc mailing list