This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: bounds checking

To: law at cygnus dot com
Subject: Re: bounds checking
From: Greg McGary <gkm at eng dot ascend dot com>
Date: 15 May 1998 23:53:15 -0700
Cc: Haj dot Ten dot Brugge at net dot HCC dot nl (Herman ten Brugge), Greg McGary <gkm at eng dot ascend dot com>, wilson at cygnus dot com, egcs at cygnus dot com, d dot love at dl dot ac dot uk, drepper at cygnus dot com, bothner at cygnus dot com
References: <5876.895296813@hurl.cygnus.com>

Jeffrey A Law <law@hurl.cygnus.com> writes:

> Haj.Ten.Brugge@net.HCC.nl (Herman ten Brugge) writes:
> > Richard Jones and I want to integrate the bounds-checking patch into the
> > official gcc release.

> It would be interesting to know how your work differs from Greg's
> since he's done a bounded pointer implementation.
> 
> If they don't overlap in functionality, then we can/should consider
> the patches separately.
> 
> If they do overlap, then we'll need to evaluate both and select
> one for inclusion.  

FYI, in the conclusion of Richard W.M. Jones's (RWMJ) report on his
bounds-checking work, he speculated about alternative bounds checking
implementations, and suggested that bounded pointers would perform
best of all--that's what initially inspired me to pursue BPs in gcc.
None of the problems with BPs that RWMJ cited in his report turned out
to be significant:

1) GCC likes pointers to fit into registers:
   Multi-word BPs are handled in a way analogous to complex numbers.
   The three words of a BP can be assigned to registers independently.	

2) The size of static objects is not always known at compile time:
   GCC generates references to the extent of a variable as foo.extent.
   GCC generates definitions of foo.extent at the time it
   compiles foo, or if foo is in an unchecked library, ld (or collect)
   can synthesize the foo.extent symbol.

3) These pointers are incompatible with code compiled by other compilers ...
   and with code compiled by GCC with bounds checking switched off:
   BPs can be selectively disabled for declarations that represent
   unchecked code.  BPs have their bounds automatically stripped when
   passed as an unchecked function arg, or assigned to an unchecked
   pointer variable.

Here are the differences between BPs and RWMJ's gcc-2.7.2 based
implementation, as of a year and a half ago.  (I don't know what new
work RWMJ might have done since then):

* BPs only check prior to pointer dereference & array reference;
  RWMJ's stuff checks pointer deref, array ref, as well as all pointer
  and array index arithmetic.

  Since BPs always know the bounds of their referent object and never
  need to do a table lookup, checking at dereference time is
  sufficient.  Checking arithmetic is unnecessary.

* BPs change the size of pointers to occupy three-words (value base &
  extent), so all information needed to check a pointer travels with
  the pointer so object descriptor tables need never be used.

  RWJM's pointers are of conventional size, however in order to check
  arithmetic, the pointer must be resolved to an object descriptor,
  requiring a table lookup.  While more efficient, BPs can impact
  external interfaces when pointer sizes don't agree, though there are
  work-abounds for that situation.

* BPs offer the finest grained checking.  As I recall, RWMJ's pointers
  are only guaranteed to remain within the outermost object.  E.g., if
  you have a structure that contains an array member, a single
  object-table entry describes the whole structure, so you couldn't
  tell if an access to the array member walked off the end into
  neighboring structure members, so long as it remained within the
  structure.

* BPs are implemented in both the C and C++ front-ends.  So far,
  RWMJ's stuff is only implemented in the C front-end, though it could
  work just as well in C++.

* BPs are much more runtime efficient, since no object descriptor
  table lookups are required.  Efficiency is unimportant for some
  applications but vital for others, in particular for embedded
  systems.  Speed is also important for long-running applications
  where the bugs only show up after significant run time.

Per Bothner has recently suggested a hybrid implementation where BPs
(what he calls "fat pointers") are used internally, but thin pointers
are used for all external interfaces, requiring that descriptors be
looked-up when a pointer crosses the boundary from unchecked into
checked code.  I still favor pure BPs, since I have found the problems
of pointer size to be minimal for code of reasonably good quality
(i.e., bad code is that which assumes (sizeof (void*) == sizeof (int)),
or contains many pointer/integer casts.  When faced with bad code, I
find it is a good opportunity to fix those flaws as a prerequisite for
running BPs, but not everyone who wishes to use bounds-checking will
have the freedom to fix such programs), and the problems of mixing BP
and non BP code to be solvable.

I have been working with Ulrich Drepper to create a BP GNU C library.
Despite the differences in pointer sizes, it has been my experience
that BPs can mix with unchecked libraries, provided that GCC is told
that the library's declarations should be considered to have unbounded
pointers.  There are several ways to accomplish this--see the spec for
details.

I think that about covers it from my perspective.  I'll chime in again
if I think of anything else.  Herman & Richard should correct me if I
have misrepresented or underrepresented their work in any way.

Greg

Follow-Ups:
- Re: bounds checking
  - From: Joe Buck

References:
- Re: bounds checking
  - From: Jeffrey A Law

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]