Bounds checking

Greg McGary gkm@eng.ascend.com
Tue Nov 30 23:39:00 GMT 1999


Nigel.Horne@marconicomms.com writes:

> Is anyone planning to port http://www-ala.doc.ic.ac.uk/~phjk/BoundsChecking.html
> from gcc-2.7.2 to gcc-2.95? (And if not why not :-)

FYI, I am working on bounds checking in gcc at this very moment;
however, I am using a completely different approach.  My work uses
bounded (a/k/a "fat") pointers.  Instead of the usual single-word
pointer, BPs are three-word records containing pointer value, base and
extent.  The pointer value is checked against base & extent at the
time of dereference.
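
To make the three-word record concrete, here is a minimal hand-written
sketch in plain C.  It is not what gcc emits: the type and function
names (bounded_ptr, bp_make, bp_add, bp_in_bounds, bp_load_int) are
made up for illustration, and it assumes "extent" means the address
one past the last valid byte of the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical three-word record; field names are illustrative only. */
typedef struct {
    char *value;   /* the pointer itself */
    char *base;    /* lowest valid address of the object */
    char *extent;  /* ASSUMPTION: one past the highest valid address */
} bounded_ptr;

/* Build a bounded pointer covering an object of `size` bytes. */
static bounded_ptr bp_make(void *object, size_t size)
{
    assert(object != NULL);
    bounded_ptr bp = { (char *)object, (char *)object, (char *)object + size };
    return bp;
}

/* Pointer arithmetic moves only the value; the bounds travel unchanged. */
static bounded_ptr bp_add(bounded_ptr bp, ptrdiff_t nbytes)
{
    bp.value += nbytes;
    return bp;
}

/* Returns 1 if an access of `width` bytes at bp.value stays in bounds. */
static int bp_in_bounds(bounded_ptr bp, size_t width)
{
    return bp.value >= bp.base
        && bp.value <= bp.extent
        && (size_t)(bp.extent - bp.value) >= width;
}

/* The check happens at the time of dereference, not at arithmetic time. */
static int bp_load_int(bounded_ptr bp)
{
    int result;
    if (!bp_in_bounds(bp, sizeof result)) {
        fprintf(stderr, "bounds violation\n");
        abort();
    }
    memcpy(&result, bp.value, sizeof result);
    return result;
}
```

Given `int a[4]`, a pointer built with bp_make(a, sizeof a) and
advanced by two ints loads a[2] normally, while the same pointer
advanced by sizeof a fails the check before any memory is touched.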

Checked and unchecked code may be mixed, provided the two don't share
aggregates (structs & arrays) that contain pointers, since the size &
layout of such aggregates change with the size of pointers.
When a file is compiled with bounded pointers, gcc attaches a special
prefix to the names of functions that accept pointer arguments or
return pointer values.  If the function's signature has only simple
pointers to scalars or aggregates that don't have pointers, then gcc
can automatically generate a thunk to translate between BP and non-BP
versions of the function.  For each function defined with BPs, gcc
generates a non-BP version that adds bounds to pointer args, calls the
BP version, then strips away bounds from the return value.  For extern
functions that might be defined as non-BP, gcc generates a BP version
that does the opposite.  Thunks are only generated when it is safe to
do so: functions whose arguments or return values are
aggregates or pointers to aggregates whose element sizes & layouts
change based on the size of pointers will not get automatically
generated thunks.  If such functions are shared between checked and
unchecked code, there will be undefined symbols at link time which
must be resolved by either shifting the boundary between checked and
unchecked code, or by manually writing thunks to rewrite argument and
return-value aggregates.  Thunks are only of concern in mixed
environments.  For applications where all source code is available,
thunks are not needed.  For device code where the size of
pointers in memory-mapped registers or shared data structures cannot
change, it is possible to designate pointers as `__unbounded', which
can be used as a qualifier or attribute.
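
The two thunk directions can be sketched by hand as follows.  This is
only an illustration under stated assumptions, not gcc's output: the
BP_ name prefix, the bounded_int_ptr type and the example functions
are hypothetical, and since a plain pointer carries no size
information, the sketch assumes a thunk attaches maximally permissive
bounds when it "adds bounds".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded pointer to int; bounds kept as integers so the
   comparisons below are well defined for any address. */
typedef struct {
    int *value;
    uintptr_t base;
    uintptr_t extent;   /* ASSUMPTION: one past the last valid byte */
} bounded_int_ptr;

/* Returns 1 if reading one int through p stays within its bounds. */
static int bp_ok(bounded_int_ptr p)
{
    uintptr_t v = (uintptr_t)p.value;
    return v >= p.base && v <= p.extent && p.extent - v >= sizeof(int);
}

/* ASSUMPTION: "adding bounds" to a plain pointer means attaching
   bounds that permit everything, because the real ones are unknown. */
static bounded_int_ptr bp_unknown(int *p)
{
    bounded_int_ptr bp = { p, 0, UINTPTR_MAX };
    return bp;
}

/* A function defined in checked code: the BP version does the work. */
static bounded_int_ptr BP_find_max(bounded_int_ptr a, size_t n)
{
    bounded_int_ptr best = a;
    for (size_t i = 0; i < n; i++) {
        bounded_int_ptr cur = a;
        cur.value = a.value + i;
        assert(bp_ok(cur));          /* check at the time of dereference */
        if (*cur.value > *best.value)
            best = cur;
    }
    return best;
}

/* Non-BP thunk for unchecked callers: add bounds to the pointer arg,
   call the BP version, strip bounds from the return value. */
static int *find_max(int *a, size_t n)
{
    return BP_find_max(bp_unknown(a), n).value;
}

/* A function defined in unchecked code (stands in for an extern). */
static int *first_negative(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] < 0)
            return a + i;
    return NULL;
}

/* BP thunk for checked callers: the opposite translation. */
static bounded_int_ptr BP_first_negative(bounded_int_ptr a, size_t n)
{
    return bp_unknown(first_negative(a.value, n));
}
```

Both example signatures use only pointers to scalars, which is the
case where a mechanical translation like this is safe.  A function
taking a pointer to a struct that itself contains pointers could not
be wrapped this way, because the struct's layout differs between the
two worlds.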

I am working on the gcc trunk (2.96) and have most of the
C-tree hacking code working well enough to pass most of c-torture with
fat pointers, but with no actual bounds checks yet.  I am almost done
with the thunk generation pass.  After I have thunks, I can test
building and running some application code, such as GNU fileutils,
textutils and sh-utils linked with an unchecked libc.  Once that
works, I'll build a checked glibc, making mods to the
assembler-language interfaces (mostly string functions and system
calls) as required.

After that, I'll do C++, then maybe Java and/or Chill.

IMO, the implementation you cite above is severely crippled with
regard to runtime and space overhead.  My earlier implementation of
bounded pointers (based on 2.7.2) ran with approx. 75% space and time
overhead on i386 and i960 without any optimizations to eliminate
redundant checks.  With optimizations, I hope to shave space & time
overhead to as little as 25%.  By comparison, the bounds checking
implementation you cite has over 500% time overhead and over 100% space
overhead, making it unsuitable for many real-world applications,
particularly embedded & realtime ones.

Greg


