This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[wwwdocs] Cleanup of "Open projects" 1/n
- From: Steven Bosscher <stevenb at suse dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Gerald Pfeifer <gerald at pfeifer dot com>
- Date: Sat, 6 Nov 2004 13:28:01 +0100
- Subject: [wwwdocs] Cleanup of "Open projects" 1/n
- Organization: SUSE Labs
Hi,
This is a first step towards cleaning up our "Open projects" web
pages in projects/, starting with the "Work in progress" section.
Let's start with removing some obviously dead stuff:
- Bounded pointer checking "gcc.gnu.org/projects/bp/". The
line "Project Status (updated 2000-08-11)" should tell you
enough. My understanding is this work has been subsumed
by mudflap, but even if it is not, this is Really Old Stuff
that needs to go away.
- Value range propagation pass "which isn't yet in GCC" and
never will be. There is a newer vrp.c on the hammer branch
and Diego wants a VRP pass for tree-ssa.
- Automaton based pipeline hazard recognizer. This is not
work in progress, it's finished.
OK?
Gr.
Steven
Index: index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/index.html,v
retrieving revision 1.45
diff -c -3 -p -r1.45 index.html
*** index.html 4 Jul 2004 20:13:53 -0000 1.45
--- index.html 6 Nov 2004 12:18:20 -0000
***************
*** 11,24 ****
<li><a href="#projects_for_beginner_gcc_hackers">Projects for beginner GCC hackers</a></li>
<li><a href="#projects_for_the_c_preprocessor">Projects for the C preprocessor</a></li>
<li><a href="#projects_for_the_gcc_web_pages">Projects for the GCC web pages</a></li>
- <li><a href="#work_in_progress">Work in progress</a>
- <ul>
<li><a href="#development_branches">Development Branches</a></li>
- <li><a href="#bounds_checking_with_bounded_pointers">Bounds Checking with Bounded Pointers</a></li>
- <li><a href="#value_range_propagation_pass">Value range propagation pass</a></li>
- <li><a href="#automaton_based_pipeline_hazard_recognizer">Automaton based pipeline hazard recognizer</a></li>
<li><a href="#data_prefetch">Data prefetch support</a></li>
- </ul></li>
<li><a href="#optimizer_inadequacies">Optimizer inadequacies</a></li>
<li><a href="#ia64_projects">Projects to improve performance on IA-64</a></li>
<li><a href="#changes_to_support_c99_standard">Changes to support C99 standard</a></li>
--- 11,18 ----
*************** href="cpplib.html">C preprocessor</a>.</
*** 95,124 ****
<p>There is a separate projects list for the <a href="web.html">web
pages</a>.</p>
! <h2><a name="work_in_progress">Work in progress</a></h2>
! <p>Different projects that are in progress.</p>
!
! <h3><a name="development_branches">Development Branches</a></h3>
<p>There are several <a href="../cvs.html#devbranches">development branches</a>
pursuing various goals.</p>
! <h3><a name="bounds_checking_with_bounded_pointers">Bounds Checking with Bounded Pointers</a></h3>
! <p>There is a separate page for <a
! href="bp/main.html">Bounds Checking with Bounded Pointers</a>.</p>
!
! <h3><a name="value_range_propagation_pass">Value range propagation pass</a></h3>
! <p>John Wehle (john@feith.com) implemented a <a
! href="http://gcc.gnu.org/ml/gcc-patches/2000-07/msg00968.html">value
! range propagation pass</a> which isn't yet in GCC.</p>
!
! <h3><a name="automaton_based_pipeline_hazard_recognizer">Automaton based pipeline hazard recognizer</a></h3>
! <p>Vladimir Makarov has implemented an
! <a href="http://gcc.gnu.org/ml/gcc-patches/2001-06/msg00951.html">
! automaton based pipeline hazard recognizer</a>. It enables better
! instruction scheduling and it provides the base for a future software
! pipelining implementation.</p>
!
! <h3><a name="data_prefetch">Data prefetch support</a></h3>
<p>A separate page describes <a href="prefetch.html">
data prefetch support and optimizations</a> that are in development
in the main branch.</p>
--- 89,99 ----
<p>There is a separate projects list for the <a href="web.html">web
pages</a>.</p>
! <h2><a name="development_branches">Development Branches</a></h2>
<p>There are several <a href="../cvs.html#devbranches">development branches</a>
pursuing various goals.</p>
! <h2><a name="data_prefetch">Data prefetch support</a></h2>
<p>A separate page describes <a href="prefetch.html">
data prefetch support and optimizations</a> that are in development
in the main branch.</p>
Index: bp/main.html
===================================================================
RCS file: bp/main.html
diff -N bp/main.html
*** bp/main.html 4 Feb 2004 08:06:37 -0000 1.19
--- /dev/null 1 Jan 1970 00:00:00 -0000
***************
*** 1,834 ****
- <html>
-
- <head>
- <title>Bounds Checking in C & C++ using Bounded Pointers</title>
- <link rev="made" href="mailto:greg@mcgary.org" />
- </head>
-
- <body>
- <h1>Bounds Checking Projects</h1>
-
- <p>This page describes work in progress to add fine-grained bounds
- checking to GCC's C and C++ front-ends. Interested parties are
- invited to port to Objective C as well. Please contact <a
- href="mailto:greg@mcgary.org">Greg McGary, greg@mcgary.org</a> if you
- wish to assist with development or testing.</p>
-
- <h2>Contents</h2>
-
- <ul>
- <li><a href="#overview">Overview of Bounded Pointers</a></li>
- <li><a href="#status">Project Status</a></li>
- <li><a href="#goals">Goals</a></li>
- <li><a href="#nongoals">Non-Goals</a></li>
- <li><a href="#maybegoals">Maybe Goals</a></li>
- <li><a href="#toolchain">Other Links in the Toolchain</a></li>
- <li><a href="#building">Building GCC and glibc for Bounded Pointers</a></li>
- <li><a href="#testing">Testing with Bounded Pointers</a></li>
- <li><a href="#testbpuse">Packages Tested Using Bounded Pointers</a></li>
- <li><a href="#knownbugs">Known Bugs</a></li>
- <li><a href="#porting">How to Port to a new CPU</a></li>
- <li><a href="#gccdetails">GCC Implementation Details</a></li>
- </ul>
-
- <hr />
-
- <h2><a name="overview">Overview</a></h2>
-
- <p>Bounded Pointers are easy to understand. GCC augments every
- pointer datum with two additional pointers that hold the low bound and
- high bound of the object to which the pointer is seated. Prior to
- dereference, GCC generates code to test whether the pointer's value
- lies within the bounds, and if bounds are violated, to generate a
- machine exception.</p>
-
- <p>Many find the notion of changing the size of a fundamental data
- type alarming, but for well-formed higher-level C code that uses
- accurate function prototypes and avoids abusing pointer/integer casts,
- this is seldom a problem in practice. Even low-level code can use
- bounded pointers with some extra care.</p>
-
- <h2><a name="status">Project Status (updated 2000-08-11)</a></h2>
-
- <ul>
-
- <li><h3>Working for Intel x86</h3>
-
- <p>Basic functionality is present
- for Intel x86 using GCC code on the CVS tag
- ``<code>bounded-pointers-ss-20000730</code>''. Basic functionality
- includes ...</p>
-
- <ul>
- <li>synthesis of a datum's bounds upon application of
- <code>addressof</code> operator.
- (Bounded pointers are also returned by memory allocators such as
- <code>malloc</code>, but that's implemented by the allocator library.)</li>
- <li>propagation of bounds via pointer assignment,
- passing of function argument and return of function value.</li>
- <li>programmer control over boundedness of pointer types via new qualifiers
- ``<code>__bounded</code>'' and ``<code>__unbounded</code>,'' and
- access to the components of a bounded pointer via new prefix operators
- ``<code>__ptrvalue</code>'', ``<code>__ptrlow</code>'' and
- ``<code>__ptrhigh</code>''.</li>
- </ul>
-
- <p>I have tested BPs on a number of packages (see <a
- href="#testbpuse">Packages Tested Using Bounded Pointers</a> for
- details). I have completed a full (mostly successful) bootstrap of
- GCC for <code>LANGUAGES=c</code> passing
- ``<code>-fbounded-pointers</code>'' at all stages (see <a
- href="#bootbpgcc">Bootstrap GCC with Full Bounds Checks</a> for
- details).</p>
- </li>
-
- <li><h3>GNU C Library</h3>
-
- <p>I have recently committed support for
- bounded pointers to the trunk of the GNU C library CVS tree. Intel
- x86 is functional. PowerPC is in progress but still incomplete. 90%
- of the <code>glibc</code> testsuite passes in BP mode for Intel x86. C library
- support includes thunks for system calls that accept bounded pointer
- arguments, check their bounds and pass simple pointers on to the
- OS kernel.</p>
- </li>
-
- <li><h3>Committing GCC Changes</h3>
-
- <p>My primary focus is on getting GCC
- changes out of the branch and committed to the CVS trunk.</p>
- </li>
-
- <li><h3>Documentation</h3>
-
- <p>My secondary focus is on writing
- documentation which is necessary in order to guide developers and
- testers.</p>
- </li>
-
- <li><h3>Unfinished Business</h3>
- <p>The most important unfinished bits are:</p>
-
- <ul>
- <li>C++ front end.</li>
- <li>Optimize to eliminate redundant bounds checks.</li>
- <li>Relax GCC's requirement that structs reside in memory
- so that elements of a bounded pointer may be independently
- assigned to machine registers.</li>
- <li>Port to more CPU architectures. (PowerPC port is in progress.)
- See <a href="#porting">How to Port to a new CPU</a>.</li>
- </ul>
- </li>
-
- </ul>
-
- <h2><a name="goals">Goals</a></h2>
-
- <ul>
-
- <li><h3>Finest Granularity</h3>
-
- <p>Bounded pointers enforce
- data-integrity at the finest possible granularity. Once a pointer is
- seated to a datum, be it a scalar, array, array element, structure, or
- structure member, references through that pointer may not exceed the
- bounds of the datum. Purify won't do this for you. As long as a
- pointer references valid memory, purify won't protest that your
- program blew the bounds of an array and started overwriting an
- adjacent data structure.</p>
- </li>
-
- <li><h3>Prevent Unwanted Mixing of Checked and Unchecked Code</h3>
-
- <p>Functions having pointers in their return-type/arg-types signature are
- incompatible between the BP and non-BP modes. In order to prevent
- unwanted mixing (i.e., calling a function in BP mode when it is
- defined in non-BP mode, and vice-versa), GCC ``mangles'' the symbols
- of all BP mode functions that have pointers in their signatures.
- The presence of BP-mangled symbols causes unwanted mixing to be
- detected at link time, rather than at runtime where the debug cost
- is very much higher.</p>
-
- <p>As of this writing, the C function names are mangled by prepending
- ``<code>__BP_</code>''. This is subject to change, since using a
- suffix might work better with gdb (see <a href="#toolchain">Other
- Links in the Toolchain</a>). At this time, only function names are so
- mangled. It would be better to also mangle the names of global data
- structures that contain pointers.</p>
-
- <p>For C++, whose functions are already mangled, I intend to add a
- boundedness qualifier to the mangling scheme, perhaps adding the
- letter `X' after the `P' that indicates pointer. C++ does not mangle
- the type of a function's return value, but in BP mode, this
- information is essential. The calling convention for returning a
- bounded-pointer is incompatible with that for returning a single word.
- A bounded pointer is represented as a three-word struct, so returning
- one means returning a struct by value, which requires that the caller
- designate space for the return value and pass a hidden first argument
- that points to it. The presence of the hidden pointer argument shifts
- the argument list by one slot, making it incompatible with the
- non-BP-return case.</p>
- </li>
-
- <li><h3>Low Overhead</h3>
-
- <p>Space and time overheads for
- bounded-pointer programs are both approx 150%..200% (i.e., 2.5x..3x
- slowdown and 2.5x..3x code size increase). A couple years back I
- implemented bounded pointers in <code>gcc-2.7.2</code> with much hackage at
- the RTL layer, and using a special BP machine mode (akin to the complex-number
- machine modes) that allowed GCC to assign BP components individually
- to registers, and to pass/return BPs components in registers. This
- version had space and time overhead of only 75%, and that was without
- any optimizations to eliminate redundant checks.</p>
-
- <p>This experience leads me to believe that with optimizations to
- eliminate redundant bounds checks, and with the ability to assign BP
- components individually to registers, space and time overhead can be
- brought under 50% (i.e., 1.5x slowdown and 1.5x code size increase).</p>
- </li>
-
- </ul>
-
- <h2><a name="nongoals">Non-Goals</a></h2>
-
- <p>Bounded pointers do not detect the following errors in memory-usage:</p>
-
- <ul>
- <li>Memory Leaks</li>
- <li>References through Dangling Pointers</li>
- <li>References to Uninitialized Memory</li>
- </ul>
-
- <p>Memory checks are done by Purify or <code>Checker</code> (Refer to
- GCC's <code>-fcheck-memory-usage</code> option). The checks provided
- by bounded pointers and the memory-usage checkers complement each
- other nicely without overlap.</p>
-
- <h2><a name="maybegoals">Maybe Goals</a></h2>
-
- <ul>
-
- <li><h3>Support Controlled Mixing of Checked and Unchecked Code</h3>
-
- <p>Mixing checked and unchecked code is something that's theoretically
- possible using two mechanisms: (1) explicit qualification of the
- boundedness of declarations and (2) thunks that translate between
- bounded-pointer and unbounded-pointer function interfaces.</p>
-
- <p>In practice, the amount of work to properly control mixing is
- unpredictable. For instance, it's bloody difficult to build
- bounded-pointer applications of reasonable complexity with an
- unbounded-pointer C library. On the other hand, it's considerably
- easier to mix bounded-pointer application code with unbounded-pointer
- X11 libraries.</p>
-
- <p>I have implemented the beginnings of automatic thunk-generation in
- GCC, but so far it has only proven useful for building the C torture
- testsuite in the days before I had a BP-capable C library.</p>
-
- <p>I consider this to be a back-burner project, since I believe that with
- proper optimization, a 100% bounded-pointer program can be built and
- run with acceptable space & time overhead. In the absence of a
- performance justification for mixing unchecked code, the other reason
- to mix unchecked code is because one has only binaries. As a
- free-software project, bounded pointers in GCC exist primarily to
- benefit the free-software community, so I don't intend to go out of my
- way to accommodate programs that can't be built entirely from source
- code.</p>
-
- <p>A third reason to mix unchecked code might be to work in stages on
- converting a large system to become bounded-pointer capable. It would
- be nice to provide this option, but other things are more important
- for now, particularly optimizations and broadening the list of
- supported CPUs.</p>
- </li>
-
- </ul>
-
- <h2><a name="toolchain">Other Links in the Toolchain</a></h2>
-
- <ul>
-
- <li><h3>ld</h3>
-
- <p>GCC synthesizes bounds with the
- <code>addressof</code> operation. A data object declared as
- ``<code>extern</code>'' with an incomplete type (or with a structure type
- containing a flexible array member) has unknown size, but might have
- its address taken. Since GCC can't compute the high bound based on an
- unknown size, it generates datum <code>foo</code>'s high bound as a
- reference to the synthetic symbol ``<code>foo.high_bound</code>''. If
- <code>foo</code> is defined as initialized data, GCC generates the
- label definition of <code>foo.high_bound</code> immediately following
- <code>foo</code>'s initializers. However, if <code>foo</code> resides
- in uninitialized data (BSS or common), GCC cannot do this, and it's
- left to the linker to synthesize <code>foo.high_bound</code>.
- I have a small patch to GNU ld that does this for ELF targets.
- (<a href="patch-ld.txt">Get the ld patch from here</a>)</p>
- </li>
-
- <li><h3>gdb</h3>
-
- <p>Bounded pointers introduce two nuisances for debugging:</p>
-
- <p>First, bounded pointers are represented internally as three-member
- structures containing simple pointer members for the value, low bound
- and high bound. Gdb currently knows nothing about bounded pointers
- and treats them according to the information in the symbol table.
- Print a pointer variable and you'll see a three member struct.
- Attempt to dereference a pointer variable via the expression
- ``<code>*foo</code>'', and you'll get an error because gdb thinks foo is
- a struct--you must dereference with ``<code>*foo.value</code>''.</p>
-
- <p>Second, if a function has a pointer as any of its return type or
- argument types, its assembler-name is prefixed with
- ``<code>__BP_</code>''. Therefore, you need to prefix such function
- names when setting breakpoints or printing function addresses.</p>
-
- <p>It would be useful to teach gdb about these two idiosyncrasies of
- bounded pointers.</p>
-
- <p>You will need a small patch to gdb so that it won't crash starting up
- on a BP-mode program. (<a href="patch-gdb.txt">Get the gdb
- patch from here</a>)</p>
- </li>
-
- <li><h3><a name="autoconf">autoconf</a></h3>
-
- <p>The ``<code>__BP_</code>'' prefix that is applied to functions
- having pointers in their return-type/arg-types signature presents
- problems for autoconf. Autoconf tests for the presence of library
- functions by creating a tiny test program that compiles and links with
- a library. If the test program fails to link, then the function is
- considered to be absent from the library and the package supplies a
- substitute. The declaration coded into the test program is a phony
- one of this form: ``<code>char foo ();</code>''. If one wishes to
- configure with the GCC option ``<code>-fbounded-pointers</code>'', and
- <code>foo</code> has pointers in its signature, its library definition
- will be as ``<code>__BP_foo</code>'', but the phony declaration will
- compile as a reference to the simple ``<code>foo</code>'' and thus
- yield a false negative. A work-around is to always configure with the
- non-BP version of a library. I hope that a long-term solution will
- come with extensions to autoconf that arrange to get a prototype for
- the function under test.</p>
- </li>
-
- </ul>
-
- <h2><a name="building">Building GCC and glibc for Bounded Pointers</a></h2>
-
- <p>If you wish to help with development and/or testing, you must first
- build a baseline. In the examples below, the shell variables
- ``<code>$..._dir</code>'' represent the directory names of your
- toplevel <code>gcc</code>, <code>glibc</code>, <code>ld</code> and
- <code>gdb</code> trees. The shell variables
- ``<code>$..._repo</code>'' hold the names of the GCC and
- <code>glibc</code> CVS repositories. The values of these repository
- variables will depend on whether you have write access or have
- readonly access through <code>pserver/anoncvs</code> mode. I'll
- assume you know enough about CVS and about configuring and building
- GNU packages to adapt the procedure below to fit your environment.</p>
-
- <ol>
-
- <li><h3>Checkout, build and install GCC</h3>
-
- <pre>
- $ mkdir -p $gcc_dir/BUILD
- $ cd $gcc_dir
- $ cvs -d $gcc_repo co -rbounded-pointers-ss-20000730 -d src gcc
- $ cd BUILD
- $ ../src/configure --prefix=$gcc_dir --enable-languages=c
- $ make bootstrap
- $ make install
- </pre>
-
- <p>For convenience, you might wish to install a symlink called
- ``<code>gcc-bp</code>'' in one of your bin directories that refers to
- <code>$gcc_dir/bin/gcc</code>.</p>
- </li>
-
- <li><h3>Checkout, build and install glibc</h3>
-
- <pre>
- $ mkdir -p $glibc_dir/BUILD
- $ cd $glibc_dir
- $ cvs -d $glibc_repo co -d src libc
- $ cd BUILD
- $ env CC=$gcc_dir/bin/gcc ../src/configure --prefix=$glibc_dir \
- --enable-bounded --disable-profile --disable-shared
- $ make
- $ make install
- </pre>
-
- <p>I recommend ``<code>--disable-profile</code>'' and
- ``<code>--disable-shared</code>'' in order to shorten build time since
- you won't need these targets.</p>
- </li>
-
- <li><h3>Obtain, patch, build and install GNU ld</h3>
-
- <p>I won't give detailed instructions here, because there's nothing out
- of the ordinary. Download a modern binutils release, or get the code
- from CVS.</p>
-
- <p>You will need a small patch to GNU ld so that it will synthesize
- ``<code>foo.high_bound</code>'' symbols for common & bss symbols. (<a
- href="patch-ld.txt">Get the ld patch from here</a>) The patch
- is relative to <code>binutils-2.10</code>, but will work on
- <code>binutils-2.9</code> as well.</p>
- </li>
-
- <li><h3>Obtain, patch, build and install gdb</h3>
-
- <p>I won't give detailed instructions here, because there's nothing out
- of the ordinary. Download a modern gdb release, or get the code from
- CVS.</p>
-
- <p>You will need a small patch to gdb so that it won't crash starting up
- on a BP-mode program. <a href="patch-gdb.txt">Get the gdb
- patch from here.</a> The patch is relative to <code>gdb-5.0</code>,
- but will work on <code>gdb-4.18</code> as well, if you supply the
- `<code>-l</code>' option to <code>patch</code> to make it more lenient
- about whitespace differences.</p>
- </li>
-
- </ol>
-
- <h2><a name="testing">Testing with Bounded Pointers</a></h2>
-
- <p>Now that you have the essentials for working with bounded pointers,
- here are some suggestions for testing. I present them in order of
- increasing difficulty. You will be testing three things: (1)
- correctness of BP-mode code generated by GCC, (2) correctness of the C
- library's handling of BPs, and (3) correctness of the code under test.
- If you wish to focus on debugging the BP implementation in GCC and the
- GNU C library, then you should test using mature infrastructure
- packages that have been around for many years. If you test on new
- code, you're more likely to find bugs in the application, which is of
- course what the BP feature is designed for, so you are most surely
- welcome to do that!</p>
-
- <ul>
-
- <li><h3>Run glibc's test suite</h3>
-
- <p>This is easy. Just run ``<code>make check</code>'' after building.
- Most tests pass. As for the rest, pick one and debug it.</p>
- </li>
-
- <li><h3>Run GCC's test suite for C</h3>
-
- <p>Here's how to run the GCC C torture tests in BP mode:</p>
-
- <pre>
- $ make check-gcc RUNTESTFLAGS="--tool_opts=\"-g -fbounded-pointers -static \
- -B$glibc_build_dir/csu/ -L$glibc_build_dir\""
- </pre>
-
- <p>Remember that ``<code>$..._dir</code>'' variables represent directory
- names from your system. Note the use of
- ``<code>-B$glibc_build_dir/csu/</code>'' to get <code>bcrt1.o</code>,
- and ``<code>-L$glibc_build_dir</code>'' to get libraries. Both of
- these options refer to your C library build directories, not to the
- directories in which you installed the C library. This is
- intentional. The only thing you really need from the install tree is
- the header tree in ``<code>$glibc_dir/include</code>''. For the rest,
- it is more convenient to get the files directly from the build tree,
- so that when you rebuild <code>glibc</code> after fixing a bug, you
- can avoid the install step. Naturally, if you change a public header
- file, you'll need to do the install, but this happens much less
- frequently.</p>
- </li>
-
- <li><h3>Build some other package and run its test suite</h3>
-
- <p>Pick a favorite package and have at it. Don't forget to build
- a BP version of any extra libraries the package requires.</p>
-
- <p>Because of the <a href="#autoconf">problems with <code>autoconf</code></a>
- mentioned above, the best workaround is to configure with the static
- non-BP version of the C library you built alongside the BP version.
- Your installed C library will invariably be an older version of
- <code>glibc</code>, and will yield different configuration results, so
- you don't want to use it.</p>
-
- <p>I use a couple of ``wrapper'' scripts to prefix the
- <code>configure</code> command that gives me a suitable environment
- for using the newly-built C library.</p>
-
- <p>This one is called ``<code>ubpenv</code>'':</p>
-
- <pre>
- #!/bin/bash
- export CC=$gcc_dir/bin/gcc
- export LDFLAGS="-static -B$glibc_build_dir/csu/ -L$glibc_build_dir"
- export CFLAGS="-isystem $glibc_dir/include -O2"
- "$@"
- </pre>
-
- <p>This one is called ``<code>bpenv</code>'', and differs only in the
- value of <code>CFLAGS</code>:</p>
-
- <pre>
- #!/bin/bash
- export CC=$gcc_dir/bin/gcc
- export LDFLAGS="-static -B$glibc_build_dir/csu/ -L$glibc_build_dir"
- export CFLAGS="-isystem $glibc_dir/include -fbounded-pointers \
- -fno-optimize-sibling-calls -O2 -g"
- "$@"
- </pre>
-
- <p>I recommend that you turn off sibling-call optimizations in order to
- preserve complete call traces and avoid surprises while debugging.</p>
-
- <p>In order to override the configured value of CFLAGS, you need to build
- like so:</p>
-
- <pre>
- $ bpenv eval make 'CFLAGS="$CFLAGS"'
- </pre>
-
- <p>To save some typing, I have a third script called ``<code>bpmake</code>'':</p>
-
- <pre>
- #!/bin/bash
- bpenv eval make 'CC="$CC"' 'CFLAGS="$CFLAGS"' 'LDFLAGS="$LDFLAGS"' "$@"
- </pre>
-
- <p>With these scripts, the sequence for building and testing a GNU
- package in BP mode is this:</p>
-
- <pre>
- $ ubpenv ./configure
- $ bpmake
- $ bpmake check
- </pre>
- </li>
-
- <li><h3><a name="bootbpgcc">Bootstrap GCC with Full Bounds Checks</a></h3>
-
- <p>The bootstrap procedure outlined below depends on already having a
- BP-capable compiler installed, and is performed on the GCC source tree
- at CVS tag ``<code>bounded-pointers-ss-20000730</code>''. This procedure
- doesn't produce a GCC that's particularly useful, since it's so much
- slower. This is purely a testing exercise in order to expose bounds
- violations in GCC, and to validate the correctness of bounds-checked
- code.</p>
-
- <p>The host compiler is <code>gcc-bp</code>, an ordinary unchecked
- program that produces a checked stage1. The stage1 compiler is fully
- bounds checked, and so runs like a pig on quaaludes while producing
- the stage2 compiler. The stage2 compiler is a companion pig on
- quaaludes that produces a third drugged pig. We do the final binary
- compare on the second- and third-stage pigs, and use the third-stage
- pig to run the test suite.</p>
-
- <p>There are some potholes along the road that you'll need to steer around:</p>
-
- <ul>
- <li> <code>makeinfo</code>, <code>install-info</code>, and
- <code>texindex</code> don't link for lack of a BP version of
- <code>libz.a</code>. We don't need <code>texinfo</code>, so
- we can just ignore it.</li>
- <li> GCC's <code>gettext</code> implementation in
- <code>gcc/intl/libintl.a</code> conflicts with
- <code>glibc</code>'s, so we must configure to ignore GCC's.</li>
- </ul>
-
- <p>First, you must supplement the command-line in the
- <code>bpmake</code> script with these extra arguments:</p>
-
- <pre>
- 'BOOT_CFLAGS="$CFLAGS"' 'BOOT_LDFLAGS="$LDFLAGS"'
- 'SYSTEM_HEADER_DIR="$glibc_dir/include"'
- </pre>
-
- <p>With that done, this procedure does the trick:</p>
-
- <pre>
- $ ubpenv ./configure --without-included-gettext --enable-languages=c
- $ bpmake all-libiberty
- $ bpmake -C gcc
- $ bpmake -C gcc stage1 bootstrap2
- </pre>
-
- <p>The second and third stages compare cleanly. Unfortunately,
- running the test suite yields these extra failures that did not
- appear for the installed <code>gcc-bp</code>:</p>
-
- <pre>
- FAIL: gcc.c-torture/compile/981001-2.c, -O0
- FAIL: gcc.c-torture/compile/981001-2.c, -O1
- FAIL: gcc.c-torture/compile/981001-2.c, -O2
- FAIL: gcc.c-torture/compile/981001-2.c, -O3 -fomit-frame-pointer
- FAIL: gcc.c-torture/compile/981001-2.c, -O3 -g
- FAIL: gcc.c-torture/compile/981001-2.c, -O3 -fssa
- FAIL: gcc.c-torture/compile/981001-2.c, -Os
- FAIL: gcc.c-torture/execute/990117-1.c execution, -O3 -fomit-frame-pointer
- FAIL: gcc.c-torture/execute/990117-1.c execution, -O3 -g
- FAIL: gcc.c-torture/execute/990117-1.c execution, -O3 -fssa
- FAIL: gcc.c-torture/execute/ieee/minuszero.c execution, -O1
- FAIL: gcc.c-torture/execute/ieee/minuszero.c execution, -O2
- FAIL: gcc.c-torture/execute/ieee/minuszero.c execution, -O3 -fomit-frame-pointer
- FAIL: gcc.c-torture/execute/ieee/minuszero.c execution, -O3 -fomit-frame-pointer -funroll-loops
- FAIL: gcc.c-torture/execute/ieee/minuszero.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
- FAIL: gcc.c-torture/execute/ieee/minuszero.c execution, -O3 -g
- FAIL: gcc.c-torture/execute/ieee/minuszero.c execution, -O3 -fssa
- FAIL: gcc.c-torture/execute/ieee/minuszero.c execution, -Os
- FAIL: gcc.dg/20000419-2.c (test for excess errors)
- FAIL: alias-1.c
- FAIL: wkali-1.c
- FAIL: wkali-2a.o
- FAIL: gcc.misc-tests/gcov-1.c (test for excess errors)
- WARNING: gcc.misc-tests/gcov-1.c compilation failed to produce executable
- FAIL: gcov-1.c:1:is 4:should be 11
- FAIL: gcov-1.c:1:is 5:should be 10
- FAIL: gcov-1.c:1:is 7:should be 1
- FAIL: gcc.misc-tests/gcov-2.c (test for excess errors) (PRMS 8294)
- WARNING: gcc.misc-tests/gcov-2.c compilation failed to produce executable
- </pre>
-
- <p>Even so, it's not so bad for an intoxicated pig.</p>
- </li>
-
- </ul>
-
- <h2><a name="testbpuse">Packages Tested Using Bounded Pointers (updated 2000-08-09)</a></h2>
-
- <p>Below is a list of results for packages tested with bounds checking.
- Unless otherwise noted, tests were done by me (Greg).</p>
-
- <ul>
-
- <li><h3>GNU awk 3.0.61</h3>
-
- <p>Bounds violations exposed:</p>
- <ul>
- <li> <code>regex.c</code>: <a href="patch-regex.txt">This patch
- was required.</a></li>
- </ul>
-
- <p>100% of the test suite passes after fixing the bugs listed above.
- (However, the maintainer admits that the test suite is hardly
- comprehensive.) Fixes appear in 3.0.6.
- </p>
- </li>
-
- <li><h3>GNU textutils 2.0g</h3>
-
- <p>Bounds violations exposed:</p>
- <ul>
- <li> <code>regex.c</code>: <a href="patch-regex.txt">This patch
- was required.</a></li>
- <li> <code>pr</code>: <code>init_header</code> wrote past the end
- of a string buffer for pages with column-width less than 22 characters.</li>
- <li> <code>pr</code>: <code>store_columns</code> read a column descriptor
- that was one past the end of the array of columns.</li>
- <li> <code>tail</code>: <code>pipe_lines</code> read past the
- beginning of a string buffer when given an empty input file.</li>
- </ul>
-
- <p>100% of the test suite passes after fixing the bugs listed above.
- Fixes appear in 2.0h</p>
- </li>
-
- <li><h3>GNU shellutils 2.0k</h3>
-
- <p>Bounds violations exposed:</p>
- <ul>
- <li> <code>regex.c</code>: <a href="patch-regex.txt">This patch
- was required.</a></li>
- </ul>
-
- <p>100% of the test suite passes after fixing the bugs listed above.</p>
- </li>
-
- <li><h3>GNU make 3.79.1</h3>
-
- <p>No bounds violations exposed. 100% of the test suite passes.</p>
- </li>
-
- <li><h3>GNU findutils 4.1.5</h3>
-
- <p>No bounds violations exposed. 100% of the test suite passes.</p>
- </li>
-
- <li><h3>web2c 7.3.2 (TeX and Metafont)</h3>
-
- <p>Bounds violations exposed:</p>
- <ul>
- <li> <code>fixwrites</code>: <code>main</code> read past the
- beginning of a string buffer when presented with an empty line.</li>
- </ul>
-
- <p>100% of the test suite passes after fixing the bug listed above
- and compiling with ``<code>gcc ... -fno-strict-aliasing</code>''.
- A strict-aliasing bug for i586 and i686 caused two assertion failures
- in <code>kpathsea</code>.</p>
- </li>
-
- <li><h3>GNU binutils 2.10</h3>
-
- <p>Bounds violations exposed:</p>
- <ul>
- <li> <code>bfd/archive.c</code>: Many calls to sprintf
- for filling fields of <code>struct ar_hdr</code> write
- a NUL-terminator one beyond the end of the field.</li>
- <li> <code>ld/ldlang.c</code>: Missing prototype for
- <code>walk_wild.c</code> caused callback pointer to be erroneously
- treated as bounded. GCC should be fixed to handle this, since
- the non-prototype definition of <code>walk_wild</code> appeared
- before its use.</li>
- </ul>
-
- <p>100% of the test suite passes after fixing the bugs listed above.</p>
- </li>
-
- <li><h3>GCC at CVS tag <code>bounded-pointers-ss-20000730</code></h3>
-
- <p>Preliminary results are described at <a href="#bootbpgcc">Bootstrap
- GCC with Full Bounds Checks</a>. Later, I'll turn the bounds-checked
- gcc loose on a recent release, such as <code>gcc-2.95.2</code> and
- see if any bounds violations occur.</p>
- </li>
-
- <li><h3>GNU fileutils 4.0y</h3>
-
- <p>Bounds violations exposed:</p>
- <ul>
- <li> <code>regex.c</code>: <a href="patch-regex.txt">This patch
- was required.</a></li>
- </ul>
-
- <p>Testing and fixing are in progress...</p>
- </li>
-
- <li><h3>GNU id-utils 3.2d</h3>
-
- <p>Bounds violations exposed:</p>
-
- <ul>
- <li> <code>regex.c</code>: <a href="patch-regex.txt">This patch
- was required.</a></li>
- <li> <code>mkid</code>: <code>assert_hits</code> read past the
- beginning of an array.</li>
- </ul>
-
- <p>Testing and fixing are in progress...</p>
- </li>
-
- </ul>
-
- <h2><a name="knownbugs">Known Bugs</a></h2>
-
- <p>Here is a list of bugs known to exist for bounded pointer mode in
- GCC and in the GNU C library, as well as some commonly found problems
- in applications:</p>
-
- <ul>
-
- <li><p>GCC generates bad bounds-checking code causing spurious bounds
- violations in <code>nss_parse_service_list</code>, which is used
- internally by the GNU C library's name-service switch. The cause is
- unknown.</p></li>
-
- <li><p>GCC generates bad bounds-checking code causing spurious bounds
- violations in <code>canonicalize</code>, which is used internally by
- the GNU C library's character-set conversion code. The cause is
- unknown.</p></li>
-
- <li><p>Programs that use their own version of GNU <code>regex.c</code>
- are missing some special handling for BPs in
- <code>EXTEND_BUFFER</code>. <a href="patch-regex.txt">Get the
- patch for <code>regex.c</code> from here.</a></p></li>
-
- <li><p>Threaded applications with <code>linuxthreads</code> are
- unusable in BP mode. So far, <code>gdb</code> has been useless for
- debugging these, so this will take some time and head-scratching to
- resolve.</p></li>
-
- </ul>
-
- <h2><a name="porting">How to Port to a new CPU</a></h2>
-
- <p>Most of the bounded pointers implementation is machine independent,
- both in GCC and in the C library. These are the machine-dependent
- parts:</p>
-
- <ul>
-
- <li><h3>Conditional Traps in GCC Machine Description</h3>
-
- <p>Bounded pointers depend on conditional trap patterns being defined
- in the machine description. Some machine descriptions already have
- them, namely SPARC, rs6000 (PowerPC), and m68k. All of
- these have machine instructions that implement conditional traps with
- one instruction. Beginning with ISA-II, MIPS has conditional trap
- instructions as well, but its GCC machine description so far lacks
- them. Intel x86 has no conditional trap instructions, but I defined
- conditional trap patterns that expand to primitive instructions to
- test and conditionally jump around an ``<code>int 5</code>''
- instruction. If the CPU you wish to support has no conditional trap
- instructions, you should define pseudo conditional traps as I have
- done for x86.</p>
-
- <p>Conditional traps are important for the sake of optimization.
- Without them, GCC would need to emit conditional branches as RTL,
- whose presence would artificially partition basic blocks and inhibit
- other optimizations. Also conditional trap RTL expressions are
- readily identifiable and thus more conveniently checked for
- redundancies that can be eliminated.</p>
- </li>
-
- <li><h3>Assembler Language Functions in GNU C Library</h3>
-
- <p>Most CPUs on which the GNU C library runs define some functions in
- assembler which have pointers in their signatures. Some are coded in
- assembler because they are performance critical, such as the memory
- and string functions (<code>bcopy</code>, <code>bzero</code>,
- <code>memcpy</code>, <code>memcmp</code>, <code>memset</code>,
- <code>strchr</code>, <code>strcpy</code>, <code>strcmp</code>,
- <code>strlen</code>, <code>strtok</code>, etc.), primitives for
- multi-precision arithmetic (<code>add_n</code>, <code>addmul_1</code>,
- <code>mul_1</code>, <code>sub_n</code>, <code>submul_1</code>,
- <code>lshift</code>, <code>rshift</code>), and primitives for
- floating-point math (<code>frexp</code>, <code>frexpf</code>,
- <code>frexpl</code>, <code>remquo</code>, <code>remquof</code>,
- <code>remquol</code>, <code>sincos</code>, <code>sincosf</code>,
- <code>sincosl</code>). Some are coded in assembler because they have
- special semantics that can't be achieved with plain C, namely
- <code>setjmp</code> and <code>longjmp</code>. Finally, some have
- special interfaces to the kernel or C runtime, namely
- <code>brk</code>, <code>clone</code> and the startup code.</p>
-
- <p>The assembler functions need to conditionally compile in BP and
- non-BP modes. In BP mode, they must accommodate the calling
- convention where pointer arguments and return value are structs
- passed by value, and they must check the bounds of their arguments.
- The best way to proceed is to study what's already been done for Intel
- x86, a CISC target, and for PowerPC, a RISC target.</p>
- </li>
-
- <li><h3>Startup Functions in GNU C Library</h3>
-
- <p>Again, the best way to proceed is to study what's already been
- done for Intel x86 and (soon) for PowerPC.</p>
- </li>
-
- </ul>
-
- <h2><a name="gccdetails">GCC Implementation Details</a></h2>
-
- <p>Sorry, nothing yet... This stuff properly belongs in either the
- GCC manual or the GCC ``Internal Representation'' document.</p>
-
- <hr />
-
- <address>Greg McGary,
- <a href="mailto:greg@mcgary.org">greg@mcgary.org</a>
- </address>
-
- </body>
- </html>
--- 0 ----
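
For readers who never saw the page removed above: it describes a calling
convention in which a pointer is passed as a small struct by value and its
bounds are checked before use. A minimal sketch of that idea in plain C
follows. The struct layout and the names bounded_ptr, bp_make, bp_in_bounds
and bp_add are assumptions made for illustration, not the representation
GCC used, and the check returns a flag where the real thing would trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-word pointer: the value plus the bounds of the
   object it points into.  Field names are illustrative only.  */
struct bounded_ptr
{
  char *value;  /* the address actually dereferenced */
  char *low;    /* first valid byte of the object */
  char *high;   /* one past the last valid byte */
};

/* Build a bounded pointer covering SIZE bytes starting at BASE.  */
struct bounded_ptr
bp_make (char *base, size_t size)
{
  struct bounded_ptr p = { base, base, base + size };
  return p;
}

/* Return true if an access of LEN bytes through P stays in bounds.
   This sketch reports the result; it does not trap.  */
bool
bp_in_bounds (struct bounded_ptr p, size_t len)
{
  return p.value >= p.low
         && p.value <= p.high
         && len <= (size_t) (p.high - p.value);
}

/* Pointer arithmetic moves only the value; the bounds travel along
   unchanged.  The caller must keep the result within [low, high].  */
struct bounded_ptr
bp_add (struct bounded_ptr p, size_t n)
{
  assert (n <= (size_t) (p.high - p.value));
  p.value += n;
  return p;
}
```

With an 8-byte buffer, bp_in_bounds accepts an 8-byte access through the
initial pointer and rejects a 9-byte one. After bp_add (p, 8) the value
equals the high bound, so only a zero-length access is accepted.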
Index: bp/patch-gdb.txt
===================================================================
RCS file: bp/patch-gdb.txt
diff -N bp/patch-gdb.txt
*** bp/patch-gdb.txt 4 Aug 2000 00:51:29 -0000 1.1
--- /dev/null 1 Jan 1970 00:00:00 -0000
***************
*** 1,21 ****
- diff -p -u partial-stab.h.~1~ partial-stab.h
- --- partial-stab.h.~1~ Tue Mar 28 10:44:53 2000
- +++ partial-stab.h Mon Jul 31 16:59:25 2000
- @@ -401,7 +401,7 @@ switch (CUR_SYMBOL_TYPE)
- function relative stabs, or the address of the function's
- end for old style stabs. */
- valu = CUR_SYMBOL_VALUE + last_function_start;
- - if (pst->texthigh == 0 || valu > pst->texthigh)
- + if (pst && (pst->texthigh == 0 || valu > pst->texthigh))
- pst->texthigh = valu;
- break;
- }
- @@ -647,7 +647,7 @@ switch (CUR_SYMBOL_TYPE)
- use the address of this function as the low bound for
- the partial symbol table. */
- if (textlow_not_set
- - || (CUR_SYMBOL_VALUE < pst->textlow
- + || (pst && CUR_SYMBOL_VALUE < pst->textlow
- && CUR_SYMBOL_VALUE
- != ANOFFSET (objfile->section_offsets, SECT_OFF_TEXT)))
- {
--- 0 ----
Index: bp/patch-ld.txt
===================================================================
RCS file: bp/patch-ld.txt
diff -N bp/patch-ld.txt
*** bp/patch-ld.txt 4 Aug 2000 00:51:29 -0000 1.1
--- /dev/null 1 Jan 1970 00:00:00 -0000
***************
*** 1,51 ****
- diff -p -u ldlang.c.~1~ ldlang.c
- --- ldlang.c.~1~ Mon Feb 21 05:01:27 2000
- +++ ldlang.c Mon Jul 31 17:03:36 2000
- @@ -135,6 +135,7 @@ static void ignore_bfd_errors PARAMS ((c
- static void lang_check PARAMS ((void));
- static void lang_common PARAMS ((void));
- static boolean lang_one_common PARAMS ((struct bfd_link_hash_entry *, PTR));
- +static void lang_one_common_high_bound PARAMS ((struct bfd_link_hash_entry *, bfd_vma));
- static void lang_place_orphans PARAMS ((void));
- static int topower PARAMS ((int));
- static void lang_set_startof PARAMS ((void));
- @@ -3520,6 +3521,9 @@ lang_one_common (h, info)
- h->u.def.section = section;
- h->u.def.value = section->_cooked_size;
-
- + /* Synthesize a definition for <symname>.high_bound. */
- + lang_one_common_high_bound (h, size);
- +
- /* Increase the size of the section. */
- section->_cooked_size += size;
-
- @@ -3576,6 +3580,29 @@ lang_one_common (h, info)
- }
-
- return true;
- +}
- +
- +/*
- +Synthesize a definition for the symbol <symname>.high_bound,
- +which might be needed if the user has enabled bounds checking.
- +*/
- +
- +static void
- +lang_one_common_high_bound (h, size)
- + struct bfd_link_hash_entry *h;
- + bfd_vma size;
- +{
- + static char extsuff[] = ".high_bound";
- + char *extname = xmalloc (sizeof (extsuff) + strlen (h->root.string));
- + struct bfd_link_hash_entry *exth;
- + sprintf (extname, "%s%s", h->root.string, extsuff);
- + exth = bfd_link_hash_lookup (link_info.hash, extname, false, false, true);
- + if (exth)
- + {
- + exth->type = bfd_link_hash_defined;
- + exth->u.def.section = h->u.def.section;
- + exth->u.def.value = h->u.def.value + size;
- + }
- }
-
- /*
--- 0 ----
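
The ld patch removed above builds the symbol name "<symname>.high_bound"
with xmalloc and sprintf. The sizing of that buffer can be sketched on its
own as below. This is an illustration only: high_bound_name is a
hypothetical helper, and it uses plain malloc and snprintf rather than
ld's xmalloc, so it has to report allocation failure to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated "<symname>.high_bound" string, or NULL if
   allocation fails.  The caller frees the result.  */
char *
high_bound_name (const char *symname)
{
  static const char suffix[] = ".high_bound";
  size_t size;
  char *name;

  assert (symname != NULL);

  /* sizeof (suffix) already counts the terminating NUL, so the sum is
     exactly the space needed for the joined string.  */
  size = strlen (symname) + sizeof (suffix);
  name = malloc (size);
  if (name != NULL)
    snprintf (name, size, "%s%s", symname, suffix);
  return name;
}
```

For a symbol called "foo" this yields "foo.high_bound". Using sizeof on
the suffix array, as the patch does, avoids a separate "+ 1" for the
terminator.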
Index: bp/patch-regex.txt
===================================================================
RCS file: bp/patch-regex.txt
diff -N bp/patch-regex.txt
*** bp/patch-regex.txt 4 Aug 2000 00:51:29 -0000 1.1
--- /dev/null 1 Jan 1970 00:00:00 -0000
***************
*** 1,53 ****
- diff -p -Bw -u regex.c.~1~ regex.c
- --- regex.c.~1~ Wed Jun 7 01:46:52 2000
- +++ regex.c Thu Aug 3 14:30:00 2000
- @@ -1576,6 +1576,26 @@ static reg_errcode_t compile_range _RE_A
- reset the pointers that pointed into the old block to point to the
- correct places in the new one. If extending the buffer results in it
- being larger than MAX_BUF_SIZE, then flag memory exhausted. */
- +#if __BOUNDED_POINTERS__
- +# define SET_HIGH_BOUND(P) (__ptrhigh (P) = __ptrlow (P) + bufp->allocated)
- +# define MOVE_BUFFER_POINTER(P) \
- + (__ptrlow (P) += incr, SET_HIGH_BOUND (P), __ptrvalue (P) += incr)
- +# define ELSE_EXTEND_BUFFER_HIGH_BOUND \
- + else \
- + { \
- + SET_HIGH_BOUND (b); \
- + SET_HIGH_BOUND (begalt); \
- + if (fixup_alt_jump) \
- + SET_HIGH_BOUND (fixup_alt_jump); \
- + if (laststart) \
- + SET_HIGH_BOUND (laststart); \
- + if (pending_exact) \
- + SET_HIGH_BOUND (pending_exact); \
- + }
- +#else
- +# define MOVE_BUFFER_POINTER(P) (P) += incr
- +# define ELSE_EXTEND_BUFFER_HIGH_BOUND
- +#endif
- #define EXTEND_BUFFER() \
- do { \
- unsigned char *old_buffer = bufp->buffer; \
- @@ -1590,15 +1610,17 @@ static reg_errcode_t compile_range _RE_A
- /* If the buffer moved, move all the pointers into it. */ \
- if (old_buffer != bufp->buffer) \
- { \
- - b = (b - old_buffer) + bufp->buffer; \
- - begalt = (begalt - old_buffer) + bufp->buffer; \
- + int incr = bufp->buffer - old_buffer; \
- + MOVE_BUFFER_POINTER (b); \
- + MOVE_BUFFER_POINTER (begalt); \
- if (fixup_alt_jump) \
- - fixup_alt_jump = (fixup_alt_jump - old_buffer) + bufp->buffer;\
- + MOVE_BUFFER_POINTER (fixup_alt_jump); \
- if (laststart) \
- - laststart = (laststart - old_buffer) + bufp->buffer; \
- + MOVE_BUFFER_POINTER (laststart); \
- if (pending_exact) \
- - pending_exact = (pending_exact - old_buffer) + bufp->buffer; \
- + MOVE_BUFFER_POINTER (pending_exact); \
- } \
- + ELSE_EXTEND_BUFFER_HIGH_BOUND \
- } while (0)
-
-
--- 0 ----
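
The regex patch removed above is about pointers into a buffer that may
move when it is extended: each one has to be rebased, and in bounded
pointer mode its low and high bounds have to follow. The rebasing step
alone can be sketched as below. Note that this sketch records an offset
before reallocating, which is a different technique from the patch's
"incr" between the old and new buffer addresses. struct growbuf and
growbuf_extend are hypothetical names, not part of regex.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer with one cursor pointing into it.  */
struct growbuf
{
  unsigned char *buffer;  /* start of the allocation */
  size_t allocated;       /* size of the allocation in bytes */
  unsigned char *cursor;  /* points somewhere inside buffer */
};

/* Grow GB to NEW_SIZE bytes and rebase the cursor.  Returns 0 on
   success, or -1 if realloc fails, in which case GB is unchanged.  */
int
growbuf_extend (struct growbuf *gb, size_t new_size)
{
  size_t offset;
  unsigned char *moved;

  assert (gb->cursor >= gb->buffer);

  /* Record the cursor as an offset before the block can move.  */
  offset = (size_t) (gb->cursor - gb->buffer);

  moved = realloc (gb->buffer, new_size);
  if (moved == NULL)
    return -1;

  gb->buffer = moved;
  gb->allocated = new_size;
  gb->cursor = moved + offset;  /* same position in the new block */
  return 0;
}
```

A bounded pointer version would also reset the cursor's low bound to the
new buffer start and its high bound to the new start plus the allocated
size, which is what SET_HIGH_BOUND and MOVE_BUFFER_POINTER do in the
patch.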