Bug reduction instructions

Wolfgang Bangerth bangerth@ticam.utexas.edu
Fri Jan 10 21:12:00 GMT 2003


Janis asked me to write up something about how to reduce test cases to
something small, which I did with feedback from Christian Ehrhardt and
Volker Reichelt. Janis significantly straightened it out and converted
it to HTML.

Comments?

W.


2003-01-10 Wolfgang Bangerth <bangerth@ticam.utexas.edu>
           Janis Johnson <janis187@us.ibm.com>
	* minimize-howto.html: New file.


--- empty	Fri Dec 13 17:01:24 2002
+++ minimize-howto.html	Fri Jan 10 12:23:31 2003
@@ -0,0 +1,208 @@
+<html>
+
+<head>
+<title>How to Minimize Test Cases for Bugs</title>
+</head>
+
+<body>
+
+<h1>How to Minimize Test Cases for Bugs</h1>
+
+<p>In order for a GCC developer to fix a bug, the bug must be
+reproducible by means of a self-contained test case. Our
+<a href="http://gcc.gnu.org/bugs.html">bug reporting instructions</a>
+ask that a bug report include the preprocessed version of the file that
+triggers the bug.  Often this file is very large; there are several
+reasons for making it as small as possible:</p>
+
+<ul>
+  <li>If source code in the original file or in the included header
+  files is proprietary, then the submitter might need to cut it down as
+  much as possible before making it public.</li>
+
+  <li>If the bug is demonstrated by executing the test case, it's
+  more likely to be tried if the test can be compiled and run on common
+  platforms, not just on the platform used by the PR submitter.  This
+  can't be done if there are dependencies on system header files.</li>
+
+  <li>If the problem is difficult to debug, then the developer who is
+  fixing the bug might want the test case to be as small as possible
+  to make debugging easier.</li>
+
+  <li>GCC developers are more likely to tackle bug reports that have
+  small, portable test cases.  Often there are GCC volunteers who help
+  out by cutting down large test cases.</li>
+
+  <li>A minimized test case can be used as the regression test for the
+  fix in the GCC test suites.</li>
+</ul>
+
+<p>Experience shows that most test cases can be reduced to fewer than
+30 lines, and often to fewer than ten lines.</p>
+
+<h2>Direct approach</h2>
+
+<p>For a bogus error or warning message from the compiler or for an
+internal compiler error that reports a line number from the source
+file being compiled, it's often possible to write a very small test
+case based on the code at the location reported in the messages.
+Such a test case doesn't need to include any header files; if it must
+call a library function, it can declare that function directly (this
+includes <code>printf</code>).</p>
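+
+<p>A minimal sketch of such a test case follows.  The function
+<code>scale</code> is hypothetical, a stand-in for the code at the
+reported location.  What matters is that the file contains no
+<code>#include</code> lines and declares <code>printf</code> itself.
+In C, drop the <code>extern "C"</code>.</p>
+
+<pre>
+// Hypothetical reduced test case: no header files are included.
+// printf is declared directly instead of pulling in a C library header.
+extern "C" int printf (const char *, ...);
+
+// Stand-in for the code near the line named in the compiler message.
+int scale (int x)
+{
+  int r = x * 3;
+  printf ("scale (%d) = %d\n", x, r);
+  return r;
+}
+</pre>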
+
+<p>A bug that is suspected to be due to a problem in a GCC runtime
+library can often be demonstrated with a small test case that calls
+the suspect function, either directly or indirectly.  For a bug in the
+standard C++ library, the test case can include library header files.
+</p>
+
+<p>Always try this direct approach before resorting to the brute-force
+method of minimizing a large test case.</p>
+
+<h2>Brute force approach</h2>
+
+<p>This brute force approach gets easier with experience.  The first
+time it can take an hour or two, but people who have done it many times
+report getting it down to 20 minutes or so per test case.</p>
+
+<p>After each attempt to shorten the test case, check whether the bug
+is still evident.  If not, back up to the previous version that
+demonstrated the bug.  There are two basic approaches to this:</p>
+
+<ul>
+  <li>Delete chunks of code until one makes the bug go away, then
+  back up (using your editor's <code>undo</code> command) and try
+  deleting something else instead; this is the preferred method of
+  the 20-minute guys.</li>
+  <li>Keep a copy of the last version of the file that demonstrated
+  the bug.  Surround chunks of code with <code>#if 0</code> and
+  <code>#endif</code> rather than actually deleting them at first.</li>
+</ul>
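+
+<p>The second method might look like this.  The functions
+<code>helper</code> and <code>buggy</code> are hypothetical.  Because
+the disabled chunk stays in the file, restoring it only means deleting
+the two preprocessor lines:</p>
+
+<pre>
+#if 0
+// Disabled for this attempt.  If the bug still shows up without it,
+// delete the chunk for good; otherwise remove the #if 0 / #endif pair.
+int helper (int x)
+{
+  return x + 1;
+}
+#endif
+
+// The function in which the bug occurs is kept as it is.
+int buggy (int x)
+{
+  return x * 2;
+}
+</pre>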
+
+<p>If you're reporting the bug, you can start with either the
+original source files or a preprocessed file.  Even when the original
+sources are available, some people (including the 20-minute guys)
+prefer to start with the preprocessed source.</p>
+
+<h3>Stripping the original code</h3>
+
+<p>Copy the original source file, plus header files that you might need
+to modify, to a separate location where you can duplicate the failure.
+Each of these steps is an attempt to minimize the test case; some will
+be successful, some will not.  If something is unsuccessful, undo it
+and move on to a different step.</p>
+
+<ul>
+  <li>Delete all functions that come after the function in which the
+  bug happens.
+
+  <ul>
+    <li>If the bug happens within an inline function, make sure that
+    the function is actually used (write a dummy function that calls it)
+    or remove the <code>inline</code> keyword.</li>
+
+    <li>If the bug happens in a template function, it must be
+    instantiated.  You can do this with an explicit instantiation.</li>
+  </ul></li>
+
+  <li>Strip as much code as possible from the function in which the
+  problem happens.  It's not important if the remaining code doesn't
+  make any sense; it only needs to trigger the bug.
+
+  <ul>
+    <li>Remove definitions of variables that are no longer used.</li>
+    <li>Replace variables of user-defined types (structs, classes) with
+    variables of built-in types (<code>int</code>, <code>double</code>).</li>
+    <li>Replace typedefs with primitive types, or with the types they
+    reference.</li>
+    <li>In C++, remove I/O statements.  This will allow you to remove
+    inclusion of iostream headers, saving thousands of lines in the
+    preprocessed version of the source file.  If you need output to
+    demonstrate the bug, use <code>printf</code>, and specify its
+    declaration directly:
+    <code>extern "C" int printf(const char *, ...);</code></li>
+  </ul></li>
+
+  <li>Delete functions that come before the function that causes
+  the bug.  If they are referenced later, replace their definitions with
+  declarations.</li>
+
+  <li>Rearrange the order of <code>#include</code> directives so that
+  system headers come first; this can make the rest of the process
+  simpler.</li>
+
+  <li>Remove <code>#include</code> directives, or replace them with
+  only the code from those headers that the file needs in order to
+  keep compiling; this will help when you start cutting down the
+  preprocessed file.</li>
+
+  <li>Replace C++ templates with regular functions and classes.  If
+  that doesn't work, get rid of as many template parameters as possible.
+  Don't try this too early; it involves changing many places if there
+  are still too many references to the template.</li>
+
+  <li>Iterate through the following:
+    <ul>
+      <li>Remove class and struct members that are no longer referenced.</li>
+
+      <li>Remove definitions for classes, structs, and other data types
+      that are no longer used.</li>
+
+      <li>Remove includes of header files that are no longer needed.</li>
+    </ul>
+  </li>
+
+</ul>
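+
+<p>To illustrate two of these steps, the sketch below shows the kind
+of code they can leave behind.  All names are hypothetical:
+<code>twice</code> is an inline function kept alive by a dummy caller,
+and <code>square</code> is a function template that an explicit
+instantiation forces the compiler to generate code for:</p>
+
+<pre>
+// GCC may not generate code for an inline function that is never used,
+// so a dummy function calls it.
+inline int twice (int x) { return 2 * x; }
+int use_twice (int x) { return twice (x); }
+
+// A function template produces no code until it is instantiated; this
+// explicit instantiation (T deduced as int) does that without a caller.
+template &lt;typename T&gt;
+T square (T x) { return x * x; }
+template int square (int);
+</pre>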
+
+<p>The file in which the problem happens will now usually be down to
+20 or 30 lines, plus the necessary include directives.
+Repeat these steps with the header files of your project that are
+still included in the file that shows the problem.
+To reduce the number of files you are working on, you can paste the
+contents of those header files directly into the source file you are
+working on.</p>
+
+<p>Prepare for the next step by running the compiler with <code>-save-temps</code>
+to get the preprocessed source file (<code>.i</code> for C, <code>.ii</code> for C++).</p>
+
+<h3>Stripping preprocessed sources</h3>
+
+<p>The preprocessed file contains lines starting with <code>#</code>
+that tell the compiler the original file and line number of each line
+of code.  You'll want to get rid of these lines so the compiler will
+report the location within the preprocessed file.  For a preprocessed
+file called <code>bug.ii</code>, any of the following will do this:</p>
+
+<ul>
+  <li><code>perl -pi -e 's/^#.*\n//g;' bug.ii</code></li>
+  <li><code>sed '/^#/d' bug.ii > bug2.ii</code></li>
+  <li>within <code>vi</code>: <code>:g/^#/d</code></li>
+</ul>
+
+<p>The preprocessed sources will now consist largely of header files.
+Follow the same steps as for stripping the original code, with these
+additional techniques.</p>
+
+<ul>
+  <li>If you recognize the boundaries of included files even after
+  preprocessing, you can delete whole header files at once.</li>
+
+  <li>There will be large stretches of typedefs from system headers
+  and, for C++, things enclosed in namespace <code>std</code>.  Delete
+  large chunks working from the bottom up, e.g. whole <code>extern "C"</code>
+  blocks, or blocks enclosed in <code>namespace XXX {...}</code>; make sure you
+  only delete matching pairs of braces, otherwise you'll get errors at the end
+  of the file.</li>
+
+  <li>If you get a new compiler error due to a missing definition for
+  a function or type that you deleted, replace the whole deleted chunk
+  with only the missing declaration, or modify the code to no longer
+  use it.</li>
+</ul>
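+
+<p>For example, suppose (hypothetically) that after you delete a large
+<code>extern "C"</code> block from <code>bug.ii</code>, the compiler
+complains that <code>strlen</code> is not declared.  Rather than
+restoring the whole block, put back only that one declaration.  The
+sketch below uses <code>__SIZE_TYPE__</code>, a macro GCC predefines as
+the type underlying <code>size_t</code>, so no header is needed:</p>
+
+<pre>
+// All that remains of the C library headers: the one declaration the
+// reduced code still uses.
+extern "C" __SIZE_TYPE__ strlen (const char *);
+
+// Hypothetical remainder of the test case.
+__SIZE_TYPE__ name_length ()
+{
+  return strlen ("gcc");
+}
+</pre>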
+
+<p>At this stage, you will be able to delete chunks of hundreds or even
+thousands of lines at a time, and you will quickly be able to reduce the
+preprocessed sources to something small.</p>
+
+</body>
+</html>



More information about the Gcc-patches mailing list