This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



projects/beginner.html


Here's the projects-for-beginners web page, all ready to drop into
place.  I don't know where the best place to link to it is, though.

Any suggestions for additions, clarifications, etc. would be
appreciated.  In a couple places there are XXX comments; I'd
particularly like help with those.

I validated this as XHTML 1.0 "strict" except for the align on the
<h1> tag.  Putting in all the close tags was tedious, but probably
worth it.  I also verified it looks sensible in Mozilla 0.7, Netscape
4.7, and Lynx.

Does anyone know how to do blue bars to the left of text without CSS?

zw

<html>

<head>
<title>Simple GCC projects</title>
</head>

<body>
<h1 align="center">Simple GCC projects</h1>

<p>This page lists projects which are feasible for people who aren't
intimately familiar with GCC's internals.  Many of them are things
which would be extremely helpful if they got done, but the core team
never seems to get around to them.  They're all busy wrestling with
the problems that <em>do</em> require deep familiarity with the
internals.  We hope this will make it easier for more people to assist
the GCC project, by giving new developers places to jump in.</p>

<p>Most of these projects require a reasonable amount of experience
with C and the Unix programming environment.  Do not despair if any
individual task seems daunting; there's probably an easier one.  If
you have <em>no</em> programming skills, we can still use your help
with documentation or the bug database.  See below.</p>

<p>We assume that you already know how to <a href="../cvs.html">get the
latest sources</a>, <a href="../install/configure.html">configure</a> and
<a href="../install/build.html">build</a> the compiler, and <a
href="../install/test.html">run the test suite</a>.  You should also
familiarize yourself with the <a href="../contribute.html">requirements
for contributions</a> to GCC.</p>

<p>Many of these projects will require at least a reading knowledge of
GCC's intermediate language, <a href="../onlinedocs/gcc_17.html">RTL</a>.
It may help to understand the higher-level <code>tree</code> structure as
well.  Unfortunately, for this we only have an <a
href="../onlinedocs/c-tree_toc.html">incomplete, C/C++ specific manual</a>.</p>

<h2>Bug patrol</h2>

<p>These projects all have to do with bugs in the compiler, and our
test suite which is supposed to make sure no bugs come back.</p>

<ul>
<li>Analyze failing test cases.

<p>Pick a test case which fails (expected or unexpected) with the
present compiler, and try to figure out what's going wrong.  For
internal compiler errors ("ICEs"), you can often find the problem by
running <code>cc1</code> under the debugger.  Set a breakpoint on
<code>fancy_abort</code> (this happens automatically if you work in
your build directory).  When gdb stops, go up the stack to the
function that called <code>fancy_abort</code>.  It will have just
performed some sort of consistency check, which failed.  Normally this
check will be visible right there.  (If the ICE prints "Tree check:"
or "RTL check:" before the usual message, the check is hiding in the
accessor macros.)  Examine the data structure that was checked.  Walk
back in time and figure out when it got messed up.</p>

<p>There are a large number of routines which you can call from the
debugger, to display internal data in readable form.  Their names all
begin with "<samp>debug_</samp>".  The most useful ones are
<code>debug_tree</code> for printing tree structures,
<code>debug_rtx</code> for printing chunks of RTL, and
<code>debug_bb</code> and <code>debug_bb_n</code> for printing out
basic block information.</p>

<p>If the problem is that the compiler generates incorrect code, the
place to start is the RTL debugging dumps.  Run the compiler with the
<samp>-da</samp> switch.  This will generate twenty or so debug dumps,
one after each pass.  Read through them in order (they are numbered).
The code should start off correct, but then become erroneous.  When
you find the mistake, enter the debugger, set a breakpoint on the pass
that made the mistake, and watch what it does.  You can find out the
name of the entry point for each pass by reading through
<code>rest_of_compilation</code> in <code>toplev.c</code>.</p>
</li>

<li>Get rid of <code>testsuite/gcc.misc-tests</code> and
<code>testsuite/g++.dg/special</code>.

<p>Each of these directories holds a handful of tests that aren't
handled by the normal test sequence.  We'd like to get rid of the
special case framework.
We <em>think</em> that they're only done this way for historical
reasons, but we aren't sure.  Most of the work would be figuring out
what's going on in those directories.  You'll need some understanding
of Expect, TCL, and the DejaGNU test harness.</p>
</li>

<li>Cross-reference all the tests and find all the duplicates.

<p>It's likely that the same test has been added more than once, over
the years.  You'd need to figure out a sensible definition of "the
same test" that can be checked mechanically, then write a program that
does that check, and run it against the entire test suite.</p>
</li>

<li>Set up more autobuilders.

<p>We already have two autobuilders: Geoff Keating's <a
href="http://www.cygnus.com/~geoffk/gcc-regression/">gcc-regression</a>,
and the netwinder.org <a
href="http://www.netwinder.org/build/gcc.html">AutoBuild system</a>.
Geoff's system is better known, because it nags people in e-mail when
they break the tree.</p>

<p>We'd like to have a similar setup for other platforms.  At the
least we should have one for each of the primary evaluation platforms
listed in the <a href="../gcc-3.0/criteria.html">criteria for GCC
3.0</a>.  However, the more the better.</p>

<p>It would be nice if at least one of these platforms were beefy
enough that it could run with RTL consistency checks enabled.  This
slows the compiler down by an order of magnitude, but has found plenty
of bugs in the past.</p>

<p>In addition to the existing systems, you should look at the <a
href="http://www.mozilla.org/tinderbox.html">Tinderbox</a> system
developed for the Mozilla project.</p>

<p>Kaveh Ghazi &lt;<a
href="mailto:ghazi@caip.rutgers.edu">ghazi@caip.rutgers.edu</a>&gt;
suggests that the autobuilders should keep track of regressions in the
number of warnings, and pester the responsible patch authors until
those regressions are fixed, just as they do for testsuite regressions
now.</p>
</li>

</ul>

<h2>General code cleanliness</h2>

<p>These are projects which will generally make it easier to work with
the source tree.</p>

<ul>
<li>Warnings patrol.

<p>Simple: build the tree, run the <code>warn_summary</code> script
(from the <code>contrib</code> directory) against your build log, then
go through the list and squelch the warnings.  In most cases this is
easy.  However, if you have any doubt about what some piece of code
does, ask.  Sometimes the proper fix is not obvious.  For example,
there are a lot of warnings about "comparison between signed and
unsigned" in a GCC build, but unless you really know what you're
doing, you should leave them alone.</p>

<p>Also, some warnings are spurious: for example, the floods
of "ISO C requires rest arguments to be used" complaints on some
platforms are technically correct but useless, because the offending
macro is <code>printf</code>.  If you can patch the part of the
compiler that issues spurious warnings, so it doesn't anymore (but
still does generate the warning where it's appropriate), we're happy
to take those patches too.</p>
</li>

<li>Find and expunge all the places where one <code>.c</code> file includes
another.

<p>In most cases this is just sloppiness, and can easily be converted
to separate compilation of both files, then linking the two objects
together.  There may be places where someone is trying to simulate
generic programming through the macro facility.  Discuss what should
be done with the maintainers of those files.</p>
</li>

<li>Break up enormous source files.

<p>Not terribly hard.  Watch out for file-scope globals.  Suggested
targets:</p>

<pre>
	472k java/parse.y
	440k cp/decl.c
	428k combine.c
	356k dwarf2out.c
	336k expr.c
	308k cp/pt.c
	300k loop.c
	248k cp/class.c
	244k cse.c
	240k flow.c
	232k fold-const.c
	228k c-decl.c
	224k function.c
	220k cp/typeck.c
	220k c-typeck.c
	204k dwarfout.c
</pre>

<p>There are several other files in this size range, which I have left
out because touching them at all is unwise (reload, the Fortran front
end).  You can try, but I am not responsible for any damage to your
sanity which may result.</p>
</li>

<li>Remove as much code from parser actions as possible.

<p>This goes more or less with the above.  Good existing code:</p>

<pre>
expr_no_commas:
        expr_no_commas '+' expr_no_commas
                { $$ = parser_build_binary_op ($2, $1, $3); }
</pre>

<p>Bad existing code:</p>

<pre>
cast_expr:
        '(' typename ')' cast_expr  %prec UNARY
                { tree type;
                  int SAVED_warn_strict_prototypes = warn_strict_prototypes;
                  /* This avoids warnings about unprototyped casts on
                     integers.  E.g. "#define SIG_DFL (void(*)())0".  */
                  if (TREE_CODE ($4) == INTEGER_CST)
                    warn_strict_prototypes = 0;
                  type = groktypename ($2);
                  warn_strict_prototypes = SAVED_warn_strict_prototypes;
                  $$ = build_c_cast (type, $4); }
</pre>

<p>All the logic here should be moved into a separate function in
<code>c-typeck.c</code>, named something like
<code>parser_build_c_cast</code>.  The point is that the less code
there is in Yacc input files, the easier it is to rearrange the
grammar and/or replace it entirely.  It also makes it less likely that
someone will muck with action code and then forget to rebuild the
generated parser and check it in.</p>
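<p>A rough sketch of what the extracted function might look like
follows.  Everything above <code>parser_build_c_cast</code> is a
stand-in invented so the example compiles by itself; the real
<code>tree</code> type, <code>groktypename</code>, and
<code>build_c_cast</code> are of course much more involved.</p>

```c
/* Stand-ins for GCC internals, invented for this sketch only.  */
enum tree_code { INTEGER_CST, VAR_DECL };
typedef struct tree_node
{
  enum tree_code code;
  struct tree_node *type;
} *tree;
#define TREE_CODE(NODE) ((NODE)->code)

int warn_strict_prototypes = 1;

/* Records the flag value the stub groktypename saw, so the effect of
   the save/restore dance can be observed.  */
int flag_seen_by_groktypename;

static tree
groktypename (tree typename_node)
{
  flag_seen_by_groktypename = warn_strict_prototypes;
  return typename_node;
}

static tree
build_c_cast (tree type, tree expr)
{
  expr->type = type;
  return expr;
}

/* The logic formerly embedded in the cast_expr action.  */
tree
parser_build_c_cast (tree typename_node, tree expr)
{
  int saved_warn_strict_prototypes = warn_strict_prototypes;
  tree type;

  /* This avoids warnings about unprototyped casts on integers.
     E.g. "#define SIG_DFL (void(*)())0".  */
  if (TREE_CODE (expr) == INTEGER_CST)
    warn_strict_prototypes = 0;
  type = groktypename (typename_node);
  warn_strict_prototypes = saved_warn_strict_prototypes;

  return build_c_cast (type, expr);
}
```

<p>With something like this in place, the grammar action would shrink
to <code>{ $$ = parser_build_c_cast ($2, $4); }</code>.</p>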

<p>We also want to minimize the number of helper functions embedded in
the grammar file.  <code>java/parse.y</code> is a particularly bad
example, having upwards of 10,000 lines of code after the second
<code>%%</code>.</p>
</li>

<li>Break up enormous functions.

<p>This is in the same vein as the above, but significantly harder,
because you must take care not to change any semantics.  The general
idea is to extract independent chunks of code to their own functions.
Any inner block that has a half dozen local variable declarations at
its head is a good candidate.  However, watch out for places where
those local variables communicate information between iterations of
the outer loop!</p>

<p>With even greater caution, you may be able to find places where
entire blocks of code are duplicated between large functions (probably
with slight differences) and factor them out.</p>
</li>

<li>Break up enormous conditionals.

<p>Harder still, because it's unlikely that you can tell what the
conditional tests, and even less likely that you can tell if that's
what it's supposed to test.  It is definitely worth the effort if you
can hack it, though.  An example of the sort of thing we want
changed:</p>

<pre>
 if (mode1 == VOIDmode
     || GET_CODE (op0) == REG || GET_CODE (op0) == SUBREG
     || (modifier != EXPAND_CONST_ADDRESS
         &amp;&amp; modifier != EXPAND_INITIALIZER
         &amp;&amp; ((mode1 != BLKmode &amp;&amp; ! direct_load[(int) mode1]
              &amp;&amp; GET_MODE_CLASS (mode) != MODE_COMPLEX_INT
              &amp;&amp; GET_MODE_CLASS (mode) != MODE_COMPLEX_FLOAT)
             /* If the field isn't aligned enough to fetch as a memref,
                fetch it as a bit field.  */
             || (mode1 != BLKmode  
                 &amp;&amp; SLOW_UNALIGNED_ACCESS (mode1, alignment)
                 &amp;&amp; ((TYPE_ALIGN (TREE_TYPE (tem))
                      &lt; GET_MODE_ALIGNMENT (mode))
                     || (bitpos % GET_MODE_ALIGNMENT (mode) != 0)))
             /* If the type and the field are a constant size and the
                size of the type isn't the same size as the bitfield,
                we must use bitfield operations.  */
             || ((bitsize &gt;= 0
                  &amp;&amp; (TREE_CODE (TYPE_SIZE (TREE_TYPE (exp)))
                      == INTEGER_CST)
                  &amp;&amp; 0 != compare_tree_int (TYPE_SIZE (TREE_TYPE (exp)),
                                            bitsize)))))
     || (modifier != EXPAND_CONST_ADDRESS
         &amp;&amp; modifier != EXPAND_INITIALIZER
         &amp;&amp; mode == BLKmode
         &amp;&amp; SLOW_UNALIGNED_ACCESS (mode, alignment)
         &amp;&amp; (TYPE_ALIGN (type) &gt; alignment
             || bitpos % TYPE_ALIGN (type) != 0)))
   {
</pre>
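<p>One way to attack a condition like this is to give each arm a name.
The sketch below pulls out only the last arm, and it assumes that arm
means "a BLKmode object that is not aligned well enough on a target
where unaligned access is slow".  The function name and the use of
plain integer parameters are assumptions made to keep the example
self-contained; in the compiler the values would come from
<code>TYPE_ALIGN</code>, <code>SLOW_UNALIGNED_ACCESS</code>, and so
on.</p>

```c
#include <stdbool.h>

/* Hypothetical predicate for the final arm of the condition above.
   TYPE_ALIGN must be nonzero.  The checks on MODIFIER are left to
   the caller.  */
bool
blk_field_is_misaligned (bool is_blkmode, bool slow_unaligned,
                         unsigned long type_align,
                         unsigned long alignment,
                         unsigned long bitpos)
{
  if (!is_blkmode || !slow_unaligned)
    return false;

  /* Either the type wants more alignment than we have, or the field
     does not start on a multiple of the type's alignment.  */
  return type_align > alignment || bitpos % type_align != 0;
}
```

<p>Once every arm has a name, the original <code>if</code> becomes a
short list of calls, and each predicate can be reviewed separately
against what it is supposed to test.</p>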
</li>

<li>Verify all the object-&gt;header dependencies in the Makefiles.

<p>Mega bonus points for working out a way to do automatic dependency
generation <em>without</em> relying on features of GCC or GNU
make.  And we don't want a <samp>make dep</samp> pass if it can
possibly be avoided.</p>
</li>

<li>Figure out some way to get dependencies of source files on
<code>tm.h</code> and <code>xm-<var>host</var>.h</code> headers.

<p>Presently these dependencies are omitted entirely.  Almost
everything has to be rebuilt if you change <code>tm.h</code> or
<code>xm-<var>host</var>.h</code>, and right now the only way to do
that is rebuild from scratch.</p>
</li>

<li>Delete garbage.

<p><code>#if 0</code> blocks that have been there for years, unused
functions, unused entire files, dead configurations, dead Makefile
logic, dead RTL and tree forms, and on and on and on.  Depending on
what it is, it may not be obvious if it's garbage or not.  Go for the
easy ones first.</p>
</li>

<li>Revisit issues put off till later.

<p>Find comments of the form <code>/* Look at this again after gcc 2.3
*/</code>, or <code>/* ... after <var>date</var> */</code> where
<var>date</var> was sometime in the last millennium, and investigate.
Analyze test cases marked XFAIL and patch them.</p>
</li>

<li>Add multiple include guards to all internal header files.

<p>This is simple, mindless, and cannot break much - but be prepared
for surprises.  See e.g. <code>cpplib.h</code> for how this is
done.</p>

<p>We do not have a consistent convention for the names of guard
macros.  If you do this, pick a sensible convention, then stick to
it.  You should <em>not</em> use the same convention that the system C
library does, to avoid conflicts.</p>
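<p>For illustration, a guarded header might look like the sketch
below.  The header name, its contents, and the <code>GCC_</code>
prefix are all assumptions chosen for the example, not an established
convention.  The reason to stay away from the C library's style is
that names beginning with an underscore followed by a capital letter
or another underscore are reserved for the implementation.</p>

```c
/* example.h -- hypothetical internal header.  */
#ifndef GCC_EXAMPLE_H
#define GCC_EXAMPLE_H

/* Without the guard, a second inclusion would redefine this
   structure and the compile would fail.  */
struct example_entry
{
  int id;
  const char *name;
};

extern int example_lookup (const char *name);

#endif /* ! GCC_EXAMPLE_H */
```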
</li>

<li>Disentangle the current web of header-header interdependencies.

<p>This is a major undertaking, and you should be prepared to deal
with all kinds of lurking monsters.</p>

<p>At present, most of GCC's internal headers use whatever they need
without any consideration for whether or not it has been declared yet.
This forces the users of those headers to know what each one needs,
and use it explicitly.  Worse, there is no simple or even documented
relation between the source file where something is defined, and the
header where it is declared.</p>

<p>There are some horrible kludges lurking here and there.  In places
we avoid prototyping things if we haven't seen necessary typedefs, for
example.  Some things are declared in several different headers, each
used by a disjoint subset of the source.  Odds are that some of those
duplicates don't match the definition.</p>

<p>Your goals for this project:</p>

<ol>
  <li><p>It should be possible to include any header without having to
  worry about what its dependencies are; i.e. all headers should
  explicitly pull in their dependencies, as the standard library
  headers do.</p>

  <p>As an exception, headers should not explicitly reference
  <code>config.h</code>, <code>system.h</code>, or
  <code>ansidecl.h</code>.  Nor should they reference any headers
  explicitly included by <code>system.h</code>, such as
  <code>stdio.h</code>.  They <em>should</em> reference other headers
  from libiberty or libc, where necessary.</p></li>

  <li><p>Each function, global declaration, or type definition should
  appear in exactly one header.  Forward declarations of structs and
  unions do not count.</p></li>

  <li><p>That one header should have an obvious relationship to the
  nature of the thing being declared.  It should never be necessary to
  grep the entire source tree to figure out which header you need.</p></li>

  <li><p>Each header should have the minimum possible number of
  references to other headers.  If a header describes ten routines,
  two of which require <code>rtl.h</code>, and the other eight are
  useful by themselves, then the header should be split so that they
  can be used without dragging in RTL.  Possibly the corresponding
  source file should be split to match.</p></li>
</ol>
</li>

<li>Disambiguate flags.

<p>Find all the places where one flag bit is used with several
different meanings depending what sort of tree or RTL it is in, and
give each different meaning a different accessor macro.  Augment the
tree/RTL checking macros so they verify that the accessors match the
data.</p>
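<p>The idea can be sketched with a toy node type.  The structure, the
node codes, and both accessor names below are invented for the
example; the real tree and RTL checking machinery is more elaborate
and reports failures differently.</p>

```c
#include <assert.h>

/* Simplified stand-in for a tree node.  */
enum node_code { FUNCTION_NODE, VARIABLE_NODE };

struct node
{
  enum node_code code;
  unsigned int flag_0 : 1;      /* One bit, two meanings.  */
};

/* Return NODE after verifying that it has code CODE.  */
static struct node *
node_check (struct node *node, enum node_code code)
{
  assert (node->code == code);
  return node;
}

/* On a function node the bit means "never returns"; on a variable
   node it means "thread-local".  Each meaning gets its own accessor,
   and each accessor checks that it is applied to the right kind of
   node.  */
#define FUNCTION_NORETURN_P(NODE) \
  (node_check ((NODE), FUNCTION_NODE)->flag_0)
#define VARIABLE_THREAD_LOCAL_P(NODE) \
  (node_check ((NODE), VARIABLE_NODE)->flag_0)
```

<p>Both accessors can be used to read or to set the bit, for instance
<code>FUNCTION_NORETURN_P (&amp;fn) = 1</code>.  Applying
<code>VARIABLE_THREAD_LOCAL_P</code> to a function node trips the
check instead of silently reading the wrong meaning.</p>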
</li>

<li>Get rid of all remaining uses of <code>bcopy</code>.

<p>To do this, you need to understand the surrounding code well enough
to tell whether they should be <code>memcpy</code> or
<code>memmove</code>.</p>
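<p>Two things change in each conversion: the source and destination
arguments swap places, and you have to decide whether the regions can
overlap.  <code>bcopy</code> copes with overlap, so
<code>memmove</code> is the safe choice whenever you cannot prove the
regions are disjoint.  The function names in this sketch are made up
for illustration.</p>

```c
#include <string.h>

/* Was: bcopy (src, dst, len), with regions known never to overlap.
   Note the swapped argument order.  */
void
copy_disjoint (char *dst, const char *src, size_t len)
{
  memcpy (dst, src, len);
}

/* Was: bcopy (buf + from, buf + to, len), where the two ranges may
   overlap, so memmove is required.  */
void
shift_within_buffer (char *buf, size_t from, size_t to, size_t len)
{
  memmove (buf + to, buf + from, len);
}
```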
</li>

<li>Rename routines used by the debugging information generators, so
they do not occupy the same namespace as routines intended to be used
when debugging the compiler.

<p>Currently, if you ask gdb for a list of all the functions whose
names begin with "<samp>debug_</samp>", you get a mixed bag of
data structure dumpers and debug-info generators:</p>

<pre>
(gdb) call debug_&lt;TAB&gt;&lt;TAB&gt;
debug_args                      debug_line_section_label
debug_bb                        debug_loop
debug_bb_n                      debug_loops
debug_binfo                     debug_name
debug_bitmap                    debug_no_type_hash
debug_bitmap_file               debug_print_page_list
debug_biv                       debug_ready_list
debug_call_placeholder_verbose  debug_real
debug_candidate                 debug_regions
debug_candidates                debug_regset
debug_define                    debug_reload
debug_dependencies              debug_reload_to_stream
debug_dwarf                     debug_rli
debug_dwarf_die                 debug_rtx
debug_end_source_file           debug_rtx_count
debug_flow_info                 debug_rtx_find
debug_giv                       debug_rtx_list
debug_ignore_block              debug_rtx_range
debug_info_level                debug_sbitmap
debug_info_section_label        debug_start_source_file
debug_info_type                 debug_stderr
debug_insn                      debug_tree
debug_iv_class                  debug_type_names.2
debug_ivs                       debug_undef
</pre>

<p>It is not at all obvious which is which.  Rename functions so that
everything which is useful from the debugger has a name starting with
<samp>debug_</samp>, and nothing else does.</p>
</li>
</ul>

<h2>Port cleanliness</h2>

<p>This involves mostly bringing back ends up to date with the current
state of the art in the machine-independent code.  Many ports date
back to the 1980s and have not been actively maintained since then.
There is also work to be done in cleaning up the places where the MI
code uses machine-specific macros.</p>

<p>In addition to understanding RTL, you need to read the <a
href="../onlinedocs/gcc_18.html">machine description</a> and <a
href="../onlinedocs/gcc_19.html">target macros</a> sections of the GCC
manual.</p>

<ul>
<li>Migrate default definitions of <code>tm.h</code> macros out of
random source files into <code>defaults.h</code>.

<p>It would be a lot more work, but we might consider including
<code>defaults.h</code> <em>first</em>, having it define everything
unconditionally, and then having each <code>tm.h</code>
<code>#undef</code> whatever it needs to override.</p>
</li>

<li>Remove commented-out definitions of macros and descriptions of
macros which ports do not use from all <code>tm.h</code> files.

<p>This is so that grepping for all the uses of a particular macro
will get no false positives.</p>
</li>

<li>Convert huge macros in each <code>tm.h</code> to functions in the
corresponding <code>tm.c</code>.

<p>This can be tricky when a huge macro is defined not by the general
<code>tm.h</code> for a processor, but the specific one for some
particular target triple.  The best known approach here is to set some
flag macros in the target-specific <code>tm.h</code>, then
<code>#ifdef</code> up the function in <code>tm.c</code>.  Better
ideas would be appreciated.</p>
</li>

<li>Adjust the definitions of porting macros to make the above easier.

<p>There are some macros that need a lengthy definition, and are
required to perform a <code>goto</code> to a label outside the macro
under certain conditions.  This makes moving all the logic into a
separate function difficult.  These macros should be replaced by
new macros which return a flag instead.  The goto then happens in the
code that uses the macro.</p>
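<p>A small sketch of the transformation, using an invented macro
rather than any real porting macro:</p>

```c
#include <stdbool.h>

/* Old style (hypothetical macro): the macro itself jumps to a label
   supplied by the caller, so its body cannot simply be moved into a
   function.  */
#define GO_IF_SMALL_OFFSET(X, WIN) \
  do { if ((X) >= -128 && (X) <= 127) goto WIN; } while (0)

/* New style: a predicate that only reports the answer.  Its body can
   be an ordinary function, however long it gets.  */
bool
small_offset_p (long x)
{
  return x >= -128 && x <= 127;
}

/* A user of the new style.  The goto, if one is still wanted, now
   lives in the code that uses the predicate.  */
int
classify_offset (long x)
{
  if (small_offset_p (x))
    goto win;
  return 0;

 win:
  return 1;
}
```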
</li>

<li>Convert configurations to the new style where <code>tm.h</code>
chunks do not include each other incestuously.

<p>Instead, <code>config.gcc</code> lists each chunk explicitly, in
order from least to most specific.</p>
<!-- XXX Can someone describe this better? -->
</li>

<li>Clean up <code>#ifdef</code> messes in <code>tm.h</code> chunks.

<p>The preferred style is: Chunks are used in order from least to most
specific.  Each chunk mentions only the macros it has specific
definitions for.  Each chunk <code>#undef</code>s any previous
definition first.  (Contrary to popular belief, it is always safe to
<code>#undef</code> a macro, whether or not it has already been
defined.)</p>
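<p>In other words, chunks can look like the sketch below, with no
<code>#ifdef</code> around the <code>#undef</code>.  The macro names
are invented for the example.</p>

```c
/* Generic chunk: provides a definition.  */
#define EXAMPLE_WORD_SIZE 4

/* More specific chunk, used later: #undef first, then define.  */
#undef EXAMPLE_WORD_SIZE
#define EXAMPLE_WORD_SIZE 8

/* A chunk may also #undef a macro that no earlier chunk defined;
   the directive is simply ignored in that case.  */
#undef EXAMPLE_NEVER_DEFINED
#define EXAMPLE_NEVER_DEFINED 1
```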
</li>

<li>Make porting macros testable at runtime.

<p>We'd like to be able to change more of the compiler's behavior at
runtime using <samp>-m</samp> switches.  To do this, regions of code
that presently read</p>

<pre>
     #ifdef MACRO
       ... code ...
     #endif
</pre>

<p>must become instead</p>

<pre>
     #ifdef MACRO
       if (MACRO)  
         ... code ...
     #endif
</pre>

<p>If possible (this depends on which macro it is) a third form is
even better: in <code>defaults.h</code></p>

<pre>
	#ifndef MACRO
	#define MACRO 0
	#endif
</pre>

<p>and then the users become simply</p>

<pre>
	  if (MACRO)
	    ... code ...
</pre>

<p>This style subjects more code to compile-time checking, so bit-rot
in obscure target-specific features is more likely to be noticed.</p>
</li>

<li>Reverse the sense of <code>TARGET_MEM_FUNCTIONS</code>.

<p>This macro controls which set of bulk memory operation routines are
used internally by the compiler.  The ISO C standard provides
<code>memcpy</code>, <code>memcmp</code>, and <code>memset</code>;
older BSD-derived systems have instead <code>bcopy</code>,
<code>bcmp</code>, and <code>bzero</code> (which can only set memory
to all-bits-zero).  The default in the compiler is to use the BSD
functions, but these days a target is much more likely to have the ISO
C ones.</p>

<p>Make the default be to use the standard functions, requiring the
definition of <code>TARGET_BSD_MEM_FUNCTIONS</code> to use the others.
This requires careful checking that each target keeps the same
behavior.</p>
</li>

<li>Convert text peepholes to RTL peepholes.

<p>GCC has two forms of peephole optimization: the old style that
edited the text assembly output as it was being generated, and the new
style that transforms RTL to RTL.  The new form is conceptually
cleaner and requires less gunk in the implementation.</p>

<p>The targets with text peepholes are:</p>
<pre>
  1750a arm avr c4x dsp16xx fr30 i860 i960 m32r m68hc11 m68k
  mcore mips mn10200 mn10300 ns32k pa romp rs6000 sh sparc.
</pre>
</li>

<li>Convert text prologue/epilogue generation to use expanders
instead.

<p>As with peepholes, there is an old style and a new.  The old style
uses the <code>FUNCTION_PROLOGUE</code> and <code>FUNCTION_EPILOGUE</code> 
macros, which insert text directly into the output.  The new style
uses the <code>prologue</code> and <code>epilogue</code> named
expanders to generate RTL.</p>

<p>The situation here is a bit weird.  Targets which only have
<code>FUNCTION_PROLOGUE/EPILOGUE</code> in <code>tm.h</code> are:</p>
<pre>
  1750a a29k arc avr clipper dsp16xx elxsi h8300 i370 i860 i960
  m68k ns32k pdp11 romp vax
</pre>
<p>Targets which only have <code>prologue</code> and <code>epilogue</code>
named expanders are:</p>
<pre>
  alpha c4x fr30 m68hc11 mcore mn10200 mn10300 pj sh
</pre>
<p>Targets which have <em>both</em> are:</p>
<pre>
  arm convex d30v i386 ia64 m32r m88k mips pa rs6000 sparc
</pre>
<p>I'd suggest starting with the targets that have both.</p>
</li>

<li>Find magic numbers in <samp>.md</samp> files and make them use
<code>define_constants</code> instead.

<p><code>define_constants</code> is brand new, so few targets know
about it.  It is most useful for things like fixed register numbers.
Constants defined with it are also visible to C code via the
<code>insn-codes.h</code> header.</p>
</li>

<li>Correct all warnings and errors emitted by <code>gen*.c</code> in
the course of a bootstrap.

<p>This may require pretty detailed knowledge of the way machine
definition files are supposed to be written, unfortunately.  For the
more exotic targets, you can usually start by building a
cross-compiler from whatever you have to &lt;processor&gt;-unknown-none.  It
doesn't have to <em>work</em>, just build far enough to run the MD
generators.</p>
</li>

<li>Remove all ad hoc <code>__attribute__</code> parsers.

<p>Some machine-specific attributes use their own personal routines to
detect both the unqualified and the underscore-surrounded forms of the
attribute name.  All of them should be changed to use
<code>is_attribute_p</code> instead.</p>

<p>Consider also making the adjustments described in the comment above
the definition of <code>is_attribute_p</code>: caller is required to
state the unqualified form of the name, not the underscored form; all
internal attribute lists remember the unqualified form, no matter what
was used in the code.</p>
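<p>To show what the ad hoc parsers are reimplementing, here is a
string-only sketch of the matching rule.  It is not
<code>is_attribute_p</code> itself, which works on GCC's own
identifier representation; the function name and signature below are
assumptions made so the example stands alone.</p>

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in: return true if SPELLING, as written in the
   source, is either NAME or __NAME__, where NAME is the unqualified
   form of the attribute name.  */
bool
attribute_name_matches (const char *name, const char *spelling)
{
  size_t name_len = strlen (name);
  size_t spelling_len = strlen (spelling);

  /* Unqualified form: an exact match.  */
  if (spelling_len == name_len)
    return memcmp (name, spelling, name_len) == 0;

  /* Underscore-surrounded form: two underscores on each side.  */
  return (spelling_len == name_len + 4
          && spelling[0] == '_' && spelling[1] == '_'
          && memcmp (spelling + 2, name, name_len) == 0
          && spelling[spelling_len - 2] == '_'
          && spelling[spelling_len - 1] == '_');
}
```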
</li>

<li>Convert md files that use <code>(cc0)</code> so they don't anymore.

<p>This is hard, but would be a great improvement to the compiler if
it were done for all existing targets.  The basic idea is that</p>

<pre>
(insn ### {cmpsi} (set (cc0) (compare (reg:SI A) (reg:SI B))))
(insn ### {bgt} (set (pc) (if_then_else
                        (gt (cc0) (const_int 0))
                        (label_ref 23)
                        (pc))))
</pre>

<p>becomes</p>

<pre>
(insn ### {bsicc} (set (pc) (if_then_else
                        (gt:SI (reg:SI A) (reg:SI B))
                        (label_ref 23)
                        (pc))))
</pre>

<p>Unfortunately, the technique is very poorly documented and may need
extending to other conditional operations (setcc, movcc) as well.
Contact Bernd Schmidt &lt;<a href="mailto:bernds@redhat.com">bernds@redhat.com</a>&gt;
before beginning any work on this.</p>
</li>

<li>Find hooks in the machine-independent code which aren't used by
any target anymore, and remove them.

<p>Right now there probably aren't too many of these, but there will
be once some of the above projects get rolling.</p>
</li>
</ul>

<h2>Configuration and Makefiles</h2>

<p>This largely consists of the same sort of thing as the above, but
for per-host configuration instead of per-target.  You will need to
understand autoconf, or Make, to do these projects.</p>

<ul>

<li>Find places that are still using obsolete system-category macros
(<code>USG</code>, <code>POSIX</code>, etc) and autoconfiscate them.

<p><code>tsystem.h</code> uses <code>USG</code> and a couple others to
know if it can safely include <code>string.h</code> and
<code>time.h</code>.  As both of them are required by C99, we should
just synthesize them and include them unconditionally.  (fixproto
already does this for <code>stdlib.h</code> and several others.)</p>

<p>The real mess is in the debug info generators.</p>
</li>

<li>Get rid of build-make and cross-make.

<p>These do things that are properly autoconf's job.</p>
</li>

<li>Fix the Makefile so it doesn't confuse the build and host systems
anymore.

<p>This should be search-and-replace, but you need to understand the
distinction.  GCC needs to know about the machine it is being
<dfn>built</dfn> on, the machine it will <dfn>run</dfn> on, and the
machine it will <dfn>generate code</dfn> for.  In a normal "native"
build, these are all the same.  A generic cross-compiler has a
different target than its host, but the build machine is the same as
the host.  And in a "Canadian cross" build, they are all different.</p>

<p>Autoconf knows about this sort of thing.  It calls the three
machines the <dfn>build</dfn>, <dfn>host</dfn>, and <dfn>target</dfn>
respectively.  GCC's Makefile also knows about this, but it
pervasively refers to the build machine as the host.  This is
confusing.  The Makefile should be changed to match Autoconf's
convention.</p>
</li>

<li>Clean up the configure script.

<p>The horrible tests for assembler features particularly need to die,
but there are plenty of other atrocities.  If you want a relatively
easy one, find all the places that use <samp>test -a</samp> or
<samp>-o</samp>, and make them use <samp>&amp;&amp;</samp> or
<samp>||</samp> instead.</p>

<p>Check out the prereleases of autoconf 2.50 and see if they will
help any.  Odds are they will.  If they have broken something we
depend on, let the autoconf maintainers know.</p>

<p>Feed back gcc-private autoconf macros to the autoconf maintainers.
We have several that would be widely useful, such as
<code>GCC_NEED_DECLARATIONS</code> and the <code>mmap</code> tests.</p>
</li>

<li>Run fixincludes and fixproto on all targets.  Eliminate the
exotic fixincludes scripts used on some targets.

<p>We want all targets' headers to be handled the same way.  The
existing practice causes hard-to-find bugs which only manifest on
platforms that are unpopular, so they never get fixed.</p>
</li>

<li>Get as much as possible out of the <code>t-<var>target</var></code>
Makefile fragments.

<p>It's unlikely that these can be eliminated entirely, since we have
no way of testing the features of a <var>target</var> when we are still
constructing its cross-compiler.  However, there is a lot of obsolete
cruft in them.  Start by expunging all remaining traces of
libgcc1.</p>

<p>There are also things in there that should be handled by
fixincludes and fixproto, such as INSTALL_ASSERT_H and the corresponding
Makefile magic.</p>

<p>Note that targets do not need to supply a
<code>t-<var>target</var></code> fragment, if it has nothing to do.
Empty fragments can be deleted and all references to them nuked from
<code>config.gcc</code>.</p>
</li>

<li>Get as much out of the <code>x-<var>host</var></code> fragments and
<code>xm-<var>host</var>.h</code> headers into autoconf tests,
<code>system.h</code>, etc., as possible.

<p>I am fairly sure that all of these files can be eliminated
completely, and their infrastructure done away with.  Information in
them is in six categories:</p>

<ol>
  <li><p>Historical dead wood: definitions of macros or Make variables
      that are no longer used for anything, definitions that are
      invariably overridden by something else, etc.  Some files contain
      only comments!</p></li>

  <li><p>Things that belong in <code>system.h</code> or
      <code>ansidecl.h</code>, such as definitions of
      <code>TRUE</code>.</p></li>

  <li><p>Things that belong in a <code>tm.h</code> or
      <code>t-<var>target</var></code> file.  E.g. <code>x-linux</code>
      has no business saying not to run fixproto,
      <code>xm-interix.h</code> has no business specifying how to run
      global constructors.</p></li>

  <li><p>System category assertions.  These should be replaced by feature
      checks, but that requires work in the machine-independent code
      first.</p></li>

  <li><p>Feature assertions, which should be replaced by autoconf
      probes.  Some of these are there because, at the time they were
      written, autoconf couldn't detect whatever it was.  Others are
      there because the autoconf test for the feature in question
      breaks in the presence of a buggy host compiler and/or library.
      Note that all the autoconf tests have to work when the compiler
      is itself being cross-compiled (with exceptions when we can do
      graceful degradation, e.g. the mmap tests).</p>

      <p>In principle there is no reason why all of the feature
      assertions can't be replaced by autoconf probes, with sufficient
      cleverness.  The hardest ones will probably be
      <code>{SUCCESS,FATAL}_EXIT_CODE</code>.  Note that autoconf 2.50
      has sufficient tricks up its sleeve to do
      <code>HOST_BITS_PER_*</code> even when cross compiling.</p></li>

  <li><p>Information on how to deal with file systems which are not
      Unix-y.  For instance, definitions of
      <code>PATH_SEPARATOR(_2)</code> and/or
      <code>HAVE_DOS_BASED_FILE_SYSTEM</code>, a complete override of
      <code>INCLUDE_DEFAULTS</code> for VMS, etc.</p>

      <p>This stuff is harder to deal with than the others.  For DOS,
      we could restructure the machine-independent code so there was
      just one switch, namely <code>HAVE_DOS_BASED_FILE_SYSTEM</code>,
      and autoconf could set that based on the host machine name.  We
      probably want to go in that direction anyway.  See "Library
      infrastructure," below.</p>

      <p>I don't know what to do about VMS.  It is utterly different,
      although I'm told the system libraries mask a lot of the
      differences these days.  I would be very surprised if GCC
      actually builds on <samp>{alpha,vax}-dec-*vms*</samp> right now.</p></li>
</ol>
</li>

<li>Move the bootstrap logic up to the top level Makefile.  Cause
libiberty to be 3-staged as well as the gcc directory.  Cause a blind
"make" from the top level to do the Right Thing for native as well as
cross compiles.

<p>This may be too big for anyone other than a Make expert to attempt,
but if done it would be immensely useful.</p>
</li>

<li>Autoconfiscate the top level of the directory tree.

<p>The top level is handled by a strange beast known as "Cygnus
configure," which is understood by very few people.  If it were
replaced by an autoconf script it would be much easier to work
with.</p>

<p>Unfortunately, you will need guru-level Make and shell script
skills to even attempt this.</p>
</li>
</ul>

<h2>Library infrastructure</h2>

<p>These tasks are about improving the utility routine library used by
GCC.  If you like data structures, these may be for you.</p>

<ul>
<li>Find private implementations of general data structures, and make them
use library routines instead.

<p>For example, there are hand-rolled hash tables all over the place.
Most of them should be using libiberty's <code>hashtab.c</code>
instead.  However, there are at least three places where we
deliberately use custom code for performance reasons, so be careful.</p>
</li>

<li>Write nifty pseudo-template versions of existing general data
structures to avoid abstraction penalties.

<p>This is for someone who likes working with preprocessor macros, and
can use them cleverly but still readably.  Start with
<code>hashtab.c</code> and <code>splay-tree.c</code> (both in
libiberty).</p>

<p>Once this is done, we can stop avoiding the general code in
performance-critical areas.</p>
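<p>As a rough illustration of the idea, here is a minimal sketch of a
macro that stamps out a typed hash set.  It is not derived from
<code>hashtab.c</code>; the names <code>DEFINE_HASH_SET</code>,
<code>INT_HASH</code>, <code>INT_EQ</code> and <code>int_set</code>
are all hypothetical.  The point is only that the hash and equality
operations are expanded inline rather than called through function
pointers.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pseudo-template: expands to a fixed-size, open-addressed
   hash set called NAME that stores elements of type TYPE.  HASH and EQ
   are macros (or inline functions), so no indirect calls are made.
   SLOTS must be a power of two.  There is no deletion and no resizing;
   this is a sketch, not a replacement for a real table.  */
#define DEFINE_HASH_SET(NAME, TYPE, HASH, EQ, SLOTS)                      \
  struct NAME { TYPE slot[SLOTS]; unsigned char used[SLOTS]; };           \
                                                                          \
  static void NAME##_init (struct NAME *t)                                \
  {                                                                       \
    memset (t, 0, sizeof *t);                                             \
  }                                                                       \
                                                                          \
  static long NAME##_find_slot (const struct NAME *t, TYPE key)           \
  {                                                                       \
    size_t i = (size_t) HASH (key) & ((SLOTS) - 1);                       \
    size_t probes;                                                        \
    for (probes = 0; probes < (SLOTS); probes++)                          \
      {                                                                   \
        if (!t->used[i] || EQ (t->slot[i], key))                          \
          return (long) i;                                                \
        i = (i + 1) & ((SLOTS) - 1);                                      \
      }                                                                   \
    return -1;                                                            \
  }                                                                       \
                                                                          \
  static int NAME##_insert (struct NAME *t, TYPE key)                     \
  {                                                                       \
    long i = NAME##_find_slot (t, key);                                   \
    if (i < 0)                                                            \
      return 0;                                                           \
    t->slot[i] = key;                                                     \
    t->used[i] = 1;                                                       \
    return 1;                                                             \
  }                                                                       \
                                                                          \
  static int NAME##_contains (const struct NAME *t, TYPE key)             \
  {                                                                       \
    long i = NAME##_find_slot (t, key);                                   \
    return i >= 0 && t->used[i];                                          \
  }

/* One instantiation: a set of ints with a multiplicative hash.  */
#define INT_HASH(x) ((unsigned int) (x) * 2654435761u)
#define INT_EQ(a, b) ((a) == (b))
DEFINE_HASH_SET (int_set, int, INT_HASH, INT_EQ, 64)
```

<p>A caller would declare a <code>struct int_set</code>, call
<code>int_set_init</code> on it, and then use
<code>int_set_insert</code> and <code>int_set_contains</code>.  A
real version would need to stay readable as it grows, which is the
hard part of this task.</p>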
</li>

<li>Generalize gcc-specific data structure modules and move them to
libiberty.

<p>For example: <code>[s]bitmap.c</code>, <code>lists.c</code>,
<code>stringpool.c</code>.</p>
</li>

<li>Find private workarounds for host bugs and move them to libiberty.

<p>These tend to be hiding in odd places like the config directory, or
else woven through important areas of code, e.g. the garbage
collector.</p>
</li>

<li>Extract all the code that manipulates pathnames, make sure it can
handle DOS as well as Unix style paths, and move it to libiberty.

<p>Candidates include <code>prefix.c</code>, <code>simplify_pathname</code> in
<code>cppfiles.c</code>, and so on.  Also, make all the DOS handling
conditional only on <code>HAVE_DOS_BASED_FILE_SYSTEM</code>, and get
rid of the <code>PATH_SEPARATOR</code> macros.</p>
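<p>To make the "one switch" idea concrete, here is a minimal sketch
of pathname helpers that depend only on
<code>HAVE_DOS_BASED_FILE_SYSTEM</code>.  The helper names
(<code>DIR_SEPARATOR_P</code>, <code>path_is_absolute</code>,
<code>path_basename</code>) are hypothetical and are not taken from
the existing sources; a real version would also have to decide how
to treat drive-relative paths.</p>

```c
#include <ctype.h>
#include <stddef.h>

/* HAVE_DOS_BASED_FILE_SYSTEM is the single switch; in a real build it
   would be set by configure.  Everything else here is hypothetical.  */
#ifdef HAVE_DOS_BASED_FILE_SYSTEM
# define DIR_SEPARATOR_P(c) ((c) == '/' || (c) == '\\')
#else
# define DIR_SEPARATOR_P(c) ((c) == '/')
#endif

/* Return nonzero if PATH starts at a root directory.  */
static int
path_is_absolute (const char *path)
{
#ifdef HAVE_DOS_BASED_FILE_SYSTEM
  /* Skip a drive-letter prefix such as "C:" before looking for the root.  */
  if (isalpha ((unsigned char) path[0]) && path[1] == ':')
    path += 2;
#endif
  return DIR_SEPARATOR_P (path[0]);
}

/* Return a pointer to the final component of PATH.  */
static const char *
path_basename (const char *path)
{
  const char *base = path;
  const char *p;

#ifdef HAVE_DOS_BASED_FILE_SYSTEM
  /* A drive-letter prefix is never part of the base name.  */
  if (isalpha ((unsigned char) path[0]) && path[1] == ':')
    base = path + 2;
#endif
  for (p = base; *p; p++)
    if (DIR_SEPARATOR_P (*p))
      base = p + 1;
  return base;
}
```

<p>Callers never test for a backslash or a drive letter themselves,
so the rest of the compiler needs no DOS-specific conditionals.</p>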
</li>

<li>Get libiberty built for the build system, so we can get rid of all
kinds of cruft in the Makefile and the programs that run there.

<p>This is for someone with serious Make skills.  Talk to Kaveh
R. Ghazi &lt;<a
href="mailto:ghazi@caip.rutgers.edu">ghazi@caip.rutgers.edu</a>&gt;
first.  He says:</p>
<blockquote>
<div>Whoever wants to do this needs to be prepared to do a Canadian cross
compile as a test of whatever final patch is installed.  I have a
patch you can use as a starting point, I'd love for someone to step
forward and do this.</div>
</blockquote>
</li>

<li>Implement a macro preprocessor for <samp>.md</samp> files.

<p>It should act like the macro processor for <a
href="http://sources.redhat.com/cgen/">CGEN</a>, which also uses
RTL-ish definition files.  You can start with conditional blocks and
include files.  Remember that we already have define_constants.</p>

<p>You probably want to rip the RTL reader out of <code>rtl.c</code>
before it gets too big.  It does not have to be part of the final
compiler, only the programs that read the <samp>.md</samp> file.</p>
</li>
</ul>

<h2>Documentation</h2>

<ul>
<li>Document every RTX code and accessor macro thoroughly.</li>
<li>Ditto, every meaningful insn name.</li>
<li>Ditto, every tm.h macro.</li>
<li>Ditto, every command line switch.

<p>These may involve hunting down whoever added whichever thing it is
and torturing information out of them.</p></li>

<li>Update the porting manual.

<p>The porting manual describes what used to be the proper way to
write a GCC back end.  It is several years out of date.  Find all the
out-of-date advice for porters and replace it with correct advice.
Mark old, deprecated features as such.</p></li>

<li>Finish documenting the tree structure and the front-end interface.

<p>We've got quite a bit of this but it is scattered all over the
place.  It belongs in the official manual.  There is a <a
href="../onlinedocs/c-tree_toc.html">C/C++ specific manual</a>, and a
<a href="http://www.ncsa.uiuc.edu/~wendling/tree.html">third party,
general manual</a>.  Both of them are incomplete.  Several people have
written partial manuals on implementing new front ends: look at <a
href="http://members.wri.com/johnnyb/compilers/">The GNU Compiler
Writer's Jump Point</a> and our <a
href="../readings.html">readings list</a>.</p>

<p>Michael Dupont &lt;<a
href="mailto:michael.dupont@mciworldcom.de">michael.dupont@mciworldcom.de</a>&gt;
is working on this, so contact him first.</p></li>

<li>Roll information in external documents into the official manual.

<p>Start with the <a href="../readings.html">readings list</a> and the
secondary Texinfo documents in the source tree.</p></li>

<li>Improve user and installation documentation.

<p>Pick your favorite FAQ from the lists and roll it into the manual.
Add information on relevant standards.  Document the exact semantics
of all the extensions.  Also say what they're good for.  If they're
useless, admit it.</p></li>

<li>Read the whole manual.

<p>Become familiar with what's documented where and report or fix any
problems you see.  Then shout at anyone who sends a patch to <a
href="../ml/gcc-patches/">gcc-patches</a> without including all
relevant documentation changes.</p></li>

<li>Install all Texinfo manuals.

<p>Examples are <code>objc-features.texi</code> and
<code>libstdc++-v3/porting.texi</code> (not yet done).  You will have
to adapt the configure checks for available makeinfo to use outside the gcc
subdirectory.</p></li>

<li>Adapt the send-pr manpage in GNATS to a manpage for gccbug.

<p>Or document gccbug in the official manual, then use
<code>texi2pod</code> and <code>pod2man</code> to make a manpage out
of that.  See what's presently done for the <code>cpp</code> and
<code>gcc</code> manpages.</p></li>

<li>Give all commands a manpage.

<p>This is best done by documenting them in the Texinfo manual, then
generating the manpages via <code>texi2pod</code> etc.  That way we
only have to remember to update the documentation in one place.</p>
</li>
</ul>

<h2>User interface</h2>

<ul>

<li>Implement <samp>-std</samp> for the C++ front end.  (Also Fortran,
Java?)</li>
<li>Add a <samp>-std</samp> value equivalent to
<samp>-traditional</samp> to the C front end.</li>

<li>Fix the places where <samp>-std=c89</samp> is not the same thing
as <samp>-ansi</samp>.</li>

<li>More broadly, make more and more flags consistent across all the
front ends.</li>

<li>Implement a <samp>-Wstd</samp> switch that turns on all warning
flags useful in well-written standard-compliant code (for C,
<samp>-Wstrict-prototypes -Wmissing-prototypes
-Wwrite-strings</samp>).  (Should this imply <samp>-Wall</samp>?
<samp>-W</samp>?)</li>

<li>Give <samp>-W</samp> a better name, such as <samp>-Wextra</samp>.</li>

<li>Implement fine-grained warning control, e.g. disabling a specific
warning by name.</li>

<li>Teach collect2 to recognize when an object module requires a
specific runtime support library and link it in automatically.

<p>That is, if the first linker invocation spits out undefined
symbols, see if they are from libstdc++, libf2c, etc. and throw in the
appropriate library on the second pass.  This would pretty much
eliminate the need for language specific drivers.</p>

<p>It would be neat if it would recognize when libm was necessary,
too.  (No more "where's <code>sqrt(3)</code>?" bug reports!)</p>
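<p>The lookup step might be sketched as below.  This is not how
collect2 works today; the table, the function name
<code>library_for_symbol</code>, the symbol patterns, and the linker
option spellings are all illustrative assumptions.  A real
implementation would need a far more reliable way to map symbols to
libraries.</p>

```c
#include <stddef.h>
#include <string.h>

/* One hypothetical hint: if an undefined symbol matches NAME (exactly,
   or as a prefix when EXACT is zero), suggest linking with OPTION.  */
struct lib_hint
{
  const char *name;
  int exact;
  const char *option;
};

/* Illustrative entries only; the option spellings are assumptions.  */
static const struct lib_hint hints[] =
{
  { "sqrt",   1, "-lm" },
  { "cos",    1, "-lm" },
  { "_ZSt",   0, "-lstdc++" },  /* assumed prefix of mangled std:: names */
  { "s_wsfe", 1, "-lf2c" }      /* assumed Fortran runtime entry point */
};

/* Return the linker option to add for SYMBOL, or NULL if no hint
   applies and the undefined symbol should simply be reported.  */
static const char *
library_for_symbol (const char *symbol)
{
  size_t i;

  for (i = 0; i < sizeof hints / sizeof hints[0]; i++)
    {
      const struct lib_hint *h = &hints[i];
      int match = h->exact
                  ? strcmp (symbol, h->name) == 0
                  : strncmp (symbol, h->name, strlen (h->name)) == 0;
      if (match)
        return h->option;
    }
  return NULL;
}
```

<p>The second linker pass would add each distinct option returned
for the undefined symbols reported by the first pass.</p>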
</li>
</ul>

<h2>Optimizer improvements</h2>

<p>These require some knowledge of compiler internals and substantial
programming skills, but not detailed knowledge of GCC internals.
I think.</p>

<ul>
<li>Make <code>insn-recog.c</code> use a byte-coded DFA.

<p>Richard Henderson and I started this back in 1999 but never
finished.  I may still be able to find the code.  It produces an order
of magnitude size reduction in <code>insn-recog.o</code>, which is
huge (432KB on i386).</p>
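<p>For readers unfamiliar with the technique, here is a toy sketch
of a byte-coded DFA.  It is unrelated to the unfinished 1999 code and
to the real <code>insn-recog.c</code>; the symbol set, the table
encoding, and the name <code>toy_recog</code> are assumptions made up
for illustration.  Each table entry is one byte: zero rejects, a
value with the high bit set accepts with an insn code, and anything
else is the next state.</p>

```c
#include <stddef.h>

/* Toy input alphabet: a pattern flattened into prefix order.  */
enum toy_sym { SYM_SET, SYM_REG, SYM_PLUS, SYM_CONST, NSYMS };

/* Encode "accept with this insn code" in a single byte.  */
#define ACCEPT(code) ((unsigned char) (0x80 | (code)))

/* Columns are SET, REG, PLUS, CONST.  State 0 is the start state and
   is never a transition target, so 0 can double as "reject".  */
static const unsigned char dfa[5][NSYMS] =
{
  /* state 0 */ { 1, 0,          0, 0          },
  /* state 1 */ { 0, 2,          0, 0          },
  /* state 2 */ { 0, ACCEPT (1), 3, 0          },  /* (set reg reg) */
  /* state 3 */ { 0, 4,          0, 0          },
  /* state 4 */ { 0, ACCEPT (2), 0, ACCEPT (3) }   /* plus reg reg/const */
};

/* Return the insn code for the N symbols in SYMS, or -1 if they do
   not form exactly one recognized pattern.  */
static int
toy_recog (const unsigned char *syms, size_t n)
{
  unsigned int state = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      unsigned char entry;

      if (syms[i] >= NSYMS)
        return -1;
      entry = dfa[state][syms[i]];
      if (entry == 0)
        return -1;
      if (entry & 0x80)
        /* Accept only if the whole input has been consumed.  */
        return i + 1 == n ? (entry & 0x7f) : -1;
      state = entry;
    }
  return -1;  /* Input ended before reaching an accepting entry.  */
}
```

<p>The size win comes from replacing generated code with a compact
table plus one small interpreter loop; a real recognizer would also
have to handle operand predicates and conditions, which this sketch
ignores.</p>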
</li>

<li>Make GCSE (and CSE?) capable of digging inside PARALLELs.

<p>This is needed for GCSE to do any good at all on i386.</p>

<p>Here's some dialogue on the subject, which unfortunately may only
confuse you.</p>

<blockquote>
<div>Michael Meissner:</div>
<div style="border-left: solid blue; padding-left: 4pt">
Actually I would imagine gcse handles clobbers [inside parallels] just
fine and dandy, since it uses <code>single_set</code> which strips off
the clobbers/uses if there is only one set.  What it doesn't handle is
a parallel that has two sets, which on the x86 is for setting the
condition code register.  This probably applies to more phases than
just gcse (look for <code>single_set</code>).  Another place a
parallel with 2 sets is used is for machines that do both the divide
and modulus in one step.</div>
</blockquote>

<blockquote>
<div>Richard Henderson:</div>
<div style="border-left: solid blue; padding-left: 4pt">
Those don't get created until combine.
<p>No, the real problem is that gcse doesn't handle hard registers,
so the clobber of hard register 17 (flags) squelches everything.</p>
</div>
</blockquote>

<blockquote>
<div>Daniel Berlin:</div>
<div style="border-left: solid blue; padding-left: 4pt">
The comment above hash_scan_insn claims it doesn't handle clobbers in
parallels, yet the code appears to.
</div>
</blockquote>
</li>

<li>Teach the combiner to delete no-op moves it generates.

<p>This includes unrecognizable no-op moves.  You can get things like
<samp>(set&nbsp;(cc0)&nbsp;(cc0))</samp>, or
<samp>(set&nbsp;(pc)&nbsp;(pc))</samp>.  Neither of these is a valid
insn, but throwing them out would win.  See the <a
href="http://gcc.gnu.org/ml/gcc-patches/2000-07/msg00580.html">discussion</a>
on gcc-patches last year.</p>
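<p>The check itself can be purely structural, which is why it works
even on unrecognizable patterns.  Here is a minimal sketch using a
toy expression type; <code>struct toy_rtx</code> and the functions
below are hypothetical stand-ins, not GCC's real RTL
representation.</p>

```c
#include <stddef.h>

/* A toy stand-in for RTL, just rich enough for the example.  */
enum toy_code { TOY_REG, TOY_PC, TOY_CC0, TOY_SET };

struct toy_rtx
{
  enum toy_code code;
  int regno;                   /* meaningful for TOY_REG only */
  const struct toy_rtx *dest;  /* meaningful for TOY_SET only */
  const struct toy_rtx *src;   /* meaningful for TOY_SET only */
};

/* Return nonzero if A and B are structurally identical.  */
static int
toy_rtx_equal (const struct toy_rtx *a, const struct toy_rtx *b)
{
  if (a->code != b->code)
    return 0;
  switch (a->code)
    {
    case TOY_REG:
      return a->regno == b->regno;
    case TOY_PC:
    case TOY_CC0:
      return 1;
    case TOY_SET:
      return (toy_rtx_equal (a->dest, b->dest)
              && toy_rtx_equal (a->src, b->src));
    }
  return 0;
}

/* Return nonzero if X is a set whose source equals its destination.
   Note that this never asks whether X matches an insn pattern.  */
static int
toy_noop_move_p (const struct toy_rtx *x)
{
  return x->code == TOY_SET && toy_rtx_equal (x->dest, x->src);
}
```

<p>With this shape, a set of <code>pc</code> to <code>pc</code> is
flagged for deletion while a move between two different registers is
left alone.</p>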
</li>

<li>Find all the places that simplify RTL and make them use
<code>simplify-rtx.c</code>.

<p>Here is some commentary from there:</p>
<blockquote>
<p>Right now GCC has three (yes, three) major bodies of RTL simplification
code that need to be unified.</p>
<ol>
<li><code>fold_rtx</code> in <code>cse.c</code>.  This code uses
various CSE specific information to aid in RTL simplification.</li>
<li><code>combine_simplify_rtx</code> in <code>combine.c</code>.
Similar to <code>fold_rtx</code>, except that it uses combine specific
information to aid in RTL simplification.</li>
<li>The routines in this file.</li>
</ol>

<p>Long term we want to only have one body of simplification code; to
get to that state I recommend the following steps:</p>
<ol>
<li>Pore over fold_rtx and simplify_rtx and move any simplifications
which are not pass dependent state into these routines.</li>
<li>As code is moved by #1, change <code>fold_rtx</code> and
<code>simplify_rtx</code> to use this routine whenever possible.</li>
<li>Allow for pass dependent state to be provided to these routines
and add simplifications based on the pass dependent state.  Remove
code from <code>cse.c</code> and <code>combine.c</code> that becomes
redundant/dead.</li>
</ol>

<p>It will take time, but ultimately the compiler will be easier to
maintain and improve.  It's totally silly that when we add a
simplification that it needs to be added to four places (three for RTL
simplification and one for tree simplification).</p>
</blockquote>
</li>

<li>Convert <code>reorg.c</code> to use the flow graph.

<p>Then we can throw away <code>resource.c</code>. Long term we want
reorg folded into the scheduler, but that's much harder.</p></li>

<li>Improve <code>dwarf2out.c</code>.

<p>DWARF2 can handle all kinds of heavy optimizations that we'd like
to do, but our generator doesn't know how just yet.  At the very least
it'd be nice if <samp>-gdwarf-2 -fomit-frame-pointer</samp> could give
you a clean backtrace on all targets where DWARF works.  (This is
definitely possible.)</p>

<p>You need to coordinate with the gdb team.  It does no good for gcc
to generate fancy debug info if the debugger doesn't understand
it.</p>
</li>
</ul>

<h2>C/C++ front end</h2>

<ul>
<li>Clean up <code>special_function_p</code> and other handling of
functions with names implying given properties.

<p>All properties <code>special_function_p</code> determines ought to
be specifiable with attributes as well.  Where
<code>special_function_p</code> checks for a function not defined by
ISO C, the attribute ought to be added by fixincludes rather than
presuming anything about its semantics within the compiler.  All this
special handling should be disabled by <samp>-ffreestanding</samp>.</p>

<p>There should be a unified way of attaching attributes to functions
with known semantics when they are declared (explicitly or
implicitly), which should also be used for "builtins" such as exit and
abort that only exist for this purpose.</p></li>

<li>Move all flags shared between C and C++ to <code>c-common.[ch]</code>.

<p>(Make sure that the flags in question are genuinely used in both
front ends. When I [Joseph Myers] started looking at this, the first
case I found was that <samp>-fcond-mismatch</samp> was ignored for C++ - so I
documented that instead.)</p></li>

<li>More generally, share more code between the C and C++ front ends.

<p>For instance, the tree-based inliner should be common to both.</p></li>

</ul>

<h2>Web page work</h2>

<ul>
<li>Set up a system that automatically checks the mirrors list.

<p>It should detect mirrors that have gone away, are persistently
down, or are very out of date (the last being easy to do for those
carrying snapshots, harder for those with releases only).</p>

<p>DJ Delorie &lt;<a href="mailto:dj@redhat.com">dj@redhat.com</a>&gt;
has some scripts to do this already.</p></li>

<li>Extend this to check for other broken links.

<p>Also check for links which lead to a permanent HTTP redirect or to a
"this page has moved" page.</p></li>

</ul>

</body>
</html>



