This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: "Documentation by paper"
- From: law at redhat dot com
- To: Mark Mitchell <mark at codesourcery dot com>
- Cc: Richard Kenner <kenner at vlsi1 dot ultra dot nyu dot edu>, gcc at gcc dot gnu dot org
- Date: Tue, 03 Feb 2004 01:08:02 -0700
- Subject: Re: "Documentation by paper"
- Reply-to: law at redhat dot com
In message <401AE59C.9080809@codesourcery.com>, Mark Mitchell writes:
>Richard Kenner wrote:
>
>>I've been noticing for a while that there are an increasing number of files
>>in GCC where the only overview documentation is a reference to a paper or
>>textbook.
>>
>>I think this is totally unacceptable documentation and that we need to have a
>>policy about this sort of documentation.
>>
>>
>Very little discussion in the long ensuing thread seems to relate back
>to this key point from Kenner's original email. Independent of the pros
>and cons of Doxygen and its ilk, let's agree that documentation has to
>be present in the GCC source tree for the algorithms that are in use in
>the compiler. If you write new code, it's good to reference papers that
>inspired it, but that's no substitute for good comments on the functions
>that explain what they do and good comments in the code that explain why
>it works the way it does.
>
>I don't think we need to officially adopt Kenner's list of policies
>because I think they are already implied by our current coding standards.
>
>But, we do need to enforce them!

Can't argue with that. Though sometimes it is difficult to find a more
concise way to describe the algorithms than is found in the references.
What I tend to be particularly interested in is implementation details,
flaws in the paper, limitations imposed due to our framework/implementation,
differences in design decisions, etc.

One of the interesting questions I'm working through right now is to
figure out how much I should assume the reader knows. For example,
do I assume that everyone knows what dominators and post-dominators are?
What about a dominator tree? Value numbering? SSA form? How using
value numbering on the SSA form during a dominator walk gives us redundancy
elimination on an almost-global scale without the need to invalidate values
from our hash tables?
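
A minimal sketch of that last point, in Python rather than GCC's own C, and
not GCC's actual implementation. It assumes each block is a list of
(dest, op, arg1, arg2) tuples already in SSA form and that the dominator tree
is given as a dict of children; the function name
`dominator_value_numbering` and the data layout are made up for illustration.
Because every SSA name has exactly one definition, a table entry never goes
stale through a redefinition; the only bookkeeping in this sketch is dropping
a block's own entries once its dominator subtree has been walked. PHI nodes
and commutative operators are ignored.

```python
def dominator_value_numbering(blocks, dom_children, entry):
    """Find redundant SSA computations with one walk of the dominator tree.

    blocks:       dict, block name -> list of (dest, op, arg1, arg2) statements
    dom_children: dict, block name -> list of immediately dominated blocks
    entry:        name of the root block of the dominator tree
    Returns a dict mapping each redundant SSA name to the earlier name that
    already holds the same value.  (All names here are illustrative.)
    """
    available = {}  # (op, arg1, arg2) -> SSA name computed in a dominating block
    replaced = {}   # redundant SSA name -> equivalent dominating SSA name

    def walk(block):
        added = []  # table keys introduced by this block
        for dest, op, a, b in blocks[block]:
            # Look through equivalences found so far, so that a use of a
            # redundant name hashes the same as a use of the original.
            a = replaced.get(a, a)
            b = replaced.get(b, b)
            key = (op, a, b)
            if key in available:
                replaced[dest] = available[key]
            else:
                available[key] = dest
                added.append(key)
        # Everything in the table is valid in every block this one dominates.
        for child in dom_children.get(block, []):
            walk(child)
        # Leaving the subtree: entries from this block must not leak into
        # blocks it does not dominate.
        for key in added:
            del available[key]

    walk(entry)
    return replaced


# Diamond CFG: entry -> then / else -> join.  entry dominates all three.
blocks = {
    "entry": [("a1", "+", "x0", "y0")],
    "then":  [("b1", "+", "x0", "y0"), ("c1", "*", "b1", "z0")],
    "else":  [("d1", "*", "x0", "y0")],
    "join":  [("e1", "+", "x0", "y0"), ("f1", "*", "x0", "y0")],
}
dom_children = {"entry": ["then", "else", "join"]}
print(dominator_value_numbering(blocks, dom_children, "entry"))
# {'b1': 'a1', 'e1': 'a1'}
```

The computation in the entry block is reused in one arm and in the join block,
which is the "almost-global" part. The multiply in the else arm is not reused
at the join, because the else arm does not dominate it.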
I'm going to be somewhat lucky in the specific documentation I'm working
on right now as each of these subjects largely lives in a single file where
I can document the basic concepts. [ And in cases where we don't have that
kind of nice separation, I'll be looking to add that separation. ]

But even with that documentation, someone reading the dominator optimizer's
docs is going to have to have a good grasp on a number of underlying
concepts before the dominator optimizer's documentation is going to make
sense.

Which brings me to the need for this kind of documentation to also live
in the modern world of the web -- hyperlinks from the optimizer's
high-level documentation back to the underlying concepts it builds upon
are, IMHO, much better than bringing up a zillion files in your favorite
text editor.

I doubt we'll have an integrated solution where the docs live in one place
(the source) and are automagically extracted into web pages, but if I could
do that without introducing too much clutter in the source, I definitely
would... Sigh, sniffle.

[ Not to mention things like graphs look a lot better if you're not
limited by ASCII text :-) ]

On the subject of APIs -- as someone mentioned, there are things in the
compiler that really should be exposed purely by an API -- interfaces
into the gimplifier, dominator walker, statement linking/walking, etc.
Even if we never actually separate that code into a library, programming
as if it were a shared library would greatly help with the separation
issues that plague gcc.
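
As a rough illustration of what programming against such an interface might
look like, here is a small Python sketch (not GCC code; the class name
`DominatorWalker` and the `enter_block`/`leave_block` callbacks are
hypothetical). Clients hand in two callbacks and never see how the tree is
stored or traversed, which is the kind of separation a library boundary
would force.

```python
class DominatorWalker:
    """Hypothetical narrow interface to a dominator-tree walk.

    The tree representation is private; the only public entry point is
    walk(), which reports each block on entry and again on exit.
    """

    def __init__(self, dom_children, entry):
        self._dom_children = dom_children  # block -> immediately dominated blocks
        self._entry = entry

    def walk(self, enter_block, leave_block):
        # Explicit stack instead of recursion; the flag marks whether the
        # block's children have already been scheduled.
        stack = [(self._entry, False)]
        while stack:
            block, children_done = stack.pop()
            if children_done:
                leave_block(block)
                continue
            enter_block(block)
            stack.append((block, True))
            # Push children in reverse so they are visited in listed order.
            for child in reversed(self._dom_children.get(block, ())):
                stack.append((child, False))


# A client only supplies callbacks; here they just record the visit order.
events = []
walker = DominatorWalker({"entry": ["then", "else", "join"]}, "entry")
walker.walk(lambda b: events.append(("enter", b)),
            lambda b: events.append(("leave", b)))
```

An optimizer written this way could change how it stores the tree, or how it
walks it, without touching any client.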
jeff