GCC Documentation: Overview, Issues, Plans
Taxonomy
Manuals, Reference Material
End User
User documentation consists of manuals for the compilers (gcc, g++, etc.), support tools (cpp), various language front ends (java, fortran), runtime libraries (c++, OpenMP, ada), and support libraries (quadmath).
These manuals are packaged for every release and put on gcc.gnu.org. Standard GNU Coding Convention rules are used to generate the documentation, i.e. make html and maintainer-scripts/make install-html. Manual content is authored in either texinfo (all but libstdc++) or docbook (libstdc++). Texinfo is a markup language created by RMS and is the official markup language used by GNU projects. Reference content for libstdc++ is generated with doxygen.
Documents are packaged for each release via a script, maintainer-scripts/update_web_docs_svn.
Printed output requires TeX.
Known Weaknesses
- Contributions are ad hoc and there are no editors, so coverage is poor. Language coverage varies widely, and some documentation is hosted outside GNU domains.
- texinfo is largely unmaintained. Its use and markup are niche. The HTML it generates does not validate. Utilities required for generating print output, i.e. those interfacing with TeX, have been deprecated. No support for man pages. No integration with web-based editing tools or wikis.
- doxygen is finicky. Generating print output requires a long and elaborate toolchain. Generated content can be large, taxing the TeX subsystem.
Developer
A separate set of documents exists that focuses on the internals of GCC.
However, most of the recent internals documentation is written on the wiki. Subjects include how to use git for source code control, and various higher-level overviews of the source tree and/or specific passes or sub-components.
In addition, there are source code comments. These comments vary widely in terms of completeness and relevance. A visualization of gcc sources via doxygen has been sporadically generated, but is currently not maintained.
Self-reported status of internal docs can be found here.
Known Weaknesses
- Some sources lack comments, and functions that are commented do not document their arguments.
- Coverage is poor. (Notably absent is an overview of the build process.)
- Lack of updates makes current content misleading
- Wiki orphans, lack of organization, duplicates
- Add your favorite here
Web Site
The main domain is gcc.gnu.org, although several other domains are also hosted, including cygwin.com. Content includes release notes, install and configuration details, FAQ, build results, porting information, historical release dates and version info, mission statement and information on the GCC steering committee, and links to additional resources. Content is authored in "bare" HTML without CSS, and then a CSS layer is applied when content is "published" on gcc.gnu.org. Sources for the website are available via CVS and are maintained by Gerald Pfeifer.
Traffic can be estimated via quantcast/compete. Roughly, compete guesses around 12.8k unique visitors for the current month, with a range over the year of 11k to 22k unique visitors per month. Demographic data is guessed by quantcast to be 65% male, with high education levels, evenly split between the 25-35, 35-45, and 45-55 age levels, and weighted heavily toward Asia.
For comparison, cygwin.com is around 21k, llvm.org is around 3.4k, sourceforge.net is around 2M, gnu.org is around 105k, python.org is around 69k, stackoverflow.com is around 750k.
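To put these estimates on a common scale, the sketch below expresses each site's traffic as a multiple of gcc.gnu.org's. It only restates the rough figures quoted above. The function name relative_traffic is made up for illustration, and it assumes all of the numbers are the same kind of monthly unique-visitor estimate.

```python
# Approximate monthly unique-visitor estimates, copied from the text above.
# Assumption: every figure is the same kind of estimate as the gcc.gnu.org one.
visitors = {
    "gcc.gnu.org": 12_800,
    "cygwin.com": 21_000,
    "llvm.org": 3_400,
    "sourceforge.net": 2_000_000,
    "gnu.org": 105_000,
    "python.org": 69_000,
    "stackoverflow.com": 750_000,
}


def relative_traffic(estimates, baseline="gcc.gnu.org"):
    """Return each site's traffic as a multiple of the baseline site's traffic."""
    base = estimates[baseline]
    return {site: count / base for site, count in estimates.items()}


ratios = relative_traffic(visitors)

# Print the sites from least to most traffic, relative to the baseline.
for site, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{site:20s} {ratio:8.2f}x")
```

Read this way, llvm.org sits at roughly a quarter of gcc.gnu.org's estimated traffic, while gnu.org is roughly eight times larger.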
Mailing List Archives
Extensive design and development history of new features is archived on various mailing lists. Partial archives of the mailing lists are available, with the last twelve years easily accessible and well-indexed by search engines. Problems, bugs, and issues are tracked in bugzilla. Before 1997, GCC development mailing lists (i.e. gcc2) were closed: no known archives exist of the first ten years of development.
Legal and Licensing Issues
GFDL
GCC Manuals were switched from GPL to GFDL in 2001. Adoption of GFDL has not been without controversy: Debian has issues with invariant sections, Wikipedia with license incompatibilities. GFDL is incompatible with Creative Commons and GPL licenses. Many of these problems have been well-known for around 10 years: there has been no progress on resolution.
GPL
Generated documents take the same license as the originating sources. So, for the libstdc++ doxygen API reference, the files are licensed under the GPL. To be precise, they are licensed under the GPLv3 with the runtime exception.
Misc
The content on the gcc.gnu.org website is not licensed per se; instead, copying is allowed as long as the FSF copyright is preserved.
GFDL vs. GPL
Literate programming, as introduced by Knuth, is the placement of code and commentary in the same context. When the code is GPL and the commentary is GFDL, the license incompatibilities prevent this natural combination.
- FSF owns copyright on this stuff. It can very easily dual-license content to remove barriers, but has refused. Or rather, it only allows it for .po (localization) files.
The approved usage is to keep the index or generated content under GFDL and the sources under GPL, as separate documents. This recommended construction is impractical, and is equivalent to a book having no index, but a separate pamphlet that is an external table of contents. This distinction is arbitrary and inconvenient.
According to http://gcc.gnu.org/ml/gcc/2010-08/msg00068.html, a possible solution is to make a new GPL document of everything that is automatically generated. If content needs to be moved from a GFDL doc to a GPL doc, ask RMS for permission to duplicate it. PDFs can be concatenated by mere aggregation. HTML pages are even less troublesome, since each HTML page can have its own license. --ManuelLópezIbáñez
There are lots of ways this impacts GCC developers.
- Eclipse wants to use GPL libstdc++ API XML files for hover help, but prefers GFDL.
- libstdc++ wants to generate a class index for the user manual (GFDL) from the GPL'd API XML files, but cannot.
Plans
Have a Plan
Make a list of what's wrong. Prioritize. Take the top three and fix in a month, quarter, year. Re-evaluate progress and priorities on a yearly basis.
Removal of Literate Programming Restrictions
Essential for progress on multiple fronts.
Create a Doc Stage
Instead of requiring complete documentation when a new feature is being developed in stage one, let it slide until stage two, which would now be a freeze on new features plus documentation of new features and changes to old capabilities. Then stage three would be the normal stabilization phase. Admittedly, the boundary is fuzzy. The goal is, for every release, to have accurate developer commentary on the contemporary state of the implementation. Regularly-scheduled documentation "checkpoints" may allow the GCC community to incorporate the "expert help" from doc writers and editors outside the normal development community.
Questions
- What format is most important? Seems like HTML.
- Who writes docs, and at what stage of development?
- What authoring/editing strategies are working?
- How can this project attract writers? How can this project attract talented editors? Who would be willing to take on some of the gigantic editing tasks?
- Is there a place for GNU patronage or other corporate sponsorship?
- Who prunes the wikis? How is this content folded in to manuals?
- What about newer electronic formats like EPUB? How is this supported?
- How can vendors re-writing GCC release notes be avoided?
- How can vendors that use GCC manuals in product documentation contribute back effectively?