Warnings in the C++ Front-End and GCC in General

Craig Burley burley@gnu.org
Thu Sep 10 03:53:00 GMT 1998


>Furthermore, using mnemonic names for the warnings would just add to
>the internationalization problem we already have: "-Wno-sign-compare"
>may be intuitive to you but probably isn't to some who doesn't speak
>English.  Unfortunately, it doesn't make sense to translate
>command-line options, or the arguments to #pragma's, __attribute__,
>and so forth because we want the same source code to compile no matter
>what language the person compiling it happens to speak.

A *big* disadvantage of numbers is that it becomes much harder
to reliably search for all instances of where a particular
diagnostic is issued in the code.  With mnemonics -- regardless
of the language they're written in -- the reliability is much
higher, especially if the creators of the mnemonics understand
the importance of this capability.  (Try searching for 62 in gcc,
as compared to searching for "uninitialized var", or some such thing.
In g77, the distinction is likely to be even more useful.)

It might be instructive for some people to look at g77's handling
of the issues.  It has *two* mnemonics for some diagnostics, and
at least one for most of them:

  -  One mnemonic is the internal name.  This is the "connective
     tissue" between the code's reference to the diagnostic text,
     and the diagnostic text itself.

  -  Another, more recently introduced mnemonic is the "doc hook name",
     or whatever one wants to call it.  This is the "connective
     tissue" between the diagnostic text and the precise place in
     the (on-line) documentation that describes the diagnostic in
     more detail, including how to avoid the diagnostic in the future.

*Neither* of these needs to be in English, French, or whatever.
The former need only be consistent within a given version of the
compiler source, since the user never sees it.

The latter, doc-hook, name has a different set of requirements.  Because
the GNU Info system is used, and that system doesn't really cope
naturally with multiple versions, a doc-hook name must not be changed
to mean something substantially different than what is described by
an older version of the manual, or issued by an older version of
the compiler.  And, just as `const' remains `const' even in the French
domain, these doc-hook names work just fine as mnemonics, regardless
of the authors' native language (though, for consistency across a
code base, I'd recommend sticking to one language/dialect).
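
As a rough illustration of the two-name idea (not g77's actual
data structures), the sketch below keeps one table indexed by
internal name, with each entry carrying the message text and an
optional doc-hook name.  The identifiers diag_id, diag_entry,
diag_table, and diag_doc_hook are hypothetical, as is the use of
%s in the first message text; only CMPAMBIG and the wording of the
messages come from this post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Internal names (hypothetical): seen only inside the compiler
   source, so they may change freely between versions.  */
enum diag_id { DIAG_INTRIN_AMBIG, DIAG_ENUM_NOT_HANDLED, DIAG_COUNT };

struct diag_entry {
  const char *text;      /* message text, kept in one central table */
  const char *doc_hook;  /* stable user-visible name, NULL if none yet */
};

static const struct diag_entry diag_table[DIAG_COUNT] = {
  [DIAG_INTRIN_AMBIG] =
    { "Ambiguous use of intrinsic `%s' at (^)", "CMPAMBIG" },
  /* An entry that has an internal name but no doc-hook name yet.  */
  [DIAG_ENUM_NOT_HANDLED] =
    { "enumeration value `%s' not handled in switch", NULL },
};

/* Return the doc-hook name for ID, or NULL if none is assigned.  */
const char *
diag_doc_hook (enum diag_id id)
{
  assert ((int) id >= 0 && id < DIAG_COUNT);
  return diag_table[id].doc_hook;
}
```

The point of the split is that renaming DIAG_INTRIN_AMBIG touches
only compiler source, while CMPAMBIG has to stay meaningful to
anyone holding an older manual or compiler.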

(Note that there are "in-line" messages, a mechanism I created to
avoid the long build times, that have no internal name and,
probably in most cases, no doc-hook name either.  I plan to change
all these to internal-name messages, for consistency's sake, over
time.)

The effects of g77's approach -- assuming I and/or others get
around to making all its diagnostics employ both names -- include:

  -  The messages are, indeed, all in one file

  -  Multiple lines and/or source references are properly assigned
     to a single conceptual message

  -  A direct, mnemonic connection is made between messages of a
     given class and a unique chunk of documentation that describes
     messages of that class

None of the above are true for the rest of egcs (well, gcc anyway)
currently.  Mark's patches accomplished most of the above, I assume,
though using numbers rather than names and not offering the doc
aspects (at first, anyway -- these could "fall out" of his work
fairly naturally, I would think).

So, when a user runs g77 and gets one of these full-bore diagnostics,
it looks something like this:

970221-2.f:Ambiguous use of intrinsic `AIMAG' at (^) [info -f g77 M CMPAMBIG]

The "magic" stuff is in brackets, and the "info -f g77 M" part of it
is constant -- as a UNIX command, assuming GNU Info is installed, it
drops the user directly into the section of the manual that lists the
described diagnostics.

`CMPAMBIG' is the doc-hook name, and drops the user into the part of
the manual that describes that specific diagnostic.
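
To make the bracketed suffix concrete, here is a minimal sketch of
building it from a doc-hook name and of pulling the name back out
of an emitted line (say, from an editor hook or a filter script).
The functions format_doc_ref and extract_doc_hook are hypothetical
and are not taken from g77; the sketch assumes the doc-hook name is
always the last space-separated word inside the brackets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "[info -f MANUAL M DOC_HOOK]" into BUF.
   Returns 0 on success, -1 if BUF is too small.  */
int
format_doc_ref (char *buf, size_t size,
                const char *manual, const char *doc_hook)
{
  int n;

  assert (buf != NULL && manual != NULL && doc_hook != NULL);
  n = snprintf (buf, size, "[info -f %s M %s]", manual, doc_hook);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Copy the doc-hook name found in LINE into NAME.
   Returns 0 on success, -1 if no bracketed reference is found
   or NAME is too small.  */
int
extract_doc_hook (const char *line, char *name, size_t size)
{
  const char *open, *close, *start;
  size_t len;

  assert (line != NULL && name != NULL);
  open = strstr (line, "[info -f ");
  if (open == NULL)
    return -1;
  close = strchr (open, ']');
  if (close == NULL)
    return -1;

  /* Walk back from ']' to the start of the last word.  */
  start = close;
  while (start > open && start[-1] != ' ')
    start--;

  len = (size_t) (close - start);
  if (len == 0 || len >= size)
    return -1;
  memcpy (name, start, len);
  name[len] = '\0';
  return 0;
}
```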

Whether the user issues the command himself, his editor (Emacs?) does
it for him upon typing some "expand upon this" sequence (a la
hypertext), or whatever, the important distinction between what g77
provides and what Mark proposed is that the connective tissue, visible
to the user, is, instead of some arbitrary number that must be
maintained according to loosely defined rules, a simple, unique
mnemonic that has only the requirement that it be maintained such
that differently-versioned documentation and compiler sets coexist
peacefully.  (In particular, what I think is most important is that
the latest documentation still have appropriate descriptions of
messages generated by any-old version of the compiler, even if the
description is "this diagnostic was deemed so stupid it was removed
in later versions of the compiler -- please upgrade".  :)

This mechanism suffers from some of the same problems already raised
by others (including myself) -- e.g. it centralizes diagnostics,
so it has those disadvantages.

But, that's really a result of the internal-naming scheme, not the
doc-hook-naming scheme.

Because of the requirements stated above for the doc-hook names,
once a diagnostic has been given such a name (however that's done),
it isn't particularly hard to add a general facility to disable,
or render as error, warnings on a doc-hook name-by-name basis.
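
A minimal sketch of what such a facility could look like follows,
purely to show that the bookkeeping is small; it is not a proposal
for an actual option syntax.  All names here (diag_action,
diag_set_action, diag_get_action, MAX_OVERRIDES) are hypothetical,
and the table stores the caller's pointer rather than copying the
name, so the strings must outlive the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum diag_action { DIAG_DEFAULT, DIAG_DISABLED, DIAG_AS_ERROR };

struct diag_override {
  const char *doc_hook;     /* not copied; caller keeps it alive */
  enum diag_action action;
};

#define MAX_OVERRIDES 32
static struct diag_override overrides[MAX_OVERRIDES];
static size_t n_overrides;

/* Record ACTION for DOC_HOOK, replacing any earlier setting.
   Returns 0 on success, -1 if the table is full.  */
int
diag_set_action (const char *doc_hook, enum diag_action action)
{
  size_t i;

  assert (doc_hook != NULL);
  for (i = 0; i < n_overrides; i++)
    if (strcmp (overrides[i].doc_hook, doc_hook) == 0)
      {
        overrides[i].action = action;
        return 0;
      }
  if (n_overrides == MAX_OVERRIDES)
    return -1;
  overrides[n_overrides].doc_hook = doc_hook;
  overrides[n_overrides].action = action;
  n_overrides++;
  return 0;
}

/* Look up the action for DOC_HOOK; unknown names get the default.  */
enum diag_action
diag_get_action (const char *doc_hook)
{
  size_t i;

  assert (doc_hook != NULL);
  for (i = 0; i < n_overrides; i++)
    if (strcmp (overrides[i].doc_hook, doc_hook) == 0)
      return overrides[i].action;
  return DIAG_DEFAULT;
}
```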

In egcs, given that we probably don't want to switch wholesale
to a separate registry (mainly to avoid making future integration
much harder), we *could* implement just the doc-hook scheme, and
also offer an internal-name scheme for portions of egcs that
don't have merge problems (like, natch, g77).

So, consider adding a special diagnostic syntax, e.g. turning

  warning ("enumeration value `%s' not handled in switch",
    IDENTIFIER_POINTER (TREE_PURPOSE (v)));

into

  warning ("enumeration value `%s' not handled in switch %E[ENUMSW]",
    IDENTIFIER_POINTER (TREE_PURPOSE (v)));

where %E[name] identifies the doc-hook name.

Given this, when the compiler goes to issue a diagnostic, it can:

  -  Substitute "[info -f gcc ENUMSW]" for the %E[] construct

  -  Omit the %E[] construct entirely, along with any surrounding
     whitespace, etc.

  -  Choose to not emit the warning at all, because ENUMSW warnings
     have been disabled (as a class, even if a class with only one
     instance)

  -  Always emit the warnings, yet leave sed/awk/perl hackers a
     much easier means to reliably identify, and thus filter out,
     messages
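
The first two options above can be sketched as a small rewrite of
the format string before it is handed to the usual printf-style
machinery.  This is only an illustration under stated assumptions:
%E[name] is the syntax proposed in this message and is not
something gcc implements, and expand_doc_hook and hook_mode are
hypothetical names.  The sketch handles only the first %E[] in the
string and leaves ordinary conversions such as %s untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum hook_mode { HOOK_SHOW, HOOK_HIDE };

/* Rewrite FMT into OUT.  In HOOK_SHOW mode, %E[name] becomes
   "[info -f gcc name]"; in HOOK_HIDE mode it is dropped together
   with the spaces before it.  The doc-hook name (or "" if FMT has
   no %E[] construct) is stored in NAME.
   Returns 0 on success, -1 if a buffer is too small.  */
int
expand_doc_hook (const char *fmt, enum hook_mode mode,
                 char *out, size_t out_size,
                 char *name, size_t name_size)
{
  const char *start, *name_begin, *end;
  size_t name_len, prefix_len;
  int n;

  assert (fmt != NULL && out != NULL && name != NULL && name_size > 0);
  name[0] = '\0';

  start = strstr (fmt, "%E[");
  end = start ? strchr (start + 3, ']') : NULL;
  if (end == NULL)
    {
      /* No (complete) construct: pass the format through.  */
      n = snprintf (out, out_size, "%s", fmt);
      return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
    }

  name_begin = start + 3;
  name_len = (size_t) (end - name_begin);
  if (name_len >= name_size)
    return -1;
  memcpy (name, name_begin, name_len);
  name[name_len] = '\0';

  prefix_len = (size_t) (start - fmt);
  if (mode == HOOK_SHOW)
    n = snprintf (out, out_size, "%.*s[info -f gcc %s]%s",
                  (int) prefix_len, fmt, name, end + 1);
  else
    {
      /* Trim the spaces that separated the text from %E[].  */
      while (prefix_len > 0 && fmt[prefix_len - 1] == ' ')
        prefix_len--;
      n = snprintf (out, out_size, "%.*s%s",
                    (int) prefix_len, fmt, end + 1);
    }
  return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

Because the name comes back separately, a caller could consult it
before formatting and skip the warning altogether, which is the
third option; the fourth falls out of HOOK_SHOW, since the
bracketed name gives scripts a fixed token to match on.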

This approach makes integration a bit annoying nevertheless, but it
doesn't invite huge tracts of differences right away, because it'd
be undertaken more slowly.

And, it'd generally leave the text of most messages in the code,
and even make such messages a bit easier to find in the first
place, IMO.

Whether individually turning off messages is offered (I'd still
recommend against that, especially offered as in-code facilities a la
#pragma) would not change the ease with which we, and future egcs
developers, could emphatically insist on the role of doc-hook
names being maintained:

  "Doc-hook names are provided only as a means to connect a
   diagnostic with a chunk of documentation that describes it.
   You should not assume that you can use these names to
   enable/disable warnings in ways that are consistent from
   one version to the next or one architecture to the next."

That is, I believe it's much easier to tell people not to use doc-hook
names for other purposes -- because they clearly *have* a purpose,
and it isn't to enable/disable individual warnings -- than to tell
them not to use arbitrary numbers/names introduced, apparently, only
to provide "connective tissue" to some command-line options and/or
#pragma's that enable/disable individual warnings!

>Yes, the error messages are removed from the source code itself.  In
>my patch, they were replaced by enumerals like `you_did_x' for a
>message like "you did x", which I don't think is so unreadable.  For
>that, we get a lot of leverage; like the ability to disable related
>messages. 

I've found the readability of g77's internal vs. doc-hook names
pretty okay to work with -- only slightly worse than having the
text in the code itself.  Sometimes it's even a bit better --
messages with lots of similar text but a distinct internal name
allow me to more quickly decide which message I'm looking up,
and once I have that internal name, searching for it is more
reliable than searching for the text.  (Especially true since
in-line substitution variables, a la %s in gcc, can make it
hard to guess which parts of a message are "constant" and which
are variable.  So, limiting the search to a single data base
of diagnostics has usually made my life easier in such cases.)

It is, IMO, important to have the distinction carefully made, however.

That is, I would recommend against trying to make one name serve
as both an internal name to connect to the database *and* an exported
name for users to use as "handles" to look up documentation, or
disable/enable via command-line options or code constructs.

That's especially important to avoid when there are different
contexts for given chunks of code: that is, *products* (aka programs)
are not necessarily the boundaries of *code*.  The latter must
employ unique internal names in the "space" of the data base
used in that code, while the former has different requirements,
depending on how the code is actually packaged into products.

That's part of why I considered a single number, like 62, to
serve as both the internal compiler number and the external,
user-visible number that would find its way into code (and, yes,
Makefiles are code!), to be a risky solution to the problem at hand.

        tq vm, (burley)


