Bug 51437 - GCC should warn on the use of reserved identifier/macro names
Summary: GCC should warn on the use of reserved identifier/macro names
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: unknown
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-06 11:20 UTC by Ruben Van Boxem
Modified: 2016-04-21 17:25 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ruben Van Boxem 2011-12-06 11:20:00 UTC
For C and C++, Clang should warn when user code uses identifiers or macros that are implementation-reserved.

For C(99), this would mean names starting with an underscore followed by
another unerscore or capitalized letter.

For C++ a single underscore is also reserved for the global namespace.

I could have messed the above rules up, but these are the cause of a lot of
unexpected bugs, and a warning would be super-easy to implement.
Comment 1 Jonathan Wakely 2011-12-06 11:37:39 UTC
(In reply to comment #0)
> For C and C++, Clang

Ahem!

> For C++ a single underscore is also reserved for the global namespace.

That comes from C, not C++:
"All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces."
Comment 2 Ruben Van Boxem 2011-12-06 11:50:18 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > For C and C++, Clang
> 
> Ahem!
> 
> > For C++ a single underscore is also reserved for the global namespace.
> 
> That comes from C, not C++:
> "All identifiers that begin with an underscore are always reserved for use as
> identifiers with file scope in both the ordinary and tag name spaces."

Fair enough. I warned you, I only quickly glanced at both standard documens :)
Comment 3 Jonathan Wakely 2011-12-06 13:34:36 UTC
The C standard also reserves various names for future use in 7.26
Comment 4 David Krauss 2011-12-16 11:27:45 UTC
The #pragma GCC system_header directive helps here by tagging places where reserved names are OK, but accommodating headers from the operating system which may lack that directive might require something else.
Comment 5 Josh Triplett 2011-12-21 18:53:18 UTC
I'd like to see this as well.  While issuing such a warning by default would cause numerous warnings with existing code, having it as an opt-in -Wreserved-identifiers would help greatly.

Also, the rules for C prohibit a leading underscore followed by a capital letter, or two adjacent underscores *anywhere* in the identifier.  So, for example, this__identifier, or_this_identifier__.
Comment 6 Rich Felker 2012-02-19 04:27:43 UTC
I would greatly like to see this as well; actually I was just about to file a duplicate report when I found this, as a result of finding "struct _Color" in GIMP source... >_<

I think the system headers issue could be handled very easily, with no false positives and few false negatives. The heuristic would basically be to flag any occurrence of reserved names except in header files found in the -isystem path, or in text resulting from expansion of a macro defined by a header file in the -isystem path.
Comment 7 Josh Triplett 2012-02-19 04:41:59 UTC
(In reply to comment #6)
> I would greatly like to see this as well; actually I was just about to file a
> duplicate report when I found this, as a result of finding "struct _Color" in
> GIMP source... >_<
> 
> I think the system headers issue could be handled very easily, with no false
> positives and few false negatives. The heuristic would basically be to flag any
> occurrence of reserved names except in header files found in the -isystem path,
> or in text resulting from expansion of a macro defined by a header file in the
> -isystem path.

You don't want to flag all occurrences of reserved names, just definitions using reserved names.  Flagging uses will just result in duplicate warnings for the same symbols.  Also, the code that uses a symbol doesn't really have a choice; it has to use the symbol as defined.  Better to generate a warning on the definition only.

Good call on the issue of "text resulting from expansion of a macro"; that seems like precisely one of the cases where system headers will want to use _Reserved names to avoid conflict with user-defined identifiers.

I'd suggest using precisely the heuristic GCC currently uses to avoid generating warnings in system headers, as specified in http://gcc.gnu.org/onlinedocs/cpp/System-Headers.html , including the bit saying "Macros defined in a system header are immune to a few warnings wherever they are expanded."

Arguably this also needs special handling to avoid making -Wsystem-headers any more useless, similar to the handling of -Wunknown-pragmas, but that doesn't seem like a necessary prerequisite for adding this feature.
Comment 8 Rich Felker 2012-02-19 05:47:18 UTC
You really do want to flag both definition and usage. For instance, there's plenty of buggy software (especially GNU software like binutils) using __pid_t and similar when it should be using pid_t, etc.

From an undefined behavior standpoint, defining a name in the reserved namespace is no worse than using a name in the referred namespace assuming the implementation has defined it. Both are incorrect C usage and both should be flagged.

My heuristic about -isystem would prevent flagging usage of reserved names resulting from implementations of system headers - for instance, if a macro in a system header used __uint32_t because it needs to avoid making uint32_t visible.
Comment 9 Josh Triplett 2012-02-19 06:29:27 UTC
(In reply to comment #8)
> You really do want to flag both definition and usage. For instance, there's
> plenty of buggy software (especially GNU software like binutils) using __pid_t
> and similar when it should be using pid_t, etc.

In the case of identifiers containing __ or starting with _[A-Z], that does hold true; I hadn't considered programs using internal identifiers when corresponding public identifiers exist.  (Ideally the headers could have avoided exposing those internal identifiers to user programs in the first place, but I don't know any sensible way to implement that.)

However, note that the standards also reserve various other classes of names, such as types ending in _t, for which GCC should only flag definitions, not uses.  Only system headers should define new _t types, but user code can *use* types like time_t or pid_t without warning.

(Some of the other reserved identifier categories, such as E[A-Z0-9].*, is[a-z].*, to[a-z].*, and mem[a-z].* should go under some separate, more pedantic warning option.)

> From an undefined behavior standpoint, defining a name in the reserved
> namespace is no worse than using a name in the referred namespace assuming the
> implementation has defined it. Both are incorrect C usage and both should be
> flagged.

True.  I had mostly wanted to avoid generating hundreds of warnings for the same identifier.  However, that seems better than missing cases like the __pid_t you mentioned above.

> My heuristic about -isystem would prevent flagging usage of reserved names
> resulting from implementations of system headers - for instance, if a macro in
> a system header used __uint32_t because it needs to avoid making uint32_t
> visible.

Right.  That seems like the same heuristic documented at http://gcc.gnu.org/onlinedocs/cpp/System-Headers.html that I referenced in comment 7: "Macros defined in a system header are immune to a few warnings wherever they are expanded."
Comment 10 Ruben Van Boxem 2012-02-19 09:33:05 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > You really do want to flag both definition and usage. For instance, there's
> > plenty of buggy software (especially GNU software like binutils) using __pid_t
> > and similar when it should be using pid_t, etc.
> 
> In the case of identifiers containing __ or starting with _[A-Z], that does
> hold true; I hadn't considered programs using internal identifiers when
> corresponding public identifiers exist.  (Ideally the headers could have
> avoided exposing those internal identifiers to user programs in the first
> place, but I don't know any sensible way to implement that.)
> 
> However, note that the standards also reserve various other classes of names,
> such as types ending in _t, for which GCC should only flag definitions, not
> uses.  Only system headers should define new _t types, but user code can *use*
> types like time_t or pid_t without warning.

These are only reserved for POSIX, and should not always be warned about!

> 
> (Some of the other reserved identifier categories, such as E[A-Z0-9].*,
> is[a-z].*, to[a-z].*, and mem[a-z].* should go under some separate, more
> pedantic warning option.)

I don't see why this should happen at all. There's nothing special about these general names?
> 
> > From an undefined behavior standpoint, defining a name in the reserved
> > namespace is no worse than using a name in the referred namespace assuming the
> > implementation has defined it. Both are incorrect C usage and both should be
> > flagged.
> 
> True.  I had mostly wanted to avoid generating hundreds of warnings for the
> same identifier.  However, that seems better than missing cases like the
> __pid_t you mentioned above.
> 
> > My heuristic about -isystem would prevent flagging usage of reserved names
> > resulting from implementations of system headers - for instance, if a macro in
> > a system header used __uint32_t because it needs to avoid making uint32_t
> > visible.
> 
> Right.  That seems like the same heuristic documented at
> http://gcc.gnu.org/onlinedocs/cpp/System-Headers.html that I referenced in
> comment 7: "Macros defined in a system header are immune to a few warnings
> wherever they are expanded."
Comment 11 David Krauss 2012-02-19 11:09:28 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > However, note that the standards also reserve various other classes of names,
> > such as types ending in _t, for which GCC should only flag definitions, not
> > uses.  Only system headers should define new _t types, but user code can *use*
> > types like time_t or pid_t without warning.
> 
> These are only reserved for POSIX, and should not always be warned about!

Yes, respecting POSIX would be overkill. In particular, it's OK and common to define such types in C++ outside the global namespace.

> > (Some of the other reserved identifier categories, such as E[A-Z0-9].*,
> > is[a-z].*, to[a-z].*, and mem[a-z].* should go under some separate, more
> > pedantic warning option.)
> 
> I don't see why this should happen at all. There's nothing special about these
> general names?

See comment #3, they are reserved in C99 §7.26 "Future library directions" (although those patterns aren't quite right).
Comment 12 Josh Triplett 2012-02-19 18:50:34 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > However, note that the standards also reserve various other classes of names,
> > such as types ending in _t, for which GCC should only flag definitions, not
> > uses.  Only system headers should define new _t types, but user code can *use*
> > types like time_t or pid_t without warning.
> 
> These are only reserved for POSIX, and should not always be warned about!

I stand corrected.  I thought that the C standard reserved types with a _t suffix, but I just re-checked the C99 and C11 specs and they don't have that reservation, only some more specific reservations such as the others I mentioned.
Comment 13 Josh Triplett 2012-02-19 18:56:28 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > (Some of the other reserved identifier categories, such as E[A-Z0-9].*,
> > > is[a-z].*, to[a-z].*, and mem[a-z].* should go under some separate, more
> > > pedantic warning option.)
> > 
> > I don't see why this should happen at all. There's nothing special about these
> > general names?
> 
> See comment #3, they are reserved in C99 §7.26 "Future library directions"
> (although those patterns aren't quite right).

I wrote those patterns based on C11 section 7.30 "Future library directions".  7.30.3 says "Macros that begin with E and a digit or E and an uppercase letter may be added to the declarations in the <errno.h> header.".  7.30.2 says "Function names that begin with either is or to, and a lowercase letter may be added to the declarations in the <ctype.h> header."; 7.30.12 also reserves those two for <wctype.h>.  7.30.11 says "Function names that begin with str, mem, or wcs and a lowercase letter may be added to the declarations in the <string.h> header."  AFAICT, those reservations match the regexes I gave.
Comment 14 Ruben Van Boxem 2012-02-19 21:51:37 UTC
I don't think adding future reserved identifiers serves any purpose. In general, code is written against a certain version of a language's Standard, with the current constraints, not would-be might-be constraints.

Perhaps split that into a very pedantic extra warning or something.