This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Warning for trigraphs in comment?


This 

<quote>

   Second, consider the comment line.  Did you notice that it ends oddly,
   with a "/"?

       // What will the next line do? Increment???????????/
                                                          ^

   Nikolai Smirnov writes:

      "Probably, what's happened in the program is obvious for you but I
      lost a couple of days debugging a big program where I made a
      similar error.  I put a comment line ending with a lot of question
      marks accidentally releasing the 'Shift' key at the end.  The
      result is unexpected trigraph sequence '??/' which was converted to
      '\' (phase 1) which was annihilated with the following '\n' (phase
      2)."  [1]

   The "??/" sequence is converted to '\' which, at the end of a line, is
   a line-splicing directive (surprise!).  In this case, it splices the
   following line "++x;" to the end of the comment line and thus makes
   the increment part of the comment.  The increment is never executed.

   Interestingly, if you look at the GNU g++ documentation for the
   -Wtrigraphs command-line switch, you will encounter the following
   statement:

      "Warnings are not given for trigraphs within comments, as they do
      not affect the meaning of the program."  [2]

   That may be true most of the time, but here we have a case in point --
   from real-world code, no less -- where this expectation does not hold.

</quote>

is extracted from Herb's GotW #86 -- full message appended below.
I would suggest we warn for trigraphs in comments.

-- Gaby


--- Begin Message ---
 -------------------------------------------------------------------
   Guru of the Week problems and solutions are posted regularly on
    news:comp.lang.c++.moderated. For past problems and solutions
      see the GotW archive at www.GotW.ca. (c) 2003 H.P.Sutter
            News archives may keep copies of this article.
 -------------------------------------------------------------------

______________________________________________________________________

GotW #86:   Slight Typos? Graphic Language and Other Curiosities

Difficulty: 5 / 10
______________________________________________________________________



>Answer the following questions without using a compiler.
>
>1. What is the output of the following program on a
>   standards-conforming C++ compiler?
>
>    #include <iostream>
>
>    int main()
>    {
>      int x = 1;
>      for( int i = 0; i < 100; ++i );
>        // What will the next line do? Increment???????????/
>        ++x;
>      std::cout << x;
>    }

Assuming that there is no invisible whitespace at the end of the
comment line, the output is "1".

There are two tricks here, one obvious and one less so.

First, consider the for loop line:

  for( int i = 0; i < 100; ++i );
                                ^

There's a semicolon at the end, a "curiously recurring typo pattern"
that (usually accidentally) makes the body of the for loop just the
empty statement.  Even though the following lines may be indented, and
may even have braces around them, they are not part of the body of the
for loop.  This was a deliberate red herring: because of the second
trick, it doesn't matter that the loop repeats no statements, since
there is no increment statement to repeat at all (even though there
appears to be one).

Second, consider the comment line.  Did you notice that it ends oddly,
with a "/"?

    // What will the next line do? Increment???????????/
                                                       ^

Nikolai Smirnov writes:

   "Probably, what's happened in the program is obvious for you but I
   lost a couple of days debugging a big program where I made a
   similar error.  I put a comment line ending with a lot of question
   marks accidentally releasing the 'Shift' key at the end.  The
   result is unexpected trigraph sequence '??/' which was converted to
   '\' (phase 1) which was annihilated with the following '\n' (phase
   2)."  [1]

The "??/" sequence is converted to '\' which, at the end of a line, is
a line-splicing directive (surprise!).  In this case, it splices the
following line "++x;" to the end of the comment line and thus makes
the increment part of the comment.  The increment is never executed.

Interestingly, if you look at the GNU g++ documentation for the
-Wtrigraphs command-line switch, you will encounter the following
statement:

   "Warnings are not given for trigraphs within comments, as they do
   not affect the meaning of the program."  [2]

That may be true most of the time, but here we have a case in point --
from real-world code, no less -- where this expectation does not hold.


>2. How many distinct errors should be reported when compiling the
>   following code on a conforming C++ compiler?
>
>    struct X {
>      static bool f( int* p )
>      {
>        return p && 0[p] and not p[1:>>p[2];
>      };
>    };
>


The short answer is:  Zero.  This code is perfectly legal and
standards-conforming (whether the author might have wanted it to be or
not).

Let's consider in turn each of the expressions that might be
questionable, and see why they're really okay:

 - 0[p] is legal and is defined to have the same meaning as "p[0]".
   In C (and C++), an expression of the form x[y], where one of x and
   y has pointer type and the other has integer type, always means
   *(x+y).  In this case, 0[p] and p[0] have the same meaning because
   they mean *(0+p) and *(p+0), respectively, which comes out to the
   same thing.  For more details, see subclause 6.5.2.1 of the C99
   standard [3].

 - "and" and "not" are valid keywords that are alternative spellings
   of && and !, respectively.

 - :> is legal.  It is a digraph for the "]" character, not a smiley
   (smileys are unsupported in the C++ language outside comment
   blocks, which is rather a shame).  This turns the final part of the
   expression into "p[1]>p[2]".

 - The "extra" semicolon after the closing brace of f is allowed: a
   semicolon may follow a member function definition inside a class.

Of course, it could well be that the colon ":" was a typo for "]" and
the author really meant "p[1]>>p[2]", but even if it was a typo it's
still (unfortunately, in that case) perfectly legal code.


Acknowledgements
----------------

Thanks to Nikolai Smirnov for contributing part of the Example 1 code;
I added the for loop line.


References
----------

[1] N. Smirnov, private communication.

[2] A Google search for "trigraphs within comments" yields this and
several other interesting and/or amusing hits.

[3] ISO/IEC 9899:1999 (E), International Standard, Programming
Languages -- C.


---
Herb Sutter (www.gotw.ca)

Convener, ISO WG21 (C++ standards committee)     (www.gotw.ca/iso)
Contributing editor, C/C++ Users Journal         (www.gotw.ca/cuj)
Visual C++ program manager, Microsoft      (www.gotw.ca/microsoft)

      [ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
      [ about comp.lang.c++.moderated. First time posters: do this! ]

--- End Message ---

-- 
Gabriel Dos Reis,	gdr@integrable-solutions.net
