[RFC stage 1] Proposed new warning: -Wmisleading-indentation

David Malcolm dmalcolm@redhat.com
Thu Apr 16 15:08:00 GMT 2015


Attached is a work-in-progress patch for a new
  -Wmisleading-indentation
warning I've been experimenting with, for GCC 6.

It kind-of-works, but there are some issues, so I wanted to get feedback
on it here.

The idea is to issue a warning when the C/C++ compiler's view of the
block structure doesn't match that of a naive human reader of the code
via indentation.

For example, CVE-2014-1266 (aka "goto fail") had:

  hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
  hashOut.length = SSL_SHA1_DIGEST_LEN;
  if ((err = SSLFreeBuffer(&hashCtx)) != 0)
      goto fail;
  if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
      goto fail;
  if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
      goto fail;
  if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
      goto fail;
  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
      goto fail;
      goto fail;  /* MISTAKE! THIS LINE SHOULD NOT BE HERE */
  if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
      goto fail;

  err = sslRawVerify(...);

where the second "goto fail;" here:

  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
      goto fail;
      goto fail;  /* MISTAKE! THIS LINE SHOULD NOT BE HERE */

is indented as if it's guarded by the conditional.  It isn't, making the
real meaning of the code hard to discern, and masking a security
vulnerability (where the stray goto led to the  certificate-checking
code later in the function being dead code, skipping it with err == 0,
for success).

The patch successfully issues a warning on a simplified version of this:

	if ((err = foo (b)) != 0)
		goto fail;
		goto fail;

./Wmisleading-indentation-6.c: In function ‘foo’:
./Wmisleading-indentation-6.c:15:3: warning: statement is indented as if
it were guarded by... [-Wmisleading-indentation]
   goto fail;
   ^
./Wmisleading-indentation-6.c:13:2: note: ...this clause, but it is not
  if ((err = foo (b)) != 0)
  ^

and on various other testcases.

That said, there are several major issues with the patch as it stands:

(A) works with the C FE, but not with C++ yet.  The implementation
assumes a single lexer consuming the token stream, so it doesn't yet
work with the C++ FE's ideas of tentatively-consumed tokens
(save/commit/rollback), and with its lexer stack.  I *think* that can be
fixed by relying on the fact that the integer locations of the token
stream are normally monotonic, and using that to capture and handle
those two C++ FE impl details.

(B) GTY/PCH/etc, and performance;  I know that it leaks visual_block
instances, and there's probably a way to do this with much less dynamic
memory allocation, but I wanted to get it right before making it fast.

(C) no ChangeLog yet, though it's heavily commented.

(D) tabs vs spaces.  This is probably the biggest can of worms.

Consider:

{
  if (flagA)
    if (flagB)
      if (flagC)
        foo ();
        bar (); /* would like to warn about this */
}

If this is indented using spaces&tabs with 8-space tabs, we have:

<no indent>{
<2 spaces>if (flagA)
<4 spaces>if (flagB)
<6 spaces>if (flagC)
<1 tab>foo ();
<1 tab>bar ();
<no indent>}

What does this look like in an editor?  Consider emacs, which uses uses
0-based offsets to what I call "visual columns".  The column numbers for
the start of the first token for each line are reported by emacs as:

<0>{
<2>if (flagA)
<4>if (flagB)
<6>if (flagC)
<8>foo ();
<8>bar ();
<0>}

However within libcpp and gcc, in linemap's expanded_location and in
diagnostic messages, the "column" numbers are actually 1-based counts of
*characters*, so the "column" numbers emitted in diagnostics for the
start of the first token in each line are actually:

<1>{
<3>if (flagA)
<5>if (flagB)
<7>if (flagC)
<2>foo ();
<2>bar ();
<1>}

Note the transition at tab offsets, where we go from increasing visual
column numbers:

<6>if (flagC)
<8>foo ();

to a decrease in the character count:

<7>if (flagC)
<2>foo ();

Hence to handle mixed spaces+tabs, we need some way to model "visual
columns".  I think this is doable (at the cost of adding a field to
cpp_token, and to c_token/cp_token), provided we can assume, per-buffer,
either
  (i) a consistent value for tabs in terms of spaces, or
  (ii) that we don't have mixed tabs/spaces in this buffer

An issue here is how to determine (i), or if it's OK to default to 8 and
have a command-line option (param?) to override it? (though what about,
say, each header file?)

Thoughts on this, and on the patch?

Thanks

Dave

-------------- next part --------------
A non-text attachment was scrubbed...
Name: misleading-indentation-v1.patch
Type: text/x-patch
Size: 35383 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150416/b2be6da9/attachment.bin>


More information about the Gcc-patches mailing list